
Theses on the topic "Parallel programs"

Create an accurate citation in APA, MLA, Chicago, Harvard, and several other styles


Consult the top 50 theses for your research on the topic "Parallel programs".

Next to every source in the list of references there is an "Add to bibliography" button. Click on it, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the scholarly publication in PDF format and read its abstract online whenever this information is included in the metadata.

Browse theses on a wide variety of disciplines and organise your bibliography correctly.

1

Smith, Edmund. « Parallel solution of linear programs ». Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8833.

Full text
Abstract:
The factors limiting the performance of computer software periodically undergo sudden shifts, resulting from technological progress, and these shifts can have profound implications for the design of high performance codes. At the present time, the speed with which hardware can execute a single stream of instructions has reached a plateau. It is now the number of instruction streams that may be executed concurrently which underpins estimates of compute power, and with this change, a critical limitation on the performance of software has come to be the degree to which it can be parallelised. The research in this thesis is concerned with the means by which codes for linear programming may be adapted to this new hardware. For the most part, it is codes implementing the simplex method which will be discussed, though these typically have lower performance for single solves than codes implementing interior point methods. However, the ability of the simplex method to rapidly re-solve a problem makes it at present indispensable as a subroutine for mixed integer programming. The long history of the simplex method as a practical technique, with applications across industry and government, has led to such codes reaching a great level of sophistication. It would be unexpected in a research project such as this one to match the performance of top commercial codes with many years of development behind them. The simplex codes described in this thesis are, however, able to solve real problems of small to moderate size, rather than being confined to random or otherwise artificially generated instances. The remainder of this thesis is structured as follows. The rest of this chapter gives a brief overview of the essential elements of modern parallel hardware and of the linear programming problem. Both the simplex method and interior point methods are discussed, along with some of the key algorithmic enhancements required for such systems to solve real-world problems. Some background on the parallelisation of both types of code is given. The next chapter describes two standard simplex codes designed to exploit the current generation of hardware. i6 is a parallel standard simplex solver capable of being applied to a range of real problems, and showing exceptional performance for dense, square programs. i8 is also a parallel, standard simplex solver, but now implemented for graphics processing units (GPUs).
2

D'Paola, Oscar Naim. « Performance visualization of parallel programs ». Thesis, University of Southampton, 1995. https://eprints.soton.ac.uk/365532/.

Full text
3

Busvine, David John. « Detecting parallel structures in functional programs ». Thesis, Heriot-Watt University, 1993. http://hdl.handle.net/10399/1415.

Full text
4

Justo, George Roger Ribeiro. « Configuration-oriented development of parallel programs ». Thesis, University of Kent, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333965.

Full text
5

Mukherjee, Joy. « A Runtime Framework for Parallel Programs ». Diss., Virginia Tech, 2006. http://hdl.handle.net/10919/28756.

Full text
Abstract:
This dissertation proposes the Weaves runtime framework for the execution of large scale parallel programs over lightweight intra-process threads. The goal of the Weaves framework is to help process-based legacy parallel programs exploit the scalability of threads without any modifications. The framework separates global variables used by identical, but independent, threads of legacy parallel programs without resorting to thread-based re-programming. At the same time, it also facilitates low-overhead collaboration among threads of a legacy parallel program through multi-granular selective sharing of global variables. Applications that follow the tenets of the Weaves framework can load multiple identical, but independent, copies of arbitrary object files within a single process. They can compose the runtime images of these object files in graph-like ways and run intra-process threads through them to realize various degrees of multi-granular selective sharing or separation of global variables among the threads. Using direct runtime control over the resolution of individual references to functions and variables, they can also manipulate program composition at fine granularities. Most importantly, the Weaves framework does not entail any modifications to either the source codes or the native codes of the object files. The framework is completely transparent. Results from experiments with a real-world process-based parallel application show that the framework can correctly execute a thousand parallel threads containing non-threadsafe global variables on a single machine - nearly twice as many as the traditional process-based approach can - without any code modifications. On increasing the number of machines, the application experiences super-linear speedup, which illustrates scalability. Results from another similar application, chosen from a different software area to emphasize the breadth of this research, show that the framework's facilities for low-overhead collaboration among parallel threads allows for significantly greater scales of achievable parallelism than technologies for inter-process collaboration allow. Ultimately, larger scales of parallelism enable more accurate software modeling of real-world parallel systems, such as computer networks and multi-physics natural phenomena.
Ph. D.
6

Hinz, Peter. « Visualizing the performance of parallel programs ». Master's thesis, University of Cape Town, 1996. http://hdl.handle.net/11427/16141.

Full text
Abstract:
Bibliography: pages 110-115.
The performance analysis of parallel programs is a complex task, particularly if the program has to be efficient over a wide range of parallel machines. We have designed a performance analysis system called Chiron that uses scientific visualization techniques to guide and help the user in performance analysis activities. The aim of Chiron is to give the user full control over what section of the data he/she wants to investigate in detail. Chiron uses interactive three-dimensional graphics techniques to display large amounts of data in a compact and easy to understand/conceptualize way. The system assists in the tracking of performance bottlenecks by showing data in 10 different views and allowing the user to interact with the data. In this thesis the design and implementation of Chiron are described, and its effectiveness illustrated by means of three case studies.
7

Hayashi, Yasushi. « Shape-based cost analysis of skeletal parallel programs ». Thesis, University of Edinburgh, 2001. http://hdl.handle.net/1842/14029.

Full text
Abstract:
This work presents an automatic cost-analysis system for an implicitly parallel skeletal programming language. Although deducing interesting dynamic characteristics of parallel programs (and in particular, run time) is well known to be an intractable problem in the general case, it can be alleviated by placing restrictions upon the programs which can be expressed. By combining two research threads, the “skeletal” and “shapely” paradigms which take this route, we produce a completely automated, computation and communication sensitive cost analysis system. This builds on earlier work in the area by quantifying communication as well as computation costs, with the former being derived for the Bulk Synchronous Parallel (BSP) model. We present details of our shapely skeletal language and its BSP implementation strategy together with an account of the analysis mechanism by which program behaviour information (such as shape and cost) is statically deduced. This information can be used at compile-time to optimise a BSP implementation and to analyse computation and communication costs. The analysis has been implemented in Haskell. We consider different algorithms expressed in our language for some example problems and illustrate each BSP implementation, contrasting the analysis of their efficiency by traditional, intuitive methods with that achieved by our cost calculator. The accuracy of cost predictions by our cost calculator against the run time of real parallel programs is tested experimentally. Previous shape-based cost analysis required all elements of a vector (our nestable bulk data structure) to have the same shape. We partially relax this strict requirement on data structure regularity by introducing new shape expressions in our analysis framework. We demonstrate that this allows us to achieve the first automated analysis of a complete derivation, the well known maximum segment sum algorithm of Skillicorn and Cai.
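For orientation, the Bulk Synchronous Parallel model referred to above charges each superstep a cost of the standard textbook form (this is the general BSP formula, not a result specific to the thesis):

```latex
T_{\text{superstep}} = w + h \cdot g + l
```

where w is the maximum local computation performed by any processor in the superstep, h is the maximum number of words any processor sends or receives, g is the communication cost per word, and l is the cost of the barrier synchronisation. A shape-based analysis can estimate w and h statically from the shapes of the data, which is what makes static communication costing possible.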
8

Wei, Jiesheng. « Hardware error detection in multicore parallel programs ». Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/42961.

Full text
Abstract:
The scaling of silicon devices has exacerbated the unreliability of modern computer systems, and power constraints have necessitated the involvement of software in hardware error detection. Simultaneously, the multi-core revolution has impelled software to become parallel. Therefore, there is a compelling need to protect parallel programs from hardware errors. Parallel programs’ tasks have significant similarity in control data due to the use of high-level programming models. In this thesis, we propose BlockWatch to leverage the similarity in a parallel program’s control data for detecting hardware errors. BlockWatch statically extracts the similarity among different threads of a parallel program and checks the similarity at runtime. We evaluate BlockWatch on eight SPLASH-2 benchmarks to measure its performance overhead and error detection coverage. We find that BlockWatch incurs an average overhead of 15% across all programs, and provides an average SDC coverage of 97% for faults in the control data.
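As a rough illustration of the control-data similarity idea (a minimal sketch with hypothetical names, not BlockWatch's actual implementation): threads of an SPMD program evaluate the same static branch, their outcomes are recorded, and a runtime check flags a lone divergent outcome as a possible hardware error.

```java
// Sketch: threads executing the same static branch record its outcome; a
// runtime similarity check flags divergence as a possible hardware fault.
// All names here are illustrative, not taken from BlockWatch.
public class BranchSimilarityCheck {
    public static void main(String[] args) throws Exception {
        final int nThreads = 4;
        final boolean[] outcome = new boolean[nThreads];
        Thread[] ts = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                int sharedBound = 100;                 // control data shared by all threads
                outcome[id] = (id * 10 < sharedBound); // the same branch in every thread
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        int agree = 0;
        for (boolean b : outcome) if (b) agree++;
        // If the threads are "similar", they must agree on this branch.
        if (agree != 0 && agree != nThreads)
            System.out.println("divergent branch outcome: possible hardware error");
        else
            System.out.println("branch outcomes consistent");
    }
}
```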
9

Zhu, Yingchun 1968. « Optimizing parallel programs with dynamic data structures ». Thesis, McGill University, 2000. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=36745.

Full text
Abstract:
Distributed memory parallel architectures support a memory model where some memory accesses are local, and thus inexpensive, while other memory accesses are remote, and potentially quite expensive. In order to achieve efficiency on such architectures, we need to reduce remote accesses. This is particularly challenging for applications that use dynamic data structures.
In this thesis, I present two compiler techniques to reduce the overhead of remote memory accesses for dynamic data structure based applications: locality techniques and communication optimizations. Locality techniques include a static locality analysis, which statically estimates when an indirect reference via a pointer can be safely assumed to be a local access, and dynamic locality checks, which consist of runtime tests to identify local accesses. Communication techniques include: (1) code movement to issue remote reads earlier and writes later; (2) code transformations to replace repeated/redundant remote accesses with one access; and (3) transformations to block or pipeline a group of remote requests together. Both locality and communication techniques have been implemented and incorporated into our EARTH-McCAT compiler framework, and a series of experiments have been conducted to evaluate these techniques. The experimental results show that we are able to achieve up to 26% performance improvement with each technique alone, and up to 29% performance improvement when both techniques are applied together.
10

Grove, Duncan A. « Performance modelling of message-passing parallel programs ». Title page, contents and abstract only, 2003. http://web4.library.adelaide.edu.au/theses/09PH/09phg8832.pdf.

Full text
Abstract:
This dissertation describes a new performance modelling system, called the Performance Evaluating Virtual Parallel Machine (PEVPM). It uses a novel bottom-up approach, where submodels of individual computation and communication events are dynamically constructed from data-dependencies, current contention levels and the performance distributions of low-level operations, which define performance variability in the face of contention.
11

Simpson, David John. « Space-efficient execution of parallel functional programs ». Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0028/NQ51922.pdf.

Full text
12

Novillo, Diego. « Analysis and optimization of explicitly parallel programs ». Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0012/NQ60007.pdf.

Full text
13

Zhu, Yingchun. « Optimizing parallel programs with dynamic data structures ». Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0035/NQ64708.pdf.

Full text
14

Grant-Duff, Zulena Noemi. « Systematic construction and mapping of parallel programs ». Thesis, Imperial College London, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.265009.

Full text
15

Wilson, Gregory V. « Structuring and supporting programs on parallel computers ». Thesis, University of Edinburgh, 1992. http://hdl.handle.net/1842/12151.

Full text
Abstract:
Distributed-memory multicomputers will not find acceptance outwith the specialised field of high-performance numerical applications until programs written for them using systems similar to those found on conventional uniprocessors can be run with an efficiency comparable to that achieved on those uniprocessors. This work argues that the key to constructing upwardly-compatible programming systems for multicomputers based on message passing which are both efficient and usable, and which allow effective monitoring, is to require those systems to be structured, in the same way that modern programming languages require programs to be structured. It is further argued that even a well-structured message-passing system is too low-level for most applications programming, and that some more abstract system is required. The merits of one such abstraction, called generative communication, are considered, and suggestions made for enriching standard implementations in order to improve their usability and efficiency. Finally, it is argued that the performance of any programming system for distributed-memory multicomputers, regardless of its degree of abstraction, is largely determined by the degree to which it eliminates or avoids contention. A technique for doing this, based on opportunistic combining networks, is introduced, and its effect on performance investigated using simulations.
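Generative communication is the coordination model behind Linda-style tuple spaces: producers deposit tuples anonymously, and consumers withdraw tuples that match a pattern. A minimal sketch of the two core operations follows (a toy model under simplifying assumptions; real systems add rd and eval operations and richer associative matching):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy tuple space: out() deposits a tuple, in() atomically withdraws a
// matching one, blocking until one exists. Tuples here are just (tag, value)
// pairs; the retry loop is a deliberate simplification of real matching.
class TupleSpace {
    private final BlockingQueue<int[]> space = new LinkedBlockingQueue<>();

    void out(int tag, int value) {
        space.add(new int[]{tag, value});
    }

    int in(int tag) throws InterruptedException {
        while (true) {
            int[] t = space.take();          // blocks until some tuple exists
            if (t[0] == tag) return t[1];    // match: withdraw it
            space.add(t);                    // no match: put it back, retry
        }
    }
}
```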
16

Eaton, Anthony John. « Graphical debugging and validation of parallel programs ». Thesis, University of Liverpool, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.357386.

Full text
17

Gadkari, Sanjay Sadanand. « Layered encapsulations for distributed and parallel programs / ». The Ohio State University, 1992. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487779914826072.

Full text
18

Chong, Nathan. « Scalable verification techniques for data-parallel programs ». Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/24437.

Full text
Abstract:
This thesis is about scalable formal verification techniques for software. A verification technique is scalable if it is able to scale to reasoning about real (rather than synthetic or toy) programs. Scalable verification techniques are essential for practical program verifiers. In this work, we consider three key characteristics of scalability: precision, performance and automation. We explore trade-offs between these factors by developing verification techniques in the context of data-parallel programs, as exemplified by graphics processing unit (GPU) programs (called kernels). This thesis makes three original contributions to the field of program verification: 1. An empirical study of candidate-based invariant generation that explores the trade-offs between precision and performance. An invariant is a property that captures program behaviours by expressing a fact that always holds at a particular program point. The generation of invariants is critical for automatic and precise verification. Over a benchmark suite comprising 356 GPU kernels, we find that candidate-based invariant generation allows precise reasoning for 256 (72%) kernels. 2. Barrier invariants: a new abstraction for precise and scalable reasoning about data-dependent GPU kernels, an important class of kernels beyond the scope of existing techniques. Our evaluation shows that barrier invariants enable us to capture a functional specification for three distinct prefix sum implementations for problem sizes using hundreds of threads and race-freedom for a real-world stream compaction example. 3. The interval of summations: a new abstraction for precise and scalable reasoning for parallel prefix sums, an important data-parallel primitive. We give theoretical results showing that the interval of summations is, surprisingly, both sound and complete. That is, all correct prefix sums can be precisely captured by this abstraction. Our evaluation shows that the interval of summations allows us to automatically prove full functional correctness of four distinct prefix sum implementations for all power-of-two problem sizes up to 2^20.
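For reference, the specification such a verifier targets is easy to state sequentially. A sketch of the exclusive prefix sum against which any parallel implementation must be checked (illustrative only, not the thesis's tooling):

```java
// Sequential specification of an exclusive prefix sum: out[i] = a[0] + ... + a[i-1].
// A parallel implementation (e.g. a GPU kernel) is functionally correct when it
// agrees with this function on every input; abstractions such as the interval
// of summations let a verifier prove that agreement rather than test it.
class PrefixSumSpec {
    static int[] exclusivePrefixSum(int[] a) {
        int[] out = new int[a.length];
        int acc = 0;
        for (int i = 0; i < a.length; i++) {
            out[i] = acc;   // sum of a[0..i-1]
            acc += a[i];
        }
        return out;
    }
}
```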
19

Bratvold, Tore Andreas. « Skeleton-based parallelisation of functional programs ». Thesis, Heriot-Watt University, 1994. http://hdl.handle.net/10399/1355.

Full text
20

Cordova, Gabriel. « A distributed reconstruction of EKG signals ». To access this resource online via ProQuest Dissertations and Theses @ UTEP, 2008. http://0-proquest.umi.com.lib.utep.edu/login?COPT=REJTPTU0YmImSU5UPTAmVkVSPTI=&clientId=2515.

Full text
21

Luo, Zegang. « Real time cloth modeling using parallel computing / ». View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?MECH%202005%20LUO.

Full text
22

Sharma, Atul. « Scheduling and granularity decisions in parallel functional programs ». Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ38408.pdf.

Full text
23

Nwana, Vincent Lebga. « Parallel algorithms for solving mixed integer linear programs ». Thesis, Brunel University, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368540.

Full text
24

George, Robert Tyler. « Static scheduling of conditional branches in parallel programs ». Thesis, Monterey, California. Naval Postgraduate School, 1996. http://hdl.handle.net/10945/9007.

Full text
Abstract:
Approved for public release; distribution is unlimited
The problem of scheduling parallel program tasks on multiprocessor systems is known to be NP-complete in its general form. When non-determinism is added to the scheduling problem through loops and conditional branching, an optimal solution is even harder to obtain. The intractability of obtaining an optimal solution for the general scheduling problem has led to the introduction of a large number of scheduling heuristics. These heuristics consider many real world factors, such as communication overhead, target machine topology, and the tradeoff between exploiting the parallelism in a parallel program and the resulting scheduling overhead. We present the probabilistic merge heuristic--in which a unified schedule of all possible execution instances is generated by successively scheduling tasks in order of their execution probabilities. When a conditional task is scheduled, we first attempt to merge the task with the time slot of a previously scheduled task which is a member of a different execution instance. We have found that the merge scheduler produces schedules which are 10% faster than those of previous techniques. More importantly, however, we show that the probabilistic merge heuristic is significantly more scalable -- being able to schedule branch and precedence graphs which exceed 50 nodes.
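A toy sketch of the merge idea (the data model and names are hypothetical, not the thesis's scheduler): tasks are placed in decreasing order of execution probability, and a time slot can be shared by tasks from different execution instances, since only one of them can occur in any actual run.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of probabilistic merge scheduling: high-probability tasks are placed
// first; a conditional task may be merged into an occupied slot when every
// task already there belongs to a different (mutually exclusive) execution
// instance, identified here by instanceId.
class MergeScheduler {
    static class Task {
        final String name; final double prob; final int instanceId;
        Task(String name, double prob, int instanceId) {
            this.name = name; this.prob = prob; this.instanceId = instanceId;
        }
    }

    final List<List<Task>> slots = new ArrayList<>();

    void schedule(List<Task> tasks) {
        tasks.sort((a, b) -> Double.compare(b.prob, a.prob));  // most probable first
        for (Task t : tasks) {
            List<Task> target = null;
            for (List<Task> slot : slots) {
                boolean exclusive = true;
                for (Task s : slot)
                    if (s.instanceId == t.instanceId) { exclusive = false; break; }
                if (exclusive) { target = slot; break; }       // merge into this slot
            }
            if (target == null) { target = new ArrayList<>(); slots.add(target); }
            target.add(t);
        }
    }
}
```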
25

Nakade, Radha Vi. « Verification of Task Parallel Programs Using Predictive Analysis ». BYU ScholarsArchive, 2016. https://scholarsarchive.byu.edu/etd/6176.

Full text
Abstract:
Task parallel programming languages provide a way for creating asynchronous tasks that can run concurrently. The advantage of using task parallelism is that the programmer can write code that is independent of the underlying hardware. The runtime determines the number of processor cores that are available and the most efficient way to execute the tasks. When two or more concurrently executing tasks access a shared memory location and at least one of the accesses is a write, a data race is observed in the program. Data races can introduce non-determinism in the program output, making it important to have data race detection tools. To detect data races in task parallel programs, a new sound and complete technique based on computation graphs is presented in this work. The data race detection algorithm runs in O(N^2) time, where N is the number of nodes in the graph. A computation graph is a directed acyclic graph that represents the execution of the program. For detecting data races, the computation graph stores the shared heap locations accessed by the tasks. An algorithm for creating computation graphs augmented with the memory locations accessed by the tasks is also described here. This algorithm runs in O(N) time, where N is the number of operations performed in the tasks. This work also presents an implementation of this technique for the Java implementation of the Habanero programming model. The results of this data race detector are compared to Java Pathfinder's precise race detector extension and its permission-regions-based race detector extension. The results show a significant reduction in the time required for data race detection using this technique.
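A condensed sketch of the computation-graph check (illustrative structure, not the thesis's implementation): two accesses race exactly when neither node reaches the other in the DAG and at least one access is a write.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Each node records the shared locations it reads and writes; nodes that are
// unordered in the DAG and touch a common location, with at least one write,
// constitute a data race.
class CGNode {
    final List<CGNode> succs = new ArrayList<>();
    final Set<Long> reads = new HashSet<>();
    final Set<Long> writes = new HashSet<>();

    boolean reaches(CGNode other) {          // plain DFS; fine for a sketch
        if (this == other) return true;
        for (CGNode s : succs)
            if (s.reaches(other)) return true;
        return false;
    }

    static boolean races(CGNode a, CGNode b) {
        if (a.reaches(b) || b.reaches(a)) return false;   // ordered: no race
        for (long loc : a.writes)
            if (b.writes.contains(loc) || b.reads.contains(loc)) return true;
        for (long loc : a.reads)
            if (b.writes.contains(loc)) return true;
        return false;
    }
}
```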
26

Sanz-Marco, Vicent. « Fault tolerance for stream programs on parallel platforms ». Thesis, University of Hertfordshire, 2015. http://hdl.handle.net/2299/17110.

Full text
Abstract:
A distributed system is defined as a collection of autonomous computers connected by a network, with the appropriate distributed software for the system to be seen by users as a single entity capable of providing computing facilities. Distributed systems with centralised control have a distinguished control node, called the leader node. The main role of a leader node is to distribute and manage shared resources in a resource-efficient manner. A distributed system with centralised control can use stream processing networks for communication. In a stream processing system, applications typically act as continuous queries, ingesting data continuously, analyzing and correlating the data, and generating a stream of results. Fault tolerance is the ability of a system to keep processing information even if a failure or anomaly occurs in the system. Fault tolerance has become an important requirement for distributed systems, because the possibility of failure has risen with the increase in the number of nodes and in the runtime of applications in distributed systems. To address this problem, it is important to add fault tolerance mechanisms in order to provide the internal capacity to preserve the execution of tasks despite the occurrence of faults. If the leader in a centralised control system fails, a new leader must be elected. While leader election has received a lot of attention in message-passing systems, very few solutions have been proposed for shared-memory systems, as we propose here. In addition, rollback-recovery strategies are important fault tolerance mechanisms for distributed systems: information is stored in stable storage during failure-free operation, and when a failure affects a node, the system uses the stored information to recover the state of the node from before the failure. In this thesis, we focus on creating two fault tolerance mechanisms for distributed systems with centralised control that use stream processing for communication: leader election and log-based rollback-recovery, implemented using LPEL. The proposed leader election method is based on an atomic Compare-And-Swap (CAS) instruction, which is directly available on many processors. Our leader election method works with idle nodes, meaning that only the non-busy nodes compete to become the new leader, while the busy nodes continue with their tasks and later update their leader reference. Furthermore, this leader election method has short completion time and low space complexity. The log-based rollback-recovery method proposed for distributed systems with stream processing networks is a novel approach that is free from the domino effect and does not generate orphan messages, satisfying the always-no-orphans consistency condition. Additionally, this approach has a lower overhead impact on the system compared to other approaches, and it provides scalability, because it is insensitive to the number of nodes in the system.
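The CAS-based election itself fits in a few lines. A minimal sketch of the idea in plain Java (the thesis implements it inside LPEL; the names here are illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

// Idle nodes race on a single shared slot with compare-and-swap: exactly one
// CAS from the failed leader's id succeeds, so exactly one node wins. Busy
// nodes skip the election and later refresh their leader reference.
class LeaderElection {
    private final AtomicReference<String> leader = new AtomicReference<>("node-0");

    /** Called by an idle node that observed leader failure; true for the winner. */
    boolean tryBecomeLeader(String myId, String failedLeader) {
        return leader.compareAndSet(failedLeader, myId);
    }

    String currentLeader() {
        return leader.get();
    }
}
```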
27

McPherson, Andrew John. « Ensuring performance and correctness for legacy parallel programs ». Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/14179.

Full text
Abstract:
Modern computers are based on manycore architectures, with multiple processors on a single silicon chip. In this environment programmers are required to make use of parallelism to fully exploit the available cores. This can either be within a single chip, normally using shared-memory programming or at a larger scale on a cluster of chips, normally using message-passing. Legacy programs written using either paradigm face issues when run on modern manycore architectures. In message-passing the problem is performance related, with clusters based on manycores introducing necessarily tiered topologies that unaware programs may not fully exploit. In shared-memory it is a correctness problem, with modern systems employing more relaxed memory consistency models, on which legacy programs were not designed to operate. Solutions to this correctness problem exist, but introduce a performance problem as they are necessarily conservative. This thesis focuses on addressing these problems, largely through compile-time analysis and transformation. The first technique proposed is a method for statically determining the communication graph of an MPI program. This is then used to optimise process placement in a cluster of CMPs. Using the 64-process versions of the NAS parallel benchmarks, we see an average of 28% (7%) improvement in communication localisation over by-rank scheduling for 8-core (12-core) CMP-based clusters, representing the maximum possible improvement. Secondly, we move into the shared-memory paradigm, identifying and proving necessary conditions for a read to be an acquire. This can be used to improve solutions in several application areas, two of which we then explore. We apply our acquire signatures to the problem of fence placement for legacy well-synchronised programs. We find that applying our signatures, we can reduce the number of fences placed by an average of 62%, leading to a speedup of up to 2.64x over an existing practical technique. Finally, we develop a dynamic synchronisation detection tool known as SyncDetect. This proof of concept tool leverages our acquire signatures to more accurately detect ad hoc synchronisations in running programs and provides the programmer with a report of their locations in the source code. The tool aims to assist programmers with the notoriously difficult problem of parallel debugging and in manually porting legacy programs to more modern (relaxed) memory consistency models.
28

Grewe, Dominik. « Mapping parallel programs to heterogeneous multi-core systems ». Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/8852.

Full text
Abstract:
Heterogeneous computer systems are ubiquitous in all areas of computing, from mobile to high-performance computing. They promise to deliver increased performance at lower energy cost than purely homogeneous, CPU-based systems. In recent years GPU-based heterogeneous systems have become increasingly popular. They combine a programmable GPU with a multi-core CPU. GPUs have become flexible enough to not only handle graphics workloads but also various kinds of general-purpose algorithms. They are thus used as a coprocessor or accelerator alongside the CPU. Developing applications for GPU-based heterogeneous systems involves several challenges. Firstly, not all algorithms are equally suited for GPU computing. It is thus important to carefully map the tasks of an application to the most suitable processor in a system. Secondly, current frameworks for heterogeneous computing, such as OpenCL, are low-level, requiring a thorough understanding of the hardware by the programmer. This high barrier to entry could be lowered by automatically generating and tuning this code from a high-level and thus more user-friendly programming language. Both challenges are addressed in this thesis. For the task mapping problem a machine learning-based approach is presented in this thesis. It combines static features of the program code with runtime information on input sizes to predict the optimal mapping of OpenCL kernels. This approach is further extended to also take contention on the GPU into account. Both methods are able to outperform competing mapping approaches by a significant margin. Furthermore, this thesis develops a method for targeting GPU-based heterogeneous systems from OpenMP, a directive-based framework for parallel computing. OpenMP programs are translated to OpenCL and optimized for GPU performance. At runtime a predictive model decides whether to execute the original OpenMP code on the CPU or the generated OpenCL code on the GPU. This approach is shown to outperform both a competing approach as well as hand-tuned code.
29

Lourenço, João. « A debugging engine for parallel and distributed programs ». Doctoral thesis, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2003. http://hdl.handle.net/10362/2314.

Full text
Abstract:
Dissertation presented to obtain the degree of Doctor in Informatics from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.
In the last decade a considerable amount of research work has focused on distributed debugging, one of the crucial fields in the parallel software development cycle. The productivity of the software development process strongly depends on the adequate definition of what debugging tools should be provided, and what debugging methodologies and functionalities these tools should support. The work described in this dissertation was initiated in 1995, in the context of two research projects, SEPP (Software Engineering for Parallel Processing) and HPCTI (High-Performance Computing Tools for Industry), both sponsored by the European Union in the Copernicus programme, which aimed at the design and implementation of an integrated parallel software development environment. In the context of these projects, two independent toolsets were developed, the GRADE and EDPEPPS parallel software development environments. Our contribution to these projects was in the debugging support. We designed a debugging engine and developed a prototype, which was integrated into both toolsets (it was the only tool developed in the context of the SEPP and HPCTI projects which achieved such a result). Even after the closing of those research projects, further research work on distributed debugging was carried on, which led to the re-design and re-implementation of the debugging engine. This dissertation describes the debugging engine according to its most up-to-date design and implementation stages. It also reports some of the experimental work made with both the initial and the current implementations, and how it contributed to validate the design and implementations of the debugging engine.
30

Aronsson, Peter. « Automatic Parallelization of Equation-Based Simulation Programs ». Doctoral thesis, Linköping : Department of Computer and Information Science, Linköping University, 2006. http://www.bibl.liu.se/liupubl/disp/disp2006/tek1022s.pdf.

Full text
31

Korndorfer, Jonas Henrique Muller. « High performance trace replay event simulation of parallel programs behavior ». reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2016. http://hdl.handle.net/10183/149310.

Full text
Abstract:
Modern high performance systems comprise thousands to millions of processing units. The development of a scalable parallel application for such systems depends on an accurate mapping of application processes on top of available resources. The identification of unused resources and potential processing bottlenecks requires good performance analysis. The trace-based observation of a parallel program execution is one of the most helpful techniques for this purpose. Unfortunately, tracing often produces large trace files, easily reaching the order of gigabytes of raw data. Therefore trace-based performance analysis tools have to process such data into a human-readable form and must be efficient enough to allow a useful analysis. Most of the existing tools, such as Vampir, Scalasca and TAU, focus on the processing of trace formats with a fixed and well-defined semantics. The corresponding file formats are usually designed to handle applications developed using popular libraries like OpenMP, MPI, and CUDA. However, not all parallel applications use such libraries, so these tools are sometimes of no help. Fortunately, there are other tools that take a more dynamic approach by using an open trace file format without specific semantics. Some of these tools are Paraver, Pajé and PajeNG. However, being generic comes at a cost: these tools frequently present low performance when processing large traces. The objective of this work is to present performance optimizations made in the PajeNG tool-set. This comprises the development of a parallelization strategy and a performance analysis to establish our gains. The original PajeNG works sequentially, processing a single trace file with all data from the observed application, so the scalability of the tool is severely limited by the reading of the trace file. Our strategy splits this file so that several pieces can be processed in parallel. The method created to split the traces allows each piece to be processed in its own flow of execution. The experiments were executed on non-uniform memory access (NUMA) machines. The performance analysis considers several aspects such as thread locality, number of flows, disk type and also comparisons between the NUMA nodes. The obtained results are very promising, scaling up PajeNG by about eight to eleven times depending on the machine.
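The parallelization strategy can be pictured with a small sketch (file names and the per-piece work are placeholders; the real tool parses Pajé trace events rather than counting lines):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Sketch: the trace is pre-split into pieces and each piece is handled by its
// own flow of execution, removing the single-reader bottleneck.
public class ParallelTraceReplay {
    public static void main(String[] args) throws Exception {
        List<Path> pieces = List.of(
            Paths.get("trace.part0"), Paths.get("trace.part1"),
            Paths.get("trace.part2"), Paths.get("trace.part3"));
        ExecutorService pool = Executors.newFixedThreadPool(pieces.size());
        List<Future<Long>> results = new ArrayList<>();
        for (Path p : pieces)
            results.add(pool.submit(() -> {
                try (Stream<String> lines = Files.lines(p)) {
                    return lines.count();        // stand-in for event parsing
                }
            }));
        long events = 0;
        for (Future<Long> f : results) events += f.get();
        pool.shutdown();
        System.out.println("processed " + events + " events");
    }
}
```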
32

Castro, David. « Structured arrows : a type-based framework for structured parallelism ». Thesis, University of St Andrews, 2018. http://hdl.handle.net/10023/16093.

Full text
Abstract:
This thesis deals with the important problem of parallelising sequential code. Despite the importance of parallelism in modern computing, writing parallel software still relies on many low-level and often error-prone approaches. These low-level approaches can lead to serious execution problems such as deadlocks and race conditions. Due to the non-deterministic behaviour of most parallel programs, testing parallel software can be both tedious and time-consuming. A way of providing guarantees of correctness for parallel programs would therefore provide significant benefit. Moreover, even if we ignore the problem of correctness, achieving good speedups is not straightforward, since this generally involves rewriting a program to consider a (possibly large) number of alternative parallelisations. This thesis argues that new languages and frameworks are needed. These languages and frameworks must not only support high-level parallel programming constructs, but must also provide predictable cost models for these parallel constructs. Moreover, they need to be built around solid, well-understood theories that ensure that: (a) changes to the source code will not change the functional behaviour of a program, and (b) the speedup obtained by doing the necessary changes is predictable. Algorithmic skeletons are parametric implementations of common patterns of parallelism that provide good abstractions for creating new high-level languages, and also support frameworks for parallel computing that satisfy the correctness and predictability requirements that we require. This thesis presents a new type-based framework, based on the connection between structured parallelism and structured patterns of recursion, that provides parallel structures as type abstractions that can be used to statically parallelise a program. Specifically, this thesis exploits hylomorphisms as a single, unifying construct to represent the functional behaviour of parallel programs, and to perform correct code rewritings between alternative parallel implementations, represented as algorithmic skeletons. This thesis also defines a mechanism for deriving cost models for parallel constructs from a queue-based operational semantics. In this way, we can provide strong static guarantees about the correctness of a parallel program, while simultaneously achieving predictable speedups.
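As a loose illustration of an algorithmic skeleton (a sketch, not the thesis's typed framework): a divide-and-conquer pattern is written once, and concrete problems plug in their own divide/combine functions. The hylomorphism reading is that divide unfolds a call structure which combine then folds away.

```java
import java.util.concurrent.RecursiveTask;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

// A divide-and-conquer skeleton over ForkJoin: the parallel structure lives
// here once; isLeaf/solve/divide/combine describe the particular problem.
class DivConq<P, S> extends RecursiveTask<S> {
    final P problem;
    final Predicate<P> isLeaf;
    final Function<P, S> solve;
    final Function<P, P[]> divide;   // assumed to split into two subproblems
    final BinaryOperator<S> combine;

    DivConq(P problem, Predicate<P> isLeaf, Function<P, S> solve,
            Function<P, P[]> divide, BinaryOperator<S> combine) {
        this.problem = problem; this.isLeaf = isLeaf; this.solve = solve;
        this.divide = divide; this.combine = combine;
    }

    @Override protected S compute() {
        if (isLeaf.test(problem)) return solve.apply(problem);
        P[] subs = divide.apply(problem);
        DivConq<P, S> left = new DivConq<>(subs[0], isLeaf, solve, divide, combine);
        DivConq<P, S> right = new DivConq<>(subs[1], isLeaf, solve, divide, combine);
        left.fork();                              // run the first half in parallel
        return combine.apply(right.compute(), left.join());
    }
}
```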
33

Kondo, Boubacar. « An investigation of parallel algorithms developed for graph problems and their implementation on parallel computers ». Virtual Press, 1991. http://liblink.bsu.edu/uhtbin/catkey/770951.

Full text
Abstract:
With the recent development of VLSI (Very Large Scale Integration) technology, research on the development of efficient parallel algorithms for solutions of practical graph problems has increased considerably. A variety of algorithms have already been implemented on different models of parallel computers, but not much is known yet about which model of parallel computer will efficiently and definitively fit every graph problem. This investigation focuses on a comparative analysis of the speedup and efficiency of parallel algorithms with respect to a parallel model of computation, and with respect to some sequential algorithms.
Department of Computer Science
34

Lin, Jason. « WebSearch : A configurable parallel multi-search web browser ». CSUSB ScholarWorks, 1999. https://scholarworks.lib.csusb.edu/etd-project/1948.

Full text
35

DUARTE, Rafael Machado. « Parallelizing Java programs using transformation laws ». Universidade Federal de Pernambuco, 2008. https://repositorio.ufpe.br/handle/123456789/2357.

Full text
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico
With the market adoption of multi-core processors, the use of threads in Java becomes increasingly worthwhile. Developing parallel systems is, however, a task that few developers are equipped to face. In this context, an approach for parallelizing Java programs based on transformation laws was developed, with the aim of easing this process and enabling systematic parallelization. The first step of the approach uses transformation laws to convert a Java program into a normal form that uses a restricted subset of the language's features. In this step, transformation laws adapted from earlier work were defined, and new laws were proposed. Starting from a program in normal form, transformation rules focused on introducing parallelism are then applied. After applying these rules according to the developed strategy, a parallel program is produced. Two case studies were carried out to validate the approach: the computation of Fourier series and the IDEA encryption algorithm. Both programs were taken from the Java Grande Benchmark Suite. The execution of the case studies confirms the approach's success in improving the performance of the original code.
36

Aksenov, Vitalii. « Synchronization costs in parallel programs and concurrent data structures ». Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCC025/document.

Full text
Abstract:
To use the computational power of modern computing machines, we have to deal with concurrent programs. Writing efficient concurrent programs is notoriously difficult, primarily due to the need of harnessing synchronization costs. In this thesis, we focus on synchronization costs in parallel programs and concurrent data structures. First, we present a novel granularity control technique for parallel programs designed for the dynamic multithreading environment. Then in the context of concurrent data structures, we consider the notion of concurrency-optimality and propose the first implementation of a concurrency-optimal binary search tree that, intuitively, accepts a concurrent schedule if and only if the schedule is correct. Also, we propose parallel combining, a technique that enables efficient implementations of concurrent data structures from their parallel batched counterparts. We validate the proposed techniques via experimental evaluations showing superior or comparable performance with respect to state-of-the-art algorithms. From a more formal perspective, we consider the phenomenon of helping in concurrent data structures. Intuitively, helping is observed when the order of some operation in a linearization is fixed by a step of another process. We show that no wait-free linearizable implementation of a stack using read, write, compare&swap and fetch&add primitives can be help-free, correcting a mistake in an earlier proof by Censor-Hillel et al. Finally, we propose a simple way to analytically predict the throughput of data structures based on coarse-grained locking.
37

Bischof, Holger [Verfasser]. « Systematic Development of Parallel Programs Using Skeletons / Holger Bischof ». Aachen : Shaker, 2005. http://d-nb.info/1186580348/34.

Full text
38

Zhang, Yuan. « Static analyses and optimizations for parallel programs with synchronization ». Access to citation, abstract and download form provided by ProQuest Information and Learning Company ; downloadable PDF file, 169 p, 2008. http://proquest.umi.com/pqdweb?did=1601517931&sid=6&Fmt=2&clientId=8331&RQT=309&VName=PQD.

Full text
39

Karl, Holger. « Responsive Execution of Parallel Programs in Distributed Computing Environments ». Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 1999. http://dx.doi.org/10.18452/14455.

Full text
Abstract:
Clusters of standard workstations have been shown to be an attractive environment for parallel computing. However, there remain unsolved problems to make them suitable to some application scenarios. One of these problems is a dependable and timely program execution: There are many applications in which a program should be successfully completed at a predictable point of time. Mechanisms to combine the properties of both dependable and timely execution of parallel programs in distributed computing environments are the main objective of this dissertation. Addressing these properties requires a joint metric for dependability and timeliness. Responsiveness is such a metric; it is refined for the purposes of this work. As a case study, Calypso and Charlotte, two parallel programming systems, are analyzed and their shortcomings on several abstraction levels with regard to responsiveness are identified. Solutions for them are presented and generalized, resulting in widely applicable mechanisms for (parallel) responsive services. Specifically, these solutions are: 1) a responsiveness analysis of Calypso's eager scheduling (a mechanism for load balancing and fault masking), 2) ameliorating a single point of failure by a responsiveness analysis of checkpointing and by a standard interface-based system for replication of legacy software, 3) managing resources in a way suitable for parallel programs, and 4) using semantical information about the communication pattern of a program to improve its performance. All proposed mechanisms can be combined and are suitable for use in standard environments. It is shown by analysis and experiments that these mechanisms improve the responsiveness of eligible applications.
40

Railing, Brian Paul. « Collecting and representing parallel programs with high performance instrumentation ». Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54431.

Full text
Abstract:
Computer architecture has looming challenges with finding program parallelism, process technology limits, and limited power budget. To navigate these challenges, a deeper understanding of parallel programs is required. I will discuss the task graph representation and how it enables programmers and compiler optimizations to understand and exploit dynamic aspects of the program. I will present Contech, which is a high performance framework for generating dynamic task graphs from arbitrary parallel programs. The Contech framework supports a variety of languages and parallelization libraries, and has been tested on both x86 and ARM. I will demonstrate how this framework encompasses a diversity of program analyses, particularly by modeling a dynamically reconfigurable, heterogeneous multi-core processor.
41

Tang, Dezheng. « Mapping Programs to Parallel Architectures in the Real World ». PDXScholar, 1992. https://pdxscholar.library.pdx.edu/open_access_etds/4534.

Full text
Abstract:
Mapping an application program to a parallel architecture can be described as a multidimensional optimization problem. To simplify the problem, we divide the overall mapping process into three sequential substeps: partitioning, allocating, and scheduling, with each step using a few details of the program and architecture description. Due to the difficulty in accurately describing the program and architecture and the fact that each substep uses incomplete information, inaccuracy is pervasive in the real-world mapping process. We hypothesize that the inaccuracy and the use of suboptimal, heuristic mapping methods may greatly affect the mapping or submapping performance and lead to a non-optimal solution. We do not discard the typical approach used by most researchers in which total execution time or speedup is the criterion to evaluate the quality of the mapping. However, we improve on this approach by including the effects of inaccuracy. We believe that, due to the presence of inaccuracy in the mapping process, investigating the impact of inaccuracy on the mapping quality is crucial to achieving good mappings. The motivation of this work is to identify the various inaccuracies during the mapping procedure and explore the sensitivity of mapping quality to the inaccurate parameters. To conduct the sensitivity examination, the Global Cluster partitioning algorithm and some models were used. The models use some program and architecture characteristics, or lower-level meters, to characterize the mapping solution space. The algorithm searches the solution space and makes the decision based on the information provided by the models. The experiments were implemented on a UNIX LAN of Sun workstations for different data flow graphs. The graphs use three parallel programming paradigms: fine grained, coarse-grained, and pipelined styles, to represent some high-level application programs: vector inner product calculation, matrix multiplication, and Gaussian elimination respectively. The experimental results show that varying system behavior affects the accuracy of lower-level meters, and the quality of the mapping algorithm is very sensitive to the inaccuracies.
Styles APA, Harvard, Vancouver, ISO, etc.
42

Polzin, Dieter Wilhelm. « Visualizing the memory performance of parallel programs with Chiron ». Master's thesis, University of Cape Town, 1996. http://hdl.handle.net/11427/9283.

Texte intégral
Résumé :
Bibliography: leaves 78-81.
This thesis describes Chiron, a visualization system that helps programmers detect memory-system bottlenecks in their shared-memory parallel applications. Chiron differs from most other performance-debugging tools in that it uses three-dimensional graphics techniques to display vast amounts of memory-performance data. Both code- and data-oriented information can be presented in several views. These views have been designed to help the user detect problems that cause coherence interference or replacement interference. Chiron's interactive user interface enables the user to manipulate the views and home in on features that indicate memory-system bottlenecks. The visualized data can be augmented with more detailed numerical data, and correlations between the separate views can be displayed. The effectiveness of Chiron is illustrated in this thesis by means of three case studies.
Styles APA, Harvard, Vancouver, ISO, etc.
43

Farooq, Mohammad Habibur Rahman &amp; Qaisar. « Performance Prediction of Parallel Programs in a Linux Environment ». Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-1143.

Texte intégral
Résumé :
Context. Today’s parallel systems are widely used for a range of computational tasks. Developing parallel programs that make maximum use of the computing power of parallel systems is tricky, and efficient tuning of parallel programs is often very hard. Objectives. In this study we present a performance prediction and visualization tool named VPPB for a Linux environment; the tool was originally introduced by Broberg et al. [1] for a Solaris 2.x environment. VPPB shows the predicted behavior of a multithreaded program for any number of processors, displayed in two different graphs. The prediction is based on a monitored uni-processor execution. Methods. An experimental evaluation was carried out to validate the prediction reliability of the developed tool. Results. The validation was conducted on an Intel multiprocessor with 8 processors, using application programs from the PARSEC 2.0 benchmark suite. It shows that the speed-up predictions are within +/-7% of a real execution. Conclusions. The experiments with the VPPB tool showed that its predictions are reliable and that the overhead incurred in the application programs is low.
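
The simulation details are in [1]; purely as an illustration of the principle, a speedup prediction can be derived by replaying per-thread work monitored on one processor onto p simulated processors. The C++ function below is invented, not VPPB's algorithm, and it ignores synchronization effects, which VPPB does model.

    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    // Invented illustration: predict the speedup on p processors from
    // per-thread work amounts monitored during a uni-processor run.
    double predictSpeedup(const std::vector<double>& threadWork, int p) {
        double serial =
            std::accumulate(threadWork.begin(), threadWork.end(), 0.0);
        std::vector<double> proc(p, 0.0);
        for (double w : threadWork)               // greedy replay
            *std::min_element(proc.begin(), proc.end()) += w;
        double makespan = *std::max_element(proc.begin(), proc.end());
        return serial / makespan;                 // predicted speedup
    }

    int main() {
        std::vector<double> work = {10.0, 9.0, 8.5, 7.0};  // monitored work
        std::printf("predicted speedup on 4 CPUs: %.2f\n",
                    predictSpeedup(work, 4));
    }
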
Styles APA, Harvard, Vancouver, ISO, etc.
44

Florez-Larrahondo, German. « A trusted environment for MPI programs ». Master's thesis, Mississippi State : Mississippi State University, 2002. http://library.msstate.edu/etd/show.asp?etd=etd-10172002-103135.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
45

Kim, Minjang. « Dynamic program analysis algorithms to assist parallelization ». Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45758.

Texte intégral
Résumé :
All market-leading processor vendors have started to pursue multicore processors as an alternative to high-frequency single-core processors for better energy and power efficiency. This transition to multicore processors no longer provides programmers with the free performance gains once enabled by increasing clock frequencies. Parallelization of existing serial programs has become the most powerful approach to improving application performance. Not surprisingly, parallel programming is still extremely difficult for many programmers, mainly because thinking in parallel does not come naturally. However, we believe that software tools based on advanced analyses can significantly reduce this parallelization burden. Much active research and many tools exist for already-parallelized programs, for example finding concurrency bugs. Instead, we focus on program analysis algorithms that assist the actual parallelization steps: (1) finding parallelization candidates, (2) understanding the parallelizability and profitability of the candidates, and (3) writing parallel code. A few commercial tools have been introduced for these steps, and a number of researchers have proposed various methodologies and techniques to assist parallelization, but many weaknesses and limitations remain. In order to assist the parallelization steps more effectively and efficiently, this dissertation proposes Prospector, which consists of several new and enhanced program analysis algorithms. First, an efficient loop profiling algorithm is implemented. Frequently executed loops are candidates for profitable parallelization, and detailed execution profiles of loops provide a guide for selecting initial parallelization targets. Second, an efficient and rich data-dependence profiling algorithm is presented. Data dependence is the most essential factor that determines parallelizability. Prospector exploits dynamic data-dependence profiling, an alternative and complementary approach to traditional static-only analyses. However, even state-of-the-art dynamic dependence analysis algorithms can only successfully profile programs with a small memory footprint. Prospector introduces an efficient data-dependence profiling algorithm that supports large programs and inputs while providing highly detailed profiling information. Third, a new speedup prediction algorithm is proposed. Although loop profiling gives a qualitative estimate of the expected profit, obtaining accurate speedup estimates requires more sophisticated analysis. Prospector introduces a new dynamic emulation method to predict parallel speedups from annotated serial code, together with a memory performance model that predicts speedup saturation due to increased memory traffic. Compared to the latest related work, Prospector significantly improves both prediction accuracy and coverage. Finally, Prospector provides algorithms that extract hidden parallelism and offer advice on writing parallel code. We present a number of case studies showing how Prospector assists manual parallelization in particular cases, including privatization, reduction, mutexes, and pipelining.
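
As a minimal sketch of what dynamic data-dependence profiling does (this is not Prospector's algorithm, which is specifically engineered to scale to large memory footprints; the class below is invented), consider shadow state that remembers the last instruction to touch each address:

    #include <cstdint>
    #include <cstdio>
    #include <unordered_map>

    // Invented illustration: shadow state remembers the last instruction
    // that read or wrote each address; every new access is checked
    // against it to emit RAW/WAR/WAW dependences.
    struct Shadow { uint64_t lastWrite = 0, lastRead = 0; };

    struct DepProfiler {
        std::unordered_map<uintptr_t, Shadow> shadow;

        void report(uint64_t src, uint64_t dst, const char* kind) {
            std::printf("%s dependence: inst %llu -> inst %llu\n", kind,
                        (unsigned long long)src, (unsigned long long)dst);
        }

        void onAccess(uintptr_t addr, uint64_t inst, bool isWrite) {
            Shadow& s = shadow[addr];
            if (isWrite) {
                if (s.lastWrite) report(s.lastWrite, inst, "WAW");
                if (s.lastRead)  report(s.lastRead,  inst, "WAR");
                s.lastWrite = inst;
            } else {
                if (s.lastWrite) report(s.lastWrite, inst, "RAW");
                s.lastRead = inst;
            }
        }
    };

    int main() {
        DepProfiler p;
        int x = 0;
        auto a = reinterpret_cast<uintptr_t>(&x);
        p.onAccess(a, 1, true);   // inst 1 writes x
        p.onAccess(a, 2, false);  // inst 2 reads x  -> RAW 1 -> 2
        p.onAccess(a, 3, true);   // inst 3 writes x -> WAW and WAR
    }

Keeping one hash-table entry per address is exactly the kind of per-address bookkeeping that limits naive profilers to small memory footprints, which motivates the more efficient algorithm in the dissertation.
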
Styles APA, Harvard, Vancouver, ISO, etc.
46

Collins, Alexander James. « Cooperative auto-tuning of parallel skeletons ». Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/15791.

Texte intégral
Résumé :
Improving program performance through the use of multiple homogeneous processing elements, or cores, is commonplace. However, these architectures increase the complexity required at the software level. Existing work focuses on optimising programs that run in isolation on these systems, ignoring the fact that, in reality, such systems run multiple parallel programs concurrently, with programs competing for system resources. To improve performance in this shared environment, cooperative tuning of multiple, concurrently running parallel programs is required. Moreover, the set of programs running on the system (the system workload) is dynamic and rapidly changing. This makes cooperative tuning a challenge, as it must react rapidly to changes in the system workload. This thesis explores the scope for performance improvement from cooperatively tuning skeleton parallel programs, and the techniques that can be used to cooperatively auto-tune parallel programs. Parallel skeletons provide a clear separation between algorithm description and implementation, and provide tuning knobs that the system can use to make high-level changes to a program's implementation. This work is in three parts: (i) how many threads to allocate to each program running on the system, (ii) on which cores to execute each program's threads, and (iii) what values to choose for the high-level parameters of the parallel skeletons. We demonstrate that significant performance improvements are available in each of these areas, compared to the current state of the art.
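
The three tuning dimensions can be pictured as knobs attached to each running program. The C++ sketch below is a hypothetical illustration only; the names and the naive even-split policy are invented, not the thesis's interface or tuning policy.

    #include <vector>

    // Invented illustration of per-program tuning knobs, one set per
    // skeleton program currently running on the machine.
    struct SkeletonKnobs {
        int nThreads;            // (i) threads allocated to this program
        std::vector<int> cores;  // (ii) cores its threads are pinned to
        int chunkSize;           // (iii) a high-level skeleton parameter
    };

    // Cooperative tuning reacts to workload changes by re-dividing the
    // machine among all running programs rather than tuning each program
    // in isolation. Here: a naive even split of the cores.
    void retune(std::vector<SkeletonKnobs>& programs, int totalCores) {
        int share = totalCores / static_cast<int>(programs.size());
        int next = 0;
        for (auto& p : programs) {
            p.nThreads = share;
            p.cores.clear();
            for (int i = 0; i < share; ++i) p.cores.push_back(next++);
        }
    }

    int main() {
        std::vector<SkeletonKnobs> running(3);  // three concurrent programs
        retune(running, 12);                    // 12 cores: 4 per program
    }
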
Styles APA, Harvard, Vancouver, ISO, etc.
47

Leth, Lone. « Functional programs as reconfigurable networks of communicating processes ». Thesis, Imperial College London, 1991. http://hdl.handle.net/10044/1/46884.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
48

Jones, Philip E. C. « Common subexpression detection in dataflow programs / ». Title page, contents and summary only, 1989. http://web4.library.adelaide.edu.au/theses/09SM/09smj78.pdf.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
49

Junaidu, Sahalu B. « A parallel functional language compiler for message-passing multicomputers ». Thesis, University of St Andrews, 1998. http://hdl.handle.net/10023/13450.

Texte intégral
Résumé :
The research presented in this thesis concerns the design and implementation of Naira, a parallel, parallelising compiler for a rich, purely functional programming language. The source language of the compiler is a subset of Haskell 1.2. The front end of Naira is written entirely in the Haskell subset being compiled. Naira has been successfully parallelised; it is the largest successfully parallelised Haskell program, having achieved good absolute speedups on a network of SUN workstations. Since it has the same basic structure as other production compilers of functional languages, Naira's parallelisation technology should carry forward to other functional language compilers. The back end of Naira is written in C and generates parallel code in C, intended to run on distributed-memory machines. The code generator is based on a novel compilation scheme specified using a restricted form of Milner's π-calculus which achieves asynchronous communication. We present the first working implementation of this scheme on distributed-memory message-passing multicomputers with split-phase transactions. Simulated assessment of the generated parallel code indicates good parallel behaviour. Parallelism is introduced using explicit, advisory user annotations in the source program, and annotations play two major roles in the compiler. First, the front end of the compiler is itself parallelised, improving its efficiency when compiling input programs. Second, the input programs to the compiler can themselves contain annotations, from which the compiler generates multi-threaded parallel code. These roles make Naira, unusually, both a parallel and a parallelising compiler. We adopt a medium-grained approach to granularity in which function applications form the unit of parallelism and load distribution. We have experimented with two task distribution strategies, deterministic and random, and with thread-based and quantum-based scheduling policies. Our experiments show little efficiency difference for regular programs, but the quantum-based scheduler performs best on programs with irregular parallelism. The compiler has been successfully built, parallelised and assessed using both idealised and realistic measurement tools: we obtained significant compilation speed-ups on a variety of simulated parallel architectures. The simulated results are supported by the best results obtained on real hardware for such a large program: we measured an absolute speedup of 2.5 on a network of 5 SUN workstations. The compiler has also been shown to have good parallelising potential, based on popular test programs. Results of assessing Naira's generated unoptimised parallel code are comparable to those produced by other successful parallel implementation projects.
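
The two task distribution strategies compared in the thesis can be caricatured in a few lines. The sketch below is in C++ rather than the compiler's generated C, and the function names are invented; it only illustrates the deterministic-versus-random choice for placing a sparked function application.

    #include <cstdio>
    #include <cstdlib>

    // Deterministic distribution: sparked function applications are sent
    // to processors in fixed rotation.
    int pickProcessorDeterministic(int nProcs) {
        static int next = 0;
        return next++ % nProcs;
    }

    // Random distribution: each spark goes to a processor drawn at random.
    int pickProcessorRandom(int nProcs) {
        return std::rand() % nProcs;
    }

    int main() {
        for (int spark = 0; spark < 4; ++spark)
            std::printf("spark %d -> proc %d (deterministic), proc %d (random)\n",
                        spark, pickProcessorDeterministic(4),
                        pickProcessorRandom(4));
    }
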
Styles APA, Harvard, Vancouver, ISO, etc.
50

Broberg, Magnus. « Performance Prediction and Improvement Techniques for Parallel Programs in Multiprocessors ». Doctoral thesis, Karlskrona : Department of Software Engineering and Computer Science, Blekinge Institute of Technology, 2002. http://www.bth.se/fou/forskinfo.nsf/01f1d3898cbbd490c12568160037fb62/2bf3ca6a32368b72c1256b98003d7466!OpenDocument.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.