
Dissertations / Theses on the topic 'Parallel and dynamic reconfigurable computing'


Consult the top 50 dissertations / theses for your research on the topic 'Parallel and dynamic reconfigurable computing.'


1

Viswanathan, Venkatasubramanian. "Une architecture évolutive flexible et reconfigurable dynamiquement pour les systèmes embarqués haute performance." Thesis, Valenciennes, 2015. http://www.theses.fr/2015VALE0029.

Full text
Abstract:
In this thesis, we propose a scalable and customizable reconfigurable computing platform, with a parallel full-duplex switched communication network and a software execution model, to redefine the computation, communication and reconfiguration paradigms in High Performance Embedded Computing (HPEC). HPEC applications are becoming highly sophisticated and resource-consuming for three reasons. First, they must capture and process real-time data from several I/O sources in parallel. Second, they should adapt their functionality to application or environment variations within given Size, Weight and Power (SWaP) constraints. Third, since they process several parallel I/O sources, applications are often distributed over multiple computing nodes, making them highly parallel. Due to the hardware parallelism and I/O bandwidth offered by Field Programmable Gate Arrays (FPGAs), an application can be duplicated several times to process parallel I/Os, making Single Program Multiple Data (SPMD) the favorite execution model for designers implementing parallel architectures on FPGAs. Furthermore, the Dynamic Partial Reconfiguration (DPR) feature allows efficient reuse of limited hardware resources, making FPGAs a highly attractive solution for such applications. The problem with current HPEC systems is that they are usually built to meet the needs of a specific application, i.e., they lack the flexibility to upgrade the system or reuse existing hardware resources, while the applications that run on them are constantly being upgraded. There is therefore a real need for flexible and scalable hardware architectures and parallel execution models that allow systems to be upgraded and hardware resources to be reused within acceptable time bounds; otherwise these applications face obsolescence, hardware redesign costs, slow sequential reconfiguration, and wasted computing power. Addressing these challenges, we propose an architecture that allows the customization of computing nodes (FPGAs), the broadcast of data (I/O, bitstreams), and the reconfiguration of several computing nodes, or a subset of them, in parallel. The software environment leverages the potential of the hardware switch to support the SPMD execution model. Finally, to demonstrate the benefits of our architecture, we implemented a scalable distributed secure H.264 encoding application along with several avionic communication protocols for data and control transfers between the nodes. We used an FMC-based high-speed serial Front Panel Data Port (sFPDP) data acquisition protocol to capture, encode and encrypt raw video streams. The system was implemented on 3 different FPGAs, respecting the SPMD execution model. In addition, we implemented modular I/Os by swapping I/O protocols dynamically when required by the system. We have thus demonstrated a scalable and flexible architecture and a parallel run-time reconfiguration model able to manage several parallel input video sources. These results represent a proof of concept for massively parallel, dynamically reconfigurable next-generation embedded computers.
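As an editorial illustration of the SPMD pattern this thesis builds on: one control node broadcasts a single payload (for example a partial bitstream) and every node then runs the same program on its own data. A minimal mpi4py sketch, in which `reconfigure` and `process_stream` are hypothetical stand-ins for the FPGA-side operations (the thesis uses a full-duplex switched network, not MPI):

```python
# Minimal SPMD sketch with mpi4py: one broadcast payload, the same
# program on every rank. reconfigure() and process_stream() are
# hypothetical stand-ins for the FPGA-side operations of the thesis.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Rank 0 plays the control node: it owns the payload to broadcast.
bitstream = b"\x00" * 1024 if rank == 0 else None  # placeholder bitstream

# A single broadcast reaches every node; in the thesis this role is
# played by the switched communication network, not by MPI.
bitstream = comm.bcast(bitstream, root=0)

def reconfigure(bs):
    """Hypothetical: load the partial bitstream on the local FPGA."""
    pass

def process_stream(r):
    """Hypothetical: encode this node's own input stream."""
    return f"node {r} processed its stream"

reconfigure(bitstream)
print(process_stream(rank))
```

Run with `mpirun -n 4 python spmd_sketch.py`: every rank receives the same payload and executes the identical program on its own data, which is the essence of SPMD.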
2

Surendiranath, Sudha. "Accelerating DNA Sequential Analysis Exploiting Parallel Hardware and Reconfigurable Computing." University of Cincinnati / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1131856327.

Full text
3

Jacob, Aju. "Distributed configuration management for reconfigurable cluster computing." [Gainesville, Fla.] : University of Florida, 2004. http://purl.fcla.edu/fcla/etd/UFE0007181.

Full text
4

Huang, Jian. "RECONFIGURABLE COMPUTING FOR VIDEO CODING." Doctoral diss., University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4301.

Full text
Abstract:
Video coding is widely used in our daily life. Due to its high computational complexity, hardware implementation is usually preferred. In this research, we investigate both an ASIC hardware design approach and a reconfigurable hardware design approach for video coding applications. First, we present a unified architecture that can perform Discrete Cosine Transform (DCT), Inverse Discrete Cosine Transform (IDCT), and DCT-domain motion estimation and compensation (DCT-ME/MC). Our proposed architecture is a wavefront-array-based processor with a highly modular structure consisting of an 8×8 array of Processing Elements (PEs). By exploiting statistical properties and arithmetic operations, it can serve as a high-performance hardware accelerator for video transcoding applications. We show how different core algorithms can be mapped onto the same hardware fabric and executed through the pre-defined PEs. In addition to the simplified design process and savings in hardware resources, we demonstrate that a high throughput rate can be achieved for IDCT and DCT-MC by fully exploiting the sparseness of the DCT coefficient matrix. Compared to a fixed ASIC architecture, the reconfigurable hardware design approach offers higher flexibility, lower cost, and faster time-to-market. We propose a self-reconfigurable platform which can reconfigure the architecture of the DCT computation at run time using dynamic partial reconfiguration. The scalable DCT architecture can compute different numbers of DCT coefficients in zig-zag scan order to adapt to different requirements for power consumption, hardware resources, and performance. A configuration manager, implemented in the embedded processor, adaptively controls the reconfiguration of the scalable DCT architecture at run time. In addition, we use the LZSS algorithm to compress the partial bitstreams and on-chip BlockRAM as a cache to reduce the latency of loading partial bitstreams from off-chip memory for run-time reconfiguration. A hardware module is designed for parallel reconfiguration with the partial bitstreams. Experimental results show that our approach reduces external memory accesses by 69% and achieves a 400 MBytes/s reconfiguration rate. A prediction algorithm for zero-quantized DCT (ZQDCT) coefficients is used to control the run-time reconfiguration of the proposed scalable architecture, and 12 different DCT computation modes, including zonal coding, multi-block processing, and parallel-sequential stage modes, are supported to reduce power consumption, hardware resources, and computation time with only small quality degradation. Detailed trade-offs among power, throughput, and quality are investigated and used as criteria for self-reconfiguration to meet the requirements set by the users.
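The zig-zag scan order underlying the scalable DCT core is a small, self-contained algorithm; a sketch of the standard JPEG-style traversal (not the thesis's own code) makes the "first k coefficients" idea concrete:

```python
# Generate the zig-zag scan order of an n x n block: the traversal
# that ranks DCT coefficients from low to high frequency.
def zigzag_order(n=8):
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # sort by anti-diagonal r+c, alternating direction per diagonal
        key=lambda rc: (rc[0] + rc[1],
                        rc[1] if (rc[0] + rc[1]) % 2 == 0 else -rc[1]),
    )

# A scalable DCT core that keeps only the first k coefficients would
# compute and retain exactly zigzag_order(8)[:k].
print(zigzag_order(8)[:10])
```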
Ph.D., School of Electrical Engineering and Computer Science
5

Varvarigos, Emmanouel A. "Static and dynamic communication in parallel computing." Thesis, Massachusetts Institute of Technology, 1992. http://hdl.handle.net/1721.1/12868.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1992.
Includes bibliographical references (p. 186-191).
by Emmanouel A. Varvarigos.
Ph.D.
6

Phan, Cong-Vinh. "Formal aspects of dynamic reconfigurability in reconfigurable computing systems." Thesis, London South Bank University, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.435200.

Full text
7

Pandey, Ankur. "A Multithreaded Runtime Support Environment for Dynamic Reconfigurable Computing." University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1026133065.

Full text
8

Surendiranath, Sudha. "Accelerating DNA sequential analysis through exploiting parallel hardware and reconfigurable computing." Cincinnati, Ohio : University of Cincinnati, 2005. http://www.ohiolink.edu/etd/view.cgi?acc%5Fnum=ucin1131856327.

Full text
9

Thorndike, David Andrew. "A Multicore Computing Platform for Benchmarking Dynamic Partial Reconfiguration Based Designs." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1338933284.

Full text
10

Craven, Stephen Douglas. "Structured Approach to Dynamic Computing Application Development." Diss., Virginia Tech, 2008. http://hdl.handle.net/10919/27730.

Full text
Abstract:
The ability of some configurable logic devices to modify their hardware during operation has long held great potential to increase performance and reduce device cost. However, despite many research projects spanning a decade, the dynamic reconfiguration of Field Programmable Gate Arrays (FPGAs) is still very much an art practiced by few. Previous attempts to automate the many low-level details that complicate Run-Time Reconfigurable (RTR) application development suffer severe limitations. This dissertation describes a comprehensive approach to dynamic hardware development, providing a designer with appropriate models for computation, communication, and reconfiguration, integrated within a high-level design environment. In this way, many manual and time-consuming tasks associated with partial reconfiguration are hidden, permitting a designer to focus instead on a design's behavior. This design and implementation environment has been validated on a variety of relevant applications, quantifying the effects of high-level design.
Ph. D.
11

Lee, Tai-Chun. "An Event-Based Approach to Demand-Driven Dynamic Reconfigurable Computing." University of Cincinnati / OhioLINK, 2001. http://rave.ohiolink.edu/etdc/view?acc_num=ucin990821256.

Full text
12

Murphy, Ciaron William. "Run time reconfigurable DSP parallel processing system using dynamic FPGAs." Thesis, Liverpool John Moores University, 2002. http://researchonline.ljmu.ac.uk/4924/.

Full text
13

Lanore, Vincent. "On Scalable Reconfigurable Component Models for High-Performance Computing." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1051/document.

Full text
Abstract:
Component-based programming is a programming paradigm which eases code reuse and separation of concerns. Some component models, said to be "reconfigurable", allow the structure of an application to be modified at runtime. However, these models are not suited to High-Performance Computing (HPC) as they rely on non-scalable mechanisms. The goal of this thesis is to provide models, algorithms and tools to ease the development of component-based reconfigurable HPC applications. The main contribution of the thesis is the DirectMOD component model, which eases the development and reuse of distributed transformations. To improve on this core model in other directions, we have also proposed:
• the SpecMOD formal component model, which allows automatic specialization of hierarchical component assemblies and provides high-level software engineering features;
• mechanisms for efficient fine-grain reconfiguration of AMR applications, an important application class in HPC.
An implementation of DirectMOD, called DirectL2C, has been developed and used to implement a series of AMR-based benchmarks to evaluate our approach. Experiments on compute clusters and a supercomputer show that our approach scales. Moreover, a quantitative analysis of the benchmark codes shows that our approach is compact and eases reuse.
14

Miller, Simon. "Parallel computing and the molecular dynamic simulation of ionic materials." Thesis, Keele University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260050.

Full text
15

Kerr, Andrew. "A model of dynamic compilation for heterogeneous compute platforms." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/47719.

Full text
Abstract:
Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds an additional dimension to the challenge of portability, as different processors support different notions of parallelism, whether vector parallelism executing in a few threads on multicore CPUs or large-scale thread hierarchies on GPUs. Thus, software experiences obstacles to portability and efficient execution beyond differences in instruction sets; rather, the underlying execution models of radically different architectures may not be compatible. Dynamic compilation applied to data-parallel heterogeneous architectures presents an abstraction layer decoupling program representations from optimized binaries, thus enabling portability without encumbering performance. This dissertation proposes several techniques that extend dynamic compilation to data-parallel execution models. These contributions include:
- characterization of data-parallel workloads
- machine-independent application metrics
- a framework for performance modeling and prediction
- execution model translation for vector processors
- region-based compilation and scheduling
We evaluate these claims via the development of a novel dynamic compilation framework, GPU Ocelot, with which we execute real-world workloads from GPU computing. This enables GPU computing workloads to run efficiently on multicore CPUs, GPUs, and a functional simulator. We show that data-parallel workloads exhibit performance scaling, take advantage of vector instruction set extensions, and effectively exploit data locality via scheduling which attempts to maximize control locality.
16

Lloyd, G. Scott. "Accelerated Large-Scale Multiple Sequence Alignment with Reconfigurable Computing." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2729.

Full text
Abstract:
Multiple Sequence Alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. The time to compute an optimal MSA grows exponentially with respect to the number of sequences. Consequently, producing timely results on large problems requires more efficient algorithms and the use of parallel computing resources. Reconfigurable computing hardware provides one approach to the acceleration of biological sequence alignment. Other acceleration methods typically encounter scaling problems that arise from the overhead of inter-process communication and from the lack of parallelism. Reconfigurable computing allows a greater scale of parallelism with many custom processing elements that have a low-overhead interconnect. The proposed parallel algorithms and architecture accelerate the most computationally demanding portions of MSA. An overall speedup of up to 150 has been demonstrated on a large data set when compared to a single processor. The reduced runtime for MSA allows researchers to solve the larger problems that confront biologists today.
17

Krishnan, Manoj Kumar. "ProLAS: A Novel Dynamic Load Balancing Library for Advanced Scientific Computing." Master's thesis, Mississippi State: Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-11102003-184622.

Full text
18

Bhardwaj, Prabhaav. "Framework for Hardware Agility on FPGAs." Thesis, Virginia Tech, 2010. http://hdl.handle.net/10919/36347.

Full text
Abstract:
As hardware applications become increasingly complex, the supporting technology needs to evolve and adapt to the demands. Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits, General Purpose Processors, and Systems on Chip are the preferred devices for solving computational problems. Each of these platforms has its own specific advantages and disadvantages, which need to be accounted for during application development. Flexible radio communication has been dominated by Software Defined Radios (SDRs). However, research in industry and universities has successfully developed run-time reconfiguration tools that make FPGA designs more flexible, vastly reducing configuration times. Developers now have a more powerful platform with dense Digital Signal Processor resources and the flexibility of SDR. Xilinx offers tools such as partial reconfiguration, a special modification of the standard tool-flow that supports configuration of selected partial regions on an FPGA. The AgileHW project improves on the Xilinx tools' resource allocation and routing scheme to increase design agility and productivity. This thesis advances the AgileHW reconfigurable platform so that developers can use the newer technology to build enhanced designs.
Master of Science
19

Hernandez, Jesus Israel. "Reactive scheduling of DAG applications on heterogeneous and dynamic distributed computing systems." Thesis, University of Edinburgh, 2008. http://hdl.handle.net/1842/2336.

Full text
Abstract:
Emerging technologies enable a set of distributed resources across a network to be linked together and used in a coordinated fashion to solve a particular parallel application at the same time. Such applications are often abstracted as directed acyclic graphs (DAGs), in which vertices represent application tasks and edges represent data dependencies between tasks. Effective scheduling mechanisms for DAG applications are essential to exploit the tremendous potential of computational resources. The core issue is that the availability and performance of resources, which are already by their nature heterogeneous, can be expected to vary dynamically, even during the course of an execution. In this thesis, we first consider the problem of scheduling DAG task graphs onto heterogeneous resources with changeable capabilities. We propose a list-scheduling heuristic approach, the Global Task Positioning (GTP) scheduling method, which addresses the problem by allowing rescheduling and migration of tasks in response to significant variations in resource characteristics. We observed from experiments with GTP that in an execution with relatively frequent migration it may be that, over time, the results of some task have been copied to several other sites, so a subsequently migrated task may have several possible sources for each of its inputs; some of these copies may now be more quickly accessible than the original, due to dynamic variations in communication capabilities. To exploit this observation, we extended our model with a Copying Management (CM) function, resulting in a new version, the Global Task Positioning with copying facilities (GTP/c) system. The idea is to reuse such copies in subsequent migrations of placed tasks, in order to reduce the impact of migration cost on makespan. Finally, we believe that fault tolerance is an important issue in heterogeneous and dynamic computational environments, as the availability of resources cannot be guaranteed. To address the problem of processor failure, we propose a rewinding mechanism which rewinds the progress of the application to a previous state, thereby preserving the execution in spite of the failed processor(s). We evaluate our mechanisms through simulation, since this allows us to generate repeatable patterns of resource performance variation. We use a standard benchmark set of DAGs, comparing performance against that of competing algorithms from the scheduling literature.
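GTP belongs to the list-scheduling family, whose core loop is easy to sketch: repeatedly pick the highest-priority ready task and place it on the resource that gives the earliest finish time. A minimal sketch under simplifying assumptions (no communication costs, task id as a stand-in priority); GTP additionally reschedules and migrates tasks at run time:

```python
# Minimal list scheduling of a DAG onto heterogeneous processors:
# take ready tasks in priority order, map each to the processor
# that yields the earliest finish time. Illustrative sketch only.
def list_schedule(tasks, deps, cost, n_procs):
    # tasks: list of task ids; deps: {task: set of predecessors}
    # cost[t][p]: execution time of task t on processor p
    finish = {}                      # task -> (proc, end time)
    proc_free = [0.0] * n_procs      # when each processor becomes idle
    done, order = set(), []
    while len(done) < len(tasks):
        ready = [t for t in tasks if t not in done and deps[t] <= done]
        t = min(ready)               # stand-in priority: lowest id first
        best = None
        for p in range(n_procs):
            start = max([proc_free[p]] +
                        [finish[d][1] for d in deps[t]])  # data deps
            end = start + cost[t][p]
            if best is None or end < best[1]:
                best = (p, end)
        finish[t] = best
        proc_free[best[0]] = best[1]
        done.add(t)
        order.append((t, best))
    return order

deps = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
cost = {0: [2, 3], 1: [3, 1], 2: [2, 2], 3: [4, 2]}
print(list_schedule([0, 1, 2, 3], deps, cost, 2))
```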
20

Li, Shen Carmen C. "Evaluating Impulse C and multiple parallelism partitions for a low-cost reconfigurable computing system." Waco, Tex.: Baylor University, 2008. http://hdl.handle.net/2104/5280.

Full text
21

Zhang, Fanjiong. "Design and Verification of SOPC FDP2009 and Research of Reconfigurable Applications." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-66724.

Full text
Abstract:
In recent years, reconfigurable devices have developed rapidly because of their flexibility and low development cost. But intrinsic shortcomings of reconfigurable devices, such as high power consumption and low speed, make complex designs difficult to realize. Designers therefore began to consider combining an ASIC (Application-Specific Integrated Circuit) and a reconfigurable device on a single chip, known as an SOPC (System on Programmable Chip). An SOPC can not only decrease development risk and time to market, but can also be used in different applications, especially products that keep varying, for example communication and network products. Dynamic reconfiguration means the reconfigurable part of the chip can be reconfigured repeatedly, performing different functions at different times. Compared with static reconfiguration, dynamic reconfiguration uses the reconfigurable device more thoroughly. It is a hot research topic worldwide, especially in reconfigurable computing. This thesis summarizes my research work on a reconfigurable SOPC in three major parts: hardware, software and application. The following work and innovations were completed:
1. SOPC hardware system architecture design and discussion, helping to define the system architecture and design goals; the design of the EBI controller used in the SOPC; and the integration of the blocks in the system.
2. The building-up of the SOPC system-level and block-level verification environments; the set-up of the hardware-software co-simulation environment; and the post-layout simulation and formal verification tasks. We propose an innovative automated regression system that achieves the same simulation coverage (95%) while reducing total simulation time by approximately 30%.
3. SOPC software design, including OS kernel porting, driver design and application design; the design of the PowerPC initialization program and of the UART (Universal Asynchronous Receiver/Transmitter) and reconfiguration-communication driver programs; and the writing of test cases specialized for system verification and hardware testing.
4. Co-designing the novel bus macro based on the FDP reconfigurable logic core, and realizing the whole reconfigurable system based on this bus macro.
5. Reconfigurable application research based on the Reconfigurable Logic Core: a reconfigurable image filter implemented on the FDP300K Reconfigurable Logic Core device, using the self-designed internal bus macro to implement the partially reconfigurable system. Test results showed that the reconfigurable filter features fast configuration and good output image quality.
22

Banihashemi, Seyed Parsa. "Parallel explicit FEM algorithms using GPU's." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54391.

Full text
Abstract:
The explicit Finite Element Method (FEM) is a powerful tool in nonlinear dynamic finite element analysis. Recent major developments in computational devices, in particular General Purpose Graphics Processing Units (GPGPUs), now make it possible to increase the performance of the explicit FEM. This dissertation investigates existing explicit finite element method algorithms, which are then redesigned for GPUs and implemented. The performance of these algorithms is assessed, and a new asynchronous variational integrator spatial decomposition (AVISD) algorithm is developed which is flexible, encompasses the other methods, and can be tuned for a user-defined problem and the performance of the user's computer. The mesh-aware performance of the proposed explicit finite element algorithm is studied and verified by implementation. The research also introduces the use of a Particle Swarm Optimization method to tune the performance of the proposed algorithm automatically, given a finite element mesh and the performance characteristics of a user's computer. For this purpose, a time performance model is developed which depends on the finite element mesh and the machine performance; this model is then used as an objective function to minimize the run-time cost. Based on this performance model and predictions about near-future changes in GPUs, the performance of the AVISD method on future machines is also predicted. Finally, suggestions and insights based on these results are proposed to help facilitate future explicit FEM development.
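The tuning loop described in this abstract is standard particle swarm optimization: candidate parameter vectors move under the pull of their own best point and of the swarm's best. A minimal sketch with a stand-in quadratic cost; the thesis's actual objective is its mesh- and machine-dependent time model:

```python
import random

# Minimal particle swarm optimization over a 2-D parameter space.
# cost() is a placeholder; the dissertation minimizes a run-time
# model built from the mesh and the machine's characteristics.
def cost(x):
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def pso(dim=2, n=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-10, 10) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]            # each particle's best point
    gbest = min(pbest, key=cost)           # swarm's best point
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=cost)
    return gbest

print(pso())  # should approach the minimum at (3, -1)
```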
23

Gil, Otero Rafael. "Characterisation of a reconfigurable free space optical interconnect system for parallel computing applications and experimental validation using rapid prototyping technology." Thesis, Heriot-Watt University, 2008. http://hdl.handle.net/10399/2141.

Full text
Abstract:
Free-space optical interconnects (FSOIs) are widely seen as a potential solution to present and future bandwidth bottlenecks for parallel processing applications. This thesis focuses on the study of a particular FSOI system called the Optical Highway (OH). The OH is a polarised beam routing system which uses Polarising Beam Splitter and Liquid Crystal (PBS/LC) assemblies to implement reconfigurable interconnection networks. The properties of the OH make it suitable for implementing different passive static networks. A technology known as Rapid Prototyping (RP) is employed for the first time to create optomechanical structures at low cost and with short production times. Off-the-shelf optical components are also characterised in order to implement the OH. Additionally, properties such as reconfigurability, scalability, tolerance to misalignment and polarisation losses are analysed. The OH is modelled at three levels: node, optical stage and architecture. Different designs are proposed and a particular architecture, the Optimised Cut-Through Ring (OCTR), is experimentally implemented. Finally, based on this architecture, a new set of properties is defined in order to optimise the efficiency of the optical channels.
24

Iturbe, Xabier. "Design and implementation of a reliable reconfigurable real-time operating system (R3TOS)." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/9413.

Full text
Abstract:
Twenty-first century Field-Programmable Gate Arrays (FPGAs) are no longer used for implementing simple "glue logic" functions. They have become complex arrays of reconfigurable logic resources and memories as well as highly optimised functional blocks, capable of implementing large systems on a single chip. Moreover, Dynamic Partial Reconfiguration (DPR) capability permits some logic resources on the chip to be adjusted at runtime, whilst the rest are still performing active computations. During the last few years, DPR has become a hot research topic with the objective of building more reliable, efficient and powerful electronic systems. For instance, DPR can be used to mitigate spontaneously occurring bit upsets provoked by radiation, or to jiggle around the FPGA resources which progressively get damaged as the silicon ages. Moreover, DPR is the enabling technology for a new computing paradigm which combines computation in time and space. In Reconfigurable Computing (RC), a battery of computation-specific circuits ("hardware tasks") are swapped in and out of the FPGA on demand to hold a continuous stream of input operands, computation and output results. Multitasking, adaptation and specialisation are key properties in RC, as multiple swappable tasks can run concurrently at different positions on chip, each with custom data-paths for efficient execution of specific computations. As a result, considerable computational throughput can be achieved even at low clock frequencies. However, DPR penetration in the commercial market is still testimonial, mainly due to the lack of suitable high-level design tools to exploit this technology. Indeed, special skills are currently required to successfully develop a dynamically reconfigurable application. In light of the above, this thesis aims at bridging the gap between high-level applications and low-level DPR technology. Its main objective is to develop Operating System (OS)-like support for high-level software-centric application developers in order to exploit the benefits brought about by DPR technology, without having to deal with complex low-level hardware details. The solution developed in this thesis is named R3TOS, which stands for Reliable Reconfigurable Real-Time Operating System. R3TOS defines a flexible infrastructure for reliably executing reconfigurable hardware-based applications under real-time constraints. In R3TOS, the hardware tasks are scheduled in order to meet their computation deadlines and allocated to non-damaged resources, keeping the system fault-free at all times. In addition, R3TOS envisages a computing framework whereby both hardware and software tasks coexist in a seamless manner, allowing the user to access the advanced computation capabilities of modern reconfigurable hardware from a software "look and feel" environment. This thesis covers all of the design and implementation aspects of R3TOS. It proposes a novel EDF-based scheduling algorithm, two novel task allocation heuristics (EAC and EVC) and a novel task allocation strategy (called Snake), addressing many RC-related particularities as well as technological constraints imposed by current FPGA technology. Empirical results show that these approaches improve on the state of the art. Besides, the thesis describes a novel way to harness the internal reconfiguration mechanism of modern FPGAs to perform inter-task communications and synchronisation regardless of the physical location of tasks on chip. This paves the way for implementing more sophisticated RC solutions which were previously only possible in theory. The thesis illustrates R3TOS through a proof-of-concept prototype with two demonstrator applications: (1) dependability-oriented control of the power chain of a railway traction vehicle, and (2) data-streaming-oriented Software Defined Radio (SDR).
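The EDF (earliest-deadline-first) policy at the heart of the R3TOS scheduler is compact enough to sketch: among the ready tasks, always dispatch the one with the nearest deadline. A minimal non-preemptive sketch; R3TOS couples this with allocating each task to a damage-free FPGA region:

```python
import heapq

# Minimal earliest-deadline-first dispatch: tasks are tuples of
# (release, deadline, duration); at each decision point run the
# ready task with the nearest deadline. Non-preemptive sketch only.
def edf(tasks):
    tasks = sorted(tasks)            # by release time
    t, i, ready, schedule = 0.0, 0, [], []
    while i < len(tasks) or ready:
        while i < len(tasks) and tasks[i][0] <= t:
            rel, dl, dur = tasks[i]
            heapq.heappush(ready, (dl, dur, rel))  # ordered by deadline
            i += 1
        if not ready:                # idle until the next release
            t = tasks[i][0]
            continue
        dl, dur, rel = heapq.heappop(ready)
        schedule.append((t, rel, dl))
        t += dur
        if t > dl:
            print(f"deadline miss: task released at {rel}")
    return schedule

print(edf([(0, 10, 3), (1, 5, 2), (2, 20, 3)]))
```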
25

Lima, João Vicente Ferreira. "Controle de granularidade com threads em programas MPI dinâmicos." Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/16132.

Full text
Abstract:
In recent years, the growing demand for high performance has favored the emergence of increasingly efficient architectures and algorithms. The popularity of distributed platforms raises new questions in parallel algorithm development, such as communication, heterogeneity and resource dynamism. These questions can result in applications whose workload is known only at run time; irregularity in the algorithm or in the input data can also affect the application's workload. A parallel application can address these questions through dynamic algorithms, using programming techniques that define the work of a task and allow resources to be used on demand. Granularity, the ratio of computation to communication, reflects practical execution concerns and is an important factor in the performance of dynamic algorithms. Implementing granularity control is complicated and depends on support from the programming environment; moreover, programming environments often have extensive and complicated interfaces that hinder their use in HPC. This work implements a library (libSpawn) which adds granularity control to dynamic MPI applications. The library controls granularity by mapping tasks to processes or threads according to three parameters: the architecture's cores, the system load and the operating-system resources. Timings obtained with plain processes and with libSpawn show significant gains on synthetic benchmarks used by other programming environments. Nevertheless, the current implementation has shortcomings that produce anomalous timings, although these are insignificant compared with the process timings.
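The MPI-2 primitive that dynamic task creation rests on is MPI_Comm_spawn. A minimal mpi4py sketch of a parent spawning workers on demand, where `worker.py` is a hypothetical worker script; libSpawn adds its granularity policy (process vs. thread) on top of this kind of primitive:

```python
# parent.py -- minimal MPI-2 dynamic process creation via mpi4py.
import sys
from mpi4py import MPI

n_workers = 4
# Spawn n_workers copies of a worker script (hypothetical name).
inter = MPI.COMM_SELF.Spawn(sys.executable,
                            args=["worker.py"], maxprocs=n_workers)

# Broadcast a task description to the spawned workers, then gather
# their results over the parent/worker intercommunicator.
inter.bcast({"chunk": 1000}, root=MPI.ROOT)
results = inter.gather(None, root=MPI.ROOT)
print("worker results:", results)
inter.Disconnect()
```

On the worker side, the matching calls would be `MPI.Comm.Get_parent()`, then `bcast(None, root=0)` and `gather(result, root=0)` before disconnecting.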
26

Chinnusamy, Malarvizhi. "Data and Processor Mapping Strategies for Dynamically Resizable Parallel Applications." Thesis, Virginia Tech, 2004. http://hdl.handle.net/10919/33868.

Full text
Abstract:

Due to the unpredictability in job arrival times in clusters and widely varying resource requirements, dynamic scheduling of parallel computing resources is necessary to increase system throughput. Dynamically resizable applications provide the flexibility needed for dynamic scheduling. These applications can expand to take advantage of additional free processors, or to meet a Quality of Service (QoS) deadline, or can shrink to accommodate a high priority application, without getting suspended.

This thesis is part of a larger effort to define a framework for dynamically resizable parallel applications. This framework includes a scheduler that supports resizing applications, an API to enable applications to interact with the scheduler, and libraries that make resizing viable. This thesis focuses on libraries for efficient resizing of parallel applications: efficient in terms of minimizing the cost of data redistribution, choosing and allocating the right set of additional processors, and preserving the performance of the application after resizing. We explore the trade-offs between these goals on both homogeneous and heterogeneous clusters. We focus on structured applications that have 2D data arrays distributed across a 2D processor grid.

Our library includes algorithms for processor selection and processor mapping. For homogeneous clusters, processor selection determines the number of processors to be added, and processor mapping decides the placement of the new processors within the given topology so as to minimize the amount of data to be redistributed. For heterogeneous clusters, since processing powers vary, there is the additional problem of choosing the right set of processors to add. We also present results that demonstrate the effectiveness of our approach.
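For the 2D-grid case above, the core computation is deciding which processor owns each block of the array: elements whose owner changes after resizing are exactly the redistribution cost. A minimal block-distribution sketch under simplifying assumptions (square array, processors identified by their grid coordinates):

```python
# Map element (i, j) of an n x n array, block-distributed over a
# p x q processor grid, to its owning processor. Resizing the grid
# changes this mapping; data whose owner changes must be moved,
# which is the redistribution cost the thesis seeks to minimize.
def owner(i, j, n, p, q):
    bi = (n + p - 1) // p            # block rows per processor row
    bj = (n + q - 1) // q            # block cols per processor col
    return (i // bi, j // bj)

def moved_fraction(n, p0, q0, p1, q1):
    # fraction of elements that change owner when the grid resizes
    moved = sum(owner(i, j, n, p0, q0) != owner(i, j, n, p1, q1)
                for i in range(n) for j in range(n))
    return moved / (n * n)

# Example: growing a 2x2 grid to 2x3 relocates half of this array.
print(moved_fraction(12, 2, 2, 2, 3))
```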


Master of Science
27

Siverskog, Jacob. "Evaluation of partial reconfiguration for FPGA debugging." Thesis, Linköping University, Computer Engineering, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-57714.

Full text
Abstract:

Reconfigurable computing is an old concept that has become increasingly popular during the past couple of decades. The concept combines the flexibility of software with the performance of hardware. One important contributing factor to this rise in popularity is the presence of FPGAs (field-programmable gate arrays), which realize the concept by allowing the hardware to be reconfigured dynamically. The current state of reconfigurable computing is discussed further in the thesis.

Debugging is a vital part of the development of a hardware design. It can be done in several ways depending on the situation. The most common way is to perform simulations, but in some cases the fault-finding has to be done when the design is implemented in hardware.

In this thesis a framework concept is designed that utilizes and evaluates some of the reconfigurable computing ideas. The framework provides debugging possibilities for FPGA designs in a novel way, with a modular system where each module provides means to aid in finding a specific fault. The framework is added to an existing design, and offers the user a glimpse into the design's behavior and the hardware it runs on.

One of the debug modules will be released separately under a free license. It allows the developer to see the contents of the memories in a design without requiring special debugging equipment.

28

Helal, Manal (Computer Science & Engineering, Faculty of Engineering, UNSW). "Indexing and partitioning schemes for distributed tensor computing with application to multiple sequence alignment." Awarded by: University of New South Wales, Computer Science & Engineering, 2009. http://handle.unsw.edu.au/1959.4/44781.

Full text
Abstract:
This thesis investigates indexing and partitioning schemes for high dimensional scientific computational problems. Building on the foundation offered by Mathematics of Arrays (MoA) for tensor-based computation, the ultimate contribution of the thesis is a unified partitioning scheme that is invariant to dataset dimension and shape. Consequently, portability is ensured between different high performance machines, cluster architectures, and potentially computational grids. The Multiple Sequence Alignment (MSA) problem in computational biology has an optimal dynamic programming based solution, but it becomes computationally infeasible as its dimensionality (the number of sequences) increases. Even sub-optimal approximations may be unmanageable for more than eight sequences. Furthermore, no existing MSA algorithm has been formulated in a manner invariant over the number of sequences. This thesis presents an optimal distributed MSA method based on MoA. The latter offers a set of constructs that help represent multidimensional arrays in memory in a linear, concise and efficient way. Using MoA allows the partitioning of the dynamic programming algorithm to be expressed independently of dimension. MSA is the highest-dimensional scientific problem considered for MoA-based partitioning to date. Two partitioning schemes are presented: the first is a master/slave approach based on both master/slave scheduling and slave/slave coupling. The second is a peer-to-peer design, in which the scheduling and dependency communication are calculated independently by each process, with no need for a master scheduler. A search space reduction technique is introduced to cater for the exponential expansion as the problem dimensionality increases. This technique relies on defining a hyper-diagonal through the tensor space and choosing a band of neighbouring partitions around the diagonal to score. In contrast, other sub-optimal methods in the literature only consider projections on the surface of the hyper-cube. The resulting massively parallel design produces a scalable solution that has been implemented on high performance machines and cluster architectures. Experimental results for these implementations are presented for both simulated and real datasets. Comparisons between the reduced search space technique of this thesis and other sub-optimal methods for the MSA problem are presented.
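The dynamic-programming core that this thesis generalizes is easiest to see in the two-sequence case (Needleman-Wunsch): each additional sequence adds one dimension to the score tensor, which is what drives the exponential growth. A minimal two-sequence sketch with illustrative scores:

```python
# Two-sequence global alignment score by dynamic programming
# (Needleman-Wunsch). The thesis's MoA formulation generalizes this
# recurrence to an N-dimensional score tensor, one axis per sequence,
# and partitions it along hyper-diagonals for distributed execution.
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):        # gaps along the first sequence
        S[i][0] = i * gap
    for j in range(1, n + 1):        # gaps along the second sequence
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = S[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            S[i][j] = max(diag, S[i-1][j] + gap, S[i][j-1] + gap)
    return S[m][n]

print(nw_score("GATTACA", "GCATGCU"))
```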
29

Tesser, Rafael Keller. "A simulation workflow to evaluate the performance of dynamic load balancing with over decomposition for iterative parallel applications." Biblioteca Digital de Teses e Dissertações da UFRGS, 2018. http://hdl.handle.net/10183/180129.

Full text
Abstract:
In this thesis we present a novel simulation workflow to evaluate the performance of dynamic load balancing with over-decomposition applied to iterative parallel applications. Its goals are to perform such an evaluation with minimal application modification and at a low cost in terms of time and resource requirements. Many parallel applications suffer from dynamic (temporal) load imbalance that cannot be treated at the application level. It may be caused by intrinsic characteristics of the application or by external software and hardware factors. As demonstrated in this thesis, such dynamic imbalance can be found even in applications whose codes do not hint at any dynamism. Therefore, we need to rely on runtime dynamic load balancing mechanisms, such as dynamic load balancing based on over-decomposition. The problem is that evaluating and tuning the performance of such a technique can be costly. This usually entails modifications to the application and a large number of executions to get statistically sound performance measurements with different load balancing parameter combinations. Moreover, useful and accurate measurements often require big resource allocations on a production cluster. Our simulation workflow, dubbed Simulated Adaptive MPI (SAMPI), employs a combined sequential emulation and trace-replay simulation approach to reduce the cost of such an evaluation. Both sequential emulation and trace replay require a single computer node, and the trace-replay simulation lasts a small fraction of the real-life parallel execution time of the application. Besides the basic SAMPI simulation, we developed spatial aggregation and application-level rescaling techniques to speed up the emulation process. To demonstrate the real-life performance benefits of dynamic load balancing with over-decomposition, we evaluated the performance gains obtained by employing this technique on an iterative parallel geophysics application called Ondes3D. Dynamic load balancing support was provided by Adaptive MPI (AMPI). This resulted in up to 36.58% performance improvement on 288 cores of a cluster. This real-life evaluation also illustrates the difficulties found in this process, thus justifying the use of simulation. To implement the SAMPI workflow, we relied on SimGrid's Simulated MPI (SMPI) interface in both emulation and trace-replay modes. To validate our simulator, we compared simulated (SAMPI) and real-life (AMPI) executions of Ondes3D. The simulations presented a load balance evolution very similar to real life and were also successful in choosing the best load balancing heuristic for each scenario. Besides the validation, we demonstrate the use of SAMPI for load balancing parameter exploration and for computational capacity planning. As for the performance of the simulation itself, we roughly estimate that our full workflow can simulate the execution of Ondes3D with 24 different load balancing parameter combinations in 5 hours for our heavier earthquake scenario and in 3 hours for the lighter one.
30

Balasubramaniam, Mahadevan. "Performance analysis and evaluation of dynamic loop scheduling techniques in a competitive runtime environment for distributed memory architectures." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-04022003-154254.

Full text
31

Templin, Joshua R. "Design of an Adaptable Run-Time Reconfigurable Software-Defined Radio Processing Architecture." DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/810.

Full text
Abstract:
Processing power is a key technical challenge holding back the development of a high-performance software-defined radio (SDR). Traditionally, SDR has utilized digital signal processors (DSPs), but increasingly complex algorithms, higher data rates, and multi-tasking needs have exceeded the processing capabilities of modern DSPs. Reconfigurable computers, such as field-programmable gate arrays (FPGAs), are popular alternatives because of their performance gains over software for streaming-data applications like SDR. However, FPGAs have not yet realized the ideal SDR because architectures have not fully utilized their partial reconfiguration (PR) capabilities to bring the needed flexibility. A reconfigurable processor architecture is proposed that utilizes PR in reconfigurable computers to achieve a more sophisticated SDR. The proposed processor contains run-time swappable blocks whose parameters and interconnects are programmable. The architecture is analyzed for performance and flexibility and compared with available alternative technologies. For a sample QPSK algorithm, hardware performance gains of at least 44x are seen over modern desktop processors and DSPs, while most of their flexibility and extensibility is maintained.
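QPSK, the sample algorithm cited above, maps bit pairs onto four carrier phases; a short software reference makes the job of the corresponding hardware block concrete. A minimal Gray-coded mapper sketch (the constellation choice is illustrative, not taken from the thesis):

```python
import math

# Reference QPSK symbol mapper: each pair of bits selects one of four
# phases, Gray-coded so adjacent constellation points differ in one
# bit. A hardware block in the SDR pipeline performs this mapping on
# a continuous sample stream.
_A = math.cos(math.pi / 4)  # 1/sqrt(2), unit-energy symbols
CONSTELLATION = {
    (0, 0): complex(_A, _A),
    (0, 1): complex(-_A, _A),
    (1, 1): complex(-_A, -_A),
    (1, 0): complex(_A, -_A),
}

def qpsk_modulate(bits):
    assert len(bits) % 2 == 0, "QPSK consumes bits two at a time"
    return [CONSTELLATION[(bits[i], bits[i + 1])]
            for i in range(0, len(bits), 2)]

print(qpsk_modulate([0, 0, 1, 1, 0, 1, 1, 0]))
```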
32

Afonso, Fernando Abrahão. "MPI2.NET : criação dinâmica de tarefas com orientação a objetos." Biblioteca Digital de Teses e Dissertações da UFRGS, 2010. http://hdl.handle.net/10183/26952.

Full text
Abstract:
Message Passing Interface (MPI) is the de facto standard for the development of high-performance parallel applications executing on clusters. The standard defines APIs for the Fortran, C and C++ programming languages. On the other hand, object-oriented programming has become the dominant programming paradigm, and programming languages such as Java and C# have become very popular. This is due to the abstractions these languages provide to ease programming, allowing a more efficient programming/maintenance cycle. Because of this, several MPI libraries have emerged for these languages. Among them, the MPI.NET library for the C# programming language stands out as having the best balance between abstraction and performance. In parallel computing, the model used to develop an application is very important; the Divide & Conquer model is scalable, applicable to several problems, and allows efficient execution of applications whose workload is unknown or irregular. To program with this model, the execution environment must support dynamism, which the MPI.NET library does not provide. From this scenario emerges the main motivation of this work, whose goal is to explore dynamic task creation in the MPI.NET library. In the end, we were able to obtain a library whose performance is competitive with that of the C++ MPI libraries.
APA, Harvard, Vancouver, ISO, and other styles
33

De, Grande Robson E. "Dynamic Load Balancing Schemes for Large-scale HLA-based Simulations." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23110.

Full text
Abstract:
Dynamic balancing of computation and communication load is vital for the execution stability and performance of distributed, parallel simulations deployed on shared, unreliable resources of large-scale environments. High Level Architecture (HLA) based simulations can experience a decrease in performance due to imbalances that are produced initially and/or during run-time. These imbalances are generated by the dynamic load changes of distributed simulations or by unknown, non-managed background processes resulting from the non-dedication of shared resources. Due to the dynamic execution characteristics of the elements that compose distributed simulation applications, the computational load and interaction dependencies of each simulation entity change during run-time. These dynamic changes lead to an irregular load and communication distribution, which increases resource overhead and execution delays. A static partitioning of load is limited to deterministic applications and is incapable of predicting the dynamic changes caused by distributed applications or by external background processes. Given the importance of dynamically balancing load for distributed simulations, many balancing approaches have been proposed in order to offer a sub-optimal balancing solution, but they are limited to certain simulation aspects, specific to particular applications, or unaware of HLA-based simulation characteristics. Therefore, schemes for balancing the communication and computational load during the execution of distributed simulations are devised, adopting a hierarchical architecture. First, to enable the development of such balancing schemes, a migration technique is employed to perform reliable and low-latency simulation load transfers. Then, a centralized balancing scheme is designed; this scheme employs local and cluster monitoring mechanisms in order to observe the distributed load changes and identify imbalances, and it uses load reallocation policies to determine a distribution of load and minimize imbalances. As a measure to overcome the drawbacks of this scheme, such as bottlenecks, overheads, global synchronization, and a single point of failure, a distributed redistribution algorithm is designed. Extensions of the distributed balancing scheme are also developed to improve the detection of and the reaction to load imbalances. These extensions introduce communication delay detection, migration latency awareness, self-adaptation, and load oscillation prediction in the load redistribution algorithm. The developed balancing systems successfully improved the use of shared resources and increased distributed simulations' performance.
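A load reallocation policy of the kind such schemes use can be pictured, in its crudest form, as repeatedly shifting load from the most to the least loaded node. The C++ sketch below shows that toy greedy policy only to fix ideas; the thesis's actual policies also weigh migration latency, communication dependencies, and HLA-specific state, none of which appear here.

```cpp
#include <algorithm>
#include <vector>

// Greedy rebalancing sketch: repeatedly shift half of the imbalance from
// the most loaded to the least loaded node until the spread falls inside
// a tolerance or the move budget is exhausted. Real HLA schemes must also
// account for migration cost and interaction dependencies.
void rebalance(std::vector<double>& load, double tolerance, int maxMoves) {
    for (int i = 0; i < maxMoves; ++i) {
        auto mm = std::minmax_element(load.begin(), load.end());
        double spread = *mm.second - *mm.first;
        if (spread <= tolerance) break;
        double unit = spread / 2.0;   // move half of the current imbalance
        *mm.second -= unit;
        *mm.first  += unit;
    }
}
```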
APA, Harvard, Vancouver, ISO, and other styles
34

Edirisinghe, Pathirannehelage Neranjan S. "Charge Transfer in Deoxyribonucleic Acid (DNA): Static Disorder, Dynamic Fluctuations and Complex Kinetic." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/phy_astr_diss/45.

Full text
Abstract:
Loosely bonded DNA bases can tolerate large structural fluctuations and thereby form a dissipative environment for a charge traveling through the DNA. The nonlinear, stochastic nature of these structural fluctuations gives rise to rich charge dynamics in DNA. We study the complex charge dynamics by solving a nonlinear, stochastic, coupled system of differential equations. Charge transfer between donor and acceptor in DNA occurs via different mechanisms depending on the distance between them: it changes from a tunneling regime to a polaron-assisted hopping regime as the donor-acceptor separation grows. We also found that charge transport strongly depends on the feasibility of polaron formation; hence it has a complex dependence on temperature and charge-vibration coupling strength. Mismatched base pairs, such as different conformations of the G·A mispair, cause only minor structural changes in the host DNA molecule, thereby making mispair recognition an arduous task. Electron transport in DNA, which depends strongly on the hopping transfer integrals between the nearest base pairs, which in turn are affected by the presence of a mispair, might be an attractive approach in this regard. I report here on our investigations, via the I-V characteristics, of the effect of a mispair on the electrical properties of homogeneous and generic DNA molecules. The I-V characteristics of DNA were studied numerically within the double-stranded tight-binding model. The parameters of the tight-binding model, such as the transfer integrals and on-site energies, are determined from first-principles calculations. The changes in electrical current through the DNA chain due to the presence of a mispair depend on the conformation of the G·A mispair and are appreciable for DNA consisting of up to 90 base pairs. For homogeneous DNA sequences the current through DNA is suppressed, and the strongest suppression is realized for the G(anti)·A(syn) conformation of the G·A mispair. For inhomogeneous (generic) DNA molecules, the mispair can result in either suppression or enhancement of the current, depending on the type of mispair and the actual DNA sequence.
APA, Harvard, Vancouver, ISO, and other styles
35

Bahcecioglu, Tunc. "Parallel Solution Of Soil-structure Interaction Problems On Pc Clusters." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12612954/index.pdf.

Full text
Abstract:
Numerical assessment of soil structure interaction problems require heavy computational efforts because of the dynamic and iterative (nonlinear) nature of the problems. Furthermore, modeling soil-structure interaction may require
APA, Harvard, Vancouver, ISO, and other styles
36

Subbiah, Arun. "Design and evaluation of a distributed diagnosis algorithm for arbitrary network topologies in dynamic fault environments." Thesis, Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/13273.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Bokhari, Saniyah S. "Parallel Solution of the Subset-sum Problem: An Empirical Study." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1305898281.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Green, Oded. "High performance computing for irregular algorithms and applications with an emphasis on big data analytics." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51860.

Full text
Abstract:
Irregular algorithms such as graph algorithms, sorting, and sparse matrix multiplication present numerous programming challenges, including scalability, load balancing, and efficient memory utilization. In this age of Big Data we face additional challenges, since the data is often streaming at a high velocity and we wish to make near real-time decisions about real-world events. For instance, we may wish to track Twitter for the pandemic spread of a virus. Analyzing such data sets requires combining algorithmic optimizations with the utilization of massively multithreaded architectures, accelerators such as GPUs, and distributed systems. My research focuses upon designing new analytics and algorithms for the continuous monitoring of dynamic social networks. Achieving high performance computing for irregular algorithms such as Social Network Analysis (SNA) is challenging, as the instruction flow is highly data dependent and requires domain expertise. The rapid changes in the underlying network necessitate understanding real-world graph properties such as the small world property, shrinking network diameter, power law distribution of edges, and the rate at which updates occur. These properties, with respect to a given analytic, can help design load-balancing techniques, avoid wasteful (redundant) computations, and create streaming algorithms. In the course of my research I have considered several parallel programming paradigms for a wide range of multithreaded platforms: x86, NVIDIA's CUDA, Cray XMT2, SSE-SIMD, and Plurality's HyperCore. These unique programming models require examination of parallel programming at multiple levels: algorithmic design, cache efficiency, fine-grain parallelism, memory bandwidth, data management, load balancing, scheduling, control flow models, and more. This thesis deals with these issues and more.
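A recurring pattern behind such streaming analytics is to keep incremental state and adjust it per update instead of recomputing from scratch. The C++ sketch below applies that pattern to a deliberately trivial analytic (average degree) over a stream of edge insertions and deletions; it illustrates the idea only and is not any specific algorithm from the thesis.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Streaming-update sketch: incremental state (vertex adjacency and a
// global edge count) is adjusted per update, so the analytic is always
// current without a full recomputation. Real dynamic-graph analytics
// (e.g., betweenness centrality) need far richer incremental state.
struct StreamingGraph {
    std::unordered_map<uint64_t, std::unordered_set<uint64_t>> adj;
    uint64_t edges = 0;

    void insertEdge(uint64_t u, uint64_t v) {
        if (adj[u].insert(v).second) { adj[v].insert(u); ++edges; }
    }
    void deleteEdge(uint64_t u, uint64_t v) {
        if (adj.count(u) && adj[u].erase(v)) { adj[v].erase(u); --edges; }
    }
    double averageDegree() const {
        return adj.empty() ? 0.0 : 2.0 * static_cast<double>(edges) / adj.size();
    }
};
```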
APA, Harvard, Vancouver, ISO, and other styles
39

Bauer, Heiner. "Dynamic instruction set extension of microprocessors with embedded FPGAs." Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-222858.

Full text
Abstract:
Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors which can perform tasks more quickly and more energy efficiently. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application specific hardware, however, with the significant advantage of post-fabrication flexibility. Not only does this offer similar gains in performance but also the flexibility to configure each device individually. This thesis explored the benefit of a tightly coupled and fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) has been performed in the context of a commercial 22nm process technology. Other research projects either reused general purpose architectures or spent little effort to design and characterize custom fabrics, which are critical to system performance and the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. Results of this exploration revealed similar tradeoffs and trends described by previous studies. The number of lookup table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area delay product. However, results suggested a much larger region of efficient architectures than before. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, which was based on an industry-proven instruction set architecture, and its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing, over encryption, to computation of elementary functions. For these benchmarks, a significant increase in performance with speedups from 3 to 15 relative to the baseline microprocessor was achieved with the extended instruction set. Except for one case, application speedup clearly outweighed the area overhead for the extended system, even though the modeled fabric architecture was primitive and contained no explicit arithmetic enhancements. Insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and a concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimization was identified on multiple levels, and numerous directions for future research were described.
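Schematically, such a design space exploration reduces to sweeping architecture parameters and scoring each candidate, as in the C++ sketch below. The area and delay formulas here are made-up placeholders; the thesis derives its models from detailed circuit implementations in the target process, so only the loop structure carries over.

```cpp
#include <cstdio>
#include <limits>

// Exploration-loop sketch: sweep LUT size K and cluster size N, score each
// candidate by its area-delay product (ADP), and keep the best. The area
// and delay models below are placeholders, not the thesis's cost model.
int main() {
    double bestAdp = std::numeric_limits<double>::max();
    int bestK = 0, bestN = 0;
    for (int K = 3; K <= 7; ++K) {          // LUT input count
        for (int N = 2; N <= 10; ++N) {     // logic elements per cluster
            double area  = N * (1 << K) + 40.0 * N;  // placeholder area model
            double delay = 1.0 / K + 0.05 * N;       // placeholder delay model
            double adp = area * delay;
            if (adp < bestAdp) { bestAdp = adp; bestK = K; bestN = N; }
        }
    }
    std::printf("best candidate: %d-LUT, %d elements/cluster (ADP %.1f)\n",
                bestK, bestN, bestAdp);
    return 0;
}
```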
APA, Harvard, Vancouver, ISO, and other styles
40

Hardyniec, Andrew B. "An Investigation of the Behavior of Structural Systems with Modeling Uncertainties." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/56635.

Full text
Abstract:
Recent advancements in earthquake engineering have caused a movement toward a probabilistic quantification of the behavior of structural systems. Analysis characteristics, such as ground motion records, material properties, and structural component behavior are defined by probabilistic distributions. The response is also characterized probabilistically, with distributions fitted to analysis results at intensity levels ranging from the maximum considered earthquake ground motion to collapse. Despite the progress toward a probabilistic framework, the variability in structural analysis results due to modeling techniques has not been considered. This work investigates the uncertainty associated with modeling geometric nonlinearities and Rayleigh damping models on the response of planar frames at multiple ground motion intensity levels. First, an investigation is presented on geometric nonlinearity approaches for planar frames, followed by a critical review of current damping models. Three frames, a four-story buckling restrained braced frame, a four-story steel moment resisting frame, and an eight-story steel moment resisting frame, are compared using two geometric nonlinearity approaches and five Rayleigh damping models. Static pushover analyses are performed on the models in the geometric nonlinearities study, and incremental dynamic analyses are performed on all models to compare the response at the design based earthquake ground motion (DBE), maximum considered earthquake ground motion (MCE), and collapse intensity levels. The results indicate noticeable differences in the responses at the DBE and MCE levels and significant differences in the responses at the collapse level. Analysis of the sidesway collapse mechanisms indicates a shift in the behavior corresponding to the different modeling assumptions, though the effects were specific to each frame. The FEMA P-695 Methodology provided a framework that defined the static and dynamic analyses performed during the modeling uncertainties studies. However, the Methodology is complex and the analyses are computationally expensive. To expedite the analyses and manage the results, a toolkit was created that streamlines the process using a set of interconnected modules. The toolkit provides a program that organizes data and reduces mistakes for those familiar with the process while providing an educational tool for novices of the Methodology by stepping new users through the intricacies of the process. The collapse margin ratio (CMR), calculated in the Methodology, was used to compare the collapse behavior of the models in the modeling uncertainties study. Though it provides a simple scalar quantity for comparison, calculation of the CMR typically requires determination of the full set of incremental dynamic analysis curves, which require prohibitively large analysis time for complex models. To reduce the computational cost of calculating the CMR, a new parallel computing method, referred to as the fragility search method, was devised that uses approximate collapse fragility curves to quickly converge on the median collapse intensity value. The new method is shown to have favorable attributes compared to other parallel computing methods for determining the CMR.
Ph. D.
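The core of searching for the median collapse intensity can be pictured as a one-dimensional root search on the collapse fraction, as in the C++ sketch below. The `collapseFraction` callback is a hypothetical stand-in for a batch of nonlinear dynamic analyses and is assumed monotonic in intensity; the fragility search method described above converges in fewer batches by fitting approximate fragility curves and running the batches in parallel.

```cpp
#include <functional>

// Median-collapse-intensity search sketch: bisect on ground motion
// intensity until half of the record set causes collapse. Each call to
// collapseFraction stands in for a batch of dynamic analyses; it is
// assumed to increase monotonically with intensity.
double medianCollapseIntensity(
        const std::function<double(double)>& collapseFraction,
        double lo, double hi, double tol) {
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        if (collapseFraction(mid) < 0.5) lo = mid;  // below median: raise intensity
        else                             hi = mid;  // at/past median: lower it
    }
    return 0.5 * (lo + hi);
}
```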
APA, Harvard, Vancouver, ISO, and other styles
41

Sreeram, Rohan. "Improved Framework for Fast and Efficient Memory-based Frame Data Reconfiguration for Multi-row Spanning Designs on Field Programmable Gate Arrays." DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/682.

Full text
Abstract:
Reconfigurable computing is an evolving paradigm in computer architecture where the ability to load different designs onto a field-programmable gate array (FPGA) at execution time has proven useful in adapting FPGA prototypes to a wide range of applications. Reconfiguration techniques can be primarily categorized as Partial Dynamic Reconfiguration (PDR) and Partial Bitstream Relocation (PBR). PDR involves reconfiguring a single Partial Reconfiguration Region (PRR) with a partial bitstream, while PBR is targeted at reconfiguring multiple PRRs on the FPGA with a partial bitstream. Previous techniques have primarily focused on using either slower off-chip memory or on-chip memory-based solutions to store the partial bitstream and then reconfigure a PRR on the FPGA. Another technique, called the Accelerated Relocation Circuit (ARC), provides a more efficient method where a PRR (active bitstream) is relocated to other PRRs on the fly using minimal on-chip memory. This thesis proposes a novel technique for Memory-based Frame Data Reconfiguration (M-FDR) of multi-row PRRs. The ARC hardware was re-architected to provide an improved frame data reconfiguration framework, called the Accelerated Memory-based Reconfiguration Circuit (AMRC), for use in M-FDR scenarios. A performance prediction model is also proposed that confirms the speedup achieved by AMRC in comparison to ARC and earlier methods. This technique was found to be 26.6% faster than ARC in PRR-PRR relocation. In comparison to other relocation techniques, such as the Bit Relocation Filter (BiRF), AMRC provides a speedup of 231x. The AMRC method was also able to dynamically parallelize multi-row designs with an average context switching time of 0.37 ms.
APA, Harvard, Vancouver, ISO, and other styles
42

Quadri, Imran Rafiq. "MARTE based model driven design methodology for targeting dynamically reconfigurable FPGA based SoCs." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2010. http://tel.archives-ouvertes.fr/tel-00486483.

Full text
Abstract:
The work presented in this thesis falls within the context of Systems-on-Chip (SoC) and the design of real-time embedded systems, in particular those dedicated to dynamic reconfiguration. In this work, we present a new design flow based on Model-Driven Engineering (MDE) and the MARTE profile for SoC co-design, and for the specification and implementation of these reconfigurable SoCs, in order to raise abstraction levels and reduce system complexity. The first contribution of this thesis is the identification of the parts of dynamically reconfigurable SoCs that can be modeled at a high abstraction level. The thesis adopts an application-driven approach and targets high-level application models to be treated as the dynamic regions of reconfigurable SoCs. We also propose generic control models for managing these regions during real-time execution. Although these semantics can be introduced at different abstraction levels of a SoC co-design environment, we place particular emphasis on their integration at the deployment level, which links intellectual property with the elements modeled at the high design level. Furthermore, these concepts have been integrated into the MARTE metamodel and the corresponding profile in order to provide an adequate extension for expressing reconfiguration features in high-level modeling. The second contribution is the proposal of an intermediate metamodel, which isolates the concepts present at the Register Transfer Level (RTL). This metamodel integrates the concepts responsible for the hardware execution of the modeled applications, while enriching the control semantics, resulting in the creation of a dynamically reconfigurable hardware accelerator with several available implementations. Finally, using MDE model transformations and the corresponding principles, we are able to generate HDL code equivalent to the different implementations of the reconfigurable accelerator, as well as the C/C++ source code of the reconfiguration controller, which is ultimately responsible for switching between the different implementations. Our design flow was successfully verified in a case study related to an anti-radar collision detection system. A key component of this system was modeled using the extended MARTE specifications, and the generated code was used in the design and implementation of a SoC on a dynamically reconfigurable FPGA.
APA, Harvard, Vancouver, ISO, and other styles
43

Sinha, Udayan Prabir. "Memory Management Error Detection in Parallel Software using a Simulated Hardware Platform." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219606.

Full text
Abstract:
Memory management errors in concurrent software running on multi-core architectures can be difficult and costly to detect and repair. Examples of errors are usage of uninitialized memory, memory leaks, and data corruption due to unintended overwrites of data not owned by the writing entity. If memory management errors could be detected at an early stage, for example when using a simulator before the software has been delivered and integrated in a product, significant savings could be achieved. This thesis investigates and develops methods for detecting usage of uninitialized memory in software that runs on a virtual hardware platform. The virtual hardware platform has models of Ericsson Radio Base Station hardware for baseband processing and digital radio processing. It is a bit-accurate representation of the underlying hardware, with models of processors and peripheral units, and it is used at Ericsson for software development and integration. There are tools available, such as Memcheck (Valgrind), and MemorySanitizer and AddressSanitizer (Clang), for memory management error detection. The features of such tools have been investigated, and memory management error detection algorithms were developed for a given processor's instruction set. The error detection algorithms were implemented in a virtual platform, and issues and design considerations reflecting the application-specific instruction set architecture of the processor were taken into account. A prototype implementation of memory error presentation, with error locations mapped to the source code of the running program and presentation of stack traces, was done using functionality from a debugger. An experiment, using a purpose-built test program, was used to evaluate the error detection capability of the algorithms in the virtual platform, and for comparison with the error detection capability of Memcheck. The virtual platform implementation detects all known errors in the program except one, and reports them to the user in an appropriate manner. There are false positives reported, mainly due to limited awareness of the operating system used on the simulated processor.
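The basic bookkeeping behind uninitialized-memory detection can be sketched as shadow state that is updated on every store and checked on every load, as below in C++. This is a simplified illustration of the general technique used by tools like Memcheck, not the thesis's processor-specific implementation; it tracks one boolean per byte rather than bit-level validity, and the reporting is a bare printf.

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Shadow-memory sketch: one shadow flag per tracked byte records whether
// that byte has ever been written. Stores set the flag; loads check it
// and flag reads of bytes that were never written. A simulator would call
// these hooks from its memory-access instruction handlers.
class ShadowMemory {
    std::unordered_map<uint64_t, bool> initialized; // address -> written?
public:
    void onStore(uint64_t addr, size_t n) {
        for (size_t i = 0; i < n; ++i) initialized[addr + i] = true;
    }
    void onLoad(uint64_t addr, size_t n) {
        for (size_t i = 0; i < n; ++i)
            if (!initialized[addr + i]) // default-inserted as false
                std::printf("uninitialized read at 0x%llx\n",
                            static_cast<unsigned long long>(addr + i));
    }
};
```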
APA, Harvard, Vancouver, ISO, and other styles
44

Sanguinet, William Charles. "Various extensions in the theory of dynamic materials with a specific focus on the checkerboard geometry." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-dissertations/243.

Full text
Abstract:
This work is a numerical and analytical study of wave motion through dynamic materials (DM). This work focuses on showing several results that greatly extend the applicability of the checkerboard focusing effect. First, it is shown that it is possible to simultaneously focus dilatation and shear waves propagating through a linear elastic checkerboard structure. Next, it is shown that the focusing effect found for the original "perfect" checkerboard extends to the case of the checkerboard with smooth transitions between materials; this is termed a functionally graded (FG) checkerboard. With the additional assumption of a linear transition region, it is shown that there is a region of existence for limit cycles that takes the shape of a parallelogram in (m,n)-space. Similar to the perfect case, this is termed a "plateau" region. This shows that the robustness of the characteristic focusing effect is preserved even when the interfaces between materials are relaxed. Lastly, by using finite volume methods with limiting and adaptive mesh refinement, it is shown that energy accumulation is present for the functionally graded checkerboard as well as for the checkerboard with non-matching wave impedances. The main contribution of this work was to show that the characteristic focusing effect is highly robust and exists even under much more general assumptions than originally made. Furthermore, it provides a tool to assist future material engineers in constructing such structures. To this effect, exact bounds are given regarding how much the original perfect checkerboard structure can be spoiled before losing the expected characteristic focusing behavior.
APA, Harvard, Vancouver, ISO, and other styles
45

Silva, Hamilton Soares da. "Estudo para otimização do algoritmo Non-local means visando aplicações em tempo real." Universidade Federal da Paraíba, 2014. http://tede.biblioteca.ufpb.br:8080/handle/tede/5383.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The aim of this work is to study the Non-Local Means (NLM) algorithm and propose techniques to optimize and implement it for real-time applications. Two implementation alternatives are suggested. The first is the development of an accelerator card for computers with a PCI bus, containing specialized hardware that implements the NLM filter. The second uses the densely multiprocessed GPU environment found on video cards. Both proposals significantly accelerate the NLM algorithm while maintaining the same visual quality as traditional software implementations, enabling real-time use. Image denoising is an important area of digital image processing. Its use is becoming more widespread due to improvements in new acquisition equipment and the resulting increase in image resolution, which favors the occurrence of such perturbations. It is widely studied in the fields of image processing, computer vision, and predictive maintenance of electrical substations, motors, tires, building facilities, pipes, and fittings, with a focus on reducing noise without removing details of the original image. Several approaches have been proposed for noise filtering. One of them is the non-local method called Non-Local Means (NLM), which uses the entire image rather than only local information and stands out as the state of the art. However, this method has a high computational complexity, which makes it nearly impossible to apply in real time, even for small images.
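To make that computational cost concrete, the C++ sketch below implements the NLM core for a grayscale image in its direct, unoptimized form; the parameter names and the clamped border handling are choices made here, not taken from the thesis. The four nested loops per pixel are precisely what the proposed PCI accelerator card and GPU implementation are designed to absorb.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Direct Non-Local Means for a grayscale image: every pixel becomes a
// weighted average over a search window, with weights driven by patch
// similarity (not spatial distance). Complexity per pixel is
// O(search^2 * patch^2), which is why real-time use needs acceleration.
std::vector<float> nlmFilter(const std::vector<float>& img, int w, int h,
                             int search, int patch, float hParam) {
    std::vector<float> out(img.size());
    auto at = [&](int x, int y) {            // clamped pixel access
        x = std::max(0, std::min(w - 1, x));
        y = std::max(0, std::min(h - 1, y));
        return img[y * w + x];
    };
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.f, wsum = 0.f;
            for (int dy = -search; dy <= search; ++dy)
                for (int dx = -search; dx <= search; ++dx) {
                    float d2 = 0.f;          // squared patch distance
                    for (int py = -patch; py <= patch; ++py)
                        for (int px = -patch; px <= patch; ++px) {
                            float diff = at(x + px, y + py)
                                       - at(x + dx + px, y + dy + py);
                            d2 += diff * diff;
                        }
                    float wgt = std::exp(-d2 / (hParam * hParam));
                    sum  += wgt * at(x + dx, y + dy);
                    wsum += wgt;             // dx=dy=0 guarantees wsum > 0
                }
            out[y * w + x] = sum / wsum;
        }
    return out;
}
```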
APA, Harvard, Vancouver, ISO, and other styles
46

Fanfakh, Ahmed Badri Muslim. "Energy consumption optimization of parallel applications with Iterations using CPU frequency scaling." Thesis, Besançon, 2016. http://www.theses.fr/2016BESA2021/document.

Full text
Abstract:
In recent years, green computing has become an important topic in the supercomputing research domain. However, computing platforms are still consuming more and more energy due to the increase in the number of nodes composing them. To minimize the operating costs of these platforms many techniques have been used. Dynamic voltage and frequency scaling (DVFS) is one of them. It can be used to reduce the power consumption of the CPU while computing, by lowering its frequency. However, lowering the frequency of a CPU may increase the execution time of the application running on that processor. Therefore, the frequency that gives the best trade-off between the energy consumption and the performance of an application must be selected. This thesis presents the algorithms developed to optimize the energy consumption and the performance of synchronous and asynchronous message passing applications with iterations running over clusters or grids. The energy consumption and performance models for each type of parallel application predict its execution time and energy consumption for any selected frequency according to the characteristics of both the application and the architecture executing it. The contribution of this thesis can be divided into three parts. Firstly, optimizing the trade-off between the energy consumption and the performance of message passing applications with synchronous iterations running over homogeneous clusters. Secondly, adapting the energy and performance models to heterogeneous platforms, where each node can have different specifications such as computing power, energy consumption, available frequency gears, or network latency and bandwidth. The frequency scaling algorithm was also modified to suit the heterogeneity of the platform. Thirdly, the models and the frequency scaling algorithm were completely rethought to take into consideration the asynchronism in the communication and computation. All these models and algorithms were applied to message passing applications with iterations and evaluated over either the SimGrid simulator or the Grid'5000 platform. The experiments showed that the proposed algorithms are efficient and outperform existing methods such as the energy and delay product. They also introduce a small runtime overhead and work online without any training or profiling.
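One simplified way to picture such a frequency-selection step appears in the C++ sketch below: runtime is split into a frequency-dependent compute part and a frequency-independent communication part, dynamic power is assumed to scale cubically with frequency, and the gear maximizing normalized performance minus normalized energy is chosen. All numbers and the exact objective are illustrative assumptions, not the thesis's calibrated models.

```cpp
#include <cstdio>
#include <vector>

// Frequency-selection sketch: score each available gear by normalized
// performance minus normalized energy (both relative to the maximum
// frequency) and pick the best. Profile numbers below are assumed.
int main() {
    const std::vector<double> freqs = {1.2, 1.6, 2.0, 2.4, 2.66}; // GHz gears (example)
    const double tComp = 10.0, tComm = 4.0;  // seconds at fmax (assumed profile)
    const double fmax = freqs.back(), pmax = 100.0; // watts at fmax (assumed)
    const double t0 = tComp + tComm, e0 = pmax * t0; // reference point at fmax
    double best = -1e9, bestF = fmax;
    for (double f : freqs) {
        double r = f / fmax;
        double time   = tComp / r + tComm;        // compute slows, comm does not
        double energy = pmax * r * r * r * time;  // dynamic power ~ f^3
        double score  = t0 / time - energy / e0;  // perf minus energy, normalized
        if (score > best) { best = score; bestF = f; }
    }
    std::printf("selected gear: %.2f GHz\n", bestF);
    return 0;
}
```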
APA, Harvard, Vancouver, ISO, and other styles
47

Astolfi, Vitor Fiorotto. "ChipCflow - em hardware dinamicamente reconfigurável." Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05032010-203142/.

Full text
Abstract:
In recent years, reconfigurable computing has become increasingly more advanced, especially in hardware that uses Field-Programmable Gate Arrays. However, this increase in capacity and performance has widened the gap between design capability and the technology available for developing the design. Imperative high-level programming languages such as C are more appropriate for the development of complex algorithms than hardware description languages (HDL). For this reason, many ANSI C-like programming tools for the development of hardware came into existence. The ChipCflow project, of which this project is part, is one of these tools. The execution of algorithms through this tool will be completely directed by data flow, according to the dynamic model found in dataflow computer architectures, taking advantage of its naturally high levels of parallelism and the characteristics of partially reconfigurable hardware. In this project, the objective is a proof of concept for the creation of instances, in the form of operators, of a ChipCflow algorithm on partially reconfigurable hardware, based on the Xilinx Virtex platform.
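The firing rule of the dynamic dataflow model can be sketched in software as tagged-token matching: a two-input operator fires only when both operands carrying the same tag (for example, the same loop iteration) have arrived. The C++ sketch below illustrates that rule with a hypothetical add operator; in ChipCflow the operators are hardware blocks instantiated on the FPGA, not structs.

```cpp
#include <cstdio>
#include <map>

// Tagged-token dataflow sketch: operands wait in per-tag slots until the
// matching partner arrives, at which point the operator fires and the
// token pair is consumed. This is the matching rule of dynamic dataflow.
struct AddOperator {
    std::map<int, double> left, right;   // tag -> waiting operand

    // Deliver one operand; fires (returns true) when its partner is present.
    bool deliver(int tag, double value, bool isLeft, double& result) {
        auto& mine   = isLeft ? left : right;
        auto& theirs = isLeft ? right : left;
        auto it = theirs.find(tag);
        if (it == theirs.end()) { mine[tag] = value; return false; }
        result = value + it->second;
        theirs.erase(it);                // token pair consumed
        return true;
    }
};

int main() {
    AddOperator add;
    double r;
    add.deliver(0, 2.0, true, r);          // left token, tag 0: waits
    if (add.deliver(0, 3.0, false, r))     // right token, tag 0: fires
        std::printf("tag 0 -> %.1f\n", r); // prints 5.0
    return 0;
}
```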
APA, Harvard, Vancouver, ISO, and other styles
48

Sun, Yi. "High Performance Simulation of DEVS Based Large Scale Cellular Space Models." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/cs_diss/40.

Full text
Abstract:
Cellular space modeling is becoming an increasingly important modeling paradigm for modeling complex systems with spatial-temporal behaviors. The growing demand for cellular space models has directed researchers to use different modeling formalisms, among which Discrete Event System Specification (DEVS) is widely used due to its formal modeling and simulation framework. The increasing complexity of systems to be modeled calls for cellular space models with large numbers of cells for modeling the systems' spatial-temporal behavior. Improving simulation performance becomes crucial for simulating large scale cellular space models. In this dissertation, we proposed a framework for improving simulation performance for large scale DEVS-based cellular space models. The framework has a layered structure, which includes modeling, simulation, and network layers corresponding to the DEVS-based modeling and simulation architecture. Based on this framework, we developed methods at each layer to overcome performance issues for simulating large scale cellular space models. Specifically, to increase the runtime and memory efficiency for simulating large numbers of cells, we applied Dynamic Structure DEVS (DSDEVS) to cellular space modeling and carried out comprehensive performance measurement. DSDEVS improves simulation performance by making the simulation focus only on the active models, and is thus more efficient than when the entire cellular space is loaded. To reduce the number of simulation cycles caused by extensive message passing among cells, we developed a pre-schedule modeling approach that exploits the model behavior for improving simulation performance. At the network layer, we developed a modified time-warp algorithm that supports parallel simulation of DEVS-based cellular space models. The developed methods have been applied to large scale wildfire spread simulations based on the DEVS-FIRE simulation environment and have achieved significant performance results.
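The benefit of Dynamic Structure DEVS for cellular spaces comes from touching only active cells. The C++ sketch below shows that active-set pattern with a toy four-neighbor spread rule standing in for a cell's state transition; it is a schematic illustration of the idea, not the DEVS-FIRE implementation, and `burned`/`active` are assumed to be seeded with the ignition cells.

```cpp
#include <queue>
#include <set>
#include <utility>

// Active-set sketch of the DSDEVS idea: only cells with pending events are
// processed each step; quiescent cells cost nothing. A toy "spread" rule
// activates the four neighbors of each firing cell.
using Cell = std::pair<int, int>;

void simulate(std::set<Cell>& burned, std::queue<Cell>& active,
              int width, int height, int steps) {
    const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
    for (int s = 0; s < steps && !active.empty(); ++s) {
        size_t n = active.size();             // process one event wave
        for (size_t i = 0; i < n; ++i) {
            Cell c = active.front(); active.pop();
            for (int k = 0; k < 4; ++k) {
                Cell nb{c.first + dx[k], c.second + dy[k]};
                if (nb.first < 0 || nb.first >= width ||
                    nb.second < 0 || nb.second >= height) continue;
                if (burned.insert(nb).second) active.push(nb); // newly active
            }
        }
    }
}
```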
APA, Harvard, Vancouver, ISO, and other styles
49

FERNANDEZ, BARRERO DIEGO. "Dynamic Soil-Structure Interaction of Soil-Steel Composite Bridges : A Frequency Domain Approach Using PML Elements and Model Updating." Thesis, KTH, Bro- och stålbyggnad, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-256033.

Full text
Abstract:
This master thesis covers the dynamic soil-structure interaction of soil-steel culverts, applying a methodology based on the frequency domain response. In the first stage of this master thesis, field tests were performed on one bridge using controlled excitation. The methodology then builds on previous research, the field tests, finite element models (FEM), and perfectly matched layer (PML) elements. Firstly, a 2D model of the analysed bridge, Hårestorp, was made to compare the frequency response functions (FRF) with the ones obtained from the field tests. Simultaneously, a 3D model of the bridge was created for the following purposes: to compare it against the 2D model and the field tests, and to implement a model updating procedure with the particle swarm algorithm to calibrate the model parameters. Both models use PML elements, which are verified against previous solutions from the literature. The verification concludes that the PML elements behave correctly except for extreme parameter values. In the course of this master thesis, relatively advanced computation techniques were required to ensure the computational feasibility of the problem with the resources available. To that end, a literature review of theoretical aspects of parallel computing was performed, covering the practical aspects in Comsol as well. Then, in collaboration with Comsol Support and with the help given by PDC at KTH, it was possible to reduce the computational time for the model updating of the 3D model to a feasible level of around two weeks. The results are inconclusive in terms of finding a perfectly fitting model; therefore, further research is required to adequately address the problem. Nevertheless, some accelerometers show a considerable level of agreement. This thesis concludes by discarding the 2D models, due to their inability to represent reality correctly, and establishes a model optimisation methodology using Comsol in connection with Matlab.
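For reference, the particle swarm update named in the abstract has the standard form sketched below in C++, with the FRF-misfit objective abstracted away; the inertia and acceleration coefficients are common textbook values, not those used in the thesis.

```cpp
#include <random>
#include <vector>

// One particle-swarm update step: each particle's velocity blends inertia,
// attraction to its personal best, and attraction to the global best; the
// position then moves by the velocity. The objective (here, the FRF misfit
// between model and measurement) is evaluated by the caller afterwards.
struct Particle {
    std::vector<double> x, v, best; // position, velocity, personal best
};

void psoStep(std::vector<Particle>& swarm, const std::vector<double>& gbest,
             std::mt19937& rng) {
    const double w = 0.7, c1 = 1.5, c2 = 1.5; // textbook coefficients (assumed)
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (auto& p : swarm)
        for (size_t d = 0; d < p.x.size(); ++d) {
            p.v[d] = w * p.v[d]
                   + c1 * u(rng) * (p.best[d] - p.x[d])
                   + c2 * u(rng) * (gbest[d]  - p.x[d]);
            p.x[d] += p.v[d];
        }
    // Caller re-evaluates the misfit and refreshes personal/global bests here.
}
```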
APA, Harvard, Vancouver, ISO, and other styles
50

Rehme, Koy D. "An Internal Representation for Adaptive Online Parallelization." Diss., CLICK HERE for online access, 2009. http://contentdm.lib.byu.edu/ETD/image/etd2939.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles