
Dissertations / Theses on the topic 'Parallel Models'

Consult the top 50 dissertations / theses for your research on the topic 'Parallel Models.'

You can download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Ramazi, Pouria. "Variance Analysis of Parallel Hammerstein Models." Thesis, KTH, Reglerteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102169.

Abstract:
In this thesis we generalize some recent results on variance analysis of Hammerstein models. A variance formula for an arbitrary number of parallel blocks is derived. This expression shows that the variance in one block increases due to the estimation of parameters in other blocks, but levels off once the number of parameters in the other blocks reaches the number of parameters in that block. As a second contribution, the problem of how to design the input so that the identification process leads to a more accurate model is considered. In other words, we study how to choose the input signal so that the model error described above is minimized. The investigations show that the optimal input probability density function has a surprisingly simple form. In summary, some of the derived results can be used directly in practice, while others might be used for further research.
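For readers outside system identification, a parallel Hammerstein model is a sum of branches, each a static nonlinearity followed by a linear time-invariant block; in the standard textbook notation (not necessarily the thesis's own):

```latex
y(t) = \sum_{k=1}^{m} G_k(q)\, f_k\big(u(t)\big) + e(t)
```

where $f_k$ is the static nonlinearity of branch $k$, $G_k(q)$ its linear transfer operator, $u$ the input, and $e$ the noise. The variance analysis above asks how the covariance of the parameter estimates in one branch grows as parameters are added to the other branches.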
2

Machado, Rui Mário da Silva. "Massively parallel declarative computational models." Doctoral thesis, Universidade de Évora, 2013. http://hdl.handle.net/10174/12063.

Abstract:
Current computer architectures are parallel, with an increasing number of processors. Parallel programming is an error-prone task, and declarative models such as those based on constraints relieve the programmer from some of its difficult aspects because they abstract control away. In this work we study and develop techniques for declarative computational models based on constraints using GPI (a recent programming tool and model), aiming at large-scale parallel execution. The main contributions of this work are: a GPI implementation of a scalable dynamic load balancing scheme based on work stealing, suitable for tree-shaped computations and effective on systems with thousands of threads; a parallel constraint solver, MaCS, implemented to take advantage of the GPI programming model, whose experimental evaluation shows very good scalability on systems with hundreds of cores; and a GPI parallel version of the Adaptive Search algorithm, including different variants, whose study on different problems advances the understanding of scalability issues known to exist with large numbers of cores.
3

Farreras, Esclusa Montse. "Optimizing programming models for massively parallel computers." Doctoral thesis, Universitat Politècnica de Catalunya, 2008. http://hdl.handle.net/10803/31776.

Abstract:
Since the invention of the transistor, increasing the clock frequency was the primary method of improving computing performance. As the reach of Moore's law came to an end, however, technology-driven performance gains became increasingly harder to achieve, and the research community was forced to come up with innovative system architectures. Today increasing parallelism is the primary method of improving performance: single processors are being replaced by multiprocessor systems and multicore architectures. The challenge faced by computer architects is to increase performance while limited by cost and power consumption. The appearance of cheap and fast interconnection networks has promoted designs based on distributed memory computing. Most modern massively parallel computers, as reflected by the Top 500 list, are clusters of workstations using commodity processors connected by high-speed interconnects. Today's massively parallel systems consist of hundreds of thousands of processors. Software technology to program these large systems is still in its infancy. Optimizing communication has become key to overall system performance. To cope with the increasing burden of communication, the following methods have been explored: (i) scalability in the messaging system: the messaging system itself needs to scale up to the 100K processor range; (ii) scalable algorithms reducing communication: as the machine grows in size the amount of communication also increases, and the resulting overhead negatively impacts performance; new programming models and algorithms allow programmers to better exploit locality and reduce communication; (iii) speeding up communication: reducing and hiding communication latency, and improving bandwidth. Following the three items described above, this thesis contributes to the improvement of the communication system (i) by proposing a scalable memory management of the communication system that guarantees the correct reception of data and control-data, (ii) by proposing a language extension that allows programmers to better exploit data locality to reduce inter-node communication, and (iii) by presenting and evaluating a cache of remote addresses that aims to reduce control-data and exploit the native RDMA network capabilities, resulting in latency reduction and better overlap of communication and computation. Our contributions are analyzed in two different parallel programming models: Message Passing Interface (MPI) and Unified Parallel C (UPC). Many different programming models exist today, and the programmer usually needs to choose one or another depending on the problem and the machine architecture. MPI has been chosen because it is the de facto standard for parallel programming on distributed memory machines. UPC was considered because it constitutes a promising easy-to-use approach to parallelism. Since parallelism is everywhere, programmability is becoming important, and languages such as UPC are gaining attention as a potential future of high performance computing. Concerning the communication system, the languages chosen are relevant because, while MPI offers two-sided communication, UPC relies on a one-sided communication model. This difference potentially influences the communication system requirements of the language. These requirements, as well as our contributions, are analyzed and discussed for both programming models, and we state whether they apply to both.
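To make the two-sided versus one-sided distinction concrete, the sketch below contrasts the two styles using standard MPI calls; it is a minimal illustration of the communication models discussed, not code from the thesis, and UPC's accesses to shared arrays behave conceptually like the one-sided MPI_Put path.

```c
/* Compile: mpicc demo.c && mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, data = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Two-sided: both ranks participate explicitly in the transfer. */
    if (rank == 0) {
        data = 42;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* One-sided: rank 0 writes directly into rank 1's memory window;
       rank 1 issues no matching receive, only the collective fence. */
    int win_buf = 0;
    MPI_Win win;
    MPI_Win_create(&win_buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);
    if (rank == 0) {
        int val = 99;
        MPI_Put(&val, 1, MPI_INT, /*target rank*/ 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);
    if (rank == 1)
        printf("two-sided got %d, one-sided window holds %d\n", data, win_buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```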
4

Knottenbelt, William John. "Parallel performance analysis of large Markov models." Thesis, Imperial College London, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.394536.

5

Alt, Aaron J. "Profile Driven Partitioning Of Parallel Simulation Models." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1407406955.

6

Ravela, Srikar Chowdary. "Comparison of Shared memory based parallel programming models." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3384.

Abstract:
Parallel programming models are a challenging and emerging topic in the parallel computing era. These models allow a developer to port a sequential application onto a platform with a larger number of processors so that the problem or application can be solved more easily. How applications are adapted using parallel programming models is often influenced by the type of the application, the type of the platform, and many other factors. Several parallel programming models have been developed, and the two main variants are shared memory based and distributed memory based parallel programming models. The recognition of computing applications that entail immense computing requirements has brought to the fore the challenge of developing efficient programming models that bridge the gap between the hardware's ability to perform computations and the software's ability to support that performance for those applications [25][9]. A better programming model is therefore needed that facilitates easy development on the one hand and high performance on the other. To answer this challenge, this thesis compares four different shared memory based parallel programming models with respect to the development time of an application against the performance attained by that application under the same programming model. The programming models are evaluated by considering data parallel applications and verifying their ability to support data parallelism with respect to the development time of those applications. The data parallel applications are borrowed from the Dense Matrix dwarfs; the dwarfs used are matrix-matrix multiplication, Jacobi iteration, and Laplace heat distribution. The experimental method consists of selecting three data parallel benchmarks and developing them under the four shared memory based parallel programming models considered for the evaluation. The performance of those applications under each programming model is recorded, and the results are used to analytically compare the parallel programming models. Results of the study show that, by sacrificing development time, better performance is achieved for the chosen data parallel applications developed in Pthreads. On the other hand, by sacrificing a little performance, data parallel applications are extremely easy to develop in task-based parallel programming models. The directive models are moderate from both perspectives and are rated in between the tasking models and the threading models.
From this study it is clear that the threading model, Pthreads, is the dominant programming model in supporting high speedups for two of the three dwarfs, while the tasking models are dominant in development time and in reducing the number of errors, supporting high growth in speedup for applications without communication and less growth in self-relative speedup for applications involving communication. The performance degradation of the tasking models for communication-bound problems arises because task-based models are designed to execute tasks in parallel without interruptions or preemptions during their computations; introducing communication violates this assumption and thereby results in lower performance. The directive model, OpenMP, is moderate in both aspects and stands in between these models. In general, the directive and tasking models offer better speedup than the other models for task-based problems built on the divide-and-conquer strategy. For data parallelism, however, the speedup growth achieved is low (i.e., they are less scalable for data parallel applications), although they remain comparable in execution time to the threading models. Their development times for data parallel applications are also considerably low, because of the ease of development these models support by requiring fewer functional routines to parallelize the applications. This thesis is concerned with the comparison of shared memory based parallel programming models in terms of speedup. This type of work acts as a guide that programmers can consult during the development of applications under shared memory based parallel programming models, as illustrated by the sketch below. We suggest that this work can be extended in two different ways: one from the developer's perspective and the other as a cross-referential study of the parallel programming models. The former can be done by having a different programmer carry out a similar study and comparing the two; the latter can be done by including multiple data points in the same programming model or by using a different set of parallel programming models.
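As a concrete illustration of the development-time gap the study reports, the fragment below parallelizes the same dot product twice: once with a single OpenMP directive and once with explicit Pthreads chunking. It is an illustrative sketch, not one of the thesis benchmarks.

```c
/* Compile: gcc -O2 -fopenmp -pthread dot.c */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4
static double a[N], b[N];

/* OpenMP: one directive handles chunking and the reduction. */
double dot_openmp(void) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Pthreads: thread creation, chunking, and the combination of
   partial sums must all be spelled out by hand. */
struct chunk { int lo, hi; double partial; };

static void *worker(void *arg) {
    struct chunk *c = arg;
    c->partial = 0.0;
    for (int i = c->lo; i < c->hi; i++)
        c->partial += a[i] * b[i];
    return NULL;
}

double dot_pthreads(void) {
    pthread_t tid[NTHREADS];
    struct chunk c[NTHREADS];
    double sum = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        c[t].lo = t * (N / NTHREADS);
        c[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, worker, &c[t]);
    }
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        sum += c[t].partial;
    }
    return sum;
}

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 0.5; }
    printf("%f %f\n", dot_openmp(), dot_pthreads());
    return 0;
}
```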
7

Ciesko, Jan. "On algorithmic reductions in task-parallel programming models." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/457566.

Abstract:
Wide adoption of parallel processing hardware in mainstream computing, as well as the interest in efficient parallel programming in developer communities, increases the demand for programming models that offer support for common algorithmic patterns. An algorithmic pattern of particular interest is reductions. Reductions are iterative memory updates of a program variable and appear in many applications. While their definition is simple, the variety of their implementations, including the use of different loop constructs and calling patterns, makes their support in parallel programming models difficult. Further, their characteristic update operation over arbitrary data types, which requires atomicity, makes their execution computationally expensive and scalable execution challenging. These challenges and their relevance make reductions a benchmark for compilers, runtime systems, and hardware architectures today. This work advances research on algorithmic reductions. It improves their programmability by adding support for task-parallel and array-type reductions. Task-parallel reductions occur in while-loops and recursive algorithms. While an iterative formulation exists for each recursive algorithm, while-loop programs represent a superclass of for-loop-computable programs and therefore cannot be transformed or substituted. This limitation requires explicit support for reduction algorithms that fall within this class. Since tasks are suited for a concurrent formulation of these algorithms, the presented work focuses on language extensions to the task construct in OmpSs and OpenMP. In the first section of this work we present generic support for task-parallel reductions in OmpSs and OpenMP and introduce the ideas of reduction scope, reduction domains, and static and on-demand memory allocation. With this foundation and the feedback received from the OpenMP language review board, we develop a formalized proposal to add support for task-parallel reductions in OpenMP. This engagement led to a fruitful outcome, as our proposal has recently been accepted into OpenMP. As a first step towards support of array-type reductions in a task-parallel programming model, we present a landscape of support techniques and group them by their underlying strategy. Techniques follow either the strategy of direct access (atomics), redirection, or iteration ordering. We call techniques that implement redirection into thread-private data containers techniques with alternative memory layouts (AMLs), and techniques that are based on iteration ordering techniques with an alternative iteration space (AIS). Universal support of AML-based techniques in parallel programming models can be achieved by defining the basic interface methods allocate, get, and reduce. As examples of new techniques that implement this interface, we present CachedPrivate and PIBOR. CachedPrivate implements a software cache to reduce communication caused by irregular accesses to remote nodes on distributed memory systems. PIBOR implements Privatization with In-lined Block-ordering, a technique that improves data locality by redirecting accesses into thread-local bins. Both techniques implement a get-method that returns private memory storage for each update operation of the reduction loop. As an example of a technique with an alternative iteration space (AIS), we present Commutative Reductions (ComRed).
This technique uses an inspector-executor execution model to generate knowledge about memory access patterns and memory overlaps between participating tasks. This information is used during the execution phase to schedule tasks with overlaps commutatively. We show that this execution model requires only a small set of additional language constructs. Performance results obtained throughout the different chapters of this work demonstrate that software techniques can improve application performance by a factor of 2-4.
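The task-reduction support described here entered the OpenMP specification with version 5.0. A minimal sketch of the accepted syntax (an illustrative while-loop example, not code from the thesis) is:

```c
#include <stddef.h>

struct node { int value; struct node *next; };

/* Sum a linked list with OpenMP 5.0 task reductions: a while-loop
   traversal that a worksharing for-loop cannot express directly. */
long list_sum(struct node *head) {
    long sum = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup task_reduction(+ : sum)
        {
            struct node *p = head;
            while (p) {
                #pragma omp task in_reduction(+ : sum) firstprivate(p)
                sum += p->value;   /* each task updates a private copy */
                p = p->next;
            }
        }
    }
    return sum;
}
```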
8

Crone, Gilia Cornelia. "Parallel Lagrangian models for turbulent transport and chemistry." [S.l.] : Utrecht : [s.n.] ; Universiteitsbibliotheek Utrecht [Host], 1997. http://www.ubu.ruu.nl/cgi-bin/grsn2url?01763357.

9

Holliman, Nicolas S. "Visualising solid models : an exercise in parallel programming." Thesis, University of Leeds, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.277611.

10

McLaughlin, Jared D. "Parallel Processing of Reactive Transport Models Using OpenMP." Diss., CLICK HERE for online access, 2008. http://contentdm.lib.byu.edu/ETD/image/etd2328.pdf.

11

Navarro, Guerrero Cristóbal Alejandro. "Parallel methods for classical and disordered Spin models." Tesis, Universidad de Chile, 2015. http://repositorio.uchile.cl/handle/2250/136491.

Abstract:
Doctor of Science, specialization in Computer Science
In recent decades, a growing body of work has sought efficient methods to describe the macroscopic behavior of spin systems starting from a microscopic definition. The results obtained from these systems serve not only the physics community but also other areas such as molecular dynamics, social networks, and optimization problems, among others. The fact that spin systems can explain phenomena in other areas has generated global interest in the topic. The problem, however, is that the computational cost of the methods involved becomes very high for practical purposes. It is therefore of great interest to study how parallel computing, combined with new algorithmic strategies, can improve the speed and efficiency of current methods. This thesis presents two contributions: (1) an exact distributed multi-core transfer-matrix algorithm, and (2) a multi-GPU Monte Carlo method for simulating the 3D Random Field Ising Model (RFIM). The first contribution takes advantage of the hierarchical relations found in the configuration space of the problem to group configurations into family trees that are solved in parallel. The second contribution extends the Exchange Monte Carlo method into a parallel multi-GPU algorithm that includes a temperature-adaptation phase to dynamically improve the quality of the simulation in the most difficult temperature zones. The results show that the new transfer-matrix algorithm reduces the configuration space from O(4^m) to O(3^m) and achieves an almost linear fixed-size speedup with approximately 90% efficiency when solving the largest problems. For the multi-GPU Monte Carlo method, two levels of parallelism are proposed: local, which scales with faster GPUs, and global, which scales with multiple GPUs. The method achieves a speedup of one to two orders of magnitude with respect to a reference CPU implementation, and its parallelism scales with approximately 99% efficiency. The adaptive temperature-distribution strategy increases the exchange rate in the zones that were most compromised without increasing it in the remaining zones, producing a simulation that is both faster and of higher quality than one using a uniform temperature distribution. These contributions have enabled new results for physics, such as the computation of the transfer matrix for the kagome lattice at m = 9 and the simulation of the 3D Random Field Ising Model at L = {32, 64}.
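For context, the Exchange (parallel tempering) Monte Carlo method parallelized here runs replicas at different inverse temperatures and periodically proposes swaps between neighboring replicas; the standard acceptance rule is

```latex
P_{\mathrm{swap}}(i \leftrightarrow j) \;=\; \min\!\left(1,\; e^{(\beta_i - \beta_j)(E_i - E_j)}\right)
```

where $\beta_i, \beta_j$ are the inverse temperatures and $E_i, E_j$ the current energies of the two replicas. The adaptive temperature distribution mentioned above concentrates the $\beta$ values where this acceptance rate would otherwise collapse.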
12

Hymel, Shawn. "Massively Parallel Hidden Markov Models for Wireless Applications." Thesis, Virginia Tech, 2011. http://hdl.handle.net/10919/36017.

Abstract:
Cognitive radio is a growing field in communications which allows a radio to automatically configure its transmission or reception properties in order to reduce interference, provide better quality of service, or allow for more users in a given spectrum. Such processes require several complex features that are currently being utilized in cognitive radio. Two such features, spectrum sensing and identification, have been implemented in numerous ways; however, they generally suffer from high computational complexity. Additionally, Hidden Markov Models (HMMs) are a mathematical modeling tool widely used in various fields of engineering and science. In electrical and computer engineering, they are used in several areas, including speech recognition, handwriting recognition, artificial intelligence, and queuing theory, and they are used to model fading in communication channels. The research presented in this thesis proposes a new approach to spectrum identification using a parallel implementation of Hidden Markov Models. Algorithms involving HMMs are usually implemented in the traditional serial manner, which leads to prohibitively long runtimes. In this work, we study their use in parallel implementations and compare our approach to traditional serial implementations. Timing and power measurements are taken and used to show that the parallel implementation can achieve well over 100× speedup in certain situations. To demonstrate the utility of this new parallel algorithm using graphics processing units (GPUs), a new method for signal identification is proposed for both serial and parallel implementations using HMMs. The method achieved high recognition at -10 dB Eb/N0. HMMs can benefit from parallel implementation in certain circumstances, specifically in models that have many states or when multiple models are used in conjunction.
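The forward-algorithm recursion at the heart of HMM evaluation shows where the parallelism lies: at each time step, every state's forward probability can be computed independently. A minimal CPU sketch of one step (using OpenMP here for brevity, rather than the thesis's GPU implementation) is:

```c
/* One step of the HMM forward recursion:
   alpha_next[j] = b_obs[j] * sum_i alpha_prev[i] * A[i*n + j].
   Each j is independent, so the loop maps one thread per state --
   the same structure a GPU version exploits at much larger scale. */
void forward_step(int n, const double *A, const double *b_obs,
                  const double *alpha_prev, double *alpha_next) {
    #pragma omp parallel for
    for (int j = 0; j < n; j++) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += alpha_prev[i] * A[i * n + j];
        alpha_next[j] = b_obs[j] * s;
    }
}
```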
Master of Science
13

Eller, Paul Ray. "Development and Acceleration of Parallel Chemical Transport Models." Thesis, Virginia Tech, 2009. http://hdl.handle.net/10919/34044.

Abstract:
Improving chemical transport models for atmospheric simulations relies on future developments of mathematical methods and parallelization methods. Better mathematical methods allow simulations to model realistic processes more accurately and/or to run in a shorter amount of time. Parallelization methods allow simulations to run in much shorter amounts of time, therefore allowing scientists to use more accurate or more detailed simulations (higher resolution grids, smaller time steps).

The state-of-the-science GEOS-Chem model is modified to use the Kinetic PreProcessor (KPP), giving users access to an array of highly efficient numerical integration methods and to a wide variety of user options. Perl parsers are developed to interface GEOS-Chem with KPP, in addition to modifications to KPP allowing KPP integrators to interface with GEOS-Chem. A variety of numerical integrators are tested on GEOS-Chem, demonstrating that KPP-provided chemical integrators produce more accurate solutions in a given amount of time than the original GEOS-Chem chemical integrator.

The STEM chemical transport model provides a large-scale end-to-end application to experiment with running chemical integration methods and transport methods on GPUs. GPUs provide high computational power at fairly low cost. The CUDA programming environment simplifies the GPU development process by providing access to powerful functions to execute parallel code. This work demonstrates the acceleration of a large-scale end-to-end application on GPUs, showing significant speedups. This is achieved by implementing all relevant kernels on the GPU using CUDA. Nevertheless, further improvements to GPUs are needed to allow these applications to fully exploit the power of GPUs.
Master of Science

14

Watson, Paul. "The parallel reduction of lambda calculus expressions." Thesis, University of Manchester, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.377690.

15

Peacock, Christopher. "Simultaneous engineering models for fault tolerant integrated circuits." Thesis, University of Hertfordshire, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.361260.

16

Heggarty, Jonathan W. "Parallel R-matrix computation." Thesis, Queen's University Belfast, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.287468.

17

Ali, Akhtar. "Comparative study of parallel programming models for multicore computing." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-94296.

Abstract:
Shared memory multi-core processor technology has seen a drastic development with faster and increasing numbers of processors per chip. This new architecture challenges computer programmers to write code that scales over these many cores to exploit the full computational power of these machines. Shared-memory parallel programming paradigms such as OpenMP and Intel Threading Building Blocks (TBB) are two recognized models that offer a higher level of abstraction, shield programmers from the low-level details of thread management, and scale computation over all available resources. At the same time, the need for high-performance, power-efficient computing is compelling developers to exploit GPGPU computing due to the GPU's massive computational power and comparatively faster multi-core growth. This trend leads to systems with heterogeneous architectures containing multicore CPUs and one or more programmable accelerators such as programmable GPUs. There exist different programming models to program these architectures, and code written for one architecture is often not portable to another. OpenCL is a relatively new industry-standard framework, defined by the Khronos group, which addresses the portability issue. It offers a portable interface to exploit the computational power of a heterogeneous set of processors such as CPUs, GPUs, DSP processors, and other accelerators. In this work, we evaluate the effectiveness of OpenCL for programming multi-core CPUs in a comparative case study with two CPU-specific stable frameworks, OpenMP and Intel TBB, for five benchmark applications, namely matrix multiply, LU decomposition, image convolution, Pi value approximation, and image histogram generation. The evaluation includes a performance comparison of the three frameworks and a study of the relative effects of applying compiler optimizations on performance numbers. OpenCL performance on two vendor-dependent platforms, Intel and AMD, is also evaluated. Then the same OpenCL code is ported to a modern GPU, and its code correctness and performance portability are investigated. Finally, the usability experience of coding using the three multi-core frameworks is presented.
18

Zabala, Eugenio. "Data presentation models and their application to parallel computing." Thesis, University of York, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358352.

19

Sotiropoulos, Pesiridis Konstantinos. "Parallel Simulation of SystemC Loosely-Timed Transaction Level Models." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-227806.

Abstract:
Parallelizing the development cycles of hardware and software is becoming the industry's norm for reducing the time to market of electronic devices. In the absence of hardware, software development is based on a virtual platform: a fully functional software model of a system under development, able to execute unmodified code. A Transaction Level Model, expressed in the SystemC TLM 2.0 language, is one of the many possible ways of constructing a virtual platform. Under SystemC's simulation engine, hardware and software are co-simulated. However, the sequential nature of the reference implementation of SystemC's simulation kernel is a limiting factor: poor simulation performance often constrains the scope and depth of the design decisions that can be evaluated. The main objective of this thesis is to demonstrate the feasibility of parallelizing the co-simulation of hardware and software using Transaction Level Models outside SystemC's reference simulation environment. The major obstacle identified is the preservation of causal relations between simulation events. The solution is obtained by using the process synchronization mechanism known as the Chandy/Misra/Bryant algorithm. To demonstrate our approach and evaluate under which conditions a speedup can be achieved, we use the model of a cache-coherent, symmetric multiprocessor executing a synthetic application. Two versions of the model are used for the comparison: a parallel version, based on the Message Passing Interface 3.0, which incorporates the synchronization algorithm, and an equivalent sequential model based on SystemC TLM 2.0. Our results indicate that, by adjusting the parameters of the synthetic application, a certain threshold is reached above which a significant speedup over the sequential SystemC simulation is observed. Although performed manually, the transformation of a SystemC TLM 2.0 model into a parallel MPI application is deemed feasible.
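The core of the Chandy/Misra/Bryant algorithm referred to above is compact: a logical process may only consume events up to the minimum timestamp promised on its input channels, and it periodically sends null messages (its local clock plus a lookahead) so that neighbours can advance. A schematic C fragment, with the event queue and transport layer stubbed out, might look like this:

```c
#include <float.h>
#include <stdio.h>

#define NIN 4                         /* input channels of this process */

static double channel_clock[NIN];     /* latest timestamp per channel   */
static double local_clock = 0.0;
static const double lookahead = 1.0;  /* minimum model delay, > 0       */

/* Trivial stubs standing in for the real event queue and transport. */
static double next_event_time(void)        { return DBL_MAX; }
static void   process_next_event(void)     { /* would advance local_clock */ }
static void   send_null_messages(double t) { printf("null @ %.2f\n", t); }

/* One scheduling round: only events up to the minimum timestamp
   guaranteed by every input channel are safe to execute, which
   preserves causality without global synchronization. */
static void cmb_round(void) {
    double safe = DBL_MAX;
    for (int c = 0; c < NIN; c++)
        if (channel_clock[c] < safe)
            safe = channel_clock[c];

    while (next_event_time() <= safe)
        process_next_event();

    /* Null message: a promise to send nothing with a timestamp earlier
       than local_clock + lookahead; this is what prevents deadlock. */
    send_null_messages(local_clock + lookahead);
}

int main(void) { cmb_round(); return 0; }
```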
20

Srivastava, Anurag. "Stabilized Explicit Time Integration for Parallel Air Quality Models." Thesis, Virginia Tech, 2006. http://hdl.handle.net/10919/34736.

Abstract:
Air Quality Models are defined for prediction and simulation of air pollutant concentrations over a certain period of time. The predictions can be used in setting limits for the emission levels of industrial facilities. The input data for the air quality models are very large and encompass various environmental conditions like wind speed, turbulence, temperature and cloud density.

Most air quality models are based on advection-diffusion equations. These differential equations are moderately stiff and require appropriate techniques for fast integration over large intervals of time. Implicit time-stepping techniques for solving differential equations, being unconditionally stable, are considered suitable for the solution. However, implicit time-stepping techniques impose certain data dependencies that can cause the parallelization of air quality models to be inefficient.

The current approach uses the explicit Runge-Kutta-Chebyshev method for the solution of advection-diffusion equations. It is found that even if the explicit method used is computationally more expensive in serial execution, it takes less execution time when parallelized because of the less complicated data dependencies presented by explicit time-stepping. The implicit time-stepping, on the other hand, cannot be parallelized efficiently because of its inherently complicated data dependencies.
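The reason a stabilized explicit scheme can compete with implicit methods here is the quadratic growth of its real stability interval. Stage values are built on the Chebyshev recurrence, and the commonly cited stability boundaries $\beta(s)$ for an $s$-stage method are:

```latex
T_j(x) = 2x\,T_{j-1}(x) - T_{j-2}(x), \qquad
\beta(s) \approx 2s^2 \ \text{(first order)}, \quad
\beta(s) \approx 0.65\,s^2 \ \text{(second-order RKC)}
```

So doubling the number of stages roughly quadruples the usable step size along the negative real axis, which is exactly what the mildly stiff diffusion terms of an advection-diffusion model require.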
Master of Science

21

Ortega-Arjona, J. L. "Architectural patterns for Parallel Programming : models for performance estimation." Thesis, University College London (University of London), 2007. http://discovery.ucl.ac.uk/1444499/.

Abstract:
Parallel Programming relies on the coordination of computing resources so that they simultaneously work towards a common objective. Achieving this requires extra effort from the software designer because of the increased complexity involved. Furthermore, as Parallel Programming is considered a means to improve performance, the software designer has to consider sophisticated and cost-effective practices and techniques for performance measurement and analysis. In particular, it is of great interest to obtain performance information during design stages and before implementation, since this enables the software developer to select the organisation of computations and communications between components. The Architectural Performance Modelling Method is presented as a criterion for selecting the organisation of a parallel program based on estimating its probable performance. By considering a parallel program as an instance of a software architecture, it can be described in terms of interacting software components. Such components can be classified depending on their particular objective and their rate of change, for example, as components associated with the hardware and software environment (or Platform), components representing the fundamental structural organisation for execution and communication (or Coordination), and so on. The performance of a parallel program can be estimated as the result of the contribution of each of these kinds of components. An Architectural Performance Model is based on selecting from the Architectural Patterns for Parallel Programming (descriptions of coordinations commonly used in Parallel Programming), a component simulator (representing a simulation of a processing component's behaviour), and a performance analysis of parallel applications (in which the information on system performance is examined). Parallel programs simulated using the Architectural Performance Modelling Method range from a complete parallel program to a partially implemented program design. The simulation of parallel systems, using the information about the problem to be solved, the available resources, and architectural patterns describing the overall coordination of the parallel programs, makes it possible to identify the best performing architectural solution for the system being built.
22

Chan, Lai-Wan. "Adaptive and invariant connectionist models for pattern recognition." Thesis, University of Cambridge, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238206.

23

Sarrafan, Amir Mansour. "Transputer models for high-performance bridges in local area networks." Thesis, University of Kent, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.278225.

24

Nestmann, Markus. "Erstellung einer einheitlichen Taxonomie für die Programmiermodelle der parallelen Programmierung." Bachelor's thesis, Universitätsbibliothek Chemnitz, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-224238.

Abstract:
Parallel programming makes it possible for programs to run concurrently on several CPU cores or CPUs. To ease parallel programming, various languages (e.g., Erlang) and libraries (e.g., OpenMP) have been developed on top of parallel programming models (e.g., the Parallel Random Access Machine). If, for example, a software architect has to choose a programming model for a project, several important criteria (e.g., dependencies on the hardware) must be considered. Overviews that distinguish and order the programming models by these criteria make this search easier. Existing overviews, however, differ in their classification, the terminology used, and the programming models covered. This thesis remedies this deficit: first, the existing taxonomies are collected and analyzed through a systematic literature review; on that basis, a unified taxonomy is created. With this taxonomy, an overview of the parallel programming models can be produced, supplemented with information on each model's dependencies on the hardware architecture. A software architect (or project manager, software developer, ...) can thereby make an informed decision and is not forced to analyze every programming model individually.
25

Ngo, Ton Anh. "The role of performance models in parallel programming and languages /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/6990.

26

Zhao, Haixiang. "Artificial Intelligence Models for Large Scale Buildings Energy Consumption Analysis." Phd thesis, Ecole Centrale Paris, 2011. http://tel.archives-ouvertes.fr/tel-00658767.

Abstract:
The energy performance of buildings is influenced by many factors, such as ambient weather conditions, building structure and characteristics, occupancy and occupant behavior, and the operation of sub-level components like the Heating, Ventilation and Air-Conditioning (HVAC) system. This complexity makes the prediction, analysis, and fault detection/diagnosis of building energy consumption difficult to perform accurately and quickly. This thesis focuses mainly on up-to-date artificial intelligence models applied to solving these problems. First, we review recently developed models for solving these problems, including detailed and simplified engineering methods, statistical methods, and artificial intelligence methods. Then we simulate energy consumption profiles for single and multiple buildings; based on these datasets, support vector machine models are trained and tested to do the prediction. The results from extensive experiments demonstrate the high prediction accuracy and robustness of these models. Second, a Recursive Deterministic Perceptron (RDP) neural network model is used to detect and diagnose faulty building energy consumption. The abnormal consumption is simulated by manually introducing performance degradation into electric devices. In the experiments, the RDP model shows very high detection ability. A new approach is proposed to diagnose faults; it is based on the evaluation of RDP models, each of which is able to detect an equipment fault. Third, we investigate how the selection of subsets of features influences the model performance. The optimal features are selected based on the feasibility of obtaining them and on the scores they provide under the evaluation of two filter methods. Experimental results confirm the validity of the selected subset and show that the proposed feature selection method can guarantee model accuracy while reducing computational time. One challenge of predicting building energy consumption is to accelerate model training when the dataset is very large. This thesis proposes an efficient parallel implementation of support vector machines based on the decomposition method for solving such problems. The parallelization is performed on the most time-consuming part of training, i.e., updating the gradient vector f. The inner problems are handled by a sequential minimal optimization solver. The underlying parallelism is realized by the shared-memory version of the Map-Reduce paradigm, making the system particularly suitable for multi-core and multiprocessor systems. Experimental results show that our implementation offers a high speedup compared to Libsvm and is superior to the state-of-the-art MPI implementation Pisvm in both speed and storage requirements.
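The parallelized step mentioned above, refreshing the gradient vector f after the working set changes, has a simple shared-memory structure: each component of f is updated independently. A hedged OpenMP sketch of that step (illustrative only; the variable names and kernel callback are assumptions, not the thesis's actual code) is:

```c
/* After the inner SMO solver changes alpha_j by delta, refresh the
   dual gradient: f[i] += y[i] * y[j] * K(x_i, x_j) * delta.
   Every i is independent, matching the shared-memory map-reduce
   scheme described in the abstract. */
void update_gradient(int n, double *f, const signed char *y,
                     int j, double delta,
                     double (*kernel)(int i, int j)) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        f[i] += y[i] * y[j] * kernel(i, j) * delta;
}
```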
27

Wahlén, Niklas. "A Comparison of Different Parallel Programming Models for Multicore Processors." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-26320.

Abstract:
As computers are used in most areas today, improving their performance is of great importance. Until recently, a faster processor was the main contributor to the increase of overall computer speed. Today the situation has changed, as heating is becoming a bigger problem: running a processor faster requires more power, which also leads to the processor's components getting warmer. A solution to this is to use several somewhat slower processors in the same computer, a so-called multiprocessor or multicore processor. That way, programs can execute on different processors, or the functionality of one program can be divided and run on several processors simultaneously. Programming for multicore architectures is, however, more complex than programming for computers with a single processor, as data in memory can now be accessed by several instances of a program, called threads, at the same time. This calls for some kind of synchronization between such threads. Many different models are available to simplify the implementation of programs for multicore computers, and such models are compared in this thesis. The models in question are Pthreads, OpenMP, and Cilk++. The models differ from each other in many ways and are found to be useful for different areas. While Pthreads is a good tool when one wants to expose the threading mechanisms and be sure to have high flexibility, OpenMP and Cilk++ offer simpler interfaces. OpenMP's main strengths are its interface and good portability. Cilk++ is suitable when high performance is the most important aspect.
28

Turner, Adrian Charles. "Parallel sampling and integrating as bases for models of hearing." Thesis, Lancaster University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.296971.

29

Schnorr, Lucas Mello. "Some visualization models applied to the analysis of parallel applications." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/37179.

Abstract:
Highly distributed systems such as grids are used today for the execution of large-scale parallel applications. Some characteristics of these systems are the complex resource interconnection that might be present and the scalability. The interconnection complexity comes from the different number of hops needed to provide communication among application processes and from differences in network latencies and bandwidth. Scalability means that resources can be added indefinitely just by connecting them to the existing infrastructure. These characteristics directly influence the way parallel application performance must be analyzed. Current traditional visualization schemes for this analysis are usually based on Gantt charts, with one dimension listing the monitored entities and the other dedicated to time. These visualizations are generally not suited to parallel applications executed in grids. The first reason is that they were not built to offer the developer an analysis that also shows the network topology of the resources. The second reason is that traditional visualization techniques do not scale well when thousands of monitored entities must be analyzed together. This thesis tries to overcome the issues encountered in traditional visualization techniques for parallel applications. The main idea behind our efforts is to explore techniques from the information visualization research area and to apply them in the context of parallel application analysis. Based on this main idea, the thesis proposes two visualization models: the three-dimensional model and the visual aggregation model. The former might be used to analyze parallel applications taking into account the network topology of the resources. The visualization itself is composed of three dimensions, where two of them are used to render the topology and the third is used to represent time. The latter model can be used to analyze parallel applications composed of several thousands of processes. It uses a hierarchical organization of monitoring data and an information visualization technique called Treemap to represent that hierarchy. Both models represent a novel way to visualize the behavior of parallel applications, since they are conceived considering large-scale and complex distributed systems, such as grids. The implications of this thesis are directly related to the analysis and understanding of parallel applications executed in distributed systems. It enhances the comprehension of patterns in communication among processes and improves the possibility of matching these patterns with the real network topology of grids. Although we extensively use the network topology example, the approach could be adapted with almost no changes to a logical interconnection provided by a middleware. With the scalable visualization technique, developers are able to look for patterns and observe the behavior of large-scale applications.
30

Bosch, Pons Jaume. "Breaking host-centric management of task-based parallel programming models." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/672309.

Abstract:
Heterogeneous platforms have become popular as a way to increase the computational power of systems within a constrained power budget. They are present in many systems, from embedded platforms and mobile devices to high-end servers and clusters. However, the co-processors are managed following a master-slave model where the general-purpose CPU drives the rest of the elements. This management limits the system's possibilities, as not all application parts are suitable for execution on an accelerator. This thesis presents different proposals to enhance the usage of co-processors in task-based parallel programming models, which are a powerful tool for easily programming applications for heterogeneous platforms. The first proposal enhances task-based systems with asynchronous, concurrent, and parameterizable behavior. The improvements span the full stack, from the programming model level down to the low-level communications used between the libraries and the co-processors. The evaluation shows that the implemented improvements boost application performance, as they can be easily tuned for the running platform. The second proposal adds support for task spawn and synchronization in co-processors. The offloaded tasks can create child tasks that target other architectures or remain inside the co-processor. This allows programmers to implement applications easily and effectively. The evaluation shows the efficiency of the proposal's implementation in terms of latency and power consumption. The results show that applications can increase their performance and optimize their power consumption just by moving the task spawn from the host threads to the co-processor. This is thanks to the low-latency task management inside the co-processors, which also reduces the communications between the host and the co-processor. The third proposal extends task-based programming models with concepts of recurrent workloads. The regular task syntax has been extended with new clauses to label the recurrent tasks and provide the needed information to the runtime. The evaluation shows an increase in application programmability thanks to the new syntax, which allows the specification of recurrent systems with much less code and better accuracy. Also, the direct management of task repetitions and periods in the co-processors allows an almost zero-latency management that can handle any task granularity.
Computer Architecture
APA, Harvard, Vancouver, ISO, and other styles
31

Hemmati, Moghadam Afshin. "Modelica PARallel benchmark suite (MPAR) - a test suite for evaluating the performance of parallel simulations of Modelica models." Thesis, Linköpings universitet, PELAB - Laboratoriet för programmeringsomgivningar, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-72685.

Full text
Abstract:
Using the object-oriented, equation-based modeling language Modelica, it is possible to model and simulate computationally intensive models. To reduce the simulation time, a desirable approach is to perform the simulations on parallel multi-core platforms. For this purpose, several works have been carried out so far; the most recent one includes language enhancements with explicit parallel programming constructs in the algorithmic parts of the Modelica language. This extension automatically generates parallel simulation code for execution on OpenCL-enabled platforms, and it has been implemented in the open-source OpenModelica environment. However, to ensure that this extension as well as future developments regarding parallel simulation of Modelica models are feasible, systematic benchmarking with respect to a set of appropriate Modelica models is essential, which is the main focus of study in this thesis. In this thesis a benchmark test suite containing computationally intensive Modelica models relevant for parallel simulation is presented. The suite is used as a means for evaluating the feasibility and measuring the performance of the generated OpenCL code when using the new Modelica language extension. In addition, several considerations and suggestions on how the modeler can efficiently parallelize sequential models to achieve better performance on OpenCL-enabled GPUs and multi-core CPUs are given. The measurements have been done for both sequential and parallel implementations of the benchmark suite using the generated code from the OpenModelica compiler on different hardware configurations, including single- and multi-core CPUs as well as GPUs. The results show that simulating Modelica models using OpenCL as a target language is very feasible. In addition, it is concluded that for models with large data sizes and a high degree of parallelism, considerable speedup can be achieved on GPUs compared to single- and multi-core CPUs.
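To make the kind of measurement behind these results concrete, the sketch below times a sequential and a parallel build of the same model and reports the speedup. The executable names and flags are hypothetical placeholders, not OpenModelica's actual interface.

```python
# Minimal benchmark-harness sketch: time two builds, report the speedup.
import subprocess
import time

def time_run(cmd):
    start = time.perf_counter()
    subprocess.run(cmd, check=True)   # run the simulation executable
    return time.perf_counter() - start

t_seq = time_run(["./model_seq", "-stopTime=1.0"])   # hypothetical binary
t_par = time_run(["./model_ocl", "-stopTime=1.0"])   # hypothetical binary
print(f"sequential {t_seq:.2f}s, parallel {t_par:.2f}s, "
      f"speedup {t_seq / t_par:.1f}x")
```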
APA, Harvard, Vancouver, ISO, and other styles
32

Mandviwala, Hasnain A. "Capsules expressing composable computations in a parallel programming model /." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26684.

Full text
Abstract:
Thesis (Ph.D.)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Ramachandran, Umakishore; Committee Member: Knobe, Kathleen; Committee Member: Pande, Santosh; Committee Member: Prvulovic, Milos; Committee Member: Rehg, James M. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
33

Craig, Bruce A. "Comparison of creep/duration of load performance in bending of Parallam® parallel strand lumber to machine stress rated lumber." Thesis, University of British Columbia, 1986. http://hdl.handle.net/2429/26194.

Full text
Abstract:
A comparison of the creep/duration of load (DOL) performance of a new structural wood composite material called Parallam® parallel strand lumber (PSL) to two grades of machine-stress-rated (MSR) Douglas-fir lumber is presented in this thesis. Evaluation of the creep/DOL performance was made on nominal 2x4 members under constant bending stress at three stress levels. A total of 306 test specimens were evaluated over a 15-1/2 month period. The analysis suggests that the duration of load effect for Parallam PSL was consistent with the Madison curve for the time period studied, while the MSR Douglas-fir lumber was consistent with recent duration of load models developed for structural lumber. The analysis also indicates that the current duration of load adjustment factors can be applied to develop working stresses for Parallam. The creep behaviour of Parallam PSL was found to be equivalent to or better than that of the two MSR lumber grades under dry-service conditions. Furthermore, evidence of linear viscoelastic behaviour was found for all test materials within the range of applied stresses evaluated. Two mathematical models of creep were fitted to the creep data and compared. A '4-parameter linear viscoelastic' model fitted the creep data better than an empirical 'power curve' model. The model parameters developed provide a basis for estimating the mean creep behaviour and the variability in creep response for these materials under in-service load conditions in dry-service environments.
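For concreteness, the sketch below fits the two kinds of creep law named above to synthetic strain data. The exact parameterizations are assumptions: a standard empirical power curve and a Burgers-type 4-parameter linear viscoelastic response under constant stress.

```python
# Fit an assumed power-law creep curve and a 4-parameter viscoelastic model.
import numpy as np
from scipy.optimize import curve_fit

def power_curve(t, e0, a, b):
    return e0 + a * t**b                      # empirical power-law creep

def four_param(t, e0, c1, c2, tau):
    # instantaneous + viscous flow + delayed elastic (Kelvin) response
    return e0 + c1 * t + c2 * (1.0 - np.exp(-t / tau))

t = np.linspace(1, 465, 50)                   # days, roughly 15.5 months
strain = 0.002 + 1e-6 * t + 4e-4 * (1 - np.exp(-t / 30))  # synthetic data

p_pow, _ = curve_fit(power_curve, t, strain, p0=[1e-3, 1e-4, 0.3])
p_4p, _ = curve_fit(four_param, t, strain, p0=[1e-3, 1e-6, 1e-4, 20.0])
for name, f, p in [("power", power_curve, p_pow),
                   ("4-param", four_param, p_4p)]:
    rss = np.sum((f(t, *p) - strain) ** 2)
    print(f"{name}: residual sum of squares = {rss:.3e}")
```

Since the synthetic data follow the 4-parameter form, the second fit wins here by construction; the thesis reached the analogous conclusion on real creep data.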
Forestry, Faculty of
Graduate
APA, Harvard, Vancouver, ISO, and other styles
34

Rahimi, Mona. "A PARALLEL IMPLEMENTATION OF GIBBS SAMPLING ALGORITHM FOR 2PNO IRT MODELS." OpenSIUC, 2011. https://opensiuc.lib.siu.edu/theses/696.

Full text
Abstract:
Item response theory (IRT) is a newer and improved framework compared to classical measurement theory. The fully Bayesian approach shows promise for IRT models. However, it is computationally expensive, and its use is therefore limited in various applications. It is important to seek ways to reduce the execution time, and a suitable solution is the use of high-performance computing (HPC). HPC offers considerably high computational power and can handle applications with high computation and memory requirements. In this work, we have applied two different parallelization methods to the existing fully Bayesian algorithm for 2PNO IRT models so that it can be run on a high-performance parallel machine with less communication load. With our parallel version of the algorithm, the empirical results show that a speedup was achieved and the execution time was considerably reduced.
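To ground the discussion, here is a compact sketch of one data-augmented Gibbs sweep for the 2PNO model, P(y_ij = 1) = Φ(a_j·θ_i − b_j), in the standard Albert-style formulation; the priors, vectorization, and toy data are illustrative assumptions. The person and item updates are independent across rows and columns respectively, which is exactly what makes the sampler amenable to the parallelization studied here.

```python
# One data-augmented Gibbs sweep for a 2PNO IRT model (illustrative sketch).
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
N, J = 200, 10                                  # persons, items
theta = rng.normal(size=N)
a, b = np.ones(J), np.zeros(J)
y = (rng.random((N, J)) < 0.5).astype(int)      # toy response matrix

def gibbs_sweep(y, theta, a, b):
    eta = np.outer(theta, a) - b                # N x J linear predictor
    # 1. augment: Z ~ N(eta, 1) truncated to (0, inf) if y=1, else (-inf, 0)
    lo = np.where(y == 1, -eta, -np.inf)
    hi = np.where(y == 1, np.inf, -eta)
    z = eta + truncnorm.rvs(lo, hi, size=eta.shape, random_state=rng)
    # 2. update abilities theta_i | z with a N(0, 1) prior; rows are
    #    independent, so this step parallelizes across persons
    prec = 1.0 + np.sum(a**2)
    mean = (z + b) @ a / prec
    theta = mean + rng.normal(size=N) / np.sqrt(prec)
    # 3. update item parameters by per-item Bayesian regression of z on
    #    (theta, -1), flat prior; columns are independent (parallel over items)
    X = np.column_stack([theta, -np.ones(N)])
    XtX_inv = np.linalg.inv(X.T @ X)
    for j in range(J):
        beta_hat = XtX_inv @ X.T @ z[:, j]
        a[j], b[j] = rng.multivariate_normal(beta_hat, XtX_inv)
    return theta, a, b

theta, a, b = gibbs_sweep(y, theta, a, b)
```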
APA, Harvard, Vancouver, ISO, and other styles
35

Buron, Cyprien. "Interactive generation and rendering of massive models : a parallel procedural approach." Thesis, Bordeaux, 2014. http://www.theses.fr/2014BORD0014/document.

Full text
Abstract:
With the increasing computing and storage capabilities of recent hardware, the movie and video game industries demand ever larger and more realistic environments. However, modeling such sceneries by hand turns out to be highly time-consuming and costly. On the other hand, procedural modeling provides methods to easily generate a high diversity of elements such as vegetation and architecture. While grammar rules provide a high-level, powerful modeling tool, using these rules is often a tedious task necessitating a frustrating trial-and-error process. Moreover, as no existing solution offers real-time generation and rendering of massive environments, artists have to work on separate parts before integrating the whole and seeing the results. In this research, we aim to provide interactive generation and rendering of very large sceneries, while offering artist-friendly methods for controlling grammar behavior. We first introduce a GPU-based pipeline providing parallel procedural generation at render time. To this end we propose a segment-based expansion method working on independent elements, thus allowing for parallel amplification. We then extend this pipeline to permit the construction of models relying on internal contexts, such as roofs. We also present external contexts to control grammars with surface and texture data. Finally, we integrate a LOD system and optimization techniques within our pipeline, providing interactive generation, editing and visualization of massive environments. We demonstrate the efficiency of our pipeline with a scene comprising a hundred thousand trees and a hundred thousand buildings, representing 2 terabytes of data.
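The segment-based expansion idea can be sketched with a toy grammar: because each segment expands independently of its siblings, one derivation step is a pure map over the current segments and can be amplified in parallel. The rules below are illustrative, not the thesis's grammar.

```python
# Parallel grammar expansion sketch: each derivation step maps `expand`
# over independent segments, here with a process pool standing in for a GPU.
from multiprocessing import Pool

RULES = {
    "trunk":  ["branch", "branch", "trunk_tip"],
    "branch": ["twig", "twig"],
}

def expand(segment):
    # terminals (no matching rule) rewrite to themselves
    return RULES.get(segment, [segment])

def generation(segments, pool):
    # independent per-segment expansion -> trivially data-parallel
    return [s for group in pool.map(expand, segments) for s in group]

if __name__ == "__main__":
    with Pool(4) as pool:
        segments = ["trunk"]
        for _ in range(3):
            segments = generation(segments, pool)
        print(len(segments), "segments:", segments)
```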
APA, Harvard, Vancouver, ISO, and other styles
36

Peng, Chao. "Real-time Visualization of Massive 3D Models on GPU Parallel Architectures." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/50573.

Full text
Abstract:
Real-time rendering of massive 3D models has been recognized as a challenging task due to the limited computational power and memory available in a workstation. Most existing acceleration techniques, such as mesh simplification algorithms with hierarchical data structures, suffer from their inherently sequential execution. As data complexity increases due to fundamental advances in modeling and simulation technologies, 3D models grow in complexity and require gigabytes of storage. Consequently, visualizing such large datasets becomes a computationally intensive process where sequential solutions are unable to satisfy the demands of real-time rendering.
Recently, the Graphics Processing Unit (GPU) has been praised as a massively parallel architecture, not only for its significant improvements in performance but also for its programmability for general-purpose computation. Today's GPUs allow researchers to solve problems by delivering fine-grained parallel implementations. In this dissertation, I concentrate on the design of parallel algorithms for real-time rendering of massive 3D polygonal models on modern GPU architectures. As a result, the delivered rendering system supports high-performance visualization of 3D models composed of hundreds of millions of polygons on a single commodity workstation.
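One classic data-parallel pattern behind this kind of system can be sketched briefly: choose a level of detail per object, then use an exclusive prefix sum (scan) to assign every object its slice of a single output buffer, so all objects can write concurrently. The LOD heuristic and sizes below are illustrative assumptions, not the dissertation's algorithm.

```python
# LOD selection followed by scan-based output allocation (numpy sketch).
import numpy as np

rng = np.random.default_rng(1)
full_tris = rng.integers(1_000, 100_000, size=8)     # triangles per object
distance = rng.uniform(1.0, 50.0, size=8)            # camera distance

detail = np.clip(10.0 / distance, 0.01, 1.0)         # assumed LOD heuristic
out_tris = np.maximum(1, (full_tris * detail).astype(np.int64))

offsets = np.concatenate([[0], np.cumsum(out_tris)[:-1]])  # exclusive scan
print("output buffer size:", out_tris.sum(), "triangles")
for i, (o, n) in enumerate(zip(offsets, out_tris)):
    print(f"object {i}: writes triangles [{o}, {o + n})")
```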
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
37

Mohamed, Hussein Zeti Azura. "Parallel β-helix prediction : high-confidence models from multiple sequence alignments." Thesis, University of Edinburgh, 2005. http://hdl.handle.net/1842/12665.

Full text
Abstract:
This PhD project consisted of two parts. The first part was our successful T0100 prediction in CASP4. In this prediction, we produced one of the highest-ranked threading alignments through sequence analysis, which revealed a “Cys-staples” pattern formed by putative disulphide bridges between consecutive turns in the parallel β-helix core of different homologues. This pattern was used as an anchoring point in the template-target alignment, and this novel approach motivated the follow-up project which constitutes the second part. The aim of the second part was to apply the aforementioned approach as widely as possible and to produce high-confidence models for all detectable members of the PLL superfamily in GenPept (as retrieved from the NCBI in July 2002). Large-scale detection of PLL proteins was achieved initially with the help of two different third-party fold recognition programs, with parameters and cut-off values set carefully to be stringent in order to minimise false positive predictions. The two resulting datasets were then pooled and clustered. This resulted in twelve families with homologues in the PDB, eight families without close homologues in the PDB but with some members annotated as pectolytic enzymes, and one new family with no indication of prior classification as PLL. A small fraction of the PLL predictions were deemed to be probable false positives, and a few others could not be followed up on confidently because no homologues could be detected by standard BLAST searches of the public sequence databases. After augmenting the nine families without known structures through standard BLAST searches of SPTrEMBL and careful analysis and editing of automated target-template alignments, all plausible members of the altogether twenty-one families were modelled using an automated modelling procedure.
APA, Harvard, Vancouver, ISO, and other styles
38

Ravindran, Somasundaram. "Aspects of practical implementations of PRAM algorithms." Thesis, University of Warwick, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386838.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Lanore, Vincent. "On Scalable Reconfigurable Component Models for High-Performance Computing." Thesis, Lyon, École normale supérieure, 2015. http://www.theses.fr/2015ENSL1051/document.

Full text
Abstract:
Component-based programming is a programming paradigm which eases code reuse and separation of concerns. Some component models, said to be "reconfigurable", allow an application's structure to be modified at runtime. However, these models are not suited to High-Performance Computing (HPC) as they rely on non-scalable mechanisms. The goal of this thesis is to provide models, algorithms and tools to ease the development of component-based reconfigurable HPC applications. The main contribution of the thesis is the DirectMOD component model, which eases the development and reuse of distributed transformations. In order to improve on this core model in other directions, we have also proposed:
• the SpecMOD formal component model, which allows automatic specialization of hierarchical component assemblies and provides high-level software engineering features;
• mechanisms for efficient fine-grain reconfiguration for AMR applications, an important application class in HPC.
An implementation of DirectMOD, called DirectL2C, has been developed and used to implement a series of benchmarks to evaluate our approach. Experiments on HPC architectures show that our approach scales. Moreover, a quantitative analysis of the benchmark code shows that our approach is compact and eases reuse.
APA, Harvard, Vancouver, ISO, and other styles
40

Strid, Ingvar. "Computational methods for Bayesian inference in macroeconomic models." Doctoral thesis, Handelshögskolan i Stockholm, Ekonomisk Statistik (ES), 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:hhs:diva-1118.

Full text
Abstract:
The New Macroeconometrics may succinctly be described as the application of Bayesian analysis to the class of macroeconomic models called Dynamic Stochastic General Equilibrium (DSGE) models. A prominent local example from this research area is the development and estimation of the RAMSES model, the main macroeconomic model in use at Sveriges Riksbank. Bayesian estimation of DSGE models is often computationally demanding. In this thesis, fast algorithms for Bayesian inference are developed and tested in the context of the state-space model framework implied by DSGE models. The algorithms discussed deal with the evaluation of the DSGE model likelihood function and sampling from the posterior distribution. Block Kalman filter algorithms are suggested for likelihood evaluation in large linearised DSGE models. Parallel particle filter algorithms are presented for likelihood evaluation in nonlinearly approximated DSGE models. Prefetching random walk Metropolis algorithms and adaptive hybrid sampling algorithms are suggested for posterior sampling. The generality of the algorithms, however, suggests that they should also be of interest outside the realm of macroeconometrics.
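The prefetching idea mentioned above can be sketched compactly: all 2^h − 1 posterior evaluations that the next h accept/reject decisions of a random walk Metropolis chain could possibly need are computed in parallel, after which the decisions are replayed serially. The toy Gaussian target below stands in for an expensive DSGE likelihood; the tree bookkeeping is an illustrative reconstruction of the general technique, not the thesis's code.

```python
# Prefetching random walk Metropolis sketch: h decisions per parallel batch.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def log_post(x):
    return -0.5 * float(x @ x)      # placeholder for an expensive likelihood

def prefetching_rwm(x0, step, h, n_sweeps, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    chain, lp = [], log_post(x)
    with ProcessPoolExecutor() as pool:
        for _ in range(n_sweeps):
            # build the binary accept/reject tree: node 2i rejects (keeps the
            # state), node 2i+1 accepts (moves to the proposal drawn at i)
            state, proposals = {1: x}, {}
            for i in range(1, 2**h):
                proposals[i] = state[i] + step * rng.normal(size=x.shape)
                state[2 * i] = state[i]
                state[2 * i + 1] = proposals[i]
            # all 2^h - 1 density evaluations happen in parallel
            lps = dict(zip(proposals, pool.map(log_post, proposals.values())))
            node = 1
            for _ in range(h):      # replay the decisions serially
                if np.log(rng.random()) < lps[node] - lp:
                    lp, node = lps[node], 2 * node + 1
                else:
                    node = 2 * node
                chain.append(state[node])
            x = state[node]
    return np.array(chain)

if __name__ == "__main__":
    samples = prefetching_rwm(np.zeros(2), step=0.8, h=3, n_sweeps=250)
    print("mean:", samples.mean(axis=0), "std:", samples.std(axis=0))
```

The trade-off is visible in the code: each batch spends 2^h − 1 parallel evaluations to advance the chain by only h steps, which pays off when a single likelihood evaluation dominates the cost.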
APA, Harvard, Vancouver, ISO, and other styles
41

Bengtsson, Jerker. "Models and Methods for Development of DSP Applications on Manycore Processors." Doctoral thesis, Högskolan i Halmstad, Centrum för forskning om inbyggda system (CERES), 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-14706.

Full text
Abstract:
Advanced digital signal processing systems require specialized high-performance embedded computer architectures. The term high-performance translates to large amounts of data and computations per time unit. The term embedded further implies requirements on physical size and power efficiency. Thus the requirements are of both a functional and a non-functional nature. This thesis addresses the development of high-performance digital signal processing systems relying on manycore technology. We propose building two-level hierarchical computer architectures for this domain of applications. Further, we outline a tool flow based on methods and analysis techniques for automated, multi-objective mapping of such applications onto distributed-memory manycore processors. In particular, the focus is on providing tunable strategies for mapping task graphs onto array-structured distributed-memory manycores with respect to given application constraints. We argue for code mapping strategies based on predicted execution performance, which can be used in an auto-tuning feedback loop or to guide manual tuning directed by the programmer. Automated parallelization, optimisation and mapping to a manycore processor benefit from the use of a concurrent programming model as the starting point. Such a model allows the programmer to express different types and granularities of parallelism as well as computation characteristics of importance in the addressed class of applications. The programming model should also abstract away machine-dependent hardware details. The analytical study of WCDMA baseband processing in radio base stations, presented in this thesis, suggests dataflow models as a good match to the characteristics of the application and as an execution model abstracting computations on a manycore. Construction of portable tools further requires a manycore machine model and an intermediate representation. The models are needed in order to decouple the algorithms used to transform and map application software from the hardware. We propose a manycore machine model that captures common hardware resources, as well as resource-dependent performance metrics for parallel computation and communication. Further, we have developed a multifunctional intermediate representation, which can be used as a source for code generation and for dynamic execution analysis. Finally, we demonstrate how we can dynamically analyse execution using abstract interpretation on the intermediate representation. It is shown that the performance predictions can be used to accurately rank different mappings by best throughput or shortest end-to-end computation latency.
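A minimal sketch of mapping guided by predicted execution performance: each task of a small dataflow graph is placed on the core that minimizes its predicted finish time, counting compute cost plus a penalty for edges that cross cores. The graph, costs, and greedy policy are illustrative assumptions, not the thesis's tool flow.

```python
# Greedy list-scheduling sketch driven by a simple predicted-cost model.
tasks = {  # tasks in topological order, with compute cost and dependencies
    "src":   {"cost": 2, "deps": []},
    "fft":   {"cost": 8, "deps": ["src"]},
    "demod": {"cost": 6, "deps": ["src"]},
    "comb":  {"cost": 4, "deps": ["fft", "demod"]},
}
COMM = 3            # assumed cost of an edge crossing cores
N_CORES = 2

core_free = [0.0] * N_CORES
placed, finish = {}, {}
for name, t in tasks.items():      # dict preserves the topological order
    best = None
    for c in range(N_CORES):
        ready = max([finish[d] + (0 if placed[d] == c else COMM)
                     for d in t["deps"]], default=0.0)
        start = max(core_free[c], ready)
        if best is None or start + t["cost"] < best[0]:
            best = (start + t["cost"], c, start)
    end, c, start = best
    placed[name], finish[name], core_free[c] = c, end, end
    print(f"{name}: core {c}, start {start}, finish {end}")
print("predicted makespan:", max(finish.values()))
```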
APA, Harvard, Vancouver, ISO, and other styles
42

Castillo, Villar Emilio. "Parallel architectures and runtime systems co-design for task-based programming models." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/666783.

Full text
Abstract:
The increasing parallelism levels in modern computing systems have underscored the need for a holistic vision when designing multiprocessor architectures, one that takes into account the needs of programming models and applications. Nowadays, system design consists of several layers on top of each other, from the architecture up to the application software. Although this design allows a separation of concerns, where it is possible to change layers independently thanks to well-defined interfaces between them, it hampers future system design as Moore's Law reaches its end. Current performance improvements in computer architecture are driven by the shrinkage of the transistor channel width, allowing faster and more power-efficient chips to be made. However, technology is reaching physical limitations where transistor size cannot be reduced further, which requires a change of paradigm in system design. This thesis proposes to break this layered design and advocates for a system where the architecture and the programming model's runtime system are able to exchange information towards a common goal: improving performance and reducing power consumption. By making the architecture aware of runtime information, such as a Task Dependency Graph (TDG) in the case of dataflow task-based programming models, it is possible to reduce power consumption by exploiting the critical path of the graph. Moreover, the architecture can provide hardware support to create such a graph in order to reduce the runtime overheads, making possible the execution of fine-grained tasks to increase the available parallelism. Finally, the current status of inter-node communication primitives can be exposed to the runtime system in order to perform more efficient communication scheduling, which also creates new opportunities for computation and communication overlap that were not possible before. An evaluation of the proposals introduced in this thesis is provided, and a methodology to simulate and characterize application behavior is also presented.
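The critical-path idea can be made concrete with a classic forward/backward pass over a task dependency graph: tasks with zero slack form the critical path and must run at full speed, while tasks with slack are candidates for lower power states. The graph and durations below are illustrative.

```python
# Critical-path and slack computation over a small task dependency graph.
durations = {"a": 4, "b": 2, "c": 3, "d": 5}
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
order = ["a", "b", "c", "d"]                     # topological order

ef = {}                                          # earliest finish times
for t in order:
    ef[t] = durations[t] + max((ef[p] for p in preds[t]), default=0)
makespan = max(ef.values())

succs = {t: [s for s in order if t in preds[s]] for t in order}
lf = {}                                          # latest finish times
for t in reversed(order):
    lf[t] = min((lf[s] - durations[s] for s in succs[t]), default=makespan)

for t in order:
    slack = lf[t] - ef[t]
    tag = "critical (keep fast)" if slack == 0 else f"slack {slack} (downclock)"
    print(f"task {t}: {tag}")
```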
APA, Harvard, Vancouver, ISO, and other styles
43

Stavåker, Kristian. "Contributions to Simulation of Modelica Models on Data-Parallel Multi-Core Architectures." Doctoral thesis, Linköpings universitet, Programvara och system, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-116338.

Full text
Abstract:
Modelica is an object-oriented, equation-based modeling and simulation language being developed through an international effort by the Modelica Association. With Modelica it is possible to build computationally demanding models; however, simulating such models might take a considerable amount of time. Techniques for utilizing parallel multi-core architectures for faster simulations are therefore desirable. In this thesis the simulation of Modelica models on parallel architectures in general, and on graphics processing units (GPUs) in particular, is explored. GPUs support code that can be executed in a data-parallel fashion. It is also possible to connect and run several GPUs together, which opens opportunities for even more parallelism. Several approaches regarding the simulation of Modelica models on GPUs and multi-core architectures are explored. The thesis also explores the topic of expressing and solving partial differential equations (PDEs) in the context of Modelica, since such models usually give rise to equation systems with a regular structure, which can be suitable for efficient solution on GPUs. Constructs for PDE-based modeling are currently not part of the standard Modelica language specification. Several approaches to modeling and simulation with PDEs in the context of Modelica have been developed over the years. We present selected earlier work, ongoing work and planned work on PDEs in the context of Modelica. Some approaches detailed in this thesis are: extending the language specification with PDE handling; using software with support for PDEs and automatic discretization of PDEs; and connecting an external C++ PDE library via the Functional Mock-up Interface (FMI). Finally, the topic of parallel skeletons in the context of Modelica is explored. A skeleton is a predefined, generic component that implements a common pattern of computation and data dependence. Skeletons provide a high degree of abstraction and portability, and a skeleton can be customized with user code. Using skeletons with Modelica opens up the possibility of executing heavy Modelica-based matrix and vector computations on multi-core architectures. A working Modelica-SkePU library, together with some minor necessary compiler extensions, is presented.
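The skeleton concept can be illustrated with a plain-Python analogy of what SkePU provides for C++: generic map and map-reduce patterns that hide work distribution and are customized with user code. The function names here are illustrative, not SkePU's API.

```python
# Skeleton sketch: reusable parallel patterns customized with user functions.
from multiprocessing import Pool
from functools import reduce
import operator

def map_skeleton(func, data, workers=4):
    with Pool(workers) as pool:
        return pool.map(func, data)    # distribution hidden from the user

def mapreduce_skeleton(func, combine, data, workers=4):
    return reduce(combine, map_skeleton(func, data, workers))

def square(x):                         # user code plugged into the skeleton
    return x * x

if __name__ == "__main__":
    # dot product of v with itself, expressed with skeletons only
    v = list(range(1_000))
    print(mapreduce_skeleton(square, operator.add, v))
```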
APA, Harvard, Vancouver, ISO, and other styles
44

Yin, Yue. "Models of computation for performance estimation in a parallel image processing system." [Florida] : State University System of Florida, 2000. http://etd.fcla.edu/etd/uf/2000/ana7022/master.PDF.

Full text
Abstract:
Thesis (M.S.)--University of Florida, 2000.
Title from first page of PDF file. Document formatted into pages; contains x, 78 p.; also contains graphics. Vita. Includes bibliographical references (p. 75-77).
APA, Harvard, Vancouver, ISO, and other styles
45

Richards, Andrew Perry. "Coal Pyrolysis Models for Use in Massively Parallel Oxyfuel-Fired Boiler Simulations." BYU ScholarsArchive, 2021. https://scholarsarchive.byu.edu/etd/8926.

Full text
Abstract:
Accurately modeling key aspects of coal combustion allows for the virtual testing and application of new technologies and processes without the need for investments in lab- and pilot-scale facilities, since such facilities may only be used for a few small tests. However, the modeling of subprocesses must be not only accurate but also computationally efficient. Modeling of coal devolatilization reactions and processes is one of the important parts of large-scale simulations of coal combustion systems. The work presented here details efforts to improve the modeling of coal devolatilization processes in massively parallel simulations of coal combustors, including: (1) devolatilization rate/yield models; (2) models of various chemical, physical, and thermodynamic properties of coal, char, and tar (including structural NMR parameters like carbon aromaticity, the elemental composition of coal char and tar, and the heating value of coal-based and other fuels); and (3) the application of various simplifying assumptions to equilibrium calculations of coal devolatilization products using multiple levels of fuel mixture fractions. The models discussed here were developed and improved with several different advanced statistical methods and careful comparison against large sets of experimental data. These statistical methods and procedures yield large improvements in the models over previous work.
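As a concrete example of the simplest class of devolatilization rate/yield models, the sketch below integrates a standard single-first-order-reaction (SFOR) model, dV/dt = A·exp(−E/RT)·(V∞ − V), at a fixed particle temperature. The kinetic constants are illustrative assumptions, not values from the thesis.

```python
# SFOR devolatilization sketch with assumed kinetic constants.
import numpy as np

A, E, R = 2.0e5, 7.4e4, 8.314     # 1/s, J/mol, J/(mol K)  (assumed)
V_inf = 0.5                       # ultimate volatile yield, mass fraction

def devol_yield(T, t_end, dt=1e-4):
    """Integrate SFOR at constant particle temperature T (explicit Euler)."""
    V = 0.0
    k = A * np.exp(-E / (R * T))  # Arrhenius rate constant
    for _ in range(int(t_end / dt)):
        V += dt * k * (V_inf - V)
    return V

for T in (1000.0, 1200.0, 1400.0):
    print(f"T = {T:.0f} K: volatiles released = {devol_yield(T, 0.1):.3f}")
```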
APA, Harvard, Vancouver, ISO, and other styles
46

Johnson, Christopher Douglas. "A Parallel Genetic Algorithm for Optimizing Multicellular Models Applied to Biofilm Wrinkling." DigitalCommons@USU, 2017. https://digitalcommons.usu.edu/etd/5442.

Full text
Abstract:
Multiscale computational models integrating the sub-cellular, cellular, and multicellular levels can be powerful tools that help researchers replicate, understand, and predict multicellular biological phenomena. To leverage their potential, these models need correct parameter values, which specify cellular physiology and affect multicellular outcomes. This work presents a robust parameter optimization method utilizing a parallel and distributed genetic-algorithm software package. A genetic algorithm was chosen because of its superiority in fitting complex functions for which mathematical techniques are less suited. The search for optimal parameters proceeds by comparing the multicellular behavior of a simulated system to that of a real biological system, on the basis of features extracted from each which capture high-level, emergent multicellular outcomes. The goal is to find the set of parameters which minimizes the discrepancy between the two sets of features. The method is first validated by demonstrating its effectiveness on synthetic data; then it is applied to calibrating a simple mechanical model of biofilm wrinkling, a common type of morphology observed in biofilms. The spatiotemporal convergence of cellular movement, derived from experimental observations of different strains of Bacillus subtilis colonies, is used as the basis of comparison.
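The optimization loop described above can be sketched as follows: a population of candidate parameter sets is evaluated in parallel, fitness is the negative discrepancy between simulated and observed feature vectors, and elitism, crossover, and mutation produce the next generation. The toy "simulation" and feature target are illustrative stand-ins for the multicellular model.

```python
# Parallel genetic-algorithm sketch for feature-matching parameter fitting.
import numpy as np
from multiprocessing import Pool

TARGET = np.array([3.0, -1.5])           # features from the real system

def fitness(params):
    simulated = np.asarray(params)        # stand-in for a multicellular sim
    return -np.sum((simulated - TARGET) ** 2)

def evolve(pop, rng, elite=4, sigma=0.1, workers=4):
    with Pool(workers) as pool:
        scores = pool.map(fitness, [tuple(p) for p in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[:elite]]          # elitism: keep the best candidates
    children = []
    for _ in range(len(pop) - elite):
        a, b = parents[rng.integers(elite, size=2)]
        cut = rng.integers(1, pop.shape[1])              # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += rng.normal(0, sigma, size=child.shape)  # mutation
        children.append(child)
    return np.vstack([parents, children])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pop = rng.uniform(-5, 5, size=(32, 2))
    for _ in range(30):
        pop = evolve(pop, rng)
    print("best parameters:", pop[0])     # elites are stored first
```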
APA, Harvard, Vancouver, ISO, and other styles
47

Stavåker, Kristian. "Contributions to Parallel Simulation of Equation-Based Models on Graphics Processing Units." Licentiate thesis, Linköpings universitet, PELAB - Laboratoriet för programmeringsomgivningar, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-71270.

Full text
Abstract:
In this thesis we investigate techniques and methods for the parallel simulation of equation-based, object-oriented (EOO) Modelica models on graphics processing units (GPUs). Modelica is being developed through an international effort via the Modelica Association. With Modelica it is possible to build computationally heavy models; simulating such models, however, might take a considerable amount of time. Techniques for utilizing parallel multi-core architectures for simulation are therefore desirable. The goal of this work is mainly automatic parallelization of equation-based models; that is, it is up to the compiler, and not the end-user modeler, to make sure that the generated code can efficiently utilize parallel multi-core architectures. Not only does the code generation process have to be altered, but the accompanying run-time system has to be modified as well. Adding explicit parallel language constructs to Modelica is also discussed to some extent. GPUs can be used for general-purpose scientific and engineering computing. The theoretical processing power of GPUs has surpassed that of CPUs due to the highly parallel structure of GPUs. GPUs are, however, only good at solving certain problems of a data-parallel nature. In this thesis we relate several contributions, by the author and co-workers, to each other. We conclude that the massively parallel GPU architectures are currently only suitable for a limited set of Modelica models. This might change with future GPU generations. CUDA, for instance, the main software platform used in the thesis for general-purpose computing on graphics processing units (GPGPU), is changing rapidly and more features are being added, such as recursion, function pointers, and C++ templates; however, the underlying hardware architecture is still optimized for data parallelism.
APA, Harvard, Vancouver, ISO, and other styles
48

Ungureanu, George. "Automatic Software Synthesis from High-Level ForSyDe Models Targeting Massively Parallel Processors." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-127832.

Full text
Abstract:
In the past decade we have witnessed an abrupt shift to parallel computing, following the increasing demand for performance and functionality that can no longer be satisfied by conventional paradigms. As a consequence, the abstraction gap between the applications and the underlying hardware has increased, prompting both industry and academia to pursue several research directions. This thesis project aims at analyzing some of these directions in order to offer a solution for bridging the abstraction gap between the description of a problem at a functional level and its implementation on a heterogeneous parallel platform using ForSyDe, a formal design methodology. The report treats applications employing data-parallel and time-parallel computation, and regards NVIDIA CUDA-enabled GPGPUs as the main backend platform. It proposes a heuristic transformation-and-refinement process, based on analysis methods and design decisions, to automate and aid correct-by-design backend code synthesis. Its purpose is to identify potential data parallelism and time parallelism in a high-level system. Furthermore, based on a basic platform model, the algorithm load-balances and maps the execution onto the best computation resources in an automated design flow. This design flow is embedded into an already existing tool, f2cc (ForSyDe-to-CUDA C), and tested for correctness on an industrial-scale image processing application aimed at monitoring inkjet print-head reliability.
APA, Harvard, Vancouver, ISO, and other styles
49

Schneider, Scott. "Shared Memory Abstractions for Heterogeneous Multicore Processors." Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30240.

Full text
Abstract:
We are now seeing diminishing returns from classic single-core processor designs, yet the number of transistors available for a processor is still increasing. Processor architects are therefore experimenting with a variety of multicore processor designs. Heterogeneous multicore processors with Explicitly Managed Memory (EMM) hierarchies are one such experimental design, which has the potential for high performance but at the cost of great programmer effort. EMM processors have cores that are divorced from the normal memory hierarchy; thus the onus is on the programmer to manage locality and parallelism. This dissertation presents the Cellgen source-to-source compiler, which moves some of this complexity back into the compiler. Cellgen offers a directive-based programming model with semantics similar to OpenMP for the Cell Broadband Engine, a general-purpose processor with EMM. The compiler implicitly handles locality and parallelism, schedules memory transfers for data-parallel regions of code, and provides performance predictions which can be leveraged to make scheduling decisions. We compare this approach to using a software cache, to a different programming model that is task-based with explicit data transfers, and to programming the Cell directly using the native SDK. We also present a case study which uses the Cellgen compiler in a comparison across multiple kinds of multicore architectures: heterogeneous, homogeneous and radically data-parallel graphics processors.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
50

Patsias, Kyriakos. "A HIGH PERFORMANCE GIBBS-SAMPLING ALGORITHM FOR ITEM RESPONSE THEORY MODELS." Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1796121011&sid=3&Fmt=2&clientId=1509&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles