Dissertations on the topic "Architecture dataflow"

To view other types of publications on this topic, follow the link: Architecture dataflow.

Consult the top 50 dissertations for your research on the topic "Architecture dataflow".

Next to every work in the list of references you will find an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, if the corresponding details are present in the metadata.

Browse dissertations on a wide variety of disciplines and compile your bibliography correctly.

1

Iannucci, Robert A. "A dataflow/von Neumann hybrid architecture." Thesis, Massachusetts Institute of Technology, 1988. http://hdl.handle.net/1721.1/14778.

2

Benjamin, Steven I. "Dataflow : overview and simulation /." Online version of thesis, 1988. http://hdl.handle.net/1850/10221.

3

Narayanaswamy, Ramya Priyadharshini. "Design of a Power-aware Dataflow Processor Architecture." Thesis, Virginia Tech, 2010. http://hdl.handle.net/10919/34192.

Abstract:
In a sensor-monitoring embedded computing environment, the data from a sensor is an event that triggers the execution of an application. A sensor node consists of multiple sensors and a general-purpose processor that handles the multiple events by deploying an event-driven software model. The software overheads of general-purpose processors result in energy inefficiency. What is needed is a class of special-purpose processing elements that are more energy efficient for computation. In the past, special-purpose microcontrollers have been designed that are energy efficient for the targeted application space; however, reuse of the same design techniques is not feasible for other application domains. Therefore, this thesis presents a power-aware dataflow processor architecture targeted at the electronic textile computing space. The processor architecture has no instructions and handles multiple events inherently, without deploying software methods. This thesis also shows that the power-aware implementation reduces the overall static power consumption.
Master of Science
4

Moser, Nico, Carsten Gremzow, and Matthias Menge. "Interconnection Optimization for Dataflow Architectures." Universitätsbibliothek Chemnitz, 2007. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200700950.

Abstract:
In this paper we present a dataflow processor architecture based on [1], which is driven by control-flow-generated tokens. We show the special properties of this architecture with regard to scalability, extensibility, and parallelism. In this context we outline the application scope and compare our approach with related work. Advantages and disadvantages are discussed, and we suggest solutions to the disadvantages. Finally, an example implementation of this architecture is given and we take a look at further developments. We believe the features of this basic approach predestine the architecture especially for embedded systems and systems-on-chip.
5

Ruggiero, C. A. "Throttle mechanisms for the Manchester Dataflow Machine." Thesis, University of Manchester, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.382765.

6

Li, Feng. "Compiling for a multithreaded dataflow architecture : algorithms, tools, and experience." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2014. http://tel.archives-ouvertes.fr/tel-00992753.

Abstract:
Across the wide range of multiprocessor architectures, all seem to share one common problem: they are hard to program. It is a general belief that parallelism is a software problem, and that perhaps we need more sophisticated compilation techniques to partition the application into concurrent threads. Many experts also make the point that the underlying architecture plays an equally important role, and must be addressed before one may expect significant progress in the programmability of multiprocessors. Our approach favors a convergence of these viewpoints. The convergence of dataflow and von Neumann architectures promises latency tolerance, the exploitation of a high degree of parallelism, and light thread-switching cost. Multithreaded dataflow architectures require a high degree of parallelism to tolerate latency; on the other hand, it is error-prone for programmers to partition the program into a large number of fine-grain threads. To reconcile these facts, we aim to advance the state of the art in automatic thread partitioning, in combination with programming-language support for coarse-grain, functionally deterministic concurrency. This thesis presents a general thread-partitioning algorithm for transforming sequential code into a parallel dataflow program targeting a multithreaded dataflow architecture. Our algorithm operates on the program dependence graph and on the static single assignment form, extracting task, pipeline, and data parallelism from arbitrary control flow, and coarsening its granularity using a generalized form of typed fusion. We design a new intermediate representation to ease code generation for an explicit-token-match dataflow execution model, and we implement a GCC-based prototype. We also evaluate coarse-grain dataflow extensions of OpenMP in the context of a large-scale, 1024-core, simulated multithreaded dataflow architecture. These extensions and the simulated architecture allow the exploration of innovative memory models for dataflow computing. We evaluate these tools and models on realistic applications.
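As background for the execution model mentioned in this abstract: in an explicit-token-match dataflow machine, operands travel as tagged tokens, and an instruction fires only once tokens with matching tags have arrived on all of its inputs. The toy interpreter below is a sketch for intuition only; the node names and API are invented here and are not the thesis's intermediate representation or toolchain:

```python
from collections import defaultdict

class DataflowGraph:
    """Fires a node once tokens with the same tag have arrived on all inputs."""

    def __init__(self):
        self.ops = {}                      # node name -> (function, arity)
        self.edges = defaultdict(list)     # node name -> [(dst node, dst port)]
        self.waiting = defaultdict(dict)   # (node, tag) -> {port: value}

    def node(self, name, fn, arity):
        self.ops[name] = (fn, arity)

    def connect(self, src, dst, port):
        self.edges[src].append((dst, port))

    def _send(self, node, port, tag, value, ready):
        fn, arity = self.ops[node]
        slot = self.waiting[(node, tag)]
        slot[port] = value
        if len(slot) == arity:             # explicit token match succeeded: fire
            del self.waiting[(node, tag)]
            ready.append((node, tag, fn(*(slot[p] for p in range(arity)))))

    def run(self, initial_tokens):
        ready, results = [], {}
        for node, port, tag, value in initial_tokens:
            self._send(node, port, tag, value, ready)
        while ready:
            node, tag, value = ready.pop()
            if not self.edges[node]:       # no successors: graph output
                results[(node, tag)] = value
            for dst, port in self.edges[node]:
                self._send(dst, port, tag, value, ready)
        return results

# (a + b) * (a - b), evaluated for two independent tags (activations) at once.
g = DataflowGraph()
g.node("add", lambda x, y: x + y, 2)
g.node("sub", lambda x, y: x - y, 2)
g.node("mul", lambda x, y: x * y, 2)
g.connect("add", "mul", 0)
g.connect("sub", "mul", 1)
tokens = []
for tag, (a, b) in enumerate([(5, 3), (10, 4)]):
    tokens += [("add", 0, tag, a), ("add", 1, tag, b),
               ("sub", 0, tag, a), ("sub", 1, tag, b)]
print(g.run(tokens))   # {('mul', 0): 16, ('mul', 1): 84} (order may vary)
```

The tag plays the role of an activation context (a loop iteration, for example), which is what lets independent activations of the same graph proceed in parallel.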
7

Motiwala, Quaeed. "Optimizations for acyclic dataflow graphs for hardware-software codesign." Thesis, This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-06302009-040504/.

8

Savaş, Süleyman. "Linear Algebra for Array Signal Processing on a Massively Parallel Dataflow Architecture." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-2192.

Abstract:
This thesis presents deliberations on the implementation of the Gentleman-Kung systolic array for QR decomposition using Givens rotations, in the context of radar signal processing. The systolic array of Givens rotations is implemented and analysed on a massively parallel processor array (MPPA), the Ambric Am2045, and the tools dedicated to the MPPA are tested in terms of engineering efficiency. aDesigner, which is built on the Eclipse environment and was produced for the Ambric chip family, is used for programming, simulation, and performance analysis. Two parallel matrix multiplications were implemented to become familiar with the architecture and tools, and systolic arrays of different sizes were implemented and compared with each other. For programming, the ajava and astruct languages are provided; however, they do not support floating-point numbers, so fixed-point arithmetic is used in the systolic-array implementation of the Givens rotations. Stable and precise numerical results are obtained as outputs of the algorithms, but the analysis results are not reliable because of the performance analysis tools.
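For context, a Givens rotation zeroes one matrix entry at a time, and the Gentleman-Kung array pipelines these rotations across a triangular grid of cells. Below is a plain sequential sketch of Givens-rotation QR, illustrative only: it is not the Ambric implementation, and it uses floating point, whereas the thesis had to fall back on fixed-point arithmetic:

```python
import math

def givens(a, b):
    """Rotation (c, s) that zeroes b in the vector [a, b]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def qr_givens(A):
    """QR decomposition of an m x n matrix (list of rows) by Givens rotations."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):        # zero the entries below R[j][j]
            c, s = givens(R[i - 1][j], R[i][j])
            for k in range(n):               # rotate rows i-1 and i of R
                R[i - 1][k], R[i][k] = (c * R[i - 1][k] + s * R[i][k],
                                        -s * R[i - 1][k] + c * R[i][k])
            for k in range(m):               # accumulate Q = Q * G^T
                Q[k][i - 1], Q[k][i] = (c * Q[k][i - 1] + s * Q[k][i],
                                        -s * Q[k][i - 1] + c * Q[k][i])
    return Q, R                              # A = Q R, with R upper triangular

Q, R = qr_givens([[6.0, 5.0], [4.0, 3.0], [2.0, 1.0]])
# Entries of R below the diagonal are now (numerically) zero.
```

In the systolic version, each rotation's (c, s) pair is computed by a boundary cell and streamed to the internal cells of its row, so successive matrix rows enter the array in pipelined fashion.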

9

Savaş, Süleyman. "Linear Algebra for Array Signal Processing on a Massively Parallel Dataflow Architecture." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-4137.

Abstract:
This thesis presents deliberations on the implementation of the Gentleman-Kung systolic array for QR decomposition using Givens rotations, in the context of radar signal processing. The systolic array of Givens rotations is implemented and analysed on a massively parallel processor array (MPPA), the Ambric Am2045, and the tools dedicated to the MPPA are tested in terms of engineering efficiency. aDesigner, which is built on the Eclipse environment and was produced for the Ambric chip family, is used for programming, simulation, and performance analysis. Two parallel matrix multiplications were implemented to become familiar with the architecture and tools, and systolic arrays of different sizes were implemented and compared with each other. For programming, the ajava and astruct languages are provided; however, they do not support floating-point numbers, so fixed-point arithmetic is used in the systolic-array implementation of the Givens rotations. Stable and precise numerical results are obtained as outputs of the algorithms, but the analysis results are not reliable because of the performance analysis tools.

10

Pradal, Christophe. "Architecture de dataflow pour des systèmes modulaires et génériques de simulation de plante." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS034.

Abstract:
Biological modeling, particularly of plant growth and functioning, is a rapidly expanding field that is useful in addressing climate change and food security issues at the global level. Modeling and simulation are essential tools for understanding the complex relationships between plant architecture and the processes that influence plant growth in a changing environment. For plant modeling, a large number of formalisms have been developed in many disciplines and at different scales of representation. The objective of this thesis is to define a modular architecture that makes it possible to simulate structural-functional plant systems by reusing and assembling different existing models. We first study the different approaches to software reuse proposed by Krueger, then blackboard systems and scientific workflow systems. These approaches are used to make software artifacts cooperate and to reuse and assemble them in a modular manner. Based on the observation that these systems provide the abstractions necessary for the integration of various artifacts, our working hypothesis is that a hybrid architecture, based on blackboard systems with dataflow-driven procedural control, would achieve modularity while allowing the modeler to maintain control over execution. In Chapter 2, we describe OpenAlea, a software-component platform with a scientific workflow system that allows the assembly and composition of models through a visual programming interface. In Chapter 3, we propose a data structure for the blackboard, combining a topological representation of plant architecture at different scales, the Multiscale Tree Graph, with its geometric spatialization using the 3D PlantGL library. In Chapter 4, we present lambda-dataflows, an extension of dataflows that makes it possible to couple simulation and analysis. Then, in Chapter 5, we present a first application, which illustrates the use of a generic gramineous leaf model in different plant models. Finally, in Chapter 6, we present the architectural elements used to develop a generic framework for modelling the development of foliar diseases in an architectured canopy. The architecture presented in this thesis and its implementation in OpenAlea are a first step towards open integrative modeling platforms allowing the cooperation of heterogeneous models in biology. The use of the scientific workflow formalism in analysis and simulation makes it possible to envision, in the short term, the development of large-scale collaborative and distributed simulation platforms.
11

Guo, Jinghong. "Distributed, Modular, Open Control Architecture for Power Conversion Systems." Diss., Virginia Tech, 2005. http://hdl.handle.net/10919/27900.

Abstract:
Due to close coupling to hardware and lack of software engineering technologies, the control software in digitally controlled power conversion systems is difficult to design and maintain. This is a natural consequence of a topology- or application-driven design approach. This research work proposes a distributed, modular, open control architecture for power conversion systems to reduce control design complexity, encapsulate and localize design dependencies, reduce unnecessary redesign effort and improve software quality. Dataflow style is chosen as the architectural style for the proposed control architecture based on comparative analysis. The detailed implementation of the dataflow architecture is presented. The resulting dataflow control software is evaluated in comparison to the legacy approach to control design used in industry and academia. The dataflow control software for a 3-phase voltage source inverter is also tested on a real PEBB-based converter system. To further explore the flexibility of control composition that is brought by the dataflow approach, the feasibility of dynamic control reconfiguration is also presented as an important future research direction.
Ph. D.
12

Lesparre, Youen. "Evaluation de l'affectation des tâches sur une architecture à mémoire distribuée pour des modèles flot de données." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066086/document.

Abstract:
With the increasing use of smartphones, connected objects, and automated vehicles, embedded systems have become ubiquitous in our living environment. These systems are often highly constrained in terms of power consumption and size. They are more and more often implemented with many-core processor arrays, which allow rapid design that meets stringent real-time constraints while operating at relatively low frequency, with reduced power consumption. Running an application on a processor array requires dispatching its tasks on the processors in order to meet capacity and performance constraints. This mapping problem is known to be NP-complete. The contributions of this thesis are threefold. First, we extend important notions from the Cyclo-Static Dataflow Graph to the Phased Computation Graph model, together with two equivalent sufficient conditions of liveness. Second, we present a random dataflow graph generator able to generate Synchronous Dataflow Graphs, Cyclo-Static Dataflow Graphs, and Phased Computation Graphs; the generator can produce live dataflow graphs of up to 10,000 tasks in less than 30 seconds, and it is compared with SDF3 and PREESM. Third, and most important, we propose a new method for evaluating a mapping using the Synchronous Dataflow Graph and Cyclo-Static Dataflow Graph models. The method efficiently evaluates the memory footprint of the communications of a dataflow graph mapped on a distributed-memory architecture. The evaluation comes in two versions: the first guarantees a live mapping, while the second accounts for a throughput constraint. The evaluation method is tested on dataflow graphs generated by Turbine and on real-life applications.
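As a refresher on the models involved: in a Synchronous Dataflow Graph each actor produces and consumes a fixed number of tokens per firing, and a consistent graph has a repetition vector giving how often each actor fires per periodic iteration, from which the memory needed by each communication channel can be bounded. The sketch below is a minimal illustration under invented rates; it is not the thesis's evaluation method or the Turbine generator:

```python
from fractions import Fraction
from math import lcm

# Channels: (producer, tokens produced per firing, consumer, tokens consumed per firing)
channels = [("src", 2, "filter", 3), ("filter", 1, "sink", 2)]

def repetition_vector(channels):
    """Solve prod * q[producer] == cons * q[consumer] for every channel."""
    rates = {channels[0][0]: Fraction(1)}
    changed = True
    while changed:                          # simple fixed-point propagation
        changed = False
        for p, prod, c, cons in channels:
            if p in rates and c not in rates:
                rates[c] = rates[p] * prod / cons; changed = True
            elif c in rates and p not in rates:
                rates[p] = rates[c] * cons / prod; changed = True
            elif p in rates and c in rates:
                assert rates[p] * prod == rates[c] * cons, "inconsistent SDF graph"
    denom = lcm(*(r.denominator for r in rates.values()))
    return {a: int(r * denom) for a, r in rates.items()}

q = repetition_vector(channels)
print(q)                                    # {'src': 3, 'filter': 2, 'sink': 1}
# A trivial (loose) per-channel memory bound for one periodic iteration:
for p, prod, c, cons in channels:
    print(f"{p} -> {c}: {q[p] * prod} tokens per iteration")
```

A mapping-aware evaluation, like the one the thesis proposes, would then count only the channels whose endpoints land on different processors of the distributed-memory target.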
13

Voigt, Sven-Ole. "Dynamically reconfigurable dataflow architecture for high performance digital signal processing on multi FPGA platforms." Aachen Shaker, 2008. http://d-nb.info/992481694/04.

14

Silva, Antonio Carlos Fernandes da. "ChipCflow: tool for convert C code in a static dataflow architecture in reconfigurable hardware." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-30062015-141638/.

Abstract:
A growing search for alternative architectures and software has been noted in recent years. This search is driven by advances in hardware technology, and such advances must be complemented by innovations in design methodologies and in test and verification techniques in order to use the technology effectively. Alternative architectures and software generally exploit the parallelism of applications, unlike the von Neumann model. Among high-performance alternative architectures is the dataflow architecture, in which program execution is driven by data availability, so parallelism is intrinsic to the system. Dataflow architectures have again become a prominent research area due to hardware advances, in particular the advances in reconfigurable computing and Field Programmable Gate Arrays (FPGAs). The ChipCflow project is a tool for executing algorithms as dynamic dataflow graphs in FPGAs. This thesis describes the development of a code-conversion tool that generates applications for a static dataflow architecture, and presents the ChipCflow project of which the conversion tool is part. The algorithm to be converted is specified in the C language and converted into a hardware description language, following the model proposed by the ChipCflow project. The results are a proof of concept of converting high-level language code into a dataflow architecture to be used in an FPGA.
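The kind of translation this abstract describes, from an imperative expression to a static dataflow graph of operator nodes, can be sketched compactly. The toy converter below works on Python expressions via the `ast` module rather than on C, and its netlist format is invented for illustration; the real ChipCflow flow emits a hardware description language:

```python
import ast, itertools

# Toy converter: turn an arithmetic expression into a static dataflow netlist,
# where every operator becomes a node and every value flow becomes an edge.
counter = itertools.count()

def to_dataflow(expr):
    nodes, edges = {}, []
    def visit(n):
        if isinstance(n, ast.BinOp):
            op = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}[type(n.op)]
            name = f"{op}{next(counter)}"
            nodes[name] = op
            edges.append((visit(n.left), name))   # data edge: left operand
            edges.append((visit(n.right), name))  # data edge: right operand
            return name
        if isinstance(n, ast.Name):
            nodes[n.id] = "input"
            return n.id
        raise NotImplementedError(ast.dump(n))
    output = visit(ast.parse(expr, mode="eval").body)
    return nodes, edges, output

nodes, edges, out = to_dataflow("a*b + c*d")
print(nodes)   # operator and input nodes, e.g. {'a': 'input', ..., 'add0': 'add'}
print(edges)   # data edges, e.g. ('a', 'mul1'), ('mul1', 'add0'), ...
print("output node:", out)
```

Each netlist node would then map onto a hardware operator with handshake signals, so that a node fires as soon as tokens are present on its input edges.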
15

Ngo, Dinh Thanh. "Runtime mapping of dynamic dataflow applications on heterogeneous multiprocessor platforms." Thesis, Lorient, 2015. http://www.theses.fr/2015LORIS371/document.

Abstract:
Modern multimedia applications are subject to increasing complexity, with widespread standards. This has led to interest in the dataflow approach, which offers a powerful high-level perspective on parallel computations. In the meantime, the emergence of massively parallel architectures has revealed a trend towards heterogeneous Multi-Processor Systems-on-Chip (MPSoCs), which offer a better performance and energy tradeoff than their homogeneous counterparts. However, this also poses challenges for the mapping of multimedia applications onto such complex architectures. This thesis presents an adaptive methodology for mapping dataflow applications onto heterogeneous MPSoCs, focusing on video decoders specified in RVC-CAL, a dedicated dataflow language for video applications. Existing static approaches cannot capture all behaviors of dynamic dataflow applications, so the mapping must be adapted according to the input data. The algorithm offers adaptive parameters combined with our analytical communication model to improve performance while considering load balancing. We evaluate our algorithms on a set of randomly generated benchmarks and on real video decoders such as MPEG4-SP and HEVC. Experimental results reveal that our mapping methodology is fast (in milliseconds) and that runtime remapping significantly improves the initial mapping. In the remapping process, we take the migration cost into account, because the reconfiguration time also contributes to the overall performance.
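To make the mapping problem concrete, here is a toy greedy mapper that places each actor on the core minimizing a combined execution plus inter-core communication cost. All names, costs, and the cost model itself are illustrative assumptions, not the thesis's algorithm:

```python
# Greedy sketch: visit actors in order and pick, for each one, the core with the
# lowest sum of current load, execution cost, and communication penalties to
# already-placed neighbours. Real runtime mappers refine such initial placements.

def greedy_map(actors, cores, exec_cost, edges, comm_cost):
    placement, load = {}, {c: 0.0 for c in cores}
    for a in actors:
        best, best_cost = None, float("inf")
        for c in cores:
            cost = load[c] + exec_cost[a][c]
            for src, dst, tokens in edges:
                other = dst if src == a else src if dst == a else None
                if other in placement and placement[other] != c:
                    cost += tokens * comm_cost   # penalty for crossing cores
            if cost < best_cost:
                best, best_cost = c, cost
        placement[a] = best
        load[best] += exec_cost[a][best]
    return placement

actors = ["parser", "idct", "motion", "merge"]
cores = ["risc0", "risc1", "dsp0"]               # heterogeneous: dsp0 suits idct
exec_cost = {"parser": {"risc0": 4, "risc1": 4, "dsp0": 6},
             "idct":   {"risc0": 9, "risc1": 9, "dsp0": 2},
             "motion": {"risc0": 5, "risc1": 5, "dsp0": 4},
             "merge":  {"risc0": 3, "risc1": 3, "dsp0": 5}}
edges = [("parser", "idct", 2), ("parser", "motion", 2),
         ("idct", "merge", 1), ("motion", "merge", 1)]
print(greedy_map(actors, cores, exec_cost, edges, comm_cost=1.0))
```

In a dynamic dataflow setting the execution costs themselves vary with the input data, which is why the thesis complements the initial mapping with runtime remapping that also weighs the migration cost.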
16

Mandlekar, Anup Shrikant. "An Application Framework for a Power-Aware Processor Architecture." Thesis, Virginia Tech, 2012. http://hdl.handle.net/10919/34484.

Abstract:
Instruction-set-based general-purpose processors are not energy-efficient for event-driven applications. The E-textiles group at Virginia Tech proposed a novel dataflow processor architecture to bridge the gap between event-driven applications and the target architecture. The architecture, although promising in terms of performance and energy efficiency, had been explored for only a limited number of applications. This thesis presents a model-driven approach to the design of an application framework that facilitates the rapid development of software applications for testing the architecture's performance. The application framework is integrated with the prior automation framework, bringing software applications to the right level of abstraction. The processor architecture design is made flexible and scalable, making it suitable for a wide range of applications. Additionally, an embedded-flash-memory-based architecture design is proposed to reduce static power consumption; this thesis estimates a significant reduction in overall power consumption with the incorporation of flash memory.
Master of Science
17

Shelor, Charles F. "Dataflow Processing in Memory Achieves Significant Energy Efficiency." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1248478/.

Abstract:
The large difference between processor CPU cycle time and memory access time, often referred to as the memory wall, severely limits the performance of streaming applications. Some data centers have shown servers being idle three out of four clocks. High-performance instruction-sequenced systems are not energy efficient; the execute stage of even a simple pipelined processor uses only 9% of the pipeline's total energy. A hybrid dataflow system within a memory module is shown to have 7.2 times the performance with 368 times better energy efficiency than an Intel Xeon server processor on the analyzed benchmarks. The dataflow implementation exploits the inherent parallelism and pipelining of the application to improve performance without the overhead functions of caching, instruction fetch, instruction decode, instruction scheduling, reorder buffers, and speculative execution used by high-performance out-of-order processors. Coarse-grain reconfigurable logic in an energy-efficient silicon process provides the flexibility to implement multiple algorithms in a low-energy solution. Integrating the logic within a 3D-stacked memory module provides lower-latency, higher-bandwidth access to memory while operating independently from the host system processor.
18

Voigt, Sven O. [Verfasser]. "Dynamically Reconfigurable Dataflow Architecture for High-Performance Digital Signal Processing on Multi-FPGA Platforms / Sven O Voigt." Aachen : Shaker, 2009. http://d-nb.info/116130908X/34.

19

Zheng, Chunfang. "GRAPHICAL MODELING AND SIMULATION OF A HYBRID HETEROGENEOUS AND DYNAMIC SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE." UKnowledge, 2004. http://uknowledge.uky.edu/gradschool_theses/249.

Abstract:
A single-chip, hybrid, heterogeneous, and dynamic shared-memory multiprocessor architecture is being developed which may be used for real-time and non-real-time applications. This architecture can execute any application described by a dataflow (process flow) graph of any topology; it can also dynamically reconfigure its structure at the node and processor-architecture levels and reallocate its resources to maximize performance and to increase reliability and fault tolerance. Dynamic change in the architecture is triggered by changes in parameters such as application input data rates, process execution times, and process request rates. The architecture is a Hybrid Data/Command Driven Architecture (HDCA). It operates as a dataflow architecture, but at the process level rather than the instruction level. This thesis focuses on the development, testing, and evaluation of new graphical software (hdca), which first performs a static resource allocation for the architecture to meet the timing requirements of an application and then simulates the architecture executing the application with the statically assigned resources and parameters. While simulating the architecture executing an application, the software graphically and dynamically displays parameters and mechanisms important to the architecture's operation and performance. The new graphical software is able to show the system- and node-level dynamic capability of the HDCA, can model a fixed or varying input data rate, and also allows fault-tolerance analysis of the architecture.
20

Cavenaghi, Marcos Antônio. "Implementação de um simulador para a arquitetura de dados Wolf." Universidade de São Paulo, 1992. http://www.teses.usp.br/teses/disponiveis/54/54132/tde-08062009-102639/.

Abstract:
This work presents the Proto-WOLF dataflow architecture and the implementation of a simplified event-driven simulator for this architecture. The WOLF project is a proposal for the implementation of a supercomputer based on the dynamic dataflow model with variable granularity. In order to place the work in context, some of the basic concepts involved in simulation are presented, together with a survey of the most relevant work in dataflow. The preliminary simulation results are presented, analyzed, and compared with known results from the Manchester Dataflow Machine. When the simulated Proto-WOLF machine has all its unique features disabled, it is expected to behave in a Manchester-like fashion, and the results obtained fully agree with this. It is therefore possible to conclude that the implemented simulator behaves properly.
21

Cavenaghi, Marcos Antônio. "Implementação e estudo da arquitetura a fluxo de dados Wolf." Universidade de São Paulo, 1997. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-01062009-111139/.

Abstract:
This work presents the WOLF dataflow architecture. WOLF was proposed in the light of some known problems identified in previous work, such as the execution of sequential code and the handling of data structures (vectors and matrices). WOLF is based on the dynamic dataflow model and explores variable granularity, the finest being at instruction level. Some concepts developed in the design of other hybrid architectures, macro-dataflow and multithreading among them, guided the WOLF implementation. To study the WOLF architecture, a time-driven simulator (Saw) was developed in the object-oriented language C++; the code can be compiled by any 32-bit ANSI-standard compiler. This code was exhaustively tested, and the numeric results obtained in the experiments were equal to those obtained with a standard von Neumann architecture. The study identified some problems with the WOLF architecture, and some proposals were implemented in the simulator to try to identify their causes. The results led to an alteration of the WOLF architecture. The newly proposed architecture (WOLF II) is described in the last chapter, but it was not submitted to experiments as WOLF was.
22

White, Joey. "USING DATAFLOW ARCHITECTURE TO SOLVE THE TRANSPORT LAG PROBLEM WHEN INTERFACING WITH AN ENGINEERING MODEL FLIGHT COMPUTER IN A TELEMETRY SIMULATION." International Foundation for Telemetering, 1991. http://hdl.handle.net/10150/613183.

Abstract:
International Telemetering Conference Proceedings / November 04-07, 1991 / Riviera Hotel and Convention Center, Las Vegas, Nevada
One of the most challenging technical problems in the development of a spacecraft telemetry simulation is the interface with a flight computer running real-world flight software. The ability of the simulation to satisfy flight software requests for telemetry data, and to load, mode, and control the flight software along with the simulation, can be constrained or degraded using conventional interface solutions. Telemetry dataflow architecture systems can be utilized to solve the interface problems with fewer constraints. This is an especially attractive solution in a telemetry simulation where the telemetry system can also be used to format and serialize spacecraft telemetry, and to receive and preprocess commands. This paper discusses the concepts developed for such a system for a training simulation of the Orbital Maneuvering Vehicle for NASA at Johnson Space Center.
23

Arumí, Albó Pau. "Real-time multimedia on off-the-shelf operating systems: from timeliness dataflow models to pattern languages." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7558.

Abstract:
Software-based multimedia systems that deal with real-time audio, video, and graphics processing are pervasive today, not only in desktop workstations but also in ultra-light devices such as smartphones. The fact that most of the processing is done in software, using the high-level hardware abstractions and services offered by the underlying operating systems and library stacks, enables quick application development. In addition to this flexibility and immediacy (compared to hardware-oriented platforms), such platforms also offer soft real-time capabilities with appropriate latency bounds. Nevertheless, experts in the multimedia domain face a serious challenge: the features and complexity of their applications are growing rapidly; meanwhile, real-time requirements (such as low latency) and reliability standards increase. This thesis focuses on providing multimedia domain experts with a workbench of tools they can use to model and prototype multimedia processing systems. Such tools contain platforms and constructs that reflect the requirements of the domain and application, and not accidental properties of the implementation (such as thread synchronization and buffer management). In this context, we address two distinct but related problems: the lack of models of computation that can deal with continuous multimedia stream processing in real time, and the lack of appropriate abstractions and systematic development methods that support such models. Many actor-oriented models of computation exist, and they offer better abstractions than prevailing software engineering techniques (such as object orientation) for building real-time multimedia systems. The family of Process Networks and Dataflow models based on networks of connected processing actors are the best suited for continuous stream processing. Such models allow designs to be expressed close to the problem domain (instead of focusing on implementation details such as thread synchronization), and they enable better modularization and hierarchical composition. This is possible because the model does not over-specify how the actors must run, but only imposes data dependencies in a declarative-language fashion. These models deal with multi-rate processing and hence with complex periodic actor execution schedules. The problem is that the models do not incorporate the concept of time in a useful way and, hence, the periodic schedules do not guarantee real-time and low-latency behavior. This dissertation overcomes this shortcoming by formally describing a new model that we named Time-Triggered Synchronous Dataflow (TTSDF), whose periodic schedules can be interleaved by several time-triggered activations so that the inputs and outputs of the processing graph are regularly serviced. The TTSDF model has the same expressiveness (or equivalent computability) as the Synchronous Dataflow (SDF) model, with the advantage that it guarantees minimum latency and the absence of gaps and jitter in the output. Additionally, it enables run-time load balancing between callback activations, and parallelization. Actor-oriented models are not off-the-shelf solutions and do not suffice for building multimedia systems in a systematic engineering approach. We address this problem by proposing a catalog of domain-specific design patterns organized in a pattern language. This pattern language provides design reuse, paying special attention to the context in which a design solution is applicable, the competing forces it needs to balance, and the implications of its application. The proposed patterns focus on how to organize different kinds of actor connections, transfer tokens between actors, enable human interaction with the dataflow engine and, finally, rapidly prototype user interfaces on top of the dataflow engine, creating complete and extensible applications. As a case study, we present an object-oriented framework (CLAM), and specific applications built upon it, that make extensive use of the contributed TTSDF model and patterns.
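A rough intuition for the time-triggered interleaving (a toy sketch under invented names, not CLAM's actual scheduler): each periodic callback services the graph's input at a fixed instant, runs one period of the static schedule, and services the output, so the stream is produced without gaps as long as each period's work fits within the callback deadline:

```python
from collections import deque

class TTSDFRunner:
    """Toy time-triggered wrapper around a static dataflow schedule: each
    periodic callback services input, fires one period of the schedule in
    order, then services output."""

    def __init__(self, schedule):
        self.schedule = schedule          # list of (fn, src_queue, dst_queue)

    def callback(self, in_queue, out_queue, block):
        in_queue.extend(block)            # input serviced at the period start
        for fn, src, dst in self.schedule:
            while src:                    # fire while tokens are available
                dst.append(fn(src.popleft()))
        result = [out_queue.popleft() for _ in range(len(out_queue))]
        return result                     # output serviced at the period end

# Graph: in -> gain -> clip -> out, run for two callback periods.
a, b, c = deque(), deque(), deque()
runner = TTSDFRunner([(lambda x: 0.5 * x, a, b),
                      (lambda x: max(-1.0, min(1.0, x)), b, c)])
for block in ([0.2, 3.0], [-4.0, 0.1]):
    print(runner.callback(a, c, block))   # [0.1, 1.0] then [-1.0, 0.05]
```

Pinning the I/O to the callback boundaries is what removes gaps and jitter at the output; the firings themselves can be scheduled freely, or in parallel, inside the period.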
24

Amstel, Duco van. "Optimisation de la localité des données sur architectures manycœurs." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM019/document.

Abstract:
The continuous evolution of computer architectures has been an important driver of research in code optimization and compiler technologies. A trend in this evolution that can be traced back over decades is the growing ratio between the available computational power (IPS, FLOPS, ...) and the corresponding bandwidth between the various levels of the memory hierarchy (registers, cache, DRAM). As a result, the reduction of the amount of memory communications that a given code requires has been an important topic in compiler research. A basic principle for such optimizations is the improvement of temporal data locality: grouping all references to a single data point as close together as possible, so that the data is only required for a short duration and can then be moved to distant memory (DRAM) without any further memory communications. Yet another architectural evolution has been the advent of the multicore era and, in the most recent years, the first generation of manycore designs. These architectures have considerably raised the bar on the amount of parallelism available to programs and algorithms, but this is again limited by the available bandwidth for communications between the cores. This brings issues that previously were the sole preoccupation of distributed computing into the world of compiling and code optimization techniques. In this document we present a first dive into a new optimization technique which promises both a high-level model for data reuse and a large field of potential applications, a technique which we refer to as generalized tiling. It finds its source in the already well-known loop tiling technique, which has been applied with success to improve data locality for both registers and cache memory in the case of nested loops. This new "flavor" of tiling has a much broader perspective and is not limited to the case of nested loops. It is built on a new representation, the memory-use graph, which is tightly linked to a new model of both memory usage and communication requirements and which can be used for all forms of iterated code. Generalized tiling expresses data locality as an optimization problem, for which multiple solutions are proposed. With the abstraction introduced by the memory-use graph it is possible to solve this optimization problem in different environments. For experimental evaluations we show how this new technique can be applied in the context of loops, nested or not, as well as for computer programs expressed in a dataflow language. Anticipating the use of generalized tiling to also distribute computations over the cores of a manycore architecture, we provide some insight into the methods that can be used to model communications and their characteristics on such architectures. As a final point, and in order to show the full expressiveness of the memory-use graph, and even more of the underlying memory usage and communication model, we turn to the topic of performance debugging and the analysis of execution traces. Our goal is to provide feedback on the evaluated code and its potential for further improvement of data locality. Such traces may contain information about memory communications during an execution and show strong similarities with the previously studied optimization problem. This brings us to a short introduction to the algorithmics of directed graphs and the formulation of some new heuristics for the well-studied topic of reachability, as well as for the much less known problem of convex partitioning.
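The classic loop tiling that generalized tiling extends is easy to show on a matrix product: the tiled variant revisits small blocks that fit in cache instead of streaming whole rows and columns, improving temporal locality without changing the result. A minimal sketch, unrelated to the thesis's own benchmarks:

```python
# Classic loop tiling for temporal locality, shown on a matrix product.
# The tiled version touches B in T x T blocks that fit in cache, instead of
# streaming the whole of B for every row of A. T is a tuning parameter.

def matmul(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, T=4):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):                 # iterate over tiles
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):   # iterate inside a tile
                    for j in range(jj, min(jj + T, n)):
                        for k in range(kk, min(kk + T, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C

n = 6
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i * j % 5) for j in range(n)] for i in range(n)]
assert matmul(A, B, n) == matmul_tiled(A, B, n)   # same result, better locality
```

Generalized tiling applies the same grouping idea to the memory-use graph of arbitrary iterated code, rather than only to the iteration space of a nested loop.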
25

Suzanne, Aurélie. "Decision Support Query Processing of Spanning Event Streams." Thesis, Nantes Université, 2022. http://www.theses.fr/2022NANU4022.

Abstract:
The Big Data era requires new processing architectures, among which streaming systems, which have become very popular. Those systems are able to summarize infinite data streams with aggregates on the most recent data. However, up to now, only point events have been considered; spanning events, which come with a duration, have been left aside, restricted to the persistent-database world only. In this thesis, a unified framework to deal with such stream mechanisms on spanning events is defined. Then, we develop an engine for Aggregate Continuous Queries (ACQs), which is able to incorporate event lifespans to provide exact aggregate computation, and which provides adapted structures for an efficient computation of sliding windows. This engine is further extended to handle shared computation of simultaneously running ACQs, while properly managing out-of-order events. In order to elaborate at runtime the most efficient query execution plan, a cost-based policy is followed. Throughout this thesis, many experiments have been carried out to show the pertinence and the efficiency of our approaches in a large variety of contexts and stream profiles.
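The distinction between point and spanning events can be made concrete with a toy aggregate. The sketch below is a hypothetical illustration, not the thesis engine: each event carries a lifespan [start, end), and a count aggregate over tumbling windows must credit the event to every window its lifespan overlaps, whereas a point event would fall into exactly one window.

```c
#include <stdio.h>

#define W 10     /* window width (time units): an assumption for the example */
#define NWIN 6

typedef struct { int start, end; } event_t;  /* spanning event, lifespan [start, end) */

int main(void) {
    /* hypothetical stream of spanning events */
    event_t ev[] = { {1, 4}, {8, 23}, {15, 16}, {30, 55} };
    int n = sizeof ev / sizeof ev[0];
    int count[NWIN] = {0};

    /* A point event would increment exactly one window; a spanning event
     * must be counted in every window its lifespan overlaps. */
    for (int i = 0; i < n; i++) {
        int first = ev[i].start / W;
        int last  = (ev[i].end - 1) / W;
        for (int w = first; w <= last && w < NWIN; w++)
            count[w]++;
    }
    for (int w = 0; w < NWIN; w++)
        printf("window [%2d,%2d): %d event(s)\n", w * W, (w + 1) * W, count[w]);
    return 0;
}
```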
26

Friston, S. "Low latency rendering with dataflow architectures." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/1544925/.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
The research presented in this thesis concerns latency in VR and synthetic environments. Latency is the end-to-end delay experienced by the user of an interactive computer system, between their physical actions and the perceived response to these actions. Latency is a product of the various processing, transport and buffering delays present in any current computer system. For many computer-mediated applications, latency can be distracting, but it is not critical to the utility of the application. Synthetic environments, on the other hand, attempt to facilitate direct interaction with a digitised world. Direct interaction here implies the formation of a sensorimotor loop between the user and the digitised world - that is, the user makes predictions about how their actions affect the world, and sees these predictions realised. By facilitating the formation of this loop, the synthetic environment allows users to directly sense the digitised world, rather than the interface, and induces perceptions such as that of the digital world existing as a distinct physical place. This has many applications for knowledge transfer and efficient interaction through the use of enhanced communication cues. The complication is that the formation of the sensorimotor loop that underpins this is highly dependent on the fidelity of the virtual stimuli, including latency. The main research questions we ask are how the characteristics of dataflow computing can be leveraged to improve the temporal fidelity of the visual stimuli, and what implications this has for other aspects of fidelity. Secondarily, we ask what effects latency itself has on user interaction. We test the effects of latency on physical interaction at levels previously hypothesized but unexplored. We also test for a previously unconsidered effect of latency on higher-level cognitive functions. To do this, we create prototype image generators for interactive systems and virtual reality, using dataflow computing platforms. We integrate these into real interactive systems to gain practical experience of the real, perceptible benefits of alternative rendering approaches, but also of the implications when they are subject to the constraints of real systems. We quantify the differences of our systems compared with traditional systems using latency and objective image fidelity measures. We use our novel systems to perform user studies into the effects of latency. Our high-performance apparatuses allow experimentation at latencies lower than previously tested in comparable studies. The low-latency apparatuses are designed to minimise what is currently the largest delay in traditional rendering pipelines, and we find that the approach is successful in this respect. Our 3D low-latency apparatus achieves lower latencies and higher fidelities than traditional systems; the conditions under which it can do this are, however, highly constrained. We do not foresee dataflow computing shouldering the bulk of the rendering workload in the future, but rather facilitating the augmentation of the traditional pipeline with a very high speed local loop. This may be an image distortion stage or otherwise. Our latency experiments revealed that many predictions about the effects of low latency should be re-evaluated and that experimenting in this range requires great care.
27

Astolfi, Vitor Fiorotto. "ChipCflow - em hardware dinamicamente reconfigurável." Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05032010-203142/.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Nos últimos anos, houve um grande avanço na computação reconfigurável, em particular em hardware que emprega Field-Programmable Gate Arrays. Porém, esse aumento de capacidade e desempenho aumentou a distância entre a capacidade de projeto e a disponibilidade de tecnologia para o desenvolvimento do projeto. As linguagens de programação imperativas de alto nível, como C, são mais apropriadas para o desenvolvimento de aplicativos complexos que as linguagens de descrição de hardware. Por isso, surgiram diversas ferramentas para o desenvolvimento de hardware a partir de código em C. A ferramenta ChipCflow, da qual faz parte este projeto, é uma delas. A execução dos programas por meio dessa ferramenta será completamente baseada em seu fluxo de dados, seguindo o modelo dinâmico encontrado nas arquiteturas de computadores a fluxo de dados, aproveitando ao máximo o paralelismo considerado natural desse modelo e as características do hardware parcialmente reconfigurável. Neste projeto em particular, o objetivo é a prova de conceito (proof of concept) para a criação de instâncias, em forma de operadores, de um algoritmo ChipCflow em hardware parcialmente reconfigurável, tendo como base a plataforma Virtex da Xilinx
In recent years, reconfigurable computing has become increasingly more advanced, especially in hardware that uses Field-Programmable Gate Arrays. However, this increase in capacity and performance has widened the gap between design capacity and the technology available for design development. Imperative high-level programming languages such as C are more appropriate for the development of complex algorithms than hardware description languages (HDL). For this reason, many ANSI C-like programming tools for the development of hardware came into existence. The ChipCflow tool, of which this project is part, is one of them. The execution of algorithms through this tool will be completely directed by data flow, according to the dynamic model found in dataflow architectures, taking advantage of their naturally high levels of parallelism and of the characteristics of partially reconfigurable hardware. In this project, the objective is a proof of concept for the creation of instances, in the form of operators, of a ChipCflow algorithm on partially reconfigurable hardware, taking the Xilinx Virtex platform as reference.
28

Lopes, Joelmir José. "ChipCflow - uma ferramenta para execução de algoritmos utilizando o modelo a fluxo de dados dinâmico em hardware reconfigurável." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05122012-154304/.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Devido à complexidade das aplicações, a demanda crescente por sistemas que usam milhões de transistores e hardware complexo; tem sido desenvolvidas ferramentas que convertem C em Linguagem de Descrição de Hardware, tais como VHDL e Verilog. Neste contexto, esta tese apresenta o projeto ChipCflow, o qual usa arquitetura a fluxo de dados, para implementar lógica de alto desempenho em Field Programmable Gate Array (FPGA). Maquinas a fluxo de dados são computadores programáveis, cujo hardware é otimizado para computação paralela de granularidade fina dirigida por dados. Em outras palavras, a execução de programas é determinado pela disponibilidade dos dados, assim, o paralelismo é intrínseco neste sistema. Por outro lado, com o avanço da tecnologia da microeletrônica, o FPGA tem sido utilizado principalmente devido a sua flexibilidade, facilidade para implementar sistemas complexos e paralelismo intrínseco. Um dos desafios é criar ferramentas para programadores que usam linguagem de alto nível (HLL), como a linguagem C, e produzir hardware diretamente. Essas ferramentas devem usar a máxima experiência dos programadores, o paralelismo das arquiteturas a fluxo de dados dinâmica, a flexibilidade e o paralelismo do FPGA, para produzir um hardware eficiente, otimizado para alto desempenho e baixo consumo de energia. O projeto ChipCflow é uma ferramenta que converte os programas de aplicação escritos em linguagem C para a linguagem VHDL, baseado na arquitetura a fluxo de dados dinâmica. O principal objetivo dessa tese é definir e implementar os operadores do ChipCflow, usando a arquitetura a fluxo de dados dinâmica em FPGA. Esses operadores usam tagged tokens para identificar dados, com base em instâncias de operadores. A implementação dos operadores e das instâncias usam um modelo de implementação assíncrono em FPGA para obter maior velocidade e menor consumo
Due to the complexity of applications and the growing demand for systems using millions of transistors and correspondingly complex hardware, tools that convert C into a Hardware Description Language (HDL), such as VHDL and Verilog, have been developed. In this context, this thesis presents the ChipCflow project, which uses a dataflow architecture to implement high-performance logic in Field-Programmable Gate Arrays (FPGAs). Dataflow machines are programmable computers whose hardware is optimized for fine-grain, data-driven parallel computation. In other words, the execution of programs is determined by data availability, so parallelism is intrinsic in these systems. On the other hand, with the advances in microelectronics technology, the FPGA has been used mainly because of its flexibility, the facilities it offers to implement complex systems, and its intrinsic parallelism. One of the challenges is to create tools for programmers who use a high-level language (HLL), such as C, and produce hardware directly. These tools should use the utmost experience of the programmers, the parallelism of the dynamic dataflow architecture, and the flexibility and parallelism of the FPGA to produce efficient hardware, optimized for high performance and low power consumption. The ChipCflow project is a tool that converts application programs written in C into VHDL, based on the dynamic dataflow architecture. The main goal of this thesis is to define and implement the operators of ChipCflow using the dynamic dataflow architecture in FPGA. These operators use tagged tokens to identify data based on instances of operators; their implementation and instances use an asynchronous implementation model in FPGA to achieve higher speed and lower power consumption.
29

Bhagyanath, Anoop [Verfasser], and Klaus [Akademischer Betreuer] Schneider. "Code Generation for Synchronous Control Asynchronous Dataflow Architectures / Anoop Bhagyanath ; Betreuer: Klaus Schneider." Kaiserslautern : Technische Universität Kaiserslautern, 2021. http://d-nb.info/122615428X/34.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
30

MAHIOUT, ABDERRAHMANE. "Placement et ordonnancement automatiques de programmes dataflow data-paralleles sur les architectures paralleles." Paris 11, 1996. http://www.theses.fr/1996PA112268.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
This thesis work took place within the 8 1/2 project of the parallel architectures team at LRI. The goal of this project is to develop the theoretical and software tools needed to run large numerical simulations on current parallel architectures. The work consisted in designing tools for the automatic mapping and scheduling of programs onto current parallel architectures. Indeed, the current environments of parallel machines do not allow them to be exploited easily: the programmer must have perfect knowledge of the underlying architecture to develop an application, which induces design complexity; moreover, the application is dependent on the target architecture and therefore not portable. We proposed program models (the E-DFG graph) and an architecture model that take into account the data-parallel context of applications. We then built mapping/scheduling tools based on the exploitation of locality and on an as-early-as-possible scheduling policy. These mapping/scheduling tools were experimented on graph benchmarks and then on the SP2 parallel machine, which highlighted the importance of granularity as well as the need to improve the modelling of the architecture.
31

Selva, Manuel. "Performance monitoring of throughput constrained dataflow programs executed on shared-memory multi-core architectures." Thesis, Lyon, INSA, 2015. http://www.theses.fr/2015ISAL0055/document.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Les progrès continus de la microélectronique couplés au problème de gestion de la puissance dissipée ont conduit les fabricants de processeurs à se tourner vers des puces dites multi-coeurs au début des années 2000. Ces processeurs sont composés de plusieurs unités de calcul indépendantes. Contrairement aux progrès précédents ces architectures multi-coeurs, le logiciel doit être en grande parti repensé pour tirer parti de toutes les unités de calcul. Il faut pouvoir paralléliser une application séquentielle en tâches le plus indépendantes possibles pour pouvoir les exécuter sur différentes unités de calcul. Pour cela, de nombreux modèles de programmations dits concurrents ont été proposés. Dans cette thèse nous nous intéressons aux programmes décrits à l’aide du modèle dataflow. Ce travail porte sur l’évaluation des performances de programmes dataflow (forme que revêtent typiquement des applications de types traitement de flux vidéos ou protocoles de communication) sur des architectures multi-coeurs. Plus particulièrement, le sujet de la thèse porte sur l’extension de modèles de programmation dataflow avec des éléments d’expression de propriétés de qualité de service ainsi que la prise en compte de ces éléments pour détecter, à l’exécution, les goulots d’étranglement de performance au sein des programmes. Les informations concernant les goulots d'étranglements collectées pendant l'exécution sont utilisées à la fois pour faire de l'analyse hors-ligne et pour faire des adaptations pendant l'exécution des programmes. Dans le premier cas, le programmeur utilise ces informations pour savoir quelles parties du programme dataflow il faut optimiser et pour savoir comment distribuer efficacement le programme sur les unités de calcul. Dans le second cas, les informations collectées sont utilisées par des mécanismes d'adaptation automatique afin de redistribuer le travail sur les différentes unités de calcul de façon plus efficace. Nous portons une attention particulière au profiling de l'utilisation faite par les applications dataflow du système mémoire. Les informations sur les échanges de données fournies par le modèle de programmation permettent d'exploiter de façon intelligente les architectures mémoires des machines multi-coeurs. Néanmoins, la complexité de ces dernières ne permet pas de façon générale d'évaluer statiquement l'impact sur les performances des accès mémoires. Nous proposons donc la mise en place d'un système de profiling mémoire pour des applications dataflow basé sur des mécanismes matériels
Because of physical limits, hardware designers have switched to parallel systems to exploit the still growing number of transistors per square millimeter of silicon. These parallel systems are made of several independent computing units. To benefit from these computing units, software must be changed: existing sequential applications have to be split into independent tasks to be executed in parallel on the different computing units. To that end, many concurrent programming models have been proposed and are in use today. We focus in this thesis on the dataflow concurrent programming model. This work is about performance evaluation of dataflow programs on multicore architectures. We propose to extend dataflow programming models with the notion of throughput constraints, and to take this information into account in the compilation tool chain to detect throughput bottlenecks at runtime. The profiling results gathered during the execution are used both for off-line analyses and to adapt the application during its execution. In the former case, the developer uses this information to know which parts of the dataflow program should be optimized and to distribute the program efficiently over the computing units. In the latter case, the profiling information is used by runtime adaptation mechanisms to distribute the work differently over the computing units. We place a particular focus on profiling the usage of the memory subsystem. The data exchange information provided by the programming model allows the memory subsystem of multicore architectures to be used efficiently. Nevertheless, the complexity of modern memory systems does not allow the impact of memory accesses on the global performance of the application to be evaluated statically. We therefore propose memory profiling dedicated to dataflow applications, based on hardware profiling mechanisms.
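A hypothetical flavor of the bottleneck detection described: if the programming model carries a declared throughput constraint per actor, runtime counters can be compared against it after each monitoring period. Everything below (actor names, rates, the sampling scheme) is invented for illustration and is not Selva's actual instrumentation.

```c
#include <stdio.h>

/* Toy post-mortem check: given per-actor firing counts sampled over a
 * monitoring period, flag actors whose observed rate falls below the
 * declared throughput constraint of the dataflow program. */
typedef struct {
    const char *name;
    long firings;        /* firings observed in the period */
    double required_hz;  /* throughput constraint from the program */
} actor_stat_t;

int main(void) {
    double period_s = 10.0;   /* monitoring period: an assumption */
    actor_stat_t a[] = {
        { "decode",  2500, 240.0 },
        { "filter",  2350, 240.0 },
        { "display", 2450, 240.0 },
    };
    for (int i = 0; i < 3; i++) {
        double rate = a[i].firings / period_s;
        printf("%-8s %7.1f Hz (need %6.1f) %s\n", a[i].name, rate,
               a[i].required_hz,
               rate < a[i].required_hz ? "<-- bottleneck candidate" : "");
    }
    return 0;
}
```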
32

Magna, Patrícia. "Redução dos bits de emparelhamento da máquina de fluxo de dados de Manchester." Universidade de São Paulo, 1992. http://www.teses.usp.br/teses/disponiveis/54/54132/tde-17042009-115457/.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
O modelo a fluxo de dados tem grande destaque em pesquisas em arquiteturas de alto desempenho. Neste modelo, o controle de execução é feito apenas pela disponibilidade dos dados, permitindo que seja explorado o máximo de paralelismo implícito em um programa. As propostas que serão expostas neste trabalho visam solucionar um particular problema da máquina de fluxo de dados de Manchester. Esta arquitetura para tratar código reentrante, impõe que as fichas de dados, além da indicação da instrução destino, possuam um rótulo. Estas informações extras, que formam 70% da ficha de dado, fazem com que a implantação da máquina seja complexa. Assim, o hardware impõe um sério limite a velocidade de processamento, impedindo a plena utilização do modelo. Neste trabalho, serão apresentadas propostas para a redução do número de informações necessárias para o correto funcionamento da máquina, possibilitando uma implementação mais simples e mais eficiente.
The dataflow model is especially relevant to research in high-performance architectures. In this model, execution control is driven only by data availability, thus allowing maximum exploitation of the parallelism implicit in programs. The present work is based on the Manchester dataflow machine which, in order to handle reentrant code, requires that data tokens carry, in addition to the destination instruction field, a label. This additional information, which corresponds to 70% of the data token, makes the machine implementation complex: the hardware substantially bounds the execution speed and prevents full utilization of the model. This work presents approaches for reducing the amount of information needed for proper machine operation, in order to achieve a simpler and more effective implementation.
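To see why the label (tag) dominates the token, consider an illustrative Manchester-style token layout and matching store. Field names and widths here are assumptions for the sketch, not the machine's exact encoding; the point visible in the struct is that destination and tag metadata dwarf the 32-bit payload.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative layout of a tagged token in a Manchester-style machine. */
typedef struct {
    uint32_t data;        /* the actual operand value           */
    uint32_t dest_instr;  /* destination instruction address    */
    uint8_t  dest_port;   /* left or right input of that instr. */
    uint32_t iteration;   /* tag: loop iteration level          */
    uint16_t activation;  /* tag: function activation name      */
} token_t;

#define STORE 64
static token_t store[STORE];
static int     used[STORE];

/* Matching store: a token waits until its partner with the same
 * destination and the same tag arrives; then the instruction fires. */
int match(token_t t) {
    for (int i = 0; i < STORE; i++)
        if (used[i] && store[i].dest_instr == t.dest_instr &&
            store[i].dest_port != t.dest_port &&
            store[i].iteration == t.iteration &&
            store[i].activation == t.activation) {
            used[i] = 0;
            return 1;                 /* pair complete: fire */
        }
    for (int i = 0; i < STORE; i++)
        if (!used[i]) { store[i] = t; used[i] = 1; break; }
    return 0;                         /* still waiting */
}

int main(void) {
    token_t left  = { 7, 100, 0, 3, 1 };
    token_t right = { 5, 100, 1, 3, 1 };
    printf("first arrival fires? %d\n", match(left));   /* 0: waits */
    printf("second arrival fires? %d\n", match(right)); /* 1: fires */
    return 0;
}
```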
33

Magna, Patrícia. "Proposta e simulação de uma arquitetura a fluxo de dados de segunda geração." Universidade de São Paulo, 1997. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-06042009-113436/.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Neste trabalho é apresentada a arquitetura SEED, proposta a partir das experiências adquiridas com as arquiteturas baseadas no modelo a fluxo de dados que foram estudadas até o presente. A arquitetura SEED utiliza o modelo a fluxo de dados para escalonar e executar blocos de instruções, visando aproveitar a principal qualidade apresentada pelo modelo, que consiste em expor o máximo de paralelismo existente nos programas. No entanto, a arquitetura explora paralelismo de granularidade mais grossa que as arquiteturas a fluxo de dados, a fim de reduzir o trafego de fichas de dados na arquitetura. Esta redução tenta resolver ou amenizar problemas como a excessiva ocupação de memória e a grande complexidade exigida do hardware. Além da especificação da funcionalidade de toda a arquitetura SEED, este trabalho apresenta uma proposta para o particionamento do código. A utilização desta proposta permite a geração de blocos de códigos que podem ser executados corretamente pela arquitetura SEED. Alguns benchmarks foram gerados utilizando essa proposta de particionamento de código. Estes benchmarks foram executados no simulador da arquitetura SEED, visando analisar e avaliar o comportamento da arquitetura com diversas configurações de hardware.
This work presents the SEED architecture, which was proposed based on the experience obtained with existing architectures based on the dataflow model. The SEED architecture uses the dataflow model to schedule and execute sets of instructions, called code blocks. This approach tries to exploit the main quality of the dataflow model, which is to expose the maximum parallelism of programs. However, the architecture explores coarser granularity than the one usually considered in dataflow architectures, in order to reduce the data token traffic in the architecture. This reduction tries to solve or mitigate problems such as excessive memory occupation and high hardware complexity. Besides the specification of all units that compose the SEED architecture, this work also proposes a way of partitioning programs, creating code blocks that may be executed by the SEED architecture. Some benchmarks were generated using this program-partitioning proposal. These benchmarks were executed on the SEED architecture simulator in order to analyze and evaluate the behavior of the proposed architecture under several hardware configurations.
34

Savas, Süleyman. "Utilizing Heterogeneity in Manycore Architectures for Streaming Applications." Licentiate thesis, Högskolan i Halmstad, Centrum för forskning om inbyggda system (CERES), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-33792.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
In the last decade, we have seen a transition from single-core to manycore in computer architectures due to performance requirements and limitations in power consumption and heat dissipation. The first manycores had homogeneous architectures consisting of a few identical cores; the transition now continues towards heterogeneous architectures with different types of cores specialized for different purposes. The applications executed on these architectures usually consist of several tasks requiring different hardware resources to be executed efficiently. Therefore, we believe that utilizing heterogeneity in manycores will increase the efficiency of the architectures in terms of performance and power consumption. However, the development of heterogeneous architectures is more challenging, and the transition from homogeneous to heterogeneous architectures increases the difficulty of efficient software development due to the increased complexity of the architecture. In order to increase the efficiency of hardware and software development, new hardware design methods and software development tools are required. Additionally, there is a lack of knowledge on what kind of heterogeneous manycore design is most efficient for different applications, and on how these applications perform when executed on current commercial manycores. This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and to facilitate the choice of architecture for software and hardware developers. It defines a taxonomy for manycore architectures based on the levels of heterogeneity they contain and discusses the benefits and drawbacks of these levels. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture. It finally presents a methodology for developing heterogeneous manycore architectures that target specific application domains. Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications. If we add specialized hardware blocks to a core, the performance of the target application easily increases by 15x, while the core size increases by 40-50%, which can be optimized further. Other results show that dataflow languages, together with software development tools, decrease software development effort significantly (25-50%) while having a small impact (2-17%) on performance.
HiPEC (High Performance Embedded Computing)
NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
35

Menon, Suraj S. "Supporting Distributed Fault Tolerance In A Real-Time Micro-Kernel." Thesis, Virginia Tech, 2006. http://hdl.handle.net/10919/35463.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Research into modular approaches for constructing power electronics control systems has provided a number of benefits, as well as new opportunities. Control systems composed of an interconnected collection of standardized parts make distributed processing a realistic possibility. Unfortunately, current strategies for supporting software on such systems have a number of critical drawbacks. Many existing approaches rely on centralized control strategies, fail to support fault tolerance in the face of failures among processing nodes or communication links, and fail to robustly support live addition or removal of nodes from a running network. In this context, failure of a single element means failure of the entire system. This thesis describes research to extend the Dataflow Architecture Real-time Kernel (DARK) to support distributed, fault-tolerant execution of control algorithms for power electronics control systems. An appropriate scheme for fault-tolerant scheduling of processes on distributed processing nodes is described, added to DARK, and evaluated. Literature indicates that fault-tolerant multiprocessor scheduling of hard real-time tasks with task precedence constraints is an NP-hard problem. The new system is based on an off-line fault-tolerant scheduling strategy that generates a static schedule of tasks for each processing unit to follow. This algorithm handles both the task precedence constraints and the constraints imposed by the underlying network protocol (DRPESNET). Modifications to the underlying daisy-chained, packet-switched, time-triggered ring network protocol to support communication fault tolerance and plug-and-play addition or removal of live nodes from an existing control system are also described.
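Primary/backup duplication is one standard realization of fault-tolerant multiprocessor scheduling. The thesis' off-line algorithm is more elaborate (it also handles precedence and network constraints), but a minimal sketch of the backup-slot idea, with all task and node names invented, looks like this:

```c
#include <stdio.h>

/* Generic primary/backup static schedule: each task is assigned a
 * primary slot and a backup slot on a different node.  At runtime,
 * if the primary node is marked failed, the backup copy runs. */
typedef struct { const char *task; int primary_node; int backup_node; } slot_t;

int main(void) {
    slot_t schedule[] = {
        { "sample",  0, 1 },
        { "control", 1, 2 },
        { "pwm",     2, 0 },
    };
    int failed[3] = { 0, 1, 0 };    /* node 1 has failed in this scenario */

    for (int i = 0; i < 3; i++) {
        int node = failed[schedule[i].primary_node]
                 ? schedule[i].backup_node
                 : schedule[i].primary_node;
        printf("%-8s runs on node %d%s\n", schedule[i].task, node,
               failed[schedule[i].primary_node] ? " (backup)" : "");
    }
    return 0;
}
```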
Master of Science
36

Georgiou, Yiannis. "Contributions for resource and job management in high performance computing." Grenoble, 2010. http://www.theses.fr/2010GRENM079.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Le domaine du Calcul à Haute Performance (HPC) évolue étroitement avec les dernières avancées technologiques des architectures informatiques et des besoins toujours croissants en demande de puissance de calcul. Cette thèse s'intéresse à l'étude d'un type d'intergiciel particulier appelé gestionnaire de tâches et ressources (RJMS) qui est chargé de distribuer la puissance de calcul aux applications dans les plateformes pour le HPC. Le RJMS joue un rôle central du fait de sa position dans la pile logicielle. Les dernières évolutions dans les couches matérielles et dans les applications ont largement augmenté le niveau de complexité auquel doit faire face ce type d'intergiciel. Des problématiques telles que le passage à l'échelle, la prise en compte d'un taux d'activité irrégulier, la gestion des contraintes liées à la topologie du matériel, l'efficacité énergétique et la tolérance aux pannes doivent être particulièrement pris en considération, afin, entre autres, de fournir une meilleure exploitation des ressources à la fois du point de vue global du système ainsi que de celui des utilisateurs. La première contribution de cette thèse est un état de l'art sur la gestion des tâches et des ressources ainsi qu'une analyse comparative des principaux intergiciels actuels et des différentes problématiques de recherche associées. Une métrique importante pour évaluer l'apport d'un RJMS sur une plate-forme est le niveau d'utilisation de l'ensemble du système. On constate parmi les traces d'activité de plusieurs plateformes qu'un grand nombre d'entre elles présentent un taux d'utilisation significativement inférieure à une pleine utilisation. Ce constat est la principale motivation des autres contributions de cette thèse qui portent sur les méthodes d'exploitations de ces périodes de sous-utilisation au profit de la gestion globale du système ou des applications en court d'exécution. Plus particulièrement cette thèse explore premièrement, les moyens d'accroître le taux de calculs utiles dans le contexte des grilles légères en présence d'une forte variabilité de la disponibilité des ressources de calcul. Deuxièmement, nous avons étudié le cas des tâches dynamiques et proposé différentes techniques s'intégrant au RJMS OAR et troisièmement nous évalués plusieurs modes d'exploitation des ressources en prenant en compte la consommation énergétique. Finalement, les évaluations de cette thèse reposent sur une approche expérimentale pour laquelle nous avons proposés des outils et une méthodologie permettant d'améliorer significativement la maîtrise et la reproductibilité d'expériences complexes propre à ce domaine d'étude
High Performance Computing is characterized by the latest technological evolutions of computing architectures and by the increasing needs of applications for computing power. A particular middleware, called the Resource and Job Management System (RJMS), is responsible for delivering computing power to applications. The RJMS plays an important role in HPC since it occupies a strategic place in the whole software stack, standing between the hardware and application layers. However, the latest evolutions in those layers have brought new levels of complexity to this middleware. Issues like scalability, management of topological constraints, energy efficiency and fault tolerance have to be particularly considered, among others, in order to provide better system exploitation from both the system and the user points of view. This dissertation provides a state of the art of the fundamental concepts and research issues of Resource and Job Management Systems. It provides a multi-level comparison (concepts, functionalities, performance) of some Resource and Job Management Systems in High Performance Computing. An important metric to evaluate the work of an RJMS on a platform is the observed system utilization. However, studies and logs of production platforms show that HPC systems in general suffer from significant under-utilization rates. Our study deals with these under-utilization periods by proposing methods to aggregate otherwise unutilized resources for the benefit of the system or the application. More particularly, this thesis explores RJMS-level mechanisms: 1) for increasing the jobs' useful computation rates in the highly volatile environments of a lightweight grid context, 2) for improving system utilization with malleability techniques, and 3) for providing energy-efficient system management through the exploitation of idle computing machines. Experimentation and evaluation in this type of context involve important complexities due to the inter-dependency of the multiple parameters that have to be kept under control. In this thesis we have developed a methodology based upon real-scale controlled experimentation, with submission of synthetic or real workload traces.
37

Stan, Oana. "Placement of tasks under uncertainty on massively multicore architectures." Thesis, Compiègne, 2013. http://www.theses.fr/2013COMP2116/document.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Ce travail de thèse de doctorat est dédié à l'étude de problèmes d'optimisation combinatoire du domaine des architectures massivement parallèles avec la prise en compte des données incertaines tels que les temps d'exécution. On s'intéresse aux programmes sous contraintes probabilistes dont l'objectif est de trouver la meilleure solution qui soit réalisable avec un niveau de probabilité minimal garanti. Une analyse quantitative des données incertaines à traiter (variables aléatoires dépendantes, multimodales, multidimensionnelles, difficiles à caractériser avec des lois de distribution usuelles), nous a conduit à concevoir une méthode qui est non paramétrique, intitulée "approche binomiale robuste". Elle est valable quelle que soit la loi jointe et s'appuie sur l'optimisation robuste et sur des tests d'hypothèse statistique. On propose ensuite une méthodologie pour adapter des algorithmes de résolution de type approchée pour résoudre des problèmes stochastiques en intégrant l'approche binomiale robuste afin de vérifier la réalisabilité d'une solution. La pertinence pratique de notre démarche est enfin validée à travers deux problèmes issus de la compilation des applications de type flot de données pour les architectures manycore. Le premier problème traite du partitionnement stochastique de réseaux de processus sur un ensemble fixé de nœuds, en prenant en compte la charge de chaque nœud et les incertitudes affectant les poids des processus. Afin de trouver des solutions robustes, un algorithme par construction progressive à démarrages multiples a été proposé ce qui a permis d'évaluer le coût des solution et le gain en robustesse par rapport aux solutions déterministes du même problème. Le deuxième problème consiste à traiter de manière globale le placement et le routage des applications de type flot de données sur une architecture clustérisée. L'objectif est de placer les processus sur les clusters en s'assurant de la réalisabilité du routage des communications entre les tâches. Une heuristique de type GRASP a été conçue pour le cas déterministe, puis adaptée au cas stochastique clustérisé
This PhD thesis is devoted to the study of combinatorial optimization problems related to massively parallel embedded architectures when taking uncertain data (e.g. execution times) into account. Our focus is on chance-constrained programs, whose objective is to find the best solution that is feasible with a preset probability guarantee. A qualitative analysis of the uncertain data we have to treat (dependent random variables, multimodal, multidimensional, difficult to characterize through classical distributions) has led us to design a non-parametric method, the so-called "robust binomial approach", valid whatever the joint distribution and based on robust optimization and statistical hypothesis testing. We also propose a methodology for adapting approximate algorithms to solve stochastic problems by integrating the robust binomial approach when verifying solution feasibility. The practical relevance of our approach is validated through two problems arising in the compilation of dataflow applications for manycore platforms. The first problem treats the stochastic partitioning of networks of processes on a fixed set of nodes, taking into account the load of each node and the uncertainty affecting the weight of the processes. For finding stochastic solutions, a semi-greedy iterative algorithm has been proposed, which allowed measuring the robustness and cost of the solutions with regard to those of the deterministic version of the problem. The second problem consists in studying the global placement and routing of dataflow applications on a clusterized architecture. The purpose being to place the processes on clusters such that a feasible routing exists, a GRASP heuristic has been conceived first for the deterministic case and afterwards extended for the chance-constrained variant of the problem.
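One minimal reading of the "robust binomial approach": feasibility of a candidate solution is sampled over independent scenarios, and the solution is accepted only if a one-sided lower confidence bound on the feasibility probability clears the preset guarantee. The sketch below uses a normal approximation of the binomial tail as a simplification; the thesis relies on an exact statistical hypothesis test.

```c
#include <math.h>
#include <stdio.h>

/* Decide whether a candidate solution meets a chance constraint
 * P(feasible) >= p0, given n independent scenario evaluations of
 * which k were feasible.  We require the one-sided lower confidence
 * bound on the feasibility probability (normal approximation) to
 * clear p0.  A simplified stand-in for the exact binomial test. */
int accept(int k, int n, double p0) {
    double phat = (double)k / n;
    double z = 1.645;   /* ~ z for a 95% one-sided bound: an assumption */
    double lower = phat - z * sqrt(phat * (1.0 - phat) / n);
    return lower >= p0;
}

int main(void) {
    /* e.g. 970 of 1000 sampled execution-time scenarios met the deadline */
    printf("accept at p0=0.95? %d\n", accept(970, 1000, 0.95));  /* 1 */
    printf("accept at p0=0.97? %d\n", accept(970, 1000, 0.97));  /* 0 */
    return 0;
}
```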
38

Bodin, Bruno. "Analyse d'Applications Flot de Données pour la Compilation Multiprocesseur." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00922578.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Embedded systems are electronic and computing devices, subject to numerous constraints, whose operation must be continuous. Dataflow programming models are often used to define the behaviour of these systems. This choice of model is motivated, on the one hand, by the fact that they can describe the cyclic behaviour required by embedded systems; and on the other hand, by the fact that these models lend themselves to analyses that can provide essential guarantees of correct operation and performance. The Kalray company offers an embedded architecture, the MPPA, accompanied by the ΣC programming language. This language describes applications in the form of an already well-studied dataflow model, the Cyclo-Static Dataflow Graph (CSDFG). However, the CSDFGs generated by this language are often too complex to allow the use of existing analysis techniques. The goal of this thesis is to provide algorithmic tools that solve the different analysis steps required to study a ΣC application, within a reasonable execution time and on large instances. We study three distinct analysis problems: the liveness test, the evaluation of the maximal throughput, and buffer sizing. For each of these problems, we provide fast algorithmic methods whose effectiveness has been verified experimentally. The methods we propose derive from results on periodic schedules; they provide approximate results without any performance guarantee. To overcome this weakness, we also propose new analysis tools based on K-periodic schedules. These schedules generalize our work on periodic scheduling and will allow us, in the near future, to design much more efficient analysis methods.
39

Silva, Bruno de Abreu. "Gerenciamento de tags na arquitetura ChipCflow - uma máquina a fluxo de dados dinâmica." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-17052011-085128/.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Nos últimos anos, percebeu-se uma crescente busca por softwares e arquiteturas alternativas. Essa busca acontece porque houve avanços na tecnologia do hardware e estes avanços devem ser complementados por inovações nas metodologias de projetos, testes e verificação para que haja um uso eficaz da tecnologia. Muitos dos softwares e arquiteturas alternativas, geralmente partem para modelos que exploram o paralelismo das aplicações, ao contrário do modelo de von Neumann. Dentre as arquiteturas alternativas de alto desempenho, tem-se a arquitetura a fluxo de dados. Nesse tipo de arquitetura, o processo de execução de programas é determinado pela disponibilidade dos dados. Logo, o paralelismo está embutido na própria natureza do sistema. O modelo a fluxo de dados possui a vantagem de expressar o paralelismo de maneira intrínseca, eliminando a necessidade de o programador explicitar em seu código os trechos onde deve haver paralelismo. As arquiteturas a fluxo de dados voltaram a ser um tema de pesquisa devido aos avanços do hardware, em particular, os avanços da Computação Reconfigurável e os FPGAs (Field-Programmable Gate Arrays). O projeto ChipCflow é uma ferramenta para execução de algoritmos usando o modelo a fluxo de dados dinâmico em FPGA. Este trabalho apresenta o formato para os tagged-tokens do ChipCflow, os operadores de manipulação das tags dos tokens e suas implementações a fim de que se tenha a PROVA-DE-CONCEITOS para tais operadores na arquitetura ChipCflow
Research on alternative architectures and software has been growing in recent years. This research is driven by advances in hardware technology, and such advances must be complemented by innovations in design, test and verification methodologies in order to use the technology effectively. Many of the alternative architectures and software models explore the parallelism of applications, in contrast to the von Neumann model. Among high-performance alternative architectures there is the dataflow architecture. In this kind of architecture, the execution of programs is determined by data availability, so parallelism is intrinsic to these systems. The dataflow model has the advantage of expressing parallelism intrinsically, eliminating the need for the programmer to mark explicitly in the code the regions where parallelism should occur. Dataflow architectures became a highlighted research area again due to hardware advances, in particular the advances of Reconfigurable Computing and FPGAs (Field-Programmable Gate Arrays). The ChipCflow project is a tool for the execution of algorithms using dynamic dataflow graphs in FPGA. This work presents the tagged-token format of ChipCflow, the operators that manipulate the tags of tokens, and their implementation, providing a proof of concept for such operators in the ChipCflow architecture.
40

Arnesen, Adam T. "Increasing Design Productivity for FPGAs Through IP Reuse and Meta-Data Encapsulation." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2614.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
As Moore's law continues to progress, it is becoming increasingly difficult for hardware designers to fully utilize the increasing number of transistors available in semiconductor devices, including FPGAs. This design productivity gap must be addressed to allow designs to take full advantage of the increased logic density that results from rising transistor density. The reuse of previously developed and verified intellectual property (IP) is one approach claimed to narrow the design productivity gap. Reuse, however, has proved difficult to realize in practice because of the complexity of IP and the reluctance of designers to reuse IP that they do not understand. This thesis proposes to narrow the design productivity gap for FPGAs by simplifying the reuse problem through encapsulating IP with extra machine-readable information, or meta-data. This meta-data simplifies reuse by providing a language-independent format for composing complex systems, providing a parameter representation system, defining high-level data types for FPGA IP, and allowing arbitrary IP to be described as actors in the homogeneous synchronous dataflow model of computation. This work implements meta-data in XML and presents two XML schemas that enable reuse. A new XML schema known as CHREC XML is presented, as well as extensions that enable IP-XACT to be used to describe FPGA dataflow IP. Two tools developed in this work are also presented that leverage meta-data to simplify reuse of arbitrary IP. These tools simplify structural composition of IP, allow designers to manipulate parameters, check and validate high-level data types, and automatically synthesize control circuitry for dataflow designs. Productivity improvements are also demonstrated by reusing IP to quickly compose software radio receivers.
41

Arras, Paul-Antoine. "Ordonnancement d'applications à flux de données pour les MPSoC embarqués hybrides comprenant des unités de calcul programmables et des accélérateurs matériels." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0031/document.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Bien que de nombreux appareils numériques soient aujourd'hui capables de lire des contenus vidéo en temps réel et d'offrir une restitution de grande qualité, le décodage vidéo dans les systèmes embarqués n'en est pas pour autant devenu une opération anodine. En effet, les codecs récents tels que H.264 et HEVC sont d'une complexité telle que le recours à des architectures mixtes logiciel/matériel est presque incontournable. Or les plateformes de ce type sont notoirement difficiles à programmer efficacement. Cette thèse relève le défi du développement d'applications à flux de données pour les cibles embarquées hybrides et de leur exécution efficace, et propose plusieurs contributions. La première est une extension des heuristiques d'ordonnancement de liste pour tenir compte des contraintes mémorielles. La seconde est un modèle d'exécution à flot de données compatible avec la plupart des modèles existants et avec une large classe de plateformes matérielles, ainsi qu'un ordonnanceur dynamique. Enfin, de nombreux développements ont été menés sur une architecture réelle de STMicroelectronics pour démontrer la faisabilité de l'approche
Although numerous electronic devices are nowadays able to play video content in real time and offer high-quality reproduction, video decoding in embedded systems has not become a trivial process yet. As a matter of fact, recent codecs such as H.264 and HEVC exhibit such complexity that resorting to mixed software-hardware architectures is almost unavoidable. However, programming this kind of platform efficiently is well known to be tricky. This thesis addresses the issue of developing streaming applications for hybrid embedded targets and executing them efficiently, and proposes several contributions. The first one is an extension of the classical list-scheduling heuristics to take memory constraints into account. The second one is a dataflow execution model compatible with most existing models and with a large set of hardware platforms, as well as a dynamic scheduler. Lastly, numerous developments have been carried out on a real-world architecture from STMicroelectronics so as to demonstrate the feasibility of the approach.
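A memory-aware variant of list scheduling, the first contribution mentioned, can be sketched in a few lines: tasks are visited in priority order, and a task is only placed on a core whose remaining local memory can hold its working set. This toy version ignores precedence edges, and all task names, durations and memory budgets are invented.

```c
#include <stdio.h>

/* Toy memory-constrained list scheduler. */
typedef struct { const char *name; int time; int mem; } task_t;

#define CORES 2
int main(void) {
    task_t tasks[] = {            /* already sorted by priority */
        { "idct",   5, 60 }, { "mc",  4, 50 },
        { "filter", 3, 30 }, { "vlc", 2, 10 },
    };
    int free_time[CORES] = {0, 0};
    int free_mem[CORES]  = {100, 64};  /* per-core scratchpad budgets (assumed) */

    for (int t = 0; t < 4; t++) {
        int best = -1;
        /* among cores with enough memory, pick the earliest available one */
        for (int c = 0; c < CORES; c++)
            if (free_mem[c] >= tasks[t].mem &&
                (best < 0 || free_time[c] < free_time[best]))
                best = c;
        if (best < 0) { printf("%s: no core fits\n", tasks[t].name); continue; }
        printf("%-6s -> core %d at t=%d\n", tasks[t].name, best, free_time[best]);
        free_time[best] += tasks[t].time;
        free_mem[best]  -= tasks[t].mem;  /* buffers pinned for the whole run */
    }
    return 0;
}
```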
42

Glanon, Philippe Anicet. "Deployment of loop-intensive applications on heterogeneous multiprocessor architectures." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG029.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
Les systèmes cyber-physiques (CPS en anglais) sont des systèmes distribués qui intègrent un large panel d'applications logicielles et de ressources de calcul hétérogènes connectées par divers moyens de communication (filaire ou non-filaire). Ces systèmes ont pour caractéristique de traiter en temps-réel, un volume important de données provenant de processus physiques, chimiques ou biologiques. Une problématique essentielle dans la phase de conception des CPSs est de prédire le comportement temporel des applications logicielles et de fournir des garanties de performances pour ces applications. Afin de répondre à cette problématique, des stratégies d'ordonnancement statique sont nécessaires. Ces stratégies doivent tenir compte de plusieurs contraintes, notamment les contraintes de dépendances cycliques induites par les boucles de calcul des applications ainsi que les contraintes de ressource et de communication des architectures de calcul. En effet, les boucles étant l'une des parties les plus critiques en temps d'exécution pour plusieurs applications de calcul intensif, le comportement temporel et les performances optimales des applications logicielles dépendent de l'ordonnancement optimal des structures de boucles embarqués dans les programmes de calcul. Pour prédire le comportement temporel des applications logicielles et fournir des garanties de performances pour ces applications, les stratégies d'ordonnancement statiques doivent donc explorer et exploiter efficacement le parallélisme embarqué dans les patterns d'exécution des programmes à boucles intensives tout en garantissant le respect des contraintes de ressources et de communication des architectures de calcul. L'ordonnancement d'un programme à boucles intensives sous contraintes ressources et communication est un problème complexe et difficile. Afin de résoudre efficacement ce problème, il est indispensable de concevoir des heuristiques. Cependant, pour concevoir des heuristiques efficaces, il est important de caractériser l'ensemble des solutions optimales pour le problème d'ordonnancement. Une solution optimale pour un problème d'ordonnancement est un ordonnancement qui réalise un objectif optimal de performance. Dans cette thèse, nous nous intéressons au problème d'ordonnancement des programmes à boucles intensives sur des architectures de calcul multiprocesseurs hétérogènes sous des contraintes de ressource et de communication, avec l'objectif d'optimiser le débit de fonctionnement des applications logicielles. Pour ce faire, nous utilisons les modèles de flots de données statiques pour décrire les structures de boucles spécifiées dans les programmes de calcul et nous concevons des stratégies d'ordonnancement périodiques sur la base des propriétés structurelles et mathématiques de ces modèles afin de générer des solutions optimales et approximatives d'ordonnancement
Cyber-physical systems (CPSs) are distributed computing-intensive systems that integrate a wide range of software applications and heterogeneous processing resources, each interacting with the others through different communication resources in order to process a large volume of data sensed from physical, chemical or biological processes. An essential issue in the design stage of these systems is to predict the timing behaviour of software applications and to provide performance guarantees to these applications. In order to tackle this issue, efficient static scheduling strategies are required to deploy the computations of software applications on the processing architectures. These scheduling strategies should deal with several constraints, which include the loop-carried dependency constraints between the computational programs as well as the resource and communication constraints of the processing architectures intended to execute these programs. Actually, loops being among the most time-critical parts of many computing-intensive applications, the optimal timing behaviour and performance of the applications depend on the optimal schedule of the loop structures enclosed in the computational programs executed by the applications. Therefore, to provide performance guarantees for the applications, the scheduling strategies should efficiently explore and exploit the parallelism embedded in the repetitive execution patterns of loops, while ensuring the respect of the resource and communication constraints of the processing architectures of CPSs. Scheduling a loop under resource and communication constraints is a complex problem; to solve it efficiently, heuristics are necessary. However, to design efficient heuristics, it is important to characterize the set of optimal solutions for the scheduling problem, that is, the schedules that achieve an optimal performance goal. In this thesis, we tackle the study of resource- and communication-constrained scheduling of loop-intensive applications on heterogeneous multiprocessor architectures, with the goal of optimizing throughput performance for the applications. In order to characterize the set of optimal scheduling solutions and to design efficient scheduling heuristics, we use the synchronous dataflow (SDF) model of computation to describe the loop structures specified in the computational programs of software applications, and we design software-pipelined scheduling strategies based on the structural and mathematical properties of the SDF model.
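The periodic schedules mentioned here rest on the SDF balance equations: on every edge i between consecutive actors, q[i] * prod[i] = q[i+1] * cons[i], and the smallest positive integer solution q is the repetition vector. A minimal solver for a chain of three actors, with made-up rates:

```c
#include <stdio.h>

/* Balance equations for a chain of SDF actors A0 -> A1 -> A2: solving
 * q[i] * prod[i] == q[i+1] * cons[i] on each edge yields the repetition
 * vector that underlies periodic (and software-pipelined) schedules. */
static long gcd(long a, long b) { return b ? gcd(b, a % b) : a; }

int main(void) {
    long prod[] = { 2, 3 };     /* tokens produced per firing on edge i */
    long cons[] = { 3, 4 };     /* tokens consumed per firing on edge i */
    long q[3]; q[0] = 1;

    /* propagate along the chain, rescaling to keep the vector integral */
    for (int i = 0; i < 2; i++) {
        long g = gcd(q[i] * prod[i], cons[i]);
        long scale = cons[i] / g;
        for (int j = 0; j <= i; j++) q[j] *= scale;
        q[i + 1] = q[i] * prod[i] / cons[i];
    }
    /* expect q = (6, 4, 3): 6*2 == 4*3 and 4*3 == 3*4 */
    printf("repetition vector: q = (%ld, %ld, %ld)\n", q[0], q[1], q[2]);
    return 0;
}
```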
43

Farabet, Clément. "Analyse sémantique des images en temps-réel avec des réseaux convolutifs." Phd thesis, Université Paris-Est, 2013. http://tel.archives-ouvertes.fr/tel-00965622.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
Анотація:
One of the central questions of computer vision is the design and learning of representations of the visual world. What kind of representation can enable an artificial vision system to detect and classify objects into categories, independently of their pose, scale, illumination, and occlusion? More interestingly, how can such a system learn this representation automatically, the way animals and humans manage to form a representation of the world around them? A related question is that of computational feasibility, and more precisely of computational efficiency: given a visual model, how efficiently can it be trained and applied to new sensory data? This efficiency has several dimensions: energy consumed, computation speed, and memory usage. In this thesis I present three contributions to computer vision: (1) a new multi-scale deep convolutional network architecture that can capture long-range relationships between input variables in image-like data, (2) a tree-based algorithm that explores multiple segmentation candidates in order to produce a semantic segmentation of maximal confidence, and (3) a dataflow processor architecture optimized for computing deep convolutional networks. These three contributions were produced with the goal of improving the state of the art in semantic image analysis, with an emphasis on computational efficiency. Scene parsing consists of labeling each pixel of an image with the category of the object it belongs to. In the first part of this thesis, I propose a method that uses a deep convolutional network, trained directly on pixels, to extract feature vectors that encode regions of multiple resolutions centered on each pixel. This method avoids the use of hand-crafted features. Because these features are multi-scale, they let the model capture relationships both local and global to the scene. In parallel, a tree of segmentation components is computed from the pixel dissimilarity graph. The feature vectors associated with each node of the tree are aggregated and used to train an estimator of the distribution of object categories present in that segment. A subset of the tree's nodes, covering the image, is then selected so as to maximize the average purity of the class distributions. By maximizing this purity, the probability that each component contains a single object is maximized. The complete system achieves record accuracy on several public benchmarks. Computing deep convolutional networks depends on only a few basic operators, which are particularly well suited to a dedicated hardware implementation. In the second part of this thesis, I present a dedicated dataflow processor architecture optimized for computing convolutional-network-based vision systems, neuFlow, together with a compiler, luaFlow, whose role is to compile a high-level (graph-like) description of a convolutional network into a flow of data and computations that is optimal for the architecture. This system was developed to perform real-time detection, categorization, and localization of objects in complex scenes, while consuming only 10 Watts, on a standard FPGA implementation.
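The neuFlow hardware and luaFlow toolchain are described in the thesis itself, but the multi-scale feature idea of the first contribution can be sketched compactly. Below is a minimal illustration in Python, using PyTorch as a stand-in for the original Torch7 code; the layer sizes, scales, and class name are assumptions made for the example, not the thesis's actual model:

```python
# Minimal sketch of multi-scale ConvNet features for scene parsing:
# one shared network is applied to a pyramid of rescaled inputs, and
# each pixel receives a concatenation of local and contextual features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatures(nn.Module):
    def __init__(self, scales=(1.0, 0.5, 0.25), feat_dim=64):
        super().__init__()
        self.scales = scales
        # The same weights are reused at every scale (weight sharing).
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 7, padding=3), nn.Tanh(), nn.MaxPool2d(2),
            nn.Conv2d(16, feat_dim, 7, padding=3), nn.Tanh(),
        )

    def forward(self, img):
        h, w = img.shape[-2:]
        feats = []
        for s in self.scales:
            x = F.interpolate(img, scale_factor=s, mode="bilinear",
                              align_corners=False)
            f = self.net(x)
            # Bring every scale back to a common resolution before stacking.
            feats.append(F.interpolate(f, size=(h, w), mode="bilinear",
                                       align_corners=False))
        return torch.cat(feats, dim=1)   # (N, feat_dim * len(scales), H, W)

x = torch.randn(1, 3, 240, 320)          # a dummy RGB frame
print(MultiScaleFeatures()(x).shape)     # torch.Size([1, 192, 240, 320])
```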
44

Wei, Ching-Jen, and 魏慶仁. "AN EFFICIENT DATAFLOW ARCHITECTURE FOR MACRO ACTOR PROCESSING." Thesis, 1995. http://ndltd.ncl.edu.tw/handle/96458741982320537024.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
Abstract:
Master's thesis
Tatung Institute of Technology
Graduate Institute of Information Engineering
1994 (ROC year 83)
Dataflow execution, which exploits the parallelism hidden in a program, can achieve speedup efficiently. Instructions fire according to operand availability: an instruction can execute as soon as all of its operands have arrived, without waiting for preceding instructions on which it has no dependency. Although the dataflow execution model can exploit all levels of parallelism, it forgoes the locality advantages of sequential execution. The proposed architecture combines the advantages of locality with the exploitation of parallelism. The processor uses double queues to reduce bubble instructions between the match and execution units, thereby improving utilization; architectural simplicity is another of its features. A software environment for the architecture is built on the high-level dataflow language SISAL and its translator. Following the partitioning criteria, the dataflow graph (DFG) is partitioned and allocated to the processing elements. Simulation results under the Scientific and Engineering Software (SES) environment show that this is an efficient processor.
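The operand-availability firing rule described above can be made concrete with a small toy model. In the sketch below (the queue names and instruction format are invented for illustration, not taken from the thesis), arriving tokens fill operand slots in a match stage, and fully matched instructions buffer in a second queue so the execute stage is not starved:

```python
from collections import deque

# Tokens carry (destination instruction, value). The match stage fills
# operand slots; a fully matched instruction moves to the execute queue.
tokens = deque([("add1", 3), ("mul1", 4), ("add1", 5)])
exec_queue = deque()
needed = {"add1": 2, "mul1": 2}
slots = {k: [] for k in needed}

while tokens:
    dest, val = tokens.popleft()
    slots[dest].append(val)
    if len(slots[dest]) == needed[dest]:   # firing rule: all operands present
        exec_queue.append(dest)

while exec_queue:
    name = exec_queue.popleft()
    print(f"fire {name} with operands {slots[name]}")
# Only add1 fires; mul1 still waits on its second operand.
```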
45

"Constraint extension to dataflow network." 2004. http://library.cuhk.edu.hk/record=b5891959.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
Abstract:
Tsang Wing Yee.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2004.
Includes bibliographical references (leaves 90-93).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 2 --- Preliminaries --- p.4
Chapter 2.1 --- Constraint Satisfaction Problems --- p.4
Chapter 2.2 --- Dataflow Networks --- p.5
Chapter 2.3 --- The Lucid Programming Language --- p.9
Chapter 2.3.1 --- Daton Domain --- p.10
Chapter 2.3.2 --- Constants --- p.10
Chapter 2.3.3 --- Variables --- p.10
Chapter 2.3.4 --- Dataflow Operators --- p.11
Chapter 2.3.5 --- Functions --- p.16
Chapter 2.3.6 --- Expression and Statement --- p.17
Chapter 2.3.7 --- Examples --- p.17
Chapter 2.3.8 --- Implementation --- p.19
Chapter 3 --- Extended Dataflow Network --- p.25
Chapter 3.1 --- Assertion Arcs --- p.25
Chapter 3.2 --- Selection Operators --- p.27
Chapter 3.2.1 --- The Discrete Choice Operator --- p.27
Chapter 3.2.2 --- The Discrete Committed Choice Operator --- p.29
Chapter 3.2.3 --- The Range Choice Operators --- p.29
Chapter 3.2.4 --- The Range Committed Choice Operators --- p.32
Chapter 3.3 --- Examples --- p.33
Chapter 3.4 --- E-Lucid --- p.39
Chapter 3.4.1 --- Modified Four Cockroaches Problem --- p.42
Chapter 3.4.2 --- Traffic Light Problem --- p.45
Chapter 3.4.3 --- Old Maid Problem --- p.48
Chapter 4 --- Implementation of E-Lucid --- p.54
Chapter 4.1 --- Overview --- p.54
Chapter 4.2 --- Definition of Terms --- p.56
Chapter 4.3 --- Function ELUCIDinterpreter --- p.57
Chapter 4.4 --- Function Edemand --- p.58
Chapter 4.5 --- Function transformD --- p.59
Chapter 4.5.1 --- Labelling Datastreams of Selection Operators --- p.59
Chapter 4.5.2 --- Removing Committed Choice Operators --- p.62
Chapter 4.5.3 --- Removing asa, wvr, and upon --- p.62
Chapter 4.5.4 --- Labelling Output Datastreams of if-then-else-fi --- p.63
Chapter 4.5.5 --- Transforming Statements to Daton Statements --- p.63
Chapter 4.5.6 --- Transforming Daton Expressions Recursively --- p.65
Chapter 4.5.7 --- An Example --- p.65
Chapter 4.6 --- Functions constructCSP, findC, and transformC --- p.68
Chapter 4.7 --- An Example --- p.75
Chapter 4.8 --- Function backtrack --- p.77
Chapter 5 --- Related Works --- p.83
Chapter 6 --- Conclusion --- p.87
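For readers unfamiliar with Lucid, the stream ("daton") operators named in the chapter list above (fby, wvr, asa) behave roughly as follows. This Python-generator reading is an informal approximation of the usual Lucid semantics, not the thesis's own definitions:

```python
from itertools import count, islice

def fby(x, ys):                 # "x fby ys": x first, then the stream ys
    yield x
    yield from ys

def wvr(xs, bs):                # "xs wvr bs": keep xs[i] whenever bs[i] holds
    for x, b in zip(xs, bs):
        if b:
            yield x

def asa(xs, bs):                # "xs asa bs": first xs[i] at which bs[i] holds
    for x, b in zip(xs, bs):
        if b:
            return x

evens = wvr(count(0), (n % 2 == 0 for n in count(0)))
print(list(islice(fby(-1, evens), 5)))            # [-1, 0, 2, 4, 6]
print(asa(count(0), (n > 3 for n in count(0))))   # 4
```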
46

Chung, Hua-Yuan, and 鍾華堃. "VHDL Implementation of Scheduled Dataflow Architecture and the Register Context." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/79120224172051556308.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
Abstract:
Master's thesis
Fu Jen Catholic University
Department of Computer Science and Information Engineering
2009 (ROC year 98)
Since the invention of the microprocessor around 1970, improving CPU performance through ILP had been the main focus of the computer industry. Around the year 2000, ILP appeared to reach a limit and, together with power consumption and heat dissipation concerns, the multi-core era emerged. The focus has since shifted from ILP to TLP and the efficient use of multi-core processors. However, RAW hazard detection in current computers relies on complex hardware, which makes CPUs consume more energy and designs more complex. In this research we propose a fundamentally different architecture and a different way to resolve RAW hazards: by using the dataflow paradigm, RAW hazards are eliminated naturally. Moreover, the architecture offers a new paradigm that closely links ILP and TLP by combining the sequential and dataflow paradigms. It is named the Scheduled Dataflow Architecture (SDF). SDF is a non-blocking, multithreaded, decoupled dataflow architecture, since its main engine relies on the dataflow paradigm. Being decoupled, its synchronization processor is responsible for data access while its execution processor is responsible for executing all instructions. SDF was previously simulated in C++ and C [19-20]. To model the hardware complexity more precisely, this work implements SDF in VHDL and simulates it with ModelSim; we have also tested it on Altera DE2 hardware. The main goal of this research is to measure the performance gained by having more register contexts. In a multithreaded architecture, data can be passed between threads through frame memory. If we instead use register contexts to pass data efficiently to the following threads that need a previous thread's results, several memory accesses can be eliminated, improving program performance. To test SDF, we also synthesized it onto the Cyclone II FPGA chip of the DE2 board. SDF uses at least 50% of the Cyclone II's resources, and the Cyclone II can synthesize SDF with at most four register sets. From these syntheses we found that SDF requires at least two register sets to run multithreaded programs concurrently.
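The decoupling the abstract describes, with a synchronization processor (SP) staging data into register contexts and an execution processor (EP) computing only on preloaded registers, can be sketched in a few lines. All names and values below are illustrative assumptions, not the thesis's VHDL design:

```python
# Toy model of SDF-style decoupling: the SP moves a thread's data from
# frame memory into a register context, and the EP then executes without
# any memory access, so it never stalls on loads.
frame_memory = {"threadA": {"r0": 7, "r1": 35}}

def sp_preload(thread_id, register_sets, ctx_slot):
    """SP: copy a thread's frame data into a free register context."""
    register_sets[ctx_slot] = dict(frame_memory[thread_id])

def ep_execute(register_sets, ctx_slot):
    """EP: compute using only the preloaded context."""
    regs = register_sets[ctx_slot]
    regs["r2"] = regs["r0"] + regs["r1"]
    return regs["r2"]

# Two contexts: the minimum the thesis found necessary to run
# multithreaded programs concurrently.
register_sets = [None, None]
sp_preload("threadA", register_sets, 0)
print(ep_execute(register_sets, 0))   # 42
```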
47

CAI, FENG-ZHOU, and 蔡豐洲. "A pipeline bubbles reduction technique for the Monsoon dataflow architecture." Thesis, 1993. http://ndltd.ncl.edu.tw/handle/32228626885367156032.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
48

Varadarajan, Keshavan. "A Coarse Grained Reconfigurable Architecture Framework Supporting Macro-Dataflow Execution." Thesis, 2012. http://etd.iisc.ernet.in/handle/2005/2302.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
Abstract:
A Coarse-Grained Reconfigurable Architecture (CGRA) is a processing platform constituted by an interconnection of coarse-grained computation units (viz. Function Units (FUs), Arithmetic Logic Units (ALUs)). These units communicate directly, via send-receive-like primitives, as opposed to the shared-memory based communication used in multi-core processors. CGRAs are a well-researched topic and the design space of a CGRA is quite large. The design space can be represented as a 7-tuple (C, N, T, P, O, M, H) where the terms have the following meanings: C - choice of computation unit, N - choice of interconnection network, T - choice of number of context frames (single or multiple), P - presence of partial reconfiguration, O - choice of orchestration mechanism, M - design of the memory hierarchy, and H - host-CGRA coupling. In this thesis, we develop an architectural framework for a macro-dataflow based CGRA in which we make the following choice for each of these parameters: C - ALU, N - Network-on-Chip (NoC), T - multiple contexts, P - support for partial reconfiguration, O - macro-dataflow based orchestration, M - data memory banks placed at the periphery of the reconfigurable fabric ("reconfigurable fabric" is the name given to the interconnection of computation units), and H - loose coupling between the host processor and the CGRA, enabling our CGRA to execute an application independently of the host processor's intervention. The motivations for developing such a CGRA are: (1) to execute applications efficiently, through a reduction in reconfiguration time (i.e., the time needed to transfer instructions and data to the reconfigurable fabric) and a reduction in execution time through better exploitation of all forms of parallelism: Instruction Level Parallelism (ILP), Data Level Parallelism (DLP), and Thread/Task Level Parallelism (TLP); we choose a macro-dataflow based orchestration framework in combination with partial reconfiguration so as to ease the exploitation of TLP and DLP, with macro-dataflow serving as a lightweight synchronization mechanism; we experiment with two variants of the macro-dataflow orchestration unit, namely a hardware-controlled orchestration unit and a compiler-controlled orchestration unit, and we employ a NoC as it helps reduce the reconfiguration overhead; (2) to permit customization of the CGRA for a particular domain through the use of domain-specific custom Intellectual Property (IP) blocks, which improves both application performance and energy efficiency; and (3) to develop a CGRA that is completely programmable and accepts any program written to the C89 standard; the compiler and the architecture were co-developed to ensure that every feature of the architecture could be programmed automatically from an application by the compiler. In this CGRA framework, the orchestration mechanism (O) and the host-CGRA coupling (H) are kept fixed, and we permit design-space exploration of the other terms in the 7-tuple; the mode of compilation and execution remains invariant under these changes, hence the term "framework". We now elucidate the compilation and execution flow for this CGRA framework. An application written in the C language is compiled and transformed into a set of temporal partitions, referred to as HyperOps in this thesis. The macro-dataflow orchestration unit selects a HyperOp for execution when all its inputs are available, and the instructions and operands of a ready HyperOp are transferred to the reconfigurable fabric for execution.
Each ALU (in the computation unit) is capable of waiting for the availability of its input data prior to issuing instructions. We permit the launch and execution of a temporal partition to progress in parallel, which reduces the reconfiguration overhead, and we further cut launch delays by keeping loops persistent on the fabric, eliminating the need to relaunch their instructions. The CGRA framework has been implemented in Bluespec System Verilog. We evaluate the performance of two CGRA instances: one for cryptographic applications and another for linear algebra kernels. We also run other general-purpose integer and floating-point applications to demonstrate the generic nature of these optimizations. We explore various microarchitectural optimizations, viz. pipeline optimizations (i.e., changing the value of T), different forms of macro-dataflow orchestration (a hardware-controlled versus a compiler-controlled orchestration unit), different execution modes including resident loops and pipeline parallelism, changes to the router, etc. As a result of these optimizations we observe a 2.5x improvement in performance compared to the base version. The reconfiguration overhead was hidden by overlapping the launching of instructions with execution. The perceived reconfiguration overhead is reduced drastically, to about 9-11 cycles per HyperOp, invariant of the size of the HyperOp; this can be attributed mainly to data-dependent instruction execution and the use of the NoC. The overhead of the macro-dataflow execution unit is reduced to a minimum with the compiler-controlled orchestration unit. To benchmark these CGRA instances, we compare their performance against an Intel Core 2 Quad running at 2.66 GHz. On the cryptographic CGRA instance, running at 700 MHz, we observe one to two orders of magnitude improvement in performance for cryptographic applications, and up to one order of magnitude performance degradation on the linear algebra CGRA instance. The relatively poor performance of the linear algebra kernels can be attributed to the inability to exploit ILP across computation units interconnected by the NoC, the long latency of accessing data memory placed at the periphery of the reconfigurable fabric, and the unavailability of pipelined floating-point units (which are critical to the performance of linear algebra kernels). The superior performance of the cryptographic kernels can be attributed to a higher computation-to-load-instruction ratio, a careful choice of custom IP blocks, the ability to construct large HyperOps, which allows a greater portion of the communication to be performed directly (as opposed to through a register file in a general-purpose processor), and the use of the resident-loops execution mode. The power consumption of a computation unit employed in the cryptography CGRA instance, along with its router, is about 76 mW, as estimated by Synopsys Design Vision using the Faraday 90 nm technology library at an activity factor of 0.5; the power of other instances depends on the specific instantiation of the domain-specific units. This implies that for a reconfigurable fabric of size 5 x 6 the total power consumption is about 2.3 W. The area and power (about 84 mW) dissipated by the macro-dataflow orchestration unit, which is common to both instances, are comparable to those of a single computation unit, making it an effective, low-overhead technique for exploiting TLP.
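The macro-dataflow orchestration rule, in which a HyperOp is launched only once all of its inputs have arrived, can be illustrated with a short sketch. The dependence graph and function names below are invented for the example, not taken from the thesis:

```python
# Toy macro-dataflow orchestrator: the firing rule is applied at HyperOp
# granularity, and a completed HyperOp forwards results to its consumers.
class HyperOp:
    def __init__(self, name, n_inputs):
        self.name, self.n_inputs, self.arrived = name, n_inputs, 0
        self.consumers = []        # HyperOps fed by this one's output

def send_input(h, ready_list):
    h.arrived += 1
    if h.arrived == h.n_inputs:    # all inputs available -> ready to launch
        ready_list.append(h)

a, b, c = HyperOp("A", 1), HyperOp("B", 1), HyperOp("C", 2)
a.consumers, b.consumers = [c], [c]

ready = []
send_input(a, ready); send_input(b, ready)
while ready:
    h = ready.pop()
    print(f"launch {h.name} on fabric")  # transfer instructions + operands
    for consumer in h.consumers:
        send_input(consumer, ready)      # completion forwards results
```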
49

Lee, Yen-Lin, and 李彥霖. "Study on Reconfigurable System-On-Chip Architecture Based on Dataflow Computing." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/45120514829399151226.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
Abstract:
Master's thesis
National Chiao Tung University
Department of Electrical and Control Engineering
2000 (ROC year 89)
Nowadays, the industry of information appliances and communication products is growing rapidly, and System-On-Chip (SOC) design has become the key enabler of this explosive growth. However, state-of-the-art SOC design is largely IP-based and poses several challenges to designers, especially in the transformation of algorithms and the integration of IP cores. To alleviate the difficulty of SOC design, this thesis proposes a novel dataflow architecture for the integration of IP cores. The proposed architecture employs Petri nets to dynamically schedule the operations of DSP algorithms onto processing elements and thus performs dataflow computing efficiently. As a result, the Petri-net-based scheduler is insensitive to the timing of computation and interprocessor communication, and the proposed SOC architecture is reconfigurable for DSP applications.
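The Petri-net scheduling idea can be illustrated with a minimal token-firing loop. The net below (its places, transitions, and labels) is an invented example in the spirit of the abstract, not the thesis's actual schedule:

```python
# A transition (a DSP operation bound to a processing element) fires as
# soon as all its input places hold tokens, so the schedule falls out of
# token availability alone, independent of computation/communication timing.
marking = {"in1": 1, "in2": 1, "tmp": 0, "out": 0}
transitions = {
    "mac":  {"consume": ["in1", "in2"], "produce": ["tmp"]},
    "post": {"consume": ["tmp"],        "produce": ["out"]},
}

fired = True
while fired:
    fired = False
    for name, t in transitions.items():
        if all(marking[p] > 0 for p in t["consume"]):   # transition enabled?
            for p in t["consume"]:
                marking[p] -= 1
            for p in t["produce"]:
                marking[p] += 1
            print(f"fired {name}; marking={marking}")
            fired = True
# "mac" fires first, which enables "post": ordering emerges from the tokens.
```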
50

Alle, Mythri. "Compiling For Coarse-Grained Reconfigurable Architectures Based On Dataflow Execution Paradigm." Thesis, 2012. http://etd.iisc.ernet.in/handle/2005/2453.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
Abstract:
Coarse-Grained Reconfigurable Architectures (CGRAs) can be employed to accelerate computational workloads that demand both flexibility and performance. A CGRA comprises a set of computation elements interconnected by a network; this interconnection of computation elements is referred to as the reconfigurable fabric. The size of application that can be accommodated on the reconfigurable fabric is limited by the size of the instruction buffers associated with each compute element. When an application cannot be accommodated entirely, it is partitioned such that each partition can be executed on the reconfigurable fabric. These partitions are scheduled by an orchestrator that employs the dynamic dataflow execution paradigm, which has inherent support for synchronization and helps exploit the parallelism that exists across application partitions. In this thesis, we present a compiler that targets such CGRAs. The compiler is capable of accepting applications specified in the C89 standard. To enable architectural design-space exploration, the compiler is designed so that it can be customized, through configuration parameters, for several instances of CGRAs employing the dataflow execution paradigm at the orchestrator. The focus of this thesis is to provide efficient support for various kinds of parallelism while ensuring correctness. The compiler supports fine-grained task-level parallelism that exists across iterations of loops and function calls; additionally, it supports pipeline parallelism, where a loop is split into multiple stages that execute in a pipelined manner. The prototype compiler, which targets multiple instances of a CGRA, is demonstrated in this thesis. We used it to target multiple variants of CGRAs employing the dataflow execution paradigm, varying the reconfigurable fabric, the orchestration mechanism employed, and the size of the instruction buffers. We chose applications from two different domains, viz. cryptography and linear algebra. The execution time of the CGRA (the best among all instances) is compared against an Intel quad-core processor. Cryptography applications show a performance improvement ranging from more than one order of magnitude to close to two orders of magnitude. These applications have large amounts of ILP, and our compiler successfully exposes the ILP available in them; domain customization also played an important role in achieving good performance. We employed two custom functional units to accelerate the cryptography applications, and the compiler was able to use them efficiently. In the linear algebra kernels we observe multiple iterations of a loop executing in parallel, effectively exploiting loop-level parallelism at runtime. In spite of this, we notice close to an order of magnitude performance degradation, which can be attributed to the use of non-pipelined floating-point units and the delays involved in accessing memory. Pipeline parallelism was demonstrated using this compiler for FFT and QR factorization. Thus, the compiler is capable of efficiently supporting different kinds of parallelism, supports the complete C89 standard, and can target different instances of CGRAs employing the dataflow execution paradigm.
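The buffer-size constraint that drives partitioning in this compiler can be sketched very simply. The greedy cut over a topologically ordered instruction schedule and the BUFFER_SIZE constant below are illustrative assumptions, not the compiler's actual partitioning algorithm:

```python
# When the application's dataflow graph exceeds the per-element instruction
# buffers, it is split into partitions (HyperOps) that each fit; the
# dataflow orchestrator then schedules those partitions.
BUFFER_SIZE = 4

def partition(schedule):       # schedule: instructions in topological order
    parts, current = [], []
    for instr in schedule:
        if len(current) == BUFFER_SIZE:   # buffer full -> start a new HyperOp
            parts.append(current)
            current = []
        current.append(instr)
    if current:
        parts.append(current)
    return parts

instrs = [f"i{k}" for k in range(10)]
for n, p in enumerate(partition(instrs)):
    print(f"HyperOp {n}: {p}")
```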

To the bibliography