Dissertations on the topic "Architecture dataflow"
Browse the top 50 dissertations for research on the topic "Architecture dataflow".
Iannucci, Robert A. "A dataflow/von Neumann hybrid architecture." Thesis, Massachusetts Institute of Technology, 1988. http://hdl.handle.net/1721.1/14778.
Benjamin, Steven I. "Dataflow: overview and simulation." Online version of thesis, 1988. http://hdl.handle.net/1850/10221.
Narayanaswamy, Ramya Priyadharshini. "Design of a Power-aware Dataflow Processor Architecture." Thesis, Virginia Tech, 2010. http://hdl.handle.net/10919/34192.
Master of Science
Moser, Nico, Carsten Gremzow, and Matthias Menge. "Interconnection Optimization for Dataflow Architectures." Universitätsbibliothek Chemnitz, 2007. http://nbn-resolving.de/urn:nbn:de:swb:ch1-200700950.
Ruggiero, C. A. "Throttle mechanisms for the Manchester Dataflow Machine." Thesis, University of Manchester, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.382765.
Li, Feng. "Compiling for a multithreaded dataflow architecture: algorithms, tools, and experience." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2014. http://tel.archives-ouvertes.fr/tel-00992753.
Motiwala, Quaeed. "Optimizations for acyclic dataflow graphs for hardware-software codesign." Thesis, 1994. http://scholar.lib.vt.edu/theses/available/etd-06302009-040504/.
Savaş, Süleyman. "Linear Algebra for Array Signal Processing on a Massively Parallel Dataflow Architecture." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-2192.
This thesis discusses the implementation of the Gentleman-Kung systolic array for QR decomposition using Givens rotations, in the context of radar signal processing. The systolic array of Givens rotations is implemented and analysed on a massively parallel processor array (MPPA), the Ambric Am2045. The tools dedicated to the MPPA are evaluated in terms of engineering efficiency. aDesigner, an Eclipse-based environment produced for the Ambric chip family, is used for programming, simulation and performance analysis. Two parallel matrix multiplications were implemented to become familiar with the architecture and tools. In addition, systolic arrays of different sizes are implemented and compared with each other. The languages provided for programming, ajava and astruct, do not support floating-point numbers, so fixed-point arithmetic is used in the systolic array implementation of Givens rotations. The algorithms produce stable and precise numerical results. However, the performance analysis results are not reliable because of limitations of the performance analysis tools.
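The Gentleman-Kung array computes a QR decomposition by applying a sequence of Givens rotations. Boundary cells generate each rotation's cosine and sine, and internal cells apply that rotation across a row. The sketch below shows the same computation sequentially, so the arithmetic that the array distributes over its cells is visible. It is not the thesis's Ambric implementation: it uses floating point, whereas the thesis reports fixed-point arithmetic. The function name `givens_qr` is hypothetical.

```python
import math

def givens_qr(a):
    """QR decomposition of an m x n matrix (list of lists) via Givens rotations.

    Returns (q, r) where q is m x m orthogonal and r is m x n upper triangular,
    with q * r == a up to rounding. Floating point is used for clarity only.
    """
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]                  # becomes R
    q = [[float(i == j) for j in range(m)] for i in range(m)]   # accumulates Q
    for j in range(n):
        # Zero the sub-diagonal entries of column j from the bottom up.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue                      # entry already zero
            h = math.hypot(x, y)
            c, s = x / h, y / h               # "boundary cell": compute rotation
            for k in range(n):                # "internal cells": apply it to rows
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            for k in range(m):                # Q := Q * G^T on columns i-1, i
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r
```

For example, `givens_qr([[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]])` returns an `r` whose entries below the diagonal are zero (up to rounding), and `q` multiplied by `r` reproduces the input.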
Savaş, Süleyman. "Linear Algebra for Array Signal Processing on a Massively Parallel Dataflow Architecture." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-4137.
This thesis discusses the implementation of the Gentleman-Kung systolic array for QR decomposition using Givens rotations, in the context of radar signal processing. The systolic array of Givens rotations is implemented and analysed on a massively parallel processor array (MPPA), the Ambric Am2045. The tools dedicated to the MPPA are evaluated in terms of engineering efficiency. aDesigner, an Eclipse-based environment produced for the Ambric chip family, is used for programming, simulation and performance analysis. Two parallel matrix multiplications were implemented to become familiar with the architecture and tools. In addition, systolic arrays of different sizes are implemented and compared with each other. The languages provided for programming, ajava and astruct, do not support floating-point numbers, so fixed-point arithmetic is used in the systolic array implementation of Givens rotations. The algorithms produce stable and precise numerical results. However, the performance analysis results are not reliable because of limitations of the performance analysis tools.
Pradal, Christophe. "Architecture de dataflow pour des systèmes modulaires et génériques de simulation de plante." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS034.
Biological modeling, particularly of plant growth and functioning, is a rapidly expanding field that helps address climate change and food security at the global level. Modeling and simulation are essential tools for understanding the complex relationships between plant architecture and the processes that influence plant growth in a changing environment. For plant modeling, a large number of formalisms have been developed in many disciplines and at different scales of representation. The objective of this thesis is to define a modular architecture that makes it possible to simulate structural-functional plant systems by reusing and assembling existing models. We first study the approaches to software reuse proposed by Krueger, then blackboard systems and scientific workflow systems. These approaches are used to cooperate on, reuse and assemble software artifacts in a modular manner. These systems provide the abstractions necessary for integrating various artifacts. Based on this observation, our working hypothesis is that a hybrid architecture, built on blackboard systems with dataflow-driven procedural control, would achieve modularity while allowing the modeler to maintain control over execution. In Chapter 2, we describe OpenAlea, a platform with software components and a scientific workflow system that allows models to be assembled and composed through a visual programming interface. In Chapter 3, we propose a data structure for the blackboard. It combines a topological representation of plant architecture at different scales, the Multiscale Tree Graph, with its geometric spatialization using the 3D PlantGL library. In Chapter 4, we present lambda-dataflows, an extension of dataflows that couples simulation and analysis. Then, in Chapter 5, we present a first application, which illustrates the use of a generic gramineous leaf model in different plant models.
Finally, in Chapter 6, we present all the architectural elements used to develop a generic framework for modeling the development of foliar diseases in an architectural canopy. The architecture presented in this thesis and its implementation in OpenAlea are a first step towards open integrative modeling platforms that allow heterogeneous models in biology to cooperate. Using the scientific workflow formalism for analysis and simulation makes it possible to envisage, in the short term, the development of large-scale collaborative and distributed simulation platforms.
Guo, Jinghong. "Distributed, Modular, Open Control Architecture for Power Conversion Systems." Diss., Virginia Tech, 2005. http://hdl.handle.net/10919/27900.
Ph. D.
Lesparre, Youen. "Evaluation de l'affectation des tâches sur une architecture à mémoire distribuée pour des modèles flot de données." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066086/document.
With the increasing use of smartphones, connected objects and automated vehicles, embedded systems have become ubiquitous in our living environment. These systems are often highly constrained in terms of power consumption and size. They are increasingly implemented with many-core processor arrays. Such arrays allow rapid design that meets stringent real-time constraints while operating at relatively low frequency with reduced power consumption. Running an application on a processor array requires dispatching its tasks to the processors in order to meet capacity and performance constraints. This mapping problem is known to be NP-complete. The contributions of this thesis are threefold. First, we extend important notions from the Cyclo-Static Dataflow Graph model to the Phased Computation Graph model, together with two equivalent sufficient conditions of liveness. Second, we present a random dataflow graph generator able to generate Synchronous Dataflow Graphs, Cyclo-Static Dataflow Graphs and Phased Computation Graphs. The generator can produce live dataflow graphs of up to 10,000 tasks in less than 30 seconds, and it is compared with SDF3 and PREESM. Third and most important, we propose a new method for evaluating a mapping using the Synchronous Dataflow Graph and Cyclo-Static Dataflow Graph models. The method efficiently evaluates the memory footprint of the communications of a dataflow graph mapped onto a distributed architecture. The evaluation comes in two versions: the first guarantees a live mapping, while the second accounts for a throughput constraint. The evaluation method is tested on dataflow graphs from Turbine and on real-life applications.
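Mapping and liveness analysis of a Synchronous Dataflow (SDF) graph both start from its repetition vector. This vector gives how many times each actor fires in one period so that every edge returns to its initial token count. It solves the balance equations `q[src] * produced == q[dst] * consumed`. The following is a minimal sketch of that standard computation, offered as background. It is not the thesis's mapping-evaluation method, and it does not cover cyclo-static or phased rates. The function name `repetition_vector` and the edge-tuple layout are assumptions.

```python
from fractions import Fraction
from math import gcd, lcm

def repetition_vector(actors, edges):
    """Smallest positive integer solution of the SDF balance equations.

    actors: list of actor names (the graph is assumed connected)
    edges:  list of (src, dst, produced, consumed) token rates per firing
    Raises ValueError if the graph is disconnected or inconsistent.
    """
    rate = {actors[0]: Fraction(1)}       # relative firing rate of each actor
    changed = True
    while changed:                        # propagate rates along edges both ways
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, p, c in edges:          # every edge must balance
        if rate[src] * p != rate[dst] * c:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {a: int(rate[a] * scale) for a in actors}
    g = gcd(*q.values())                  # normalize to the smallest vector
    return {a: v // g for a, v in q.items()}
```

For a chain A to B (2 tokens produced, 3 consumed) followed by B to C (1 produced, 2 consumed), the result is `{"A": 3, "B": 2, "C": 1}`. Adding an edge from A to C with rates 1 and 1 makes the graph inconsistent and raises `ValueError`.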
Voigt, Sven-Ole. "Dynamically reconfigurable dataflow architecture for high performance digital signal processing on multi FPGA platforms." Aachen: Shaker, 2008. http://d-nb.info/992481694/04.
Silva, Antonio Carlos Fernandes da. "ChipCflow: tool for convert C code in a static dataflow architecture in reconfigurable hardware." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-30062015-141638/.
There is a growing search for alternative software and architectures. This search arises because hardware technology has advanced, and these advances must be complemented by innovations in design, testing and verification methodologies for the technology to be used effectively. Alternative software and architectures are generally models that exploit the parallelism of applications, unlike the von Neumann model. Among high-performance alternative architectures is the dataflow architecture. In this type of architecture, program execution is determined by data availability, so parallelism is embedded in the very nature of the system. The dataflow model has the advantage of expressing parallelism intrinsically, eliminating the need for programmers to mark explicitly in their code the sections that should run in parallel. Dataflow architectures have once again become a research area because of advances in hardware, in particular in Reconfigurable Computing and Field Programmable Gate Arrays (FPGAs). This thesis describes a code conversion tool aimed at generating applications that use a static dataflow architecture. It also describes the ChipCflow project, of which the code conversion tool described in this thesis is an integral part. The algorithm to be converted is specified in the C language and converted into a hardware description language, following the model proposed by ChipCflow. The results aim to provide a proof of concept for converting code from a high-level language to a dataflow architecture configured on an FPGA.
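The firing rule described above can be illustrated with a small software interpreter: execution is driven only by data availability. A node fires as soon as every input arc holds a token. In the static dataflow model, each arc holds at most one token, so a node also waits until its output arcs are empty. This is a minimal sketch in software and not the ChipCflow C-to-VHDL converter. The function `run_static_dataflow`, the `graph` layout and the explicit fork nodes are all hypothetical. The loop checks nodes one after another, whereas hardware would fire all ready nodes concurrently.

```python
import operator

def run_static_dataflow(nodes, tokens):
    """Run a static dataflow graph until no node can fire.

    nodes:  dict name -> (function, [input arcs], [output arcs])
    tokens: dict arc -> initial token value
    Returns the arcs that still hold tokens at the end.
    """
    arcs = dict(tokens)
    progress = True
    while progress:
        progress = False
        for fn, ins, outs in nodes.values():
            ready = all(a in arcs for a in ins)       # every operand present
            free = all(a not in arcs for a in outs)   # static: one token per arc
            if ready and free:
                result = fn(*[arcs.pop(a) for a in ins])  # consume inputs
                for a in outs:
                    arcs[a] = result                      # produce outputs
                progress = True
    return arcs

def identity(x):
    return x

# (a + b) * (a - b); fork nodes duplicate a token onto two arcs.
graph = {
    "fork_a": (identity, ["a"], ["a1", "a2"]),
    "fork_b": (identity, ["b"], ["b1", "b2"]),
    "add": (operator.add, ["a1", "b1"], ["s"]),
    "sub": (operator.sub, ["a2", "b2"], ["d"]),
    "mul": (operator.mul, ["s", "d"], ["y"]),
}
```

With the initial tokens `{"a": 5, "b": 3}`, `run_static_dataflow(graph, {"a": 5, "b": 3})` leaves only the output token `{"y": 16}`.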
Ngo, Dinh Thanh. "Runtime mapping of dynamic dataflow applications on heterogeneous multiprocessor platforms." Thesis, Lorient, 2015. http://www.theses.fr/2015LORIS371/document.
Modern multimedia applications are becoming increasingly complex, with widespread standards. This has led to interest in the dataflow approach, which offers a powerful high-level perspective on parallel computations. Meanwhile, the emergence of massively parallel architectures has revealed a trend towards heterogeneous Multi-Processor Systems-on-Chip (MPSoCs), which offer a better performance and energy trade-off than their homogeneous counterparts. However, this also makes mapping multimedia applications onto such complex architectures challenging. This thesis presents an adaptive methodology for mapping dataflow applications onto heterogeneous MPSoCs. It focuses on video decoders specified in RVC-CAL, a dataflow language dedicated to video applications. Existing static approaches cannot capture all the behaviors of dynamic dataflow applications, so the mapping must be adapted according to the input data. The algorithm offers adaptive parameters, combined with our analytical communication model, to improve performance while taking load balancing into account. We evaluate our algorithms on a set of randomly generated benchmarks and on real video decoders such as MPEG4-SP and HEVC. Experimental results show that our mapping methodology is fast enough (on the order of milliseconds) and that runtime remapping significantly improves the initial mapping. In the remapping process, we take the migration cost into account, because the reconfiguration time also contributes to overall performance.
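To make the mapping problem concrete, here is a generic greedy load-balancing heuristic for heterogeneous processors. Each actor has a different execution time on each processor type. Actors are placed heaviest first on the processor whose load after placement is smallest. This is a minimal sketch of a common textbook heuristic, not the adaptive algorithm of the thesis, and it ignores the communication model described there. The function `greedy_map`, the actor names and all cost figures are hypothetical. A runtime remapping step could, under these assumptions, simply call it again with measured costs.

```python
def greedy_map(costs, processors):
    """Greedy load-balancing mapping of actors onto heterogeneous processors.

    costs:      dict actor -> dict processor -> execution time on that processor
    processors: list of processor names
    Communication costs are ignored in this sketch.
    Returns (mapping, loads).
    """
    loads = {p: 0.0 for p in processors}
    mapping = {}
    # Place the most expensive actors first (by their best-case cost).
    order = sorted(costs, key=lambda a: min(costs[a].values()), reverse=True)
    for actor in order:
        best = min(processors, key=lambda p: loads[p] + costs[actor][p])
        mapping[actor] = best
        loads[best] += costs[actor][best]
    return mapping, loads

costs = {  # hypothetical per-processor execution times
    "idct": {"cpu": 8, "dsp": 3},
    "parse": {"cpu": 2, "dsp": 6},
    "mc": {"cpu": 5, "dsp": 4},
}
```

Here `greedy_map(costs, ["cpu", "dsp"])` places `mc` and `idct` on the DSP and `parse` on the CPU, giving loads of 2.0 and 7.0.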
Mandlekar, Anup Shrikant. "An Application Framework for a Power-Aware Processor Architecture." Thesis, Virginia Tech, 2012. http://hdl.handle.net/10919/34484.
Master of Science
Shelor, Charles F. "Dataflow Processing in Memory Achieves Significant Energy Efficiency." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1248478/.
Voigt, Sven O. "Dynamically Reconfigurable Dataflow Architecture for High-Performance Digital Signal Processing on Multi-FPGA Platforms." Aachen: Shaker, 2009. http://d-nb.info/116130908X/34.
Zheng, Chunfang. "GRAPHICAL MODELING AND SIMULATION OF A HYBRID HETEROGENEOUS AND DYNAMIC SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE." UKnowledge, 2004. http://uknowledge.uky.edu/gradschool_theses/249.
Cavenaghi, Marcos Antônio. "Implementação de um simulador para a arquitetura de dados Wolf." Universidade de São Paulo, 1992. http://www.teses.usp.br/teses/disponiveis/54/54132/tde-08062009-102639/.
This work presents the Proto-WOLF dataflow architecture and the implementation of a simplified event-driven simulator for this architecture. The WOLF project is a proposal for implementing a supercomputer based on the dynamic dataflow model with variable granularity. To place the work in context, some basic simulation concepts are presented, along with a survey of the most relevant work on dataflow. Preliminary simulation results are then presented. These results are analyzed and compared with known results from the Manchester Dataflow Machine. When the simulated Proto-WOLF machine has all its unique features disabled, it is expected to behave like the Manchester machine, and the results obtained fully agree with this expectation. It can therefore be concluded that the implemented simulator behaves correctly.
Cavenaghi, Marcos Antônio. "Implementação e estudo da arquitetura a fluxo de dados Wolf." Universidade de São Paulo, 1997. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-01062009-111139/.
This work presents the Wolf dataflow architecture. Wolf was proposed to address some known problems identified in previous work, such as the execution of data structures (vectors and matrices) and of sequential code. Wolf is based on the dynamic dataflow model and exploits variable granularity, the finest being the instruction level. Some concepts developed in the design of other hybrid architectures guided the Wolf implementation, notably macro-dataflow and multithreading. To study the Wolf architecture, a time-driven simulator (Saw) was developed in the object-oriented language C++. The code can be compiled with any ANSI-standard 32-bit compiler. The code was exhaustively tested, and the numerical results obtained in the experiments were equal to those obtained on a von Neumann architecture. This study identified some problems with the Wolf architecture. Some proposals were implemented in the simulator to try to identify the causes of these problems. The results led to a modification of the Wolf architecture. The newly proposed architecture (Wolf II) is described in the last chapter, but it was not evaluated experimentally as Wolf was.
White, Joey. "USING DATAFLOW ARCHITECTURE TO SOLVE THE TRANSPORT LAG PROBLEM WHEN INTERFACING WITH AN ENGINEERING MODEL FLIGHT COMPUTER IN A TELEMETRY SIMULATION." International Foundation for Telemetering, 1991. http://hdl.handle.net/10150/613183.
One of the most challenging technical problems in the development of a spacecraft telemetry simulation is the interface with a flight computer running real-world flight software. The simulation must satisfy flight software requests for telemetry data and must load, mode and control the flight software along with the simulation. When conventional interface solutions are used, this ability can be constrained or degraded. Telemetry dataflow architecture systems can be used to solve these interface problems with fewer constraints. This is an especially attractive solution in a telemetry simulation, where the telemetry system can also be used to format and serialize spacecraft telemetry and to receive and preprocess commands. This paper discusses the concepts developed for such a system for a training simulation of the Orbital Maneuvering Vehicle for NASA at Johnson Space Center.
Arumí Albó, Pau. "Real-time multimedia on off-the-shelf operating systems: from timeliness dataflow models to pattern languages." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7558.
Software-based multimedia systems that deal with real-time audio, video and graphics processing are pervasive today, not only in desktop workstations but also in ultra-light devices such as smartphones. Most of the processing is done in software, using the high-level hardware abstractions and services offered by the underlying operating systems and library stacks, which enables quick application development. In addition to this flexibility and immediacy (compared to hardware-oriented platforms), such platforms also offer soft real-time capabilities with appropriate latency bounds. Nevertheless, experts in the multimedia domain face a serious challenge: the features and complexity of their applications are growing rapidly, while real-time requirements (such as low latency) and reliability standards are increasing. This thesis focuses on providing multimedia domain experts with a workbench of tools they can use to model and prototype multimedia processing systems. Such tools contain platforms and constructs that reflect the requirements of the domain and application, and not accidental properties of the implementation (such as thread synchronization and buffer management). In this context, we address two distinct but related problems. The first is the lack of models of computation that can handle real-time processing of continuous multimedia streams. The second is the lack of appropriate abstractions and systematic development methods that support such models. Many actor-oriented models of computation exist. For building real-time multimedia systems, they offer better abstractions than prevailing software engineering techniques such as object orientation. The family of Process Networks and Dataflow models, based on networks of connected processing actors, is the most suited to continuous stream processing.
Such models allow designs to be expressed close to the problem domain (instead of focusing on implementation details such as thread synchronization), and enable better modularization and hierarchical composition. This is possible because the model does not over-specify how the actors must run, but only imposes data dependencies, in the fashion of a declarative language. These models deal with multi-rate processing and hence with complex periodic actor execution schedules. The problem is that the models do not incorporate the concept of time in a useful way, and hence the periodic schedules do not guarantee real-time and low-latency requirements. This dissertation overcomes this shortcoming by formally describing a new model, which we named Time-Triggered Synchronous Dataflow (TTSDF). Its periodic schedules can be interleaved by several time-triggered "activations" so that the inputs and outputs of the processing graph are serviced regularly. The TTSDF model has the same expressiveness (or equivalent computability) as the Synchronous Dataflow (SDF) model. Its advantage is that it guarantees minimum latency and the absence of gaps and jitter in the output. Additionally, it enables run-time load balancing between callback activations, as well as parallelization. Actor-oriented models are not off-the-shelf solutions and do not suffice for building multimedia systems with a systematic engineering approach. We address this problem by proposing a catalog of domain-specific design patterns organized in a pattern language. This pattern language provides design reuse. It pays special attention to the context in which a design solution is applicable, the competing forces it needs to balance and the implications of its application.
The proposed patterns focus on four concerns. They cover how to organize different kinds of actor connections, how to transfer tokens between actors, and how to enable human interaction with the dataflow engine. Finally, they cover how to rapidly prototype user interfaces on top of the dataflow engine, creating complete and extensible applications. As a case study, we present an object-oriented framework (CLAM), and specific applications built upon it, that makes extensive use of the contributed TTSDF model and patterns.
Amstel, Duco van. "Optimisation de la localité des données sur architectures manycœurs." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM019/document.
The continuous evolution of computer architectures has been an important driver of research in code optimization and compiler technologies. One trend in this evolution, traceable over decades, is the growing ratio between available computational power (IPS, FLOPS, ...) and the bandwidth between the levels of the memory hierarchy (registers, cache, DRAM). As a result, reducing the amount of memory communication that a given code requires has been an important topic in compiler research. A basic principle of such optimizations is improving temporal data locality. All references to a single data point are grouped as close together as possible, so that the data point is only needed for a short time. It can then be moved to distant memory (DRAM) without any further memory communication. Another architectural evolution has been the advent of the multicore era and, in recent years, the first generation of manycore designs. These architectures have considerably raised the amount of parallelism available to programs and algorithms. However, this parallelism is again limited by the available bandwidth for communication between the cores. This brings issues that were previously the sole preoccupation of distributed computing into the world of compilers and code optimization techniques. In this document we present a first dive into a new optimization technique, which we refer to as generalized tiling. It promises both a high-level model for data reuse and a large field of potential applications. It has its source in the well-known loop tiling technique, which has been applied successfully to improve data locality for both registers and cache memory in the case of nested loops. This new "flavor" of tiling has a much broader perspective and is not limited to nested loops.
It is built on a new representation, the memory-use graph. The graph is tightly linked to a new model of both memory usage and communication requirements, and it can be used for all forms of iterative code. Generalized tiling expresses data locality as an optimization problem for which multiple solutions are proposed. With the abstraction introduced by the memory-use graph, it is possible to solve this optimization problem in different environments. For experimental evaluation, we show how this new technique can be applied to loops, nested or not, as well as to programs expressed in a dataflow language. We also anticipate using generalized tiling for distributed computations over the cores of a manycore architecture. To that end, we provide some insight into methods that can be used to model communications and their characteristics on such architectures. Finally, we turn to performance debugging and the analysis of execution traces. This shows the full expressiveness of the memory-use graph and, even more, of the underlying memory usage and communication model. Our goal is to provide feedback on the evaluated code and its potential for further improvement of data locality. Such traces may contain information about memory communications during an execution and show strong similarities with the previously studied optimization problem. This leads to a short introduction to the algorithmics of directed graphs and the formulation of some new heuristics for the well-studied problem of reachability and the much less known problem of convex partitioning.
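The abstract presents generalized tiling as a broadening of classic loop tiling. The following is a minimal sketch of the classic form applied to matrix multiplication. The `i`, `k` and `j` loops are split into blocks of size `tile`, so each block of A, B and C is reused while it would still sit in fast memory. It illustrates only the loop restructuring, not the memory-use graph or generalized tiling itself. In pure Python it brings no real cache benefit, and the function name `matmul_tiled` is hypothetical.

```python
def matmul_tiled(a, b, tile=2):
    """C = A * B for lists of lists, using loop tiling (blocking).

    The iteration space is traversed block by block so that a tile of A,
    a tile of B and a tile of C are reused before moving on.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Accumulate one A-tile times one B-tile into a C-tile.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c
```

The result is identical to the untiled triple loop for any tile size, since tiling only reorders the additions into each `c[i][j]`. The choice of `tile` is what a locality optimizer would tune against the size of the fast memory.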
Suzanne, Aurélie. "Decision Support Query Processing of Spanning Event Streams." Thesis, Nantes Université, 2022. http://www.theses.fr/2022NANU4022.
The Big Data era requires new processing architectures, among which streaming systems have become very popular. These systems are able to summarize infinite data streams with aggregates over the most recent data. However, until now only point events have been considered. Spanning events, which have a duration, have been left aside and restricted to the world of persistent databases. In this thesis, a unified framework for applying such stream mechanisms to spanning events is defined. We then develop an engine for Aggregate Continuous Queries (ACQs). It incorporates event lifespans to provide exact aggregate computation and provides adapted structures for efficient computation of sliding windows. This engine is further extended to handle shared computation of simultaneously running ACQs while properly managing out-of-order events. To elaborate the most efficient query execution plan at runtime, a cost-based policy is followed. Throughout this thesis, many experiments have been carried out to show the pertinence and the efficiency of our approaches in a lar
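The difference between point events and spanning events shows up in which windows an event contributes to. A point event belongs to the window containing its timestamp. A spanning event contributes to every window its lifespan overlaps. The following is a minimal sketch of that semantics for a COUNT aggregate over sliding windows. It assumes half-open lifespans `[start, stop)` and windows `[t, t + size)`. It is a brute-force illustration, not the thesis's ACQ engine or its adapted data structures. The function name `sliding_window_counts` is hypothetical.

```python
def sliding_window_counts(events, size, slide, end):
    """COUNT of spanning events overlapping each sliding window.

    events: list of (start, stop) lifespans, half-open [start, stop)
    Windows are [t, t + size) for t = 0, slide, 2 * slide, ... while t < end.
    An event counts in a window if its lifespan overlaps the window,
    not only if its start timestamp falls inside it.
    """
    result = []
    t = 0
    while t < end:
        n = sum(1 for s, e in events if s < t + size and e > t)  # overlap test
        result.append((t, n))
        t += slide
    return result
```

For example, with lifespans `[(0, 3), (2, 8), (9, 10)]`, window size 4, slide 3 and horizon 10, the counts are `[(0, 2), (3, 1), (6, 2), (9, 1)]`. The event `(2, 8)` is counted in the windows starting at 3 and 6 even though it started earlier.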
Friston, S. "Low latency rendering with dataflow architectures." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/1544925/.
Astolfi, Vitor Fiorotto. "ChipCflow - em hardware dinamicamente reconfigurável." Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05032010-203142/.
In recent years, reconfigurable computing has become increasingly advanced, especially in hardware that uses Field-Programmable Gate Arrays (FPGAs). However, the increase in FPGA performance has widened the gap between design capacity and the technology available for developing designs. Imperative high-level programming languages such as C are more appropriate than hardware description languages (HDLs) for developing complex algorithms. For this reason, many ANSI C-like programming tools for hardware development have come into existence. The ChipCflow project, of which this project is part, is one of these tools. The execution of algorithms through this tool will be completely directed by data flow, according to the dynamic model found in dataflow architectures. This takes advantage of their naturally high levels of parallelism and of the characteristics of partially reconfigurable hardware. In this project, the objective is a proof of concept for creating instances, in the form of operators, of a ChipCflow algorithm on partially reconfigurable hardware, using Xilinx Virtex boards as the reference.
Lopes, Joelmir José. "ChipCflow - uma ferramenta para execução de algoritmos utilizando o modelo a fluxo de dados dinâmico em hardware reconfigurável." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05122012-154304/.
Applications have grown complex, and demand has grown for systems using millions of transistors and, consequently, complex hardware. As a result, tools that convert C into a Hardware Description Language (HDL) such as VHDL or Verilog have been developed. In this context, this thesis presents the ChipCflow project, which uses a dataflow architecture to implement high-performance logic in Field Programmable Gate Arrays (FPGAs). Dataflow machines are programmable computers whose hardware is optimized for fine-grain data-flow parallel computation. In other words, the execution of programs is determined by data availability, so parallelism is intrinsic to these systems. With advances in microelectronics technology, FPGAs have come into wide use. This is mainly because of their flexibility, the ease with which they implement complex systems, and their intrinsic parallelism. One of the challenges is to create tools that let programmers who use an HLL (High Level Language), such as C, produce hardware directly. These tools should make the most of three things: the programmers' experience, the parallelism of the dynamic dataflow architecture, and the flexibility and parallelism of FPGAs. The aim is efficient hardware optimized for high performance and lower power consumption. The ChipCflow project is a tool that converts application programs written in C into VHDL, based on the dynamic dataflow architecture. The main goal of this thesis is to define and implement the ChipCflow operators using a dynamic dataflow architecture in FPGA. These operators use tagged tokens to identify data based on operator instances. Their implementation and instances use an asynchronous implementation model in FPGA to achieve higher speed and lower power consumption.
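In a tagged-token dynamic dataflow machine, several instances of the same operator (for example, different loop iterations) can be active at once. Each token therefore carries a tag, and a matching unit pairs the two operands of a two-input operator only when both their destination and their tag agree. The following is a minimal sketch of such a matching store. The `Token` fields, the `MatchingStore` class and the `(dest, tag)` key are hypothetical illustrations, not the ChipCflow operator design or its FPGA implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    dest: str      # destination operator
    port: int      # 0 = left operand, 1 = right operand
    tag: int       # instance/iteration context distinguishing activations
    value: object

class MatchingStore:
    """Pairs operand tokens of two-input operators by (destination, tag)."""

    def __init__(self):
        self.waiting = {}  # (dest, tag) -> first-arriving token

    def receive(self, tok):
        """Return (dest, tag, left, right) when a pair is complete, else None."""
        key = (tok.dest, tok.tag)
        partner = self.waiting.pop(key, None)
        if partner is None:
            self.waiting[key] = tok      # no partner yet: wait in the store
            return None
        left, right = (partner, tok) if partner.port == 0 else (tok, partner)
        return tok.dest, tok.tag, left.value, right.value  # operator may fire
```

Tokens for `add` with tag 1 and tag 2 are kept apart. A right operand with tag 1 fires only with the left operand that also carries tag 1, and the store is empty once both pairs have matched.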
Bhagyanath, Anoop. "Code Generation for Synchronous Control Asynchronous Dataflow Architectures." Advisor: Klaus Schneider. Kaiserslautern: Technische Universität Kaiserslautern, 2021. http://d-nb.info/122615428X/34.
Mahiout, Abderrahmane. "Placement et ordonnancement automatiques de programmes dataflow data-paralleles sur les architectures paralleles." Paris 11, 1996. http://www.theses.fr/1996PA112268.
Selva, Manuel. "Performance monitoring of throughput constrained dataflow programs executed on shared-memory multi-core architectures." Thesis, Lyon, INSA, 2015. http://www.theses.fr/2015ISAL0055/document.
Because of physical limits, hardware designers have switched to parallel systems to exploit the still-growing number of transistors per square millimeter of silicon. These parallel systems are made of several independent computing units. To benefit from these computing units, software must be changed: existing sequential applications have to be split into independent tasks to be executed in parallel on the different computing units. To that end, many concurrent programming models have been proposed and are in use today. In this thesis we focus on the dataflow concurrent programming model. This work is about performance evaluation of dataflow programs on multicore architectures. We propose to extend dataflow programming models with the notion of throughput constraints. This information is taken into account in the compilation tool chain in order to detect throughput bottlenecks at runtime. The profiling results gathered during execution are used both for off-line analyses and to adapt the application during its execution. In the former case, the developer uses this information to know which part of the dataflow program should be optimized and how to distribute the program efficiently over the computing units. In the latter case, the profiling information is used by runtime adaptation mechanisms to redistribute the work across the computing units. We pay particular attention to profiling the usage of the memory subsystem. The data exchange information provided by the programming model allows the memory subsystem of multicore architectures to be used efficiently. Nevertheless, the complexity of modern memory systems makes it impossible to statically evaluate the impact of memory accesses on the global performance of the application. We propose to set up memory profiling dedicated to dataflow applications, based on hardware profiling mechanisms.
Magna, Patrícia. "Redução dos bits de emparelhamento da máquina de fluxo de dados de Manchester." [Reducing the matching bits of the Manchester dataflow machine.] Universidade de São Paulo, 1992. http://www.teses.usp.br/teses/disponiveis/54/54132/tde-17042009-115457/.
The dataflow model is especially relevant to research on high-performance architectures. In this model, execution control depends only on data availability, which allows maximum exploitation of the parallelism implicit in programs. The present work is based on the Manchester dataflow machine, which, in order to handle reentrant code, requires each data token to carry a label in addition to the destination-instruction field. This additional information, which accounts for 70% of the data token, complicates the machine implementation, substantially limits execution speed, and prevents full use of the model. This work presents approaches for reducing the amount of information needed for proper machine operation, in order to achieve a simpler and more effective implementation.
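The label exists so that tokens belonging to different activations of reentrant code never pair up. A minimal Python sketch of label-based operand matching, with a dictionary standing in for the hardware matching store; the `Token` fields and `match_tokens` are hypothetical simplifications, not the Manchester token format or the reduction scheme proposed in the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """Hypothetical simplified token; real Manchester tokens carry more fields."""
    dest: int    # address of the destination instruction
    port: int    # 0 = left operand, 1 = right operand
    label: int   # distinguishes concurrent activations of reentrant code
    value: int

def match_tokens(tokens):
    """Pair operands for two-input instructions. Two tokens match only when
    both the destination and the label agree; the dict plays the role of the
    matching store. Returns (dest, label, left, right) tuples in the order in
    which pairs complete."""
    store = {}
    ready = []
    for tok in tokens:
        key = (tok.dest, tok.label)
        partner = store.pop(key, None)
        if partner is None:
            store[key] = tok          # wait for the partner operand
            continue
        left, right = (partner, tok) if partner.port == 0 else (tok, partner)
        ready.append((tok.dest, tok.label, left.value, right.value))
    return ready

# Two activations (labels 0 and 1) of the same instruction 7, interleaved.
stream = [
    Token(dest=7, port=0, label=0, value=3),
    Token(dest=7, port=0, label=1, value=10),
    Token(dest=7, port=1, label=0, value=4),
    Token(dest=7, port=1, label=1, value=20),
]
print(match_tokens(stream))  # [(7, 0, 3, 4), (7, 1, 10, 20)]
```

If the key were the destination alone, the second token would wrongly be paired with the first, combining 3 with 10 across activations.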
Magna, Patrícia. "Proposta e simulação de uma arquitetura a fluxo de dados de segunda geração." [Proposal and simulation of a second-generation dataflow architecture.] Universidade de São Paulo, 1997. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-06042009-113436/.
This work presents the SEED architecture, which was proposed on the basis of experience with existing architectures based on the dataflow model. The SEED architecture uses the dataflow model to schedule and execute sets of instructions called code blocks. This approach tries to exploit the main strength of the dataflow model, which is to expose the maximum parallelism of programs. However, this architecture exploits coarser granularity than is usually considered in dataflow architectures, in order to reduce data-token traffic within the architecture. This reduction aims to solve problems such as excessive memory occupation and high hardware complexity. Besides specifying all the units that make up the SEED architecture, this work also proposes a way of partitioning programs into code blocks that can be executed by the SEED architecture. Several benchmarks were generated using this partitioning proposal and executed on the SEED architecture simulator in order to analyze the behavior of the proposed architecture under specific configurations.
Savas, Süleyman. "Utilizing Heterogeneity in Manycore Architectures for Streaming Applications." Licentiate thesis, Högskolan i Halmstad, Centrum för forskning om inbyggda system (CERES), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-33792.
HiPEC (High Performance Embedded Computing)
NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability)
Menon, Suraj S. "Supporting Distributed Fault Tolerance In A Real-Time Micro-Kernel." Thesis, Virginia Tech, 2006. http://hdl.handle.net/10919/35463.
Master of Science
Georgiou, Yiannis. "Contributions for resource and job management in high performance computing." Grenoble, 2010. http://www.theses.fr/2010GRENM079.
High Performance Computing (HPC) is characterized by the latest technological evolutions in computing architectures and by the increasing needs of applications for computing power. A particular middleware called the Resource and Job Management System (RJMS) is responsible for delivering computing power to applications. The RJMS plays an important role in HPC because it occupies a strategic place in the whole software stack, standing between these two layers. However, the latest evolutions in the hardware and application layers have brought new levels of complexity to this middleware. Issues such as scalability, management of topological constraints, energy efficiency and fault tolerance must be given particular consideration, among others, in order to provide better system exploitation from both the system and the user point of view. This dissertation presents a state of the art of the fundamental concepts and research issues of Resource and Job Management Systems. It provides a multi-level comparison (concepts, functionalities, performance) of several Resource and Job Management Systems in High Performance Computing. An important metric for evaluating the work of an RJMS on a platform is the observed system utilization. However, studies and logs of production platforms show that HPC systems in general suffer from significant underutilization. Our study deals with these clusters' underutilization periods by proposing methods to aggregate otherwise unused resources for the benefit of the system or the application. More particularly, this thesis explores RJMS-level mechanisms 1) for increasing jobs' useful computation rates in the highly volatile environments of a lightweight grid context, 2) for improving system utilization with malleability techniques, and 3) for providing energy-efficient system management through the exploitation of idle computing machines.
Experimentation and evaluation in this type of context are highly complex because of the interdependence of the many parameters that have to be controlled. In this thesis we have developed a methodology based on real-scale controlled experimentation with the submission of synthetic or real workload traces.
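Observed system utilization is named as a key metric for evaluating an RJMS. One common way to compute it from job logs is used core-seconds divided by available core-seconds over an observation window; the sketch below assumes that definition and a hypothetical `(start, end, cores)` log format, and it ignores reservations, failures and node heterogeneity.

```python
def system_utilization(jobs, total_cores, t0, t1):
    """Fraction of the core-seconds available in [t0, t1) that jobs used.
    jobs: iterable of (start, end, cores); the parts of a job that fall
    outside the window are clipped away."""
    if t1 <= t0 or total_cores <= 0:
        raise ValueError("need a non-empty window and at least one core")
    busy = 0.0
    for start, end, cores in jobs:
        overlap = max(0.0, min(end, t1) - max(start, t0))
        busy += overlap * cores
    return busy / (total_cores * (t1 - t0))

# 4-core cluster observed over 10 s: one job fills 2 cores for the whole
# window, another uses 2 cores only during its last 5 s inside the window.
jobs = [(0, 10, 2), (5, 15, 2)]
print(system_utilization(jobs, total_cores=4, t0=0, t1=10))  # 0.75
```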
Stan, Oana. "Placement of tasks under uncertainty on massively multicore architectures." Thesis, Compiègne, 2013. http://www.theses.fr/2013COMP2116/document.
This PhD thesis is devoted to the study of combinatorial optimization problems related to massively parallel embedded architectures when uncertain data (e.g. execution times) are taken into account. Our focus is on chance-constrained programs, whose objective is to find the best solution that is feasible with a preset probability guarantee. A qualitative analysis of the uncertain data we have to treat (dependent, multimodal, multidimensional random variables that are difficult to characterize through classical distributions) led us to design a nonparametric method, the so-called "robust binomial approach", which is valid whatever the joint distribution and is based on robust optimization and statistical hypothesis testing. We also propose a methodology for adapting approximate algorithms to stochastic problems by integrating the robust binomial approach when verifying solution feasibility. The practical relevance of our approach is validated on two problems arising in the compilation of dataflow applications for manycore platforms. The first problem is the stochastic partitioning of networks of processes onto a fixed set of nodes, taking into account the load of each node and the uncertainty affecting the weights of the processes. To find stochastic solutions, we propose a semi-greedy iterative algorithm, which allowed us to measure the robustness and cost of the solutions relative to those of the deterministic version of the problem. The second problem is the global placement and routing of dataflow applications on a clustered architecture. Since the goal is to place the processes on clusters such that a feasible routing exists, a GRASP heuristic was designed first for the deterministic case and then extended to the chance-constrained variant of the problem.
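The robust binomial approach combines robust optimization with statistical hypothesis testing. The sketch below illustrates only a generic hypothesis-testing ingredient: a one-sided binomial test that decides whether sampled scenarios give enough evidence that a candidate solution is feasible with probability greater than 1 - eps. It assumes independent samples, is not the thesis's method itself, and the function names are hypothetical.

```python
from math import comb

def binom_upper_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def accept_solution(satisfied, eps: float, alpha: float) -> bool:
    """One-sided binomial test on sampled scenarios.

    satisfied: one bool per independent sample of the uncertain data, True when
    the candidate solution meets its constraints in that sample.
    Null hypothesis: P(constraints hold) <= 1 - eps. The solution is accepted
    (null rejected at level alpha) when observing at least this many successes
    would be unlikely if the success probability were only 1 - eps.
    """
    n = len(satisfied)
    k = sum(satisfied)
    return binom_upper_tail(n, 1 - eps, k) <= alpha

print(accept_solution([True] * 100, eps=0.05, alpha=0.05))               # True
print(accept_solution([True] * 97 + [False] * 3, eps=0.05, alpha=0.05))  # False
```

With 100 samples and eps = 0.05, even 97 successes are not enough at alpha = 0.05, because a solution that is feasible only 95% of the time would produce 97 or more successes about a quarter of the time.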
Bodin, Bruno. "Analyse d'Applications Flot de Données pour la Compilation Multiprocesseur." [Analysis of Dataflow Applications for Multiprocessor Compilation.] Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00922578.
Silva, Bruno de Abreu. "Gerenciamento de tags na arquitetura ChipCflow - uma máquina a fluxo de dados dinâmica." [Tag management in the ChipCflow architecture: a dynamic dataflow machine.] Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-17052011-085128/.
Research on alternative architectures and software has been growing in recent years. This research is driven by advances in hardware technology, and such advances must be complemented by improvements in design methodologies and in test and verification techniques in order to use the technology effectively. Many alternative architectures and software systems exploit the parallelism of applications, unlike the von Neumann model. Among high-performance alternative architectures is the dataflow architecture. In this kind of architecture, program execution is determined by data availability, so parallelism is intrinsic to these systems. Dataflow architectures have again become a prominent research area thanks to hardware advances, in particular advances in Reconfigurable Computing and FPGAs (Field-Programmable Gate Arrays). The ChipCflow project is a tool for executing algorithms as dynamic dataflow graphs on FPGAs. The main goal of this module of the ChipCflow project is to define the tagged-token format and the iterative operators that manipulate the tags of tokens, and to implement them.
Arnesen, Adam T. "Increasing Design Productivity for FPGAs Through IP Reuse and Meta-Data Encapsulation." BYU ScholarsArchive, 2011. https://scholarsarchive.byu.edu/etd/2614.
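Arras, Paul-Antoine. "Ordonnancement d'applications à flux de données pour les MPSoC embarqués hybrides comprenant des unités de calcul programmables et des accélérateurs matériels." [Scheduling dataflow applications for hybrid embedded MPSoCs comprising programmable computing units and hardware accelerators.] Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0031/document.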
Although numerous electronic devices are nowadays able to play video content in real time with high-quality reproduction, video decoding in embedded systems is still not a trivial process. As a matter of fact, recent codecs such as H.264 and HEVC are so complex that resorting to mixed software-hardware architectures is almost unavoidable. However, programming this kind of platform efficiently is well known to be tricky. This thesis addresses the issue of developing streaming applications for hybrid embedded targets and executing them efficiently, and proposes several contributions. The first is an extension of the classical list-scheduling heuristics to take memory constraints into account. The second is a dataflow execution model compatible with most existing models and with a large set of hardware platforms, together with a dynamic scheduler. Lastly, numerous developments have been carried out on a real-world architecture from STMicroelectronics in order to demonstrate the feasibility of the approach.
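One way to extend list scheduling with memory constraints is sketched below, assuming identical processors, a DAG of tasks that hold their memory only while running, and bottom-level priorities; a ready task that does not fit in the remaining memory is skipped in favour of lower-priority ones. `Task` and `list_schedule` are hypothetical and do not reproduce the heuristic from the thesis.

```python
import heapq
from typing import NamedTuple

class Task(NamedTuple):
    duration: int
    memory: int        # memory held while the task runs (assumption)
    preds: tuple = ()  # names of predecessor tasks

def bottom_levels(tasks):
    """Longest path from each task to an exit task, including its own duration."""
    succs = {name: [] for name in tasks}
    for name, task in tasks.items():
        for p in task.preds:
            succs[p].append(name)
    memo = {}
    def level(name):
        if name not in memo:
            memo[name] = tasks[name].duration + max(
                (level(s) for s in succs[name]), default=0)
        return memo[name]
    return {name: level(name) for name in tasks}

def list_schedule(tasks, n_procs, mem_cap):
    """Memory-aware list scheduling; returns {task: start_time}."""
    prio = bottom_levels(tasks)
    start, done, running = {}, set(), []   # running: heap of (finish, name)
    time, free_procs, mem_used = 0, n_procs, 0
    while len(done) < len(tasks):
        ready = sorted(
            (n for n in tasks
             if n not in start and all(p in done for p in tasks[n].preds)),
            key=lambda n: (-prio[n], n))
        for n in ready:
            if free_procs == 0:
                break
            if mem_used + tasks[n].memory <= mem_cap:   # memory check
                start[n] = time
                free_procs -= 1
                mem_used += tasks[n].memory
                heapq.heappush(running, (time + tasks[n].duration, n))
        if not running:
            raise ValueError("no task can start: memory too small or cyclic graph")
        time, n = heapq.heappop(running)   # advance to the next completion
        done.add(n)
        free_procs += 1
        mem_used -= tasks[n].memory
    return start

tasks = {
    "A": Task(2, 4),
    "B": Task(3, 4, ("A",)),
    "C": Task(1, 4, ("A",)),
    "D": Task(2, 2, ("B", "C")),
}
print(list_schedule(tasks, n_procs=2, mem_cap=6))  # {'A': 0, 'B': 2, 'C': 5, 'D': 6}
print(list_schedule(tasks, n_procs=2, mem_cap=8))  # {'A': 0, 'B': 2, 'C': 2, 'D': 5}
```

With 6 memory units, C cannot run alongside B even though a processor is free, so it is delayed to time 5; with 8 units the two siblings overlap.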
Glanon, Philippe Anicet. "Deployment of loop-intensive applications on heterogeneous multiprocessor architectures." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG029.
Cyber-physical systems (CPSs) are distributed, computing-intensive systems that integrate a wide range of software applications and heterogeneous processing resources, each interacting with the others through different communication resources to process a large volume of data sensed from physical, chemical or biological processes. An essential issue in the design stage of these systems is to predict the timing behaviour of software applications and to provide performance guarantees for these applications. To tackle this issue, efficient static scheduling strategies are required to deploy the computations of software applications on the processing architectures. These scheduling strategies must deal with several constraints, including the loop-carried dependency constraints between the computational programs as well as the resource and communication constraints of the processing architectures intended to execute these programs. Because loops are among the most time-critical parts of many computing-intensive applications, the optimal timing behaviour and performance of the applications depend on the optimal scheduling of the loop structures enclosed in the computational programs they execute. Therefore, to provide performance guarantees for the applications, the scheduling strategies should efficiently explore and exploit the parallelism embedded in the repetitive execution patterns of loops while respecting the resource and communication constraints of the processing architectures of CPSs. Scheduling a loop under resource and communication constraints is a complex problem. To solve it efficiently, heuristics are obviously necessary. However, to design efficient heuristics, it is important to characterize the set of optimal solutions for the scheduling problem. An optimal solution for a scheduling problem is a schedule that achieves an optimal performance goal.
In this thesis, we study the resource-constrained and communication-constrained scheduling of loop-intensive applications on heterogeneous multiprocessor architectures, with the goal of optimizing the throughput of the applications. In order to characterize the set of optimal scheduling solutions and to design efficient scheduling heuristics, we use the synchronous dataflow (SDF) model of computation to describe the loop structures specified in the computational programs of software applications, and we design software-pipelined scheduling strategies based on the structural and mathematical properties of the SDF model.
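A basic mathematical property of the SDF model is that the number of times each actor fires per period follows from the balance equations r(u)·p = r(v)·c on every edge (u, v) whose producer emits p tokens and whose consumer reads c tokens per firing. The sketch below computes that repetition vector, the usual starting point for static and software-pipelined SDF schedules; it is standard SDF theory rather than the scheduling strategy of the thesis, and the function name and example graph are hypothetical.

```python
from collections import deque
from fractions import Fraction
from math import gcd, lcm   # multi-argument gcd/lcm need Python 3.9+

def repetition_vector(actors, edges):
    """Smallest positive integer solution r of the SDF balance equations
    r[src] * produced == r[dst] * consumed for every edge.

    actors: list of actor names (graph assumed connected)
    edges:  list of (src, dst, produced, consumed) with positive rates
    """
    adj = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))  # r[dst] = r[src]*prod/cons
        adj[dst].append((src, Fraction(cons, prod)))  # r[src] = r[dst]*cons/prod
    rates = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:                      # propagate relative rates breadth-first
        a = queue.popleft()
        for b, ratio in adj[a]:
            if b not in rates:
                rates[b] = rates[a] * ratio
                queue.append(b)
    if len(rates) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, prod, cons in edges:  # every edge must balance
        if rates[src] * prod != rates[dst] * cons:
            raise ValueError("inconsistent SDF graph: no bounded periodic schedule")
    scale = lcm(*(f.denominator for f in rates.values()))
    ints = {a: int(f * scale) for a, f in rates.items()}
    g = gcd(*ints.values())
    return {a: v // g for a, v in ints.items()}

# A produces 2 tokens per firing and B consumes 3; B produces 1 and C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```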
Farabet, Clément. "Analyse sémantique des images en temps-réel avec des réseaux convolutifs." [Real-time semantic image analysis with convolutional networks.] Phd thesis, Université Paris-Est, 2013. http://tel.archives-ouvertes.fr/tel-00965622.
Wei, Ching-Jen, and 魏慶仁. "AN EFFICIENT DATAFLOW ARCHITECTURE FOR MACRO ACTOR PROCESSING." Thesis, 1995. http://ndltd.ncl.edu.tw/handle/96458741982320537024.
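Tatung Institute of Technology
Graduate Institute of Computer Science and Information Engineering
Academic year 83 (ROC calendar)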
Dataflow execution, which can exploit the parallelism hidden in a program, can achieve speedup efficiently. An instruction can fire as soon as all of its operands have arrived; that is, an instruction can execute without waiting for previous instructions if no dependency is encountered. Although the dataflow execution model can exploit all levels of parallelism, it does not take advantage of the locality of sequential execution. The proposed architecture combines the advantages of locality with the exploitation of parallelism. The processor uses double queues to reduce bubbles between the matching and execution units, which improves utilization. The simplicity of the architecture is another of its features. A software environment for the architecture is built on the high-level dataflow language SISAL and its translator. The dataflow graph (DFG) is partitioned according to the partitioning criteria and allocated to the processing elements. Simulation results under the Scientific and Engineering Software (SES) show that the processor is efficient.
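The firing rule can be illustrated with a tiny interpreter: in each step it fires every node whose operands have all arrived, so independent nodes (here `add` and `sub`) fire together without any program counter. This is a minimal Python sketch under simplifying assumptions (a hypothetical graph encoding, static dataflow with at most one token per input port); it does not model the proposed double-queue processor or SISAL compilation.

```python
import operator
from collections import defaultdict

# Hypothetical graph computing (a + b) * (c - d).
# node -> (operation, number of operands, [(successor node, input port)])
GRAPH = {
    "add": (operator.add, 2, [("mul", 0)]),
    "sub": (operator.sub, 2, [("mul", 1)]),
    "mul": (operator.mul, 2, []),   # no successors: its result is an output
}

def run_dataflow(graph, initial_tokens):
    """Fire every node whose operands have all arrived, step by step.
    initial_tokens: list of (node, port, value).
    Returns (outputs, waves); waves[i] lists the nodes fired in step i."""
    operands = defaultdict(dict)            # node -> {port: value}
    for node, port, value in initial_tokens:
        operands[node][port] = value
    outputs, waves = {}, []
    while True:
        enabled = sorted(n for n, ops in operands.items()
                         if len(ops) == graph[n][1])   # firing rule
        if not enabled:
            return outputs, waves
        waves.append(enabled)
        produced = []
        for n in enabled:
            op, arity, succs = graph[n]
            args = operands.pop(n)               # consume the operand tokens
            result = op(*(args[i] for i in range(arity)))
            if not succs:
                outputs[n] = result
            produced += [(s, port, result) for s, port in succs]
        for s, port, value in produced:          # deliver result tokens
            operands[s][port] = value

outputs, waves = run_dataflow(
    GRAPH, [("add", 0, 2), ("add", 1, 3), ("sub", 0, 10), ("sub", 1, 4)])
print(outputs)  # {'mul': 30}
print(waves)    # [['add', 'sub'], ['mul']]
```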
"Constraint extension to dataflow network." 2004. http://library.cuhk.edu.hk/record=b5891959.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2004.
Includes bibliographical references (leaves 90-93).
Abstracts in English and Chinese.
Chapter 1 --- Introduction
Chapter 2 --- Preliminaries
Chapter 2.1 --- Constraint Satisfaction Problems
Chapter 2.2 --- Dataflow Networks
Chapter 2.3 --- The Lucid Programming Language
Chapter 2.3.1 --- Daton Domain
Chapter 2.3.2 --- Constants
Chapter 2.3.3 --- Variables
Chapter 2.3.4 --- Dataflow Operators
Chapter 2.3.5 --- Functions
Chapter 2.3.6 --- Expression and Statement
Chapter 2.3.7 --- Examples
Chapter 2.3.8 --- Implementation
Chapter 3 --- Extended Dataflow Network
Chapter 3.1 --- Assertion Arcs
Chapter 3.2 --- Selection Operators
Chapter 3.2.1 --- The Discrete Choice Operator
Chapter 3.2.2 --- The Discrete Committed Choice Operator
Chapter 3.2.3 --- The Range Choice Operators
Chapter 3.2.4 --- The Range Committed Choice Operators
Chapter 3.3 --- Examples
Chapter 3.4 --- E-Lucid
Chapter 3.4.1 --- Modified Four Cockroaches Problem
Chapter 3.4.2 --- Traffic Light Problem
Chapter 3.4.3 --- Old Maid Problem
Chapter 4 --- Implementation of E-Lucid
Chapter 4.1 --- Overview
Chapter 4.2 --- Definition of Terms
Chapter 4.3 --- Function ELUCIDinterpreter
Chapter 4.4 --- Function Edemand
Chapter 4.5 --- Function transformD
Chapter 4.5.1 --- Labelling Datastreams of Selection Operators
Chapter 4.5.2 --- Removing Committed Choice Operators
Chapter 4.5.3 --- Removing asa, wvr, and upon
Chapter 4.5.4 --- Labelling Output Datastreams of if-then-else-fi
Chapter 4.5.5 --- Transforming Statements to Daton Statements
Chapter 4.5.6 --- Transforming Daton Expressions Recursively
Chapter 4.5.7 --- An Example
Chapter 4.6 --- Functions constructCSP, findC, and transformC
Chapter 4.7 --- An Example
Chapter 4.8 --- Function backtrack
Chapter 5 --- Related Works
Chapter 6 --- Conclusion
Chung, Hua-Yuan. "VHDL Implementation of Scheduled Dataflow Architecture and the Register Context." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/79120224172051556308.
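Fu Jen Catholic University
Department of Computer Science and Information Engineering
Academic year 98 (ROC calendar)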
Since the invention of microprocessors around 1970, improving CPU performance, together with ILP, had been the main focus of the computer industry. Around the year 2000, ILP seemed to have reached a limit, and together with power-consumption and heat-dissipation problems this ushered in the multi-core era. The focus has shifted from ILP to TLP and to the efficient use of multi-core processors. However, RAW hazard detection in current computers relies on complex hardware, which makes the CPU consume a lot of energy and makes the design more complex. In this research we propose a completely different architecture and a different way of resolving RAW hazards. By using the dataflow paradigm, we can eliminate RAW hazards naturally. Moreover, this architecture offers a new paradigm that closely links ILP and TLP by combining the sequential and dataflow paradigms. It is named the Scheduled Dataflow Architecture (SDF). SDF is a non-blocking, multithreaded, decoupled dataflow architecture, because its main engine relies on the dataflow paradigm. Since it is a decoupled architecture, the synchronization processor is responsible for data access and the execution processor is responsible for executing all the instructions. SDF was previously simulated in C++ and C [19-20]. To imitate the hardware complexity more precisely, this work implements SDF in VHDL and simulates it with ModelSim. We have also tested it on Altera DE2 hardware. The main focus of this research is to measure the performance gain from having more register contexts. When a multithreaded architecture is used, data can be passed between threads through the frame memory. If we instead use register contexts and efficiently pass data to the subsequent threads that need the results of the previous thread, several memory accesses can be avoided, thus improving the performance of a program. To test SDF, we also programmed it into the Cyclone II FPGA chip of the DE2 board.
SDF uses at least 50% of the resources of the Cyclone II, and the Cyclone II can synthesize SDF with at most four register sets. Across all of these syntheses, we found that SDF requires at least two register sets to run multithreaded programs concurrently.
CAI, FENG-ZHOU, and 蔡豐洲. "A pipeline bubbles reduction technique for the Monsoon dataflow architecture." Thesis, 1993. http://ndltd.ncl.edu.tw/handle/32228626885367156032.
Varadarajan, Keshavan. "A Coarse Grained Reconfigurable Architecture Framework Supporting Macro-Dataflow Execution." Thesis, 2012. http://etd.iisc.ernet.in/handle/2005/2302.
Lee, Yen-Lin, and 李彥霖. "Study on Reconfigurable System-On-Chip Architecture Based on Dataflow Computing." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/45120514829399151226.
National Chiao Tung University
Department of Electrical and Control Engineering
Academic year 89 (ROC calendar)
Nowadays, the information-appliance and communication-product industries are growing rapidly, and System-on-Chip (SoC) design has become the key enabler of this explosive growth. However, state-of-the-art SoC design is largely IP-based and poses several challenges to designers, especially in the transformation of algorithms and the integration of IP cores. To alleviate the difficulty of SoC design, this thesis proposes a novel dataflow architecture for integrating IP cores. The proposed architecture employs Petri nets to dynamically schedule the operations of DSP algorithms onto processing elements, and hence performs dataflow computing efficiently. As a result, the Petri-net-based scheduler is insensitive to the timing of computation and interprocessor communication, and the proposed SoC architecture is reconfigurable for DSP applications.
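A Petri-net scheduler starts an operation (fires a transition) as soon as every input place holds a token, which is why such a scheduler does not depend on how long computation or communication takes. A minimal Python sketch of the Petri-net token game follows; the class, place and transition names are hypothetical, and the thesis's actual scheduler and its mapping onto IP cores are not reproduced.

```python
from collections import Counter

class PetriNet:
    """Minimal place/transition net. Arc weights are given by repetition:
    a place listed twice in an input list needs two tokens."""

    def __init__(self, transitions, marking):
        self.transitions = transitions    # name -> (input places, output places)
        self.marking = Counter(marking)   # place -> number of tokens

    def is_enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking[p] >= n for p, n in Counter(inputs).items())

    def fire(self, t):
        if not self.is_enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        inputs, outputs = self.transitions[t]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1

    def run(self):
        """Repeatedly fire the first enabled transition (alphabetical order)
        until none is enabled; return the firing trace."""
        trace = []
        while True:
            enabled = sorted(t for t in self.transitions if self.is_enabled(t))
            if not enabled:
                return trace
            self.fire(enabled[0])
            trace.append(enabled[0])

# Hypothetical two-stage DSP pipeline: a filter on PE1 feeds a transform on PE2.
# A "free" place holds a token while its processing element is idle.
net = PetriNet(
    {
        "filter":    (["x_in", "pe1_free"], ["y_mid", "pe1_free"]),
        "transform": (["y_mid", "pe2_free"], ["out", "pe2_free"]),
    },
    {"x_in": 2, "pe1_free": 1, "pe2_free": 1},
)
print(net.run())           # ['filter', 'filter', 'transform', 'transform']
print(net.marking["out"])  # 2
```

Firing one enabled transition at a time in a fixed order is a simplification; a real scheduler would resolve conflicts explicitly and could fire independent transitions concurrently.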
Alle, Mythri. "Compiling For Coarse-Grained Reconfigurable Architectures Based On Dataflow Execution Paradigm." Thesis, 2012. http://etd.iisc.ernet.in/handle/2005/2453.