Dissertations / Theses: 'Embedded multiprocessor systems'

1

Al-Hasawi, Waleed Isa. "Multiprocessor design for real-time embedded systems." Thesis, Loughborough University, 1987. https://dspace.lboro.ac.uk/2134/7474.


2

Valente, Frederico Miguel Goulão. "Static analysis on embedded heterogeneous multiprocessor systems." Master's thesis, Universidade de Aveiro, 2008. http://hdl.handle.net/10773/2180.


3

Gong, Shaojie, and Zhongping Deng. "Benchmarks for Embedded Multi-processors." Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-660.

Abstract:

During the recent years, computer performance has increased dramatically. To measure the performance of computers, benchmarks are ideal tools. Benchmarks exist in many areas and point to different applications. For instance, in a normal PC, benchmarks can be used to test the performance of the whole system which includes the CPU, graphic card, memory system, etc. For multiprocessor systems, there also exist open source benchmark programs. In our project, we gathered information about some open benchmark programs and investigated their applicability for evaluating embedded multiprocessor systems intended for radar signal processing. During our investigation, parallel cluster systems and embedded multiprocessor systems were studied. Two benchmark programs, HPL and NAS Parallel Benchmark were identified as particularly relevant for the application field. The benchmark testing was done on a parallel cluster system which has an architecture that is similar to the architecture of embedded multiprocessor systems, used for radar signal processing.


4

Khan, Jehangir. "Embedded multiprocessor architectures for automative driver assistance systems." Valenciennes, 2009. http://ged.univ-valenciennes.fr/nuxeo/site/esupversions/d494f35c-ba4b-4230-bb99-881df0742ab6.

Abstract:

Automotive crashes are responsible for the highest number of accidental deaths all over the world. Researchers, automotive manufacturers and government authorities around the world are continuously looking for solutions to this problem. Research has shown that half of all accidents can be avoided if the driver is alerted to an impending collision a fraction of a second in advance. A mechanism for warning the driver of an approaching danger is called a Driver Assistance System (DAS). Accident statistics show that the great majority of vehicle crashes result from front-end collisions. Hence, minimizing frontal collisions would significantly decrease road accidents. To predict a front-end collision sufficiently in advance, the obstacle must be detected from a distance. Moreover, for the DAS to be really effective, an imminent collision must be sensed in all circumstances, especially in poor weather, where the DAS is needed most. A radar sensor fulfils both prerequisites: long-range obstacle detection and all-weather operation. However, merely detecting obstacles is useful only to a certain extent. To establish whether an obstacle is on a collision course with the host vehicle, its trajectory must be predicted before it comes close to the host vehicle. Determining the trajectory of a moving object requires its dynamic behavior to be monitored over a period of time. In a real traffic scenario more than one obstacle can pose a danger to the host vehicle, so the trajectories of multiple objects have to be monitored simultaneously. An apparatus capable of performing such functions is called a Multiple Target Tracking (MTT) system. In this thesis we propose a DAS that uses the principles of Multiple Target Tracking to monitor the dynamics of obstacles hundreds of meters ahead and to avoid a collision of the host vehicle with them.
While such a system theoretically offers one of the best answers to the road accident problem, its practical implementation is not a trivial task. It involves complex computations and consequently needs a long processing time. However, to alert a driver to an approaching danger in real time, the computations must be performed very rapidly. We use multiple processors in our system to share the computation load and thereby reduce the processing time. Multiple processors running in parallel not only speed up the computation but also address the power consumption issues of embedded systems. We use an FPGA (Field Programmable Gate Array) as the implementation platform for our multiprocessor system. FPGAs offer the flexibility needed for ever-evolving embedded systems, and they are very cost-effective. Implementing a multiprocessor system in an FPGA makes its architecture flexible and reconfigurable, while the processors can be reprogrammed when needed. FPGA-based multiprocessor systems thus offer flexibility in hardware as well as in software, and therefore scale very easily. We optimize the system architecture to minimize its hardware size while still meeting the real-time deadlines of the application. Minimized hardware not only reduces the energy consumption of the system but also enables us to fit the system in a smaller FPGA, which plays an important role in reducing the cost of the system.


5

Nélis, Vincent. "Energy-aware real-time scheduling in embedded multiprocessor systems." Doctoral thesis, Universite Libre de Bruxelles, 2010. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210058.

Abstract:

Nowadays, computer systems are everywhere. From simple portable devices such as watches and MP3 players to large stationary installations that control nuclear power plants, computer systems are now present in all aspects of modern everyday life. In only about 70 years, they have completely transformed our way of life and have reached such a high degree of sophistication that they will soon be capable of driving our cars and cleaning our houses without any human intervention. As computer systems gain responsibilities, it becomes essential that they provide both safety and reliability. Indeed, a failure in systems such as the anti-lock braking system (ABS) in cars could threaten human lives and have catastrophic and irreversible consequences. Hence, for many years, researchers have addressed the problems of system safety and reliability that come with this rapid evolution.

This thesis provides a general overview of embedded real-time computer systems, i.e. a particular kind of computer system whose number grows daily. We provide the reader with some preliminary knowledge and a good understanding of the concepts that underlie this emerging technology. We focus especially on the theoretical problems related to the real-time issue and briefly summarize the main solutions, together with their advantages and drawbacks. This takes the reader through all the conceptual layers constituting a computer system, from the software level (the logical part), which specifies both the system behavior and requirements, to the hardware level (the physical part), which actually performs the expected processing and reacts to the environment. Along the way, we introduce the theoretical models that allow researchers to carry out theoretical analyses which ensure that all the system requirements are fulfilled. Finally, we address the energy consumption problem in embedded systems. We describe the various factors of power dissipation in modern technologies and we introduce different solutions to reduce this consumption.

This thesis focuses on a specific kind of computer system called "real-time embedded systems". A system is said to be "embedded" when it is developed to serve a well-defined purpose. A mobile phone is a perfect example of an embedded system, since all its functionalities are rigorously defined before it is even designed. By contrast, a personal computer is generally not considered an embedded system, because its designers do not know in advance what it will be used for. Many of these embedded systems have very strong timing constraints, which distinguishes them even further from consumer computers.
For example, when a car driver brakes suddenly, the on-board computer triggers the ABS application, and it is essential that this application be processed within a short deadline. In other words, the ABS functionality must be given priority over the vehicle's other functionalities. This type of embedded system is then called "real-time", because of these notions of time and of priorities between applications. The problem posed by real-time systems is the following: how can one determine, at any moment, an execution order for the different functionalities such that they are all executed completely within their deadlines? Moreover, with the recent emergence of multiprocessor systems, this problem has become much more complex, since the system must now determine which functionality executes at what time on which processor so that all timing constraints are met. Finally, these multiprocessor real-time embedded systems quickly ran into an energy consumption problem. Their demand for performance (and therefore for energy) has grown much faster than the capacity of the batteries that power them. This problem is currently encountered by many systems, such as mobile phones. The objective of this thesis is to go through the different components of such embedded systems and to propose solutions to reduce their energy consumption.
Doctorate in Sciences


6

Liang, Yuchen, and Syed Muhammad Zeeshan Iqbal. "OpenMPBench : An Open-Source Benchmark for Multiprocessor Based Embedded Systems." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-4556.

Abstract:

OpenMPBench is a new open-source benchmark for multiprocessor-based embedded systems. It comprises a set of parallel implementations of seven classical algorithms that cover different computing features of general-purpose processors. Performance data, including tables and figures, is provided to guide potential users in evaluating the design of multiprocessor-based embedded systems. The seven applications fall into four categories:

Automation and Industry Control
* Bitcount
* SUSAN
* BASICMATH

Network
* Patricia
* Dijkstra

Office
* Stringsearch

Security
* SHA

Among them, Bitcount and Dijkstra involve more than one parallel application, implemented for different functions or using different strategies. Bitcount consists of three parallel applications, parallel Bitcnt_1, parallel Bitstring and parallel Bitcnts, which implement bit counting with different strategies. Three parallel applications are implemented for Dijkstra. One solves the all-pairs shortest paths problem. The other two solve the single-source shortest paths problem, using a single-queue strategy and a multiple-queue strategy respectively. Stringsearch consists of Pratt-Boyer-Moore, case-sensitive Boyer-Moore-Horspool, case-insensitive Boyer-Moore-Horspool, and Boyer-Moore-Horspool (case-insensitive with accented character translation) implementations. The source code of the sequential versions of these applications was downloaded from MiBench, together with the standard output based on x86 Linux. For OpenMPBench, all parallel applications have been implemented in ANSI C using POSIX threads. All libraries used by the implementations are based on the GNU standard library. The development environment is Ubuntu 9.04 with the 2.6.28-generic Linux kernel, the GCC 4.2.4 compiler, and the Emacs 22.1 editor. Given the hardware currently available, a workstation with 8 processors, shipped with Ubuntu 4.2.4, was selected as the experiment environment.
Ubuntu is a free GNU/Linux distribution that offers the full GNU standard library and has GCC installed by default. In conclusion, we consider this experiment environment suitable for simulating multiprocessor-based embedded systems.
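
The chunk-and-combine decomposition that a parallel bit count relies on can be sketched as follows. This is only an illustration, not code from OpenMPBench: the thesis used ANSI C with POSIX threads, Python is used here to keep the sketch short, and the names `count_bits` and `parallel_count_bits` are hypothetical. Python threads show the structure of the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bits(values):
    """Count the set bits in a sequence of non-negative integers."""
    return sum(bin(v).count("1") for v in values)


def parallel_count_bits(values, workers=4):
    """Split the input into at most `workers` chunks and sum the partial counts."""
    if not values:
        return 0
    chunk = (len(values) + workers - 1) // workers  # ceiling division
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker counts one chunk; the partial results are independent.
        return sum(pool.map(count_bits, chunks))
```

Because each chunk is counted independently, the only shared step is the final sum, which is why this kind of kernel parallelizes easily.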


7

Shalan, Mohamed A. "Dynamic memory management for embedded real-time multiprocessor system-on-a-chip." Diss., Georgia Institute of Technology, 2003. http://etd.gatech.edu/theses/available/etd-11252003-131621/unrestricted/shalanmohameda200312.pdf.

Abstract:

Thesis (Ph. D.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2004.
Vincent Mooney, Committee Chair; John Barry, Committee Member; James Hamblen, Committee Member; Karsten Schwan, Committee Member; Linda Wills, Committee Member. Includes bibliography.


8

Erbaş, Çaǧkan. "System-level modeling and design space exploration for multiprocessor embedded system-on-chip architectures." Amsterdam : Vossiuspers ; Universiteit van Amsterdam [Host], 2006. http://dare.uva.nl/document/38007.


9

Rosén, Jakob. "Predictable Real-Time Applications on Multiprocessor Systems-on-Chip." Licentiate thesis, Linköpings universitet, ESLAB - Laboratoriet för inbyggda system, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70138.

Abstract:

Being predictable with respect to time is, by definition, a fundamental requirement for any real-time system. Modern multiprocessor systems impose a challenge in this context, due to resource sharing conflicts causing memory transfers to become unpredictable. In this thesis, we present a framework for achieving predictability for real-time applications running on multiprocessor system-on-chip platforms. Using a TDMA bus, worst-case execution time analysis and scheduling are done simultaneously. Since the worst-case execution times are directly dependent on the bus schedule, bus access design is of special importance. Therefore, we provide an efficient algorithm for generating bus schedules, resulting in a minimized worst-case global delay. We also present a new approach considering the average-case execution time in a predictable context. Optimization techniques for improving the average-case execution time of tasks, for which predictability with respect to time is not required, have been investigated for a long time in many different contexts. However, this has traditionally been done without paying attention to the worst-case execution time. For predictable real-time applications, on the other hand, the focus has been solely on worst-case execution time optimization, ignoring how this affects the execution time in the average case. In this thesis, we show that having a good average-case global delay can be important also for real-time applications, for which predictability is required. Furthermore, for real-time applications running on multiprocessor systems-on-chip, we present a technique for optimizing for the average case and the worst case simultaneously, allowing for a good average case execution time while still keeping the worst case as small as possible. The proposed solutions in this thesis have been validated by extensive experiments. The results demonstrate the efficiency and importance of the presented techniques.
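
To illustrate why the bus schedule bounds the worst case, here is a minimal sketch of the longest time a processor can wait for its next TDMA slot. It is not the algorithm from the thesis. It assumes a cyclic schedule given as `(owner, length)` pairs and a request that is granted as soon as any slot owned by the requesting processor is active; the function name `worst_case_wait` and this slot model are assumptions made for the example.

```python
def worst_case_wait(slots, cpu):
    """Longest contiguous time in the cyclic schedule during which `cpu` owns no slot."""
    owners = [owner for owner, _ in slots]
    if cpu not in owners:
        raise ValueError(f"{cpu!r} owns no slot in the schedule")
    # Rotate the schedule so it starts with a slot owned by `cpu`;
    # then no gap wraps around the end of the period.
    first = owners.index(cpu)
    rotated = slots[first:] + slots[:first]
    worst = gap = 0
    for owner, length in rotated:
        if owner == cpu:
            gap = 0            # the bus is available to `cpu` again
        else:
            gap += length      # still waiting through foreign slots
            worst = max(worst, gap)
    return worst
```

Under this model, giving a processor several short slots spread over the period lowers its worst-case wait compared with one long slot, which is the kind of trade-off a bus schedule generator has to weigh.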


10

Tucci, Primiano. "Hardware/Software Design of Dynamic Real-Time Schedulers for Embedded Multiprocessor Systems." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amsdottorato.unibo.it/5594/.

Abstract:

The new generation of multicore processors opens new perspectives for the design of embedded systems. Multiprocessing, however, poses new challenges to the scheduling of real-time applications, in which ever-increasing computational demands are constantly accompanied by the need to meet critical time constraints. Many research works have contributed to this field by introducing new advanced scheduling algorithms. However, although many of these works have solidly demonstrated their effectiveness, the actual support for multiprocessor real-time scheduling offered by current operating systems is still very limited. This dissertation deals with implementation aspects of real-time schedulers in modern embedded multiprocessor systems. The first contribution is an open-source scheduling framework that is capable of realizing complex multiprocessor scheduling policies, such as G-EDF, on conventional operating systems, exploiting only their native scheduler from user space. A set of experimental evaluations compares the proposed solution to other research projects that pursue the same goals by means of kernel modifications, highlighting comparable scheduling performance. The principles that underpin the operation of the framework, originally designed for symmetric multiprocessors, have been extended first to asymmetric ones, which are subject to major restrictions such as the lack of support for task migrations, and later to re-programmable hardware architectures (FPGAs). In the latter case, this work introduces a scheduling accelerator, which offloads most of the scheduling operations to the hardware and exhibits extremely low scheduling jitter. The realization of a portable scheduling framework presented many interesting software challenges. One of these was timekeeping. In this regard, a further contribution is a novel data structure called the addressable binary heap (ABH). The ABH, which is conceptually a pointer-based implementation of a binary heap, shows very interesting average and worst-case performance when addressing the problem of tick-less timekeeping of high-resolution timers.
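
As a reminder of what a G-EDF policy decides at each scheduling point, here is a minimal sketch: among the ready jobs, the ones with the earliest absolute deadlines run, one per processor. It is not the framework from the dissertation; the `Job` class and the `gedf_select` function are hypothetical names introduced for this example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    deadline: float  # absolute deadline


def gedf_select(ready_jobs, num_cpus):
    """Return the ready jobs global EDF would run now, earliest deadline first."""
    # At most num_cpus jobs run; the rest stay in the ready queue.
    return heapq.nsmallest(num_cpus, ready_jobs, key=lambda job: job.deadline)
```

For example, with four ready jobs and two processors, only the two jobs with the earliest deadlines are selected. A job that is not selected may later run on a different processor than before, which is why global policies depend on task migration.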


11

Hegde, Sridhar. "FUNCTIONAL ENHANCEMENT AND APPLICATIONS DEVELOPMENT FOR A HYBRID, HETEROGENEOUS SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE." UKnowledge, 2004. http://uknowledge.uky.edu/gradschool_theses/252.

Abstract:

Reconfigurable and dynamic computer architecture is an exciting area of research that is rapidly expanding to meet the requirements of compute intense real and non-real time applications in key areas such as cryptography, signal/radar processing and other areas. To meet the demands of such applications, a parallel single-chip heterogeneous Hybrid Data/Command Architecture (HDCA) has been proposed. This single-chip multiprocessor architecture system is reconfigurable at three levels: application, node and processor level. It is currently being developed and experimentally verified via a three phase prototyping process. A first phase prototype with very limited functionality has been developed. This initial prototype was used as a base to make further enhancements to improve functionality and performance resulting in a second phase virtual prototype, which is the subject of this thesis. In the work reported here, major contributions are in further enhancing the functionality of the system by adding additional processors, by making the system reconfigurable at the node level, by enhancing the ability of the system to fork to more than two processes and by designing some more complex real/non-real time applications which make use of and can be used to test and evaluate enhanced and new functionality added to the architecture. A working proof of concept of the architecture is achieved by Hardware Description Language (HDL) based development and use of a Virtual Prototype of the architecture. The Virtual Prototype was used to evaluate the architecture functionality and performance in executing several newly developed example applications. Recommendations are made to further improve the system functionality.


12

Janka, Randall Scott. "A model-continuous specification and design methodology for embedded multiprocessor signal processing systems." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15630.


13

Li, Jiayin. "ENERGY-AWARE OPTIMIZATION FOR EMBEDDED SYSTEMS WITH CHIP MULTIPROCESSOR AND PHASE-CHANGE MEMORY." UKnowledge, 2012. http://uknowledge.uky.edu/ece_etds/7.

Abstract:

Over the last two decades, the functions of embedded systems have evolved from simple real-time control and monitoring to more complicated services. Embedded systems equipped with powerful chips can provide the performance that computationally demanding information processing applications need. However, because of power constraints, the easy way to gain performance by scaling up chip frequencies is no longer feasible. Recently, low-power architecture designs have been the main trend in embedded system designs. In this dissertation, we present our approaches to the energy-related issues in embedded system designs, such as thermal issues in the 3D chip multiprocessor (CMP), the endurance issue in phase-change memory (PCM), the battery issue in embedded system designs, the impact of inaccurate information in embedded systems, and the use of cloud computing to move workload to remote cloud computing facilities. We propose a real-time constrained task scheduling method to reduce peak temperature on a 3D CMP, including an online 3D CMP temperature prediction model and a set of algorithms for scheduling tasks to different cores in order to minimize the peak temperature on chip. To address the challenging issues in applying PCM in embedded systems, we propose a PCM main memory optimization mechanism through the utilization of the scratch pad memory (SPM). Furthermore, we propose an MLC/SLC configuration optimization algorithm to enhance the efficiency of the hybrid DRAM + PCM memory. We also propose an energy-aware task scheduling algorithm for parallel computing in mobile systems powered by batteries. When scheduling tasks in embedded systems, we make the scheduling decisions based on information such as the estimated execution time of tasks. Therefore, we design an evaluation method for the impact of inaccurate information on resource allocation in embedded systems. Finally, in order to move workload from embedded systems to remote cloud computing facilities, we present a resource optimization mechanism for heterogeneous federated multi-cloud systems. We also propose two online dynamic algorithms for resource allocation and task scheduling, which take resource contention into account.


14

Reiche, Myrgård Martin. "Acceleration of deep convolutional neural networks on multiprocessor system-on-chip." Thesis, Uppsala universitet, Avdelningen för datorteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385904.

Abstract:

In this master's thesis, some of the most promising existing frameworks and implementations of deep convolutional neural networks on multiprocessor systems-on-chip (MPSoCs) are researched and evaluated. The starting point was a previous thesis, conducted in the spring of 2018, which evaluated possible deep learning models and frameworks for object detection on infrared images. An existing deep convolutional neural network (DCNN) needs modifications before it fits on an MPSoC. Most DCNNs are trained on graphics processing units (GPUs) with a bit width of 32 bits. This is not optimal for a platform with hard memory constraints such as the MPSoC, which means the bit width needs to be reduced. The optimal bit width depends on the network structure and on the requirements in terms of throughput and accuracy, although most of the currently available object detection networks degrade significantly when reduced below a width of 6 bits. After reducing the bit width, the network needs to be quantized and pruned for better memory usage. After quantization it can be implemented using one of many existing frameworks. This thesis focuses on Xilinx CHaiDNN and DNNWeaver V2, though it also touches on reVISION, HLS4ML and DNNWeaver V1. In conclusion, the implementation of two network models on a Xilinx Zynq UltraScale+ ZCU102 using CHaiDNN was evaluated. Existing networks were converted, and quantization was tested, though it was not fully working. The result was an implementation two to six times more power efficient than GPU inference.
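
The effect of reducing the bit width can be illustrated with a minimal sketch of uniform symmetric quantization, in which all weights share one scale factor. This is a generic scheme assumed for illustration, not the quantizer used by CHaiDNN or DNNWeaver; the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(weights, bits):
    """Map floats to signed integer codes of the given bit width with one shared scale."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed value")
    qmax = 2 ** (bits - 1) - 1                   # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0   # avoid dividing by zero
    # Round to the nearest code and clamp to the symmetric range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]
```

Each weight is reproduced to within half a quantization step. The step doubles with every bit removed, which is consistent with the abstract's observation that accuracy degrades sharply at very small bit widths.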


15

An, Xin. "High Level Design and Control of Adaptive Multiprocessor Systems-on-Chip." Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00904884.

Abstract:

The design of modern embedded systems is increasingly complex, as more functionality is integrated into these systems. At the same time, in order to meet computational requirements while keeping power consumption low, MPSoCs have emerged as the main solutions for such embedded systems. Furthermore, embedded systems are becoming more and more adaptive, as adaptivity can bring a number of benefits, such as software flexibility and energy efficiency. This thesis targets the safe design of such adaptive MPSoCs. First, each system configuration must be analyzed with respect to its functional and non-functional properties. We present an abstract design and analysis framework that allows fast and cost-effective implementation decisions. This framework is intended as an intermediate reasoning support for system-level software/hardware co-design environments. It can prune the design space to the greatest extent and identify candidate design solutions quickly and efficiently. In this framework, we use an abstract clock-based encoding to model system behaviors. Different scenarios of application mapping and scheduling on MPSoCs are analyzed via clock traces that represent system simulations. The properties of interest are functional behavior correctness, temporal performance and energy consumption. Second, the reconfiguration management of adaptive MPSoCs must be addressed. We are particularly interested in MPSoCs implemented on reconfigurable architectures (e.g. FPGAs), which offer good flexibility and computational efficiency for adaptive MPSoCs.
We propose a general design framework based on the discrete controller synthesis (DCS) technique to address this problem. The main advantage of this technique is that it allows the automatic synthesis of a controller from a specification of control objectives. In this framework, the system reconfiguration behavior is modeled in terms of synchronous parallel automata. The problem of computing reconfiguration management according to multiple objectives concerning, for example, resource usage, performance and power consumption is encoded as a DCS problem. The existing BZR programming language and the Sigali tool are used to perform DCS and generate a controller that satisfies the system requirements. Finally, we study two different ways of combining the two proposed design frameworks for adaptive MPSoCs. First, they are combined to build a complete design flow for adaptive MPSoCs. Second, they are combined to show how the run-time manager computed by the second framework can be integrated into the first framework in order to perform combined simulations and analyses of adaptive MPSoCs.

16

Klingler, Randall S. "Compilation and Generation of Multi-Processor on a Chip Real-Time Embedded Systems." Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd1941.pdf.

17

Qamhieh, Manar. "Scheduling of parallel real-time DAG tasks on multiprocessor systems." Thesis, Paris Est, 2015. http://www.theses.fr/2015PEST1030/document.

Abstract:

Interest in multiprocessor systems has recently increased in industrial applications, and parallel programming APIs have been introduced to benefit from the new processing capabilities. Industry is now investigating the use of multiprocessors for real-time systems, whose execution must respect temporal constraints. Real-time scheduling has been well studied on uniprocessors for many years, but multiprocessor scheduling of dependent parallel tasks is not a simple generalization of the uniprocessor case, and the scheduling problem becomes more complex and challenging. In multiprocessor systems, a hard real-time scheduler is responsible for allocating ready jobs to the available processors while respecting their timing parameters. In this thesis, we study the problem of real-time scheduling of parallel Directed Acyclic Graph (DAG) tasks on homogeneous multiprocessor systems. In this model, a DAG task consists of a set of subtasks that execute under precedence constraints. At all times, the real-time scheduler determines how subtasks execute, either sequentially or in parallel, based on the available processors. We propose two DAG scheduling approaches to determine the execution form of DAG tasks. The first is the DAG Stretching algorithm, from the Model Transformation approach, which transforms the parallel DAG model into a model of independent sequential tasks that is simpler to schedule, forcing DAG tasks to execute as sequentially as possible. The second is Direct Scheduling, which schedules DAG tasks while respecting their internal dependencies. We provide real-time schedulability analyses for Direct Scheduling at DAG-Level, which uses the timing parameters of the graphs directly, and at Subtask-Level, which uses timing parameters assigned to the subtasks by a dedicated algorithm. We prove that the two DAG scheduling approaches are incomparable, and therefore use extensive simulations with the global EDF and global DM scheduling algorithms to compare their performance. The simulations were carried out with YARTISS, a tool we developed to generate random graphs and run the simulations.
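
To make the DAG task model in the abstract above more concrete, the following Python sketch computes two quantities commonly used when reasoning about such tasks: the total work (execution time if every subtask runs sequentially) and the critical path length (a lower bound on execution time, however many processors are available). It is an illustration only, not code from the thesis or from YARTISS; the function name and the dictionary-based task representation are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def work_and_critical_path(wcet, preds):
    """Return (total work, critical path length) of one DAG task.

    wcet:  dict mapping each subtask to its worst-case execution time.
    preds: dict mapping a subtask to the set of its direct predecessors
           (hypothetical representation; subtasks without predecessors
           may be omitted).
    """
    graph = {v: preds.get(v, set()) for v in wcet}
    finish = {}
    # static_order() yields every subtask after all of its predecessors.
    for v in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[v]), default=0)
        finish[v] = start + wcet[v]
    return sum(wcet.values()), max(finish.values())
```

For a diamond-shaped task with subtasks a (2), b (3), c (1), d (2), where b and c depend on a and d depends on both, the function returns a work of 8 and a critical path of 7.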

18

Abich, Geancarlo. "Extending FreeRTOS to support dynamic and distributed task mapping in multiprocessor systems." Biblioteca Digital de Teses e Dissertações da UFRGS, 2017. http://hdl.handle.net/10183/164048.

Abstract:

Embedded multiprocessor systems are a reality in both industry and academia. Such devices offer parallel processing capabilities to meet the growing requirements of complex applications. Application workloads are susceptible to variation at runtime which, if not properly handled, may degrade performance and energy efficiency. The continuous increase in the complexity of application workloads and in the size of emerging multiprocessor systems calls for dynamic and distributed mapping solutions. Most proposed mapping techniques are bespoke implementations that assume an in-house operating system developed for a particular processor architecture. This practice restricts their adoption on other platforms, leading to extra design time, re-validation and, consequently, a hidden cost that may be quite high. In this scenario, this dissertation proposes a FreeRTOS extension that supports dynamic and distributed task mapping in multiprocessor systems. FreeRTOS has been ported to more than 30 embedded processor architectures, which increases software portability and reduces development time. The proposed extension employs mapping techniques that allow FreeRTOS to handle high demands for application mapping at runtime. Another contribution of this work is a framework that enables the exploration of large systems while providing debugging facilities. The framework automatically generates multiprocessor platforms from parameters such as size, processor architecture, and application set. The resulting platform description is highly scalable and allows runtime data extraction and extensive debugging. These features made it possible to validate the proposed FreeRTOS extension on more than one processor architecture of the ARM Cortex-M family. Test cases were executed on large-scale platforms and at different levels of abstraction, with cases of more than 120 applications comprising more than 600 tasks. The results show that the proposed extension performs as well as or better than approaches in the literature.

19

Zeng, Gang, Tetsuo Yokoyama, Hiroyuki Tomiyama, and Hiroaki Takada. "A Generalized Framework for Energy Savings in Real-Time Multiprocessor Systems." IEEE, 2008. http://hdl.handle.net/2237/12101.

20

Guan, Nan. "New Techniques for Building Timing-Predictable Embedded Systems." Doctoral thesis, Uppsala universitet, Avdelningen för datorteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-209623.

Abstract:

Embedded systems are becoming ubiquitous in our daily life. Due to their close interaction with the physical world, embedded systems are typically subject to timing constraints. At design time, it must be ensured that the run-time behaviors of such systems satisfy the pre-specified timing constraints under any circumstance. In this thesis, we develop techniques to address the timing analysis problems brought by the increasing complexity of the underlying hardware and software at different levels of abstraction in embedded systems design. On the program level, we develop quantitative analysis techniques to predict cache hit/miss behaviors for tight WCET estimation, and study two commonly used replacement policies, MRU and FIFO, which cannot be analyzed adequately using the state-of-the-art qualitative cache analysis method. Our quantitative approach greatly improves the precision of WCET estimation and discloses interesting predictability properties of these replacement policies, which are concealed in the qualitative analysis framework. On the component level, we address the challenges raised by multi-core computing. Several fundamental problems in multiprocessor scheduling are investigated. In global scheduling, we propose an analysis method to rule out a large share of impossible system behaviors for better analysis precision, and establish conditions to guarantee the bounded responsiveness of computing tasks. In partitioned scheduling, we close a long-standing open problem by generalizing the famous Liu and Layland utilization bound in uniprocessor real-time scheduling to multiprocessor systems. We also propose to use cache partitioning for multi-core systems to avoid contention on shared caches, and solve the underlying schedulability analysis problem. On the system level, we present techniques to improve the Real-Time Calculus (RTC) analysis framework in both efficiency and precision.
First, we have developed Finitary Real-Time Calculus to solve the scalability problem of the original RTC due to period explosion. The key idea is to only maintain and operate on a limited prefix of each curve that is relevant to the final results during the whole analysis procedure. We further improve the analysis precision of EDF components in RTC, by precisely bounding the response time of each computation request.
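
As a point of reference for the utilization bound mentioned in the abstract above: the classic uniprocessor result of Liu and Layland states that n periodic tasks with deadlines equal to their periods are schedulable under rate-monotonic priorities if their total utilization does not exceed n(2^(1/n) - 1). The Python sketch below is illustrative only. It shows the uniprocessor bound, not the multiprocessor generalization developed in the thesis, and the function names are hypothetical.

```python
def liu_layland_bound(n: int) -> float:
    """Classic uniprocessor rate-monotonic utilization bound n(2^(1/n) - 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return n * (2 ** (1.0 / n) - 1)


def passes_utilization_test(tasks) -> bool:
    """Sufficient (not necessary) schedulability test.

    tasks: list of (wcet, period) pairs, deadlines assumed equal to periods.
    A task set that fails this test may still be schedulable.
    """
    total_utilization = sum(wcet / period for wcet, period in tasks)
    return total_utilization <= liu_layland_bound(len(tasks))
```

The bound equals 1 for a single task and decreases toward ln 2 (about 0.693) as the number of tasks grows.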

21

Alfranseder, Martin. "Efficient and robust dynamic scheduling and synchronization in practical embedded real-time multiprocessor systems." Supervised by Christian Siemers. Clausthal-Zellerfeld: Technische Universität Clausthal, 2016. http://d-nb.info/1231365064/34.

22

Pop, Ruxandra. "Mapping Concurrent Applications to Multiprocessor Systems with Multithreaded Processors and Network on Chip-Based Interconnections." Licentiate thesis, Linköpings universitet, Institutionen för datavetenskap, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-64256.

Abstract:

Network on Chip (NoC) architectures provide scalable platforms for designing Systems on Chip (SoC) with a large number of cores. Developing products and applications using an NoC architecture offers many challenges and opportunities. A tool that can map an application or a set of applications to a given NoC architecture will be essential. In this thesis we first survey current techniques and then present our proposals for mapping and scheduling concurrent applications to NoCs with multithreaded processors as computational resources. NoC platforms are essentially a special class of Multiprocessor Embedded Systems (MPES). Conventional MPES architectures are mostly bus-based and are thus exposed to potential difficulties regarding scalability and reusability. There has been a lot of research on MPES development, including work on mapping and scheduling of applications. Many of these results can also be applied to NoC platforms. Mapping and scheduling are known to be computationally hard problems. A wide range of exact and approximate optimization algorithms have been proposed for solving them. The methods include Branch-and-Bound (BB), constructive and transformative heuristics such as List Scheduling (LS), Genetic Algorithms (GA), and various types of Mathematical Programming algorithms. Concurrent applications are able to capture a typical embedded system, which is multifunctional. They can be executed on an NoC, which provides large computational power through multiple on-chip computational resources. Improving the time performance of concurrent applications running on NoC architectures depends mainly on the ability of mapping and scheduling methodologies to exploit the Thread Level Parallelism (TLP) of concurrent applications through the available NoC parallelism. Matching the architectural parallelism to the application concurrency to obtain good performance-cost tradeoffs is another aspect of the problem. Multithreading is a technique for hiding long memory access latencies through the overlapped execution of several threads. Recently, Multi-Threaded Processors (MTPs) have been designed that provide the architectural infrastructure to execute multiple threads concurrently at the hardware level, which usually results in a very low context switching overhead. Simultaneous Multi-Threaded Processors (SMTPs) are superscalar processor architectures that adaptively exploit the coarse-grain and fine-grain parallelism of applications by simultaneously executing instructions from several thread contexts. In this thesis we make a case for using SMTPs and MTPs as NoC resources and show that such a multiprocessor architecture provides better time performance than an NoC with only General-purpose Processors (GP). We have developed a methodology for task mapping and scheduling to an NoC with mixed SMTP, MTP and GP resources, which aims to maximize the time performance of concurrent applications and to satisfy their soft deadlines. The methodology was evaluated on many configurations of NoC-based platforms with SMTP, MTP and GP resources. The experimental results demonstrate that the use of SMTPs and MTPs in NoC platforms can significantly speed up applications.

23

Kunz, Leonardo. "Memória transacional em hardware para sistemas embarcados multiprocessados conectados por redes-em-chip." Biblioteca Digital de Teses e Dissertações da UFRGS, 2010. http://hdl.handle.net/10183/28739.

Abstract:

Transactional Memory (TM) has emerged in recent years as a new solution for synchronization in shared-memory multiprocessor systems, allowing better exploitation of application parallelism by avoiding limitations inherent to the lock mechanism. In this model, the programmer defines regions of code, called transactions, that must execute atomically. The system tries to execute transactions concurrently and, in case of conflicting memory accesses, takes the measures needed to preserve atomicity and isolation, usually aborting and re-executing one of the transactions. One of the most widely accepted hardware transactional memory models is LogTM, implemented in this work on an embedded MPSoC that uses an NoC for interconnection. The experiments compare this implementation with locks in terms of performance and energy. Furthermore, this work shows that the time a transaction waits before restarting after an abort (called the backoff delay on abort) has a significant impact on performance and energy. This impact is analyzed using three backoff policies. A novel mechanism based on a handshake between transactions, called Abort handshake, is proposed as a solution to this issue. The experimental results depend on the application and system configuration and show that TM outperforms locks in most cases, with reductions of up to 30% in execution time and up to 32% in energy consumption for applications with low synchronization demand. Next, the effect of the backoff delay on abort on performance and energy is analyzed using the three backoff policies in comparison with the Abort handshake mechanism. The proposed mechanism reduces execution time by up to 20% and energy by up to 53% compared with the best of the backoff policies analyzed. For applications with high synchronization demand, TM reduces execution time by up to 63% and energy by up to 71% compared with locks.

24

Silva, Gustavo Girão Barreto da. "Estudo sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC." Biblioteca Digital de Teses e Dissertações da UFRGS, 2009. http://hdl.handle.net/10183/67147.

Abstract:

In the past few years, embedded systems have become increasingly complex in terms of both hardware and software. Lately, MPSoCs (Multi-Processor Systems-on-Chip) have been adopted in these systems for better energy and computational efficiency. With several processing elements, Networks-on-Chip (NoCs) emerge as better-performing solutions than buses. In these environments, whose performance depends on the efficiency of the communication model, the memory hierarchy becomes a key element. Considering this scenario, this work investigates the impact of the memory hierarchy in NoC-based MPSoCs. In this context, a new physically centralized memory organization with different address spaces, named nDMA, was developed. This work also compares the new organization with three well-known memory organizations: distributed memory, shared memory, and distributed shared memory. The latter two adopt a directory-based cache coherence model implemented entirely in hardware. The memory models were implemented in the SIMPLE (SIMPLE Multiprocessor Platform Environment) virtual platform. Experimental results show a strong dependency on the communication workload generated by the applications. The distributed memory model performs better when the application communication workload is low. On the other hand, the new memory model (physically shared with different address spaces) performs better when the application communication workload is high. Experiments were also conducted to analyze the performance of the memory models when the communication latency on the network is high. In that setting, the distributed memory model performs better when the application communication workload is high, and the nDMA model performs better otherwise. Finally, the performance of the memory models during task migration was evaluated. In this case, the shared memory and distributed shared memory models performed better, because the application data does not need to be transferred in these models and because their code size is smaller than in the other models.

25

Gamatié, Abdoulaye. "Design and Analysis for Multi-Clock and Data-Intensive Applications on Multiprocessor Systems-on-Chip." Habilitation à diriger des recherches, Université des Sciences et Technologie de Lille - Lille I, 2012. http://tel.archives-ouvertes.fr/tel-00756967.

Abstract:

With the growing integration of functions, modern embedded systems are becoming very intelligent and sophisticated. The most emblematic examples of this trend are latest-generation mobile phones, which offer their users a wide range of services for communication, music, video, photography, Internet access, etc. These services are delivered through a number of applications that process huge amounts of information, referred to as data-intensive applications. These applications are also characterized by multi-clock behaviors, since they often include components that are activated at different rates during execution. Embedded systems often have real-time constraints. For example, a video processing application is generally subject to constraints on frame display rate or delay. For this reason, execution platforms must often provide the required computing power. Parallelism plays a central role in meeting this expectation. Integrating multiple cores or processors on a single chip, leading to multiprocessor systems-on-chip (MPSoCs), is a key solution for providing applications with sufficient performance at a reduced energy cost. To find a good trade-off between performance and energy consumption, resource heterogeneity is exploited in MPSoCs by including processing units with varied characteristics. Typically, conventional processors are combined with accelerators (graphics processing units or hardware accelerators). Besides heterogeneity, adaptivity is another important feature of modern embedded systems. It makes it possible to manage performance parameters flexibly according to variations in a system's environment and execution platform. In such a context, the complexity of developing modern embedded systems is evident. It raises a number of challenges addressed in our contributions, as follows: 1) First, since MPSoCs are distributed systems, how can the correctness of their design be addressed successfully, so that the functional properties of the deployed multi-clock applications can be guaranteed? This is studied by considering a "correct-by-construction" distribution methodology for these applications on multiprocessor platforms. 2) Next, for data-intensive applications to be executed on such platforms, how can their design and analysis be addressed adequately, while fully taking into account their reactive nature and potential parallelism? 3) Finally, considering the execution of these applications on MPSoCs, how can their non-functional properties (for example, execution time or energy) be analyzed in order to predict their performance? The answer to this question should then serve the exploration of complex design spaces. Our work aims to address these three challenges pragmatically, by adopting a model-based view. To this end, it considers two complementary dataflow modeling paradigms: "polychronous modeling", related to the synchronous reactive approach, and "repetitive structure modeling", related to array-oriented programming for data parallelism. The first paradigm makes it possible to reason about multi-clock systems in which components interact without assuming the existence of a reference clock. The second paradigm is expressive enough to allow the specification of a system's massive parallelism.

26

Castro, Eberval Oliveira. "Multiprocessador em eletronica reconfiguravel para aplicações roboticas." [s.n.], 2007. http://repositorio.unicamp.br/jspui/handle/REPOSIP/259583.

Abstract:

Advisor: Marconi Kolm Madrid
Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação
Abstract: Solving the dynamic models of robots in real time is one of the major challenges in robotics. This work proposes a strongly coupled quad-core multiprocessor, the SMM-4 (Sistema Multiprocessado Monolítico; rendered as MMS-4, Monolithic Multiprocessor System, in the English abstract), consisting of a monolithic parallel processing architecture synthesized on an FPGA for applications in robotic control systems. A quantitative and qualitative analysis is performed against uniprocessor systems to show the gains obtained with this FPGA approach. The SMM-4 was developed at the Robotic Modular Systems Laboratory (LSMR/Unicamp) as one alternative for computing the equations of robot models in real time.
Master's degree
Automation
Master in Electrical Engineering

APA, Harvard, Vancouver, ISO, and other styles

27

Legout, Vincent. "Ordonnancement temps réel multiprocesseur pour la réduction de la consommation énergétique des systèmes embarqués." Thesis, Paris, ENST, 2014. http://www.theses.fr/2014ENST0019/document.

Full text

Abstract:

Reducing the energy consumption of multiprocessor real-time embedded systems is a growing concern, notably to increase their autonomy. In this thesis, we aim to reduce the energy consumption of the processors. It includes both static and dynamic consumption, and it is now dominated by static consumption as semiconductor technology moves to the deep sub-micron scale. Existing solutions have mainly focused on dynamic consumption. We instead target static consumption by efficiently using the low-power states of the processors. In a low-power state the processor is not active, and the deeper the low-power state, the lower the energy consumption but the higher the transition delay and penalty to return to the active state. We propose the first optimal multiprocessor real-time scheduling algorithms that minimize static energy consumption. They optimize the duration of the idle periods in order to activate the most appropriate low-power states. We target hard real-time systems with periodic tasks as well as mixed-criticality systems, in which tasks with lower criticality can tolerate deadline misses, allowing us to be more aggressive offline when reducing energy consumption. Each scheduling algorithm has two parts. The offline part uses an additional task to model the idle time and mixed integer linear programming to compute a schedule that minimizes energy consumption. The online part extends the idle periods when tasks complete their execution earlier than expected. Evaluations have been performed against existing optimal multiprocessor real-time scheduling algorithms. Results show that the energy consumption while processors are idle is up to ten times lower with our solutions than with existing multiprocessor real-time scheduling algorithms.
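
The trade-off described in this abstract (deeper low-power states draw less power but cost more to leave) can be illustrated with a small sketch. It is not the thesis's algorithm: the `LowPowerState` fields, the energy model and the function names are assumptions made for illustration only. For a given idle period, it picks the state that minimizes energy among those whose transition delay fits in the period.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LowPowerState:
    """Hypothetical low-power state; all fields are illustrative assumptions."""
    name: str
    power: float              # power drawn while in the state (W)
    transition_delay: float   # time needed to return to the active state (s)
    transition_energy: float  # energy penalty charged for the transition (J)


def idle_energy(state: LowPowerState, idle_time: float) -> Optional[float]:
    """Energy spent over an idle period using `state`, or None if it does not fit."""
    if idle_time < state.transition_delay:
        return None  # the processor could not be active again in time
    return state.transition_energy + state.power * (idle_time - state.transition_delay)


def best_state(
    states: Sequence[LowPowerState], idle_time: float, active_idle_power: float
) -> Tuple[Optional[LowPowerState], float]:
    """Return (state, energy); state is None when staying active is cheapest."""
    best, best_energy = None, active_idle_power * idle_time
    for state in states:
        energy = idle_energy(state, idle_time)
        if energy is not None and energy < best_energy:
            best, best_energy = state, energy
    return best, best_energy
```

Under this simplified model, one long idle period costs less energy than two short ones of the same total length, which is one way to see why extending idle periods matters.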

APA, Harvard, Vancouver, ISO, and other styles

28

Kianzad, Vida. "System synthesis for embedded multiprocessors." College Park, Md. : University of Maryland, 2006. http://hdl.handle.net/1903/3471.

Full text

Abstract:

Thesis (Ph. D.) -- University of Maryland, College Park, 2006.
Thesis research directed by: Electrical Engineering. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.

APA, Harvard, Vancouver, ISO, and other styles

29

Cordovilla, Mesonero Mikel. "Environnement de développement d’applications multipériodiques sur plateforme multicoeur. : La boîte à outils SchedMCore." Thesis, Toulouse, ISAE, 2012. http://www.theses.fr/2012ESAE0011/document.

Full text

Abstract:

Critical embedded control-command software is subject to strong constraints, including determinism, logical correctness and temporal correctness. We assume that the specifications are expressed in Prelude, a formal software architecture description language dedicated to real-time multiperiodic applications. The goal of this thesis is, given a Prelude program or a set of dependent real-time tasks, to generate multithreaded code executable on a multicore architecture while respecting the original semantics. To do so we have developed a toolbox, SchedMCore, that provides: - formal verification of schedulability. The verification is based on an exhaustive exploration of the behaviour with discrete time steps. It can analyse on-line policies (FP, gEDF, gLLF and LLREF), and can also compute a valid fixed-priority assignment and a valid off-line sequence. - multithreaded execution on a multicore target. The executive encodes the same policies as those studied in the schedulability analysis part (the four on-line policies and the generated valid sequences). It offers three usage modes, ranging from temporal simulation to time-accurate execution of the task behaviours. The executive is POSIX-compatible and easily portable to various operating systems.
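
As a rough illustration of checking schedulability by stepping through discrete time, the sketch below simulates global EDF (gEDF) over one hyperperiod. It is not SchedMCore code: the function name, the task format (integer worst-case execution time and period, implicit deadlines, independent tasks released together at time 0, each job running for exactly its worst-case time) are simplifying assumptions.

```python
from math import lcm  # Python 3.9+


def gedf_meets_deadlines(tasks, cores):
    """Simulate global EDF tick by tick over one hyperperiod.

    tasks: list of (wcet, period) integer pairs, deadline equal to period.
    Returns False as soon as a job misses its deadline, True otherwise.
    """
    hyperperiod = lcm(*(period for _, period in tasks))
    jobs = []  # each pending job is [absolute_deadline, remaining_execution]
    for t in range(hyperperiod):
        # A pending job whose deadline has passed is a deadline miss.
        if any(deadline <= t for deadline, _ in jobs):
            return False
        # Release one new job per task at each multiple of its period.
        for wcet, period in tasks:
            if t % period == 0:
                jobs.append([t + period, wcet])
        # Global EDF: the `cores` jobs with the earliest deadlines run this tick.
        jobs.sort(key=lambda job: job[0])
        for job in jobs[:cores]:
            job[1] -= 1
        jobs = [job for job in jobs if job[1] > 0]
    # Every released job has its deadline at or before the hyperperiod.
    return not jobs
```

For example, on two cores the task set `[(1, 5), (1, 5), (6, 6)]` misses a deadline under this policy even though its total utilization is below 2, which is the kind of case an exhaustive exploration is meant to expose.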

APA, Harvard, Vancouver, ISO, and other styles

30

Afshar, Sara. "Lock-Based Resource Sharing for Real-Time Multiprocessors." Doctoral thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-37215.

Full text

Abstract:

Embedded systems are widely used in industry and are typically resource constrained, i.e., resources such as processors, I/O devices, shared buffers or shared memory might be limited in the system. Hence, techniques that enable efficient usage of processor bandwidth in such systems are of great importance. Lock-based resource sharing protocols have been proposed as a solution to overcome resource limitations by allowing the available resources in the system to be safely shared. In recent years, due to a dramatic increase in system functionality, a shift from single-core to multi-core processors has become inevitable from an industrial perspective in order to tackle the challenges raised by increased system complexity. However, resource sharing protocols are not fully mature for multi-core processors. The two classical multi-core resource sharing protocols, spin-based and suspension-based, although providing mutually exclusive access to resources, can introduce long blocking delays to tasks, which may be unacceptable for many industrial applications. In this thesis we enhance the performance of resource sharing protocols for partitioned scheduling, which is the de facto scheduling standard for industrial real-time multi-core systems such as in AUTOSAR, in terms of timing and memory requirements. A new scheduling approach uses a resource-efficient hybrid of partitioned and global scheduling, in which partitioned scheduling is used for the majority of tasks in the system. In such an approach, applications with critical task sets use partitioned scheduling to achieve a higher level of predictability. The bandwidth left unused on each core after partitioning is then used to schedule less critical task sets with global scheduling, to achieve higher system utilization. 
This scheduling scheme, however, lacks a proper resource sharing protocol, since the existing protocols designed for partitioned and global scheduling cannot be applied directly due to the complex hybrid structure of the scheduling framework. In this thesis we propose a resource sharing solution for such a complex structure. Further, we provide bounds on the blocking incurred by tasks under the proposed protocols and incorporate these bounds into the schedulability analysis, which is an essential requirement for real-time systems.

APA, Harvard, Vancouver, ISO, and other styles

31

Aguiar, Alexandra da Costa Pinto de. "On the virtualization of multiprocessed embedded systems." Pontifícia Universidade Católica do Rio Grande do Sul, 2013. http://tede2.pucrs.br/tede2/handle/tede/5252.

Full text

Abstract:

Made available in DSpace on 2015-04-14T14:50:11Z (GMT). No. of bitstreams: 1 458137.pdf: 2745165 bytes, checksum: e05abd1f1e63fc82908d29186a3b9ee2 (MD5) Previous issue date: 2013-08-30
Virtualization has become a hot topic in embedded systems for both academia and industrial development. Among its main advantages, we can highlight improvements in (i) software design quality; (ii) system security levels; (iii) software reuse; and (iv) hardware utilization. However, it still presents constraints that have lessened the enthusiasm for it, the main concern being whether it is worthwhile given its implicit overhead, which can make its use infeasible. Thus, we discuss matters related to virtualization in embedded systems and study alternatives for supporting virtualization on multiprocessor MIPS architectures.

APA, Harvard, Vancouver, ISO, and other styles

32

Aguiar, Alexandra da Costa Pinto de. "On the virtualization of multiprocessed embedded systems." Pontifícia Universidade Católica do Rio Grande do Sul, 2014. http://hdl.handle.net/10923/5855.

Full text

Abstract:

Virtualization has become a hot topic in embedded systems for both academia and industrial development. Among its main advantages, we can highlight improvements in (i) software design quality; (ii) system security levels; (iii) software reuse; and (iv) hardware utilization. However, it still presents constraints that have lessened the enthusiasm for it, the main concern being whether it is worthwhile given its implicit overhead, which can make its use infeasible. Thus, we discuss matters related to virtualization in embedded systems and study alternatives for supporting virtualization on multiprocessor MIPS architectures.

APA, Harvard, Vancouver, ISO, and other styles

33

Wei, Kiong Chin. "Communication interfaces for a distributed embedded multiprocessor system." Thesis, Nottingham Trent University, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.443329.

Full text

APA, Harvard, Vancouver, ISO, and other styles

34

Coutinho, André Tavares. "DRM analysis using a simulator of multiprocessor embedded system." Master's thesis, Universidade de Aveiro, 2008. http://hdl.handle.net/10773/1950.

Full text

Abstract:

Master's in Electronics and Telecommunications Engineering
Multiprocessor systems are an emerging technology. The Hijdra project, being developed at NXP Semiconductors Research, is a real-time multiprocessor system that runs streaming media jobs with both hard and soft real-time constraints. In these systems, the processors communicate through an on-chip (silicon) network. The jobs consist of multiple tasks running on embedded processors. Finding good solutions for mapping these tasks is the main problem of such systems. One application that has been studied for Hijdra is the "Car Radio". This thesis concerns the study of a digital radio (DRM) receiver application on the Hijdra architecture. 
In this context, a data flow analysis model was extracted from the application; the latency introduced in the network by the addition of a new tile (a Viterbi accelerator) and the eventual speed gains were studied; and the possibility of mapping the different tasks of the application onto different processors running in parallel was examined. Many strategies remain to be defined in order to optimize the performance of the DRM receiver application so that it can work more effectively in the multiprocessor system.

APA, Harvard, Vancouver, ISO, and other styles

35

Bérard-Deroche, Émilie. "Distribution d'une architecture modulaire intégrée dans un contexte hélicoptère." Phd thesis, Toulouse, INPT, 2017. http://oatao.univ-toulouse.fr/19923/1/BERARD_DEROCHE_Emilie.pdf.

Full text

Abstract:

Integrated modular architectures (IMA) are a major evolution in the architecture of avionics systems. They allow several systems to share hardware resources without interfering with one another's operation, thanks to spatial partitioning (predefined memory areas) and temporal partitioning (static scheduling) in the processors, together with resource reservation on the networks used. These static allocations make it possible to verify the overall determinism of the different systems: each system must meet end-to-end requirements in an asynchronous architecture. A worst-case study evaluates the situations that push the system to its limits and verifies that the end-to-end requirements are satisfied in all cases. The IMA architectures used in aircraft physically centralize powerful computing modules in avionics bays. In the helicopter case study, such bays are not feasible for reasons of space: these architectures are built from less powerful processors, used at more than 80%. To add new functions and new equipment, the aim is to distribute the processing power over a larger number of processors within a globally asynchronous architecture. Two major problems are highlighted throughout this thesis. The first is the allocation of avionics functions, combined with an off-line scheduling constraint, on the different processors. The second is the satisfaction of end-to-end communication requirements, which depend on the allocation and scheduling of the functions as well as on the communication latencies of the networks. The major contribution of this thesis is the search for a trade-off between distributing IMA architectures over a larger number of processors and satisfying end-to-end communication requirements. 
We address this challenge as follows: - First, we formalize a model of communicating partitions that takes into account partition allocation and scheduling constraints on the one hand and end-to-end communication constraints between partitions on the other. - Second, we present an exhaustive search for valid architectures. We propose to allocate the avionics functions successively, treating the scheduling problem and the satisfaction of end-to-end requirements at the same level, with fixed communication latencies. This iterative method builds partially valid partition allocations. Constructing the schedules in each processor is, however, costly in the context of an exhaustive search. - Third, we designed a greedy heuristic to reduce the search space associated with the schedules. It addresses the challenges of distributing an IMA architecture in a helicopter context. - Fourth, we examine the impact of end-to-end communication latencies on given distributed architectures. For these, we propose network choices based on the admissible communication latencies between the different avionics functions. The methods we propose meet the industrial need of the helicopter case study, as well as that of larger systems.

APA, Harvard, Vancouver, ISO, and other styles

36

Robino, Francesco. "A model-based design approach for heterogeneous NoC-based MPSoCs on FPGA." Licentiate thesis, KTH, Elektroniksystem, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-145521.

Full text

Abstract:

Network-on-chip (NoC) based multi-processor systems-on-chip (MPSoCs) are promising candidates for future multi-processor embedded platforms, which are expected to be composed of hundreds of heterogeneous processing elements (PEs) to potentially provide high performance. However, together with performance, system complexity will increase, and new high-level design techniques will be needed to efficiently model, simulate, debug and synthesize such systems. System-level design (SLD) is considered to be the next frontier in electronic design automation (EDA). It enables the description of embedded systems in terms of abstract functions and interconnected blocks. A promising complementary approach to SLD is the use of models of computation (MoCs) to formally describe the execution semantics of functions and blocks through a set of rules. However, even when this formalization is used, there is no clear way to synthesize system-level models into software (SW) and hardware (HW) towards a NoC-based MPSoC implementation, i.e., there is a lack of system design automation (SDA) techniques to rapidly synthesize and prototype system-level models onto heterogeneous NoC-based MPSoCs. In addition, many of the proposed solutions incur a large overhead in terms of SW components and memory requirements, resulting in complex and customized multi-processor platforms. In order to tackle the problem, a novel model-based SDA flow has been developed as part of the thesis. It starts from a system-level specification, where functions execute according to the synchronous MoC, and then rapidly prototypes the system onto an FPGA configured as a heterogeneous NoC-based MPSoC. In the first part of the thesis the HeartBeat model is proposed as a model-based technique which fills the abstraction gap between the abstract system-level representation and its implementation on the multiprocessor prototype. 
Then details are provided to describe how this technique is automated to rapidly prototype the modeled system on a flexible platform, allowing the system specification to be adjusted until the designer is satisfied with the results. Finally, the proposed SDA technique is improved by defining a methodology to automatically explore possible design alternatives for the modeled system to be implemented on a heterogeneous NoC-based MPSoC. The goal of the exploration is to find an implementation satisfying the designer's requirements, which can be integrated in the proposed SDA flow. Through the proposed SDA flow, the designer is relieved from implementation details and the design time of systems targeting heterogeneous NoC-based MPSoCs on FPGA is significantly reduced. In addition, it reduces possible design errors by proposing a completely automated technique for fast prototyping. Compared to other SDA flows, the proposed technique targets a bare-metal solution, avoiding the use of an operating system (OS). This reduces the memory requirements on the FPGA platform compared to related work targeting MPSoCs on FPGA. At the same time, the performance (throughput) of the modeled applications can be increased when the number of processors of the target platform is increased. This is shown through a wide set of case studies implemented on FPGA.


APA, Harvard, Vancouver, ISO, and other styles

37

Cheung, Chun Shing. "MPSoC simulation and implementation of KPN applications." Diss., [Riverside, Calif.] : University of California, Riverside, 2009. http://proquest.umi.com/pqdweb?index=0&did=1953733101&SrchMode=2&sid=1&Fmt=2&VInst=PROD&VType=PQD&RQT=309&VName=PQD&TS=1268328981&clientId=48051.

Full text

Abstract:

Thesis (Ph. D.)--University of California, Riverside, 2009.
Includes abstract. Title from first page of PDF file (viewed March 8, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 123-137). Also issued in print.

APA, Harvard, Vancouver, ISO, and other styles

38

Shee, Seng Lin, Computer Science & Engineering, Faculty of Engineering, UNSW. "ADAPT : architectural and design exploration for application specific instruction-set processor technologies." Awarded by: University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/35404.

Full text

Abstract:

This thesis presents design automation methodologies for extensible processor platforms in application-specific domains. The work first presents a single-processor approach to customization: a methodology that can rapidly create different processor configurations by removing unused instruction sets from the architecture. A profile-directed approach is used to identify frequently used instructions and to eliminate unused opcodes from the available instruction pool. A coprocessor approach is explored next to create an SoC (System-on-Chip) that speeds up the application while reducing energy consumption. Loops in applications are identified and accelerated by tightly coupling a coprocessor to an ASIP (Application Specific Instruction-set Processor). Latency hiding is used to exploit the parallelism provided by this architecture. A case study has been performed on a JPEG encoding algorithm, comparing two different coprocessor approaches: a high-level synthesis approach and our custom coprocessor approach. The thesis concludes by introducing a heterogeneous multiprocessor system using ASIPs as processing entities in a pipeline configuration. The problem of mapping each algorithmic stage in the system to an ASIP configuration is formulated. We propose an estimation technique to calculate runtimes of the configured multiprocessor system without running cycle-accurate simulations, which could take a significant amount of time. We present two heuristics to efficiently search the design space of a pipeline-based multi-ASIP system and compare the results against an exhaustive approach. In our first approach, we show that, on average, processor size can be reduced by 30% and energy consumption by 24%, while performance is improved by 24%. 
In the coprocessor approach, compared with the use of a main processor alone, a loop performance improvement of 2.57x is achieved using the custom coprocessor approach, as against 1.58x for the high-level synthesis method and 1.33x for the customized instruction approach. Energy savings are 57%, 28% and 19%, respectively. Our multiprocessor design provides a performance improvement of at least 4.03x for JPEG and 3.31x for MP3 relative to a single-processor design. The minimum cost obtained using our heuristic was within 0.43% and 0.29% of the optimum values for the JPEG and MP3 benchmarks, respectively.
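
To make the idea of comparing a fast search against an exhaustive one concrete, here is a toy sketch. It does not reproduce the thesis's heuristics or cost model: the `(area, runtime)` candidate tuples, the runtime bound and both function names are assumptions. Each pipeline stage has a few candidate ASIP configurations, and the goal is the smallest total area such that no stage exceeds the bound.

```python
from itertools import product


def exhaustive_search(stages, max_runtime):
    """Try every combination of one (area, runtime) configuration per stage."""
    best = None
    for choice in product(*stages):
        if max(runtime for _, runtime in choice) <= max_runtime:
            area = sum(a for a, _ in choice)
            if best is None or area < best[0]:
                best = (area, choice)
    return best  # None when no combination meets the bound


def per_stage_search(stages, max_runtime):
    """Pick, stage by stage, the smallest configuration that meets the bound."""
    choice = []
    for configs in stages:
        feasible = [c for c in configs if c[1] <= max_runtime]
        if not feasible:
            return None
        choice.append(min(feasible))  # tuples compare by area first
    return sum(a for a, _ in choice), tuple(choice)
```

With this deliberately simple cost model the stages are independent, so the per-stage search finds the same minimum area as the exhaustive one while examining far fewer combinations. Real design spaces, where stages interact, are where heuristics can fall short of the optimum by a small margin.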

APA, Harvard, Vancouver, ISO, and other styles

39

Huang, Jia [Verfasser], Alois [Akademischer Betreuer] Knoll, and Petru [Akademischer Betreuer] Eles. "Towards an Integrated Framework for Reliability-Aware Embedded System Design on Multiprocessor System-on-Chips / Jia Huang. Gutachter: Alois Knoll ; Petru Eles. Betreuer: Alois Knoll." München : Universitätsbibliothek der TU München, 2014. http://d-nb.info/1063724333/34.

Full text

APA, Harvard, Vancouver, ISO, and other styles

40

Georgiev, Kiril. "Débogage des systèmes embarqués multiprocesseur basé sur la ré-exécution déterministe et partielle." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENM086/document.

Full text

Abstract:

Les plates-formes MPSoC permettent de satisfaire les contraintes de performance, de flexibilité et de consommation énergétique requises par les systèmes embarqués émergents. Elles intègrent un nombre important de processeurs, des blocs de mémoire et des périphériques, hiérarchiquement organisés par un réseau d'interconnexion. Le développement du logiciel est réputé difficile, notamment dû à la gestion d'un grand nombre d'entités (tâches/threads/processus). L'exécution concurrente de ces entités permet d'exploiter efficacement l'architecture mais complexifie le processus de mise au point et notamment l'analyse des erreurs. D'une part, les exécutions peuvent être non-déterministes notamment dû à la concurrence, c'est à dire qu'elles peuvent se dérouler d'une manière différente à chaque reprise. En conséquence, il n'est pas garanti qu'une erreur se produirait durant la phase de mise au point. D'autre part, la complexité de l'architecture et de l'exécution peut rendre trop important le nombre d'éléments à analyser afin d'identifier une erreur. Il pourrait donc être difficile de se focaliser sur des éléments potentiellement fautifs. Un des défis majeurs du développement logiciel MPSoC est donc de réduire le temps de la mise au point. Dans cette thèse, nous proposons une méthodologie de mise au point qui aide le développeur à identifier les erreurs dans le logiciel MPSoC. Notre premier objectif est de déboguer une même exécution plusieurs fois afin d'analyser des sources potentielles de l'erreur jusqu'à son identification. Nous avons donc identifié les sources de non-déterminisme MPSoC et proposé des mécanismes de ré-exécution déterministe les plus adaptés. Notre deuxième objectif vise à minimiser les ressources pour reproduire une exécution afin de satisfaire la contrainte MPSoC de maîtrise de l'intrusion. Nous avons donc utilisé des mécanismes efficaces de ré-exécution déterministe et considéré qu'une partie du comportement non-déterministe. 
Le troisième objectif est de permettre le passage à l'échelle, c'est à dire de déboguer des exécutions caractérisées par un nombre d'éléments de plus en plus croissant. Nous avons donc proposé une méthode qui permet de circonscrire et de déboguer qu'une partie de l'exécution. De plus, cette méthode s'applique aux différents types d'architectures et d'applications MPSoC
MPSoC platforms provide the high performance, low power consumption and flexibility required by emerging embedded systems. They incorporate many processing units, memory blocks and peripherals, hierarchically organized by an interconnection network. Software development for them is known to be difficult, notably because of the management of multiple entities (tasks/threads/processes). The concurrent execution of these entities makes it possible to exploit the architecture efficiently but complicates the software refinement process, and especially the debugging activity. On the one hand, executions of the software can be non-deterministic because of concurrency, i.e. they can behave differently each time. Consequently, there is no guarantee that an error will occur during the debugging activity. On the other hand, the complexity of the architecture and of the execution can increase the number of elements to be analyzed in the debugging process. As a result, it can be difficult to concentrate on the potentially faulty elements. Therefore, one of the most important challenges in the development of MPSoC software is to reduce the time spent on the refinement process.
In this thesis, we propose a new methodology for refining MPSoC software that helps developers with the debugging activity. Our first objective is to be able to debug the same execution several times in order to analyze potential sources of the error. To do so, we identify the sources of non-determinism in MPSoC software executions and propose the most appropriate methods to record and replay them. Our second objective is to reduce the execution overhead required by the record mechanisms in order to limit intrusiveness, which is an important MPSoC constraint. To accomplish this objective, we consider only a part of the non-deterministic behaviour and select efficient record-replay methods. The third objective is to provide a scalable solution, i.e. to be able to debug more and more complex executions, characterized by an increasing number of elements. Therefore, we propose a partial replay method that makes it possible to isolate and debug a fraction of the execution elements. Moreover, this method applies to different types of MPSoC architectures and applications.
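
The record-and-replay idea summarized in this abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the class `NondetSource` and the toy scheduler `run_tasks` are hypothetical names, and a random choice stands in for the real sources of non-determinism on an MPSoC. The first run logs every non-deterministic decision; the second run consumes the log and therefore reproduces the same execution.

```python
import random


class NondetSource:
    """Wraps a non-deterministic choice so it can be recorded and replayed (hypothetical helper)."""

    def __init__(self, log=None):
        # Record mode: no log is given and a fresh one is built.
        # Replay mode: the recorded log is consumed in order.
        self.replaying = log is not None
        self.log = list(log) if log is not None else []
        self._pos = 0

    def choose(self, n):
        """Return an index in range(n): random when recording, logged when replaying."""
        if self.replaying:
            value = self.log[self._pos]
            self._pos += 1
            return value
        value = random.randrange(n)
        self.log.append(value)
        return value


def run_tasks(source, tasks=("A", "B", "C"), steps=6):
    """Toy scheduler: at each step the source decides which task runs next."""
    return [tasks[source.choose(len(tasks))] for _ in range(steps)]


recorder = NondetSource()                                  # record mode
first_run = run_tasks(recorder)
replayed_run = run_tasks(NondetSource(log=recorder.log))   # replay mode
```

Because the replayed run reads its decisions from the log, `replayed_run` is identical to `first_run`, which is what allows the same faulty execution to be examined several times.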

41

Pepe, Pedro Carlos Fazolino 1978. "Escalonamento dinâmico de tensão e frequência em multiprocessadores para aplicações com especificação de qualidade por taxa mínima de processamento de entradas." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/260883.

Abstract:

Advisor: Alice Maria Bastos Hubinger Tokarnia
Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Resumo: Este trabalho apresenta quatro algoritmos de escalonamento dinâmico de Tensão e Frequência (DVFS) em sistemas multiprocessador baseado em caminhos de execução. Nossos alvos são aplicações multimídia executadas em sistemas embarcados, com especificação de qualidade por taxa mínima de entradas (QoS) processadas. Uma fração mínima de entradas, geralmente quadros de dados, precisa ser completamente processada no tempo máximo de resposta especificado. O objetivo dos algoritmos é atuar em quatro cenários que correspondem a sistemas com diferentes possibilidades de escalonamento dinâmico de tensão e frequência e diferentes capacidades de monitoramento da qualidade de serviço. No primeiro cenário, todos os pacotes de dados de entrada recebidos devem ser processados dentro do tempo máximo especificado e o nível de tensão/frequência pode ser ajustado no início da execução da aplicação, sendo o mesmo para todos os processadores. Este cenário é referência para comparação de resultados para os outros cenários. Para o segundo cenário, o nível de tensão/frequência pode ser definido individualmente para um processador, no início da execução de cada tarefa, e dados de entrada de classes específicas podem ser descartados. O terceiro cenário possibilita, além do descarte de classes específicas de dados de entrada, o ajuste do nível de tensão/frequência de cada tarefa de acordo com a classe de dados de entrada a ser processada. O algoritmo desenvolvido para o quarto cenário trata dinamicamente de alterações na distribuição probabilística das classes de entrada, calculando novos níveis de tensão/frequência para as tarefas e classes de entrada de modo que a especificação de qualidade continue a ser satisfeita, de forma eficiente. 
Para uma aplicação de cancelamento de eco acústico, executada em 4 processadores, com taxa mínima de processamento igual a 50%, o algoritmo de escalonamento de tensão e frequência, no cenário 3, conseguiu reduzir o consumo de energia em cerca de 71%, comparado ao cenário 1. No cenário 4, simulamos para esta aplicação uma modificação simultânea de 10 pontos percentuais na distribuição das classes de entrada em 3 tarefas causando aumentos do número de descartes. O algoritmo proposto para o cenário 4 manteve a qualidade mínima com um aumento de apenas 6% no consumo de energia, quando comparado ao consumo de energia da configuração inicial definida para o cenário 3
Abstract: This work presents four execution-path-based Dynamic Voltage and Frequency Scaling (DVFS) algorithms for multiprocessor systems. The targets are multimedia applications running on embedded systems, with quality of service (QoS) specified as a minimum input completion rate: a minimum fraction of the inputs, usually data frames, must be completely processed within the specified deadline. The algorithms address four scenarios corresponding to systems with different possibilities for dynamic voltage and frequency scaling and different QoS monitoring capabilities. In the first scenario, all received data frames must be processed within the deadline, and the voltage/frequency level can be adjusted only at the beginning of the application execution and must be the same for all processors. This scenario is the reference against which the results of the other scenarios are compared. In the second scenario, the voltage/frequency level can be set individually for each processor at the beginning of each task execution, and input data frames of specific classes can be discarded. The third scenario allows, besides discarding specific classes of input data, adjusting the voltage/frequency level of each task according to the class of the input data to be processed. The algorithm for the fourth scenario operates online, computing new voltage/frequency levels and making new class-discarding decisions to cope with changes in the probability distribution of the input classes. Its goal is to maintain the specified quality with low energy consumption. For an acoustic echo cancellation application running on a system with 4 processors, with a specified minimum rate of completely processed inputs of 50%, the algorithm for scenario 3 reduced energy consumption by about 71% compared with scenario 1.
In scenario 4, this application was subjected in simulation to simultaneous changes of 10 percentage points in the input class distributions of three tasks, which increased the number of discards and reduced system quality. The algorithm for scenario 4 maintained the minimum quality with an increase of only 6% in energy consumption compared with the initial configuration defined for scenario 3.
Master's degree
Computer Engineering
Master in Electrical Engineering
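
To make the basic DVFS trade-off behind this abstract concrete, here is a minimal Python sketch. It is not one of the four algorithms of the dissertation: it only picks, for a single task, the voltage/frequency level with the lowest dynamic energy that still meets a deadline, under the common assumption that dynamic energy for a fixed cycle count scales with the square of the supply voltage. The level table, the cycle count and the function names are hypothetical.

```python
def pick_level(levels, cycles, deadline_s):
    """Return the (frequency_hz, voltage_v) level with the lowest V^2 that meets the deadline, or None."""
    # A level is feasible when the task's cycles fit in the deadline at that frequency.
    feasible = [(f, v) for f, v in levels if cycles / f <= deadline_s]
    if not feasible:
        return None
    # Assumption: dynamic energy for a fixed number of cycles is proportional to V^2.
    return min(feasible, key=lambda level: level[1] ** 2)


def relative_energy(level, cycles):
    """Energy in arbitrary units (cycles * V^2), only meaningful for comparisons."""
    _, voltage = level
    return cycles * voltage ** 2


# Hypothetical operating points: (frequency in Hz, supply voltage in V).
levels = [(200e6, 0.9), (400e6, 1.0), (800e6, 1.2)]
chosen = pick_level(levels, cycles=3e6, deadline_s=0.010)
saving = 1 - relative_energy(chosen, 3e6) / relative_energy(levels[-1], 3e6)
```

With these made-up numbers the 200 MHz level misses the 10 ms deadline, so the 400 MHz level is chosen, and `saving` is the fraction of dynamic energy saved relative to always running at the highest level.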

42

Dardaillon, Mickaël. "Compilation d'applications flot de données paramétriques pour MPSoC dédiés à la radio logicielle." Thesis, Lyon, INSA, 2014. http://www.theses.fr/2014ISAL0102/document.

Abstract:

Le développement de la radio logicielle fait suite à l’évolution rapide du domaine des télécommunications. Les besoins en performance et en dynamicité ont donné naissance à des MPSoC dédiés à la radio logicielle. La spécialisation de ces MPSoC rend cependant leur programmation et leur vérification complexes. Des travaux proposent d’atténuer cette complexité par l’utilisation de paradigmes tels que le modèle de calcul flot de données. Parallèlement, le besoin de modèles flexibles et vérifiables a mené au développement de nouveaux modèles flot de données paramétriques. Dans cette thèse, j’étudie la compilation d’applications utilisant un modèle de calcul flot de données paramétrique et ciblant des plateformes de radio logicielle. Après un état de l’art du matériel et logiciel du domaine, je propose un raffinement de l’ordonnancement flot de données, et présente son application à la vérification des tailles mémoires. Ensuite, j’introduis un nouveau format de haut niveau pour définir le graphe et les acteurs flot de données, ainsi que le flot de compilation associé. J’applique ces concepts à la génération de code optimisé pour la plateforme de radio logicielle Magali. La compilation de parties du protocole LTE permet d’évaluer les performances du flot de compilation proposé
The emergence of software-defined radio follows the rapid evolution of the telecommunications domain. Requirements in both performance and dynamicity have given rise to MPSoCs dedicated to software-defined radio. The specialization of these MPSoCs makes them difficult to program and verify. Dataflow models of computation have been suggested as a way to mitigate this complexity. Moreover, the need for flexible yet verifiable models has led to the development of new parametric dataflow models. In this thesis, I study the compilation of parametric dataflow applications targeting software-defined-radio platforms. After a survey of the hardware and software state of the art in this field, I propose a new refinement of dataflow scheduling and outline its application to the verification of buffer sizes. Then, I introduce a new high-level format to define dataflow actors and graphs, with the associated compilation flow. I apply these concepts to optimised code generation for the Magali software-defined-radio platform. Compiling parts of the LTE protocol is used to evaluate the performance of the proposed compilation flow.

43

Bodin, Bruno. "Analyse d'Applications Flot de Données pour la Compilation Multiprocesseur." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00922578.

Abstract:

Embedded systems are electronic and computing devices that are subject to many constraints and must operate continuously. Dataflow programming models are often used to define the behaviour of these systems. This choice is motivated, on the one hand, by their ability to describe the cyclic behaviour that embedded systems require and, on the other hand, by the fact that these models lend themselves to analyses that can provide essential guarantees of correct operation and performance. The company Kalray offers an embedded architecture, the MPPA, which comes with the programming language ΣC. This language describes applications in the form of an already well-studied dataflow model, the Cyclo-Static Dataflow Graph (CSDFG). However, the CSDFGs generated by this language are often too complex for existing analysis techniques to be applied. The objective of this thesis is to provide algorithmic tools that solve the various analysis steps needed to study a ΣC application, within a reasonable execution time and on large instances. We study three distinct analysis problems: liveness testing, maximum throughput evaluation, and memory sizing. For each of these problems, we provide fast algorithmic methods whose efficiency has been verified experimentally. The methods we propose derive from results on periodic schedules; they provide approximate results with no performance guarantee. To overcome this weakness, we also propose new analysis tools based on K-periodic schedules. These schedules generalize our work on periodic scheduling and will allow us, in the near future, to design far more effective analysis methods.

44

Brandão, Jesse Wayde. "Analysis of the truncated response model for fixed priority on HMPSoCs." Master's thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/14836.

Abstract:

Master's in Electronics and Telecommunications Engineering
With the ever more ubiquitous nature of embedded systems and their increasingly demanding applications, such as audio/video decoding and networking, the popularity of MultiProcessor Systems-on-Chip (MPSoCs) continues to increase. As such, their modern uses often involve the execution of multiple applications on the same system. Embedded systems often run applications that face timing restrictions, such as deadlines, throughput and latency. The resources available to the applications running on these systems are finite and, therefore, applications need to share the available resources while guaranteeing that their timing requirements are met. These guarantees are established via schedulers, which may employ some of the many techniques devised for the arbitration of resource usage among applications. The main technique considered in this dissertation is Preemptive Fixed Priority (PFP) scheduling. Also, there is a growing trend in the usage of the data flow computational model for the analysis of applications on MPSoCs. Data flow graphs are functionally intuitive and have interesting and useful analytical properties. This dissertation intends to further previous work on the temporal analysis of PFP scheduling of real-time applications on MPSoCs by implementing the truncated response model for PFP scheduling and analyzing its results. This response model promises tighter bounds for the worst-case response times of the actors in a low-priority data flow graph by considering the worst-case response times over consecutive firings of an actor rather than just a single firing. As a follow-up to this work, we also introduce in this dissertation a burst analysis technique for actors in a data flow graph.
Com a natureza cada vez mais ubíqua de sistemas embutidos e as suas aplicações cada vez mais exigentes, como a decodificação de áudio/video e rede, a popularidade de MultiProcessor Systems-on-Chip (MPSoCs) continua a aumentar. Como tal, os seus usos modernos muitas vezes envolvem a execução de várias aplicações no mesmo sistema. Sistemas embutidos, frequentemente correm aplicações que são confrontadas com restrições temporais, algumas das quais são prazos, taxa de transferência e latência. Os recursos disponíveis para as aplicações que estão a correr nestes sistemas são finitos e, portanto, as aplicações necessitam de partilhar os recursos disponíveis, garantindo simultaneamente que os seus requisitos temporais sejam satisfeitos. Estas garantias são estabelecidas por meio de escalonadores que podem empregar algumas das muitas técnicas elaboradas para a arbitragem de uso de recursos entre as aplicações. A técnica principal considerada nesta dissertação é Preemptive Fixed Priority (PFP). Além disso existe uma tendência crescente no uso do modelo computacional data flow para a análise de aplicações a correr em MPSoCs. Grafos data flow são funcionalmente intuitivos e possuem propriedades interessantes e úteis. Esta dissertação pretende avançar trabalho prévio na área de escalonamento PFP de aplicações ao implementar o modelo de resposta truncado para escalonamento PFP e analisar os seus resultados. Este modelo de resposta promete limites mais estritos para os tempos de resposta de pior caso para atores num grafo de baixa prioridade ao considerar os tempos de resposta de pior caso ao longo de várias execuções consecutivas de um actor em vez de uma só. Como seguimento a este trabalho, também introduzimos nesta dissertação uma técnica para a análise de execuções em rajada de atores num grafo data flow.

45

Dobiáš, Petr. "Contribution à l’ordonnancement dynamique, tolérant aux fautes, de tâches pour les systèmes embarqués temps-réel multiprocesseurs." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S024.

Abstract:

La thèse se focalise sur le placement et l’ordonnancement dynamique des tâches sur les systèmes embarqués multiprocesseurs pour améliorer leur fiabilité tout en tenant compte des contraintes telles que le temps réel ou l’énergie. Afin d’évaluer les performances du système, le nombre de tâches rejetées, la complexité de l’algorithme et la résilience estimée en injectant des fautes sont principalement analysés. La recherche est appliquée (i) à l’approche de « primary/backup » qui est une technique de tolérance aux fautes basée sur deux copies d’une tâche et (ii) aux algorithmes de placement pour les petits satellites appelés CubeSats. Quant à l’approche de « primary/backup », l’objectif principal est d’étudier les stratégies d’allocation des processeurs, de proposer de nouvelles méthodes d’amélioration pour l’ordonnancement et d’en choisir une qui diminue considérablement la durée de l’exécution de l’algorithme sans dégrader les performances du système. En ce qui concerne les CubeSats, l’idée est de regrouper tous les processeurs à bord et de concevoir des algorithmes d’ordonnancement afin de rendre les CubeSats plus robustes. Les scénarios provenant de deux CubeSats réels sont étudiés et les résultats montrent qu’il est inutile de considérer les systèmes ayant plus de six processeurs et que les algorithmes proposés fonctionnent bien même avec des capacités énergétiques limitées et dans un environnement hostile
The thesis is concerned with online mapping and scheduling of tasks on multiprocessor embedded systems in order to improve their reliability subject to constraints such as real-time deadlines or energy. To evaluate system performance, the number of rejected tasks, the algorithm complexity, and the resilience assessed by injecting faults are analysed. The research is applied to (i) the primary/backup approach, a fault-tolerance technique based on two copies of a task, and (ii) scheduling algorithms for small satellites called CubeSats. For the primary/backup approach, the chief objective is to analyse processor allocation strategies, devise new methods that improve scheduling, and choose one that significantly reduces the algorithm run-time without degrading system performance. Regarding CubeSats, the idea is to pool all the processors on board and design scheduling algorithms that make CubeSats more robust to faults. Scenarios from two real CubeSats are analysed; the results show that there is no benefit in considering systems with more than six processors and that the proposed algorithms perform well even with limited energy capacity and in a harsh environment.

46

Cheng, Yu-Min, and 鄭育旻. "Hardware Software Partitioning for Embedded Multiprocessor FPGA Systems." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/22g8x2.

Abstract:

Master's
National Taipei University of Technology
Institute of Computer and Communication Engineering
94
Multiprocessor architectures are increasingly being applied to embedded systems as System-on-a-Chip technology has developed rapidly. Hardware-software codesign is becoming a novel and practical solution for modern system design, and hardware-software partitioning is an important step in it. In this thesis, we propose a Genetic with Hardware Oriented (GHO) algorithm for hardware-software partitioning on multiprocessor embedded systems. The GHO algorithm combines a Genetic Algorithm (GA) with a Hardware Oriented partitioning method to generate a system partitioning solution that has high performance and a small memory size while satisfying the system constraints. The system constraints for hardware-software partitioning include system execution time, cost, power consumption and number of processors. Finally, three design examples, namely a simple CDFG (Control and Data Flow Graph), an Adaptive Differential Pulse Code Modulation (ADPCM) system and a Joint Photographic Experts Group (JPEG) encoding system, are used to illustrate the feasibility of the proposed GHO partitioning method. Experimental results show that the proposed GHO algorithm can obtain a solution with a shorter system execution time and a smaller memory footprint.

47

Fan, Yang-Hsin, and 范揚興. "Hardware-Software Partitioning Methodology for Embedded Multiprocessor Systems." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/pubp2v.

Abstract:

Doctorate
National Taipei University of Technology
Institute of Computer and Communication Engineering
97
Embedded systems have increasingly diverse functions, as well as powerful computational capabilities and real-time services, to the point where embedded systems with only one processor cannot fulfil the design requirements. Embedded multiprocessor systems, by contrast, provide powerful and flexible hardware and software architectures that satisfy these requirements. However, embedded multiprocessor systems raise several hardware-software partitioning problems. For instance, the significantly increasing number of hardware and software tasks is difficult to coordinate when hardware and software interact with each other. Additionally, system constraints involve multiple trade-offs, ranging from low energy dissipation to fast execution time, from high slice utilization to minimal memory usage, and from concurrency to the number of processors. Moreover, system resources should be allocated efficiently after hardware-software partitioning, the partitioning should achieve low energy dissipation, fast execution time, and high slice utilization, and a partitioning approach should be able to evaluate various system constraints rapidly. This dissertation proposes a novel hardware-software partitioning methodology, Genetic and Hardware-Oriented partitioning (GHO), that can solve the hardware-software partitioning problems of embedded multiprocessor systems. GHO can decide, from hundreds of thousands of hardware-software partitioning candidates, whether each task is implemented as a hardware or a software component. Additionally, the partitioning results of GHO can simultaneously meet the criteria of energy dissipation, execution time, memory size, slice capacity, and number of processors. GHO also allocates resources efficiently with respect to slice capacity and memory usage, and obtains low energy dissipation, fast execution time and high slice utilization.
Specifically, GHO can rapidly assess various specifications of system constraints, which enables the design of embedded multiprocessor systems to comply with time-to-market delivery requirements. Three real design examples, namely an ADPCM encoder/decoder system, a JPEG encoder system and the Purnaprajna benchmark, demonstrate the effectiveness of the proposed GHO. Each design example is partitioned under the system constraints of execution time, slice utilization and energy dissipation, respectively. Experimental results indicate that the proposed GHO decreases execution time by an average of 39.47%, increases slice utilization by 30.73%, and reduces energy dissipation by 79.38%.

48

Lin, Lan-Hsin, and 林藍芯. "A Novel Approach of HW/SW Partitioning for Embedded Multiprocessor Systems." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/17996773602461302920.

Abstract:

Master's
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
92
To shorten the time-to-market cycle, hardware-software codesign has become one of the core technologies in modern embedded systems. To achieve this objective, hardware and software must be developed concurrently, and software design must begin on “virtual hardware platforms” before the hardware platform is available. This can lead to better system design and reduce the risks that arise from rapid changes in system specifications. An incorrect HW/SW partitioning results in a time-consuming design process and expensive optimizations of the whole system. Therefore, how to partition the system into hardware and software parts has become one of the critical issues at the system level. This paper presents a novel HW/SW-partitioning approach that targets multiprocessor embedded systems under time, area, and power constraints. Our approach has two phases: a partitioning phase and a scheduling phase. In the partitioning phase, for an embedded system with n processors, recursive spectral bisection (RSB) is used to partition an application program into n blocks, which are then mapped to software components. We then try to move tasks from software components to hardware components in order to meet the deadline constraint. In the scheduling phase, we derive an approach that balances the load on each processor by exchanging tasks between hardware and software components, not only to meet the deadline constraint of the system but also to reduce its cost. Finally, we conclude the paper and describe the work we will continue in the near future.

49

Hu, Ching-Yuan, and 胡慶源. "Enhancement of Simulated Annealing Algorithm for Hardware-Software Partitioning on Embedded Multiprocessor FPGA Systems." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/7n2m8v.

Abstract:

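Master's
National Taipei University of Technology
Master's Program, College of Electrical Engineering and Computer Science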
97
The trend in information products is towards low power consumption, low cost and high performance. In this thesis, we propose an Enhancement of Simulated Annealing Algorithm (ESA) to meet these requirements for embedded multiprocessor FPGA systems. A cost function based on simulated annealing is proposed to balance the power, cost and performance requirements. Experimental results show that the simulation time of the proposed algorithm is reduced by 64% relative to the original simulated annealing algorithm and by 115% relative to the Genetic Algorithm. We use two examples, ADPCM and JPEG, to verify the performance of the proposed algorithm. When the ESA algorithm is used to search for the best solution to JPEG hardware-software partitioning, the experimental results show that using three processors is optimal. Comparing experimental results under two processors and the system constraints, the JPEG system design produced by the proposed ESA algorithm shows a reported 132% reduction in logic elements and 162% in power consumption.
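
As a rough illustration of how simulated annealing can drive hardware-software partitioning with a weighted cost function, here is a minimal Python sketch. It is plain simulated annealing, not the ESA algorithm of the thesis, and everything in it is an assumption made for the example: the task table, the two-term cost (execution time plus hardware area), the weights, the cooling schedule and the function names.

```python
import math
import random


def cost(assign, tasks, w_time=1.0, w_area=1.0):
    """Weighted cost of a partition; assign[i] is True when task i goes to hardware."""
    time = sum(t["hw_time"] if hw else t["sw_time"] for t, hw in zip(tasks, assign))
    area = sum(t["hw_area"] for t, hw in zip(tasks, assign) if hw)
    return w_time * time + w_area * area


def anneal(tasks, steps=2000, t0=10.0, alpha=0.995, seed=0):
    """Plain simulated annealing over hardware/software assignments."""
    rng = random.Random(seed)
    current = [False] * len(tasks)          # start with everything in software
    current_cost = cost(current, tasks)
    best, best_cost = current[:], current_cost
    temp = t0
    for _ in range(steps):
        cand = current[:]
        i = rng.randrange(len(tasks))
        cand[i] = not cand[i]               # move one task across the HW/SW boundary
        delta = cost(cand, tasks) - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= alpha                       # geometric cooling
    return best, best_cost


# Hypothetical task table (arbitrary units).
tasks = [
    {"name": "fir", "sw_time": 8, "hw_time": 2, "hw_area": 3},
    {"name": "ctrl", "sw_time": 2, "hw_time": 1, "hw_area": 4},
    {"name": "dct", "sw_time": 10, "hw_time": 3, "hw_area": 4},
]
best, best_cost = anneal(tasks)
```

In this toy instance the cost is separable per task, so moving `fir` and `dct` to hardware and keeping `ctrl` in software is the cheapest partition; a realistic cost function would also model communication, power and processor count.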

50

Vadlamani, Ramakrishna P. "Approaches to multiprocessor error recovery using an on-chip interconnect subsystem." 2010. https://scholarworks.umass.edu/theses/380.

Abstract:

For future multicores, a dedicated interconnect subsystem for on-chip monitors was found to be highly beneficial in terms of scalability, performance and area. In this thesis, such a monitor network (MNoC) is used in multicores to support selective error identification and recovery and to maintain target chip reliability in the context of dynamic voltage and frequency scaling (DVFS). A selective shared-memory multiprocessor recovery is performed using MNoC: when an error is detected, only the group of processors sharing an application with the affected processors is recovered. Although the use of DVFS in contemporary multicores provides significant protection from unpredictable thermal events, a potential side effect can be increased processor exposure to soft errors. To address this issue, a flexible fault prevention and recovery mechanism has been developed to selectively enable a small amount of per-core dual modular redundancy (DMR) in response to increased vulnerability, as measured by the processor architectural vulnerability factor (AVF). Our new algorithm for DMR deployment aims to provide a stable effective soft error rate (SER) by using DMR in response to DVFS caused by thermal events. The algorithm is implemented in real time on the multicore using MNoC and a controller that evaluates thermal information and multicore performance statistics in addition to error information. DVFS experiments with a multicore simulator using standard benchmarks show an average 6% improvement in overall power consumption and a stable SER with selective DMR compared with continuous DMR deployment.
