
Theses on the topic "Coarse-Grained Reconfigurable Architecture"


Consult the 27 best theses for your research on the topic "Coarse-Grained Reconfigurable Architecture".


You can also download the full text of each publication in PDF format and read its abstract online whenever it is available in the metadata.

Explore theses on a wide variety of disciplines and organize your bibliography correctly.

1

Guo, Yuanqing. "Mapping applications to a coarse-grained reconfigurable architecture". Enschede : University of Twente [Host], 2006. http://doc.utwente.nl/57121.

2

Lee, Jong-Suk Mark. "FleXilicon: a New Coarse-grained Reconfigurable Architecture for Multimedia and Wireless Communications". Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/77094.

Abstract
High computing power and flexibility are important design factors for multimedia and wireless communication applications due to the demand for high-quality services and the frequent evolution of standards. The ASIC (Application Specific Integrated Circuit) approach provides an area-efficient, high-performance solution, but is inflexible. In contrast, the general-purpose processor approach is flexible, but often fails to provide sufficient computing power. Reconfigurable architectures, which have been introduced as a compromise between the two extreme solutions, have been applied successfully to multimedia and wireless communication applications. In this thesis, we investigated a new coarse-grained reconfigurable architecture called FleXilicon, which is designed to execute critical loops efficiently and is embedded in an SoC with a host processor. FleXilicon improves resource utilization and achieves a high degree of loop-level parallelism (LLP). The proposed architecture aims to mitigate major shortcomings of existing architectures through the adoption of three schemes: (i) wider memory bandwidth, (ii) a reconfigurable controller, and (iii) flexible wordlength support. Increased memory bandwidth satisfies the memory access requirements of LLP execution. The new reconfigurable controller design minimizes reconfiguration overhead and improves area efficiency. Flexible wordlength support improves LLP by increasing the number of processing elements that can execute concurrently. The simulation results indicate that FleXilicon reduces the number of clock cycles and increases the speed for all five applications simulated. The speedup ratios compared with conventional architectures are as large as two orders of magnitude for some applications. VLSI implementation of FleXilicon in a 65 nm CMOS process indicates that the proposed architecture can operate at frequencies up to 1 GHz with moderate silicon area.
Ph. D.
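The flexible-wordlength idea in the abstract above generalizes to simple subword arithmetic. As a rough illustration only (the datapath width and wordlengths below are assumed values, not figures from the thesis), shrinking the operand wordlength increases the number of logical lanes, and hence the loop-level parallelism, that a fixed-width datapath can host:

    # Illustrative subword-parallelism count; widths are assumed, not from the thesis.
    datapath_bits = 128
    for wordlength in (32, 16, 8):
        lanes = datapath_bits // wordlength
        print(f"{wordlength}-bit operands -> {lanes} parallel lanes")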
3

Yang, Yu. "BENCHMARK OF TRIGGERED INSTRUCTION BASED COARSE GRAINED RECONFIGURABLE ARCHITECTURE FOR RADIO BASE STATION". Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177446.

Abstract
Spatially programmed architectures such as FPGAs are among the most prevalent hardware in various application areas. However, FPGAs suffer from significant overheads in area, latency and power efficiency. Coarse-Grained Reconfigurable Architectures (CGRAs) are designed to compensate for these disadvantages of FPGAs. In this thesis, a novel Triggered Instruction based CGRA designed by Intel is evaluated. The benchmarking work in this thesis focuses on the signal processing domain. Three performance-limiting functions, Channel Estimation, Radix-2 FFT and Interleaving, are selected from the LTE Uplink Receiver PHY Benchmark, an open-source benchmark, and are implemented and analyzed on the Triggered Instruction Architecture (TIA). Throughput-area relationships and throughput/area-area relationships are summarized in curves using a resource estimation method. The benchmark results show that TIA offers good flexibility for temporal execution, spatial execution, and a mix of the two. Designs in TIA are scalable and adjustable according to different performance requirements. Moreover, based on the development work, this thesis discusses the TIA development flow, various programming techniques, low-latency mapping solutions, code size comparison, the development environment and the integration of a heterogeneous system with TIA.
4

Malik, Omer. "Pragma-Based Approach For Mapping DSP Functions On A Coarse Grained Reconfigurable Architecture". Licentiate thesis, KTH, Elektronik och Inbyggda System, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166410.

5

Zhao, Xin. "High efficiency coarse-grained customised dynamically reconfigurable architecture for digital image processing and compression technologies". Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/6187.

Abstract
Digital image processing and compression technologies have significant market potential, especially the JPEG2000 standard, which offers outstanding codestream flexibility and a high compression ratio. Strong demand for high-performance digital image processing and compression system solutions is forcing designers to seek architectures that offer competitive advantages in terms of all performance metrics, such as speed and power. Traditional architectures such as ASICs, FPGAs and DSPs are limited by either low flexibility or high power consumption. On the other hand, through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching those of an ASIC, coarse-grained dynamically reconfigurable architectures are proving to be strong candidates for future high-performance digital image processing and compression systems. This thesis investigates dynamically reconfigurable architectures and especially the newly emerging RICA paradigm. Case studies such as a Reed-Solomon decoder and a WiMAX OFDM timing synchronisation engine are implemented in order to explore the potential of RICA-based architectures and possible optimisation approaches such as eliminating conditional branches, reducing memory accesses and constructing kernels. Based on the investigations in this thesis, a novel customised dynamically reconfigurable architecture targeting digital image processing and compression applications is devised, which can be tailored to different applications. A demosaicing engine based on the Freeman algorithm is designed and implemented on the proposed architecture as the pre-processing module of a digital imaging system. An efficient data buffer rotating scheme is designed with the aim of reducing memory accesses, and an investigation into mapping the demosaicing engine onto a dual-core RICA platform is performed. After optimisation, the performance of the proposed engine is carefully evaluated and compared in terms of throughput and consumed computational resources. When targeting the JPEG2000 standard, the core tasks, namely the 2-D Discrete Wavelet Transform (DWT) and Embedded Block Coding with Optimal Truncation (EBCOT), are implemented and optimised on the proposed architecture. A novel 2-D DWT architecture based on vector operations associated with the RICA paradigm is developed, and the complete DWT application is highly optimised for both throughput and area. For the EBCOT implementation, a novel Partial Parallel Architecture (PPA) for the most computationally intensive module in EBCOT, termed Context Modeling (CM), is devised. Based on the algorithm evaluation, an ARM core is integrated into the proposed architecture for performance enhancement. A ping-pong memory switching mode with a carefully designed communication scheme between the RICA-based architecture and the ARM core is proposed. Simulation results demonstrate that the proposed architecture for JPEG2000 offers a significant advantage in throughput.
6

Saraswat, Rohit. "A Finite Domain Constraint Approach for Placement and Routing of Coarse-Grained Reconfigurable Architectures". DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/689.

Abstract
Scheduling, placement, and routing are important steps in Very Large Scale Integration (VLSI) design. Researchers have developed numerous techniques to solve placement and routing problems. As the complexity of Application Specific Integrated Circuits (ASICs) increased over the past decades, so did the demand for improved place and route techniques. The primary objective of these place and route approaches has typically been wirelength minimization due to its impact on signal delay and design performance. With the advent of Field Programmable Gate Arrays (FPGAs), the same place and route techniques were applied to FPGA-based design. However, traditional place and route techniques may not work for Coarse-Grained Reconfigurable Architectures (CGRAs), which are reconfigurable devices offering wider path widths than FPGAs and more flexibility than ASICs, due to the differences in architecture and routing network. Further, the routing network of several types of CGRAs, including the Field Programmable Object Array (FPOA), has deterministic timing as compared to the routing fabric of most ASICs and FPGAs reported in the literature. This necessitates a fresh look at alternative approaches to place and route designs. This dissertation presents a finite domain constraint-based, delay-aware placement and routing methodology targeting an FPOA. The proposed methodology takes advantage of the deterministic routing network of CGRAs to perform a delay aware placement.
7

Das, Satyajit. "Architecture and Programming Model Support for Reconfigurable Accelerators in Multi-Core Embedded Systems". Thesis, Lorient, 2018. http://www.theses.fr/2018LORIS490/document.

Abstract
Emerging trends in embedded systems and applications call for high throughput and low power consumption. Due to the increasing demand for low-power computing and the diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient hardware accelerators. The main drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low if they perform only one specific function, and increasing the number of accelerators in a system on chip (SoC) causes scalability issues. Programmable accelerators provide flexibility and solve the scalability issues. A Coarse-Grained Reconfigurable Array (CGRA) architecture, consisting of several processing elements with word-level granularity, is a promising choice for a programmable accelerator. Inspired by the promising characteristics of programmable accelerators, the potential of CGRAs in near-threshold computing platforms is studied and an end-to-end CGRA research framework is developed in this thesis. The major contributions of this framework are: CGRA design, implementation, integration in a computing system, and compilation for the CGRA. First, the design and implementation of a CGRA named Integrated Programmable Array (IPA) is presented. Next, the problem of mapping applications with control and data flow onto a CGRA is formulated. From this formulation, several efficient algorithms are developed using the internal resources of a CGRA, with a vision for low-power acceleration. The algorithms are integrated into an automated compilation flow. Finally, the IPA accelerator is integrated into PULP, a Parallel Ultra-Low-Power Processing-Platform, to explore heterogeneous computing.
8

Peyret, Thomas. "Architecture matérielle et flot de programmation associé pour la conception de systèmes numériques tolérants aux fautes". Thesis, Lorient, 2014. http://www.theses.fr/2014LORIS348/document.

Abstract
Whether in automotive applications subject to heat stress or in the aerospace and nuclear fields subject to cosmic, neutron and gamma radiation, the environment can lead to the appearance of faults in electronic systems. These faults, which can be transient or permanent, lead to erroneous results that are unacceptable in some application contexts. The use of so-called rad-hard components is sometimes compromised by their high cost and by supply problems associated with export rules. This thesis proposes a joint hardware and software approach, independent of the integration technology, for using programmable digital devices in environments that generate faults. Our approach includes the definition of a Coarse-Grained Reconfigurable Architecture (CGRA) able to execute entire application codes, as well as all the hardware and software mechanisms needed to make it tolerant to transient and permanent faults. This is achieved by combining redundancy with dynamic reconfiguration of the CGRA, based on a library of configurations generated by a complete design flow. This tool chain relies on a flow that maps code represented as a Control and Data Flow Graph (CDFG) onto the CGRA, directly producing a large number of different configurations and allowing the full potential of the architecture to be exploited. This work, which has been validated through experiments with applications in the field of signal and image processing, has been the subject of two publications in international conferences and of two patents.
9

Zain-ul-Abdin. "Programming of coarse-grained reconfigurable architectures". Doctoral thesis, Örebro universitet, Akademin för naturvetenskap och teknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-15246.

Abstract
Coarse-grained reconfigurable architectures, which offer massive parallelism coupled with the capability of undergoing run-time reconfiguration, are gaining attention as a means to meet not only the increased computational demands of high-performance embedded systems, but also the need to adapt to the functional requirements of the application. This thesis focuses on the programming aspects of such coarse-grained reconfigurable computing devices, including the relevant computation models that are capable of exposing different kinds of parallelism inherent in the application and the ability of these models to capture the adaptability requirements of the application. The thesis suggests the occam-pi language for programming a broad class of coarse-grained reconfigurable architectures as an intermediate language; we call it intermediate, since we believe that application programming is best done in a high-level domain-specific language. The salient properties of the occam-pi language are explicit concurrency with built-in mechanisms for interprocessor communication, provision for expressing dynamic parallelism, support for the expression of dynamic reconfigurations, and placement attributes. To evaluate the programming approach, a compiler framework was extended to support the language extensions of the occam-pi language, and backends were developed to target two different coarse-grained reconfigurable architectures: XPP and Ambric. The results on XPP reveal that the occam-pi based implementations produce throughput comparable to that of NML programs, while programming at a much higher level of abstraction than that of NML. Similarly, the two occam-pi implementations of the autofocus criterion calculation targeted to the Ambric platform outperform the CPU implementation by factors of 11-23. Thus, the results of the implemented case studies suggest that the occam-pi language based approach simplifies the development of applications employing run-time reconfigurable devices without compromising the performance benefits.
10

Ul-Abdin, Zain. "Programming of Coarse-Grained Reconfigurable Architectures". Doctoral thesis, Högskolan i Halmstad, Centrum för forskning om inbyggda system (CERES), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-15050.

Abstract
Coarse-grained reconfigurable architectures, which offer massive parallelism coupled with the capability of undergoing run-time reconfiguration, are gaining attention as a means to meet not only the increased computational demands of high-performance embedded systems, but also the need to adapt to the functional requirements of the application. This thesis focuses on the programming aspects of such coarse-grained reconfigurable computing devices, including the relevant computation models that are capable of exposing different kinds of parallelism inherent in the application and the ability of these models to capture the adaptability requirements of the application. The thesis suggests the occam-pi language for programming a broad class of coarse-grained reconfigurable architectures as an intermediate language; we call it intermediate, since we believe that application programming is best done in a high-level domain-specific language. The salient properties of the occam-pi language are explicit concurrency with built-in mechanisms for interprocessor communication, provision for expressing dynamic parallelism, support for the expression of dynamic reconfigurations, and placement attributes. To evaluate the programming approach, a compiler framework was extended to support the language extensions of the occam-pi language, and backends were developed to target two different coarse-grained reconfigurable architectures: XPP and Ambric. The results on XPP reveal that the occam-pi based implementations produce throughput comparable to that of NML programs, while programming at a much higher level of abstraction than that of NML. Similarly, the two occam-pi implementations of the autofocus criterion calculation targeted to the Ambric platform outperform the CPU implementation by factors of 11-23. Thus, the results of the implemented case studies suggest that the occam-pi language based approach simplifies the development of applications employing run-time reconfigurable devices without compromising the performance benefits.
11

Bag, Zeki Ozan. "Energy-Aware Coarse Grained Reconfigurable Architectures Using Dynamically Reconfigurable Isolation Cells". Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-108217.

Abstract
This thesis presents a self-adaptive power management system to improve the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). CGRAs can host multiple applications on a single platform. Moreover, a single application may have multiple versions with different degrees of parallelism (fully serial, partially serial, fully parallel, etc.). Selection of the optimum application version depends on runtime conditions such as resource availability on the platform. A traditional worst-case design that satisfies the specifications results in poor power efficiency. Existing solutions to this problem rely on costly hardware, mainly to employ dynamic voltage and frequency scaling (DVFS). We propose exploiting reconfiguration of the available resources on the CGRA. Our solution makes use of dynamically reconfigurable isolation cells (DRICs) instead of dedicated hardware. We also introduce autonomous parallelism, voltage and frequency selection (APVFS) to realize DVFS functionality and to select the optimum version. Three applications are used for simulations, namely matrix multiplication, a finite impulse response (FIR) filter and the fast Fourier transform (FFT). Results show that up to 72% power and 55% energy can be saved, respectively. Synthesis of the fabric shows a considerable reduction in area overhead compared to existing designs employing DVFS.
12

Han, Wei. "Multi-core architectures with coarse-grained dynamically reconfigurable processors for broadband wireless access technologies". Thesis, University of Edinburgh, 2010. http://hdl.handle.net/1842/3812.

Abstract
Broadband Wireless Access technologies have significant market potential, especially the WiMAX protocol which can deliver data rates of tens of Mbps. Strong demand for high performance WiMAX solutions is forcing designers to seek help from multi-core processors that offer competitive advantages in terms of all performance metrics, such as speed, power and area. Through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching that of an ASIC, coarse-grained dynamically reconfigurable processors are proving to be strong candidates for processing cores used in future high performance multi-core processor systems. This thesis investigates multi-core architectures with a newly emerging dynamically reconfigurable processor – RICA, targeting WiMAX physical layer applications. A novel master-slave multi-core architecture is proposed, using RICA processing cores. A SystemC based simulator, called MRPSIM, is devised to model this multi-core architecture. This simulator provides fast simulation speed and timing accuracy, offers flexible architectural options to configure the multi-core architecture, and enables the analysis and investigation of multi-core architectures. Meanwhile a profiling-driven mapping methodology is developed to partition the WiMAX application into multiple tasks as well as schedule and map these tasks onto the multi-core architecture, aiming to reduce the overall system execution time. Both the MRPSIM simulator and the mapping methodology are seamlessly integrated with the existing RICA tool flow. Based on the proposed master-slave multi-core architecture, a series of diverse homogeneous and heterogeneous multi-core solutions are designed for different fixed WiMAX physical layer profiles. Implemented in ANSI C and executed on the MRPSIM simulator, these multi-core solutions contain different numbers of cores, combine various memory architectures and task partitioning schemes, and deliver high throughputs at relatively low area costs. Meanwhile a design space exploration methodology is developed to search the design space for multi-core systems to find suitable solutions under certain system constraints. Finally, laying a foundation for future multithreading exploration on the proposed multi-core architecture, this thesis investigates the porting of a real-time operating system – Micro C/OS-II to a single RICA processor. A multitasking version of WiMAX is implemented on a single RICA processor with the operating system support.
13

Balavendran, Joseph Rani Deepika. "Gamification to Solve a Mapping Problem in Electrical Engineering". Thesis, University of North Texas, 2020. https://digital.library.unt.edu/ark:/67531/metadc1703330/.

Abstract
Coarse-Grained Reconfigurable Architectures (CGRAs) are promising for developing high-performance, low-power portable applications. In this research, we crowdsource a mapping problem using gamification to harness human intelligence. A scientific puzzle game, Untangled, was developed to solve a mapping problem by encapsulating architectural characteristics. The primary motive of this research is to draw insights from the mapping solutions of players, who possess innate abilities such as decision-making, creative problem-solving, recognizing patterns, and learning from experience. In this dissertation, an extensive analysis was conducted to investigate how players' computational skills help to solve an open-ended problem under different constraints. From this analysis, we discovered a few common strategies among players, and subsequently a library of dictionaries containing patterns identified in players' solutions was developed. The findings help to propose a better version of the game that incorporates the techniques recognized from the players' experience. In the future, an updated version of the game may help low-performing players provide better solutions to a mapping problem. Eventually, these solutions may help to develop efficient mapping algorithms. In addition, this research can serve as an exemplar for future researchers who want to crowdsource such electrical engineering problems, and the approach can also be applied to other domains.
14

Malla, Tika Kumari. "Case Studies to Learn Human Mapping Strategies in a Variety of Coarse-Grained Reconfigurable Architectures". Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984195/.

Abstract
Computer hardware and algorithm design have seen significant progress over the years. It is also seen that there are several domains in which humans are more efficient than computers. For example, in image recognition, image tagging, and natural language understanding and processing, humans often find problems that are complicated for algorithms quite easy to grasp. This thesis presents different case studies to learn human mapping strategies for solving the mapping problem in the area of coarse-grained reconfigurable architectures (CGRAs). To achieve optimum performance and consume less energy in CGRAs, the place and route problem has always been a major concern. Making use of human characteristics can be helpful in such problems, through pattern recognition and experience. Therefore, to conduct the case studies, a computer mapping game called UNTANGLED was analyzed as a medium to convey insights into human mapping strategies across a variety of architectures. The purpose of this research was to learn from humans so that we can come up with better algorithms that outperform the existing ones. We observed how human strategies vary as we present players with different architectures, different architectures with constraints, and different visualizations, as well as how the quality of solutions changes with experience. All the case studies obtained from exploiting human strategies in this work provide useful feedback that can improve upon existing algorithms. These insights can be adapted to find the best architectural solution for a particular domain and to guide future research directions for mapping onto mesh- and stripe-based CGRAs.
15

Muir, Mark I. R. "Re-targetable tools and methodologies for the efficient deployment of high-level source code on coarse-grained dynamically reconfigurable architectures". Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/27072.

Abstract
Reconfigurable computing traditionally consists of a data path machine (such as an FPGA) acting as a co-processor to a conventional microprocessor. This involves partitioning the application such that the data path intensive parts are implemented on the reconfigurable fabric, and the control flow intensive parts are implemented on the microprocessor. Often the two parts have to be written in different languages. New highly parallel data path architectures allow parallelism approaching that of FPGAs, but are able to be reconfigured very rapidly. As a result, it is possible to use these architectures to perform control flow in a manner similar to a microprocessor, and thus a complete program can be described from an unmodified high-level language (in particular C). This overcomes the historical instruction-level parallelism (ILP) wall. To make full use of the available parallelism, existing microprocessor tool flows are insufficient. Data path machines are typically programmed via HDL tools from the ASIC design world. This expresses algorithms at a lower level than the application algorithms are typically developed in. The work in this thesis builds upon earlier work to allow applications to be described from high-level languages, by employing low-level optimisations in the compiler back-end and working from the assembly, to maximise parallel efficiency. This consists of scheduling, where known techniques are used to pack instructions into basic blocks that map well to the reconfigurable core (optimising spatial efficiency); then automatic pipelining is applied to dramatically improve the achievable throughput (optimising temporal efficiency). Together these can be thought of as 'instruction-level parallelism done right'. Speed-ups of more than an order of magnitude were achieved, yielding throughputs of 180-380MPixels/s on typical image signal processing tasks, matching the performance of hard-wired ASICs. Furthermore, conventional software-based simulation technologies for data path machines are too slow for use in application verification. This thesis demonstrates how a high-speed software emulator can be created for self-controlled dynamically reconfigurable data path machines, using a static serialisation of the data paths in each configuration context. This yields run-time performance several orders of magnitude higher than existing techniques, making it suitable for use in feedback-directed optimisation.
16

Kim, Yoonjin. "DESIGNING COST-EFFECTIVE COARSE-GRAINED RECONFIGURABLE ARCHITECTURE". 2009. http://hdl.handle.net/1969.1/ETD-TAMU-2009-05-649.

Abstract
Application-specific optimization of embedded systems has become inevitable as designers must satisfy market demand under ever tighter constraints on cost, performance and power. On the other hand, the flexibility of a system is also important to accommodate the short time-to-market requirements of embedded systems. To reconcile these incompatible demands, coarse-grained reconfigurable architecture (CGRA) has emerged as a suitable solution. A typical CGRA requires many processing elements (PEs) and a configuration cache for reconfiguration of its PE array. However, such a structure consumes significant area and power. Therefore, designing cost-effective CGRAs has been a serious concern for the reliability of CGRA-based embedded systems. As an effort to provide such a cost-effective design, the first half of this work focuses on reducing power in the configuration cache. For power saving in the configuration cache, a low-power reconfiguration technique is presented based on reusable context pipelining, achieved by merging the concept of context reuse into context pipelining. In addition, we propose dynamic context compression, capable of keeping only the required bits of the context words enabled while the redundant bits are disabled. Finally, we provide dynamic context management, capable of reducing power consumption in the configuration cache by controlling read/write operations on the redundant context words. In the second part of this dissertation, we focus on designing a cost-effective PE array to reduce area and power. For area and power saving in a PE array, we devise a cost-effective array fabric that addresses a novel rearrangement of processing elements and their interconnection designs to reduce area and power consumption. In addition, hierarchical reconfigurable computing arrays are proposed, consisting of two reconfigurable computing blocks with two types of communication structure. The two computing blocks share critical resources, and such a sharing structure provides an efficient communication interface between them while reducing overall area. Based on the proposed design approaches, a CGRA combining the multiple design schemes is presented to verify the synergy effect of the integrated approach. Experimental results show that the integrated approach reduces area by 23.07% and power by up to 72% when compared with the conventional CGRA.
17

Varadarajan, Keshavan. "A Coarse Grained Reconfigurable Architecture Framework Supporting Macro-Dataflow Execution". Thesis, 2012. http://etd.iisc.ernet.in/handle/2005/2302.

Abstract
A Coarse-Grained Reconfigurable Architecture (CGRA) is a processing platform consisting of an interconnection of coarse-grained computation units (viz. Function Units (FUs), Arithmetic Logic Units (ALUs)). These units communicate directly through send-receive-like primitives, as opposed to the shared-memory-based communication used in multi-core processors. CGRAs are a well-researched topic and the design space of a CGRA is quite large. The design space can be represented as a 7-tuple (C, N, T, P, O, M, H), where the terms have the following meanings: C - choice of computation unit, N - choice of interconnection network, T - choice of number of context frames (single or multiple), P - presence of partial reconfiguration, O - choice of orchestration mechanism, M - design of the memory hierarchy, and H - host-CGRA coupling. In this thesis, we develop an architectural framework for a macro-dataflow based CGRA where we make the following choice for each of these parameters: C - ALU, N - Network-on-Chip (NoC), T - multiple contexts, P - support for partial reconfiguration, O - macro-dataflow based orchestration, M - data memory banks placed at the periphery of the reconfigurable fabric (the reconfigurable fabric is the name given to the interconnection of computation units), H - loose coupling between the host processor and the CGRA, enabling our CGRA to execute an application independently of the host processor's intervention. The motivations for developing such a CGRA are: (i) to execute applications efficiently, through a reduction in reconfiguration time (i.e. the time needed to transfer instructions and data to the reconfigurable fabric) and a reduction in execution time through better exploitation of all forms of parallelism: Instruction Level Parallelism (ILP), Data Level Parallelism (DLP) and Thread/Task Level Parallelism (TLP); (ii) to permit customization of the CGRA for a particular domain through the use of domain-specific custom Intellectual Property (IP) blocks, which aids application performance and energy efficiency; and (iii) to develop a CGRA which is completely programmable and accepts any program written using the C89 standard. We choose a macro-dataflow based orchestration framework in combination with partial reconfiguration so as to ease the exploitation of TLP and DLP; macro-dataflow serves as a lightweight synchronization mechanism. We experiment with two variants of the macro-dataflow orchestration unit, namely a hardware-controlled orchestration unit and a compiler-controlled orchestration unit. We employ a NoC as it helps reduce the reconfiguration overhead. The compiler and the architecture were co-developed to ensure that every feature of the architecture can be automatically programmed through an application by the compiler. In this CGRA framework, the orchestration mechanism (O) and the host-CGRA coupling (H) are kept fixed and we permit design space exploration of the other terms in the 7-tuple design space. The mode of compilation and execution remains invariant under these changes, hence we refer to it as a framework. We now elucidate the compilation and execution flow for this CGRA framework. An application written in the C language is compiled and transformed into a set of temporal partitions, referred to as HyperOps in this thesis. The macro-dataflow orchestration unit selects a HyperOp for execution when all its inputs are available. The instructions and operands for a ready HyperOp are transferred to the reconfigurable fabric for execution.
Each ALU (in the computation unit) is capable of waiting for the availability of its input data prior to issuing instructions. We permit the launch and execution of a temporal partition to progress in parallel, which reduces the reconfiguration overhead. We further cut launch delays by keeping loops persistent on the fabric, thus eliminating the need to launch their instructions repeatedly. The CGRA framework has been implemented using Bluespec System Verilog. We evaluate the performance of two CGRA instances: one for cryptographic applications and another for linear algebra kernels. We also run other general-purpose integer and floating-point applications to demonstrate the generic nature of these optimizations. We explore various microarchitectural optimizations, viz. pipeline optimizations (i.e. changing the value of T), different forms of macro-dataflow orchestration such as the hardware-controlled orchestration unit and the compiler-controlled orchestration unit, different execution modes including resident loops, pipeline parallelism, changes to the router, etc. As a result of these optimizations we observe a 2.5x improvement in performance as compared to the base version. The reconfiguration overhead is hidden by overlapping the launching of instructions with execution. The perceived reconfiguration overhead is reduced drastically to about 9-11 cycles for each HyperOp, invariant of the size of the HyperOp. This can be mainly attributed to the data-dependent instruction execution and the use of the NoC. The overhead of the macro-dataflow execution unit is reduced to a minimum with the compiler-controlled orchestration unit. To benchmark the performance of these CGRA instances, we compare them with an Intel Core 2 Quad running at 2.66 GHz. On the cryptographic CGRA instance, running at 700 MHz, we observe one to two orders of magnitude improvement in performance for cryptographic applications, and up to one order of magnitude performance degradation for the linear algebra CGRA instance. This relatively poor performance on linear algebra kernels can be attributed to the inability to exploit ILP across computation units interconnected by the NoC, the long latency in accessing data memory placed at the periphery of the reconfigurable fabric, and the unavailability of pipelined floating-point units (which are critical to the performance of linear algebra kernels). The superior performance on the cryptographic kernels can be attributed to a higher computation-to-load-instruction ratio, a careful choice of custom IP block, the ability to construct large HyperOps, which allows a greater portion of the communication to be performed directly (as against communication through a register file in a general-purpose processor), and the use of the resident-loops execution mode. The power consumption of a computation unit employed in the cryptography CGRA instance, along with its router, is about 76 mW, as estimated by Synopsys Design Vision using the Faraday 90 nm technology library for an activity factor of 0.5. The power of other instances would depend on the specific instantiation of the domain-specific units. This implies that for a reconfigurable fabric of size 5 x 6 the total power consumption is about 2.3 W. The area and power (84 mW) dissipated by the macro-dataflow orchestration unit, which is common to both instances, are comparable to those of a single computation unit, making it an effective and low-overhead technique to exploit TLP.
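To make the 7-tuple design space described in this abstract concrete, here is a minimal sketch in Python. It is illustrative only: the class and field names are invented here, and the field values simply restate the choices listed in the abstract, not any artifact of the thesis.

    from dataclasses import dataclass

    @dataclass
    class CgraDesignPoint:
        c_compute_unit: str       # C - choice of computation unit
        n_network: str            # N - interconnection network
        t_context_frames: str     # T - single or multiple context frames
        p_partial_reconfig: bool  # P - partial reconfiguration support
        o_orchestration: str      # O - orchestration mechanism
        m_memory: str             # M - memory hierarchy design
        h_host_coupling: str      # H - host-CGRA coupling

    # The instance described in the abstract above:
    macro_dataflow_cgra = CgraDesignPoint(
        c_compute_unit="ALU",
        n_network="Network-on-Chip",
        t_context_frames="multiple",
        p_partial_reconfig=True,
        o_orchestration="macro-dataflow",
        m_memory="data banks at the fabric periphery",
        h_host_coupling="loose",
    )

Design space exploration then amounts to varying the free fields (all but the orchestration and host coupling, which the thesis keeps fixed) and re-running the same compilation and execution flow.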
18

Kwok, Zion Siu-On. "Register file architecture optimization in a coarse-grained reconfigurable array". Thesis, 2005. http://hdl.handle.net/2429/16551.

Abstract
This thesis investigates the impact of the global and local register file architecture on a reconfigurable system based on the ADRES architecture. The register files consume a significant amount of area on the reconfigurable device, and their architecture has a strong impact on the performance. We found that the global registers should be tightly connected to as many functional units as possible, while the connection of the local register files to their neighbours is less critical. We found that the global register file should contain 14 registers, while each local register file should only contain two registers. We used these results to propose two new architectures that demonstrate between -33% and 383% higher instructions per cycle per unit area compared to the original 4x4 and 8x8 array architectures, with 56% and 88% average improvement over a set of benchmarks for the new 4x4 and 8x8 array architectures, respectively.
Applied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate
19

Alle, Mythri. "Compiling For Coarse-Grained Reconfigurable Architectures Based On Dataflow Execution Paradigm". Thesis, 2012. http://etd.iisc.ernet.in/handle/2005/2453.

Abstract
Coarse-Grained Reconfigurable Architectures (CGRAs) can be employed for accelerating computational workloads that demand both flexibility and performance. CGRAs comprise a set of computation elements interconnected by a network, and this interconnection of computation elements is referred to as a reconfigurable fabric. The size of the application that can be accommodated on the reconfigurable fabric is limited by the size of the instruction buffers associated with each compute element. When an application cannot be accommodated entirely, it is partitioned such that each of the partitions can be executed on the reconfigurable fabric. These partitions are scheduled by an orchestrator. The orchestrator employs a dynamic dataflow execution paradigm, which has inherent support for synchronization and helps in the exploitation of the parallelism that exists across application partitions. In this thesis, we present a compiler that targets such CGRAs. The compiler presented in this thesis is capable of accepting applications specified in the C89 standard. To enable architectural design space exploration, the compiler is designed such that it can be customized for several instances of CGRAs employing the dataflow execution paradigm at the orchestrator; this is achieved by specifying the appropriate configuration parameters to the compiler. The focus of this thesis is to provide efficient support for various kinds of parallelism while ensuring correctness. The compiler is designed to support fine-grained task-level parallelism that exists across iterations of loops and function calls. Additionally, the compiler can also support pipeline parallelism, where a loop is split into multiple stages that execute in a pipelined manner. The prototype compiler, which targets multiple instances of a CGRA, is demonstrated in this thesis. We used this compiler to target multiple variants of CGRAs employing the dataflow execution paradigm, varying the reconfigurable fabric, the orchestration mechanism employed, and the size of the instruction buffers. We chose applications from two different domains, viz. cryptography and linear algebra. The execution time on the CGRA (the best among all instances) is compared against an Intel Quad core processor. Cryptography applications show a performance improvement ranging from more than one order of magnitude to close to two orders of magnitude. These applications have large amounts of ILP and our compiler could successfully expose the ILP available in them. Further, domain customization also played an important role in achieving good performance: we employed two custom functional units for accelerating cryptography applications, and the compiler could use them efficiently. In the linear algebra kernels we observe multiple iterations of a loop executing in parallel, effectively exploiting loop-level parallelism at runtime. In spite of this we notice close to an order of magnitude performance degradation. This degradation can be attributed to the use of non-pipelined floating-point units and the delays involved in accessing memory. Pipeline parallelism was demonstrated using this compiler for FFT and QR factorization. Thus, the compiler is capable of efficiently supporting different kinds of parallelism and can support the complete C89 standard. Further, the compiler can also support different instances of CGRAs employing the dataflow execution paradigm.
20

Biswas, Prasenjit. "Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture". Thesis, 2011. http://etd.iisc.ernet.in/handle/2005/2108.

Abstract
Application domains such as bio-informatics, DSP, structural biology, fluid dynamics, high-resolution direction finding, state estimation, adaptive noise cancellation, etc. demand high-performance computing solutions for their simulation environments. The core computations of these applications are Numerical Linear Algebra (NLA) kernels. Direct solvers are predominantly required in domains like DSP and estimation algorithms such as the Kalman filter, where the matrices on which operations need to be performed are either small or medium sized, but dense. Faddeev's algorithm is often used for solving dense linear systems of equations. The Modified Faddeev's Algorithm (MFA) is a general algorithm on which LU decomposition, QR factorization or SVD of matrices can be realized. MFA has the good property of realizing a host of matrix operations by computing the Schur complements on four blocked matrices, thereby reducing the overall computation requirements. We use MFA as a representative direct solver in this work. We further discuss the Givens rotation based QR algorithm for decomposition of any matrix, often used to solve the linear least squares problem. Systolic array architectures are widely accepted ASIC solutions for NLA algorithms, but the "can of worms" associated with this traditional solution spawns the need for alternative solutions. While popular custom hardware solutions in the form of systolic arrays can deliver high performance, because of their rigid structure they are not scalable and reconfigurable, and hence not commercially viable. We show how a reconfigurable computing platform can serve to contain the "can of worms". REDEFINE, a coarse-grained runtime reconfigurable architecture, has been used for systolic actualization of NLA kernels. We elaborate upon streaming NLA-specific enhancements to REDEFINE in order to meet expected performance goals, and we explore the need for an algorithm-aware custom compilation framework. We bring about a proposition to realize Faddeev's algorithm on REDEFINE and show that REDEFINE performs several times faster than traditional GPPs. Further, we direct our interest to QR decomposition as the next NLA kernel, as it ensures better stability than LU and other decompositions. We use QR decomposition as a case study to explore the design space of the proposed solution on REDEFINE. We also investigate the architectural details of the Custom Functional Units (CFUs) for these NLA kernels. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array. The framework used to realize QR decomposition can be generalized for the realization of other algorithms dealing with decompositions like LU, Faddeev's algorithm, Gauss-Jordan, etc. with different CFU definitions.
21

"Scalable Register File Architecture for CGRA Accelerators". Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40738.

Abstract
Coarse-grained Reconfigurable Arrays (CGRAs) are promising accelerators capable of accelerating even non-parallel loops and loops with low trip counts. One challenge in compiling for CGRAs is to manage both recurring and nonrecurring variables in the register file (RF) of the CGRA. Although prior works have managed recurring variables via a rotating RF, they access the nonrecurring variables through either a global RF or a constant memory. The former does not scale well, and the latter degrades the mapping quality. This work proposes a hardware-software codesign approach in order to manage all the variables in a local nonrotating RF. The hardware provides a modulo-addition based indexing mechanism to enable correct addressing of recurring variables in a nonrotating RF. The compiler determines the number of registers required for each recurring variable and configures the boundary between the registers used for recurring and nonrecurring variables. The compiler also pre-loads the read-only variables and constants into the local registers in the prologue of the schedule. Synthesis and place-and-route results of the previous and the proposed RF designs show that the proposed solution achieves 17% better cycle time. Experiments mapping several important and performance-critical loops collected from MiBench show that the proposed approach improves performance (through better mapping) by 18%, compared to using constant memory.
Dissertation/Thesis
Masters Thesis Computer Science 2016
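As a rough illustration of the modulo-addition indexing described in this abstract, the sketch below models such a register file in software. It is an assumption-laden toy, not the thesis's hardware design: the class, the method names and the single boundary parameter are invented here purely to show how modulo addition keeps a recurring (loop-carried) variable inside a fixed register window while nonrecurring values keep fixed addresses above the boundary.

    # Toy model only; names and structure are assumptions for illustration.
    class LocalRegisterFile:
        def __init__(self, size, boundary):
            self.regs = [0] * size
            self.boundary = boundary  # registers [0, boundary) hold recurring variables

        def _rot_index(self, base, window, iteration):
            # Modulo addition keeps each recurring variable inside its own window,
            # so no physical rotation of the register file is needed.
            return base + (iteration % window)

        def write_recurring(self, base, window, iteration, value):
            self.regs[self._rot_index(base, window, iteration)] = value

        def read_recurring(self, base, window, iteration, distance=0):
            # 'distance' fetches the value produced 'distance' iterations earlier.
            return self.regs[self._rot_index(base, window, iteration - distance)]

        def read_static(self, offset):
            # Nonrecurring values (constants and read-only data preloaded in the
            # prologue) live above the compiler-configured boundary.
            return self.regs[self.boundary + offset]

In this scheme a compiler would pick base and window per recurring variable so that all windows fit below the boundary; how the real RF and compiler do this is detailed in the thesis itself.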
22

Shehan, Basher [Verfasser]. "Dynamic coarse grained reconfigurable architectures / presented by Basher Shehan". 2010. http://d-nb.info/1010124390/34.

23

Merchant, Farhad. "Algorithm-Architecture Co-Design for Dense Linear Algebra Computations". Thesis, 2015. http://etd.iisc.ernet.in/2005/3958.

Abstract
Achieving high computation efficiency, in terms of Cycles Per Instruction (CPI), for high-performance computing kernels is an interesting and challenging research area. Dense Linear Algebra (DLA) computation is a representative high-performance computing application, used for example in LU and QR factorizations. Unfortunately, modern off-the-shelf microprocessors fall significantly short of achieving the theoretical lower bound on CPI for high-performance computing applications. In this thesis, we perform an in-depth analysis of the available parallelism and propose suitable algorithmic and architectural variations to significantly improve computation efficiency. There are two standard approaches for improving computation efficiency: first, to perform application-specific architecture customization, and second, to do algorithmic tuning. In the same manner, we first perform a graph-based analysis of selected DLA kernels. From the various forms of parallelism thus identified, we design a custom processing element for improving the CPI. The processing elements are used as building blocks for a commercially available Coarse-Grained Reconfigurable Architecture (CGRA). By performing detailed experiments on a synthesized CGRA implementation, we demonstrate that our proposed algorithmic and architectural variations are able to achieve lower CPI compared to off-the-shelf microprocessors. We also benchmark against state-of-the-art custom implementations and report a higher energy-performance-area product. DLA computations are encountered in many engineering and scientific computing applications, ranging from Computational Fluid Dynamics (CFD) to eigenvalue problems. Traditionally, these applications are written using highly tuned High Performance Computing (HPC) software packages like the Linear Algebra Package (LAPACK) and/or the Scalable Linear Algebra Package (ScaLAPACK). The basic building block for these packages is the Basic Linear Algebra Subprograms (BLAS). Algorithms pertaining to LAPACK/ScaLAPACK are written in terms of BLAS to achieve high throughput. Despite extensive intellectual efforts in the development and tuning of these packages, there still exists scope for further tuning. In this thesis, we revisit prominent and widely used compute-bound algorithms like GMM for further exploitation of Instruction Level Parallelism (ILP). We further look into LU and QR factorizations for generalizations and exhibit higher ILP in these algorithms. We first accelerate the sequential performance of the algorithms in BLAS and LAPACK and then focus on the parallel realization of these algorithms. The major contributions in algorithmic tuning in this thesis are as follows:
- We present a graph-based analysis of General Matrix Multiplication (GMM) and discuss the different types of parallelism available in GMM.
- We present an analysis of Givens Rotation (GR) based QR factorization, improve GR, and derive Column-wise GR (CGR), which can annihilate multiple elements of a column of a matrix simultaneously. We show that the multiplications in CGR are fewer than in GR.
- We generalize CGR further and derive Generalized GR (GGR), which can annihilate multiple elements of the columns of a matrix simultaneously. We show that the parallelism exhibited by GGR is much higher than that of GR and the Householder Transform (HT).
- We extend the generalization to Square-root Free GR (also known as Fast Givens Rotation) and Square-root and Division Free GR (SDFG), and derive Column-wise Fast Givens and Column-wise SDFG.
- We also extend the generalization to complex matrices and derive the Complex Column-wise Givens Rotation.
Coarse-Grained Reconfigurable Architectures (CGRAs) have gained popularity in the last decade due to their power and area efficiency. Furthermore, CGRAs like REDEFINE also exhibit support for domain customization. REDEFINE is an array of Tiles where each Tile consists of a Compute Element and a Router. The Routers are responsible for on-chip communication, while the Compute Elements in REDEFINE can be domain-customized to accelerate applications pertaining to the domain of interest. In this thesis, we take the REDEFINE base architecture as a starting point and design a Processing Element (PE) that can execute algorithms in BLAS and LAPACK efficiently. We perform several architectural enhancements in the PE to approach the lower bound of the CPI. For parallel realization of BLAS and LAPACK, we attach this PE to the Router of REDEFINE. We achieve better area and power performance compared to yesteryear's customized architectures for DLA. The major contributions in architecture in this thesis are as follows:
- We present the design of a PE for acceleration of GMM, which is a Level-3 BLAS operation.
- We methodically enhance the PE with different features to improve the performance of GMM.
- For efficient realization of the Linear Algebra Package (LAPACK), we use the PE that can efficiently execute GMM and show better performance.
- For further acceleration of LU and QR factorizations in LAPACK, we identify macro operations encountered in LU and QR factorizations and realize them on a reconfigurable data-path, resulting in 25-30% lower run-time.
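For readers unfamiliar with the Givens Rotation that the column-wise variants above build on, the standard 2 x 2 rotation (textbook material, not taken from this thesis) annihilates one element at a time:

    G = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \qquad
    c = \frac{a}{r}, \quad s = \frac{b}{r}, \quad r = \sqrt{a^2 + b^2},
    \qquad
    G \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}.

Applying such rotations column by column zeroes the subdiagonal entries of a matrix, yielding its QR factorization; CGR and GGR, as described in the abstract, annihilate several elements of a column simultaneously instead of one at a time.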
24

Jiang, Jun-Bin (江俊賓). "A Predicate-Aware Modulo Scheduling for Coarse Grained Reconfigurable Architectures". Thesis, 2011. http://ndltd.ncl.edu.tw/handle/qrf68u.

Abstract
Master's thesis
National Chiao Tung University
Industrial IC Design Program, College of Electrical and Computer Engineering (電機學院IC設計產業專班)
2011 (ROC year 100)
To balance efficiency and flexibility, coarse-grained reconfigurable architectures (CGRAs) have been proposed, which exploit the parallelism of a program without compromising its flexibility. However, finding more operation parallelism is a complicated problem for compilation. Modulo scheduling is one of the most widely adopted operation scheduling techniques in recent years; it introduces more parallelism by overlapping the iterations of a loop. Although modulo scheduling parallelizes many operations, we still observe that hardware resource usage is limited by the 37.8% of operations that are conditionally executed. In this research, we propose a predicate-aware modulo scheduling that may map two disjoint operations onto the same processing element to reduce the hardware resource requirements; the corresponding architecture is also proposed. In addition, a weighted-cost mapping decision heuristic is designed to improve execution performance for the reconfigurable architecture. Our experimental results indicate that the initiation interval of a loop in the selected benchmarks can be reduced by 12% to 25.2% compared with a related work, and there is still an 18% reduction when compared with the related work equipped with more resources.
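As background for the initiation interval (II) figures quoted above, the toy calculation below shows the resource-constrained lower bound on II and why letting two operations guarded by disjoint predicates share a processing element can lower it. The CGRA size and operation counts are invented for illustration; they are not taken from the thesis.

    from math import ceil

    def res_mii(num_ops, num_pes):
        # Resource-constrained minimum initiation interval: each PE can issue
        # at most one operation per cycle of the steady-state kernel.
        return ceil(num_ops / num_pes)

    pes = 16                      # assumed 4x4 CGRA
    ops = 18                      # assumed operations in the loop body
    print(res_mii(ops, pes))      # -> 2

    # If 8 of those operations form 4 if/else pairs that never execute in the
    # same iteration, a predicate-aware scheduler can let each pair share one
    # PE slot, leaving 18 - 4 = 14 slots to fill.
    print(res_mii(ops - 4, pes))  # -> 1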
25

"Register File Organization for Coarse-Grained Reconfigurable Architectures: Compiler-Microarchitecture Perspective". Master's thesis, 2014. http://hdl.handle.net/2286/R.I.25844.

Abstract
Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising fabric for improving the performance and power efficiency of computing devices. CGRAs are composed of components that are well optimized to execute loops, and the rotating register file is an example of such a component present in CGRAs. Due to the rotating nature of register indexes in a rotating register file, it is very challenging, if at all possible, to hold and properly index memory addresses (pointers) and static values. In this thesis, different structures for CGRA register files are investigated. Those structures are experimentally compared in terms of performance of the mapped applications, design frequency, and area. It is shown that a register file that can logically be partitioned into rotating and non-rotating regions is an excellent choice, because it imposes the minimum restriction on the underlying CGRA mapping algorithm while resulting in efficient resource utilization.
Dissertation/Thesis
Masters Thesis Computer Science 2014
26

Obeid, Abdulfattah Mohammad. "Architectural Synthesis of a Coarse-Grained Run-Time-Reconfigurable Accelerator for DSP Applications". Phd thesis, 2006. https://tuprints.ulb.tu-darmstadt.de/668/1/ObeidDissG_Part1v2.pdf.

Abstract
Given all its merits and potential, Reconfigurable Computing has attracted a great deal of research work. Reconfiguration costs, as well as new challenges specific to Reconfigurable Computing, have so far been the main obstacles to reaching optimal reconfigurable computing solutions. Because of the flexibility offered by Reconfigurable Computing, many new design parameters that were previously unknown now exist. Dynamic reconfiguration, partial reconfiguration, context management and HW/SW issues are among these. Depending on the target set of applications, different design decisions can be made in order to optimize the reconfigurable solution according to the target application constraints. In this thesis the HPad, an efficient coarse-grained dynamically reconfigurable solution targeted at DSP computation, is proposed. The HPad architecture was greatly influenced by reported VLSI architectures of a variety of DSP algorithms. Based on observations of the characteristics of these DSP algorithms and their architectures, the HPad was chosen to be a heterogeneous and dynamically reconfigurable coarse-grained solution. The HPad features partial, dynamic, and background reconfiguration capabilities. In addition, the HPad data path architecture is tailored to efficiently realize the studied DSP applications. Through the use of local reconfiguration interface sockets around each processing element, the dynamic reconfiguration problem is partitioned and efficiently solved. The HPad was modeled and synthesized with parameterizable VHDL code written at the RTL level. Parameterizing the code was beneficial since it permitted the generation of new designs simply by changing a few constants and recompiling; the model consisted of several thousand lines of code. Mapping and routing of several pipelined architectures of DSP algorithms were examined to demonstrate the suitability and validity of the HPad for the proposed scope of
27

Obeid, Abdulfattah Mohammad [Verfasser]. "Architectural synthesis of a coarse-grained run-time-reconfigurable accelerator for DSP applications / Abdulfattah Mohammad Obeid". 2006. http://d-nb.info/979006651/34.
