Academic literature on the topic 'Very Long Instruction Word (VLIW) Processors'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Very Long Instruction Word (VLIW) Processors.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Very Long Instruction Word (VLIW) Processors"

1

Mego, Roman, and Tomas Fryza. "Instruction mapping techniques for processors with very long instruction word architectures." Journal of Electrical Engineering 73, no. 6 (December 1, 2022): 387–95. http://dx.doi.org/10.2478/jee-2022-0053.

Abstract:
This paper presents an instruction mapping technique for generating low-level assembly code for digital signal processing algorithms. The technique helps developers implement retargetable kernel functions with the performance benefits of low-level assembly languages. The approach is aimed at very long instruction word (VLIW) architectures, which benefit the most from the proposed method. Mapped algorithms are described by signal-flow graphs, which are used to find possible parallel operations. The algorithm is then converted into low-level code and mapped to the target architecture. The process also introduces an optimization of instruction mapping priority, which leads to more efficient code. The technique was verified on selected kernels, compared with common programming methods, and shown to be suitable for VLIW architectures and portable to other systems.
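The bundle-packing step at the heart of such mapping techniques can be illustrated with a small sketch. The following C program (a hypothetical illustration, not code from the paper) greedily packs the operations of a toy dependency graph into fixed-width VLIW bundles, issuing an operation only once all of its producers sit in earlier bundles; the example graph, ISSUE_WIDTH and the operation numbering are invented for the illustration.

```c
/*
 * Illustrative sketch: greedy packing of a small data-flow graph into VLIW
 * bundles.  An operation may start only after all of its predecessors have
 * been placed in an earlier bundle; up to ISSUE_WIDTH independent operations
 * share one bundle.
 */
#include <stdio.h>

#define N_OPS       6
#define ISSUE_WIDTH 4   /* slots per very long instruction word */

/* dep[i][j] != 0 means operation j depends on the result of operation i.
 * Small signal-flow-graph-like example: ops 0,1,2 feed 3,4; ops 3,4 feed 5. */
static const int dep[N_OPS][N_OPS] = {
    [0][3] = 1, [1][3] = 1, [1][4] = 1, [2][4] = 1, [3][5] = 1, [4][5] = 1,
};

int main(void)
{
    int bundle_of[N_OPS];
    for (int i = 0; i < N_OPS; i++) bundle_of[i] = -1;

    int scheduled = 0, bundle = 0;
    while (scheduled < N_OPS) {
        int slots = 0;
        for (int op = 0; op < N_OPS && slots < ISSUE_WIDTH; op++) {
            if (bundle_of[op] != -1) continue;
            int ready = 1;
            for (int p = 0; p < N_OPS; p++)
                if (dep[p][op] && (bundle_of[p] == -1 || bundle_of[p] == bundle))
                    ready = 0;   /* producer not yet placed in an earlier bundle */
            if (ready) {
                bundle_of[op] = bundle;
                printf("bundle %d, slot %d: op%d\n", bundle, slots, op);
                slots++;
                scheduled++;
            }
        }
        bundle++;
    }
    return 0;
}
```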
2

Chen, Yung-Yuan. "Incorporating Fault-Tolerant Features in VLIW Processors." International Journal of Reliability, Quality and Safety Engineering 12, no. 05 (October 2005): 397–411. http://dx.doi.org/10.1142/s0218539305001914.

Abstract:
In recent years, very long instruction word (VLIW) processors have attracted much attention because they offer high instruction-level parallelism and reduce hardware design complexity. In this paper, we present two fault-tolerant schemes for VLIW processors. The first is termed the test-instruction scheme and is based on the concept of instruction duplication to detect errors; it consists of error detection, error rollback recovery and reconfiguration. The second approach is called the self-checking scheme and adopts the concept of self-checking logic to detect errors. A real-time error recovery procedure is developed to recover from the errors. We implement the proposed fault-tolerant VLIW processor designs in VHDL and employ fault injection and fault simulation to validate our schemes. The main contribution of this research is to present complete frameworks, from error detection to error recovery, for the fault-tolerant design of VLIW processors. A lesson learned from this investigation is that error detection and error recovery must be considered together; without taking both issues into account simultaneously, the outcomes may lead to improper conclusions.
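The instruction-duplication idea behind such test-instruction schemes can be sketched in a few lines. The C program below is a hypothetical illustration, not the paper's VHDL design: each operation is issued twice (ideally in otherwise idle VLIW slots), the two results are compared, and a mismatch triggers rollback and re-execution. The add function, the injected bit flip and the checkpoint variable are invented for the example.

```c
/* Sketch of duplicate-and-compare error detection with rollback recovery. */
#include <stdio.h>

static int checkpoint;                 /* last known-good (committed) result */

/* the "functional unit": an add with an optional injected bit flip */
static int add(int a, int b, int inject_fault)
{
    int r = a + b;
    return inject_fault ? r ^ 0x4 : r; /* flip one result bit to model a fault */
}

static int protected_add(int a, int b, int inject_fault)
{
    for (;;) {
        int primary   = add(a, b, inject_fault); /* original instruction        */
        int duplicate = add(a, b, 0);            /* duplicated test instruction */
        if (primary == duplicate)
            return checkpoint = primary;         /* commit and update checkpoint */
        printf("mismatch detected, rolling back to %d and re-executing\n",
               checkpoint);
        inject_fault = 0;                        /* transient fault: retry cleanly */
    }
}

int main(void)
{
    checkpoint = 0;
    printf("result = %d\n", protected_add(20, 22, 1)); /* fault on first attempt */
    return 0;
}
```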
3

Zhang, Haifeng, Xiaoti Wu, Yuyu Du, Hongqing Guo, Chuxi Li, Yidong Yuan, Meng Zhang, and Shengbing Zhang. "A Heterogeneous RISC-V Processor for Efficient DNN Application in Smart Sensing System." Sensors 21, no. 19 (September 28, 2021): 6491. http://dx.doi.org/10.3390/s21196491.

Abstract:
Extracting features from sensing data on edge devices is a challenging application for which deep neural networks (DNN) have shown promising results. Unfortunately, the general microcontroller-class processors widely used in sensing systems fail to achieve real-time inference. Accelerating the compute-intensive DNN inference is, therefore, of utmost importance. Given the physical limitations of sensing devices, the processor design needs to meet balanced performance metrics, including low power consumption, low latency, and flexible configuration. In this paper, we propose a lightweight pipeline-integrated deep learning architecture that is compatible with the open-source RISC-V instruction set. The dataflow of the DNN is organized by a very long instruction word (VLIW) pipeline, combined with the proposed special intelligence-enhanced instructions and a single instruction multiple data (SIMD) parallel processing unit. Experimental results show that the total power consumption is about 411 mW and the power efficiency is about 320.7 GOPS/W.
4

Li, Hong Yan. "Research on Cipher Coprocessor Instruction Level Parallelism Compiler." Applied Mechanics and Materials 130-134 (October 2011): 2907–10. http://dx.doi.org/10.4028/www.scientific.net/amm.130-134.2907.

Abstract:
An important approach to studying cipher coprocessors is to focus on the processor's system architecture in combination with reconfigurable design techniques, and improving the performance of the cipher coprocessor is essential. Based on a very long instruction word (VLIW) structure and reconfigurable design techniques, a specific-instruction cipher coprocessor is designed. In this paper, an instruction-level parallelism compilation technique for the cipher coprocessor is studied to enhance its performance by increasing the instruction-level parallelism.
5

Romanova, Tatiana Nikolaevna, and Dmitry Igorevich Gorin. "Code Optimization Method for Qualcomm Hexagon Processor, Supporting Instruction Level Parallelism and Built with VLIW (Very Long Instruction Word) Architecture." ITNOU: Information Technologies in Science, Education and Management 115 (2021): 105–15. http://dx.doi.org/10.47501/itnou.2021.1.105-115.

Abstract:
A method is proposed for optimizing the filling of a machine word with independent instructions, which increases program performance by packing the maximum number of independent instructions into a packet. The paper also confirms the hypothesis that, with the transition to random register allocation by the compiler, packet density increases, which results in a decrease in the program's running time.
6

Catania, Vincenzo, Maurizio Palesi, and Davide Patti. "Analysis and Tools for the Design of VLIW Embedded Systems in a Multi-Objective Scenario." Journal of Circuits, Systems and Computers 16, no. 05 (October 2007): 819–46. http://dx.doi.org/10.1142/s0218126607003915.

Abstract:
The use of Application-Specific Instruction-set Processors (ASIP) in embedded systems is a solution to the problem of increasing complexity in the functions these systems have to implement. Architectures based on Very Long Instruction Word (VLIW) have found fertile ground in multimedia electronic appliances thanks to their ability to exploit high degrees of Instruction Level Parallelism (ILP) with a reasonable trade-off in complexity and silicon costs. In this case the ASIP specialization involves a complex interaction between hardware- and software-related issues. In this paper we propose tools and methodologies to cope efficiently with this complexity from a multi-objective perspective. We present EPIC-Explorer, an open platform for estimation and system-level exploration of an EPIC/VLIW architecture. We first analyze the possible design objectives, showing that it is necessary, given the fundamental role played by the VLIW compiler in instruction scheduling, to evaluate the appropriateness of ILP-oriented compilation on a case-by-case basis. Then, in the architecture exploration phase, we will use a multi-objective genetic approach to obtain a set of Pareto-optimal configurations. Finally, by clustering the configurations thus obtained, we extract those representing possible trade-offs between the objectives, which are used as a starting point for evaluation via more accurate estimation models at a subsequent stage in the design flow.
7

Hou, Yumin, Xu Wang, Jiawei Fu, Junping Ma, Hu He, and Xu Yang. "Improving ILP via Fused In-Order Superscalar and VLIW Instruction Dispatch Methods." Journal of Circuits, Systems and Computers 28, no. 02 (November 12, 2018): 1950020. http://dx.doi.org/10.1142/s0218126619500208.

Abstract:
In order to expand the digital signal processing capability of a General Purpose Processor (GPP), we propose a fused microarchitecture that improves Instruction Level Parallelism (ILP) by supporting both in-order superscalar and very long instruction word (VLIW) dispatch methods in a single pipeline. The design is based on the ARMv7-A&R Instruction Set Architecture (ISA). To provide a performance comparison, we first design an in-order superscalar processor, considering that ARM GPPs always adopt superscalar approaches, and then extend this processor with the VLIW dispatch method to realize the fused microarchitecture. The two designs are both evaluated on a Xilinx 7-series FPGA (XC7K325T-2FFG900C) using the Xilinx Vivado design suite. The results show that, compared with the superscalar processor, the processor working in VLIW mode improves performance by 15% and 8%, respectively, when running the EEMBC and DSPstone benchmarks. We also run the two benchmarks on an ARM Cortex-A9 processor, which is integrated in the Zynq-7000 AP SoC device on the Xilinx ZC706 evaluation board. The processor in VLIW mode shows 44% and 30% performance improvements over the ARM Cortex-A9. The fused microarchitecture adopts a combined bimodal and PAp branch prediction method, which achieves 93.7% prediction accuracy with limited hardware overhead.
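The difference between the two dispatch modes comes down to who performs the independence check. The sketch below is hypothetical (not the paper's RTL) and shows the kind of test an in-order superscalar front end must apply before issuing two consecutive instructions in the same cycle; in VLIW mode this check disappears because the compiler has already placed only independent operations in a bundle. The insn_t fields and the can_dual_issue helper are invented for the illustration, and write-after-write checks are omitted for brevity.

```c
/* Sketch of an in-order dual-issue legality check (RAW and structural hazards). */
#include <stdbool.h>
#include <stdio.h>

typedef enum { FU_ALU, FU_MUL, FU_MEM } fu_t;

typedef struct {
    int  dst, src1, src2;   /* register numbers */
    fu_t unit;              /* functional unit required */
} insn_t;

static bool can_dual_issue(const insn_t *a, const insn_t *b, int n_alus)
{
    if (b->src1 == a->dst || b->src2 == a->dst)
        return false;                       /* RAW hazard: b reads what a writes  */
    if (a->unit == b->unit && !(a->unit == FU_ALU && n_alus >= 2))
        return false;                       /* structural hazard on the same unit */
    return true;
}

int main(void)
{
    insn_t i0 = { .dst = 3, .src1 = 1, .src2 = 2, .unit = FU_ALU };
    insn_t i1 = { .dst = 5, .src1 = 3, .src2 = 4, .unit = FU_ALU };
    insn_t i2 = { .dst = 6, .src1 = 1, .src2 = 4, .unit = FU_MUL };

    printf("i0+i1: %s\n", can_dual_issue(&i0, &i1, 2) ? "dual-issue" : "split");
    printf("i0+i2: %s\n", can_dual_issue(&i0, &i2, 2) ? "dual-issue" : "split");
    return 0;
}
```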
8

Srinivasan, V. Prasanna, and A. P. Shanthi. "A BBN-Based Framework for Design Space Pruning of Application Specific Instruction Processors." Journal of Circuits, Systems and Computers 25, no. 04 (February 2, 2016): 1650028. http://dx.doi.org/10.1142/s0218126616500286.

Abstract:
During the synthesis phase of the embedded system design process, the designer has to make early decisions when selecting the optimal system components, such as processors, memories, and communication interfaces, from a huge number of design alternatives. In order to obtain the optimal design configurations from these alternatives, an efficient design space pruning technique that eases the design space exploration (DSE) process is required. Knowledge about the target architectural parameters affecting the overall objectives of the system should be considered during the design, so that the search for optimal system configurations is rapid and more efficient. The Bayesian belief network (BBN)-based modeling framework for design space pruning proposed in this paper attempts to resolve the existing limitation in imparting domain knowledge and provides a pioneering effort to support the designer during application-specific system design. The Xtensa customizable processor architecture from Tensilica and a very long instruction word (VLIW) processor architecture are considered as example target platforms for imparting domain knowledge to the proposed model. Case studies in support of the proposed model are presented to show how a BBN can be used for design space pruning by propagating evidence and arriving at probabilistic inferences that ease the decision-making process. The results show that the design space is reduced drastically, from a few million design options to fewer than one hundred for the Xtensa architecture and from a few billion design options to just a few thousand for the VLIW architecture. The work also validates the pruned design points for their optimality.
9

Bekayev, E., and A. Kaharman. "Features of analysis and selection of microprocessors in modern control systems." Q A Iasaýı atyndaǵy Halyqaralyq qazaq-túrіk ýnıversıtetіnіń habarlary (fızıka matematıka ınformatıka serııasy) 24, no. 1 (March 30, 2023): 139–53. http://dx.doi.org/10.47526/2023-1/2524-0080.13.

Abstract:
The article discusses the results of a brief retrospective review of current microprocessors in control systems and the main directions of their development. It describes the main requirements and selection criteria applied at the microprocessor design stage in modern control systems, as well as the classification and architectural features of microprocessors. In addition, the 'pipeline' processing method for maximizing processor performance is studied, together with the features of analysing and selecting microprocessors in modern control systems that use various architectural solutions to eliminate the conflicts arising in the pipeline, namely data, control and structural conflicts. Following the pipelining principle, the main types of processor architecture are considered, including VLIW (Very Long Instruction Word) designs and superscalar designs with multiple computing devices working in parallel. The use of these solutions in VLIW processors, and the limits of their application in scientific research, are discussed, given that source codes are distributed in a 'closed' form. The main advantage of superscalar microprocessors is the independence of executable program code from their structure and the possibility of executing it on any processor model. In addition, the features of the EPIC architecture, which combines the advantages of the two different architectural solutions considered, are described.
10

Ko, Yohan. "Survey of Software-Implemented Soft Error Protection." Electronics 11, no. 3 (February 3, 2022): 456. http://dx.doi.org/10.3390/electronics11030456.

Abstract:
As soft errors are an important design concern in embedded systems, several schemes have been presented to protect embedded systems against them. Embedded systems can be protected by hardware redundancy; however, hardware-based protection is inflexible, because any change to the protection requires modifying the hardware, and it incurs significant overheads in terms of area, performance, and power consumption. Therefore, hardware redundancy techniques are not appropriate for resource-constrained embedded systems. On the other hand, software-based protection techniques can be an attractive alternative for protecting embedded systems, especially specific-purpose architectures. This manuscript categorizes and compares software-based redundancy techniques for general-purpose and specific-purpose processors, such as VLIW (Very Long Instruction Word) and CGRA (Coarse-Grained Reconfigurable Architectures).

Dissertations / Theses on the topic "Very Long Instruction Word (VLIW) Processors"

1

Porpodas, Vasileios. "Instruction scheduling optimizations for energy efficient VLIW processors." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8291.

Abstract:
Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for their operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitioning the shared hardware resources (e.g., the register file). Such designs further increase the challenges of instruction scheduling, since the compiler has the additional tasks of deciding on the placement of instructions in the corresponding clusters and orchestrating the data movements across clusters. In this thesis we propose instruction scheduling optimizations for energy-efficient VLIW processors. Some of the techniques aim at improving the existing state-of-the-art scheduling techniques, while others aim at using compiler techniques for closing the gap between lightweight hardware designs and more complex ones. Each of the proposed techniques targets individual features of energy-efficient VLIW architectures. Our first technique, called Aligned Scheduling, makes use of a novel scheduling heuristic for hiding memory latencies in lightweight VLIW processors without hardware load-use interlocks (Stall-On-Miss). With Aligned Scheduling, a software-only technique, a SOM processor coupled with non-blocking caches can better cope with cache latencies and perform closer to the heavyweight designs. Performance is improved by up to 20% across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The rest of the techniques target a class of VLIW processors known as clustered VLIWs, which are more scalable, more energy efficient and operate at higher frequencies than their monolithic counterparts. The second scheme (LUCAS) is an improved scheduler for clustered VLIW processors that addresses the problem of existing state-of-the-art schedulers being very susceptible to the inter-cluster communication latency. The proposed unified clustering and scheduling technique is a hybrid scheme that performs instruction-by-instruction switching between the two state-of-the-art clustering heuristics, leading to better scheduling than either of them. It generates better-performing code compared to the state-of-the-art for a wide range of inter-cluster latency values on the Mediabench II benchmarks. The third technique (called CAeSaR) is a scheduler for clustered VLIW architectures that minimizes inter-cluster communication by local caching and reuse of already received data. Unlike dynamically scheduled processors, where this can be supported by the register renaming hardware, in VLIWs it has to be done by the code generator. The proposed instruction scheduler unifies cluster assignment, instruction scheduling and communication minimization in a single unified algorithm, solving the phase ordering issues between all three parts. The proposed scheduler shows an improvement in execution time of up to 20.3% and 13.8% on average across a range of benchmarks from the Mediabench II and SPEC CINT2000 benchmark suites. The last technique applies to heterogeneous clustered VLIWs that support dynamic voltage and frequency scaling (DVFS) independently per cluster. In these processors there are no hardware interlocks between clusters to honor the data dependencies. Instead, the scheduler has to be aware of the DVFS decisions to guarantee correct execution.
Effectively controlling DVFS, to selectively decrease the frequency of clusters with slack in their schedule, can lead to significant energy savings. The proposed technique (called UCIFF) solves the phase ordering problem between frequency selection and scheduling that is present in existing algorithms. The results show that UCIFF produces better code than the state-of-the-art and very close to the optimal across the Mediabench II benchmarks. Overall, the proposed instruction scheduling techniques lead to either better efficiency on existing designs or allow simpler lightweight designs to be competitive against ones with more complex hardware.
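A flavour of the cluster-assignment decisions such schedulers must make is given by the sketch below (a hypothetical heuristic, not the thesis's LUCAS or CAeSaR algorithms): each operation is placed on the cluster that already holds most of its operands, ties are broken towards the least-loaded cluster, and every operand living on another cluster is counted as one inter-cluster copy. The producer table, cluster count and operation numbering (producers precede their consumers) are invented for the example.

```c
/* Sketch of an operand-affinity cluster-assignment heuristic for a 2-cluster VLIW. */
#include <stdio.h>

#define N_CLUSTERS 2
#define N_OPS      5

/* producer[i][k] = index of the op producing operand k of op i, or -1 if none */
static const int producer[N_OPS][2] = {
    { -1, -1 }, { -1, -1 }, { -1, -1 }, { 0, 1 }, { 3, 2 },
};

int main(void)
{
    int cluster_of[N_OPS], load[N_CLUSTERS] = { 0 }, copies = 0;

    for (int op = 0; op < N_OPS; op++) {
        int votes[N_CLUSTERS] = { 0 };
        for (int k = 0; k < 2; k++)
            if (producer[op][k] >= 0)
                votes[cluster_of[producer[op][k]]]++;   /* operand affinity */

        int best = 0;
        for (int c = 1; c < N_CLUSTERS; c++)
            if (votes[c] > votes[best] ||
                (votes[c] == votes[best] && load[c] < load[best]))
                best = c;                               /* tie-break on load */

        cluster_of[op] = best;
        load[best]++;
        for (int k = 0; k < 2; k++)                     /* count inter-cluster moves */
            if (producer[op][k] >= 0 && cluster_of[producer[op][k]] != best)
                copies++;
        printf("op%d -> cluster %d\n", op, best);
    }
    printf("inter-cluster copies needed: %d\n", copies);
    return 0;
}
```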
2

Psiakis, Rafail. "Performance optimization mechanisms for fault-resilient VLIW processors." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S095/document.

Abstract:
Embedded processors in critical domains require a combination of reliability, performance and low energy consumption. Very Long Instruction Word (VLIW) processors provide performance improvements through Instruction Level Parallelism (ILP) exploitation, while keeping cost and power at low levels. Since the ILP is highly application dependent, the processor does not use all its resources constantly and, thus, these resources can be utilized for redundant instruction execution. This thesis presents a fault injection methodology for VLIW processors and three hardware mechanisms to deal with soft, permanent and long-term faults, leading to four contributions. The first contribution presents an Architectural Vulnerability Factor (AVF) and Instruction Vulnerability Factor (IVF) analysis schema for VLIW processors. A fault injection methodology at different memory structures is proposed to extract the architectural/instruction masking capabilities of the processor, and a high-level failure classification schema is presented to categorize the output of the processor. The second contribution explores heterogeneous idle resources at run-time, both inside and across consecutive instruction bundles. To achieve this, a hardware-optimized instruction scheduling technique is applied in parallel with the pipeline to efficiently control the replication and the scheduling of the instructions. Following the trend of increasing parallelization, a cluster-based design is also proposed to tackle the issues of scalability, while maintaining a reasonable area/power overhead. The proposed technique achieves a speed-up of 43.68% in performance with a ~10% area and power overhead over existing approaches. AVF and IVF analyses evaluate the vulnerability of the processor with the proposed mechanism. The third contribution deals with persistent faults. A hardware mechanism is proposed which replicates instructions at run-time and schedules them in the idle slots considering the resource constraints. If a resource becomes faulty, the proposed approach efficiently rebinds both the original and replicated instructions during execution. Early performance evaluation results show up to a 49% performance gain over existing techniques. In order to further decrease the performance overhead and to support single and multiple Long-Duration Transient (LDT) error mitigation, a fourth contribution is presented. We propose a hardware mechanism which detects the faults that are still active during execution and re-schedules the instructions to use not only the healthy functional units, but also the fault-free components of the affected functional units. When the fault disappears, the affected functional unit components can be reused. The scheduling window of the proposed mechanism is two instruction bundles, making it possible to explore mitigation solutions in the current and the next instruction execution. The obtained fault injection results show that the proposed approach can mitigate a large number of faults with low performance, area, and power overhead.
3

Tergino, Christian Sean. "Efficient Binary Field Multiplication on a VLIW DSP." Thesis, Virginia Tech, 2009. http://hdl.handle.net/10919/33693.

Abstract:
Modern public-key cryptography relies extensively on modular multiplication with long operands. We investigate the opportunities to optimize this operation on a heterogeneous multiprocessing platform such as the TI OMAP3530. By migrating the long-operand modular multiplication from a general-purpose ARM Cortex-A8 to the specialized C64x+ VLIW DSP, we are able to exploit the XOR-multiply instruction and the inherent parallelism of the DSP. The proposed multiplication utilizes Multi-Precision Binary Polynomial Multiplication with Unbalanced Exponent Modular Reduction. The resulting DSP implementation performs a GF(2^233) multiplication in less than 1.31 µs, which is a more than sevenfold speedup compared with the ARM implementation on the same chip. We present several strategies for different field sizes and field polynomials, and show that a 360 MHz DSP easily outperforms the 500 MHz ARM.
Master of Science
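The core primitive behind such binary-field multipliers is carry-less (GF(2) polynomial) multiplication, which XOR-multiply style DSP instructions accelerate in hardware. The portable C sketch below is only an illustration of that primitive, not the thesis's DSP intrinsics: it multiplies two 32-bit binary polynomials into a 64-bit product; a full GF(2^233) multiplication would combine several such partial products and then reduce the result modulo the field polynomial.

```c
/* Sketch of carry-less multiplication of two binary polynomials in GF(2)[x]. */
#include <stdint.h>
#include <stdio.h>

static uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            acc ^= (uint64_t)a << i;   /* XOR replaces addition in GF(2)[x] */
    return acc;
}

int main(void)
{
    /* (x^2 + 1) * (x + 1) = x^3 + x^2 + x + 1, i.e. 0x5 clmul 0x3 = 0xF */
    printf("0x%llx\n", (unsigned long long)clmul32(0x5u, 0x3u));
    return 0;
}
```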
4

De Souza, Alberto Ferreira. "Integer performance evaluation of the dynamically trace scheduled VLIW." Thesis, University College London (University of London), 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.322044.

5

Stevens, David. "On the automated compilation of UML notation to a VLIW chip multiprocessor." Thesis, Loughborough University, 2013. https://dspace.lboro.ac.uk/2134/13746.

Abstract:
With the availability of more and more cores within architectures, the process of extracting implicit and explicit parallelism in applications to fully utilise these cores is becoming complex. Implicit parallelism extraction is performed through intelligent software and hardware stages of the tool chain, although these reach their theoretical limit rather quickly. Because of this, a method of allowing explicit parallelism to be exploited as fast as possible has been investigated. This method enables application developers to create and synchronise parallel sections of an application at a finer-grained level than previously possible, resulting in smaller sections of code being executed in parallel while still reducing overall execution time. Alongside explicit parallelism, the concept of high-level design of applications destined for multicore systems was also investigated. As systems grow larger, it is becoming more difficult to design and track the full development life-cycle. One method used to ease this process is a graphical design flow that visualises the high-level designs of such systems. One drawback of graphical design is the explicit manner in which systems must be generated; this was investigated and, using concepts already in use in text-based programming languages, the generation of platform-independent models that can be specialised to multiple hardware architectures was developed. The explicit parallelism was implemented using hardware elements to perform thread management, resulting in speed-ups of over 13 times compared with threading libraries executed in software on commercially available processors. This allowed applications with large data-dependent sections to be parallelised in small sections within the code, resulting in a decrease in overall execution time. The modelling concepts resulted in a saving of 40-50% of the time and effort required to generate platform-specific models, while incurring an overhead of up to 15% in execution cycles compared with models designed for specific architectures.
6

Valiukas, Tadas. "Kompiliatorių optimizavimas IA-64 architektūroje." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2009~D_20140701_180746-19336.

Abstract:
After performance optimization of traditional architectures began to reach its limits, Intel started to develop a new architecture, IA-64, based on EPIC – Explicitly Parallel Instruction Computing. Its main feature allows up to six instructions to be executed in a single CPU cycle. The architecture also includes further features that allow efficient solutions to the code optimization problems of traditional architectures. However, for a long time code optimization algorithms have been improved for traditional architectures only, so those algorithms should be adapted to the new architecture. One way to do that is to explore the compiler's internal parameters that are responsible for code optimizations. That is the primary target of this work, and in order to reach it the features of the IA-64 architecture and their impact on execution performance must be explored using real-life code examples. The test results may later be used for internal parameter selection and for further exploration of these parameter values using special compiler performance testing benchmarks. The resulting set of values could then be tested with real-life applications in order to demonstrate the efficiency of the IA-64 architecture's features.
7

Nagpal, Rahul. "Compiler-Assisted Energy Optimization For Clustered VLIW Processors." Thesis, 2008. http://hdl.handle.net/2005/684.

Abstract:
Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making the design simpler, it introduces extra overheads by way of inter-cluster communication. This communication happens over long wires with high load capacitance, which leads to delays in execution and significantly higher energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and the associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse and limits the usability of clustered architectures at smaller technology nodes. In the past, study of leakage energy management at the architectural level has mostly focused on storage structures such as caches. Relatively little work has been done on architecture-level leakage energy management in functional units in the context of superscalar processors, or on energy-efficient scheduling in the context of VLIW architectures. In the absence of any high-level model for interconnect energy estimation, the primary focus of research on interconnects has been on reducing the latency of communication and evaluating various inter-cluster communication models. To the best of our knowledge, there has been no prior work on energy efficiency targeting clustered VLIW architectures and specifically focusing on smaller technologies. Technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this thesis we propose scheduling algorithms that aggregate the scheduling slack of instructions and the communication slack of data values to exploit the low-power modes of interconnects and functional units. We also propose a high-level model for estimation of interconnect delay and energy (in contrast to the low-level circuit models proposed earlier) that makes it possible to carry out architectural and compiler optimizations specifically targeting the interconnect. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects, improving the usability of clustered architectures by achieving better overall energy-performance trade-offs. Our compiler-assisted leakage energy management scheme for functional units reduces their energy consumption by approximately 15% and 17% in the context of a 2-clustered and a 4-clustered VLIW architecture, respectively, with negligible performance degradation over and above that offered by a hardware-only scheme. The interconnect energy optimization scheme improves the energy consumption of interconnects on average by 41% and 46% for a 2-clustered and a 4-clustered machine, respectively, with 2% and 1.5% performance degradation. The combined scheme obtains a slightly better energy benefit in functional units and 37% and 43% energy benefits in the interconnect, with slightly higher performance degradation. Even with conservative estimates of the contribution of functional units and interconnect to overall processor energy consumption, the proposed combined scheme obtains on average an 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes using the Trimaran compiler infrastructure.
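The slack-driven leakage-management idea can be sketched as follows (hypothetical numbers, not the thesis's Trimaran-based implementation): given the cycles in which a functional unit is busy in the compiled schedule, the unit is put into a low-leakage sleep mode only for idle windows at least as long as an assumed break-even length that amortises the cost of switching modes.

```c
/* Sketch: decide where a functional unit can sleep based on schedule idle gaps. */
#include <stdio.h>

#define BREAK_EVEN_CYCLES 4   /* assumed cost of a sleep/wake transition */

int main(void)
{
    /* cycles in which one functional-unit slot is occupied, per the schedule */
    const int busy[] = { 0, 1, 2, 9, 10, 17 };
    const int n = sizeof busy / sizeof busy[0];

    for (int i = 0; i + 1 < n; i++) {
        int idle = busy[i + 1] - busy[i] - 1;
        if (idle >= BREAK_EVEN_CYCLES)
            printf("sleep after cycle %d, wake before cycle %d (%d idle cycles)\n",
                   busy[i], busy[i + 1], idle);
    }
    return 0;
}
```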
8

Valluri, Madhavi Gopal. "Evaluation Of Register Allocation And Instruction Scheduling Methods In Multiple Issue Processors." Thesis, 1999. http://etd.iisc.ernet.in/handle/2005/1532.

9

Costa, Henrique Miguel Basto da. "Desenvolvimento de um processador VLIW." Master's thesis, 2013. http://hdl.handle.net/1822/40055.

Abstract:
Integrated master's dissertation in Engenharia Eletrónica Industrial e Computadores (Industrial Electronics and Computers Engineering)
A very long instruction word (VLIW) architecture is an implementation of the performance-enhancing technique of instruction-level parallelism (ILP), and stands out from the others by exploiting this parallelism through the use of multiple functional units operating in parallel. In a VLIW, both the detection of parallelism and the resolution of conflicts between instructions are done at compile time, significantly reducing the complexity of the hardware, which results in a lower implementation cost and lower power consumption. There are, however, some obstacles to the adoption of this architecture, such as binary compatibility with legacy software. This thesis aims to develop a VLIW processor because, thanks to their high throughput and low power, VLIW processors fit the requirements of embedded systems. The implemented processor should use a cache as its means of access to main memory. An assembler dedicated to the implemented processor is also developed in order to generate compatible machine code and to allow future changes in the microarchitecture to be accompanied by changes in machine code generation. A study of existing VLIW instruction set architectures (ISAs) and microarchitectures was conducted in order to implement a state-of-the-art soft-core VLIW processor on a Xilinx FPGA platform.
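The final step of such a dedicated assembler, packing operations into a very long instruction word, can be sketched as follows. The bundle format below is hypothetical, not the ISA developed in the thesis: four 32-bit syllables form one bundle, a stop bit in the last used syllable marks the end of the parallel group so that shorter bundles remain possible, and unused slots are padded with NOPs.

```c
/* Sketch of packing operation encodings into a fixed-width VLIW bundle. */
#include <stdint.h>
#include <stdio.h>

#define BUNDLE_SLOTS 4
#define STOP_BIT     (1u << 31)

static void emit_bundle(const uint32_t *ops, int n_ops, uint32_t *word)
{
    for (int slot = 0; slot < BUNDLE_SLOTS; slot++) {
        uint32_t syllable = (slot < n_ops) ? ops[slot] : 0; /* pad with NOPs */
        if (slot == n_ops - 1)
            syllable |= STOP_BIT;       /* mark the end of the parallel group */
        word[slot] = syllable;
    }
}

int main(void)
{
    const uint32_t ops[] = { 0x00211820u, 0x00852020u };   /* example encodings */
    uint32_t word[BUNDLE_SLOTS];

    emit_bundle(ops, 2, word);
    for (int slot = 0; slot < BUNDLE_SLOTS; slot++)
        printf("slot %d: 0x%08x\n", slot, word[slot]);
    return 0;
}
```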

Books on the topic "Very Long Instruction Word (VLIW) Processors"

1

Microprogramming a Writeable Control Memory using Very Long Instruction Word (VLIW) Compilation Techniques. Storming Media, 1997.


Book chapters on the topic "Very Long Instruction Word (VLIW) Processors"

1

Rajagopalan, Subramanian, and Sharad Malik. "A Retargetable Very Long Instruction Word Compiler Framework for Digital Signal Processors." In The Compiler Design Handbook, 18–1. CRC Press, 2007. http://dx.doi.org/10.1201/9781420043839.ch18.

2

Rajagopalan, Subramanian, and Sharad Malik. "Retargetable Very Long Instruction Word Compiler Framework for Digital Signal Processors." In The Compiler Design Handbook, 615–42. CRC Press, 2002. http://dx.doi.org/10.1201/9781420040579-20.

3

Yviquel, Hervé, Emmanuel Casseau, Matthieu Wipliez, Jérôme Gorin, and Mickaël Raulet. "Classification-Based Optimization of Dynamic Dataflow Programs." In Advances in Systems Analysis, Software Engineering, and High Performance Computing, 282–301. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-6034-2.ch012.

Abstract:
This chapter reviews dataflow programming as a whole and presents a classification-based methodology to bridge the gap between predictable and dynamic dataflow modeling, in order to achieve expressiveness of the programming language as well as efficiency of the implementation. The authors conduct experiments across three MPEG video decoders, including one based on the new High Efficiency Video Coding standard. These dataflow-based video decoders are executed on two different platforms: a desktop processor and an embedded platform composed of interconnected, tiny Very Long Instruction Word-style processors. The authors show that the fully automated transformations presented can result in an 80% gain in speed compared to runtime scheduling in the more favorable case.

Conference papers on the topic "Very Long Instruction Word (VLIW) Processors"

1

Fryza, Tomas, and Roman Mego. "Instruction-level programming approach for very long instruction word digital signal processors." In 2017 24th IEEE International Conference on Electronics, Circuits and Systems (ICECS). IEEE, 2017. http://dx.doi.org/10.1109/icecs.2017.8292060.
