Journal articles on the topic 'Very Long Instruction Word (VLIW) Processors'

Consult the top 25 journal articles for your research on the topic 'Very Long Instruction Word (VLIW) Processors.'

1

Mego, Roman, and Tomas Fryza. "Instruction mapping techniques for processors with very long instruction word architectures." Journal of Electrical Engineering 73, no. 6 (December 1, 2022): 387–95. http://dx.doi.org/10.2478/jee-2022-0053.

Abstract:
This paper presents an instruction mapping technique for generating low-level assembly code for digital signal processing algorithms. The technique helps developers implement retargetable kernel functions with the performance benefits of low-level assembly languages. The approach is aimed at very long instruction word (VLIW) architectures, which benefit the most from the proposed method. Mapped algorithms are described by signal-flow graphs, which are used to find possible parallel operations. The algorithm is converted into low-level code and mapped to the target architecture. This process also introduces an optimization of instruction mapping priority, which leads to more efficient code. The technique was verified on selected kernels and compared with common programming methods, showing that it is suitable for VLIW architectures and portable to other systems.
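The abstract describes finding parallel operations in a signal-flow graph before mapping them onto the target. As a rough illustration only (not the authors' tool), the Python sketch below greedily groups the operations of a small dependence graph into VLIW bundles. It assumes a one-cycle latency for every operation, a uniform issue width, and no functional-unit restrictions; the operation names are hypothetical.

```python
def schedule_bundles(ops, deps, width):
    """Greedily pack a dependence graph into VLIW bundles (list scheduling).

    ops:   operation names; list order doubles as the scheduling priority
    deps:  dict mapping an op to the ops whose results it consumes
    width: issue slots per bundle (assumed uniform, any op in any slot)
    Every op is assumed to have a one-cycle latency.
    """
    pending = {op: set(deps.get(op, ())) for op in ops}
    finished = set()
    bundles = []
    while pending:
        # An op is ready once all of its producers sit in earlier bundles.
        ready = [op for op in ops if op in pending and pending[op] <= finished]
        if not ready:
            raise ValueError("dependence graph has a cycle or an unknown producer")
        bundle = ready[:width]
        for op in bundle:
            del pending[op]
        finished.update(bundle)  # results become visible to the next bundle
        bundles.append(bundle)
    return bundles


# Hypothetical 4-tap FIR step: four multiplies, then an adder tree.
ops = ["m0", "m1", "m2", "m3", "s0", "s1", "y"]
deps = {"s0": ["m0", "m1"], "s1": ["m2", "m3"], "y": ["s0", "s1"]}
print(schedule_bundles(ops, deps, width=2))
# [['m0', 'm1'], ['m2', 'm3'], ['s0', 's1'], ['y']]
```

Using list order as the priority is the simplest choice; critical-path length is a common alternative, and the paper's own mapping-priority optimization may differ from both.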
2

Chen, Yung-Yuan. "Incorporating Fault-Tolerant Features in VLIW Processors." International Journal of Reliability, Quality and Safety Engineering 12, no. 05 (October 2005): 397–411. http://dx.doi.org/10.1142/s0218539305001914.

Abstract:
In recent years, very long instruction word (VLIW) processors have attracted much attention because they offer high instruction-level parallelism and reduce hardware design complexity. In this paper, we present two fault-tolerant schemes for VLIW processors. The first, termed the test-instruction scheme, is based on the concept of instruction duplication to detect errors; it consists of error detection, error rollback recovery, and reconfiguration. The second approach, called the self-checking scheme, adopts the concept of self-checking logic to detect errors, and a real-time error recovery procedure is developed to recover from them. We implement the proposed fault-tolerant VLIW processor designs in VHDL and employ fault injection and fault simulation to validate our schemes. The main contribution of this research is a complete framework, from error detection to error recovery, for the fault-tolerant design of VLIW processors. A lesson learned from this investigation is that error detection and error recovery must be considered together; if both issues are not taken into account simultaneously, the outcomes may lead to improper conclusions.
3

Zhang, Haifeng, Xiaoti Wu, Yuyu Du, Hongqing Guo, Chuxi Li, Yidong Yuan, Meng Zhang, and Shengbing Zhang. "A Heterogeneous RISC-V Processor for Efficient DNN Application in Smart Sensing System." Sensors 21, no. 19 (September 28, 2021): 6491. http://dx.doi.org/10.3390/s21196491.

Abstract:
Extracting features from sensing data on edge devices is a challenging application for which deep neural networks (DNNs) have shown promising results. Unfortunately, the general microcontroller-class processors widely used in sensing systems fail to achieve real-time inference. Accelerating compute-intensive DNN inference is, therefore, of utmost importance. Because of the physical limitations of sensing devices, the processor design must balance performance metrics including low power consumption, low latency, and flexible configuration. In this paper, we propose a lightweight, pipeline-integrated deep learning architecture that is compatible with the open-source RISC-V instruction set. The DNN dataflow is organized by a very long instruction word (VLIW) pipeline, combined with the proposed special intelligent enhanced instructions and a single instruction multiple data (SIMD) parallel processing unit. Experimental results show a total power consumption of about 411 mW and a power efficiency of about 320.7 GOPS/W.
4

Li, Hong Yan. "Research on Cipher Coprocessor Instruction Level Parallelism Compiler." Applied Mechanics and Materials 130-134 (October 2011): 2907–10. http://dx.doi.org/10.4028/www.scientific.net/amm.130-134.2907.

Abstract:
The main approach to studying cipher coprocessors focuses on the processor's system architecture combined with reconfigurable design techniques, and improving cipher coprocessor performance is an important goal. Based on a very long instruction word (VLIW) structure and reconfigurable design techniques, a specific-instruction cipher coprocessor is designed. In this paper, an instruction-level parallelism compilation technique for the cipher coprocessor is studied, with the aim of enhancing its performance by increasing instruction-level parallelism.
5

Romanova, Tatiana Nikolaevna, and Dmitry Igorevich Gorin. "Code Optimization Method for Qualcomm Hexagon Processor, Supporting Instruction Level Parallelism and Built with VLIW (Very Long Instruction Word) Architecture." ITNOU: Information technologies in science, education and management 115 (2021): 105–15. http://dx.doi.org/10.47501/itnou.2021.1.105-115.

Abstract:
A method is proposed for optimizing how a machine word is filled with independent instructions; it increases program performance by packing the maximum number of independent instructions into each packet. The paper also confirms the hypothesis that when the compiler switches to random register allocation, packet density increases, which reduces the program's running time.
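The packing idea can be illustrated with a minimal in-order sketch. It assumes a packet width of four (Hexagon packets hold up to four instructions), models each instruction only by the registers it reads and writes, and conservatively treats RAW, WAR, and WAW overlaps as conflicts. The `Instr` type, the helper names, and the register operands are hypothetical, and unlike a real optimizer this sketch never reorders instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    writes: frozenset  # registers written
    reads: frozenset   # registers read


def conflicts(earlier, later):
    """Conservative hazard test: RAW, WAR or WAW on any shared register."""
    return bool(
        earlier.writes & later.reads      # read-after-write
        or earlier.reads & later.writes   # write-after-read
        or earlier.writes & later.writes  # write-after-write
    )


def form_packets(seq, width=4):
    """Close the current packet when it is full or the next instruction
    conflicts with one already in it; program order is preserved."""
    packets, current = [], []
    for ins in seq:
        if len(current) == width or any(conflicts(p, ins) for p in current):
            packets.append(current)
            current = []
        current.append(ins)
    if current:
        packets.append(current)
    return packets


def instr(text, writes, reads):
    return Instr(text, frozenset(writes), frozenset(reads))


seq = [
    instr("r0 = add(r1, r2)", {"r0"}, {"r1", "r2"}),
    instr("r3 = add(r4, r5)", {"r3"}, {"r4", "r5"}),
    instr("r6 = add(r0, r3)", {"r6"}, {"r0", "r3"}),  # needs r0 and r3
    instr("r7 = mpyi(r1, r4)", {"r7"}, {"r1", "r4"}),
]
print([len(p) for p in form_packets(seq)])  # [2, 2]
```

Reusing a register adds WAR or WAW conflicts that split packets, which is one plausible reading of why the choice of register allocation affects packet density.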
6

Catania, Vincenzo, Maurizio Palesi, and Davide Patti. "Analysis and Tools for the Design of VLIW Embedded Systems in a Multi-Objective Scenario." Journal of Circuits, Systems and Computers 16, no. 05 (October 2007): 819–46. http://dx.doi.org/10.1142/s0218126607003915.

Abstract:
The use of Application-Specific Instruction-set Processors (ASIPs) in embedded systems is a solution to the problem of the increasing complexity of the functions these systems have to implement. Architectures based on Very Long Instruction Word (VLIW) have found fertile ground in multimedia electronic appliances thanks to their ability to exploit high degrees of Instruction Level Parallelism (ILP) with a reasonable trade-off in complexity and silicon costs. In this case, ASIP specialization involves a complex interaction between hardware- and software-related issues. In this paper we propose tools and methodologies to cope efficiently with this complexity from a multi-objective perspective. We present EPIC-Explorer, an open platform for estimation and system-level exploration of an EPIC/VLIW architecture. We first analyze the possible design objectives, showing that, given the fundamental role played by the VLIW compiler in instruction scheduling, it is necessary to evaluate the appropriateness of ILP-oriented compilation on a case-by-case basis. Then, in the architecture exploration phase, we use a multi-objective genetic approach to obtain a set of Pareto-optimal configurations. Finally, by clustering the configurations thus obtained, we extract those representing possible trade-offs between the objectives, which are used as a starting point for evaluation via more accurate estimation models at a subsequent stage in the design flow.
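A Pareto-optimal configuration is one that no other configuration beats in every objective at once. The Python sketch below shows only that dominance filter, not EPIC-Explorer or its genetic search; it assumes every objective (for example energy and execution time) is to be minimized, and the sample values are made up.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (energy, execution time) pairs for five configurations.
configs = [(1.0, 5.0), (2.0, 3.0), (3.0, 3.0), (4.0, 1.0), (2.5, 4.0)]
print(pareto_front(configs))  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

The filter is quadratic in the number of configurations, which is acceptable for a sketch; a clustering step like the one the abstract describes would then operate on the surviving points.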
7

Hou, Yumin, Xu Wang, Jiawei Fu, Junping Ma, Hu He, and Xu Yang. "Improving ILP via Fused In-Order Superscalar and VLIW Instruction Dispatch Methods." Journal of Circuits, Systems and Computers 28, no. 02 (November 12, 2018): 1950020. http://dx.doi.org/10.1142/s0218126619500208.

Abstract:
To expand the digital signal processing capability of a General Purpose Processor (GPP), we propose a fused microarchitecture that improves Instruction Level Parallelism (ILP) by supporting both in-order superscalar and very long instruction word (VLIW) dispatch methods in a single pipeline. The design is based on the ARMv7-A&R Instruction Set Architecture (ISA). To provide a performance comparison, we first design an in-order superscalar processor, since ARM GPPs always adopt superscalar approaches. We then extend this processor with the VLIW dispatch method to realize the fused microarchitecture. Both designs are evaluated on a Xilinx 7-series FPGA (XC7K325T-2FFG900C) using the Xilinx Vivado design suite. The results show that, compared with the superscalar processor, the processor working in VLIW mode improves performance by 15% and 8% when running the EEMBC and DSPstone benchmarks, respectively. We also run the two benchmarks on the ARM Cortex-A9 processor integrated in the Zynq-7000 AP SoC device on the Xilinx ZC706 evaluation board; the processor in VLIW mode shows 44% and 30% performance improvements over the ARM Cortex-A9. The fused microarchitecture adopts a combined bimodal and PAp branch prediction method, which achieves 93.7% prediction accuracy with limited hardware overhead.
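The bimodal half of the branch predictor named above is a classic table of 2-bit saturating counters. The Python sketch below models only that bimodal part, not the PAp component or the way the paper combines the two; the table size, the initial counter value, and indexing by `pc >> 2` (fixed 4-byte ARM-mode instructions) are assumptions.

```python
class BimodalPredictor:
    """Direct-mapped table of 2-bit saturating counters.

    Counter states 0-1 predict not taken, 2-3 predict taken.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # every entry starts weakly not taken

    def _index(self, pc):
        return (pc >> 2) & self.mask  # drop the byte offset of a 4-byte instruction

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


# A loop-closing branch taken 9 times and then not taken, run 10 times.
bp = BimodalPredictor()
pc = 0x8000
correct = 0
for _ in range(10):
    for taken in [True] * 9 + [False]:
        correct += bp.predict(pc) == taken
        bp.update(pc, taken)
print(correct, "of 100 predicted correctly")  # 89 of 100 predicted correctly
```

After warm-up the counter mispredicts only the loop exit; patterned branches like this are one reason bimodal tables are often combined with a history-based component.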
8

Srinivasan, V. Prasanna, and A. P. Shanthi. "A BBN-Based Framework for Design Space Pruning of Application Specific Instruction Processors." Journal of Circuits, Systems and Computers 25, no. 04 (February 2, 2016): 1650028. http://dx.doi.org/10.1142/s0218126616500286.

Abstract:
During the synthesis phase of the embedded system design process, the designer has to make early decisions when selecting the optimal system components, such as processors, memories, and communication interfaces, from a huge number of design alternatives. To obtain the optimal design configurations among these alternatives, an efficient design space pruning technique that eases the design space exploration (DSE) process is required. Knowledge about the target architectural parameters that affect the overall objectives of the system should be considered during design, so that the search for optimal system configurations is rapid and efficient. The Bayesian belief network (BBN)-based modeling framework for design space pruning proposed in this paper attempts to resolve the existing limitation in imparting domain knowledge and provides a pioneering effort to support the designer during application-specific system design. The Xtensa customizable processor architecture from Tensilica and a very long instruction word (VLIW) processor architecture are considered as example target platforms for imparting domain knowledge to the proposed model. Case studies are presented to show how a BBN can be used for design space pruning by propagating evidence and arriving at probabilistic inferences that ease decision-making. The results show that the design space shrinks drastically, from a few million design options to fewer than one hundred for the Xtensa architecture, and from a few billion design options to just a few thousand for the VLIW architecture. The work also validates the pruned design points for their optimality.
9

Bekayev, Е., and А. Kaharman. "Features of analysis and selection of microprocessors in modern control systems." Q A Iasaýı atyndaǵy Halyqaralyq qazaq-túrіk ýnıversıtetіnіń habarlary (fızıka matematıka ınformatıka serııasy) 24, no. 1 (March 30, 2023): 139–53. http://dx.doi.org/10.47526/2023-1/2524-0080.13.

Abstract:
The article presents the results of a brief retrospective review of current microprocessors in control systems and the main directions of their development. It describes the main requirements and selection criteria applied at the microprocessor design stage in modern control systems, as well as the classification and architectural features of microprocessors. In addition, it studies the «pipeline» processing method for maximizing processor performance, and the analysis and selection of microprocessors for modern control systems that use various architectural solutions to eliminate the data, control, and structural conflicts arising in the pipeline. Based on the «pipeline» principle, three main types of processor architecture are defined: VLIW (Very Long Instruction Word), superscalar, and architectures with multiple computing devices working in parallel. The use of VLIW processors, and the limitation of their application in scientific research, is justified by the fact that source codes are distributed in a «closed» form. The main advantage of superscalar microprocessors is that program executable code is independent of the processor structure and can run on any processor model. Finally, the article describes the features of the EPIC architecture, which combines the advantages of the two architectural solutions considered.
10

Ko, Yohan. "Survey of Software-Implemented Soft Error Protection." Electronics 11, no. 3 (February 3, 2022): 456. http://dx.doi.org/10.3390/electronics11030456.

Abstract:
Because soft errors are an important design concern in embedded systems, several schemes have been presented to protect embedded systems against them. Embedded systems can be protected by hardware redundancy; however, hardware-based protection is inflexible, because changing it requires modifying the hardware. Further, it incurs significant overheads in terms of area, performance, and power consumption. Therefore, hardware redundancy techniques are not appropriate for resource-constrained embedded systems. Software-based protection techniques, on the other hand, can be an attractive alternative for protecting embedded systems, especially specific-purpose architectures. This manuscript categorizes and compares software-based redundancy techniques for general-purpose and specific-purpose processors, such as VLIW (Very Long Instruction Word) and CGRA (Coarse-Grained Reconfigurable Architectures).
11

Hou, Yumin, Hu He, Xu Yang, Deyuan Guo, Xu Wang, Jiawei Fu, and Keni Qiu. "FuMicro: A Fused Microarchitecture Design Integrating In-Order Superscalar and VLIW." VLSI Design 2016 (December 15, 2016): 1–12. http://dx.doi.org/10.1155/2016/8787919.

Abstract:
This paper proposes FuMicro, a fused microarchitecture integrating both in-order superscalar and Very Long Instruction Word (VLIW) execution in a single core. A processor with the FuMicro microarchitecture can switch between in-order superscalar and VLIW modes, using the same pipeline and the same Instruction Set Architecture (ISA). A small modification is made to the compiler to expand the register file in VLIW mode. The decision to switch modes is made by software, so no extra hardware is needed. VLIW code can be provided in the form of library functions, so that users are exposed only to superscalar mode; in this way, users are given a convenient development environment. FuMicro could serve as a universal microarchitecture because it can be applied to different ISAs. In this paper, we focus on the implementation of FuMicro with the ARM ISA. The architecture is evaluated on gem5, a cycle-accurate microarchitecture simulation platform. Compared with pure in-order superscalar mode, adopting the FuMicro microarchitecture improves performance by 10% on average, with a best-case improvement of 47.3%. The results show that the FuMicro microarchitecture can improve Instruction Level Parallelism (ILP) significantly, making it a promising way to expand the digital signal processing capability of a General Purpose Processor.
12

Schneider, M., H. Blume, and T. G. Noll. "Power estimation on functional level for programmable processors." Advances in Radio Science 2 (May 27, 2005): 215–19. http://dx.doi.org/10.5194/ars-2-215-2004.

Abstract:
In this contribution, different approaches to power estimation for programmable processors are presented and evaluated with respect to their applicability to modern digital signal processor architectures such as Very Long Instruction Word (VLIW) architectures. Special emphasis is laid on the concept of so-called Functional-Level Power Analysis (FLPA). This approach is based on dividing the processor architecture into functional blocks such as the processing unit, clock network, internal memory, and others. The power consumption of these blocks is described by parameter-dependent arithmetic model functions. By applying a parser-based automated analysis to the assembler code of the system to be estimated, the input parameters of the arithmetic functions, such as the achieved degree of parallelism or the kind and number of memory accesses, can be computed. The approach is demonstrated and evaluated using two modern digital signal processors and a variety of basic digital signal processing algorithms. The resulting estimates for the inspected algorithms are compared with physically measured values, and a maximum estimation error of 3% is achieved.
13

Li, Lin, Shengbing Zhang, and Juan Wu. "Design of Deep Learning VLIW Processor for Image Recognition." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 38, no. 1 (February 2020): 216–24. http://dx.doi.org/10.1051/jnwpu/20203810216.

Abstract:
To meet the application demands of high-resolution image recognition and efficient localization processing in the aviation and aerospace fields, and to solve the problem of insufficient parallelism in existing research, an extensible multiprocessor-cluster deep learning processor architecture based on VLIW is designed by optimizing the computation of each layer of a deep convolutional neural network model. The design adopts parallel processing of feature maps and neurons, instruction-level parallelism based on very long instruction word (VLIW), data-level parallelism across multiprocessor clusters, and pipelining. Test results on an FPGA prototype system show that the processor can effectively complete image classification and object detection applications. The peak performance of the processor reaches 128 GOP/s at 200 MHz. On the selected benchmarks, the processor is at least about 12X faster than a CPU and 7X faster than a GPU. Compared with the results of the software framework, the average test accuracy error of the processor is less than 1%.
14

Basoglu, Chris, Yongmin Kim, and Vikram Chalana. "A Real-Time Scan Conversion Algorithm on Commercially Available Microprocessors." Ultrasonic Imaging 18, no. 4 (October 1996): 241–60. http://dx.doi.org/10.1177/016173469601800402.

Abstract:
We have developed a new ultrasound scan conversion algorithm that can be executed very efficiently on modern microprocessors. Our algorithm is designed to handle the address calculations and input and output (I/O) data loading concurrently with the interpolation. The processing unit's computing power can be dedicated to performing pixel interpolations while the other operations are handled by an independent direct memory access (DMA) controller. By making intelligent use of the I/O transfer capabilities of the DMA controller, the algorithm avoids spending the processing unit's valuable computing cycles in address calculations and nonactive pixel blanking. Furthermore, the new approach speeds up the computation by utilizing the ability of superscalar and very long instruction word (VLIW) processors to perform multiple operations in parallel. Our scan conversion algorithm was implemented on a multimedia and imaging system based on the Texas Instruments TMS320C80 Multimedia Video Processor (MVP). Computing cycles are spent only on predeterminable nonzero output pixels. For example, an execution time of 11.4 ms was achieved when there are 101,829 nonzero output pixels. This algorithm demonstrates a substantial improvement over previous scan conversion algorithms, and its optimized implementation enables modern commercially available programmable processors to support scan conversion at video rates.
15

Dong, Jing Chuan, Tai Yong Wang, Bo Li, Xian Wang, and Zhe Liu. "Design and Implementation of an Interpolation Processor for CNC Machining." Advanced Materials Research 819 (September 2013): 322–27. http://dx.doi.org/10.4028/www.scientific.net/amr.819.322.

Abstract:
As the demand for high-speed and high-precision machining increases, fast and accurate real-time interpolation is necessary in modern computerized numerical control (CNC) systems. However, the complexity of the interpolation algorithm is an obstacle to achieving high-performance control on an embedded processor. In this paper, a novel interpolation processor is designed to accelerate the real-time interpolation algorithm. The processor features an advanced parallel architecture, including a 3-stage instruction pipeline, very long instruction word (VLIW) support, and an asynchronous instruction execution mechanism. The architecture is aimed at accelerating the computing-intensive tasks in CNC systems. A prototype platform was built using a low-cost field programmable gate array (FPGA) chip to implement the processor. Experimental results verified the design and showed the good computing performance of the proposed architecture.
16

Balakrishnan, S. "Very long instruction word processors." Resonance 6, no. 12 (December 2001): 61–68. http://dx.doi.org/10.1007/bf02913768.

17

Ko, Yohan, Soohwan Kim, Hyunchoong Kim, and Kyoungwoo Lee. "Selective Code Duplication for Soft Error Protection on VLIW Architectures." Electronics 10, no. 15 (July 30, 2021): 1835. http://dx.doi.org/10.3390/electronics10151835.

Abstract:
Very Long Instruction Word (VLIW) architectures have received much attention in specific-purpose applications such as scientific computation, digital signal processing, and even safety-critical systems. Several compilation techniques for VLIW architectures have been proposed to improve performance, but there is a lack of research on improving reliability against soft errors. Instruction duplication techniques have been proposed that exploit unused instruction slots (i.e., NOPs) in VLIW architectures. However, not all instructions can be replicated without additional code lines, and additional code lines are required to increase the number of duplicated instructions. Our experimental results show a 52% performance overhead compared with unprotected source code when all instructions are duplicated. This considerable overhead can be unacceptable for resource-constrained embedded systems, so the number of additional NOP instructions can be limited for selective protection. However, the previous static scheme duplicates instructions only in sequential order. In this work, we propose packing-oriented duplication to maximize the number of duplicated instructions within the same performance overhead bounds. Our packing-oriented approach can duplicate up to 18% more instructions within the same performance overhead compared with previous static duplication techniques.
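The basic idea of reusing empty VLIW slots for protection can be sketched as follows. This is only the simple in-place case, not the authors' packing-oriented algorithm: each bundle is a list of slots with `None` standing for a NOP, free slots are filled with copies of operations from the same bundle, and both slot-to-unit restrictions and the comparison instructions a real scheme needs to check the copies are ignored. The function name, the slot contents, and the `dup(...)` notation are hypothetical.

```python
def duplicate_into_nops(bundles):
    """Fill NOP slots (None) with copies of operations from the same bundle.

    Returns the new bundles and how many operations were duplicated.
    No bundles are added, so the schedule length (cycle count) is unchanged.
    Slot positions are not preserved: real ops are listed first, then copies.
    """
    result, duplicated = [], 0
    for bundle in bundles:
        ops = [op for op in bundle if op is not None]
        free = len(bundle) - len(ops)
        copies = [f"dup({op})" for op in ops[:free]]  # at most one copy per op
        duplicated += len(copies)
        result.append(ops + copies + [None] * (free - len(copies)))
    return result, duplicated


# Hypothetical 4-slot schedule with 8 real operations.
schedule = [
    ["add", None, None, None],
    ["mul", "sub", "ld", None],
    ["st", "and", "or", "xor"],
]
new_schedule, n = duplicate_into_nops(schedule)
print(n, "of 8 operations duplicated")  # 2 of 8 operations duplicated
```

Protecting the remaining six operations would need extra bundles, which is the performance overhead the abstract quantifies and the trade-off its selective scheme targets.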
18

Najoui, Mohamed, Mounir Bahtat, Anas Hatim, Said Belkouch, and Noureddine Chabini. "VLIW DSP-Based Low-Level Instruction Scheme of Givens QR Decomposition for Real-Time Processing." Journal of Circuits, Systems and Computers 26, no. 09 (April 24, 2017): 1750129. http://dx.doi.org/10.1142/s0218126617501298.

Abstract:
QR decomposition (QRD) is one of the most widely used numerical linear algebra (NLA) kernels in several signal processing applications, and its implementation has a considerable impact on system performance. As processor architectures continue to gain ground in the high-performance computing world, QRD algorithms have to be redesigned to take advantage of the architectural features of these new processors. However, for some processor architectures, such as very long instruction word (VLIW), compiler efficiency is not enough to make effective use of the available computational resources. This paper presents an efficient and optimized approach to implementing Givens QRD on a low-power platform based on a VLIW architecture. To overcome the compiler's limits in parallelizing most of the Givens arithmetic operations, we propose a low-level instruction scheme that aims to maximize the parallelism rate and minimize clock cycles. The key contributions of this work are as follows: (i) a new parallel and fast design of the Givens algorithm based on the VLIW features (i.e., instruction-level parallelism (ILP) and data-level parallelism (DLP)), including the cache memory properties; (ii) an efficient data management approach to avoid cache misses and memory bank conflicts. Two DSP platforms, C6678 and AK2H12, were used as implementation targets. The introduced parallel QR implementation method achieves, on average, more than 12× and 6× speedups over the standard algorithm version and the optimized QR routine implementations, respectively. Compared with the state of the art, the proposed implementation is at least 3.65 and 2.5 times faster than recent CPU and DSP implementations, respectively.
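For reference, a standard sequential Givens QR, the kind of baseline such speedups are measured against, fits in a few lines. The Python sketch below is a plain textbook version on lists, not the authors' parallel low-level scheme; each rotation zeroes one subdiagonal entry, and none of the ILP, DLP, or cache tuning described in the abstract is attempted.

```python
import math


def givens_qr(a):
    """QR decomposition of an m x n matrix (list of rows) by Givens rotations.

    Returns (q, r) with a == q @ r, q orthogonal (m x m), r upper triangular.
    """
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):  # zero column j from the bottom up
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Apply G = [[c, s], [-s, c]] to rows i-1 and i of r.
            for k in range(n):
                t1, t2 = r[i - 1][k], r[i][k]
                r[i - 1][k], r[i][k] = c * t1 + s * t2, -s * t1 + c * t2
            # Accumulate q <- q @ G^T so that a == q @ r stays true.
            for k in range(m):
                t1, t2 = q[k][i - 1], q[k][i]
                q[k][i - 1], q[k][i] = c * t1 + s * t2, -s * t1 + c * t2
    return q, r


A = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
Q, R = givens_qr(A)
for row in R:
    print([round(v, 6) for v in row])
```

Rotations that touch disjoint row pairs are independent of each other, and parallel Givens orderings exploit this; the sketch simply applies one rotation at a time.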
19

Najoui, Mohamed, Anas Hatim, Said Belkouch, and Noureddine Chabini. "Novel Implementation Approach with Enhanced Memory Access Performance of MGS Algorithm for VLIW Architecture." Journal of Circuits, Systems and Computers 29, no. 12 (February 19, 2020): 2050200. http://dx.doi.org/10.1142/s021812662050200x.

Abstract:
The Modified Gram–Schmidt (MGS) algorithm is one of the best-known forms of QR decomposition (QRD). It has been used in many signal and image processing applications to solve least-squares problems and linear equations or to invert matrices. However, QRD is regarded as a computationally expensive technique, and its sequential implementation fails to meet the requirements of many real-time applications. In this paper, we propose a new parallel version of the MGS algorithm that uses VLIW (Very Long Instruction Word) resources efficiently to gain performance. The presented parallel MGS is based on compact VLIW kernels designed for each algorithm step, taking architectural and algorithmic constraints into account. Based on instruction scheduling and software pipelining techniques, the proposed kernels efficiently exploit data-, instruction- and loop-level parallelism. Additionally, cache memory properties were used to enhance parallel memory access and to avoid cache misses. The robustness, accuracy and speed of the introduced parallel MGS implementation on VLIW significantly enhance the performance of systems under severe real-time and low-power constraints. Experimental results show great improvements over the optimized vendor QRD implementation and the state of the art.
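What distinguishes Modified Gram–Schmidt is its update order: as soon as a new orthonormal vector is formed, its component is removed from all remaining columns. The Python sketch below is a plain sequential MGS on lists, assuming a tall matrix with full column rank; the kernel-level VLIW parallelization described in the abstract is not attempted.

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of an m x n matrix (list of rows), m >= n.

    Assumes full column rank. Returns q (m x n, orthonormal columns)
    and r (n x n, upper triangular) with a == q @ r.
    """
    m, n = len(a), len(a[0])
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in cols[k]))
        qk = [x / r[k][k] for x in cols[k]]
        q_cols.append(qk)
        # MGS step: subtract the q_k component from every remaining column now,
        # so later projections use the updated columns (classical Gram-Schmidt
        # would project the original columns instead).
        for j in range(k + 1, n):
            r[k][j] = sum(qi * ci for qi, ci in zip(qk, cols[j]))
            cols[j] = [ci - r[k][j] * qi for ci, qi in zip(cols[j], qk)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Q, R = mgs_qr(A)
```

In exact arithmetic this gives the same factors as classical Gram–Schmidt, but in floating point it keeps the columns of `q` much closer to orthogonal.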
20

Brost, Vincent. "Multiple modular very long instruction word processors based on field programmable gate arrays." Journal of Electronic Imaging 16, no. 2 (April 1, 2007): 023001. http://dx.doi.org/10.1117/1.2728743.

21

Managuli, Ravi. "Mapping of two-dimensional convolution on very long instruction word media processors for real-time performance." Journal of Electronic Imaging 9, no. 3 (July 1, 2000): 327. http://dx.doi.org/10.1117/1.482755.

22

Cho, Doo-San. "A Study on software performance acceleration for improving real time constraint of a VLIW type Drone FCC." Journal of the Korean Society of Industry Convergence 20, no. 1 (March 31, 2017): 1–7. http://dx.doi.org/10.21289/ksic.2017.20.1.001.

23

Kamaraju, M., M. Alekhya, and K. Lal Kishore. "A Novel Implementation of 32-Bit VLIW-MISC Processor on FPGA." International Journal of Computer and Communication Technology, January 2016, 60–63. http://dx.doi.org/10.47893/ijcct.2016.1339.

Abstract:
The main objective of this work is to implement a 32-bit pipelined RISC processor without interlocking stages. It is developed using S.I.M.E. (Single Instruction Multiple Execution), a scheme in which a single instruction performs multiple executions, and it is based on the VLIW (Very Long Instruction Word) architecture, an optimal choice for obtaining high performance in embedded systems. In a VLIW-based architecture, the effectiveness of the processor depends on the ability of the compiler to provide sufficient instruction-level parallelism (ILP). The processor has been designed in VHDL and synthesized using Xilinx tools.
24

"An Energy Efficient Register File Architecture for VLIW Streaming Processors on FPGAs." International Journal of Engineering and Advanced Technology 9, no. 1S3 (December 31, 2019): 10–14. http://dx.doi.org/10.35940/ijeat.a1003.1291s319.

Abstract:
The design of a register file with large scalability, high bandwidth, and energy efficiency is the major issue in implementing streaming Very Long Instruction Word (VLIW) processors on Field Programmable Gate Arrays (FPGAs). The problem arises because it is difficult to build multi-ported register files that use optimized on-chip memory resources while also enabling maximum sharing of register operands, given that FPGA on-chip memory resources support at most two ports. To handle this issue, an Inverted Distributed Register File (IDRF) architecture is proposed in this article. The IDRF is compared with the existing Central Register File (CRF) and Distributed Register File (DRF) architectures on parameters such as kernel performance, circuit area, access delay, dynamic power, and energy. Experimental results show that IDRF matches the kernel performance of the CRF architecture and improves kernel performance by 10.4% compared with the DRF architecture. Similar experimental results for circuit area, dynamic power, and energy are discussed in this article.
25

Ferreira, Lucas, Steffen Malkowsky, Patrik Persson, Sven Karlsson, Kalle Åström, and Liang Liu. "Design of an Application-specific VLIW Vector Processor for ORB Feature Extraction." Journal of Signal Processing Systems, January 30, 2023. http://dx.doi.org/10.1007/s11265-022-01833-9.

Abstract:
In computer vision, feature extraction algorithms compress the image into a sparse set of trackable keypoints, empowering navigation-critical systems such as Simultaneous Localization And Mapping (SLAM) in autonomous robots, as well as other applications such as augmented reality and 3D reconstruction. Most of these applications run on battery-powered devices that share a very stringent power budget. Near-to-sensor computing of feature extraction algorithms allows for several design optimizations: first, the overall on-chip memory requirements can be lessened, and second, internal data movement can be minimized. This work explores the use of an Application Specific Instruction Set Processor (ASIP) dedicated to performing feature extraction in a real-time and energy-efficient manner. The ASIP features a Very Long Instruction Word (VLIW) architecture comprising one RV32I RISC-V slot and three vector slots. The on-chip memory subsystem implements parallel multi-bank memories with near-memory data shuffling to enable single-cycle multi-pattern vector access. Oriented FAST and Rotated BRIEF (ORB) is thoroughly explored to validate the proposed architecture, achieving a throughput of 140 Frames-Per-Second (FPS) for VGA images at one scale, while reducing the number of memory accesses by two orders of magnitude compared with other embedded general-purpose architectures.