Journal articles on the topic 'Novel processor architecture'

1

Yantır, Hasan Erdem, Wenzhe Guo, Ahmed M. Eltawil, Fadi J. Kurdahi, and Khaled Nabil Salama. "An Ultra-Area-Efficient 1024-Point In-Memory FFT Processor." Micromachines 10, no. 8 (July 31, 2019): 509. http://dx.doi.org/10.3390/mi10080509.

Abstract:
Current computation architectures rely on more processor-centric design principles. On the other hand, the inevitable increase in the amount of data that applications need forces researchers to design novel processor architectures that are more data-centric. By following this principle, this study proposes an area-efficient Fast Fourier Transform (FFT) processor through in-memory computing. The proposed architecture occupies the smallest footprint of around 0.1 mm² inside its class together with acceptable power efficiency. According to the results, the processor exhibits the highest area efficiency (FFT/s/area) among the existing FFT processors in the current literature.
2

Göhringer, Diana, Thomas Perschke, Michael Hübner, and Jürgen Becker. "A Taxonomy of Reconfigurable Single-/Multiprocessor Systems-on-Chip." International Journal of Reconfigurable Computing 2009 (2009): 1–11. http://dx.doi.org/10.1155/2009/395018.

Abstract:
Runtime adaptivity of hardware in processor architectures is a novel trend, which is under investigation in a variety of research labs all over the world. The runtime exchange of modules, implemented on reconfigurable hardware, affects the instruction flow (e.g., in reconfigurable instruction set processors) or the data flow, which has a strong impact on the performance of an application. Furthermore, the choice of a certain processor architecture related to the class of target applications is a crucial point in application development. A simple example is the domain of high-performance computing applications found in meteorology or high-energy physics, where vector processors are the optimal choice. A classification scheme for computer systems was provided in 1966 by Flynn, where single/multiple data and instruction streams were combined into four types of architectures. This classification is now used as a foundation for an extended classification scheme including runtime adaptivity as a further degree of freedom for processor architecture design. The developed scheme is validated by a multiprocessor system implemented on reconfigurable hardware as well as by a classification of existing static and reconfigurable processor systems.
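As a rough illustration of the classification this abstract describes, the sketch below encodes Flynn's four classes and adds runtime adaptivity as one extra field. The names `Flynn`, `SystemClass`, `classify`, and the `runtime_adaptive` flag are hypothetical and are not taken from the paper, whose actual extended scheme may be structured differently.

```python
from dataclasses import dataclass
from enum import Enum


class Flynn(Enum):
    """Flynn's four classes, keyed by (instruction streams, data streams)."""
    SISD = ("single", "single")
    SIMD = ("single", "multiple")
    MISD = ("multiple", "single")
    MIMD = ("multiple", "multiple")


@dataclass(frozen=True)
class SystemClass:
    flynn: Flynn
    runtime_adaptive: bool  # hypothetical extra axis, not the paper's notation


def classify(instruction_streams, data_streams, runtime_adaptive=False):
    """Map stream counts to a Flynn class plus an adaptivity flag."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    key = (
        "single" if instruction_streams == 1 else "multiple",
        "single" if data_streams == 1 else "multiple",
    )
    return SystemClass(Flynn(key), runtime_adaptive)


if __name__ == "__main__":
    # A vector processor: one instruction stream, many data streams.
    print(classify(1, 8))
    # A multiprocessor on reconfigurable hardware.
    print(classify(4, 4, runtime_adaptive=True))
```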
3

Meyer, M. "A novel processor architecture with exact tag-free pointers." IEEE Micro 24, no. 3 (May 2004): 46–55. http://dx.doi.org/10.1109/mm.2004.2.

4

Karmakar, Amiya, Amitabha Sinha, Pratik Kumar Sinha, and Pijush Biswas. "Architecture of a Novel Configurable Communication Processor for SDR." International Journal of VLSI Design & Communication Systems 6, no. 4 (August 30, 2015): 35–49. http://dx.doi.org/10.5121/vlsic.2015.6404.

5

Bu, Wei Jing. "A Novel Numerical Control Architecture Based on Multiprocessor and Real-Time Ethernet." Applied Mechanics and Materials 155-156 (February 2012): 120–24. http://dx.doi.org/10.4028/www.scientific.net/amm.155-156.120.

Abstract:
In the design of the CNC system, the dedicated processors and modules that realize each function are selected carefully. The low-cost ARM processor with the Windows CE operating system is well suited to soft real-time tasks, such as system state display and program interpretation. The high-performance DSP processor with the µC/OS-II operating system handles the real-time tasks and is responsible for interpolation and speed control. In addition, to meet the demand for reconfigurable design and flexible manufacturing, a reconfigurable module based on FPGA technology meets the functional requirements, and a PLC based on a real-time Ethernet fieldbus network provides simple connections between executors in the numerical control system.
6

Yang, Liu, Xiao Qiang Ni, and Heng Zhu Liu. "Implementing and Optimizing DES on Stream Processor." Advanced Materials Research 532-533 (June 2012): 714–18. http://dx.doi.org/10.4028/www.scientific.net/amr.532-533.714.

Abstract:
Processors using a stream architecture can make good use of on-chip resources and exploit data locality and parallelism. The DES algorithm is one of the most popular cipher algorithms. This paper proposes a novel implementation of the DES algorithm on a stream architecture, based on both the stream programming model and the DES algorithm; the speedup is 1.27 times.
7

Yang, Hui, Shu Ming Chen, and Tie Bin Wu. "A Novel Two-Level Instruction Issue Window Based on VLIW Architecture." Advanced Materials Research 317-319 (August 2011): 146–49. http://dx.doi.org/10.4028/www.scientific.net/amr.317-319.146.

Abstract:
Instruction compression overcomes the drawback of traditional VLIW architectures, namely low density in the instruction cache. However, a separated long instruction word may be arranged across two cache lines. This becomes a bottleneck for VLIW processor performance because these split long instruction words cannot be fetched and issued simultaneously. A novel two-level instruction issue window mechanism is proposed in this paper. It solves the instruction fetch and issue problem for separated instruction words. It provides a more effective and continuous instruction flow, and stores one iteration of the loop body to support the software pipelining technique, which improves VLIW DSP processor performance effectively. The proposed mechanism was synthesized to evaluate its overall costs, and the performance speedup results for DSP/IMG library benchmarks using the cycle-accurate simulator are presented.
8

L.Giridas, K., and A. Shajin Nargunam. "A Novel Architecture for Hybrid Processor Pool Model using IITPS Scheme." International Journal of Computer Applications 49, no. 5 (July 28, 2012): 20–25. http://dx.doi.org/10.5120/7624-0684.

9

Dong, Jing Chuan, Tai Yong Wang, Bo Li, Xian Wang, and Zhe Liu. "Design and Implementation of an Interpolation Processor for CNC Machining." Advanced Materials Research 819 (September 2013): 322–27. http://dx.doi.org/10.4028/www.scientific.net/amr.819.322.

Abstract:
As the demand for high-speed and high-precision machining increases, fast and accurate real-time interpolation is necessary in modern computerized numerical control (CNC) systems. However, the complexity of the interpolation algorithm is an obstacle for the embedded processor to achieve high-performance control. In this paper, a novel interpolation processor is designed to accelerate the real-time interpolation algorithm. The processor features an advanced parallel architecture, including a 3-stage instruction pipeline, very long instruction word (VLIW) support, and an asynchronous instruction execution mechanism. The architecture is aimed at accelerating the computing-intensive tasks in CNC systems. A prototype platform was built using a low-cost field programmable gate array (FPGA) chip to implement the processor. Experimental results have verified the design and shown the good computing performance of the proposed architecture.
10

Issa, Joseph. "A Novel Method to Predict Processor Performance by Modeling Different Architecture Parameters." Journal of Computer Science 16, no. 4 (April 1, 2020): 479–92. http://dx.doi.org/10.3844/jcssp.2020.479.492.

11

Mahmood, Ausif. "Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures." VLSI Design 4, no. 1 (January 1, 1996): 59–68. http://dx.doi.org/10.1155/1996/91035.

Abstract:
The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
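The timing-wheel idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class `TimingWheel`, the fixed slot count, and the producer/consumer subprograms are all hypothetical. Each hardware process is split into plain functions ("subprograms") that re-enter the wheel at their next entry point.

```python
from collections import deque


class TimingWheel:
    """Circular array of time slots; each slot holds subprograms due at that tick."""

    def __init__(self, slots=16):
        self.slots = [deque() for _ in range(slots)]
        self.now = 0

    def schedule(self, delay, subprogram):
        # Delays must fit inside one revolution of the wheel.
        if not 0 <= delay < len(self.slots):
            raise ValueError("delay must be in [0, number of slots)")
        self.slots[(self.now + delay) % len(self.slots)].append(subprogram)

    def run(self, until):
        # Advance simulated time one tick at a time, draining the current slot.
        while self.now < until:
            slot = self.slots[self.now % len(self.slots)]
            while slot:
                slot.popleft()(self)
            self.now += 1


def demo():
    log = []

    def producer_step(wheel):
        log.append((wheel.now, "produce"))
        wheel.schedule(1, consumer_step)      # consumer wakes one tick later
        if wheel.now < 4:
            wheel.schedule(2, producer_step)  # producer's next entry point

    def consumer_step(wheel):
        log.append((wheel.now, "consume"))

    wheel = TimingWheel()
    wheel.schedule(0, producer_step)
    wheel.run(until=8)
    return log


if __name__ == "__main__":
    for tick, event in demo():
        print(tick, event)
```

Splitting each process at its entry points keeps every subprogram an ordinary function call, so no threads or coroutines are needed; the wheel alone supplies the mapping from simulated time to execution order.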
12

Swaminathan, Raja, Ram Viswanath, Sriram Srinivasan, and Arun Chandrasekhar. "Next Generation Xeon Server Package Architecture." International Symposium on Microelectronics 2017, no. 1 (October 1, 2017): 000342–45. http://dx.doi.org/10.4071/isom-2017-wp14_110.

Abstract:
The semiconductor industry is moving towards more and more integration to provide more functionality and add value to the processor, thereby enabling a better user experience. This integration can come in 3 categories: on-die integration, on-package integration, and on-board integration. On-board integration is the typical method that has been used for several generations, and on-die and on-package integration architectures are getting more focus due to better performance and reduced power. The key vector to enable on-die/package architectures is reduced cost and maximum features for a given substrate and socket form factor. Silicon features are also moving at a faster pace compared to the board technology. This paper details a novel package (PoINT) architecture as well as the key technology challenges that were resolved to successfully enable this architecture.
13

Vakili, S., S. M. Fakhraie, and S. Mohammadi. "Evolvable multi-processor: a novel MPSoC architecture with evolvable task decomposition and scheduling." IET Computers & Digital Techniques 4, no. 2 (March 1, 2010): 143–56. http://dx.doi.org/10.1049/iet-cdt.2008.0120.

14

Naresh, K., and Dr G. Sateesh Kumar. "A Novel Architecture for Radix-4 Pipelined FFT Processor using Vedic Mathematics Algorithm." IOSR Journal of Electronics and Communication Engineering 9, no. 6 (2014): 23–31. http://dx.doi.org/10.9790/2834-09622331.

15

VASSILIADIS, STAMATIS, GERALD G. PECHANEK, and JOSÉ G. DELGADO-FRIAS. "SPIN: THE SEQUENTIAL PIPELINED NEUROEMULATOR." International Journal on Artificial Intelligence Tools 02, no. 01 (March 1993): 117–32. http://dx.doi.org/10.1142/s0218213093000084.

Abstract:
This paper proposes a novel digital neural network architecture referred to as the Sequential PIpelined Neuroemulator or Neurocomputer (SPIN). The SPIN processor emulates neural networks producing high performance with minimum hardware by sequentially processing each neuron in the modeled completely connected network with a pipelined physical neuron structure. In addition to describing SPIN, performance equations are estimated for the ring systolic, the recurrent systolic array, and the neuromimetic neurocomputer architectures, three previously reported schemes for the emulation of neural networks, and a comparison with the SPIN architecture is reported.
16

VENKATESWARAN, N., S. PATTABIRAMAN, R. DEVANATHAN, B. KUMARAN, ASHRAF AHMED, and SANKARA NARAYANAN. "A DESIGN METHODOLOGY FOR VERY LARGE ARRAY PROCESSORS—PART 1: GIPOP PROCESSOR ARRAY." International Journal of Pattern Recognition and Artificial Intelligence 09, no. 02 (April 1995): 231–62. http://dx.doi.org/10.1142/s0218001495000122.

Abstract:
Very Large Array Processors (VLAP) will be the need of the future for solving computationally intense Very Large Problems (VLP) common in pattern recognition, image processing and other related areas of digital signal processing. Design methodology of such VLAPs for massively parallel dedicated/general purpose applications is highly complex. Two companion papers (Part 1 and Part 2) on VLAP are presented in this issue. In Part 1, we propose a VLAP called Reconfigurable GIPOP Processor Array (RGPA). The RGPA is made up of high performance processing elements called the Generalized Inner Product Outer Product (GIPOP) processor. Unlike the traditional special/general purpose processors, ours has a totally different and new architecture and organization involving higher level functional units to match with the complex computational structures of numeric algorithms and suitable for massively parallel processing. We also present a strategy for mapping VLPs on VLAPs. In Part 2, we propose a novel VLSI design methodology for implementing cost effective and very high performance processors meant for special purpose applications and in particular, for VLAPs.
17

Hwang, Wen-Jyi, Chien-Min Ou, Peng-Chieh Hung, Cheng-Yen Yang, and Tun-Hao Yu. "An Efficient Distributed Genetic Algorithm Architecture for Vector Quantizer Design." Open Artificial Intelligence Journal 4, no. 1 (February 18, 2010): 20–29. http://dx.doi.org/10.2174/1874061801004010020.

Abstract:
This paper presents a novel distributed genetic algorithm (GA) architecture for the design of vector quantizers. The design is based on a multi-core architecture, where each island of the GA is associated with a hardware accelerator and a softcore processor for independent genetic evolutions. An on-chip RAM with a mutex circuit is adopted for the migration of genetic strings among different islands. This allows a simple and flexible migration for the implementation of hardware distributed GA. Experimental results show that the proposed architecture has significantly lower computational time than its software counterparts running on multicore processors with multithreading for GA-based optimization.
18

Wang, Hong Yi, Qing Yang, Jian Fei Wu, and Jian Cheng Li. "A Novel Implementation of UHF RFID Reader." Applied Mechanics and Materials 190-191 (July 2012): 642–46. http://dx.doi.org/10.4028/www.scientific.net/amm.190-191.642.

Abstract:
Radio frequency identification is an emerging non-contact identification technology. In radio frequency identification readers, the traditional microcontroller-based architecture cannot meet the system requirements because of its limited processing ability. In this paper, the authors design a UHF RFID reader based on an ARM processor and an FPGA; compared to the microcontroller-based architecture, the reader processes data faster. The reader consists of three parts, namely, the protocol processing module, the digital baseband module, and the RF module.
19

Motupalle, Haritha, and Syed Jahangir Badashah. "A Novel VLSI Architecture for SPHIT Encoder." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 10, no. 4 (August 15, 2013): 1522–30. http://dx.doi.org/10.24297/ijct.v10i4.3252.

Abstract:
In this paper we propose a highly scalable image compression scheme based on the set partitioning in hierarchical trees (SPIHT) algorithm. Our algorithm, called highly scalable SPIHT (HS-SPIHT), supports spatial and SNR scalability and provides a bit stream that can be easily adapted (reordered) to given bandwidth and resolution requirements by a simple transcoder (parser). The HS-SPIHT algorithm adds the spatial scalability feature without sacrificing the SNR embeddedness property found in the original SPIHT bit stream. HS-SPIHT finds applications in progressive Web browsing, flexible image storage and retrieval, and image transmission over heterogeneous networks. Here the MicroBlaze core processor is designed in VHDL (VHSIC hardware description language) and implemented using the XILINX ISE 8.1 design suite; the algorithm is written in SystemC and tested on a SPARTAN-3 FPGA kit by interfacing a test circuit with the PC using an RS232 cable. The test results are seen to be satisfactory. The area taken and the speed of the algorithm are also evaluated.
20

Srinivasan, Sudarshan K. "Optimization Techniques for Verification of Out-of-Order Execution Machines." Journal of Electrical and Computer Engineering 2010 (2010): 1–7. http://dx.doi.org/10.1155/2010/515021.

Abstract:
We develop two optimization techniques, flush-machine and collapsed flushing, to improve the efficiency of automatic refinement-based verification of out-of-order (ooo) processor models. Refinement is a notion of equivalence that can be used to check that an ooo processor correctly implements all behaviors of its instruction set architecture (ISA), including deadlock detection. The optimization techniques work by reducing the computational complexity of the refinement map, a function central to refinement proofs that maps ooo processor model states to ISA states. This has a direct impact on the efficiency of verification, which is studied using 23 ooo processor models. Flush-machine is a novel optimization technique. Collapsed flushing has been employed previously in the context of in-order processors. We show how to apply collapsed flushing for ooo processor models. Using both the optimizations together, we can handle 9 ooo models that could not be verified using standard flushing. Also, the optimizations provided a speed up of 23.29 over standard flushing.
21

Ahmed, O., S. Areibi, and G. Grewal. "Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm." International Journal of Reconfigurable Computing 2013 (2013): 1–33. http://dx.doi.org/10.1155/2013/681894.

Abstract:
Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second based on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper, including an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.
22

Loan, Sajad A., Asim M. Murshid, Shuja A. Abbasi, and Abdul Rahman M. Alamoud. "A novel VLSI architecture for a fuzzy inference processor using Gaussian-shaped membership function." Journal of Intelligent & Fuzzy Systems 24, no. 1 (2013): 5–19. http://dx.doi.org/10.3233/ifs-2012-0503.

23

Murakami, K., N. Irie, and S. Tomita. "SIMP (Single Instruction stream/Multiple instruction Pipelining): a novel high-speed single-processor architecture." ACM SIGARCH Computer Architecture News 17, no. 3 (June 1989): 78–85. http://dx.doi.org/10.1145/74926.74935.

24

Venkatachar, A., J. Ramanujam, and A. Thirumalai. "Communication Generation for Block-Cyclic Distributions." Parallel Processing Letters 07, no. 02 (June 1997): 195–202. http://dx.doi.org/10.1142/s0129626497000206.

Abstract:
Data-parallel languages such as High Performance Fortran, Vienna Fortran and Fortran D include directives such as alignment and distribution that describe how data and computation are mapped onto the processors in a distributed-memory multiprocessor. A compiler for HPF that generates code for each processor has to compute the sequence of local memory addresses accessed by each processor and the sequence of sends and receives for a given processor to access non-local data. In this paper, we present a novel approach for the generation of communication sets that exploits a pattern of send-receive index pairs. In addition, we present an algorithm for code generation. Experimental results demonstrate the viability of this technique.
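To make the terms concrete, the sketch below computes, for a one-dimensional block-cyclic distribution, which processor owns a global index and at which local address, and then enumerates the send-receive index pairs for a shifted copy `a[i] = b[i + shift]` by brute force. The brute-force enumeration is a stand-in for illustration only; it is not the pattern-based approach the paper proposes, and the function names `owner_and_local` and `comm_pairs` are hypothetical.

```python
from collections import defaultdict


def owner_and_local(i, block, procs):
    """Owner processor and local address of global index i under cyclic(block)."""
    block_index = i // block
    owner = block_index % procs
    local = (block_index // procs) * block + i % block
    return owner, local


def comm_pairs(n, shift, block, procs):
    """Send-receive local index pairs for a[i] = b[i + shift], 0 <= i < n - shift.

    Both arrays are assumed to have length n and the same distribution.
    Returns {(sender, receiver): [(sender_local_index, receiver_local_index), ...]}.
    """
    pairs = defaultdict(list)
    for i in range(n - shift):
        receiver, recv_local = owner_and_local(i, block, procs)
        sender, send_local = owner_and_local(i + shift, block, procs)
        if sender != receiver:  # only non-local accesses need communication
            pairs[(sender, receiver)].append((send_local, recv_local))
    return dict(pairs)


if __name__ == "__main__":
    # 8 elements, blocks of 2, 2 processors, shift by one element.
    for route, index_pairs in sorted(comm_pairs(8, 1, 2, 2).items()):
        print(route, index_pairs)
```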
25

Itou, Tsutomu, and Nobuyuki Yamasaki. "Design and Implementation of the Multimedia Operation Mechanism for Responsive Multithreaded Processor." Journal of Robotics and Mechatronics 17, no. 4 (August 20, 2005): 456–62. http://dx.doi.org/10.20965/jrm.2005.p0456.

Abstract:
Responsive Multithreaded (RMT) Processor is designed for distributed real-time systems. This paper focuses on the multimedia processing architecture of RMT Processor. Multimedia processing requires high-throughput calculation for bulky data processing. RMT Processor architecture is based on eight-way prioritized simultaneous multithreading, which executes each thread in order of priority. Since the priority of hard real-time threads is higher than that of multimedia processing threads, instruction issue slots used by the multimedia processing threads are few in RMT Processor when hard real-time threads are executed simultaneously. Therefore multimedia processing threads need to utilize instruction issue slots effectively to achieve high performance. We have designed a novel vector operation mechanism to process multimedia data efficiently in parallel. Because the same instructions are iterated in multimedia processing, the compound operation mechanism is designed to calculate more data per instruction in multimedia processing.
26

CHUNG, KUO-LIANG, and HSUN-WEN CHANG. "NOVEL PIPELINING AND PROCESSOR ALLOCATION STRATEGY FOR MONOID COMPUTATIONS ON UNSHUFFLE-EXCHANGE NETWORKS." Parallel Processing Letters 03, no. 02 (June 1993): 189–93. http://dx.doi.org/10.1142/s012962649300023x.

Abstract:
This short paper presents a novel pipelining and processor allocation strategy for monoid computations on an unshuffle-exchange network. In the strategy, the processor utilization is near 1 and the communication is collision-free. With the characteristics of constant connections to each processor and only a single output node on the network, the method given here can compete with the method of Barnard and Skillicorn based on a hypercube network with multiple output nodes.
27

Guo, Jing Jie, and Wei Tang. "Design of Pythagorean Hodograph Curve Interpolator Based on NiosII Embedded Processor and FPGA." Advanced Materials Research 383-390 (November 2011): 6868–72. http://dx.doi.org/10.4028/www.scientific.net/amr.383-390.6868.

Abstract:
In this paper, a novel architecture of Pythagorean Hodograph (PH) curve interpolator based on Nios II embedded processor and FPGA is proposed. The whole interpolator including the Nios II processor is built in a single FPGA chip. The interpolator uses a two-stage interpolation scheme to reduce the computational burden of PH curve interpolator. The Nios II embedded processor implements 1st-stage interpolation, the FPGA receives the command from the Nios II processor and implements 2nd-stage interpolation simultaneously. Therefore, the interpolator can implement the real-time PH curve interpolation algorithm steadily to meet the needs of high-speed and high-precision machining.
28

Vojtko, Martin, and Tibor Krajčovič. "Semi-automated process of adaptation of platform dependent parts of embedded operating systems." Journal of Electrical Engineering 68, no. 2 (March 28, 2017): 87–98. http://dx.doi.org/10.1515/jee-2017-0013.

Abstract:
Each year manufacturers develop new processors. As a reaction to this continuous development, the developers of software have to adapt their software to those new processors. As a minimal requirement, the code of an operating system has to be changed to enable the execution of other user applications. This change is a complicated process during which incompatible parts of an operating system have to be redesigned and missing parts have to be implemented. Complications arise when there is a need to adapt an operating system to a completely different processor architecture. In this paper we present a novel adaptation process that has preconditions to reduce the impact of these complications. This process uses a file for the formal description of a processor, which is also described in this paper. The formal description could act as a standard for processor manufacturers and could allow the generation of platform-dependent code of an operating system. This paper presents concepts, definitions and ideas of the adaptation process and shows possible solutions for an automatic generation of code parts of an operating system.
29

Kirchhoff, Michael, Philipp Kerling, Detlef Streitferdt, and Wolfgang Fengler. "A Real-Time Capable Dynamic Partial Reconfiguration System for an Application-Specific Soft-Core Processor." International Journal of Reconfigurable Computing 2019 (September 22, 2019): 1–14. http://dx.doi.org/10.1155/2019/4723838.

Abstract:
Modern FPGAs (Field Programmable Gate Arrays) are becoming increasingly important when it comes to embedded system development. Within these FPGAs, soft-core processors are often used to solve a wide range of different tasks. Soft-core processors are a cost-effective and time-efficient way to realize embedded systems. When using the full potential of FPGAs, it is possible to dynamically reconfigure parts of them during run time without the need to stop the device. This feature is called dynamic partial reconfiguration (DPR). If the DPR approach is to be applied in a real-time application-specific soft-core processor, an architecture must be created that ensures strict compliance with the real-time constraint at all times. In this paper, a novel method that addresses this problem is introduced, and its realization is described. In the first step, an application-specializable soft-core processor is presented that is capable of solving problems while adhering to hard real-time deadlines. This is achieved by the full design time analyzability of the soft-core processor. Its special architecture and other necessary features are discussed. Furthermore, a method for the optimized generation of partial bitstreams for the DPR as well as its practical implementation in a tool is presented. This tool is able to minimize given bitstreams with the help of a differential frame bitmap. Experiments that realize the DPR within the soft-core framework are presented, with respect to the need for hard real-time capability. Those experiments show a significant resource reduction of about 40% compared to a functionally equivalent non-DPR design.
30

Voudouris, Petros, Per Stenström, and Risat Pathan. "Federated Scheduling of Sporadic DAGs on Unrelated Multiprocessors." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–25. http://dx.doi.org/10.1145/3477018.

Abstract:
This paper presents a federated scheduling algorithm for implicit-deadline sporadic DAGs that execute on an unrelated heterogeneous multiprocessor platform. We consider a global work-conserving scheduler to execute a single DAG exclusively on a subset of the unrelated processors. Formal schedulability analysis to find the makespan of a DAG on its dedicated subset of the processors is proposed. The problem of determining each subset of dedicated unrelated processors for each DAG such that the DAG meets its deadline (i.e., designing the federated scheduling algorithm) is tackled by proposing a novel processors-to-task assignment heuristic using a new concept called processor value. Empirical evaluation is presented to show the effectiveness of our approach.
31

Xiaofeng Wu, V. A. Chouliaras, J. L. Nunez-Yanez, and R. M. Goodall. "A Novel $\Delta\Sigma$ Control System Processor and Its VLSI Implementation." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 16, no. 3 (March 2008): 217–28. http://dx.doi.org/10.1109/tvlsi.2007.915396.

32

Yuan, Min, Zhenguo Ma, Feng Yu, and Qianjian Xing. "A Novel Address Scheme for Continuous-Flow Parallel Memory-Based Real-Valued FFT Processor." Electronics 8, no. 9 (September 17, 2019): 1042. http://dx.doi.org/10.3390/electronics8091042.

Abstract:
In this article, we present a modified constant-geometry based signal flow graph for memory-based real-valued fast Fourier transform architecture. Without an extra permutation, the corresponding address scheme solves the memory conflict and achieves continuous-flow operation with the minimal memory and computation cycles requirement when compared to the state-of-the-art designs. Besides, the address scheme meets the constraint of in-place operation, concurrent I/O, normal-order I/O, variable size, and parallel processing. The experimental results demonstrate the resource and frequency efficiency of the proposed address scheme.
33

Hamblen, James O. "Using Vhdl Based Modeling, Synthesis, and Simulation in an Introductory Computer Architecture Laboratory." International Journal of Electrical Engineering & Education 33, no. 3 (July 1996): 251–60. http://dx.doi.org/10.1177/002072099603300306.

Abstract:
In many existing curricula, there is a notable lack of recent research advances in CAD tools and rapid prototyping using logic synthesis. This paper describes a novel introductory computer architecture laboratory that utilizes these new developments. VHDL based logic synthesis and timing simulations are used to design a RISC processor.
34

Ezhumalai, P., A. Chilambuchelvan, and C. Arun. "Novel NoC Topology Construction for High-Performance Communications." Journal of Computer Networks and Communications 2011 (2011): 1–6. http://dx.doi.org/10.1155/2011/405697.

Abstract:
Different intellectual property (IP) cores, including processor and memory, are interconnected to build a typical system-on-chip (SoC) architecture. Larger SoC designs require data communication to take place over global interconnects. Network-on-Chip (NoC) architectures have been proposed as a scalable solution to the global communication challenges in nanoscale SoC design. We propose building a custom synthesized network-on-chip with better flow partitioning, and we also consider power and area reduction compared with previously presented regular topologies. To improve SoC performance, we first studied the performance of the regular interconnect topologies MESH, TORUS, BFT, and EBFT. We observed that the overall latency and throughput of EBFT are better than those of the other topologies; the next best in latency and throughput is BFT. Experimental results on a variety of NoC benchmarks showed that our synthesis results achieved a reduction in power consumption and average hop count over custom topology implementation.
35

Kwon, Young-Su, and Nak-Woong Eum. "Application-Adaptive Reconfiguration of Memory Address Shuffler for FPGA-Embedded Instruction-Set Processor." Journal of Circuits, Systems and Computers 19, no. 07 (November 2010): 1435–47. http://dx.doi.org/10.1142/s0218126610006748.

Abstract:
The programmability requirement in reconfigurable systems necessitates the integration of soft processors in FPGAs. The extensive memory bandwidth demand is a major performance bottleneck in soft processors for media applications. While a parallel memory system is a viable solution for handling the large number of memory transactions in media processors, memory access conflicts caused by multiple memory buses limit the overall performance. We propose and evaluate a configurable memory address shuffler integrated in the memory access arbiter for the parallel memory system in a soft processor. The novel address shuffling algorithm profiles the memory access pattern of the application, produces the access conflict graph, relocates decomposed memory sub-pages based on the access conflict graph, and finally generates synthesizable code for the address shuffler. The address shuffler efficiently translates the requested memory addresses into shuffled addresses such that the number of simultaneous accesses to the same physical memory block decreases. The reconfigurability of the address shuffler enables adaptive address shuffling depending on the memory access pattern of the application running on the soft processor. The configurable address shuffler removes 80% of access conflicts on average for the benchmarks, with a hardware overhead of 1592 LUTs, which is 14% of the LUT size of the processor core.
36

Wen, Changbao, and Changchun Zhu. "A novel architecture of implementing wavelet transform and reconstruction processor with SAW device based on MSC." Sensors and Actuators A: Physical 126, no. 1 (January 2006): 148–53. http://dx.doi.org/10.1016/j.sna.2005.09.016.

37

Mahdizadeh, Hossein, and Massoud Masoumi. "Novel Architecture for Efficient FPGA Implementation of Elliptic Curve Cryptographic Processor Over ${\rm GF}(2^{163})$." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 21, no. 12 (December 2013): 2330–33. http://dx.doi.org/10.1109/tvlsi.2012.2230410.

38

Dantas, Leandro Poloni, Rodolfo J. de Azevedo, and Salvador Pinillos Gimenez. "A Novel Processor Architecture With a Hardware Microkernel to Improve the Performance of Task-Based Systems." IEEE Embedded Systems Letters 11, no. 2 (June 2019): 46–49. http://dx.doi.org/10.1109/les.2018.2864094.

39

Wu, Guang Wen, Xiang Sheng Huang, and Wen Long Hu. "A Novel Method for Solution of the Division Operation on ARM7 Microcontroller." Advanced Materials Research 718-720 (July 2013): 2418–21. http://dx.doi.org/10.4028/www.scientific.net/amr.718-720.2418.

Abstract:
Prior to architecture V7, the ARM microcontroller family has no hardware support for division. Although it is easy to program ARM processors in C, which implements division through library functions, calling such C functions from an assembly program is cumbersome and less efficient. This paper introduces an algorithm for division on the ARM7 processor and provides corresponding subroutines that can be used directly in assembly programs. The algorithm resembles the operating principle of a digital circuit that performs division with a subtraction circuit. The subroutines handle the division of two 32-bit unsigned integers and the division of a 64-bit unsigned integer by a 32-bit unsigned integer.
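The subtraction-based division this abstract refers to can be illustrated with a minimal sketch. It is written in Python rather than ARM assembly, it is not the authors' subroutine, and it assumes a standard restoring shift-and-subtract scheme; the function name `udiv_shift_subtract` and the `bits` parameter are hypothetical.

```python
from typing import Tuple


def udiv_shift_subtract(dividend: int, divisor: int, bits: int = 32) -> Tuple[int, int]:
    """Unsigned division by shift-and-subtract (restoring division).

    Illustrative sketch only: it mimics how a hardware divider built from
    a subtractor works, one dividend bit per iteration.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend does not fit in the given width")

    quotient = 0
    remainder = 0
    # Walk the dividend from its most significant bit to its least.
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and record a 1 in the quotient.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


if __name__ == "__main__":
    print(udiv_shift_subtract(100, 7))                    # 32-bit / 32-bit case
    print(udiv_shift_subtract((1 << 40) + 5, 3, bits=64))  # 64-bit / 32-bit case
```

Calling the function with `bits=64` covers the 64-bit by 32-bit case mentioned in the abstract; in an assembly version the loop body would map to shift, compare, and conditional-subtract instructions.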
40

Yu, Lei, Zhi Yong Liu, Dong Rui Fan, Yi Ke Ma, Feng Long Song, Xiao Chun Ye, and Wei Zhi Xu. "Mapping Routing Lookup Algorithm on Many-Core Architecture Based on SPM and Cache Mixed Method." Applied Mechanics and Materials 58-60 (June 2011): 1226–31. http://dx.doi.org/10.4028/www.scientific.net/amm.58-60.1226.

Abstract:
As the computing ability of many-core processors has grown, accelerating parallel programs on many-core processors has become a research focus. Network packet processing is an important application of large-scale parallel processing, so many researchers are interested in accelerating packet processing on many-core processors. We select the IP routing lookup algorithm as our target application. We analyze the features of packet lookup algorithms based on binary trees and propose a novel parallel lookup algorithm, SCMRL (Spm and Cache Mixed Routing Lookup). We describe the whole process of SCMRL in detail. Experimental results on the Godson-T many-core architecture show better performance than the baseline algorithm.
41

Djemal, Ridha. "An Embedded System Architecture of Automatic Censored Ordered Statistic Detector Techniques." Journal of Circuits, Systems and Computers 22, no. 07 (August 2013): 1350051. http://dx.doi.org/10.1142/s0218126613500515.

Abstract:
We designed and tested a novel field-programmable gate array (FPGA)-based embedded system that uses automatic censored ordered statistics detector (ACOSD) algorithms to detect targets in clutter with lognormal distribution. The detection process operates through two techniques called backward and forward ACOSD (B-ACOSD and F-ACOSD, respectively), which work in parallel to increase the detection accuracy and reduce the false alarm rate. Two architectures were considered for the proposed detector. The B-ACOSD algorithm performs censoring starting from the last cell of a window of N range cells, whereas the F-ACOSD algorithm performs censoring based on a scan starting from the first cell in the same sorted window of cells. The detector is implemented on an Altera Stratix II FPGA as a system-on-chip that integrates a Nios II core processor with our proposed detector as a co-processor, together with additional embedded memories and interfaces, using parallelism and pipelining. For a reference window of 16 cells, the processor works properly with a processing speed of up to 129.13 MHz and a processing time of only 0.23 μs, within the maximum tolerated delay of 0.5 μs fixed by the pulse width [A. Farina, A. Russo and F. A. Studer, IEE Proc. F Commun. Radar Signal Process. 133 (1986) 39–54] for viewing a target at high resolution.
42

Lewis, Mike, and Linda Brackenbury. "CADRE: A Low-power, Low-EMI DSP Architecture for Digital Mobile Phones." VLSI Design 12, no. 3 (January 1, 2001): 333–48. http://dx.doi.org/10.1155/2001/47640.

Abstract:
Current mobile phone applications demand high performance from the DSP, and future generations are likely to require even greater throughput. However, it is important to balance these processing demands against the requirement of low power consumption for extended battery lifetime. A novel low-power digital signal processor (DSP) architecture, CADRE (Configurable Asynchronous DSP for Reduced Energy), addresses these requirements through a multi-level power reduction strategy. A parallel architecture and configurable compressed instruction set meet the throughput requirements without excessive program memory bandwidth, while a large register file reduces the cost of data accesses. Sign-magnitude representation is used for data, to reduce switching activity within the datapath. Asynchronous design gives fine-grained activity control without the complexities of clock gating, and gives low electromagnetic interference. Finally, the operational model of the target application allows for a reduced interrupt structure, simplifying processor design by avoiding the need for exact exceptions.
43

Ou, Chien Min, Wen Jyi Hwang, and Ssu Min Yang. "Efficient Hardware Architecture for Kernel Fuzzy C-Means Algorithm." Applied Mechanics and Materials 284-287 (January 2013): 3079–86. http://dx.doi.org/10.4028/www.scientific.net/amm.284-287.3079.

Abstract:
A novel VLSI architecture for the kernel fuzzy c-means algorithm is presented in this paper. The architecture consists of efficient circuits for the computation of kernel functions, membership coefficients, and cluster centers. In addition, the usual iterative operations for updating the membership matrix and cluster centers are merged into a single updating process to avoid the large storage requirement. The circuit is used as a hardware accelerator of a softcore processor in a system-on-programmable chip for physical performance measurement. Experimental results show that the proposed solution is an effective alternative for cluster analysis with low computational cost and high performance.
44

Tajahuerce, E., J. Lancis, V. Climent, and P. Andrés. "Hybrid (refractive–diffractive) Fourier processor: a novel optical architecture for achromatic processing with broadband point-source illumination." Optics Communications 151, no. 1-3 (May 1998): 86–92. http://dx.doi.org/10.1016/s0030-4018(97)00739-6.

45

Shen, Mingya, Feng Xiao, and Kamal Alameh. "A novel reconfigurable optical interconnect architecture using an Opto-VLSI processor and a 4-f imaging system." Optics Express 17, no. 25 (November 25, 2009): 22680. http://dx.doi.org/10.1364/oe.17.022680.

46

Lubeck, Olaf, Michael Lang, Ram Srinivasan, and Greg Johnson. "Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E." Scientific Programming 17, no. 1-2 (2009): 199–208. http://dx.doi.org/10.1155/2009/784153.

Abstract:
The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating point performance that is required for high-fidelity scientific simulations. However, data movement within the chip can be a major challenge to realizing the benefits of the peak floating point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message passing model that minimizes data movement. We compare the advantages/disadvantages of this programming model with a previous implementation using a master–worker threading strategy. We apply a previously validated micro-architecture performance model for the application executing on the Cell/B.E. (based on our previous work in Monte Carlo performance models), that predicts overall CPI (cycles per instruction), and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures.
47

McNichols, John M., Eric J. Balster, William F. Turri, and Kerry L. Hill. "Design and Implementation of an Embedded NIOS II System for JPEG2000 Tier II Encoding." International Journal of Reconfigurable Computing 2013 (2013): 1–9. http://dx.doi.org/10.1155/2013/140234.

Abstract:
This paper presents a novel implementation of the JPEG2000 standard as a system on a chip (SoC). While most of the research in this field centers on acceleration of the EBCOT Tier I encoder, this work focuses on an embedded solution for EBCOT Tier II. Specifically, this paper proposes using an embedded softcore processor to perform Tier II processing as the back end of an encoding pipeline. The Altera NIOS II processor is chosen for the implementation and is coupled with existing embedded processing modules to realize a fully embedded JPEG2000 encoder. The design is synthesized on a Stratix IV FPGA and is shown to outperform other comparable SoC implementations by 39% in computation time.
48

Thakur, Garima, Harsh Sohal, and Shruti Jain. "A novel parallel prefix adder for optimized Radix-2 FFT processor." Multidimensional Systems and Signal Processing 32, no. 3 (March 15, 2021): 1041–63. http://dx.doi.org/10.1007/s11045-021-00772-1.

49

Tsay, Jong-Chuang. "Designing a Systolic Algorithm for Generating Well-Formed Parenthesis Strings." Parallel Processing Letters 14, no. 01 (March 2004): 83–97. http://dx.doi.org/10.1142/s0129626404001738.

Abstract:
A parenthesis string is a string of left and right parentheses. The string is well-formed when it consists of balanced pairs of left and right parentheses. This study presents a novel systolic algorithm for generating all the well-formed parenthesis strings in lexicographical order. The algorithm is cost-optimal and is run on a linear array of processors such that each well-formed parenthesis string can be generated in three time steps. The processor array is appropriate for VLSI implementation, since it has the features of modularity, regularity, and local connection.
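To make the problem statement concrete, the following minimal sketch enumerates well-formed parenthesis strings in lexicographical order. It is a plain sequential generator, not the systolic algorithm of the paper; it assumes the ordering in which "(" sorts before ")", and the name `well_formed` is hypothetical.

```python
from typing import Iterator, List


def well_formed(n: int) -> Iterator[str]:
    """Yield all well-formed strings of n parenthesis pairs.

    Strings come out in lexicographical order, assuming "(" < ")",
    because "(" is always tried before ")" at each position.
    """

    def extend(prefix: List[str], opened: int, closed: int) -> Iterator[str]:
        if closed == n:
            # All n pairs are matched, so the prefix is a complete string.
            yield "".join(prefix)
            return
        if opened < n:
            # A left parenthesis is allowed while pairs remain unopened.
            prefix.append("(")
            yield from extend(prefix, opened + 1, closed)
            prefix.pop()
        if closed < opened:
            # A right parenthesis is allowed only if it matches an open one.
            prefix.append(")")
            yield from extend(prefix, opened, closed + 1)
            prefix.pop()

    yield from extend([], 0, 0)


if __name__ == "__main__":
    for s in well_formed(3):
        print(s)
```

The number of strings produced for n pairs is the n-th Catalan number, which is a convenient sanity check for any generator of this kind.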
50

K, Periyarselvam, Saravanakumar G, and Anand M. "A Novel Architecture of Radix-3 Singlepath Delay Feedback (R3SDF) FFT Using MCSLA." Indonesian Journal of Electrical Engineering and Computer Science 10, no. 1 (April 1, 2018): 37. http://dx.doi.org/10.11591/ijeecs.v10.i1.pp37-42.

Abstract:
Fast Fourier transform (FFT) is widely used in digital signal processing and telecommunications, particularly in orthogonal frequency division multiplexing systems, to overcome the problems associated with orthogonal subcarriers. A new radix-3 FFT algorithm is introduced in this work. The DFT of length N can be realized from three DFT sequences, each of length N/3. The radix-3 algorithm reduces the number of multiplications required for realizing the DFT. A novel design of a radix-3 pipelined Single-path Delay Feedback (R3SDF) FFT using MCSLA is proposed in this paper. First, the pipelined radix-3 SDF FFT is designed. It occupies less area but has large power consumption and delay. To overcome these problems, a modified carry select adder structure is used to perform the additions, reducing power consumption and delay. Finally, the MCSLA is integrated into the radix-3 SDF FFT processor. The hardware complexity and execution time of the radix-3 FFT algorithm can be reduced compared with other FFTs.
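The statement that a length-N DFT can be built from three length-N/3 DFTs can be sketched in software as follows. This is a textbook decimation-in-time radix-3 recursion, assuming N is a power of 3 and the twiddle factor W_N = exp(-2πi/N); it does not model the pipelined SDF hardware or the MCSLA adder, and the function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) DFT, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix3(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-3 decimation-in-time FFT for len(x) a power of 3."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 3 != 0:
        raise ValueError("length must be a power of 3")
    m = n // 3
    # Three sub-sequences: samples at indices 3r, 3r+1, and 3r+2.
    a = fft_radix3(x[0::3])
    b = fft_radix3(x[1::3])
    c = fft_radix3(x[2::3])
    out = [0j] * n
    for k in range(n):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        # The sub-DFTs are periodic with period m, hence the index k % m.
        out[k] = a[k % m] + w * b[k % m] + w * w * c[k % m]
    return out


if __name__ == "__main__":
    signal = [complex(v) for v in range(9)]
    error = max(abs(p - q) for p, q in zip(fft_radix3(signal), dft(signal)))
    print(f"max deviation from direct DFT: {error:.2e}")
```

The recursion evaluates each output as a combination of three shorter transforms, which is the software counterpart of the radix-3 butterfly a hardware pipeline would implement stage by stage.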