Journal articles on the topic 'Network-on-chip, Dataflow Computing, Performance, Framework'


Consult the top 22 journal articles for your research on the topic 'Network-on-chip, Dataflow Computing, Performance, Framework.'


1

Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. "Survey of Deep Learning Accelerators for Edge and Emerging Computing." Electronics 13, no. 15 (2024): 2988. http://dx.doi.org/10.3390/electronics13152988.

Abstract:
The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms running on ubiquitous internet-connected smart devices, has created a high demand for AI computing on edge devices. This review studied commercially available edge processors and processors that are still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing-in-memory (PIM) architectures. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient at performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secure and personalized AI applications. This review found that PIM processors show significant energy efficiency and consume less power than dataflow and neuromorphic processors. A future direction for the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.
2

Fang, Juan, Sitong Liu, Shijian Liu, Yanjin Cheng, and Lu Yu. "Hybrid Network-on-Chip: An Application-Aware Framework for Big Data." Complexity 2018 (July 30, 2018): 1–11. http://dx.doi.org/10.1155/2018/1040869.

Abstract:
The burst growth of IoT and cloud computing demands exascale computing systems with high performance and low power consumption to process massive amounts of data. Modern system platforms based on fundamental requirements encounter a performance gap in chasing the exponential growth in data speed and volume. To narrow the gap, a heterogeneous design gives us a hint. A network-on-chip (NoC) introduces a packet-switched fabric for on-chip communication and has become the de facto many-core interconnection mechanism; it is a vital shared resource for multifarious applications and notably affects system energy efficiency. Among all the challenges in NoC, unawareness of application behaviors brings about considerable congestion, which wastes huge amounts of bandwidth and power on the chip. In this paper, we propose a hybrid NoC framework, combining buffered and bufferless NoCs, to make the NoC framework aware of applications’ performance demands. An optimized congestion control scheme is also devised to satisfy the energy-efficiency and fairness requirements of big data applications. We use a trace-driven simulator to model big data applications. Compared with the classical buffered NoC, the proposed hybrid NoC is able to significantly improve the performance of mixed applications by 17% on average and 24% at most, decrease the power consumption by 38%, and improve fairness by 13.3%.
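A minimal sketch (illustrative only, not the paper's actual policy) of how a hybrid framework might steer packets between the buffered and bufferless sub-networks; the thresholds and metric names are assumptions:

```python
def choose_subnetwork(congestion, deflection_rate, congestion_threshold=0.6,
                      deflection_threshold=0.25):
    """Pick the sub-NoC for a new packet in a hybrid buffered/bufferless mesh.

    Bufferless links are cheap while traffic is light, but deflections waste
    bandwidth once the network loads up, so heavily congested or
    deflection-prone traffic falls back to the buffered sub-network.
    All thresholds here are illustrative placeholders.
    """
    if congestion < congestion_threshold and deflection_rate < deflection_threshold:
        return "bufferless"   # low load: avoid buffer power and latency
    return "buffered"         # high load: guaranteed delivery, no deflections

# Light traffic rides the bufferless network; congested traffic is buffered.
print(choose_subnetwork(0.2, 0.05))  # bufferless
print(choose_subnetwork(0.8, 0.40))  # buffered
```

A real scheme would derive the congestion estimate from buffer occupancy or deflection counters at each router rather than a single scalar.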
3

Muhsen, Yousif, Nor Azura Husin, Maslina Binti Zolkepli, Noridayu Manshor, Ahmed Abbas Jasim Al-Hchaimi, and A. S. Albahri. "Routing Techniques in Network-On-Chip Based Multiprocessor-System-on-Chip for IOT: A Systematic Review." Iraqi Journal For Computer Science and Mathematics 5, no. 1 (2024): 181–204. http://dx.doi.org/10.52866/ijcsm.2024.05.01.014.

Abstract:
Routing techniques (RTs) play a critical role in modern computing systems that use network-on-chip (NoC) communication infrastructure within multiprocessor system-on-chip (MPSoC) platforms. RTs contribute greatly to the successful performance of NoC-based MPSoCs due to traffic congestion avoidance, quality-of-service assurance, fault handling and optimisation of power usage. This paper outlines our efforts to catalogue RTs, limitations, recommendations and key challenges associated with these RTs used in NoC-based MPSoC systems for the IoT domain. We utilized the PRISMA method to collect data from credible resources, including IEEE Xplore®, ScienceDirect, the Association for Computing Machinery and Web of Science. Out of the 906 research papers reviewed, only 51 were considered relevant to the investigation of NoC RTs. The study addresses issues related to NoC routing and suggests new approaches for in-package data negotiation. In addition, it gives an overview of the recent research on routing strategies and the numerous algorithms that can be used for NoC-based MPSoCs. The literature analysis addresses current obstacles and delineates potential future avenues, recommendations, and challenges, analyzing techniques that assess performance using metrics within the TCCM framework.
4

Lin, Yanru, Yanjun Zhang, and Xu Yang. "A Low Memory Requirement MobileNets Accelerator Based on FPGA for Auxiliary Medical Tasks." Bioengineering 10, no. 1 (2022): 28. http://dx.doi.org/10.3390/bioengineering10010028.

Abstract:
Convolutional neural networks (CNNs) have been widely applied to medical tasks because they can achieve high accuracy in many fields by using a large number of parameters and operations. However, many applications designed for auxiliary checks or assistance need to be deployed on portable devices, where the huge number of operations and parameters of a standard CNN can become an obstruction. MobileNet adopts a depthwise separable convolution to replace the standard convolution, which can greatly reduce the number of operations and parameters while maintaining a relatively high accuracy. Such highly structured models are very suitable for FPGA implementation in order to further reduce resource requirements and improve efficiency. Many other implementations focus on performance more than on resource requirements, because MobileNets has already reduced both parameters and operations and obtained significant results. However, because many small devices have only limited resources, they cannot run MobileNet-like efficient networks in a normal way, and there are still many auxiliary medical applications that require a high-performance network running in real time to meet their requirements. Hence, we need to devise a specific accelerator structure to further reduce memory and other resource requirements while running MobileNet-like efficient networks. In this paper, a MobileNet accelerator is proposed to minimize the on-chip memory capacity and the amount of data transferred between on-chip and off-chip memory. We propose two configurable computing modules, a Pointwise Convolution Accelerator and a Depthwise Convolution Accelerator, to parallelize the network and reduce the memory requirement with a specific dataflow model. At the same time, a new cache usage method is also proposed to further reduce the use of on-chip memory. We implemented the accelerator on the Xilinx XC7Z020, deployed MobileNetV2 on it, and achieved 70.94 FPS with 524.25 KB of on-chip memory usage at 150 MHz.
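The depthwise separable substitution at the heart of MobileNet has a simple cost model; a short sketch of the standard operation-count comparison (textbook formulas, not figures taken from this paper):

```python
def conv_macs(k, c_in, c_out, h, w):
    """MACs for a standard k x k convolution producing an h x w output map."""
    return k * k * c_in * c_out * h * w

def depthwise_separable_macs(k, c_in, c_out, h, w):
    """MACs for a depthwise (k x k per channel) + pointwise (1 x 1) pair."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

# A typical mid-network layer: 3x3 kernel, 256 -> 256 channels, 14x14 map.
std = conv_macs(3, 256, 256, 14, 14)
sep = depthwise_separable_macs(3, 256, 256, 14, 14)
print(f"reduction: {std / sep:.1f}x")  # ~ 1 / (1/c_out + 1/k^2), here 8.7x
```

The reduction factor 1/(1/c_out + 1/k²) is why the substitution shrinks both parameters and operations by nearly an order of magnitude for typical channel counts.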
5

Sowmya, B. J., and S. Jamuna. "Design of Area Efficient Network-On-Chip Router: A Comprehensive Review." International Research Journal on Advanced Engineering Hub (IRJAEH) 2, no. 07 (2024): 1895–908. http://dx.doi.org/10.47392/irjaeh.2024.0260.

Abstract:
The growing number of uses for cutting-edge technologies has driven further growth in a single chip's computational capacity, and many applications now target a single chip for their computing resources. As a result, interconnecting the IP cores becomes another difficult chore. In many-core System-on-Chips (SoCs), the Network-on-Chip (NoC) has emerged as the preferred on-chip connectivity option, created as a cutting-edge framework for the networks inside the SoC. Modern multiprocessor architectures benefit from a NoC architecture as their communication backbone. The most important components of any network structure are its topologies, routing algorithms, and router architectures. NoCs route traffic through the routers at each node. Circuit complexity, high critical-path latency, resource usage, timing, and power efficiency are the primary shortcomings of conventional NoC router architectures. It has been difficult to build a high-performance, low-latency NoC with little area overhead. This paper surveys previous methods and strategies for NoC router topologies and studies the general router architecture and its components. The analysis works toward a low-latency, low-power, high-performance NoC router design that can be employed with a wide range of FPGA families. In the current work, we design a modified four-port router with the goals of low area and high-performance operation.
6

Sabah, Yousri. "Quantum-Inspired Temporal Synchronization in Dynamic Mesh Networks: A Non-Local Approach to Latency Optimization." Wasit Journal for Pure Sciences 4, no. 1 (2025): 86–93. https://doi.org/10.31185/wjps.710.

Abstract:
This paper presents a novel method for achieving temporal synchronization in Network-on-Chip (NoC) architectures, using optimization techniques derived from quantum mechanics. We provide a non-local temporal coordination framework to optimize network latency in dynamic mesh networks using quantum principles such as entanglement and superposition. A specialized router design using quantum-inspired control units incorporates the Quantum-Inspired Temporal Coordination Algorithm (QTCA) and Non-Local State Synchronization Protocol (NSSP), which are essential components of the proposed architecture. The experimental results indicate that the 16x16 mesh network significantly outperforms conventional routing strategies. Latency is diminished by 31.2%, the network saturation threshold is enhanced by 37.8%, and packet loss is decreased by 76.3%. Notwithstanding a minor 8.2% increase in logic overhead and a 5.7% rise in power usage, the framework sustains robust phase coherence (0.92 local, 0.87 non-local). The results demonstrate that next-generation NoC designs might gain from temporal synchronization influenced by quantum computing, particularly in addressing performance and scalability challenges in complex multi-core systems.
7

Sheng, Huayi, and Muhammad Shemyal Nisar. "Simulating an Integrated Photonic Image Classifier for Diffractive Neural Networks." Micromachines 15, no. 1 (2023): 50. http://dx.doi.org/10.3390/mi15010050.

Abstract:
The slowdown of Moore’s law and the existence of the “von Neumann bottleneck” has led to electronic-based computing systems under von Neumann’s architecture being unable to meet the fast-growing demand for artificial intelligence computing. However, all-optical diffractive neural networks provide a possible solution to this challenge. They can outperform conventional silicon-based electronic neural networks due to the significantly higher speed of the propagation of optical signals (≈10⁸ m·s⁻¹) compared to electrical signals (≈10⁵ m·s⁻¹), their parallelism in nature, and their low power consumption. The integrated diffractive deep neural network (ID2NN) uses an on-chip fully passive photonic approach to achieve the functionality of neural networks (matrix–vector operations) and can be fabricated via the CMOS process, which is technologically more amenable to implementing an artificial intelligence processor. In this paper, we present a detailed design framework for the integrated diffractive deep neural network and corresponding silicon-on-insulator integration implementation through Python-based simulations. The performance of our proposed ID2NN was evaluated by solving image classification problems using the MNIST dataset.
8

Apoorva, Reddy Proddutoori. "Optimistic Workload Configuration of Parallel Matrices On CPU." European Journal of Advances in Engineering and Technology 8, no. 8 (2021): 66–70. https://doi.org/10.5281/zenodo.12770771.

Abstract:
This study compares and uses different feature parallelization techniques, the Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT), for the classification of matrices. A Convolutional Neural Network (CNN) is used to determine the classifications; in this setting, the CNN is a unique technique that can be effectively used as a classifier. This study helps to extract features in the most efficient way, with less computing time, in real life. The framework provides comprehensive and flexible APIs that enable efficient implementation of multi-threaded applications. To meet the real-time performance requirements of these security applications, it is imperative to develop a fast parallelization technique for the algorithm. In this paper, we introduce a new memory-efficient parallelization technique that efficiently places and stores input text data and reference data in an on-chip shared memory and the CPU texture cache. For better performance while reducing the power ratio, we extend the parallelization technique to support the other major cores of the SoC. OpenCL, a heterogeneous parallel programming model, is used to communicate between the CPU and other macro blocks.
9

Sui, Xuefu, Qunbo Lv, Liangjie Zhi, et al. "A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation." Sensors 23, no. 2 (2023): 824. http://dx.doi.org/10.3390/s23020824.

Abstract:
To address the problems of large storage requirements, computational pressure, untimely data supply from off-chip memory, and low computational efficiency during hardware deployment due to the large number of convolutional neural network (CNN) parameters, we developed an innovative hardware-friendly CNN pruning method called KRP, which prunes the convolutional kernel on a row scale. A new retraining method based on LR tracking was used to obtain a CNN model with both a high pruning rate and accuracy. Furthermore, we designed a high-performance convolutional computation module on the FPGA platform to help deploy KRP pruning models. The results of comparative experiments on CNNs such as VGG and ResNet showed that KRP has higher accuracy than most pruning methods. At the same time, the KRP method, together with the GSNQ quantization method developed in our previous study, forms a high-precision hardware-friendly network compression framework that can achieve “lossless” CNN compression with a 27× reduction in network model storage. The results of the comparative experiments on the FPGA showed that the KRP pruning method not only requires much less storage space, but also helps to reduce the on-chip hardware resource consumption by more than half and effectively improves the parallelism of the model in FPGAs, with strong hardware friendliness. This study provides more ideas for the application of CNNs in the field of edge computing.
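Row-scale kernel pruning can be illustrated with a small sketch; the L1-norm selection criterion used here is a common heuristic and an assumption, not necessarily KRP's exact rule:

```python
def prune_kernel_rows(kernel, keep_rows):
    """Zero out the rows of a convolutional kernel with the smallest L1 norms.

    Row-level pruning keeps a regular structure that hardware can exploit:
    whole kernel rows are skipped instead of scattered individual weights.
    `keep_rows` is the number of rows to retain; selecting survivors by
    L1 norm is a standard heuristic, assumed here for illustration.
    """
    norms = [sum(abs(w) for w in row) for row in kernel]
    keep = set(sorted(range(len(kernel)), key=lambda i: -norms[i])[:keep_rows])
    return [row if i in keep else [0.0] * len(row) for i, row in enumerate(kernel)]

k = [[0.5, -0.2, 0.1],
     [0.01, 0.02, -0.01],   # smallest L1 norm: this row gets pruned
     [-0.3, 0.4, 0.2]]
pruned = prune_kernel_rows(k, keep_rows=2)
print(pruned[1])  # [0.0, 0.0, 0.0]
```

In a deployed flow the zeroed rows would then be skipped entirely by the FPGA compute module, which is where the resource savings come from.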
10

Chen, Hui, Zihao Zhang, Peng Chen, Xiangzhong Luo, Shiqing Li, and Weichen Liu. "MARCO: A High-performance Task Mapping and Routing Co-optimization Framework for Point-to-Point NoC-based Heterogeneous Computing Systems." ACM Transactions on Embedded Computing Systems 20, no. 5s (2021): 1–21. http://dx.doi.org/10.1145/3476985.

Abstract:
Heterogeneous computing systems (HCSs), which consist of various processing elements (PEs) that vary in their processing ability, are usually facilitated by the network-on-chip (NoC) to interconnect their components. The emerging point-to-point NoCs, which support single-cycle multi-hop transmission, reduce or eliminate the latency dependence on distance, addressing the scalability concern raised by high latency for long-distance transmission and enlarging the design space of the routing algorithm to search non-shortest paths. For such point-to-point NoC-based HCSs, resource management strategies managed by compilers, schedulers, or controllers, e.g., mapping and routing, are complicated for the following reasons: (i) Due to the heterogeneity, mapping and routing need to optimize computation and communication concurrently (for homogeneous computing systems, only communication). (ii) Conducting mapping and routing consecutively cannot minimize the schedule length in most cases, since the PEs with high processing ability may be located in a crowded area and suffer from high resource contention overhead. (iii) Since changing the mapping selection of one task reconstructs the whole routing design space, the exploration of the mapping and routing design space is challenging. Therefore, in this work, we propose MARCO, the mapping and routing co-optimization framework, to decrease the schedule length of applications on point-to-point NoC-based HCSs. Specifically, we revise the tabu search to explore the design space and evaluate the quality of mapping and routing. An advanced reinforcement learning (RL) algorithm, i.e., advantage actor-critic, is adopted to efficiently compute paths. We perform extensive experiments on various real applications, which demonstrate that MARCO achieves a remarkable performance improvement in terms of schedule length (+44.94% to +50.18%) when compared with the state-of-the-art mapping and routing co-optimization algorithm for homogeneous computing systems. We also compare MARCO with different combinations of state-of-the-art mapping and routing approaches.
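The design-space exploration in the abstract rests on tabu search; a generic skeleton of that metaheuristic (MARCO's mapping encoding, neighborhood, and cost model are far richer than this sketch, and the names here are illustrative) looks like:

```python
def tabu_search(initial, neighbors, cost, iters=100, tabu_len=10):
    """Generic tabu-search skeleton of the kind a mapping optimizer revises.

    Keeps a short-term memory (the tabu list) of recently visited solutions
    so the search can escape local minima instead of cycling back to them.
    `neighbors` generates candidate moves; `cost` scores a solution.
    """
    best = current = initial
    tabu = [initial]
    for _ in range(iters):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break                      # entire neighborhood is tabu
        current = min(candidates, key=cost)   # best non-tabu move, even if worse
        tabu.append(current)
        tabu = tabu[-tabu_len:]        # forget old moves after tabu_len steps
        if cost(current) < cost(best):
            best = current
    return best

# Toy use: minimize (x - 7)^2 over integers, stepping +/-1 per move.
best = tabu_search(0, lambda x: [x - 1, x + 1], lambda x: (x - 7) ** 2)
print(best)  # 7
```

For task mapping, a "solution" would be a task-to-PE assignment, a neighbor would remap one task, and the cost would be the evaluated schedule length.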
11

Deyannis, Dimitris, Eva Papadogiannaki, Grigorios Chrysos, Konstantinos Georgopoulos, and Sotiris Ioannidis. "The Diversification and Enhancement of an IDS Scheme for the Cybersecurity Needs of Modern Supply Chains." Electronics 11, no. 13 (2022): 17. https://doi.org/10.3390/electronics11131944.

Abstract:
Despite the tremendous socioeconomic importance of supply chains (SCs), security officers and operators are faced with no easy and integrated way of protecting their critical, and interconnected, infrastructures from cyber-attacks. As a result, solutions and methodologies that support the detection of malicious activity on SCs are constantly researched and proposed. Hence, this work presents the implementation of a low-cost reconfigurable intrusion detection system (IDS), on the edge, that can be easily integrated into SC networks, thereby elevating the featured levels of security. Specifically, the proposed system offers real-time cybersecurity intrusion detection over high-speed networks and services by offloading elements of the security check workloads onto dedicated reconfigurable hardware. Our solution uses a novel framework that implements the Aho–Corasick algorithm on the reconfigurable fabric of a multi-processor system-on-chip (MPSoC), which supports parallel matching for multiple network packet patterns. The initial performance evaluation of this proof-of-concept shows that it holds the potential to outperform existing software-based solutions while unburdening SC nodes from demanding cybersecurity check workloads. The proposed system's performance and efficiency were evaluated in a real-life environment in the context of the European Union’s Horizon 2020 research and innovation program, i.e., CYRENE.
12

Smaragdos, Georgios, Georgios Chatzikonstantis, Rahul Kukreja, et al. "BrainFrame: A node-level heterogeneous accelerator platform for neuron simulations." December 13, 2016. https://doi.org/10.1088/1741-2552/aa7fc5.

Abstract:
Objective: The advent of High-Performance Computing (HPC) in recent years has led to its increasing use in brain study through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modelling field does not permit a single acceleration (or homogeneous) platform to effectively address the complete array of modelling requirements. Approach: In this paper we propose and build BrainFrame, a heterogeneous acceleration platform incorporating three distinct acceleration technologies: a Dataflow Engine, a Xeon Phi and a GP-GPU. The PyNN framework is also integrated into the platform. As a challenging proof of concept, we analyse the performance of BrainFrame on different instances of a state-of-the-art neuron model, representing the Inferior-Olivary Nucleus using a biophysically meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal-network dimensions but also different network-connectivity circumstances that can drastically change HPC workload characteristics. Main results: The combined use of HPC fabrics demonstrated that BrainFrame is better able to cope with the modelling diversity encountered. Our performance analysis shows clearly that the model directly affects performance and that all three technologies are required to cope with all the model use cases. Significance: The BrainFrame framework is designed to transparently configure and select the appropriate back-end accelerator technology for use per simulation run. The PyNN integration provides a familiar bridge to the vast array of modelling work already conducted.
Additionally, it gives a clear roadmap for extending the platform support beyond the proof-of-concept, with improved usability and directly useful features to the computational-neuroscience community, paving the way for wider adoption.
13

Mazumdar, Somnath, Alberto Scionti, Stéphane Zuckerman, and Antoni Portero. "NoC-based hardware software co-design framework for dataflow thread management." Journal of Supercomputing, May 11, 2023. http://dx.doi.org/10.1007/s11227-023-05335-8.

Abstract:
Applications running in a large and complex manycore system can significantly benefit from adopting the dataflow model of computation. In a dataflow execution environment, a thread can run only if all its required inputs are available. While the potential benefits are large, it is not trivial to improve resource utilization and energy efficiency by focusing on dataflow thread execution models (i.e., the ways specifying how the threads adhering to a dataflow model of computation execute on a given compute/communication architecture). This paper proposes and implements a hardware-software co-design-based dataflow thread management framework. It works at the Network-on-Chip (NoC) level and consists of three stages. The first stage focuses on a fast and effective thread distribution policy. The next stage proposes an approach that adds reconfigurability to a 2D mesh NoC via customized instructions to manage the dataflow thread distribution. Finally, a 2D mesh and ring-based hybrid NoC is proposed for better scalability and higher performance. This work can be considered a primary reference framework from which extensions can be carried out.
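The dataflow firing rule this abstract builds on ("a thread can run only if all its required inputs are available") can be sketched as follows; the class and its fields are illustrative, not the framework's API:

```python
class DataflowThread:
    """A thread under the dataflow model of computation: it becomes runnable
    only once every declared input has arrived. Names are illustrative."""

    def __init__(self, name, n_inputs):
        self.name = name
        self.pending = n_inputs   # inputs still missing before the thread fires

    def deliver_input(self):
        """Record one arriving input; return True once the thread can fire."""
        self.pending -= 1
        return self.pending == 0

# A two-input thread fires only after the second input arrives.
t = DataflowThread("t0", n_inputs=2)
print(t.deliver_input())  # False
print(t.deliver_input())  # True
```

In the framework described above, this readiness check is what the NoC-level thread distribution hardware tracks, rather than the software object shown here.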
14

Xu, Jinwei, Jingfei Jiang, Lei Gao, Xifu Qian, and Yong Dou. "SPDFA: A Novel Dataflow Fusion Sparse Deep Neural Network Accelerator." ACM Transactions on Reconfigurable Technology and Systems, May 30, 2025. https://doi.org/10.1145/3737462.

Abstract:
Unstructured sparse pruning significantly reduces the computational and parametric complexities of deep neural network models. Nevertheless, the highly irregular nature of sparse models limits its performance and efficiency on traditional computing platforms, thereby prompting the development of specialized hardware solutions. To improve computational efficiency, we introduce the Sparse Dataflow Fusion Accelerator (SPDFA), a specialized architecture meticulously designed for sparse deep neural networks. Firstly, we present a non-blocking data distribution-computing engine that integrates inner product and column product. This engine boosts computational efficiency by decomposing matrix multiplication and convolution into rectangular matrix-vector multiplications. Secondly, we implement a computation array to further exploit the parallelism, and design an on-chip buffer structure that supports multi-line memory access mode. Lastly, to bolster the adaptability of our accelerator, we propose an innovative macroinstruction set coupled with a micro-kernel scheme. Furthermore, we refine the macroinstruction issue strategy, thereby further enhancing computational efficiency. Our evaluation results demonstrate that SPDFA achieves an average 1.29×–2.38× improvement in computational efficiency compared to the state-of-the-art SpMM accelerators when applied to unstructured sparse deep neural network models. Furthermore, its performance outperforms existing sparse neural network accelerators by a factor of 1.03×–1.83×. Additionally, SPDFA exhibits excellent scalability with a scaling efficiency exceeding 80%.
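"Scaling efficiency" here is the standard ratio of achieved speedup to the resource scaling factor; a one-liner makes the 80% figure concrete (the example numbers below are illustrative, not measurements from the paper):

```python
def scaling_efficiency(speedup, scale_factor):
    """Fraction of ideal linear speedup retained when resources are scaled.

    E.g. a 4x-larger array delivering a 3.3x speedup scales at 82.5%,
    which sits in the ">80%" regime the abstract reports. These inputs
    are illustrative placeholders.
    """
    return speedup / scale_factor

print(f"{scaling_efficiency(3.3, 4):.1%}")  # 82.5%
```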
15

Yu, Miao, Tingting Xiang, Venkata Pavan Kumar Miriyala, and Trevor E. Carlson. "Multiply-and-Fire (MnF): An Event-driven Sparse Neural Network Accelerator." ACM Transactions on Architecture and Code Optimization, October 27, 2023. http://dx.doi.org/10.1145/3630255.

Abstract:
Deep neural network inference has become a vital workload for many systems, from edge-based computing to data centers. To reduce the performance and power requirements for DNNs running on these systems, pruning is commonly used as a way to maintain most of the accuracy of the system while significantly reducing the workload requirements. Unfortunately, accelerators designed for unstructured pruning typically employ expensive methods to either determine non-zero activation-weight pairings or reorder computation. These methods require additional storage and memory accesses compared to the more regular data access patterns seen in structurally pruned models. However, even existing works that focus on the more regular access patterns seen in structured pruning continue to suffer from inefficient designs, which either ignore or expensively handle activation sparsity, leading to low performance. To address these inefficiencies, we leverage structured pruning and propose the multiply-and-fire (MnF) technique, which aims to solve these problems in three ways: (a) the use of a novel event-driven dataflow that naturally exploits activation sparsity without complex, high-overhead logic; (b) an optimized dataflow that takes an activation-centric approach, which aims to maximize the reuse of activation data in computation and ensures the data are only fetched once from off-chip global and on-chip local memory; (c) based on the proposed event-driven dataflow, we develop an energy-efficient, high-performance sparsity-aware DNN accelerator. Our results show that our MnF accelerator achieves a significant improvement across a number of modern benchmarks and presents a new direction to enable highly efficient AI inference for both CNN and MLP workloads. Overall, this work achieves a geometric mean of 11.2× higher energy efficiency and a 1.41× speedup compared to a state-of-the-art sparsity-aware accelerator.
16

Behera, Debasis, and Suvendu Naraya Mishra. "Enhancing Network-on-Chip Performance: Advanced MMU Techniques for Lower Latency and Higher Bandwidth." International Journal of Computational and Experimental Science and Engineering 11, no. 2 (2025). https://doi.org/10.22399/ijcesen.2556.

Abstract:
With the increasing complexity of high-performance computing systems, Network-on-Chip (NoC) architectures face critical performance bottlenecks due to memory management latency and inefficient bandwidth utilization. This research presents a novel, mathematically rigorous framework for optimizing NoC performance through advanced Memory Management Unit (MMU) techniques, specifically Translation Lookaside Buffer (TLB) caching and hybrid address mapping. The study develops symbolic models of latency and bandwidth as optimization functions, accounting for memory translation delays and dynamic workload patterns. Using discrete-event simulation based on analytically defined traffic and MMU behavior assumptions, we evaluate performance across various configurations. Our results indicate that hybrid address mapping yields up to 30.7% latency reduction and 32% bandwidth efficiency gain, while TLB caching provides 26.1% latency improvement and 27.3% increased throughput. These findings, derived under theoretical constraints, demonstrate the potential of MMU-level optimizations for significantly enhancing NoC system performance. The proposed model serves as a foundational tool for future adaptive and scalable memory management strategies in edge computing, real-time systems, and data-intensive applications.
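The latency benefit of TLB caching follows from a textbook average-latency model; the sketch below is illustrative, not the paper's symbolic model, and all cycle counts are assumptions:

```python
def effective_access_latency(tlb_hit_rate, t_tlb, t_walk, t_mem):
    """Average memory-access latency under a simple TLB model.

    A hit costs the TLB lookup plus the memory access; a miss adds a
    page-table walk. This is the standard weighted-average model used to
    show why a higher TLB hit rate cuts NoC memory latency.
    """
    hit = t_tlb + t_mem
    miss = t_tlb + t_walk + t_mem
    return tlb_hit_rate * hit + (1 - tlb_hit_rate) * miss

# Raising the hit rate from 0.75 to 0.875 halves the average walk penalty.
print(effective_access_latency(0.75, 1, 30, 10))   # 18.5 cycles
print(effective_access_latency(0.875, 1, 30, 10))  # 14.75 cycles
```

The same weighted-average structure extends to multi-level translation caches; each extra level just adds another hit/miss term.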
17

Shao, Yingzhao, Jincheng Shang, Yunsong Li, et al. "A Configurable Accelerator for CNN‐Based Remote Sensing Object Detection on FPGAs." IET Computers & Digital Techniques 2024, no. 1 (2024). http://dx.doi.org/10.1049/2024/4415342.

Abstract:
Convolutional neural networks (CNNs) have been widely used in satellite remote sensing. However, satellites in orbit with limited resources and power consumption cannot meet the storage and computing power requirements of current million-scale artificial intelligence models. This paper proposes a new-generation, highly flexible, and intelligent CNN hardware accelerator for satellite remote sensing, making its computing carrier more lightweight and efficient. A data quantization scheme for INT16 or INT8 is designed based on the idea of dynamic fixed-point numbers and is applied to different scenarios. The operation mode of the systolic array is divided into channel blocks, and the calculation method is optimized to increase the utilization of on-chip computing resources and enhance calculation efficiency. An RTL-level CNN field-programmable gate array (FPGA) accelerator with microinstruction-sequence-scheduled data flow is then designed. The hardware framework is built upon the Xilinx VC709. The results show that, under INT16 or INT8 precision, the system achieves remarkable throughput in most convolutional layers of the network, with an average performance of 153.14 giga operations per second (GOPS) or 301.52 GOPS, which is close to the system’s peak performance, taking full advantage of the platform’s parallel computing capabilities.
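Dynamic fixed-point quantization picks a per-layer fractional length from the data range; a generic INT8-style sketch (the accelerator's exact format and rounding scheme are not specified here, so these choices are assumptions):

```python
import math

def dynamic_fixed_point_quantize(values, total_bits=8):
    """Quantize a tensor with a shared, data-dependent fractional length.

    Dynamic fixed point allocates integer bits to cover max(|x|) and spends
    the remaining bits (after one sign bit) on the fraction. Values outside
    the representable range saturate to the integer limits.
    """
    max_abs = max(abs(v) for v in values)
    int_bits = max(0, math.ceil(math.log2(max_abs))) if max_abs > 0 else 0
    frac_bits = total_bits - 1 - int_bits        # one sign bit
    scale = 2 ** frac_bits
    lo, hi = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = [min(hi, max(lo, round(v * scale))) for v in values]
    return q, frac_bits

# max |x| = 3.0 needs 2 integer bits, leaving 5 fractional bits (scale 32).
print(dynamic_fixed_point_quantize([0.5, -1.25, 3.0]))  # ([16, -40, 96], 5)
```

Per-layer fractional lengths are what let a single INT8 datapath cover layers with very different dynamic ranges.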
18

Ma, Fuqi, Bo Wang, Min Li, et al. "Edge Intelligent Perception Method for Power Grid Icing Condition Based on Multi-Scale Feature Fusion Target Detection and Model Quantization." Frontiers in Energy Research 9 (October 4, 2021). http://dx.doi.org/10.3389/fenrg.2021.754335.

Full text
Abstract:
Insulators are important equipment on power transmission lines, and insulator icing can seriously affect their stable operation, so monitoring the icing condition of insulators is of great significance for the safety and stability of the power system. This paper therefore proposes a lightweight intelligent recognition method of insulator icing thickness for front-end ice-monitoring devices. In this method, a residual network (ResNet) and a feature pyramid network (FPN) are fused to construct a multi-scale feature-extraction framework, so that shallow and deep features are combined to reduce information loss and improve target detection accuracy. Then, a fully convolutional network (FCN) is used to classify and regress the iced insulator, realizing high-precision identification of icing thickness. Finally, the proposed method is compressed by model quantization to reduce the size and parameters of the model so that it fits the icing-monitoring terminal with limited computing resources, and its performance is verified and compared with other classical methods on an edge intelligence chip.
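The FPN-style fusion of shallow and deep features described here can be sketched in miniature. This is a generic top-down merge step, not the paper's network: feature maps are plain 2-D lists, upsampling is nearest-neighbour, and the lateral 1x1 convolution is omitted for brevity.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

def fpn_merge(deep, lateral):
    """One top-down FPN step: upsample the deeper (coarser) map and add the
    lateral (shallower, finer) map element-wise."""
    up = upsample2x(deep)
    return [[u + l for u, l in zip(ur, lr)] for ur, lr in zip(up, lateral)]
```

In a real FPN the merged map would then pass through a 3x3 convolution to smooth aliasing from the upsampling.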
19

Raju, Bandi, and Shoban Mude. "Neural Architecture Search (NAS) for Auto-Configurable SoC Designs." International Journal For Multidisciplinary Research 7, no. 3 (2025). https://doi.org/10.36948/ijfmr.2025.v07i03.49273.

Full text
Abstract:
The increasing complexity of System-on-Chip (SoC) designs necessitates advanced techniques for automating the configuration of heterogeneous computing resources. Neural Architecture Search (NAS), a subdomain of AutoML, has emerged as a powerful tool to optimize deep neural network architectures. In this paper, we propose a novel framework that integrates NAS into the design flow of Auto-Configurable SoC architectures. By combining design-space exploration (DSE) with hardware-aware NAS algorithms, the proposed approach enables automated customization of SoC components such as accelerators, memory hierarchy, and interconnects. Experimental results demonstrate that the NAS-driven SoC design achieves significant improvements in power, performance, and area (PPA) trade-offs compared to traditional hand-crafted configurations.
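The design-space exploration loop this abstract describes can be sketched with a toy objective. The design space, parameter names, and PPA proxy below are entirely hypothetical; a real hardware-aware NAS would score candidates with trained predictors or cycle-accurate simulation rather than a closed-form ratio.

```python
import itertools

# Hypothetical, tiny design space for an auto-configurable SoC
DESIGN_SPACE = {
    "pe_array": [8, 16, 32],     # processing elements in the accelerator
    "sram_kb": [128, 256, 512],  # on-chip buffer size
    "noc_width": [32, 64],       # interconnect flit width (bits)
}

def ppa_score(cfg):
    """Toy hardware-aware objective: reward a throughput proxy, penalise
    area/power proxies. Coefficients are illustrative only."""
    throughput = cfg["pe_array"] * cfg["noc_width"]
    area = cfg["pe_array"] * 2 + cfg["sram_kb"] / 64 + cfg["noc_width"] / 8
    return throughput / area

def exhaustive_dse():
    """Enumerate every configuration and return the best-scoring one."""
    keys = list(DESIGN_SPACE)
    return max(
        (dict(zip(keys, vals)) for vals in itertools.product(*DESIGN_SPACE.values())),
        key=ppa_score,
    )
```

Exhaustive enumeration only works for spaces this small; NAS methods replace it with search strategies (evolutionary, gradient-based, RL) over much larger spaces.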
20

Wang, Song, Qiushuang Yu, Tiantian Xie, Cheng Ma, and Jing Pei. "Approaching the mapping limit with closed-loop mapping strategy for deploying neural networks on neuromorphic hardware." Frontiers in Neuroscience 17 (May 18, 2023). http://dx.doi.org/10.3389/fnins.2023.1168864.

Full text
Abstract:
The decentralized manycore architecture is broadly adopted by neuromorphic chips for its high computing parallelism and memory locality. However, the fragmented memories and decentralized execution make it hard to deploy neural network models onto neuromorphic hardware with high resource utilization and processing efficiency. There are usually two stages during model deployment: one is the logical mapping, which partitions parameters and computations into small slices and allocates each slice to a single core with limited resources; the other is the physical mapping, which places each logical core at a physical location on the chip. In this work, we propose, for the first time, the mapping limit concept, which points out the upper limit of resource saving in logical and physical mapping. Furthermore, we propose a closed-loop mapping strategy with an asynchronous 4D model partition for logical mapping and a Hamilton loop algorithm (HLA) for physical mapping. We implement the mapping methods on our state-of-the-art neuromorphic chip, TianjicX. Extensive experiments demonstrate the superior performance of our mapping methods, which not only outperform existing methods but also approach the mapping limit. We believe the mapping limit concept and the closed-loop mapping strategy can help build a general and efficient mapping framework for neuromorphic hardware.
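The idea behind Hamiltonian-path physical mapping can be illustrated with a simple sketch. This is not the paper's HLA: it just places consecutive logical cores along a boustrophedon ("snake") path over a 2-D mesh, which guarantees every pair of neighbouring logical cores lands on physically adjacent nodes.

```python
def snake_order(rows, cols):
    """A Hamiltonian path over a rows x cols mesh: left-to-right on even rows,
    right-to-left on odd rows."""
    path = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in cs)
    return path

def place_cores(num_cores, rows, cols):
    """Map logical cores 0..n-1 onto consecutive mesh nodes along the path."""
    path = snake_order(rows, cols)
    assert num_cores <= len(path), "mesh too small for this many cores"
    return {core: path[core] for core in range(num_cores)}

def manhattan(a, b):
    """Hop distance between two mesh coordinates under XY routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])
```

Because the path visits every node exactly once with unit steps, chain-structured traffic between logical neighbours always travels exactly one hop; richer communication graphs would need the closed-loop optimization the paper proposes.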
21

Lin, Wei-Ting, Hsiang-Yun Cheng, Chia-Lin Yang, et al. "DL-RSIM: A Reliability and Deployment Strategy Simulation Framework for ReRAM-based CNN Accelerators." ACM Transactions on Embedded Computing Systems, January 31, 2022. http://dx.doi.org/10.1145/3507639.

Full text
Abstract:
Memristor-based deep learning accelerators provide a promising solution to improve the energy efficiency of neuromorphic computing systems. However, the electrical properties and crossbar structure of memristors make these accelerators error-prone. In addition, due to hardware constraints, the way neural network models are deployed on memristor crossbar arrays affects the computation parallelism and communication overheads. To enable reliable and energy-efficient memristor-based accelerators, a simulation platform is needed to precisely analyze the impact of non-ideal circuit/device properties on the inference accuracy and the influence of different deployment strategies on performance and energy consumption. In this paper, we propose a flexible simulation framework, DL-RSIM, to tackle this challenge. A rich set of reliability impact factors and deployment strategies are explored by DL-RSIM, and it can be incorporated with any deep learning neural network implemented in TensorFlow. Using several representative convolutional neural networks as case studies, we show that DL-RSIM can guide chip designers to choose a reliability-friendly design option and energy-efficient deployment strategies and develop optimization techniques accordingly.
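The kind of reliability analysis this abstract describes can be sketched with a toy error-injection model. This is not DL-RSIM: it only perturbs each stored weight with Gaussian device variation during a single analog multiply-accumulate and measures the resulting relative error; the noise magnitude and the Gaussian model are assumptions.

```python
import random

def crossbar_mac(weights, inputs, sigma=0.0, rng=None):
    """Analog MAC on one memristor column: each stored weight (conductance)
    is perturbed by zero-mean Gaussian device variation of std dev sigma."""
    rng = rng or random.Random(0)
    return sum((w + rng.gauss(0.0, sigma)) * x for w, x in zip(weights, inputs))

def relative_error(weights, inputs, sigma, trials=200):
    """Monte-Carlo estimate of the mean relative output error at a given
    variation level (assumes the ideal output is nonzero)."""
    ideal = crossbar_mac(weights, inputs)
    rng = random.Random(42)
    errs = [abs(crossbar_mac(weights, inputs, sigma, rng) - ideal) / abs(ideal)
            for _ in range(trials)]
    return sum(errs) / trials
```

A simulator like DL-RSIM extends this idea to full networks, layering many non-ideal factors (stuck-at faults, IR drop, ADC quantization) on top of per-device variation.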
22

Boikynia, Artur O., Nikita S. Tkachenko, Yuriy V. Didenko, Ostap O. Oliinyk, and Dmitry D. Tatarchuk. "Investigation of Electrical Signals Transmission through Light-Induced Conductive Channels on the Surface of CdS Single Crystal." Microsystems, Electronics and Acoustics 29, no. 2 (2024). http://dx.doi.org/10.20535/2523-4455.mea.304564.

Full text
Abstract:
Further development of information technologies hinges on innovations in the electronic components sector, particularly in enhancing electronic communication devices. This involves creating dynamic interconnects—electrically conductive channels that can be configured on-demand within chip circuitry to overcome the "tyranny of interconnects," which limits electronic systems due to the fixed nature of conventional interconnects. This paper presents experimental verification of transmitting information through photoconductive channels formed on a photosensitive cadmium sulfide (CdS) semiconductor single crystal using optical irradiation. By directing a focused light beam to specific areas of the CdS crystal, localized conductivity is induced, allowing for the dynamic formation of conductive channels. This method's efficacy in real-time signal transmission validates the theoretical framework and suggests new possibilities for semiconductor technology. The integration of dynamic interconnects could revolutionize communication systems by enhancing device efficiency and processing capabilities. This technology could lead to more complex electronic architectures needed in high-speed computing and advanced telecommunications. Additionally, this approach has potential applications in optoelectronics, improving device interaction with light. Dynamic interconnects could enhance solar cell efficiency, increase light sensor sensitivity, and aid in developing innovative visual displays. The ability to control material conductivity through light not only advances existing device performance but also opens doors to new electronic designs and operations. This includes fully reconfigurable circuits that adapt in real-time, self-optimizing network components, and smart sensors that respond to environmental changes. 
In summary, this research not only confirms the practicality of using photoconductive channels for information transmission but also emphasizes the significant implications for electronic and communication system advancements. As this technology evolves, it promises to significantly impact the design and functionality of future electronic devices, paving the way for more adaptable and powerful systems.