Academic literature on the topic 'Network-on-chip, Dataflow Computing, Performance, Framework'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Network-on-chip, Dataflow Computing, Performance, Framework.'

You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Network-on-chip, Dataflow Computing, Performance, Framework"

1. Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. "Survey of Deep Learning Accelerators for Edge and Emerging Computing." Electronics 13, no. 15 (2024): 2988. http://dx.doi.org/10.3390/electronics13152988.

Abstract:
The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms running on ubiquitous internet-connected smart devices, has created a high demand for AI computing on edge devices. This review studied commercially available edge processors as well as processors still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing-in-memory (PIM) architectures. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient at performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secure and personalized AI applications. This review found that PIM processors show significant energy efficiency and consume less power compared to dataflow and neuromorphic processors. A future direction for the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.

2. Fang, Juan, Sitong Liu, Shijian Liu, Yanjin Cheng, and Lu Yu. "Hybrid Network-on-Chip: An Application-Aware Framework for Big Data." Complexity 2018 (July 30, 2018): 1–11. http://dx.doi.org/10.1155/2018/1040869.

Abstract:
The burst growth of IoT and cloud computing demands exascale computing systems with high performance and low power consumption to process massive amounts of data. Modern system platforms encounter a performance gap in chasing the exponential growth in data speed and volume. To narrow the gap, a heterogeneous design gives us a hint. A network-on-chip (NoC) introduces a packet-switched fabric for on-chip communication and has become the de facto many-core interconnection mechanism; it is a vital shared resource for multifarious applications and notably affects system energy efficiency. Among all the challenges in NoC, application-unaware behavior brings about considerable congestion, which wastes large amounts of on-chip bandwidth and power. In this paper, we propose a hybrid NoC framework, combining buffered and bufferless NoCs, to make the NoC framework aware of applications' performance demands. An optimized congestion control scheme is also devised to satisfy the energy-efficiency and fairness requirements of big data applications. We use a trace-driven simulator to model big data applications. Compared with the classical buffered NoC, the proposed hybrid NoC significantly improves the performance of mixed applications by 17% on average and 24% at most, decreases power consumption by 38%, and improves fairness by 13.3%.
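
The paper's optimized congestion control scheme is its own contribution; purely as an illustration of the application-aware idea, the following Python sketch (all names, profile fields, and thresholds are hypothetical, not taken from the paper) steers latency-sensitive traffic to the buffered subnetwork and throttles bulk traffic when congestion rises:

```python
# Hypothetical sketch of an application-aware hybrid-NoC injection policy.
# Profile fields and the threshold are illustrative placeholders only.
from dataclasses import dataclass

@dataclass
class AppProfile:
    latency_sensitive: bool   # e.g., classified from an offline trace
    injection_rate: float     # flits/cycle the application tries to inject

def select_subnetwork(app: AppProfile, congestion: float,
                      threshold: float = 0.7) -> str:
    """Pick a subnetwork for the next packet of this application.

    Latency-sensitive traffic gets the buffered network; other traffic
    uses the bufferless network unless it is congested, in which case
    injection is throttled to protect fairness and save power.
    """
    if app.latency_sensitive:
        return "buffered"
    if congestion < threshold:
        return "bufferless"
    return "throttle"  # back off this cycle

# Example: a bulk streaming application under moderate congestion.
print(select_subnetwork(AppProfile(False, 0.2), congestion=0.5))  # bufferless
```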

3. Muhsen, Yousif, Nor Azura Husin, Maslina Binti Zolkepli, Noridayu Manshor, Ahmed Abbas Jasim Al-Hchaimi, and A. S. Albahri. "Routing Techniques in Network-On-Chip Based Multiprocessor-System-on-Chip for IoT: A Systematic Review." Iraqi Journal for Computer Science and Mathematics 5, no. 1 (2024): 181–204. http://dx.doi.org/10.52866/ijcsm.2024.05.01.014.

Abstract:
Routing techniques (RTs) play a critical role in modern computing systems that use network-on-chip (NoC) communication infrastructure within multiprocessor system-on-chip (MPSoC) platforms. RTs contribute greatly to the successful performance of NoC-based MPSoCs through traffic congestion avoidance, quality-of-service assurance, fault handling, and optimisation of power usage. This paper outlines our efforts to catalogue the RTs, limitations, recommendations, and key challenges associated with RTs used in NoC-based MPSoC systems for the IoT domain. We utilized the PRISMA method to collect data from credible resources, including IEEE Xplore®, ScienceDirect, the Association for Computing Machinery, and Web of Science. Of the 906 research papers reviewed, only 51 were considered relevant to the investigation of NoC RTs. The study addresses issues related to NoC routing and suggests new approaches for in-package data negotiation. In addition, it gives an overview of recent research on routing strategies and the numerous algorithms that can be used for NoC-based MPSoCs. The literature analysis addresses current obstacles and delineates potential future avenues, recommendations, and challenges, analyzing techniques to assess performance using metrics within the TCCM framework.

4. Lin, Yanru, Yanjun Zhang, and Xu Yang. "A Low Memory Requirement MobileNets Accelerator Based on FPGA for Auxiliary Medical Tasks." Bioengineering 10, no. 1 (2022): 28. http://dx.doi.org/10.3390/bioengineering10010028.

Abstract:
Convolutional neural networks (CNNs) have been widely applied to medical tasks because they can achieve high accuracy in many fields using a large number of parameters and operations. However, many applications designed for auxiliary checks or assistance need to be deployed on portable devices, where the huge number of operations and parameters of a standard CNN becomes an obstruction. MobileNet adopts depthwise separable convolution to replace standard convolution, which greatly reduces the number of operations and parameters while maintaining relatively high accuracy. Such highly structured models are very suitable for FPGA implementation, which can further reduce resource requirements and improve efficiency. Many other implementations focus more on performance than on resource requirements, because MobileNets has already reduced both parameters and operations and obtained significant results. However, because many small devices have only limited resources, they cannot run MobileNet-like efficient networks in the normal way, and there are still many auxiliary medical applications that require a high-performance network running in real time. Hence, a specific accelerator structure is needed to further reduce memory and other resource requirements while running MobileNet-like efficient networks. In this paper, a MobileNet accelerator is proposed to minimize the on-chip memory capacity and the amount of data transferred between on-chip and off-chip memory. We propose two configurable computing modules, the Pointwise Convolution Accelerator and the Depthwise Convolution Accelerator, to parallelize the network and reduce the memory requirement with a specific dataflow model. A new cache usage method is also proposed to further reduce the use of on-chip memory. We implemented the accelerator on a Xilinx XC7Z020, deployed MobileNetV2 on it, and achieved 70.94 FPS with 524.25 KB of on-chip memory usage at 150 MHz.
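
The operation savings from depthwise separable convolution that motivate such accelerators follow from the standard multiply-accumulate (MAC) count formulas; here is a minimal Python check (the layer dimensions below are chosen purely for illustration):

```python
# Standard vs. depthwise separable convolution: per-layer MAC counts.
def standard_conv_macs(h, w, k, c_in, c_out):
    return h * w * k * k * c_in * c_out

def depthwise_separable_macs(h, w, k, c_in, c_out):
    depthwise = h * w * k * k * c_in   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise

# Example layer: 112x112 feature map, 3x3 kernel, 32 -> 64 channels.
std = standard_conv_macs(112, 112, 3, 32, 64)
sep = depthwise_separable_macs(112, 112, 3, 32, 64)
print(f"reduction: {std / sep:.1f}x")  # k^2*c_out/(k^2+c_out), ~7.9x here
```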

5. Sowmya, B. J., and Jamuna S. "Design of Area Efficient Network-On-Chip Router: A Comprehensive Review." International Research Journal on Advanced Engineering Hub (IRJAEH) 2, no. 07 (2024): 1895–908. http://dx.doi.org/10.47392/irjaeh.2024.0260.

Abstract:
The growing number of uses for cutting-edge technologies has driven further growth in a single chip's computational capacity, and several applications now compete for computing resources on a single chip. As a result, connecting the IP cores becomes yet another difficult chore. In many-core System-on-Chips (SoCs), the Network-on-Chip (NoC) has emerged as the on-chip connectivity option of choice: the NoC was created as a cutting-edge framework for the networks inside the SoC. Modern multiprocessor architectures benefit from a NoC architecture as their communication backbone. The most important components of any network structure are its topologies, routing algorithms, and router architectures; NoCs use the routers on each node to route traffic. Circuit complexity, high critical-path latency, resource usage, timing, and power efficiency are the primary shortcomings of conventional NoC router architecture, and it has been difficult to build a high-performance, low-latency NoC with little area overhead. This paper surveys previous methods and strategies for NoC router topologies and studies the general router architecture and its components. Analysis is carried out to work toward a low-latency, low-power, high-performance NoC router design that can be employed with a wide range of FPGA families. In the current work, we are structuring a modified four-port router with the goals of low area and high-performance operation.

6. Sabah, Yousri. "Quantum-Inspired Temporal Synchronization in Dynamic Mesh Networks: A Non-Local Approach to Latency Optimization." Wasit Journal for Pure Sciences 4, no. 1 (2025): 86–93. https://doi.org/10.31185/wjps.710.

Abstract:
This paper presents a novel method for achieving temporal synchronization in Network-on-Chip (NoC) architectures, using optimization techniques derived from quantum mechanics. We provide a non-local temporal coordination framework to optimize network latency in dynamic mesh networks using quantum principles such as entanglement and superposition. A specialized router design using quantum-inspired control units incorporates the Quantum-Inspired Temporal Coordination Algorithm (QTCA) and the Non-Local State Synchronization Protocol (NSSP), which are essential components of the proposed architecture. The experimental results indicate that the 16×16 mesh network significantly outperforms conventional routing strategies: latency is diminished by 31.2%, the network saturation threshold is enhanced by 37.8%, and packet loss is decreased by 76.3%. Notwithstanding a minor 8.2% increase in logic overhead and a 5.7% rise in power usage, the framework sustains robust phase coherence (0.92 local, 0.87 non-local). The results demonstrate that next-generation NoC designs might gain from temporal synchronization influenced by quantum computing, particularly in addressing performance and scalability challenges in complex multi-core systems.

7. Sheng, Huayi, and Muhammad Shemyal Nisar. "Simulating an Integrated Photonic Image Classifier for Diffractive Neural Networks." Micromachines 15, no. 1 (2023): 50. http://dx.doi.org/10.3390/mi15010050.

Abstract:
The slowdown of Moore's law and the existence of the "von Neumann bottleneck" have left electronic computing systems under the von Neumann architecture unable to meet the fast-growing demand for artificial intelligence computing. All-optical diffractive neural networks provide a possible solution to this challenge. They can outperform conventional silicon-based electronic neural networks due to the significantly higher propagation speed of optical signals (≈10⁸ m·s⁻¹) compared to electrical signals (≈10⁵ m·s⁻¹), their inherent parallelism, and their low power consumption. The integrated diffractive deep neural network (ID2NN) uses an on-chip fully passive photonic approach to achieve the functionality of neural networks (matrix–vector operations) and can be fabricated via the CMOS process, which is technologically more amenable to implementing an artificial intelligence processor. In this paper, we present a detailed design framework for the integrated diffractive deep neural network and a corresponding silicon-on-insulator implementation through Python-based simulations. The performance of our proposed ID2NN was evaluated by solving image classification problems using the MNIST dataset.

8. Proddutoori, Apoorva Reddy. "Optimistic Workload Configuration of Parallel Matrices on CPU." European Journal of Advances in Engineering and Technology 8, no. 8 (2021): 66–70. https://doi.org/10.5281/zenodo.12770771.

Abstract:
This study compares and uses different feature parallelization techniques, the Fast Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT), for the classification of matrices. A Convolutional Neural Network (CNN) is used to determine the classifications; CNN is a technique that can be used effectively as a classifier. This study helps to extract features in the most efficient way, with less computing time, in real-life settings. The framework provides comprehensive and flexible APIs that enable efficient implementation of multi-threaded applications. To meet the real-time performance requirements of these security applications, it is imperative to develop a fast parallelization technique for the algorithm. In this paper, we introduce a new memory-efficient parallelization technique that efficiently places and stores input text data and reference data in an on-chip shared memory and the CPU texture cache. For better performance at a reduced power ratio, we extend the parallelization technique to support the other major cores of the SoC. OpenCL, a heterogeneous parallel programming model, is used to communicate between the CPU and the other macro blocks.

9. Sui, Xuefu, Qunbo Lv, Liangjie Zhi, et al. "A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation." Sensors 23, no. 2 (2023): 824. http://dx.doi.org/10.3390/s23020824.

Abstract:
To address the problems of large storage requirements, computational pressure, untimely data supply from off-chip memory, and low computational efficiency during hardware deployment caused by the large number of convolutional neural network (CNN) parameters, we developed an innovative hardware-friendly CNN pruning method called KRP, which prunes the convolutional kernel on a row scale. A new retraining method based on LR tracking was used to obtain a CNN model with both a high pruning rate and high accuracy. Furthermore, we designed a high-performance convolutional computation module on the FPGA platform to help deploy KRP-pruned models. The results of comparative experiments on CNNs such as VGG and ResNet showed that KRP has higher accuracy than most pruning methods. At the same time, the KRP method, together with the GSNQ quantization method developed in our previous study, forms a high-precision hardware-friendly network compression framework that achieves "lossless" CNN compression with a 27× reduction in network model storage. The comparative experiments on the FPGA showed that the KRP pruning method not only requires much less storage space but also helps to reduce on-chip hardware resource consumption by more than half and effectively improves the parallelism of the model in FPGAs, with a strong hardware-friendly character. This study provides more ideas for the application of CNNs in the field of edge computing.
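
The exact KRP criterion and the LR-tracking retraining step are defined in the paper; as a rough sketch of what "pruning on a row scale" means, the following NumPy fragment (the L1-norm selection rule is an assumption for illustration) zeroes the least-important rows of every k×k kernel:

```python
# Illustrative row-scale kernel pruning: zero the rows of each k x k
# kernel with the smallest L1 norm. Retraining is omitted entirely.
import numpy as np

def prune_kernel_rows(weights: np.ndarray, rows_to_keep: int) -> np.ndarray:
    """weights: (c_out, c_in, k, k) tensor of a convolution layer."""
    pruned = weights.copy()
    c_out, c_in, k, _ = weights.shape
    for o in range(c_out):
        for i in range(c_in):
            row_norms = np.abs(pruned[o, i]).sum(axis=1)  # L1 norm per row
            drop = np.argsort(row_norms)[: k - rows_to_keep]
            pruned[o, i, drop, :] = 0.0                   # zero whole rows
    return pruned

w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_pruned = prune_kernel_rows(w, rows_to_keep=1)    # keep 1 of 3 rows
print(1.0 - np.count_nonzero(w_pruned) / w.size)   # ~0.67 sparsity
```

Because whole kernel rows become zero, the sparsity pattern stays regular, which is what makes this style of pruning hardware-friendly on FPGAs.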

10. Chen, Hui, Zihao Zhang, Peng Chen, Xiangzhong Luo, Shiqing Li, and Weichen Liu. "MARCO: A High-Performance Task Mapping and Routing Co-Optimization Framework for Point-to-Point NoC-Based Heterogeneous Computing Systems." ACM Transactions on Embedded Computing Systems 20, no. 5s (2021): 1–21. http://dx.doi.org/10.1145/3476985.

Abstract:
Heterogeneous computing systems (HCSs), which consist of various processing elements (PEs) that vary in their processing ability, are usually facilitated by a network-on-chip (NoC) to interconnect their components. Emerging point-to-point NoCs, which support single-cycle multi-hop transmission, reduce or eliminate the dependence of latency on distance, addressing the scalability concern raised by high latency for long-distance transmission and enlarging the design space of the routing algorithm to include non-shortest paths. For such point-to-point NoC-based HCSs, resource management strategies managed by compilers, schedulers, or controllers, e.g., mapping and routing, are complicated for the following reasons: (i) due to the heterogeneity, mapping and routing need to optimize computation and communication concurrently (for homogeneous computing systems, only communication); (ii) conducting mapping and routing consecutively cannot minimize the schedule length in most cases, since the PEs with high processing ability may be located in a crowded area and suffer high resource-contention overhead; (iii) since changing the mapping selection of one task reconstructs the whole routing design space, the exploration of the mapping and routing design space is challenging. Therefore, in this work we propose MARCO, a mapping and routing co-optimization framework, to decrease the schedule length of applications on point-to-point NoC-based HCSs. Specifically, we revise tabu search to explore the design space and evaluate the quality of mapping and routing. An advanced reinforcement learning (RL) algorithm, advantage actor-critic, is adopted to efficiently compute paths. We perform extensive experiments on various real applications, which demonstrate that MARCO achieves a remarkable improvement in schedule length (+44.94% to +50.18%) compared with the state-of-the-art mapping and routing co-optimization algorithm for homogeneous computing systems. We also compare MARCO with different combinations of state-of-the-art mapping and routing approaches.
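
MARCO's revised tabu search and its advantage actor-critic routing are specific to the paper; the toy Python loop below only illustrates the general shape of tabu-search exploration over task-to-PE mappings (the cost function, move rule, and tabu semantics are simplified placeholders):

```python
# Toy tabu search over task-to-PE mappings. cost() stands in for a real
# schedule-length evaluation; recently applied moves are temporarily tabu.
import random

def tabu_search_mapping(tasks, pes, cost, iters=200, tabu_len=10):
    mapping = {t: random.choice(pes) for t in tasks}      # random start
    best, best_cost = dict(mapping), cost(mapping)
    tabu = []
    for _ in range(iters):
        t, p = random.choice(tasks), random.choice(pes)   # remap one task
        if (t, p) in tabu:
            continue
        candidate = dict(mapping)
        candidate[t] = p
        if cost(candidate) < cost(mapping):               # accept improvement
            mapping = candidate
            tabu = (tabu + [(t, p)])[-tabu_len:]          # bounded tabu list
        if cost(mapping) < best_cost:
            best, best_cost = dict(mapping), cost(mapping)
    return best, best_cost

# Example: prefer fast PEs, penalize co-location as a crude contention proxy.
speed = {"pe0": 1.0, "pe1": 2.0}
cost = lambda m: (sum(1.0 / speed[p] for p in m.values())
                  + 0.5 * (len(m) - len(set(m.values()))))
print(tabu_search_mapping(["t0", "t1", "t2"], list(speed), cost))
```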

Dissertations / Theses on the topic "Network-on-chip, Dataflow Computing, Performance, Framework"

1. Mazumdar, Somnath. "An Efficient NoC-based Framework To Improve Dataflow Thread Management At Runtime." Doctoral thesis, Università di Siena, 2017. http://hdl.handle.net/11365/1011261.

Abstract:
This doctoral thesis focuses on how application threads based on the dataflow execution model can be managed at the Network-on-Chip (NoC) level. The roots of the dataflow execution model date back to the early 1970s. Applications adhering to this program execution model follow a simple producer-consumer communication scheme for synchronising parallel thread-related activities. In a dataflow execution environment, a thread can run if and only if all its required inputs are available. Applications running in a large and complex computing environment can benefit significantly from the adoption of the dataflow model. The first part of the thesis focuses on the thread distribution mechanism. It shows how a scalable hash-based thread distribution mechanism can be implemented at the router level with low overheads. To enhance the support further, a tool to monitor the dataflow threads' status and a simple functional model are also incorporated into the design. Next, a software-defined NoC is proposed to manage the distribution of dataflow threads by exploiting its reconfigurability. The second part of this work focuses on the NoC microarchitecture level. The traditional 2D-mesh topology is combined with a standard ring to understand how such a hybrid network topology can outperform the traditional topology (such as the 2D mesh). Finally, a mixed-integer linear programming based analytical model is proposed to verify whether the mapping of application threads onto the free cores is optimal. The proposed mathematical model can be used as a yardstick to verify the solution quality of a newly developed mapping policy. It is not trivial to provide a complete low-level framework for dataflow thread execution with better resource and power management; however, this work can be considered a primary framework on which improvements can be built.
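
The thesis implements the distribution mechanism in router hardware; the Python sketch below only illustrates the hash-based idea in software (the ID width and hash function are assumptions):

```python
# A thread's ID is hashed to pick its home core, so every router can
# compute the destination locally without a central scheduler or lookup.
import hashlib

def home_core(thread_id: int, num_cores: int) -> int:
    """Deterministically map a dataflow thread to a core."""
    digest = hashlib.sha256(thread_id.to_bytes(8, "little")).digest()
    return int.from_bytes(digest[:4], "little") % num_cores

# Producer tokens for threads 101..103 would be forwarded to these cores.
for tid in (101, 102, 103):
    print(f"thread {tid} -> core {home_core(tid, num_cores=16)}")
```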

Book chapters on the topic "Network-on-chip, Dataflow Computing, Performance, Framework"

1. Jegadeesan, R., and A. Devi. "Parallel Processing Frameworks on FPGA for High Throughput Neural Network Inference." In Smart Microcontrollers and FPGA Based Architectures for Advanced Computing and Signal Processing. RADemics Research Institute, 2025. https://doi.org/10.71443/9789349552425-10.

Abstract:
The increasing demand for real-time, energy-efficient, and high-throughput inference of deep neural networks has positioned FPGAs as a compelling hardware platform due to their inherent parallelism, reconfigurability, and customizability. This book chapter investigates advanced parallel processing frameworks on FPGAs tailored for neural network acceleration, emphasizing architectural strategies that balance throughput, latency, and resource constraints. A comprehensive analysis of data-level, task-level, pipeline, spatial, and hybrid parallelism is presented, with a focus on their synergistic deployment to meet the unique computational requirements of diverse deep learning models. Particular attention is given to loop pipelining and systolic array-based spatial parallelism for matrix-intensive workloads, along with latency-optimized inter-PE communication schemes. Model-specific parallelism control using metacompilers and high-level synthesis (HLS) pragmas is explored to demonstrate how automation and model-awareness can drive architectural customization and performance scaling. By integrating hardware-efficient techniques such as LUT-based activation computation, memory-optimized dataflows, and pragma-directed code generation, the chapter outlines a practical path from algorithmic description to deployable FPGA inference engines. The interaction between architectural design choices and neural model characteristics is dissected to uncover optimization opportunities for edge AI, embedded processing, and real-time signal interpretation. Experimental insights and synthesis-driven validations further reinforce the feasibility of the proposed frameworks under realistic resource and timing constraints.
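
As a back-of-envelope companion to the systolic-array discussion, the following first-order Python model (purely illustrative; it ignores memory stalls and control overhead) estimates the cycle count of a P×P array on a matrix multiplication:

```python
# First-order cycle estimate for a P x P systolic array computing an
# (M x K) x (K x N) matmul: an optimistic lower bound for sizing studies.
import math

def systolic_matmul_cycles(M, N, K, P):
    tiles = math.ceil(M / P) * math.ceil(N / P)  # output tiles to produce
    fill_drain = 2 * (P - 1)                     # pipeline fill/drain skew
    return tiles * (K + fill_drain)              # K MACs per output element

# Example: a 512x512x512 layer on a 16x16 array clocked at 200 MHz.
cycles = systolic_matmul_cycles(512, 512, 512, 16)
print(f"{cycles} cycles ≈ {cycles / 200e6 * 1e3:.2f} ms")
```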

Conference papers on the topic "Network-on-chip, Dataflow Computing, Performance, Framework"

1. Kim, Hanjoon, Seulki Heo, Junghoon Lee, Jaehyuk Huh, and John Kim. "On-Chip Network Evaluation Framework." In 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2010. http://dx.doi.org/10.1109/sc.2010.35.

2. Li, Yixiao, Yutaka Matsubara, Daniel Olbrys, Kazuhiro Kajio, Takashi Inada, and Hiroaki Takada. "Agile Software Design Verification and Validation (V&V) for Automated Driving." In FISITA World Congress 2021. FISITA, 2021. http://dx.doi.org/10.46720/f2020-ves-017.

Abstract:
An Automated Driving System (ADS) generally consists of three functions: 1) recognition, 2) planning, and 3) control. Precise vehicle localization and accurate recognition of objects (vehicles, pedestrians, lanes, traffic signs, etc.) are typically based on high-definition dynamic maps and data from multiple sensors (e.g., camera, LiDAR, radar). Planners, especially those for optimal path and trajectory, tend to be computationally intensive. Many applications in ADS use machine learning techniques such as DNNs (Deep Neural Networks), which further increase the demand for computing power. To process massive tasks and data in parallel and in real time, scalable software and high-performance SoCs (Systems on Chip) with many CPUs or processing cores, together with hardware accelerators (e.g., GPU, DLA), have been adopted. However, ADS software and SoC hardware architectures are so large and complex that software validation at a late testing phase is inefficient and costly. Due to continuous ADS software evolution and iteration, software redesign will occur much more frequently than in traditional automotive systems. The productivity of software validation must be improved to avoid unacceptable bloat in the required effort and time. This paper explores how to obtain an optimal ADS software scheduling design and how to enable agile ADS software V&V (Verification and Validation) in order to release the product in a short development cycle. The proposed agile software V&V framework integrates design verification with a scheduling simulator on a PC and validation with debugging and tracing tools for the hardware target, which is usually an embedded board. We developed utility tools to make the proposed framework seamless and automated. The evaluation results indicate that the proposed framework can efficiently explore an optimal scheduling design (e.g., scheduling policy, thread priority, core affinity) satisfying several non-functional requirements (e.g., response time, CPU utilization) for ADS. We also proved that the framework is practical and can be incorporated into agile ADS software development by validating it through the project. Keywords: Automated Driving System (ADS); System on Chip (SoC); Deep Neural Network (DNN); Optimal Scheduling Design; Verification and Validation (V&V).
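
The paper's simulator-driven exploration is its own contribution; one classic building block for checking a response-time requirement under fixed-priority scheduling (shown here only as an illustrative companion, not as the authors' method) is iterative response-time analysis, R_i = C_i + Σ_{j∈hp(i)} ⌈R_i/T_j⌉·C_j:

```python
# Fixed-priority response-time analysis, iterated to a fixed point.
import math

def response_time(c, higher_prio, deadline=1000):
    """c: WCET of the task; higher_prio: list of (C_j, T_j) pairs."""
    r = c
    while r <= deadline:
        r_next = c + sum(math.ceil(r / t) * cj for cj, t in higher_prio)
        if r_next == r:
            return r          # converged: worst-case response time
        r = r_next
    return None               # exceeds the deadline: not schedulable

# Example: a 3 ms task preempted by 1 ms/5 ms and 2 ms/10 ms tasks.
print(response_time(3, [(1, 5), (2, 10)]), "ms worst-case response time")
```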