
Journal articles on the topic 'Neural network accelerator'



Consult the top 50 journal articles for your research on the topic 'Neural network accelerator.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Eliahu, Adi, Ronny Ronen, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky. "multiPULPly." ACM Journal on Emerging Technologies in Computing Systems 17, no. 2 (2021): 1–27. http://dx.doi.org/10.1145/3432815.

Abstract:
Computationally intensive neural network applications often need to run on resource-limited low-power devices. Numerous hardware accelerators have been developed to speed up the performance of neural network applications and reduce power consumption; however, most focus on data centers and full-fledged systems. Acceleration in ultra-low-power systems has been only partially addressed. In this article, we present multiPULPly, an accelerator that integrates memristive technologies within standard low-power CMOS technology, to accelerate multiplication in neural network inference on ultra-low-pow
2

Hong, JiUn, Saad Arslan, TaeGeon Lee, and HyungWon Kim. "Design of Power-Efficient Training Accelerator for Convolution Neural Networks." Electronics 10, no. 7 (2021): 787. http://dx.doi.org/10.3390/electronics10070787.

Abstract:
To realize deep learning techniques, a type of deep neural network (DNN) called a convolutional neural network (CNN) is among the most widely used models aimed at image recognition applications. However, there is growing demand for light-weight and low-power neural network accelerators, not only for inference but also for the training process. In this paper, we propose a training accelerator that provides low power and compact chip size targeted for mobile and edge computing applications. It accelerates to achieve the real-time processing of both inference and training using concurrent floating-p
3

Cho, Jaechan, Yongchul Jung, Seongjoo Lee, and Yunho Jung. "Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme." Electronics 10, no. 3 (2021): 230. http://dx.doi.org/10.3390/electronics10030230.

Abstract:
Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable reso
4

Noskova, E. S., I. E. Zakharov, Y. N. Shkandybin, and S. G. Rykovanov. "Towards energy-efficient neural network calculations." Computer Optics 46, no. 1 (2022): 160–66. http://dx.doi.org/10.18287/2412-6179-co-914.

Abstract:
Nowadays, the problem of creating high-performance and energy-efficient hardware for Artificial Intelligence tasks is very acute. The most popular solution to this problem is the use of Deep Learning Accelerators, such as GPUs and Tensor Processing Units, to run neural networks. Recently, NVIDIA has announced the NVDLA project, which allows one to design neural network accelerators based on open-source code. This work describes a full cycle of creating a prototype NVDLA accelerator, as well as testing the resulting solution by running the resnet-50 neural network on it. Finally, an assessmen
5

Fan, Yuxiao. "Design and research of high-performance convolutional neural network accelerator based on Chipyard." Journal of Physics: Conference Series 2858, no. 1 (2024): 012001. http://dx.doi.org/10.1088/1742-6596/2858/1/012001.

Abstract:
Neural network accelerators perform well in the research and verification of neural network models. In this paper, a convolutional neural network accelerator system composed of a RISC-V processor core and a Gemmini array accelerator is designed in the Chisel language within the Chipyard framework, and the acceleration effect of different Gemmini array configurations for different input matrices is further investigated. The result shows that the accelerator system can achieve thousands of times acceleration compared with a single processor for large matrix calculations.
6

Xu, Jia, Han Pu, and Dong Wang. "Sparse Convolution FPGA Accelerator Based on Multi-Bank Hash Selection." Micromachines 16, no. 1 (2024): 22. https://doi.org/10.3390/mi16010022.

Abstract:
Reconfigurable processor-based acceleration of deep convolutional neural network (DCNN) algorithms has emerged as a widely adopted technique, with particular attention on sparse neural network acceleration as an active research area. However, many computing devices that claim high computational power still struggle to execute neural network algorithms with optimal efficiency, low latency, and minimal power consumption. Consequently, there remains significant potential for further exploration into improving the efficiency, latency, and power consumption of neural network accelerators across div
7

Ferianc, Martin, Hongxiang Fan, Divyansh Manocha, et al. "Improving Performance Estimation for Design Space Exploration for Convolutional Neural Network Accelerators." Electronics 10, no. 4 (2021): 520. http://dx.doi.org/10.3390/electronics10040520.

Abstract:
Contemporary advances in neural networks (NNs) have demonstrated their potential in different applications such as in image classification, object detection or natural language processing. In particular, reconfigurable accelerators have been widely used for the acceleration of NNs due to their reconfigurability and efficiency in specific application instances. To determine the configuration of the accelerator, it is necessary to conduct design space exploration to optimize the performance. However, the process of design space exploration is time consuming because of the slow performance evalua
8

Sunny, Febin P., Asif Mirza, Mahdi Nikdast, and Sudeep Pasricha. "ROBIN: A Robust Optical Binary Neural Network Accelerator." ACM Transactions on Embedded Computing Systems 20, no. 5s (2021): 1–24. http://dx.doi.org/10.1145/3476988.

Abstract:
Domain specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we pres
9

Tang, Wenkai, and Peiyong Zhang. "GPGCN: A General-Purpose Graph Convolution Neural Network Accelerator Based on RISC-V ISA Extension." Electronics 11, no. 22 (2022): 3833. http://dx.doi.org/10.3390/electronics11223833.

Abstract:
In the past two years, various graph convolutional neural network (GCN) accelerators have emerged, each with their own characteristics, but their common disadvantage is that the hardware architecture is not programmable and is optimized for a specific network and dataset. They may not support acceleration for different GCNs and may not achieve optimal hardware resource utilization for datasets of different sizes. Therefore, given the above shortcomings, and according to the development trend of traditional neural network accelerators, this paper proposes and implements GPGCN: a general-purp
10

Xia, Chengpeng, Yawen Chen, Haibo Zhang, Hao Zhang, Fei Dai, and Jigang Wu. "Efficient neural network accelerators with optical computing and communication." Computer Science and Information Systems, no. 00 (2022): 66. http://dx.doi.org/10.2298/csis220131066x.

Abstract:
Conventional electronic Artificial Neural Network (ANN) accelerators focus on architecture design and numerical computation optimization to improve training efficiency. However, these approaches have recently encountered bottlenecks in terms of energy efficiency and computing performance, which has led to increased interest in photonic accelerators. Photonic architectures, with low energy consumption, high transmission speed and high bandwidth, have been considered to play an important role in the next generation of computing architectures. In this paper, to provide a better understanding of optical tec
11

Anmin, Kong, and Zhao Bin. "A Parallel Loading Based Accelerator for Convolution Neural Network." International Journal of Machine Learning and Computing 10, no. 5 (2020): 669–74. http://dx.doi.org/10.18178/ijmlc.2020.10.5.989.

12

An, Fubang, Lingli Wang, and Xuegong Zhou. "A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network." Electronics 12, no. 13 (2023): 2847. http://dx.doi.org/10.3390/electronics12132847.

Abstract:
Since the lightweight convolutional neural network EfficientNet was proposed by Google in 2019, this series of models has quickly become very popular due to its superior performance with a small number of parameters. However, the existing convolutional neural network hardware accelerators for EfficientNet still have much room to improve the performance of the depthwise convolution, squeeze-and-excitation module and nonlinear activation functions. In this paper, we first design a reconfigurable register array and computational kernel to accelerate the depthwise convolution. Next, we propose a
13

Biookaghazadeh, Saman, Pravin Kumar Ravi, and Ming Zhao. "Toward Multi-FPGA Acceleration of the Neural Networks." ACM Journal on Emerging Technologies in Computing Systems 17, no. 2 (2021): 1–23. http://dx.doi.org/10.1145/3432816.

Abstract:
High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on a single FPGA design, which are limited by the available resources on an FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate
14

Chen, Weijian, Zhi Qi, Zahid Akhtar, and Kamran Siddique. "Resistive-RAM-Based In-Memory Computing for Neural Network: A Review." Electronics 11, no. 22 (2022): 3667. http://dx.doi.org/10.3390/electronics11223667.

Abstract:
Processing-in-memory (PIM) is a promising architecture for designing various types of neural network accelerators, as it ensures the efficiency of computation together with Resistive Random Access Memory (ReRAM). ReRAM has now become a promising solution to enhance computing efficiency due to its crossbar structure. In this paper, a ReRAM-based PIM neural network accelerator is addressed, and different kinds of methods and designs of various schemes are discussed. Various models and architectures implemented for a neural network accelerator are surveyed to identify research trends. Further, the limitatio
15

Ge, Fen, Ning Wu, Hao Xiao, Yuanyuan Zhang, and Fang Zhou. "Compact Convolutional Neural Network Accelerator for IoT Endpoint SoC." Electronics 8, no. 5 (2019): 497. http://dx.doi.org/10.3390/electronics8050497.

Abstract:
As a classical artificial intelligence algorithm, the convolutional neural network (CNN) algorithm plays an important role in image recognition and classification and is gradually being applied in the Internet of Things (IoT) system. A compact CNN accelerator for the IoT endpoint System-on-Chip (SoC) is proposed in this paper to meet the needs of CNN computations. Based on analysis of the CNN structure, basic functional modules of CNN such as convolution circuit and pooling circuit with a low data bandwidth and a smaller area are designed, and an accelerator is constructed in the form of four
16

Wei, Rongshan, Chenjia Li, Chuandong Chen, Guangyu Sun, and Minghua He. "Memory Access Optimization of a Neural Network Accelerator Based on Memory Controller." Electronics 10, no. 4 (2021): 438. http://dx.doi.org/10.3390/electronics10040438.

Abstract:
Specialized accelerator architectures have achieved great success in processor architecture, and they are a trend in computer architecture development. However, as the memory access pattern of an accelerator is relatively complicated, the memory access performance is relatively poor, limiting the overall performance improvement of hardware accelerators. Moreover, memory controllers for hardware accelerators have been scarcely researched. We consider that a special accelerator memory controller is essential for improving the memory access performance. To this end, we propose a dynamic random access mem
17

Clements, Joseph, and Yingjie Lao. "DeepHardMark: Towards Watermarking Neural Network Hardware." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (2022): 4450–58. http://dx.doi.org/10.1609/aaai.v36i4.20367.

Abstract:
This paper presents a framework for embedding watermarks into DNN hardware accelerators. Unlike previous works that have looked at protecting the algorithmic intellectual properties of deep learning systems, this work proposes a methodology for defending deep learning hardware. Our methodology embeds modifications into the hardware accelerator's functional blocks that can be revealed with the rightful owner's key DNN and corresponding key sample, verifying the legitimate owner. We propose an Lp-box ADMM based algorithm to co-optimize watermark's hardware overhead and impact on the design's alg
18

Xia, Chengpeng, Yawen Chen, Haibo Zhang, and Jigang Wu. "STADIA: Photonic Stochastic Gradient Descent for Neural Network Accelerators." ACM Transactions on Embedded Computing Systems 22, no. 5s (2023): 1–23. http://dx.doi.org/10.1145/3607920.

Abstract:
Deep Neural Networks (DNNs) have demonstrated great success in many fields such as image recognition and text analysis. However, the ever-increasing sizes of both DNN models and training datasets make deep learning extremely computation- and memory-intensive. Recently, photonic computing has emerged as a promising technology for accelerating DNNs. While the design of photonic accelerators for DNN inference and forward propagation of DNN training has been widely investigated, the architectural acceleration for the equally important backpropagation of DNN training has not been well studied. In this p
19

Li, Yihang. "Sparse-Aware Deep Learning Accelerator." Highlights in Science, Engineering and Technology 39 (April 1, 2023): 305–10. http://dx.doi.org/10.54097/hset.v39i.6544.

Abstract:
In view of the difficulty of hardware implementation of convolutional neural network computing, most previous convolutional neural network accelerator designs focused on solving the bottleneck of computational performance and bandwidth, ignoring the importance of convolutional neural network sparsity for accelerator design. In recent years, a few convolutional neural network accelerators have been able to take advantage of this sparsity, but they usually find it difficult to balance computational flexibility, parallel efficiency and resource overhead. In view of the problem tha
20

Xie, Xiaoru, Mingyu Zhu, Siyuan Lu, and Zhongfeng Wang. "Efficient Layer-Wise N:M Sparse CNN Accelerator with Flexible SPEC: Sparse Processing Element Clusters." Micromachines 14, no. 3 (2023): 528. http://dx.doi.org/10.3390/mi14030528.

Abstract:
Recently, the layer-wise N:M fine-grained sparse neural network algorithm (i.e., every M weights contain N non-zero values) has attracted tremendous attention, as it can effectively reduce the computational complexity with negligible accuracy loss. However, the speed-up potential of this algorithm will not be fully exploited if the right hardware support is lacking. In this work, we design an efficient accelerator for N:M sparse convolutional neural networks (CNNs) with layer-wise sparse patterns. First, we analyze the performance of different processing element (PE) structures and exten
21

Hu, Jian, Xianlong Zhang, and Xiaohua Shi. "Simulating Neural Network Processors." Wireless Communications and Mobile Computing 2022 (February 23, 2022): 1–12. http://dx.doi.org/10.1155/2022/7500195.

Abstract:
Deep learning has achieved results competitive with human beings in many fields. Traditionally, deep learning networks are executed on CPUs and GPUs. In recent years, more and more neural network accelerators have been introduced in both academia and industry to improve the performance and energy efficiency of deep learning networks. In this paper, we introduce a flexible and configurable functional NN accelerator simulator, which can be configured to simulate micro-architectures for different NN accelerators. The extensible and configurable simulator is helpful for system-level explorat
22

Lim, Se-Min, and Sang-Woo Jun. "MobileNets Can Be Lossily Compressed: Neural Network Compression for Embedded Accelerators." Electronics 11, no. 6 (2022): 858. http://dx.doi.org/10.3390/electronics11060858.

Abstract:
Although neural network quantization is an imperative technology for the computation and memory efficiency of embedded neural network accelerators, simple post-training quantization incurs unacceptable levels of accuracy degradation on some important models targeting embedded systems, such as MobileNets. While explicit quantization-aware training or re-training after quantization can often reclaim lost accuracy, this is not always possible or convenient. We present an alternative approach to compressing such difficult neural networks, using a novel variant of the ZFP lossy floating-point compr
23

Yang, Zhi. "Dynamic Logo Design System of Network Media Art Based on Convolutional Neural Network." Mobile Information Systems 2022 (May 31, 2022): 1–10. http://dx.doi.org/10.1155/2022/3247229.

Abstract:
Nowadays, we are in an era of rapid development of Internet technology and unlimited expansion of information dissemination. While the application of new media and digital multimedia has become more popular, it has also brought earth-shaking changes to our lives. In order to solve the problem that traditional static visual images can no longer meet people’s needs, a network media art dynamic logo design system based on a convolutional neural network is proposed. Firstly, the software and hardware platform related to accelerator development is introduced, the advanced integrated design
24

Afifi, Salma, Febin Sunny, Amin Shafiee, Mahdi Nikdast, and Sudeep Pasricha. "GHOST: A Graph Neural Network Accelerator using Silicon Photonics." ACM Transactions on Embedded Computing Systems 22, no. 5s (2023): 1–25. http://dx.doi.org/10.1145/3609097.

Abstract:
Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementat
25

Liang, Yong, Junwen Tan, Zhisong Xie, Zetao Chen, Daoqian Lin, and Zhenhao Yang. "Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence." Sensors 24, no. 1 (2023): 240. http://dx.doi.org/10.3390/s24010240.

Abstract:
In recent years, edge intelligence (EI) has emerged, combining edge computing with AI, and specifically deep learning, to run AI algorithms directly on edge devices. In practical applications, EI faces challenges related to computational power, power consumption, size, and cost, with the primary challenge being the trade-off between computational power and power consumption. This has rendered traditional computing platforms unsustainable, making heterogeneous parallel computing platforms a crucial pathway for implementing EI. In our research, we leveraged the Xilinx Zynq 7000 heterogeneous com
26

Hosseini, Morteza, and Tinoosh Mohsenin. "Binary Precision Neural Network Manycore Accelerator." ACM Journal on Emerging Technologies in Computing Systems 17, no. 2 (2021): 1–27. http://dx.doi.org/10.1145/3423136.

Abstract:
This article presents a low-power, programmable, domain-specific manycore accelerator, Binarized neural Network Manycore Accelerator (BiNMAC), which adopts and efficiently executes binary precision weight/activation neural network models. Such networks have compact models in which weights are constrained to only 1 bit and can be packed several in one memory entry that minimizes memory footprint to its finest. Packing weights also facilitates executing single instruction, multiple data with simple circuitry that allows maximizing performance and efficiency. The proposed BiNMAC has light-weight
27

Park, Sang-Soo, and Ki-Seok Chung. "CENNA: Cost-Effective Neural Network Accelerator." Electronics 9, no. 1 (2020): 134. http://dx.doi.org/10.3390/electronics9010134.

Abstract:
Convolutional neural networks (CNNs) are widely adopted in various applications. State-of-the-art CNN models deliver excellent classification performance, but they require a large amount of computation and data exchange because they typically employ many processing layers. Among these processing layers, convolution layers, which carry out many multiplications and additions, account for a major portion of computation and memory access. Therefore, reducing the amount of computation and memory access is the key for high-performance CNNs. In this study, we propose a cost-effective neural network a
28

Kim, Dongyoung, Junwhan Ahn, and Sungjoo Yoo. "ZeNA: Zero-Aware Neural Network Accelerator." IEEE Design & Test 35, no. 1 (2018): 39–46. http://dx.doi.org/10.1109/mdat.2017.2741463.

29

Chen, Tianshi, Zidong Du, Ninghui Sun, et al. "A High-Throughput Neural Network Accelerator." IEEE Micro 35, no. 3 (2015): 24–32. http://dx.doi.org/10.1109/mm.2015.41.

30

To, Chun-Hao, Eduardo Rozo, Elisabeth Krause, Hao-Yi Wu, Risa H. Wechsler, and Andrés N. Salcedo. "LINNA: Likelihood Inference Neural Network Accelerator." Journal of Cosmology and Astroparticle Physics 2023, no. 01 (2023): 016. http://dx.doi.org/10.1088/1475-7516/2023/01/016.

Abstract:
Bayesian posterior inference of modern multi-probe cosmological analyses incurs massive computational costs. For instance, depending on the combinations of probes, a single posterior inference for the Dark Energy Survey (DES) data had a wall-clock time that ranged from 1 to 21 days using a state-of-the-art computing cluster with 100 cores. These computational costs have severe environmental impacts and the long wall-clock time slows scientific productivity. To address these difficulties, we introduce LINNA: the Likelihood Inference Neural Network Accelerator. Relative to the baseline
31

Ro, Yuhwan, Eojin Lee, and Jung Ahn. "Evaluating the Impact of Optical Interconnects on a Multi-Chip Machine-Learning Architecture." Electronics 7, no. 8 (2018): 130. http://dx.doi.org/10.3390/electronics7080130.

Abstract:
Following trends that emphasize neural networks for machine learning, many studies regarding computing systems have focused on accelerating deep neural networks. These studies often propose utilizing an accelerator specialized for a neural network and a cluster architecture composed of interconnected accelerator chips. We observed that inter-accelerator communication within a cluster has a significant impact on the training time of the neural network. In this paper, we show the advantages of optical interconnects for multi-chip machine-learning architecture by demonstrating performance impro
32

Liu, Yang, Yiheng Zhang, Xiaoran Hao, et al. "Design of a Convolutional Neural Network Accelerator Based on On-Chip Data Reordering." Electronics 13, no. 5 (2024): 975. http://dx.doi.org/10.3390/electronics13050975.

Abstract:
Convolutional neural networks have been widely applied in the field of computer vision. In convolutional neural networks, convolution operations account for more than 90% of the total computational workload. The current mainstream approach to achieving high energy-efficient convolution operations is through dedicated hardware accelerators. Convolution operations involve a significant amount of weights and input feature data. Due to limited on-chip cache space in accelerators, there is a significant amount of off-chip DRAM memory access involved in the computation process. The latency of DRAM a
33

Chen, Zhimei. "Hardware Accelerated Optimization of Deep Learning Model on Artificial Intelligence Chip." Frontiers in Computing and Intelligent Systems 6, no. 2 (2023): 11–14. http://dx.doi.org/10.54097/fcis.v6i2.03.

Abstract:
With the rapid development of deep learning technology, the demand for computing resources is increasing, and the accelerated optimization of hardware on artificial intelligence (AI) chips has become one of the key ways to address this challenge. This paper aims to explore hardware acceleration optimization strategies for deep learning models on AI chips to improve the training and inference performance of the models. In this paper, the methods and practice of optimizing deep learning models on AI chips are analyzed in depth by comprehensively considering hardware characteristics such as parallel pr
34

Neelam, Srikanth, and A. Amalin Prince. "VCONV: A Convolutional Neural Network Accelerator for FPGAs." Electronics 14, no. 4 (2025): 657. https://doi.org/10.3390/electronics14040657.

Abstract:
Field Programmable Gate Arrays (FPGAs), with their wide portfolio of configurable resources such as Look-Up Tables (LUTs), Block Random Access Memory (BRAM), and Digital Signal Processing (DSP) blocks, are the best option for custom hardware designs. Their low power consumption and cost-effectiveness give them an advantage over Graphics Processing Units (GPUs) and Central Processing Units (CPUs) in providing efficient accelerator solutions for compute-intensive Convolutional Neural Network (CNN) models. CNN accelerators are dedicated hardware modules capable of performing compute operations su
35

Brennsteiner, Stefan, Tughrul Arslan, John Thompson, and Andrew McCormick. "A Real-Time Deep Learning OFDM Receiver." ACM Transactions on Reconfigurable Technology and Systems 15, no. 3 (2022): 1–25. http://dx.doi.org/10.1145/3494049.

Abstract:
Machine learning in the physical layer of communication systems holds the potential to improve performance and simplify design methodology. Many algorithms have been proposed; however, the model complexity is often unfeasible for real-time deployment. The real-time processing capability of these systems has not been proven yet. In this work, we propose a novel, less complex, fully connected neural network to perform channel estimation and signal detection in an orthogonal frequency division multiplexing system. The memory requirement, which is often the bottleneck for fully connected neural ne
36

Cho, Mannhee, and Youngmin Kim. "FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit." Electronics 10, no. 22 (2021): 2859. http://dx.doi.org/10.3390/electronics10222859.

Abstract:
Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered to be suitable platforms for CNNs based on their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by investigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accel
37

Choubey, Abhishek, and Shruti Bhargava Choubey. "A Promising Hardware Accelerator with PAST Adder." Advances in Science and Technology 105 (April 2021): 241–48. http://dx.doi.org/10.4028/www.scientific.net/ast.105.241.

Abstract:
Recent neural network research has demonstrated a significant benefit in machine learning compared to conventional algorithms based on handcrafted models and features. In areas such as video, speech and image recognition, the neural network is now widely adopted. But the high complexity of neural network inference in computation and storage poses great challenges to its application. These networks are compute-intensive algorithms that currently require execution on dedicated hardware. In this case, we point out the difficulty of multi-operand adders (MOAs) and their high resource utilization in a CN
38

Huang, Hongmin, Zihao Liu, Taosheng Chen, Xianghong Hu, Qiming Zhang, and Xiaoming Xiong. "Design Space Exploration for YOLO Neural Network Accelerator." Electronics 9, no. 11 (2020): 1921. http://dx.doi.org/10.3390/electronics9111921.

Abstract:
The You Only Look Once (YOLO) neural network has great advantages and extensive applications in computer vision. The convolutional layers are the most important part of the neural network and take up most of the computation time. Improving the efficiency of the convolution operations can greatly increase the speed of the neural network. Field programmable gate arrays (FPGAs) have been widely used in accelerators for convolutional neural networks (CNNs) thanks to their configurability and parallel computing. This paper proposes a design space exploration for the YOLO neural network based on FPG
39

de Sousa, André L., Mário P. Véstias, and Horácio C. Neto. "Multi-Model Inference Accelerator for Binary Convolutional Neural Networks." Electronics 11, no. 23 (2022): 3966. http://dx.doi.org/10.3390/electronics11233966.

Abstract:
Binary convolutional neural networks (BCNN) have shown good accuracy for small to medium neural network models. Their extreme quantization of weights and activations reduces off-chip data transfer and greatly reduces the computational complexity of convolutions. Further reduction in the complexity of a BCNN model for fast execution can be achieved with model size reduction at the cost of network accuracy. In this paper, a multi-model inference technique is proposed to reduce the execution time of the binarized inference process without accuracy reduction. The technique considers a cascade of n
40

Du, Wenhe, Shuoyu Chen, Lei Wang, and Ruili Chai. "Design of Yolov4-Tiny convolutional neural network hardware accelerator based on FPGA." Journal of Physics: Conference Series 2849, no. 1 (2024): 012005. http://dx.doi.org/10.1088/1742-6596/2849/1/012005.

Abstract:
This article designs a Yolov4-Tiny convolutional neural network hardware accelerator based on FPGA. A four-stage pipeline convolutional array structure has been proposed. In the design, the NC4HW4 parameter rearrangement and Im2col dimensionality reduction algorithm are used as the core to maximize the parallelism of matrix operations under limited resources. Secondly, a PE convolutional computing unit structure was designed, and a resource-efficient and highly reliable convolutional computing module was implemented by combining INT8 DSP resource reuse technology. Finally, the acceler
41

Kang, Soongyu, Seongjoo Lee, and Yunho Jung. "Design of Network-on-Chip-Based Restricted Coulomb Energy Neural Network Accelerator on FPGA Device." Sensors 24, no. 6 (2024): 1891. http://dx.doi.org/10.3390/s24061891.

Abstract:
Sensor applications in internet of things (IoT) systems, coupled with artificial intelligence (AI) technology, are becoming an increasingly significant part of modern life. For low-latency AI computation in IoT systems, there is a growing preference for edge-based computing over cloud-based alternatives. The restricted coulomb energy neural network (RCE-NN) is a machine learning algorithm well-suited for implementation on edge devices due to its simple learning and recognition scheme. In addition, because the RCE-NN generates neurons as needed, it is easy to adjust the network structure and le
42

Wang, Yuejiao, Zhong Ma, and Zunming Yang. "Sequential Characteristics Based Operators Disassembly Quantization Method for LSTM Layers." Applied Sciences 12, no. 24 (2022): 12744. http://dx.doi.org/10.3390/app122412744.

Abstract:
Embedded computing platforms such as neural network accelerators deploying neural network models need to quantize the values into low-bit integers through quantization operations. However, most current embedded computing platforms with a fixed-point architecture do not directly support performing the quantization operation for the LSTM layer. Meanwhile, the influence of sequential input data for LSTM has not been taken into account by quantization algorithms. Aiming at these two technical bottlenecks, a new sequential-characteristics-based operators disassembly quantization method for LSTM lay
43

Kumar, Pramod. "Review of Advanced Methods in Hardware Acceleration for Deep Neural Networks." International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (2024): 4523–29. http://dx.doi.org/10.22214/ijraset.2024.62595.

Abstract:
Convolutional neural networks have become very efficient in performing tasks like object detection, providing human-like accuracy. However, their practical implementation needs significant hardware resources and memory bandwidth. In the recent past, a lot of research has been carried out on achieving higher efficiency in implementing such neural networks in hardware. We talk about FPGAs for hardware implementation due to their flexibility for customisation for such neural network architectures. In this paper we will discuss the metrics for an efficient hardware accelerator and general method
44

Kim, Jeonghun, and Sunggu Lee. "Fast Design Space Exploration for Always-On Neural Networks." Electronics 13, no. 24 (2024): 4971. https://doi.org/10.3390/electronics13244971.

Abstract:
An analytical model can quickly predict performance and energy efficiency based on information about the neural network model and neural accelerator architecture, making it ideal for rapid pre-synthesis design space exploration. This paper proposes a new analytical model specifically targeted for convolutional neural networks used in always-on applications. To validate the proposed model, the performance and energy efficiency estimated by the model were compared with actual hardware and post-synthesis gate-level simulations of hardware synthesized with a state-of-the-art electronic design auto
45

Cosatto, E., and H. P. Graf. "A neural network accelerator for image analysis." IEEE Micro 15, no. 3 (1995): 32–38. http://dx.doi.org/10.1109/40.387680.

46

Kuznar, Damian, Robert Szczygiel, Piotr Maj, and Anna Kozioł. "Design of artificial neural network hardware accelerator." Journal of Instrumentation 18, no. 04 (2023): C04013. http://dx.doi.org/10.1088/1748-0221/18/04/c04013.

Abstract:
We present a design of a scalable processor capable of providing artificial neural network (ANN) functionality, together with in-house developed tools for automatic conversion of an ANN model designed with the TensorFlow library into HDL code. The hardware is described in SystemVerilog, and the synthesized module of the processor can perform calculations of a neural network at speeds exceeding 100 MHz. Our in-house designed software tool for ANN conversion supports translation of an arbitrary multilayer perceptron neural network into a state machine module, which performs necessary c
47

Xing, Siyuan, Qingyu Han, and Efstathios G. Charalampidis. "CombOpNet: a Neural-Network Accelerator for SINDy." Journal of Vibration Testing and System Dynamics 9, no. 1 (2025): 1–20. https://doi.org/10.5890/jvtsd.2025.03.001.

48

Seto, Kenshu. "A Survey on System-Level Design of Neural Network Accelerators." Journal of Integrated Circuits and Systems 16, no. 2 (2021): 1–10. http://dx.doi.org/10.29292/jics.v16i2.505.

Abstract:
In this paper, we present a brief survey on the system-level optimizations used for convolutional neural network (CNN) inference accelerators. For the nested loop of convolutional (CONV) layers, we discuss the effects of loop optimizations such as loop interchange, tiling, unrolling and fusion on CNN accelerators. We also explain memory optimizations that are effective with the loop optimizations. In addition, we discuss streaming architectures and single computation engine architectures that are commonly used in CNN accelerators. Optimizations for CNN models are briefly explained, followed by
49

Paulenka, D. A. "Comparative analysis of single-board computers for the development of a microarchitectural computing system for fire detection." Informatics 21, no. 2 (2024): 73–85. http://dx.doi.org/10.37661/1816-0301-2024-21-2-73-85.

Abstract:
Objectives. The purpose of the work is to select the basic computing microplatform of the onboard microarchitectural computing complex for the detection of anomalous situations in the territory of the Republic of Belarus from space on the basis of artificial intelligence methods. Methods. The method of comparative analysis is used to select a computing platform. A series of performance tests and comparative analysis (benchmarking) are performed on the selected equipment. The methods of comparative and benchmarking analysis are performed in accordance with the terms of reference to the current p
50

Park, Sang-Soo, and Ki-Seok Chung. "CONNA: Configurable Matrix Multiplication Engine for Neural Network Acceleration." Electronics 11, no. 15 (2022): 2373. http://dx.doi.org/10.3390/electronics11152373.

Abstract:
Convolutional neural networks (CNNs) have demonstrated promising results in various applications such as computer vision, speech recognition, and natural language processing. One of the key computations in many CNN applications is matrix multiplication, which accounts for a significant portion of computation. Therefore, hardware accelerators to effectively speed up the computation of matrix multiplication have been proposed, and several studies have attempted to design hardware accelerators to perform better matrix multiplications in terms of both speed and power consumption. Typically, accele