Academic literature on the topic "Neural network accelerator"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles

Choose a source type:

Browse the topical lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Neural network accelerator".

Next to each source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Neural network accelerator"

1

Eliahu, Adi, Ronny Ronen, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky. "multiPULPly." ACM Journal on Emerging Technologies in Computing Systems 17, no. 2 (2021): 1–27. http://dx.doi.org/10.1145/3432815.

Abstract:
Computationally intensive neural network applications often need to run on resource-limited low-power devices. Numerous hardware accelerators have been developed to speed up the performance of neural network applications and reduce power consumption; however, most focus on data centers and full-fledged systems. Acceleration in ultra-low-power systems has been only partially addressed. In this article, we present multiPULPly, an accelerator that integrates memristive technologies within standard low-power CMOS technology, to accelerate multiplication in neural network inference on ultra-low-power systems. This accelerator was designated for PULP, an open-source microcontroller system that uses low-power RISC-V processors. Memristors were integrated into the accelerator to enable power consumption only when the memory is active, to continue the task with no context-restoring overhead, and to enable highly parallel analog multiplication. To reduce the energy consumption, we propose novel dataflows that handle common multiplication scenarios and are tailored for our architecture. The accelerator was tested on FPGA and achieved a peak energy efficiency of 19.5 TOPS/W, outperforming state-of-the-art accelerators by 1.5× to 4.5×.
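The primitive such memristive arrays accelerate is an analog vector-matrix multiply: input voltages drive the crossbar rows, cell conductances store the weights, and each column current sums the products by Kirchhoff's current law. A minimal NumPy sketch of that behavior, with an invented conductance mapping (this is not the paper's code):

```python
import numpy as np

def crossbar_mvm(voltages, conductances):
    # Idealized crossbar: column current j = sum_i V[i] * G[i, j]
    # (Ohm's law per cell, Kirchhoff's current law per column).
    return voltages @ conductances

rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=(64, 32))   # layer weights
g = (weights - weights.min()) / np.ptp(weights)   # toy map to conductances in [0, 1]
v = rng.uniform(0.0, 0.3, size=64)                # activations as read voltages
currents = crossbar_mvm(v, g)                     # one analog step yields all 32 sums
print(currents.shape)                             # (32,)
```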
2

Hong, JiUn, Saad Arslan, TaeGeon Lee, and HyungWon Kim. "Design of Power-Efficient Training Accelerator for Convolution Neural Networks." Electronics 10, no. 7 (2021): 787. http://dx.doi.org/10.3390/electronics10070787.

Abstract:
To realize deep learning techniques, a type of deep neural network (DNN) called a convolutional neural network (CNN) is among the most widely used models aimed at image recognition applications. However, there is growing demand for lightweight and low-power neural network accelerators, not only for inference but also for the training process. In this paper, we propose a training accelerator that provides low power and compact chip size, targeted at mobile and edge computing applications. It achieves real-time processing of both inference and training using concurrent floating-point data paths. The proposed accelerator can be externally controlled and employs resource sharing and an integrated convolution-pooling block to achieve low area and low energy consumption. We implemented the proposed training accelerator in an FPGA (Field Programmable Gate Array) and evaluated its training performance using an MNIST CNN example in comparison with a PC with a GPU (Graphics Processing Unit). While both methods achieved a similar training accuracy of 95.1%, the proposed accelerator, when implemented in a silicon chip, reduced the energy consumption by 480 times compared to the counterpart. Additionally, when implemented on an FPGA, an energy reduction of over 4.5 times was achieved compared to the existing FPGA training accelerator for the MNIST dataset. Therefore, the proposed accelerator is more suitable for deployment in mobile/edge nodes compared to the existing software and hardware accelerators.
3

Cho, Jaechan, Yongchul Jung, Seongjoo Lee, and Yunho Jung. "Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme." Electronics 10, no. 3 (2021): 230. http://dx.doi.org/10.3390/electronics10030230.

Abstract:
Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable resources can support various network topologies, but they operate with the same parallelism for all the layers of the BNN, which degrades throughput performance at certain layers. To overcome this problem, we propose a BNN accelerator with adaptive parallelism that offers high throughput performance in all layers. The proposed accelerator analyzes target layer parameters and operates with optimal parallelism using reasonable resources. In addition, this architecture is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators. In performance evaluation using state-of-the-art BNN topologies, the designed BNN accelerator achieved an area–speed efficiency 9.69 times higher than previous FPGA implementations and 24% higher than existing VLSI implementations for BNNs.
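The arithmetic that makes BNN accelerators so compact is worth spelling out: with weights and activations in {-1, +1} stored as single bits, a dot product reduces to an XOR and a popcount. A sketch of that identity, independent of any particular accelerator:

```python
import numpy as np

def binary_dot(a_bits, w_bits):
    # Dot product of {-1,+1} vectors stored as {0,1} bits:
    # equal bits contribute +1, differing bits -1, so dot = N - 2*popcount(a XOR w).
    n = a_bits.size
    return n - 2 * np.count_nonzero(a_bits ^ w_bits)

rng = np.random.default_rng(1)
a = rng.integers(0, 2, 128, dtype=np.uint8)
w = rng.integers(0, 2, 128, dtype=np.uint8)
ref = np.dot(2 * a.astype(int) - 1, 2 * w.astype(int) - 1)  # reference in the +/-1 domain
assert binary_dot(a, w) == ref
```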
4

Noskova, E. S., I. E. Zakharov, Y. N. Shkandybin, and S. G. Rykovanov. "Towards energy-efficient neural network calculations." Computer Optics 46, no. 1 (2022): 160–66. http://dx.doi.org/10.18287/2412-6179-co-914.

Abstract:
Nowadays, the problem of creating high-performance and energy-efficient hardware for artificial intelligence tasks is acute. The most popular solution to this problem is the use of deep learning accelerators, such as GPUs and tensor processing units, to run neural networks. Recently, NVIDIA announced the NVDLA project, which allows one to design neural network accelerators based on open-source code. This work describes the full cycle of creating a prototype NVDLA accelerator, as well as testing the resulting solution by running the ResNet-50 neural network on it. Finally, an assessment of the performance and power efficiency of the prototype NVDLA accelerator compared to a GPU and a CPU is provided; the results show the superiority of NVDLA in many characteristics.
5

Fan, Yuxiao. "Design and research of high-performance convolutional neural network accelerator based on Chipyard." Journal of Physics: Conference Series 2858, no. 1 (2024): 012001. http://dx.doi.org/10.1088/1742-6596/2858/1/012001.

Abstract:
Neural network accelerators perform well in the research and verification of neural network models. In this paper, a convolutional neural network accelerator system composed of a RISC-V processor core and a Gemmini array accelerator is designed in the Chisel language within the Chipyard framework, and the acceleration effect of different Gemmini array configurations on different input matrices is further investigated. The results show that the accelerator system can achieve a speedup of thousands of times over a single processor for large matrix calculations.
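Gemmini's core operation is dense matrix multiplication on a configurable systolic array; the effect of the array configurations the abstract compares can be pictured as a blocked matmul whose tile size matches the array dimension. A schematic NumPy sketch (tile size and shapes are illustrative, not taken from the paper):

```python
import numpy as np

def tiled_matmul(A, B, tile=16):
    # Blocked matrix multiply: each (tile x tile) block corresponds to one
    # pass of data through a Gemmini-style systolic array.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % tile == 0 and N % tile == 0 and K % tile == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):  # accumulate partial products per tile
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.arange(64 * 64, dtype=np.int64).reshape(64, 64) % 7
B = np.arange(64 * 64, dtype=np.int64).reshape(64, 64) % 5
assert np.array_equal(tiled_matmul(A, B), A @ B)
```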
6

Xu, Jia, Han Pu, and Dong Wang. "Sparse Convolution FPGA Accelerator Based on Multi-Bank Hash Selection." Micromachines 16, no. 1 (2024): 22. https://doi.org/10.3390/mi16010022.

Abstract:
Reconfigurable processor-based acceleration of deep convolutional neural network (DCNN) algorithms has emerged as a widely adopted technique, with particular attention on sparse neural network acceleration as an active research area. However, many computing devices that claim high computational power still struggle to execute neural network algorithms with optimal efficiency, low latency, and minimal power consumption. Consequently, there remains significant potential for further exploration into improving the efficiency, latency, and power consumption of neural network accelerators across diverse computational scenarios. This paper investigates three key techniques for hardware acceleration of sparse neural networks. The main contributions are as follows: (1) Most neural network inference tasks are typically executed on general-purpose computing devices, which often fail to deliver high energy efficiency and are not well-suited for accelerating sparse convolutional models. In this work, we propose a specialized computational circuit for the convolutional operations of sparse neural networks. This circuit is designed to detect and eliminate the computational effort associated with zero values in the sparse convolutional kernels, thereby enhancing energy efficiency. (2) The data access patterns in convolutional neural networks introduce significant pressure on the high-latency off-chip memory access process. Due to issues such as data discontinuity, the data reading unit often fails to fully exploit the available bandwidth during off-chip read and write operations. In this paper, we analyze bandwidth utilization in the context of convolutional accelerator data handling and propose a strategy to improve off-chip access efficiency. Specifically, we leverage a compiler optimization plugin developed for Vitis HLS, which automatically identifies and optimizes on-chip bandwidth utilization. (3) In coefficient-based accelerators, the synchronous operation of individual computational units can significantly hinder efficiency. Previous approaches have achieved asynchronous convolution by designing separate memory units for each computational unit; however, this method consumes a substantial amount of on-chip memory resources. To address this issue, we propose a shared feature map cache design for asynchronous convolution in the accelerators presented in this paper. This design resolves address access conflicts when multiple computational units concurrently access a set of caches by utilizing a hash-based address indexing algorithm. Moreover, the shared cache architecture reduces data redundancy and conserves on-chip resources. Using the optimized accelerator, we successfully executed ResNet50 inference on an Intel Arria 10 1150GX FPGA, achieving a throughput of 497 GOPS, or an equivalent computational power of 1579 GOPS, with a power consumption of only 22 watts.
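The hash-based address indexing of contribution (3) can be illustrated with a toy bank selector: indexing cache banks with low address bits makes strided accesses from parallel compute units pile onto one bank, while a simple XOR-fold hash spreads them out. The hash below is invented for illustration; the paper's actual function is not reproduced here:

```python
def bank_of(addr: int, n_banks: int = 8) -> int:
    # Toy hash-based bank selector: XOR-fold address bits before the modulo,
    # so regular strides no longer all land on the same bank.
    h = addr ^ (addr >> 3) ^ (addr >> 7)
    return h % n_banks

# Stride-8 addresses that would all hit bank 0 under plain addr % 8
addrs = range(0, 256, 8)
plain = {a % 8 for a in addrs}
hashed = {bank_of(a) for a in addrs}
print(sorted(plain))   # [0]  -- every access conflicts on one bank
print(sorted(hashed))  # all eight banks appear -- conflicts are spread out
```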
7

Ferianc, Martin, Hongxiang Fan, Divyansh Manocha, et al. "Improving Performance Estimation for Design Space Exploration for Convolutional Neural Network Accelerators." Electronics 10, no. 4 (2021): 520. http://dx.doi.org/10.3390/electronics10040520.

Abstract:
Contemporary advances in neural networks (NNs) have demonstrated their potential in different applications such as in image classification, object detection or natural language processing. In particular, reconfigurable accelerators have been widely used for the acceleration of NNs due to their reconfigurability and efficiency in specific application instances. To determine the configuration of the accelerator, it is necessary to conduct design space exploration to optimize the performance. However, the process of design space exploration is time consuming because of the slow performance evaluation for different configurations. Therefore, there is a demand for an accurate and fast performance prediction method to speed up design space exploration. This work introduces a novel method for fast and accurate estimation of different metrics that are of importance when performing design space exploration. The method is based on a Gaussian process regression model parametrised by the features of the accelerator and the target NN to be accelerated. We evaluate the proposed method together with other popular machine learning based methods in estimating the latency and energy consumption of our implemented accelerator on two different hardware platforms targeting convolutional neural networks. We demonstrate improvements in estimation accuracy, without the need for significant implementation effort or tuning.
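As a rough illustration of the approach, a Gaussian process regressor can be fit on (accelerator configuration, network size) features to predict latency along with an uncertainty estimate. The feature set and the toy latency model below are invented; only the modeling pattern follows the paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(2)
# Hypothetical features: PE-array width, on-chip buffer KiB, layer MAC count.
X = rng.uniform([8, 32, 1e6], [64, 512, 1e9], size=(40, 3))
y = X[:, 2] / (X[:, 0] * 1e6) + 0.01 * rng.standard_normal(40)  # toy latency (ms)

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gpr.fit(np.log(X), y)                       # log features: their scales differ wildly
mean, std = gpr.predict(np.log(X[:5]), return_std=True)
print(mean.round(2), std.round(4))          # prediction plus model uncertainty
```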
8

Sunny, Febin P., Asif Mirza, Mahdi Nikdast, and Sudeep Pasricha. "ROBIN: A Robust Optical Binary Neural Network Accelerator." ACM Transactions on Embedded Computing Systems 20, no. 5s (2021): 1–24. http://dx.doi.org/10.1145/3476988.

Abstract:
Domain specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN, which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, our proposed ROBIN architecture possesses the desirable traits of being robust, energy-efficient, low latency, and high throughput, when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ∼4× lower than electronic BNN accelerators and ∼933× lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ∼3× and ∼25× better performance than electronic and photonic BNN accelerators, respectively.
9

Tang, Wenkai, and Peiyong Zhang. "GPGCN: A General-Purpose Graph Convolution Neural Network Accelerator Based on RISC-V ISA Extension." Electronics 11, no. 22 (2022): 3833. http://dx.doi.org/10.3390/electronics11223833.

Abstract:
In the past two years, various graph convolution neural network (GCN) accelerators have emerged, each with its own characteristics, but their common disadvantage is that the hardware architecture is not programmable and is optimized for a specific network and dataset. They may not support acceleration of different GCNs and may not achieve optimal hardware resource utilization for datasets of different sizes. Therefore, given the above shortcomings, and following the development trend of traditional neural network accelerators, this paper proposes and implements GPGCN: a general-purpose GCN accelerator architecture based on RISC-V instruction set extensions, providing the software programming freedom to support acceleration for various GCNs, and achieving the best acceleration efficiency for different GCNs with different datasets. Compared with a traditional CPU, and a traditional CPU with vector extensions, GPGCN achieves speedups of over 1001× and 267×, respectively, for GCN on the Cora dataset. Compared with dedicated accelerators, GPGCN has software programmability and supports the acceleration of more GCNs.
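For reference, the kernel a GCN accelerator must execute is the normalized aggregate-then-transform step H = ReLU(D^(-1/2) (A+I) D^(-1/2) X W). A dense NumPy sketch follows; a real accelerator exploits the sparsity of the adjacency matrix:

```python
import numpy as np

def gcn_layer(A, X, W):
    # One graph-convolution layer: add self-loops, symmetrically normalize,
    # aggregate neighbor features, then apply the weight matrix and ReLU.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
X = np.eye(3)                                                 # one-hot node features
W = np.full((3, 2), 0.5)
print(gcn_layer(A, X, W))
```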
10

Xia, Chengpeng, Yawen Chen, Haibo Zhang, Hao Zhang, Fei Dai, and Jigang Wu. "Efficient neural network accelerators with optical computing and communication." Computer Science and Information Systems, no. 00 (2022): 66. http://dx.doi.org/10.2298/csis220131066x.

Abstract:
Conventional electronic Artificial Neural Network (ANN) accelerators focus on architecture design and numerical computation optimization to improve training efficiency. However, these approaches have recently encountered bottlenecks in energy efficiency and computing performance, which has led to increased interest in photonic accelerators. Photonic architectures, with their low energy consumption, high transmission speed, and high bandwidth, are considered to play an important role in the next generation of computing architectures. In this paper, to provide a better understanding of the optical technology used in ANN acceleration, we present a comprehensive review of efficient photonic computing and communication in ANN accelerators. The related photonic devices are investigated in terms of their application to ANN acceleration, and a classification of existing solutions is proposed, categorized into optical computing acceleration and optical communication acceleration according to photonic effects and photonic architectures. Moreover, we discuss the challenges of these photonic neural network acceleration approaches to highlight the most promising future research opportunities in this field.

Theses on the topic "Neural network accelerator"

1

Tianxu, Yue. "Convolutional Neural Network FPGA-accelerator on Intel DE10-Standard FPGA." Thesis, Linköpings universitet, Elektroniska Kretsar och System, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-178174.

Abstract:
Convolutional neural networks (CNNs) have been extensively used in many areas, such as face and speech recognition, image searching and classification, and autonomous driving. Hence, CNN accelerators have become a trending research topic. Generally, graphics processing units (GPUs) are widely applied in CNN accelerators. However, field-programmable gate arrays (FPGAs) have higher energy and resource efficiency than GPUs; moreover, high-level synthesis tools based on the Open Computing Language (OpenCL) can shorten the verification and implementation period for FPGAs. In this project, PipeCNN [1] is implemented on an Intel DE10-Standard FPGA. This OpenCL design accelerates AlexNet through the interaction between an Advanced RISC Machine (ARM) processor and the FPGA. PipeCNN optimization based on memory reads and convolution is then analyzed and discussed.
2

Oudrhiri, Ali. "Performance of a Neural Network Accelerator Architecture and its Optimization Using a Pipeline-Based Approach." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS658.pdf.

Abstract:
In recent years, neural networks have gained widespread popularity for their versatility and effectiveness in solving a wide range of complex tasks. Their ability to learn and make predictions from large data-sets has revolutionized various fields. However, as neural networks continue to find applications in an ever-expanding array of domains, their significant computational requirements become a pressing challenge. This computational demand is particularly problematic when deploying neural networks in resource-constrained embedded devices, especially within the context of edge computing for inference tasks. Nowadays, neural network accelerator chips emerge as the optimal choice for supporting neural networks at the edge. These chips offer remarkable efficiency with their compact size, low power consumption, and reduced latency. Moreover, the fact that they are integrated in the same chip environment also enhances security by minimizing external data communication. In the frame of edge computing, diverse requirements have emerged, necessitating trade-offs in various performance aspects. This has led to the development of accelerator architectures that are highly configurable, allowing them to adapt to distinct performance demands. In this context, the focus lies on Gemini, a configurable inference neural network accelerator designed with an imposed architecture and implemented using High-Level Synthesis techniques. The considerations for its design and implementation were driven by the need for parallelization configurability and performance optimization. Once this accelerator was designed, demonstrating the power of its configurability became essential, helping users select the most suitable architecture for their neural networks. To achieve this objective, this thesis contributed to the development of a performance prediction strategy operating at a high level of abstraction, which considers the chosen architecture and neural network configuration. This tool assists clients in making decisions regarding the appropriate architecture for their specific neural network applications. During the research, we noticed that using one accelerator presents several limits and that increasing parallelism had limitations in terms of performance. Consequently, we adopted a new strategy for optimizing neural network acceleration. This time, we took a high-level approach that did not require fine-grained accelerator optimizations. We organized multiple Gemini instances into a pipeline and allocated layers to different accelerators to maximize performance. We proposed solutions for two scenarios: a user scenario, where the pipeline structure is predefined with a fixed number of accelerators, accelerator configurations, and RAM sizes, for which we proposed solutions to map the layers onto the different accelerators to optimize execution performance; and a designer scenario, where the pipeline structure is not fixed, and the number and configuration of the accelerators may be chosen to optimize both execution and hardware performance. This pipeline strategy has proven to be effective for the Gemini accelerator. Although this thesis originated from a specific industrial need, certain solutions developed during the research can be applied or adapted to other neural network accelerators. Notably, the performance prediction strategy and the high-level optimization of NN processing through pipelining multiple instances offer valuable insights for broader application.
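The user-scenario mapping problem described above can be pictured with a toy model: split the layer sequence into contiguous stages, one per accelerator, so that the slowest stage (which bounds pipeline throughput) is as fast as possible. A brute-force sketch with made-up latencies, not the thesis's actual algorithm:

```python
from itertools import combinations

def best_pipeline_split(layer_times, n_accels):
    # Try every way to cut the layer sequence into n_accels contiguous stages;
    # keep the split whose slowest stage is fastest (the pipeline bottleneck).
    n = len(layer_times)
    best = (float("inf"), None)
    for cuts in combinations(range(1, n), n_accels - 1):
        bounds = (0, *cuts, n)
        stages = [sum(layer_times[a:b]) for a, b in zip(bounds, bounds[1:])]
        best = min(best, (max(stages), bounds))
    return best

times = [4.0, 1.5, 2.5, 6.0, 1.0, 3.0]   # per-layer latencies (made up)
bottleneck, bounds = best_pipeline_split(times, 3)
print(bottleneck, bounds)                # 8.0 (0, 3, 4, 6): the 6.0 layer gets its own stage
```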
3

Maltoni, Pietro. "Progetto di un acceleratore hardware per layer di convoluzioni depthwise in applicazioni di Deep Neural Network." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24205/.

Abstract:
Progressive technological development and the constant monitoring, control, and analysis of the surrounding world have led to increasingly capable IoT devices, which is why people have begun to speak of edge computing. These devices carry the resources to process sensor data directly on-device. The technology lends itself well to CNNs, the neural networks used for image analysis and recognition. Separable convolutions represent a new frontier because they massively reduce the number of operations performed on data tensors by splitting the convolution into two parts: a depthwise part and a pointwise part. All of this yields very reliable results in terms of accuracy and speed, but power consumption remains the central problem, since the devices rely solely on an internal battery. A good trade-off between power consumption and computational capability is therefore necessary. To meet this technological challenge, the state of the art offers various solutions, built from clusters with optimized cores and dedicated instructions, or from FPGAs. In this thesis we propose a hardware accelerator, developed in PULP, aimed at computing depthwise convolution layers. Thanks to an HWC data layout in memory and a window buffer that slides over the image to perform the convolutions channel by channel, it was possible to develop a datapath architecture oriented toward data reuse; as a result, the accelerator delivers a maximum output throughput of 4 pixels per clock cycle. With a performance of 6 GOP/s, an energy efficiency of 101 GOP/J, and power consumption on the order of milliwatts, figures obtained by integrating the IP into the cluster of Darkside, a new research chip in 65 nm TSMC technology, the depthwise accelerator is an ideal candidate for this class of applications.
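For clarity, the depthwise-plus-pointwise factorization the thesis builds on can be written directly in NumPy. This functional sketch shows why the operation count drops (per output pixel, C·k² + C·C_out multiplies instead of C·C_out·k² for a standard convolution); it says nothing about the accelerator's datapath:

```python
import numpy as np

def depthwise_pointwise(x, dw_k, pw_k):
    # Separable convolution: a per-channel (depthwise) 2D pass, then a 1x1
    # cross-channel (pointwise) pass. Shapes: x (C,H,W), dw_k (C,kh,kw),
    # pw_k (Cout,C). 'Valid' padding, stride 1.
    C, H, W = x.shape
    _, kh, kw = dw_k.shape
    Ho, Wo = H - kh + 1, W - kw + 1
    dw = np.zeros((C, Ho, Wo))
    for c in range(C):                  # depthwise: channels never mix
        for i in range(Ho):
            for j in range(Wo):
                dw[c, i, j] = np.sum(x[c, i:i+kh, j:j+kw] * dw_k[c])
    # pointwise: a 1x1 convolution is just a matmul over the channel axis
    return np.einsum('oc,chw->ohw', pw_k, dw)

x = np.random.default_rng(3).standard_normal((8, 16, 16))
out = depthwise_pointwise(x, np.ones((8, 3, 3)), np.ones((4, 8)))
print(out.shape)   # (4, 14, 14)
```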
4

Xu, Hongjie. "Energy-Efficient On-Chip Cache Architectures and Deep Neural Network Accelerators Considering the Cost of Data Movement." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263786.

Abstract:
Degree program noted: Kyoto University WISE program "Advanced Photonic and Electronic Device Science". Kyoto University, new-system doctoral course, Doctor of Informatics, Degree No. 23325 (Informatics No. 761), Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University. Examining committee: Prof. Hidetoshi Onodera (chair), Prof. Eiji Oki, Prof. Takashi Sato. Qualified under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFAM
5

Pradels, Léo. "Efficient CNN inference acceleration on FPGAs : a pattern pruning-driven approach." Electronic Thesis or Diss., Université de Rennes (2023-....), 2024. http://www.theses.fr/2024URENS087.

Abstract:
CNN-based deep learning models provide state-of-the-art performance in image and video processing tasks, particularly for image enhancement or classification. However, these models are computationally and memory-intensive, making them unsuitable for real-time constraints on embedded FPGA systems. As a result, compressing these CNNs and designing accelerator architectures for inference that integrate compression in a hardware-software co-design approach is essential. While software optimizations like pruning have been proposed, they often lack the structured approach needed for effective accelerator integration. To address these limitations, this thesis focuses on accelerating CNNs on FPGAs while complying with real-time constraints on embedded systems. This is achieved through several key contributions. First, it introduces pattern pruning, which imposes structure on network sparsity, enabling efficient hardware acceleration with minimal accuracy loss due to compression. Second, a scalable accelerator for CNN inference is presented, which adapts its architecture based on input performance criteria, FPGA specifications, and the target CNN model architecture. An efficient method for integrating pattern pruning within the accelerator and a complete flow for CNN acceleration are proposed. Finally, improvements in network compression are explored through Shift&Add quantization, which modifies FPGA computation methods while maintaining baseline network accuracy.
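Pattern pruning constrains every kernel's non-zeros to one of a few fixed shapes, which is what makes the resulting sparsity hardware-friendly. A minimal sketch with an invented four-pattern set (real pattern sets are selected or learned during training):

```python
import numpy as np

# Candidate 3x3 patterns, each keeping 4 weight positions; illustrative only.
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=float)

def pattern_prune(kernel):
    # Keep the candidate pattern that preserves the most weight magnitude, so
    # every pruned kernel ends up with the same fixed count of non-zeros.
    scores = [(np.abs(kernel) * p).sum() for p in PATTERNS]
    return kernel * PATTERNS[int(np.argmax(scores))]

k = np.random.default_rng(4).standard_normal((3, 3))
print(pattern_prune(k))   # exactly 4 surviving weights, in one fixed pattern
```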
6

Riera, Villanueva Marc. "Low-power accelerators for cognitive computing." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669828.

Abstract:
Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications, and are especially efficient in classification and decision making problems such as speech recognition or machine translation. Mobile and embedded devices increasingly rely on DNNs to understand the world. Smartphones, smartwatches and cars perform discriminative tasks, such as face or object recognition, on a daily basis. Despite the increasing popularity of DNNs, running them on mobile and embedded systems comes with several main challenges: delivering high accuracy and performance with a small memory and energy budget. Modern DNN models consist of billions of parameters requiring huge computational and memory resources and, hence, they cannot be directly deployed on low-power systems with limited resources. The objective of this thesis is to address these issues and propose novel solutions in order to design highly efficient custom accelerators for DNN-based cognitive computing systems. In first place, we focus on optimizing the inference of DNNs for sequence processing applications. We perform an analysis of the input similarity between consecutive DNN executions. Then, based on the high degree of input similarity, we propose DISC, a hardware accelerator implementing a Differential Input Similarity Computation technique to reuse the computations of the previous execution, instead of computing the entire DNN. We observe that, on average, more than 60% of the inputs of any neural network layer tested exhibit negligible changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs results in 63% energy savings on average. In second place, we propose to further optimize the inference of FC-based DNNs. We first analyze the number of unique weights per input neuron of several DNNs. Exploiting common optimizations, such as linear quantization, we observe a very small number of unique weights per input for several FC layers of modern DNNs. Then, to improve the energy-efficiency of FC computation, we present CREW, a hardware accelerator that implements a Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator. In third place, we propose a mechanism to optimize the inference of RNNs. RNN cells perform element-wise multiplications across the activations of different gates, sigmoid and tanh being the common activation functions. We perform an analysis of the activation function values, and show that a significant fraction are saturated towards zero or one in popular RNNs. Then, we propose CGPA to dynamically prune activations from RNNs at a coarse granularity. CGPA avoids the evaluation of entire neurons whenever the outputs of peer neurons are saturated. CGPA significantly reduces the amount of computations and memory accesses while avoiding sparsity by a large extent, and can be easily implemented on top of conventional accelerators such as TPU with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs. Finally, in the last contribution of this thesis we focus on static DNN pruning methodologies. DNN pruning reduces memory footprint and computational work by removing connections and/or neurons that are ineffectual. However, we show that prior pruning schemes require an extremely time-consuming iterative process that requires retraining the DNN many times to tune the pruning parameters. Then, we propose a DNN pruning scheme based on Principal Component Analysis and the relative importance of each neuron's connections that automatically finds the optimized DNN in one shot.
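The DISC idea from the first contribution can be sketched for a dense layer: cache the previous input and output, and apply only the delta for the inputs that changed. The threshold and shapes below are illustrative; real hardware would skip the weight columns of unchanged inputs rather than multiply by zeros:

```python
import numpy as np

def diff_layer(x, prev_x, prev_y, W, tol=1e-3):
    # Recompute only contributions of 'dirty' inputs; reuse everything else.
    # For y = W @ x this is an incremental update: y = prev_y + W @ delta.
    changed = np.abs(x - prev_x) > tol
    delta = np.where(changed, x - prev_x, 0.0)
    y = prev_y + W @ delta           # hardware would skip the zero columns
    return y, changed.mean()

rng = np.random.default_rng(5)
W = rng.standard_normal((128, 256))
x0 = rng.standard_normal(256)
x1 = x0.copy()
x1[:32] += 0.1                       # only 12.5% of the inputs change
y1, frac = diff_layer(x1, x0, W @ x0, W)
print(frac, np.allclose(y1, W @ x1))  # 0.125 True
```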
7

Khan, Muhammad Jazib. "Programmable Address Generation Unit for Deep Neural Network Accelerators." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-271884.

Abstract:
Convolutional neural networks are becoming more and more popular due to their applications in revolutionary technologies like autonomous driving, biomedical imaging, and natural language processing. With this increase in adoption, the complexity of the underlying algorithms is also increasing. This trend entails implications for the computation platforms as well, i.e. GPU, FPGA, or ASIC based accelerators, especially for the Address Generation Unit (AGU), which is responsible for memory access. Existing accelerators typically have Parametrizable Datapath AGUs, which have minimal adaptability to evolution in algorithms. Hence new hardware is required for new algorithms, which is a very inefficient approach in terms of time, resources, and reusability. In this research, six algorithms with different implications for hardware are evaluated for address generation, and a fully Programmable AGU (PAGU) is presented, which can adapt to these algorithms. These algorithms are Standard, Strided, Dilated, Upsampled and Padded convolution, and MaxPooling. The proposed AGU architecture is a Very Long Instruction Word based Application Specific Instruction Processor which has specialized components like hardware counters and zero-overhead loops and a powerful Instruction Set Architecture (ISA), which can model static and dynamic constraints and affine and non-affine Address Equations. The target has been to minimize the flexibility vs. area, power, and performance trade-off. For a working test network for semantic segmentation, results have shown that PAGU achieves close to the ideal performance, one cycle per address, for all the algorithms under consideration except Upsampled Convolution, for which it is 1.7 cycles per address. The area of PAGU is approx. 4.6 times larger than the Parametrizable Datapath approach, which is still reasonable considering the high flexibility benefits. The potential of PAGU is not just limited to neural network applications but extends to more general digital signal processing areas, which can be explored in the future.
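The affine address equations PAGU evaluates can be mimicked in software: for a strided, dilated 2D convolution, every read address is an affine function of the loop counters, which is exactly what hardware counters plus zero-overhead loops implement. A sketch assuming a row-major layout (names invented):

```python
def conv_addresses(base, H, W, kh, kw, stride=1, dilation=1):
    # Yield the flat read addresses an AGU must produce for each output
    # position of a 2D convolution over a row-major (H, W) feature map.
    out_h = (H - dilation * (kh - 1) - 1) // stride + 1
    out_w = (W - dilation * (kw - 1) - 1) // stride + 1
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(kh):
                for kx in range(kw):  # affine address equation in the counters
                    yield base + (oy * stride + ky * dilation) * W \
                               + (ox * stride + kx * dilation)

# First output window of a 3x3 dilated (rate-2) convolution on an 8x8 map
addrs = list(conv_addresses(0, 8, 8, 3, 3, stride=1, dilation=2))
print(addrs[:9])   # [0, 2, 4, 16, 18, 20, 32, 34, 36]
```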
8

Jalasutram, Rommel. "Acceleration of spiking neural networks on multicore architectures." Connect to this title online, 2009. http://etd.lib.clemson.edu/documents/1252424720/.

9

Han, Bing. "Acceleration of Spiking Neural Network on General Purpose Graphics Processors." University of Dayton / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1271368713.

10

Chen, Yu-Hsin Ph D. Massachusetts Institute of Technology. "Architecture design for highly flexible and energy-efficient deep neural network accelerators." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117838.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. Cataloged from the student-submitted PDF version of the thesis. Includes bibliographical references (pages 141-147).
Deep neural networks (DNNs) are the backbone of modern artificial intelligence (AI). However, due to their high computational complexity and diverse shapes and sizes, dedicated accelerators that can achieve high performance and energy efficiency across a wide range of DNNs are critical for enabling AI in real-world applications. To address this, we present Eyeriss, a co-design of software and hardware architecture for DNN processing that is optimized for performance, energy efficiency and flexibility. Eyeriss features a novel Row-Stationary (RS) dataflow to minimize data movement when processing a DNN, which is the bottleneck of both performance and energy efficiency. The RS dataflow supports highly-parallel processing while fully exploiting data reuse in a multi-level memory hierarchy to optimize for the overall system energy efficiency given any DNN shape and size. It achieves 1.4x to 2.5x higher energy efficiency than other existing dataflows. To support the RS dataflow, we present two versions of the Eyeriss architecture. Eyeriss v1 targets large DNNs that have plenty of data reuse. It features a flexible mapping strategy for high performance and a multicast on-chip network (NoC) for high data reuse, and further exploits data sparsity to reduce processing element (PE) power by 45% and off-chip bandwidth by up to 1.9x. Fabricated in a 65nm CMOS, Eyeriss v1 consumes 278 mW at 34.7 fps for the CONV layers of AlexNet, which is 10x more efficient than a mobile GPU. Eyeriss v2 addresses support for the emerging compact DNNs that introduce higher variation in data reuse. It features a RS+ dataflow that improves PE utilization, and a flexible and scalable NoC that adapts to the bandwidth requirement while also exploiting available data reuse. Together, they provide over 10x higher throughput than Eyeriss v1 at 256 PEs. Eyeriss v2 also exploits sparsity and SIMD for an additional 6x increase in throughput.
By Yu-Hsin Chen. Ph. D.

Books on the topic "Neural network accelerator"

1

Whitehead, P. A. Design considerations for a hardware accelerator for Kohonen unsupervised learning in artificial neural networks. UMIST, 1997.

2

Jones, Steven P. Neural network models of simple mechanical systems illustrating the feasibility of accelerated life testing. National Aeronautics and Space Administration, 1996.

3

Daglis, I. A., ed. Effects of space weather on technology infrastructure. Kluwer Academic Publishers, 2004.

4

Kong, Joonho, and Mahmood Azhar Qureshi. Accelerators for Convolutional Neural Networks. John Wiley & Sons, Incorporated, 2023.

5

Munir. Accelerators for Convolutional Neural Networks. John Wiley & Sons, Limited, 2023.

6

Accelerated training for large feedforward neural networks. National Aeronautics and Space Administration, Ames Research Center, 1998.

7

Raff, Lionel, Ranga Komanduri, Martin Hagan, and Satish Bukkapatnam. Neural Networks in Chemical Reaction Dynamics. Oxford University Press, 2012. http://dx.doi.org/10.1093/oso/9780199765652.001.0001.

Abstract:
This monograph presents recent advances in neural network (NN) approaches and applications to chemical reaction dynamics. Topics covered include: (i) the development of ab initio potential-energy surfaces (PES) for complex multichannel systems using modified novelty sampling and feedforward NNs; (ii) methods for sampling the configuration space of critical importance, such as trajectory and novelty sampling methods and gradient fitting methods; (iii) parametrization of interatomic potential functions using a genetic algorithm accelerated with a NN; (iv) parametrization of analytic interatomic potential functions using NNs; (v) self-starting methods for obtaining analytic PES from ab initio electronic structure calculations using direct dynamics; (vi) development of a novel method, namely, combined function derivative approximation (CFDA) for simultaneous fitting of a PES and its corresponding force fields using feedforward neural networks; (vii) development of generalized PES using many-body expansions, NNs, and moiety energy approximations; (viii) NN methods for data analysis, reaction probabilities, and statistical error reduction in chemical reaction dynamics; (ix) accurate prediction of higher-level electronic structure energies (e.g. MP4 or higher) for large databases using NNs, lower-level (Hartree-Fock) energies, and small subsets of the higher-energy database; and finally (x) illustrative examples of NN applications to chemical reaction dynamics of increasing complexity, starting from simple near-equilibrium structures (vibrational state studies) to more complex non-adiabatic reactions. The monograph is prepared by an interdisciplinary group of researchers working as a team for nearly two decades at Oklahoma State University, Stillwater, OK, with expertise in gas phase reaction dynamics; neural networks; various aspects of MD and Monte Carlo (MC) simulations of nanometric cutting, tribology, and material properties at nanoscale; scaling laws from atomistic to continuum; and neural network applications to chemical reaction dynamics. It is anticipated that this emerging field of NNs in chemical reaction dynamics will play an increasingly important role in MD, MC, and quantum mechanical studies in the years to come.
8

AI Ladder: Accelerate Your Journey to AI. O'Reilly Media, Incorporated, 2020.


Book chapters on the topic "Neural network accelerator"

1

Huang, Hantao, and Hao Yu. "Distributed-Solver for Networked Neural Network." In Compact and Fast Machine Learning Accelerator for IoT Devices. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-3323-1_5.

2

Nakajima, Toshiya. "Architecture of the Neural Network Simulation Accelerator NEUROSIM/L." In International Neural Network Conference. Springer Netherlands, 1990. http://dx.doi.org/10.1007/978-94-009-0643-3_61.

3

Reagen, Brandon, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, and David Brooks. "Neural Network Accelerator Optimization: A Case Study." In Deep Learning for Computer Architects. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-031-01756-8_4.

4

Huang, Hantao, and Hao Yu. "Tensor-Solver for Deep Neural Network." In Compact and Fast Machine Learning Accelerator for IoT Devices. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-3323-1_4.

5

Ae, Tadashi, and Reiji Aibara. "A Neural Network for 3-D VLSI Accelerator." In The Kluwer International Series in Engineering and Computer Science. Springer US, 1989. http://dx.doi.org/10.1007/978-1-4613-1619-0_16.

6

Huang, Hantao, and Hao Yu. "Least-Squares-Solver for Shallow Neural Network." In Compact and Fast Machine Learning Accelerator for IoT Devices. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-3323-1_3.

7

Hu, Lili. "Frameworks for Efficient Convolutional Neural Network Accelerator on FPGA." In Advances in Intelligent Systems and Computing. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-8944-2_75.

8

Ravikumar, B., B. Chandrababu Naik, Muhsin Jaber Jweeg, et al. "FPGA Realization of Neural Network Accelerator for Image Classification." In Studies in Systems, Decision and Control. Springer Nature Switzerland, 2025. https://doi.org/10.1007/978-3-031-84628-1_43.

9

Cheung, Kit, Simon R. Schultz, and Wayne Luk. "A Large-Scale Spiking Neural Network Accelerator for FPGA Systems." In Artificial Neural Networks and Machine Learning – ICANN 2012. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-33269-2_15.

10

Wu, Jin, Xiangyang Shi, Wenting Pang, and Yu Wang. "Research on FPGA Accelerator Optimization Based on Graph Neural Network." In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-20738-9_61.


Conference papers on the topic "Neural network accelerator"

1

Fatima, Eeman, Muhammad Fahad, Hiba Abrar, Haroon-ur-Rashid, and Haroon Waris. "FPGA Based Artificial Neural Network Accelerator." In 2024 26th International Multitopic Conference (INMIC). IEEE, 2024. https://doi.org/10.1109/inmic64792.2024.11004346.

2

Wang, Mengxuan, and Chang Wu. "Layer Pipelined Neural Network Accelerator Design on 2.5D FPGAs." In 2024 IEEE 17th International Conference on Solid-State & Integrated Circuit Technology (ICSICT). IEEE, 2024. https://doi.org/10.1109/icsict62049.2024.10831139.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhao, Denghui, Jianrui He, Xuyu Jing, and Xibiao Hou. "DNN performance optimization based on Gemmini neural network hardware accelerator." In Fourth International Conference on Advanced Algorithms and Neural Networks (AANN 2024), edited by Qinghua Lu and Weishan Zhang. SPIE, 2024. http://dx.doi.org/10.1117/12.3049564.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Wen, Fangxin, Zhongzu Zhou, Jiang Zhao, et al. "Fault Identification Method Based on BP Neural Network in Accelerator Distribution Network." In 2025 2nd International Conference on Smart Grid and Artificial Intelligence (SGAI). IEEE, 2025. https://doi.org/10.1109/sgai64825.2025.11009449.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Shiflett, Kyle, Dylan Wright, Avinash Karanth, and Ahmed Louri. "PIXEL: Photonic Neural Network Accelerator." In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2020. http://dx.doi.org/10.1109/hpca47549.2020.00046.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Xu, David, A. Barış Özgüler, Giuseppe Di Guglielmo, et al. "Neural network accelerator for quantum control." US DOE, 2023. http://dx.doi.org/10.2172/1959815.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Yang, Zunming, Zhanzhuang He, Jing Yang, and Zhong Ma. "An LSTM Acceleration Method Based on Embedded Neural Network Accelerator." In ACAI'21: 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence. ACM, 2021. http://dx.doi.org/10.1145/3508546.3508649.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Yi, Qian. "FPGA Implementation of Neural Network Accelerator." In 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). IEEE, 2018. http://dx.doi.org/10.1109/imcec.2018.8469659.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Vogt, Michael C. "Neural network-based sensor signal accelerator." In Intelligent Systems and Smart Manufacturing, edited by Peter E. Orban and George K. Knopf. SPIE, 2001. http://dx.doi.org/10.1117/12.417242.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Wang, Hong, Xiao Zhang, Dehui Kong, et al. "Convolutional Neural Network Accelerator on FPGA." In 2019 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA). IEEE, 2019. http://dx.doi.org/10.1109/icta48799.2019.9012821.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Neural network accelerator"

1

Aimone, James, Christopher Bennett, Suma Cardwell, Ryan Dellana, and Tianyao Xiao. Mosaic: The Best of Both Worlds: Analog devices with Digital Spiking Communication to build a Hybrid Neural Network Accelerator. Office of Scientific and Technical Information (OSTI), 2020. http://dx.doi.org/10.2172/1673175.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Meni, Mackenzie, Ryan White, Michael Mayo, and Kevin Pilkiewicz. Entropy-based guidance of deep neural networks for accelerated convergence and improved performance. Engineer Research and Development Center (U.S.), 2025. https://doi.org/10.21079/11681/49805.

Full text
Abstract
Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are not straightforward processes. To add structure to these efforts, we derive new mathematical results to efficiently measure the changes in entropy as fully-connected and convolutional neural networks process data. By measuring the change in entropy as networks process data, patterns critical to a well-performing network can be visualized and identified. Entropy-based loss terms are developed to improve dense and convolutional model accuracy and efficiency by promoting the ideal entropy patterns. Experiments in image compression, image classification, and image segmentation on benchmark datasets demonstrate that these losses guide neural networks to learn rich latent data representations in fewer dimensions, converge in fewer training epochs, and achieve higher accuracy.
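To make the idea concrete, here is a minimal sketch, in PyTorch, of what an entropy-based loss term of this kind might look like; the entropy proxy, the penalty weight, and the toy network are illustrative assumptions, not the formulation derived in the report.

# Illustrative sketch only: the entropy proxy and its weighting below are
# assumptions, not the mathematical results derived in the cited report.
import torch
import torch.nn as nn
import torch.nn.functional as F

def activation_entropy(a: torch.Tensor) -> torch.Tensor:
    # Treat each sample's activations as a probability distribution via
    # softmax, then average the Shannon entropy over the batch.
    p = F.softmax(a.flatten(1), dim=1)
    return -(p * (p + 1e-12).log()).sum(dim=1).mean()

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc2(h), h  # expose hidden activations for the entropy term

model = TinyNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
logits, h = model(x)
# Task loss plus a small entropy penalty on the hidden layer; the 0.01 weight
# and the sign are placeholders -- the report promotes "ideal entropy
# patterns", which may reward rather than penalize entropy at some layers.
loss = F.cross_entropy(logits, y) + 0.01 * activation_entropy(h)
opt.zero_grad()
loss.backward()
opt.step()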
APA, Harvard, Vancouver, ISO, and other styles
3

Morgan, Nelson, Jerome Feldman, and John Wawrzynek. Accelerator Systems for Neural Networks, Speech, and Related Applications. Defense Technical Information Center, 1995. http://dx.doi.org/10.21236/ada298954.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Garg, Raveesh, Eric Qin, Francisco Martinez, et al. Understanding the Design Space of Sparse/Dense Multiphase Dataflows for Mapping Graph Neural Networks on Spatial Accelerators. Office of Scientific and Technical Information (OSTI), 2021. http://dx.doi.org/10.2172/1821960.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Pasupuleti, Murali Krishna. Quantum-Enhanced Machine Learning: Harnessing Quantum Computing for Next-Generation AI Systems. National Education Services, 2025. https://doi.org/10.62311/nesx/rrv125.

Full text
Abstract
Quantum-enhanced machine learning (QML) represents a paradigm shift in artificial intelligence by integrating quantum computing principles to solve complex computational problems more efficiently than classical methods. By leveraging quantum superposition, entanglement, and parallelism, QML has the potential to accelerate deep learning training, optimize combinatorial problems, and enhance feature selection in high-dimensional spaces. This research explores foundational quantum computing concepts relevant to AI, including quantum circuits, variational quantum algorithms, and quantum kernel methods, while analyzing their impact on neural networks, generative models, and reinforcement learning. Hybrid quantum-classical AI architectures, which combine quantum subroutines with classical deep learning models, are examined for their ability to provide computational advantages in optimization and large-scale data processing. Despite the promise of quantum AI, challenges such as qubit noise, error correction, and hardware scalability remain barriers to full-scale implementation. This study provides an in-depth evaluation of quantum-enhanced AI, highlighting existing applications, ongoing research, and future directions in quantum deep learning, autonomous systems, and scientific computing. The findings contribute to the development of scalable quantum machine learning frameworks, offering novel solutions for next-generation AI systems across finance, healthcare, cybersecurity, and robotics.
Keywords: Quantum machine learning, quantum computing, artificial intelligence, quantum neural networks, quantum kernel methods, hybrid quantum-classical AI, variational quantum algorithms, quantum generative models, reinforcement learning, quantum optimization, quantum advantage, deep learning, quantum circuits, quantum-enhanced AI, quantum deep learning, error correction, quantum-inspired algorithms, quantum annealing, probabilistic computing.
APA, Harvard, Vancouver, ISO, and other styles
6

Pasupuleti, Murali Krishna. Quantum Semiconductors for Scalable and Fault-Tolerant Computing. National Education Services, 2025. https://doi.org/10.62311/nesx/rr825.

Full text
Abstract
Quantum semiconductors are revolutionizing computing by enabling scalable, fault-tolerant quantum processors that overcome the limitations of classical computing. As quantum technologies advance, superconducting qubits, silicon spin qubits, topological qubits, and hybrid quantum-classical architectures are emerging as key solutions for achieving high-fidelity quantum operations and long-term coherence. This research explores the materials, device engineering, and fabrication challenges associated with quantum semiconductors, focusing on quantum error correction, cryogenic control systems, and scalable quantum interconnects. The study also examines the economic feasibility, industry adoption trends, and policy implications of quantum semiconductors, assessing their potential impact on AI acceleration, quantum cryptography, and large-scale simulations. Through a comprehensive analysis of quantum computing frameworks, market trends, and emerging applications, this report provides a roadmap for integrating quantum semiconductors into next-generation high-performance computing infrastructures.
Keywords: Quantum semiconductors, scalable quantum computing, fault-tolerant quantum processors, superconducting qubits, silicon spin qubits, topological qubits, hybrid quantum-classical computing, quantum error correction, quantum coherence, cryogenic quantum systems, quantum interconnects, quantum cryptography, AI acceleration, quantum neural networks, post-quantum security, quantum-enabled simulations, quantum market trends, quantum computing policy, quantum fabrication techniques.
APA, Harvard, Vancouver, ISO, and other styles
7

Wideman, Jr., Robert F., Nicholas B. Anthony, Avigdor Cahaner, Alan Shlosberg, Michel Bellaiche, and William B. Roush. Integrated Approach to Evaluating Inherited Predictors of Resistance to Pulmonary Hypertension Syndrome (Ascites) in Fast Growing Broiler Chickens. United States Department of Agriculture, 2000. http://dx.doi.org/10.32747/2000.7575287.bard.

Full text
Abstract
Background: PHS (pulmonary hypertension syndrome, ascites syndrome) is a serious cause of loss in the broiler industry, and is a prime example of an undesirable side effect of successful genetic development that may be deleteriously manifested by factors in the environment of growing broilers. Basically, continuous and pinpointed selection for rapid growth in broilers has led to higher oxygen demand and consequently to more frequent manifestation of an inherent potential cardiopulmonary incapability to sufficiently oxygenate the arterial blood. The multifaceted causes and modifiers of PHS make research into finding solutions to the syndrome a complex and multi-threaded challenge. This research used several directions to better understand the development of PHS and to probe possible means of achieving a goal of monitoring and increasing resistance to the syndrome.

Research Objectives: (1) To evaluate the growth dynamics of individuals within breeding stocks and their correlation with individual susceptibility or resistance to PHS; (2) to compile data on diagnostic indices found in this work to be predictive for PHS, during exposure to experimental protocols known to trigger PHS; (3) to conduct detailed physiological evaluations of cardiopulmonary function in broilers; (4) to compile data on growth dynamics and other diagnostic indices in existing lines selected for susceptibility or resistance to PHS; (5) to integrate growth dynamics and other diagnostic data within appropriate statistical procedures to provide geneticists with predictive indices that characterize resistance or susceptibility to PHS.

Revisions: In the first year, the US team acquired the costly Peckode weigh platform / individual bird I.D. system that was to provide the continuous (several times each day), automated weighing of birds, for a comprehensive monitoring of growth dynamics. However, the data generated were found to be inaccurate and irreproducible, making its use implausible. Henceforth, weighing was manual; this highly labor-intensive work precluded some of the original objectives of using such a strategy of growth dynamics in selection procedures involving thousands of birds.

Major conclusions, solutions, achievements:
1. Healthy broilers were found to have greater oscillations in growth velocity and acceleration than PHS-susceptible birds. This proved the scientific validity of our original hypothesis that such differences occur.
2. Growth rate in the first week is higher in PHS-susceptible than in PHS-resistant chicks. An artificial neural network accurately distinguished between the two groups based on growth patterns in this period.
3. In the US, the unilateral pulmonary occlusion technique was used in collaboration with a major broiler breeding company to create a commercial broiler line that is highly resistant to PHS induced by fast growth and low ambient temperatures.
4. In Israel, lines were obtained by genetic selection on PHS mortality after cold exposure in a dam-line population comprising 85 sire families. The wide range of PHS incidence per family (0-50%), high heritability (about 0.6), and the results in cold-challenged progeny suggested a highly effective and relatively easy means of selection for PHS resistance.
5. The best minimally invasive diagnostic indices for prediction of PHS resistance were found to be oximetry, hematocrit values, heart rate, and electrocardiographic (ECG) lead II waves. Some differences in results were found between the US and Israeli teams, probably reflecting genetic differences in the broiler strains used in the two countries. For instance, the US team found the S wave amplitude to predict PHS susceptibility well, whereas the Israeli team found the P wave amplitude to be a better valid predictor.
6. Comprehensive physiological studies further increased knowledge on the development of PHS: cardiopulmonary characteristics of pre-ascitic birds, pulmonary arterial wedge pressures, hypotension/kidney response, and pulmonary hemodynamic responses to vasoactive mediators were all examined in depth.

Implications, scientific and agricultural: Substantial progress has been made in understanding the genetic and environmental factors involved in PHS, and their interaction. The two teams each successfully developed different selection programs, by surgical means and by divergent selection under cold challenge. Monitoring of the progress and success of the programs was done by using the in-depth estimations that this research engendered on the reliability and value of non-invasive predictive parameters. These findings helped corroborate the validity of practical means to improve PHS resistance by research-based programs of selection.
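As a purely hypothetical sketch of the growth-pattern classifier alluded to in conclusion 2 above (the study's actual model, input size, and sampling scheme are not specified here), a small feed-forward network over first-week weight measurements might look like:

# Hypothetical sketch: all dimensions, labels, and the sampling scheme are
# assumptions for illustration, not details from the cited study.
import torch
import torch.nn as nn

week1_samples = 21                            # assumed: 3 manual weighings/day for 7 days
net = nn.Sequential(
    nn.Linear(week1_samples, 16),
    nn.ReLU(),
    nn.Linear(16, 2),                         # classes: PHS-susceptible vs PHS-resistant
)
growth_curves = torch.randn(8, week1_samples) # stand-in for measured growth data
logits = net(growth_curves)                   # train with cross-entropy on labeled birds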
APA, Harvard, Vancouver, ISO, and other styles
8

"Deep Learning Damage Identification Method for Steel-Frame Bracing Structures Using Time–Frequency Analysis and Convolutional Neural Networks." The Hong Kong Institute of Steel Construction, 2023. http://dx.doi.org/10.18057/ijasc.2023.19.4.8.

Full text
Abstract
Lattice bracing, commonly used in steel construction systems, is vulnerable to damage and failure when subjected to horizontal seismic pressure. To identify damage, manual examination is the conventional method applied. However, this approach is time-consuming and typically unable to detect damage in its early stage. Determining the exact location of damage has been problematic for researchers. Nevertheless, detecting the failure of lateral supports in various parts of a structure using time–frequency analysis and deep learning methods, such as convolutional neural networks, is possible. Then, the damaged structure can be rapidly rebuilt to ensure safety. Experiments are conducted to determine the vibration acceleration modes of a four-storey steel structure considering various support structure damage scenarios. The acceleration signals at each measurement point are then analysed with respect to time and frequency to generate appropriate three-dimensional spectral matrices. In this study, the MobileNetV2 deep learning model was trained on a labelled picture collection of damaged matrix images. Hyperparameter tweaking and training resulted in a prediction accuracy of 97.37% for the complete dataset and 99.30% and 96.23% for the training and testing sets, respectively. The findings indicate that a combination of time–frequency analysis and deep learning methods may pinpoint the position of the damaged steel frame support components more accurately.
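A minimal sketch of the pipeline this abstract describes, assuming a sampling rate, window size, image resizing, and five damage classes that are not taken from the paper: compute a spectrogram from an acceleration signal, render it as a three-channel image, and classify it with MobileNetV2.

# Illustrative sketch only: the preprocessing parameters and class count
# below are assumptions, not the paper's exact configuration.
import numpy as np
import torch
from scipy.signal import spectrogram
from torchvision.models import mobilenet_v2

def signal_to_tf_image(accel: np.ndarray, fs: float = 200.0) -> torch.Tensor:
    # Time-frequency analysis: spectrogram, log-compressed and normalized,
    # resized to 224x224 and replicated to 3 channels for MobileNetV2.
    _, _, Sxx = spectrogram(accel, fs=fs, nperseg=64)
    img = np.log1p(Sxx)
    img = (img - img.min()) / (img.max() - img.min() + 1e-9)
    t = torch.from_numpy(img).float().unsqueeze(0).unsqueeze(0)  # (1, 1, F, T)
    t = torch.nn.functional.interpolate(t, size=(224, 224))
    return t.squeeze(0).repeat(3, 1, 1)                          # (3, 224, 224)

num_damage_classes = 5                         # hypothetical support-damage scenarios
model = mobilenet_v2(num_classes=num_damage_classes)

accel = np.random.randn(2048)                  # stand-in for a measured acceleration signal
x = signal_to_tf_image(accel).unsqueeze(0)     # batch of 1
logits = model(x)                              # train with cross-entropy as usual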
APA, Harvard, Vancouver, ISO, and other styles