Journal articles on the topic 'Domain-specific hardware accelerator'

Author: Grafiati

Published: 4 June 2021

Last updated: 14 February 2022

Consult the top 21 journal articles for your research on the topic 'Domain-specific hardware accelerator.'

1

Cong, Jason, Mohammad Ali Ghodrat, Michael Gill, Beayna Grigorian, and Glenn Reinman. "Architecture Support for Domain-Specific Accelerator-Rich CMPs." ACM Transactions on Embedded Computing Systems 13, no. 4s (2014): 1–26. http://dx.doi.org/10.1145/2584664.

2

Sotiriou-Xanthopoulos, Efstathios, Sotirios Xydis, Kostas Siozios, George Economakos, and Dimitrios Soudris. "A Framework for Interconnection-Aware Domain-Specific Many-Accelerator Synthesis." ACM Transactions on Embedded Computing Systems 16, no. 1 (2016): 1–26. http://dx.doi.org/10.1145/2983624.

3

Sunny, Febin P., Asif Mirza, Mahdi Nikdast, and Sudeep Pasricha. "ROBIN: A Robust Optical Binary Neural Network Accelerator." ACM Transactions on Embedded Computing Systems 20, no. 5s (2021): 1–24. http://dx.doi.org/10.1145/3476988.

Abstract:

Domain specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we pres

4

Fang, Jian, Yvo T. B. Mulder, Jan Hidders, Jinho Lee, and H. Peter Hofstee. "In-memory database acceleration on FPGAs: a survey." VLDB Journal 29, no. 1 (2019): 33–59. http://dx.doi.org/10.1007/s00778-019-00581-w.

Abstract:

While FPGAs have seen prior use in database systems, in recent years interest in using FPGA to accelerate databases has declined in both industry and academia for the following three reasons. First, specifically for in-memory databases, FPGAs integrated with conventional I/O provide insufficient bandwidth, limiting performance. Second, GPUs, which can also provide high throughput, and are easier to program, have emerged as a strong accelerator alternative. Third, programming FPGAs required developers to have full-stack skills, from high-level algorithm design to low-level circuit impl

5

Hosseini, Morteza, and Tinoosh Mohsenin. "Binary Precision Neural Network Manycore Accelerator." ACM Journal on Emerging Technologies in Computing Systems 17, no. 2 (2021): 1–27. http://dx.doi.org/10.1145/3423136.

Abstract:

This article presents a low-power, programmable, domain-specific manycore accelerator, Binarized neural Network Manycore Accelerator (BiNMAC), which adopts and efficiently executes binary precision weight/activation neural network models. Such networks have compact models in which weights are constrained to only 1 bit and can be packed several in one memory entry that minimizes memory footprint to its finest. Packing weights also facilitates executing single instruction, multiple data with simple circuitry that allows maximizing performance and efficiency. The proposed BiNMAC has light-weight
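
The weight packing described in this abstract can be illustrated with a minimal sketch. It assumes the XNOR-and-popcount formulation commonly used for binarized networks; the abstract does not confirm that BiNMAC computes dot products this way, and the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into one integer.

    Bit i of the result is 1 when values[i] == +1 and 0 when it is -1.
    (Hypothetical helper, not taken from the cited article.)
    """
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(w_word, a_word, n):
    """Dot product of two length-n {-1, +1} vectors stored as packed bits.

    XNOR marks the positions where weight and activation agree; each
    agreement contributes +1 and each disagreement -1, so the result
    is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(w_word ^ a_word) & mask  # XNOR restricted to n bits
    matches = bin(agree).count("1")    # population count
    return 2 * matches - n


weights = [1, -1, 1, 1]
activations = [1, 1, -1, 1]
result = binary_dot(pack_bits(weights), pack_bits(activations), len(weights))
reference = sum(w * a for w, a in zip(weights, activations))
```

Here `result` equals `reference`, the ordinary elementwise dot product, while the packed form handles every element of the word with a single XOR and a bit count, which is the sense in which packing enables SIMD-style execution with simple circuitry.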

6

Reinehr Gobatto, Leonardo, Pablo Rodrigues, Mateus Saquetti Pereira de Carvalho Tirone, Weverton Luis da Costa Cordeiro, and José Rodrigo Furlanetto Azambuja. "Programmable Data Planes meets In-Network Computing: A Review of the State of the Art and Prospective Directions." Journal of Integrated Circuits and Systems 16, no. 2 (2021): 1–8. http://dx.doi.org/10.29292/jics.v16i2.497.

Abstract:

Improving network traffic in networks is one of the concerns between networking researchers and network operators since the architecture of modern networks still faces challenges to process large data traffic without the cost of consuming a significant amount of resources not related to computing specifically. On the other hand, network programmability has enabled the development of new applications and network services, from software-defined networking to domain-specific languages created to program network devices and specify their behavior. The development of programmable hardware and hardw

7

Schmitt, Christian, Moritz Schmid, Sebastian Kuckuk, Harald Köstler, Jürgen Teich, and Frank Hannig. "Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution." Parallel Processing Letters 28, no. 04 (2018): 1850016. http://dx.doi.org/10.1142/s0129626418500160.

Abstract:

Not only in the field of high-performance computing (HPC), field programmable gate arrays (FPGAs) are a soaringly popular accelerator technology. However, they use a completely different programming paradigm and tool set compared to central processing units (CPUs) or even graphics processing units (GPUs), adding extra development steps and requiring special knowledge, hindering widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice to generate low-level implementations from an abstract algorithm description. In this wor

8

Dally, William J., Yatish Turakhia, and Song Han. "Domain-specific hardware accelerators." Communications of the ACM 63, no. 7 (2020): 48–57. http://dx.doi.org/10.1145/3361682.

9

Parravicini, Daniele, Davide Conficconi, Emanuele Del Sozzo, Christian Pilato, and Marco D. Santambrogio. "CICERO: A Domain-Specific Architecture for Efficient Regular Expression Matching." ACM Transactions on Embedded Computing Systems 20, no. 5s (2021): 1–24. http://dx.doi.org/10.1145/3476982.

Abstract:

Regular Expression (RE) matching is a computational kernel used in several applications. Since RE complexity and data volumes are steadily increasing, hardware acceleration is gaining attention also for this problem. Existing approaches have limited flexibility as they require a different implementation for each RE. On the other hand, it is complex to map efficient RE representations like non-deterministic finite-state automata onto software-programmable engines or parallel architectures. In this work, we present CICERO, an end-to-end framework composed of a domain-specific architecture and a

10

Soldavini, Stephanie, and Christian Pilato. "A Survey on Domain-Specific Memory Architectures." Journal of Integrated Circuits and Systems 16, no. 2 (2021): 1–9. http://dx.doi.org/10.29292/jics.v16i2.509.

Abstract:

The never-ending demand for high performance and energy efficiency is pushing designers towards an increasing level of heterogeneity and specialization in modern computing systems. In such systems, creating efficient memory architectures is one of the major opportunities for optimizing modern workloads (e.g., computer vision, machine learning, graph analytics, etc.) that are extremely data-driven. However, designers demand proper design methods to tackle the increasing design complexity and address several new challenges, like the security and privacy of the data to be elaborated. This paper ov

11

Clark, N. T., Hongtao Zhong, and S. A. Mahlke. "Automated Custom Instruction Generation for Domain-Specific Processor Acceleration." IEEE Transactions on Computers 54, no. 10 (2005): 1258–70. http://dx.doi.org/10.1109/tc.2005.156.

12

Khasanov, Robert, Julian Robledo, Christian Menard, Andrés Goens, and Jeronimo Castrillon. "Domain-specific Hybrid Mapping for Energy-efficient Baseband Processing in Wireless Networks." ACM Transactions on Embedded Computing Systems 20, no. 5s (2021): 1–26. http://dx.doi.org/10.1145/3476991.

Abstract:

Advancing telecommunication standards continuously push for larger bandwidths, lower latencies, and faster data rates. The receiver baseband unit not only has to deal with a huge number of users expecting connectivity but also with a high workload heterogeneity. As a consequence of the required flexibility, baseband processing has seen a trend towards software implementations in cloud Radio Access Networks (cRANs). The flexibility gained from software implementation comes at the price of impoverished energy efficiency. This paper addresses the trade-off between flexibility and efficiency by pr

13

Fuhrer, Oliver, Tarun Chadha, Torsten Hoefler, et al. "Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0." Geoscientific Model Development 11, no. 4 (2018): 1665–81. http://dx.doi.org/10.5194/gmd-11-1665-2018.

Abstract:

The best hope for reducing long-standing global climate model biases is by increasing resolution to the kilometer scale. Here we present results from an ultrahigh-resolution non-hydrostatic climate model for a near-global setup running on the full Piz Daint supercomputer on 4888 GPUs (graphics processing units). The dynamical core of the model has been completely rewritten using a domain-specific language (DSL) for performance portability across different hardware architectures. Physical parameterizations and diagnostics have been ported using compiler directives. To our knowledge th

14

Bonelli, Nicola, Stefano Giordano, and Gregorio Procissi. "Enif-Lang: A Specialized Language for Programming Network Functions on Commodity Hardware." Journal of Sensor and Actuator Networks 7, no. 3 (2018): 34. http://dx.doi.org/10.3390/jsan7030034.

Abstract:

The maturity level reached by today’s commodity platforms makes even low-cost PCs viable alternatives to dedicated hardware to implement real network functions without sacrificing performance. Indeed, the availability of multi-core processing packages and multi-queue network interfaces that can be managed by accelerated I/O frameworks, provides off-the-shelf servers with the necessary power capability for running a broad variety of network applications with near hardware-class performance. At the same time, the introduction of the Software Defined Networks (SDN) and the Network Functions Virtu

15

Pfänder, O. A., H. J. Pfleiderer, and S. W. Lachowicz. "Configurable multiplier modules for an adaptive computing system." Advances in Radio Science 4 (September 6, 2006): 231–36. http://dx.doi.org/10.5194/ars-4-231-2006.

Abstract:

The importance of reconfigurable hardware is increasing steadily. For example, the primary approach of using adaptive systems based on programmable gate arrays and configurable routing resources has gone mainstream and high-performance programmable logic devices are rivaling traditional application-specific hardwired integrated circuits. Also, the idea of moving from the 2-D domain into a 3-D design which stacks several active layers above each other is gaining momentum in research and industry, to cope with the demand for smaller devices with a higher scale of integration. However,

16

Skhiri, Rym, Virginie Fresse, Jean Paul Jamont, Benoit Suffran, and Jihene Malek. "From FPGA to Support Cloud to Cloud of FPGA: State of the Art." International Journal of Reconfigurable Computing 2019 (December 5, 2019): 1–17. http://dx.doi.org/10.1155/2019/8085461.

Abstract:

Field Programmable Gate Array (FPGA) draws a significant attention from both industry and academia by accelerating computationally expensive applications and achieving low power consumption. FPGAs are interesting due to the flexibility and reconfigurability of their device. Cloud computing becomes a major trend towards infrastructure and computing resources dematerialization. It provides “unlimited” storage capacities and a large number of data and applications that make collaboration easier between multiple (not domain specific) designers. Many papers in the literature have surveyed Cloud and

17

Benkrid, Khaled, Ali Akoglu, Cheng Ling, Yang Song, Ying Liu, and Xiang Tian. "High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP." International Journal of Reconfigurable Computing 2012 (2012): 1–15. http://dx.doi.org/10.1155/2012/752910.

Abstract:

This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM’s Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption

18

Morrissey, John P., Prabhat Totoo, Kevin J. Hanley, et al. "Post-processing and visualization of large-scale DEM simulation data with the open-source VELaSSCo platform." SIMULATION 96, no. 7 (2020): 567–81. http://dx.doi.org/10.1177/0037549720906465.

Abstract:

Regardless of its origin, in the near future the challenge will not be how to generate data, but rather how to manage big and highly distributed data to make it more easily handled and more accessible by users on their personal devices. VELaSSCo (Visualization for Extremely Large-Scale Scientific Computing) is a platform developed to provide new visual analysis methods for large-scale simulations serving the petabyte era. The platform adopts Big Data tools/architectures to enable in-situ processing for analytics of engineering and scientific data and hardware-accelerated interactive visualizat

19

Rumetshofer, Johannes, Michael Stolz, and Daniel Watzenig. "A Generic Interface Enabling Combinations of State-of-the-Art Path Planning and Tracking Algorithms." Electronics 10, no. 7 (2021): 788. http://dx.doi.org/10.3390/electronics10070788.

Abstract:

In the development of Level 4 automated driving functions, very specific, but diverse, requirements with respect to the operational design domain have to be considered. In order to accelerate this development, it is advantageous to combine dedicated state-of-the-art software components, as building blocks in modular automated driving function architectures, instead of developing special solutions from scratch. However, e.g., in local motion planning and control, the combination of components is still limited in practice, due to necessary interface alignments, which might yield sub-optimal solu

20

Xing, Fei, Yi Ping Yao, Zhi Wen Jiang, and Bing Wang. "Fine-Grained Parallel and Distributed Spatial Stochastic Simulation of Biological Reactions." Advanced Materials Research 345 (September 2011): 104–12. http://dx.doi.org/10.4028/www.scientific.net/amr.345.104.

Abstract:

To date, discrete event stochastic simulations of large scale biological reaction systems are extremely compute-intensive and time-consuming. Besides, it has been widely accepted that spatial factor plays a critical role in the dynamics of most biological reaction systems. The NSM (the Next Sub-Volume Method), a spatial variation of the Gillespie’s stochastic simulation algorithm (SSA), has been proposed for spatially stochastic simulation of those systems. While being able to explore high degree of parallelism in systems, NSM is inherently sequential, which still suffers from the problem of l

21

Russo, Enrico, Maurizio Palesi, Salvatore Monteleone, et al. "DNN Model Compression for IoT Domain Specific Hardware Accelerators." IEEE Internet of Things Journal, 2021, 1. http://dx.doi.org/10.1109/jiot.2021.3111723.
