
Journal articles on the topic 'Fused floating-point arithmetic unit'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Fused floating-point arithmetic unit.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Kim, Hyunpil, and Sangook Moon. "Proxy Bits for Low Cost Floating-Point Fused Multiply–Add Unit." Journal of Circuits, Systems and Computers 25, no. 10 (2016): 1650127. http://dx.doi.org/10.1142/s0218126616501279.

Abstract:
A new floating-point fused multiply–add (FMA) unit is proposed in this paper. We observed a group of redundant bits that have no effect on the effective results of the floating-point FMA arithmetic, and figured out that two proxy bits can replace the redundant bits. We proved the existence of the proxy bits using binary arithmetic keeping track of the negligible bits. Using proxy bits, the proposed FMA unit achieves improvement in terms of cost, power consumption, and performance. The results show that the proposed FMA unit reduces the total area and latency by approximately 17.0% and 32% respectively, compared with a current widely used FMA unit.
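The advantage every FMA paper in this list builds on is that a fused multiply-add rounds only once, so low product bits that a separate multiply would discard can still affect the sum. A minimal Python sketch (an illustration of single rounding via exact rational arithmetic, not the paper's proxy-bit design):

```python
from fractions import Fraction

def fma_emulated(a, b, c):
    """Emulate a fused multiply-add: compute a*b + c exactly,
    then round only once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # float() of a Fraction is correctly rounded

a = b = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)

unfused = a * b + c            # rounds after the multiply AND after the add
fused = fma_emulated(a, b, c)  # rounds once

print(unfused)  # 0.0 -- the low product bit 2**-60 was rounded away
print(fused)    # 8.673617379884035e-19, i.e. exactly 2**-60
```

Here the exact product 1 + 2**-29 + 2**-60 loses its lowest bit when rounded to a double, so the unfused path cancels to zero while the fused path preserves it.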
2

Prabhu, E., H. Mangalam, and S. Karthick. "Design of area and power efficient Radix-4 DIT FFT butterfly unit using floating point fused arithmetic." Journal of Central South University 23, no. 7 (2016): 1669–81. http://dx.doi.org/10.1007/s11771-016-3221-y.

3

AnanthaLakshmi, A. V., and Gnanou Florence Sudha. "A novel power efficient 0.64-GFlops fused 32-bit reversible floating point arithmetic unit architecture for digital signal processing applications." Microprocessors and Microsystems 51 (June 2017): 366–85. http://dx.doi.org/10.1016/j.micpro.2017.01.002.

4

Timmermann, D., B. Rix, H. Hahn, and B. J. Hosticka. "A CMOS floating-point vector-arithmetic unit." IEEE Journal of Solid-State Circuits 29, no. 5 (1994): 634–39. http://dx.doi.org/10.1109/4.284719.

5

Liu, De, MingJiang Wang, and Shikai Zuo. "Delay-optimized floating point fused add-subtract unit." IEICE Electronics Express 12, no. 17 (2015): 20150642. http://dx.doi.org/10.1587/elex.12.20150642.

6

Sohn, Jongwook, and Earl E. Swartzlander. "A Fused Floating-Point Four-Term Dot Product Unit." IEEE Transactions on Circuits and Systems I: Regular Papers 63, no. 3 (2016): 370–78. http://dx.doi.org/10.1109/tcsi.2016.2525042.

7

Tang, Xia Qing, Xiang Liu, Jun Qiang Gao, and Bo Lin. "Design and Implementation of FPGA-Based High-Performance Floating Point Arithmetic Unit." Applied Mechanics and Materials 599-601 (August 2014): 1465–69. http://dx.doi.org/10.4028/www.scientific.net/amm.599-601.1465.

Abstract:
When an FPGA processes data with fixed-point arithmetic, the achievable accuracy is limited, while vendor-supplied IP-core floating-point units carry design risks of their own. Based on an improved floating-point unit and program optimization, an algorithm is designed that implements single-precision floating-point add/subtract, multiply, and divide operations. A comparison with the IP-core floating-point unit provided by the FPGA development software shows that the maximum clock frequency and latency are essentially unchanged, while the proposed unit occupies fewer hardware resources, and the time required to complete an add/subtract, multiply, and divide is reduced by 46%, 37%, and 57%, respectively. The program was downloaded to an FPGA chip and produced the same results as simulation, verifying the correctness and feasibility of the design.
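The add/subtract path of a floating-point unit like the one described follows the usual align-add-renormalize steps. A toy Python sketch of that datapath on positive operands represented as (integer significand, exponent), where the value is significand * 2**exponent (an illustration only, not the paper's FPGA design, which also handles signs, rounding, and exceptions):

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, precision=24):
    """Toy floating-point addition on positive operands:
    align exponents, add significands, renormalize so the
    result fits in `precision` significand bits."""
    # make operand a the one with the larger exponent
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # align: shift the smaller operand right (low bits are truncated)
    sig_b >>= (exp_a - exp_b)
    sig = sig_a + sig_b
    exp = exp_a
    # renormalize on carry-out
    while sig >> precision:
        sig >>= 1
        exp += 1
    return sig, exp

# 1.5 (sig=3<<22, exp=-23) + 1.0 (sig=1<<23, exp=-23) = 2.5
print(fp_add(3 << 22, -23, 1 << 23, -23))  # (10485760, -22)
```

Checking: 10485760 * 2**-22 is indeed 2.5.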
8

Sohn, Jongwook, and Earl E. Swartzlander. "Improved Architectures for a Fused Floating-Point Add-Subtract Unit." IEEE Transactions on Circuits and Systems I: Regular Papers 59, no. 10 (2012): 2285–91. http://dx.doi.org/10.1109/tcsi.2012.2188955.

9

Manolopoulos, K., D. Reisis, and V. A. Chouliaras. "An efficient multiple precision floating-point Multiply-Add Fused unit." Microelectronics Journal 49 (March 2016): 10–18. http://dx.doi.org/10.1016/j.mejo.2015.10.012.

10

Salman Faraz, Shaikh, Yogesh Suryawanshi, Sandeep Kakde, Ankita Tijare, and Rajesh Thakare. "Design and Synthesis of Restoring Technique Based Dual Mode Floating Point Divider for Fast Computing Applications." International Journal of Engineering & Technology 7, no. 3.6 (2018): 48. http://dx.doi.org/10.14419/ijet.v7i3.6.14936.

Abstract:
Floating-point division plays a vital role in fast-computing applications and is one of the more complicated modules required in processors. Area, delay, and power consumption are the main factors when designing a dual-precision floating-point divider. Compared with other floating-point arithmetic, division is far more sophisticated and requires more time. Floating-point division is a core arithmetic unit employed in the design of many processors in the fields of DSP, math processors, and numerous other applications. This paper addresses the dual-mode functionality of floating-point division. The proposed architecture supports both the single-precision (32-bit) and double-precision (64-bit) IEEE 754 floating-point formats. It uses the restoring division technique for the fraction-part division, and consists of various sub-modules such as shifters, exception handlers, and normalizers.
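The restoring technique the paper applies to the fraction part can be sketched on plain unsigned integers: one quotient bit per iteration, with the partial remainder restored whenever the trial subtraction goes negative. A hypothetical Python illustration (the actual divider operates on IEEE 754 significands, with exception handling and normalization around this core loop):

```python
def restoring_divide(dividend, divisor, n_bits):
    """Restoring division on unsigned integers: produce an
    n_bits-bit quotient, one bit per iteration."""
    assert divisor > 0 and dividend >= 0
    quotient = 0
    remainder = dividend
    for i in range(n_bits - 1, -1, -1):
        trial = remainder - (divisor << i)  # trial subtraction
        if trial >= 0:
            remainder = trial               # keep it: quotient bit is 1
            quotient |= 1 << i
        # else: "restore" by leaving remainder unchanged; bit is 0
    return quotient, remainder

print(restoring_divide(100, 7, 8))  # (14, 2), since 100 = 14*7 + 2
```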
11

Yun, Hyoung-Kie, and Dai-Tchul Moon. "Design of Parallel Decimal Floating-Point Arithmetic Unit for High-speed Operations." Journal of the Korea Institute of Information and Communication Engineering 17, no. 12 (2013): 2921–26. http://dx.doi.org/10.6109/jkiice.2013.17.12.2921.

12

Wang, Mingjiang, De Liu, Ming Liu, and Boya Zhao. "A two-item floating point fused dot-product unit with latency reduced." IEICE Electronics Express 13, no. 23 (2016): 20160937. http://dx.doi.org/10.1587/elex.13.20160937.

13

Sivanantham, S., and J. Jean Jenifer Nesam. "Reconfigurable half-precision floating-point real/complex fused multiply and add unit." International Journal of Materials and Product Technology 60, no. 1 (2020): 58. http://dx.doi.org/10.1504/ijmpt.2020.10030442.

14

Nesam, J. Jean Jenifer, and S. Sivanantham. "Reconfigurable half-precision floating-point real/complex fused multiply and add unit." International Journal of Materials and Product Technology 60, no. 1 (2020): 58. http://dx.doi.org/10.1504/ijmpt.2020.108488.

15

Mathapati, Rajeshwari, and Shrikant K. Shirakol. "A Decimal Floating Point Arithmetic Unit for Embedded System Applications using VLSI Techniques." International Journal of Engineering Trends and Technology 12, no. 8 (2014): 365–70. http://dx.doi.org/10.14445/22315381/ijett-v12p271.

16

Przybył, Andrzej. "Fixed-Point Arithmetic Unit with a Scaling Mechanism for FPGA-Based Embedded Systems." Electronics 10, no. 10 (2021): 1164. http://dx.doi.org/10.3390/electronics10101164.

Abstract:
The work describes the new architecture of a fixed-point arithmetic unit. It is based on the use of integer arithmetic operations for which the information about the scale of the processed numbers is contained in the binary code of the arithmetic instruction being executed. Therefore, this approach is different from the typical way of implementing fixed-point operations on standard processors. The presented solution is also significantly different from the one used in floating-point arithmetic, as the decision to determine the appropriate scale is made at the stage of compiling the code and not during its execution. As a result, the real-time processing of real numbers is simplified and, therefore, faster. The described method provides a better ratio of the processing efficiency to the complexity of the digital system than other methods. In particular, the advantage of using the described method in FPGA-based embedded control systems should be indicated. Experimental tests on an industrial servo-drive confirm the correctness of the described solution.
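The core idea, integer arithmetic with a scale decided before run time, can be sketched in a few lines. A rough Python illustration with a fixed Q16.16-style scale (the paper encodes the scale in the binary code of each arithmetic instruction, which this sketch does not model):

```python
# The number of fraction bits is fixed ahead of time, mirroring the
# idea of resolving scaling at compile time rather than at run time.
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Encode a real number as a scaled integer."""
    return round(x * SCALE)

def to_float(q: int) -> float:
    """Decode a scaled integer back to a real number."""
    return q / SCALE

def fx_mul(a: int, b: int) -> int:
    # the double-width product carries 2*FRAC_BITS fraction bits;
    # shift right to rescale back to FRAC_BITS
    return (a * b) >> FRAC_BITS

a = to_fixed(3.25)
b = to_fixed(-1.5)
print(to_float(fx_mul(a, b)))  # -4.875
```

Addition and subtraction need no rescaling at all when both operands share the same scale, which is where the run-time savings over floating point come from.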
17

Sharma, Subhash Kumar, Shri Prakash Dubey, and Anil Kumar Mishra. "Development of Library Components for Floating Point Processor." Journal of Ultra Scientist of Physical Sciences Section A 33, no. 4 (2021): 42–50. http://dx.doi.org/10.22147/jusps-a/330402.

Abstract:
This paper deals with the development of n-bit binary to decimal, decimal to n-bit binary, and decimal to IEEE-754 conversion for a floating-point arithmetic logic unit (FPALU) using VHDL. Most industries today use either 4-bit or 8-bit ALU conversions; we have generalized this so that the bit size of the ALU conversion no longer matters, covering 4-bit, 8-bit, 16-bit, and larger conversions alike. VHSIC Hardware Description Language and Xilinx tools were used to accomplish the development of these ALU conversion processes.
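A decimal-to-IEEE-754 conversion of the kind the paper generalizes can be demonstrated with a short script. A sketch for the single-precision case using Python's struct module (software illustration only; the paper implements the conversion in VHDL):

```python
import struct

def float_to_ieee754_bits(x: float) -> str:
    """Pack a number into IEEE-754 single precision and show the
    raw 32-bit pattern as: sign | 8 exponent bits | 23 fraction bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"

# -0.5 = -1.0 * 2**-1, so sign=1, biased exponent=126, fraction=0
print(float_to_ieee754_bits(-0.5))  # 1 01111110 00000000000000000000000
```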
18

Kumar, J. Vijay, B. Naga Raju, M. Vasu Babu, and T. Ramanjappa. "Implementation of Low Power Pipelined 64-bit RISC Processor with Unbiased FPU on CPLD." International Journal of Reconfigurable and Embedded Systems (IJRES) 5, no. 2 (2016): 118. http://dx.doi.org/10.11591/ijres.v5.i2.pp118-123.

Abstract:
This article presents the implementation of a low-power pipelined 64-bit RISC processor on an Altera MAX V CPLD device. The design is verified for arithmetic operations on both fixed- and floating-point numbers, and for the branch and logical functions of the RISC processor. For all jump instructions, the processor architecture automatically flushes the data in the pipeline to avoid any misbehavior. The processor contains an FPU that supports double-precision IEEE-754 operations very accurately. The simulation results were verified using ModelSim. The ALU operations and double-precision floating-point arithmetic results are displayed on 7-segment displays. The necessary code is written in Verilog HDL.
19

Aliyu, Farouq. "Design and Analysis of a Floating Point Fused Multiply Add Unit using VHDL." International Journal of Engineering Trends and Technology 24, no. 4 (2015): 169–76. http://dx.doi.org/10.14445/22315381/ijett-v24p232.

20

Gayathri, S. S., R. Kumar, Samiappan Dhanalakshmi, Gerard Dooly, and Dinesh Babu Duraibabu. "T-Count Optimized Quantum Circuit Designs for Single-Precision Floating-Point Division." Electronics 10, no. 6 (2021): 703. http://dx.doi.org/10.3390/electronics10060703.

Abstract:
The implementation of quantum computing processors for scientific applications includes quantum floating points circuits for arithmetic operations. This work adopts the standard division algorithms for floating-point numbers with restoring, non-restoring, and Goldschmidt division algorithms for single-precision inputs. The design proposals are carried out while using the quantum Clifford+T gates set, and resource estimates in terms of numbers of qubits, T-count, and T-depth are provided for the proposed circuits. By improving the leading zero detector (LZD) unit structure, the proposed division circuits show a significant reduction in the T-count when compared to the existing works on floating-point division.
21

Nievergelt, Yves. "Scalar fused multiply-add instructions produce floating-point matrix arithmetic provably accurate to the penultimate digit." ACM Transactions on Mathematical Software 29, no. 1 (2003): 27–48. http://dx.doi.org/10.1145/641876.641878.

22

Muñoz, Daniel M., Diego F. Sanchez, Carlos H. Llanos, and Mauricio Ayala-Rincón. "Tradeoff of FPGA Design of a Floating-point Library for Arithmetic Operators." Journal of Integrated Circuits and Systems 5, no. 1 (2010): 42–52. http://dx.doi.org/10.29292/jics.v5i1.309.

Abstract:
Many scientific and engineering applications must perform a large number of arithmetic operations efficiently, with high precision and a large dynamic range. Commonly, these applications are implemented on personal computers, taking advantage of floating-point arithmetic and high operating frequencies. However, most common software architectures execute instructions sequentially due to the von Neumann model, and consequently several delays are introduced in the data transfer between the program memory and the arithmetic logic unit (ALU). Several mobile applications require high performance in terms of computational accuracy and execution time, as well as low power consumption. Modern field-programmable gate arrays (FPGAs) are a suitable solution for high-performance embedded applications given the flexibility of their architectures and their parallel capabilities, which allow the implementation of complex algorithms and performance improvements. This paper describes a parameterizable FPGA-based floating-point library of arithmetic operators. A general architecture was implemented for addition/subtraction and multiplication, and two different architectures based on the Goldschmidt and Newton-Raphson algorithms were implemented for division and square root. Additionally, a trade-off analysis of the hardware implementation was performed, enabling the designer to choose, for general-purpose applications, a suitable bit-width representation and associated error, as well as the area cost, elapsed time, and power consumption of each arithmetic operator. Synthesis results demonstrate the effectiveness of the implemented cores on commercial FPGAs and show that the most critical parameter is the consumption of dedicated digital signal processing (DSP) slices. Simulation results compute the mean square error (MSE) and maximum absolute error, demonstrating the correctness of the implemented floating-point library and providing an experimental error analysis. The Newton-Raphson algorithm achieves MSE results similar to the Goldschmidt algorithm at similar operating frequencies; however, it saves more logic area and dedicated DSP blocks.
23

CVS, Chaitanya, C. Sundaresan, P. R. Venkateswaran, and Keerthana Prasad. "Design of modified booth based multiplier with carry pre-computation." Indonesian Journal of Electrical Engineering and Computer Science 13, no. 3 (2019): 1048. http://dx.doi.org/10.11591/ijeecs.v13.i3.pp1048-1055.

Abstract:
The arithmetic unit is the most important component of modern embedded computer systems. It generally includes floating-point and fixed-point arithmetic operations and trigonometric functions. Multiplier units are the most important hardware structures in a complex arithmetic unit. With increasing chip frequency, the designer must be able to find the best set of trade-offs. Fast computation is essential for high performance in many DSP and graphics-processing algorithms, which is why there is at least one dedicated multiplier unit in every modern commercial DSP processor. Tremendous advances in VLSI technology over the past several years have increased the need for high-speed multipliers and compelled designers to trade off speed, power consumption, and area. This paper presents a novel modified Booth multiplier design for high-speed VLSI applications using pre-computation logic. The proposed architecture is modeled in Verilog HDL, simulated using Cadence NCSIM, and synthesized using Cadence RTL Compiler with a 65 nm TSMC library. The proposed multiplier architecture is compared with existing multipliers, and the results show significant improvements in speed and power dissipation.
24

AnanthaLakshmi, A. V. "Design of a Reversible Fused 32-Point Radix-2 Floating Point FFT Unit Using 3:2 Compressor." International Journal of New Computer Architectures and their Applications 4, no. 4 (2014): 201–10. http://dx.doi.org/10.17781/p0020.

25

Kumar, Amit, A. K. Saxena, and S. Dasgupta. "Implementation of Floating Point and Logarithmic Number System Arithmetic Unit and Their Comparison for FPGA." International Journal on Intelligent Electronic Systems 2, no. 1 (2008): 1–6. http://dx.doi.org/10.18000/ijies.30016.

26

Arumalla, Anitha, and Madhavi Makkena. "An Effective Implementation of Dual Path Fused Floating-Point Add-Subtract Unit for Reconfigurable Architectures." International Journal of Intelligent Engineering and Systems 10, no. 3 (2017): 40–47. http://dx.doi.org/10.22266/ijies2017.0430.05.

27

Chen, C., L. A. Chen, and J. R. Cheng. "Architectural design of a fast floating-point multiplication-add fused unit using signed-digit addition." IEE Proceedings - Computers and Digital Techniques 149, no. 4 (2002): 113. http://dx.doi.org/10.1049/ip-cdt:20020409.

28

Acharya, Shivani, and Augusta Sophy Beulet. "Efficient Floating Point Fast Fourier Transform Butterfly Architecture Using Binary Signed Digit Multiplier and Adders." Asian Journal of Pharmaceutical and Clinical Research 10, no. 13 (2017): 73. http://dx.doi.org/10.22159/ajpcr.2017.v10s1.19568.

Abstract:
The fast Fourier transform (FFT) is one of the most important tools in digital signal processing and communication systems, since transforming from the time domain to the S-plane is very convenient using the FFT. Among the various techniques the FFT uses to convert a signal between the time domain and the S-domain, this paper focuses on the butterfly technique, which uses additions and multiplications of operands to produce the required output. Floating-point (FP) operands are used for their flexibility, but FP computation is comparatively slow, so binary signed digit (BSD) representation is used, which takes less time for addition and subtraction. A three-bit BSD adder and an FP adder together form a fused dot-product add (FDPA) unit: addition and subtraction form one group, multiplication forms another, and their respective results are then fused. Modified Booth encoding and decoding algorithms are used to ease the complex multiplication.
29

Madhu Babu, M., and K. Rama Naidu. "Area and Power Efficient Fused Floating-point Dot Product Unit based on Radix-2r Multiplier & Pipeline Feedforward-Cutset-Free Carry-Lookahead Adder." Information Technology in Industry 9, no. 2 (2021): 782–88. http://dx.doi.org/10.17762/itii.v9i2.411.

Abstract:
Fused floating-point operations play a major role in many DSP applications by reducing area and power consumption. A radix-2^r multiplier (using a 7-bit encoding technique) and a pipelined feedforward-cutset-free carry-lookahead adder (PFCF-CLA) are used to enhance the traditional FDP unit. Pipelining is also infused into the system to obtain the desired pipelined fused floating-point dot-product (PFFDP) operations. Synthesis results were obtained using a 60 nm standard library with a 1 GHz clock. Power consumption is 2.24 mW for single-precision and 3.67 mW for double-precision operations. The die areas are 27.48 mm² and 46.72 mm², with execution times of 1.91 ns and 2.07 ns for single- and double-precision operations, respectively. A comparison with previously published data was also performed: the area-delay product (ADP) and power-delay product (PDP) of the proposed architecture improve by 18% and 22% for single-precision, and by 27% and 18% for double-precision operations, respectively.
30

Zhang, Xiao Yan, Yiu-Hing Chan, Robert Montoye, Leon Sigal, Eric Schwarz, and Michael Kelly. "A 270ps 20mW 108-bit End-around Carry Adder for Multiply-Add Fused Floating Point Unit." Journal of Signal Processing Systems 58, no. 2 (2009): 139–44. http://dx.doi.org/10.1007/s11265-008-0325-0.

31

Dinesh Kumar, J. R., C. Ganesh Babu, V. R. Balaji, and C. Visvesvaran. "Analysis of effectiveness of power on refined numerical models of floating point arithmetic unit for biomedical applications." IOP Conference Series: Materials Science and Engineering 764 (March 7, 2020): 012032. http://dx.doi.org/10.1088/1757-899x/764/1/012032.

32

Grover, Naresh, and M. K. Soni. "Design of FPGA based 32-bit Floating Point Arithmetic Unit and verification of its VHDL code using MATLAB." International Journal of Information Engineering and Electronic Business 6, no. 1 (2014): 1–14. http://dx.doi.org/10.5815/ijieeb.2014.01.01.

33

Cococcioni, Marco, Federico Rossi, Emanuele Ruffaldi, and Sergio Saponara. "Fast Approximations of Activation Functions in Deep Neural Networks when using Posit Arithmetic." Sensors 20, no. 5 (2020): 1515. http://dx.doi.org/10.3390/s20051515.

Abstract:
With increasing real-time constraints being put on the use of Deep Neural Networks (DNNs) by real-time scenarios, there is the need to review information representation. A very challenging path is to employ an encoding that allows a fast processing and hardware-friendly representation of information. Among the proposed alternatives to the IEEE 754 standard regarding floating point representation of real numbers, the recently introduced Posit format has been theoretically proven to be really promising in satisfying the mentioned requirements. However, with the absence of proper hardware support for this novel type, this evaluation can be conducted only through a software emulation. While waiting for the widespread availability of the Posit Processing Units (the equivalent of the Floating Point Unit (FPU)), we can already exploit the Posit representation and the currently available Arithmetic-Logic Unit (ALU) to speed up DNNs by manipulating the low-level bit string representations of Posits. As a first step, in this paper, we present new arithmetic properties of the Posit number system with a focus on the configuration with 0 exponent bits. In particular, we propose a new class of Posit operators called L1 operators, which consists of fast and approximated versions of existing arithmetic operations or functions (e.g., hyperbolic tangent (TANH) and extended linear unit (ELU)) only using integer arithmetic. These operators introduce very interesting properties and results: (i) faster evaluation than the exact counterpart with a negligible accuracy degradation; (ii) an efficient ALU emulation of a number of Posits operations; and (iii) the possibility to vectorize operations in Posits, using existing ALU vectorized operations (such as the scalable vector extension of ARM CPUs or advanced vector extensions on Intel CPUs). 
As a second step, we test the proposed activation function on Posit-based DNNs, showing how 16-bit down to 10-bit Posits represent an exact replacement for 32-bit floats while 8-bit Posits could be an interesting alternative to 32-bit floats since their performances are a bit lower but their high speed and low storage properties are very appealing (leading to a lower bandwidth demand and more cache-friendly code). Finally, we point out how small Posits (i.e., up to 14 bits long) are very interesting while PPUs become widespread, since Posit operations can be tabulated in a very efficient way (see details in the text).
34

Özkılbaç, Bahadır. "Implementation and Design of 32 Bit Floating-Point ALU on a Hybrid FPGA-ARM Platform." Brilliant Engineering 1, no. 1 (2019): 26–32. http://dx.doi.org/10.36937/ben.2020.001.005.

Abstract:
FPGAs offer capabilities such as low power consumption, multiple I/O pins, and parallel processing. Because of these capabilities, FPGAs are commonly used in numerous areas that require mathematical computing, such as signal processing, artificial neural network design, image processing, and filter applications. From the simplest to the most complex, all mathematical applications are based on multiplication, division, subtraction, and addition, and calculations often involve numbers that are fractional, large, or negative. In this study, an arithmetic logic unit (ALU) performing multiplication, division, addition, and subtraction on IEEE 754 32-bit floating-point numbers, which represent fractional and large values, is designed using the FPGA part of a Xilinx Zynq-7000 integrated circuit. The programming language used is VHDL. The designed ALU is then controlled via commands sent from the ARM processor part of the same integrated circuit.
35

Ji, Hao, Michael Mascagni, and Yaohang Li. "Gaussian variant of Freivalds’ algorithm for efficient and reliable matrix product verification." Monte Carlo Methods and Applications 26, no. 4 (2020): 273–84. http://dx.doi.org/10.1515/mcma-2020-2076.

Abstract:
In this article, we consider the general problem of checking the correctness of matrix multiplication. Given three n x n matrices A, B, and C, the goal is to verify that A x B = C without carrying out the computationally costly operations of matrix multiplication and comparing the product A x B with C term by term. This is especially important when some or all of these matrices are very large, and when the computing environment is prone to soft errors. Here we extend Freivalds' algorithm to a Gaussian variant of Freivalds' algorithm (GVFA) by projecting the product A x B, as well as C, onto a Gaussian random vector and then comparing the resulting vectors. The computational complexity of GVFA is consistent with that of Freivalds' algorithm, which is O(n^2). However, unlike Freivalds' algorithm, whose probability of a false positive is 2^-k, where k is the number of iterations, our theoretical analysis shows that, when A x B != C, GVFA produces a false positive on a set of inputs of measure zero with exact arithmetic. When we introduce round-off error and floating-point arithmetic into our analysis, we can show that the larger this error, the higher the probability that GVFA avoids false positives. Moreover, by iterating GVFA k times, the probability of a false positive decreases as p^k, where p is a very small value depending on the nature of the fault in the result matrix and the arithmetic system's floating-point precision. Unlike deterministic algorithms, there do not exist any fault patterns that are completely undetectable with GVFA. Thus GVFA can be used to provide efficient fault tolerance in numerical linear algebra, and it can be efficiently implemented on modern computing architectures. In particular, GVFA can be implemented very efficiently on architectures with hardware support for fused multiply-add operations.
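The verification scheme described above is straightforward to prototype: instead of forming A x B, project both sides onto a random Gaussian vector and compare A(Bx) with Cx, which costs only a few matrix-vector products. A small pure-Python sketch of that idea (a fixed tolerance stands in for the paper's round-off analysis):

```python
import random

def gvfa_check(A, B, C, k=3, tol=1e-9):
    """Gaussian variant of Freivalds' algorithm: O(n^2) work per
    iteration instead of the O(n^3) of a full matrix multiply."""
    n = len(A)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(k):
        x = [random.gauss(0.0, 1.0) for _ in range(n)]
        lhs = matvec(A, matvec(B, x))  # A (B x)
        rhs = matvec(C, x)             # C x
        if any(abs(l - r) > tol for l, r in zip(lhs, rhs)):
            return False  # discrepancy found: A x B != C
    return True           # no discrepancy detected in k projections

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(gvfa_check(A, B, [[19, 22], [43, 50]]))  # True  (correct product)
print(gvfa_check(A, B, [[19, 22], [43, 51]]))  # False (faulty entry)
```

With a Gaussian projection, a fault goes undetected only on a measure-zero set of random vectors, which is why even a wrong single entry is caught with overwhelming probability.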
36

Pietras, M., and P. Klęsk. "FPGA implementation of logarithmic versions of Baum-Welch and Viterbi algorithms for reduced precision hidden Markov models." Bulletin of the Polish Academy of Sciences Technical Sciences 65, no. 6 (2017): 935–47. http://dx.doi.org/10.1515/bpasts-2017-0101.

Abstract:
This paper presents a programmable system-on-chip implementation for accelerating computations within hidden Markov models. High-level synthesis (HLS) and a divide-and-conquer approach are presented for parallelizing the Baum-Welch and Viterbi algorithms. To avoid arithmetic underflow, all computations are performed in logarithmic space. Additionally, in order to carry out computations efficiently (i.e., directly in an FPGA system or a processor cache), we propose reducing the floating-point representations of HMMs. We state and prove a lemma about the length of numerically unsafe sequences for such reduced-precision models. Finally, special attention is devoted to the design of a multiple logarithm and exponent approximation unit (MLEAU). Using associative mapping, this unit allows simultaneous conversion of multiple values and thereby compensates for the computational effort of logarithmic-space operations. The design evaluation reveals the absolute stall delay incurred by multiple hardware conversions to logarithms and exponents, and the experiments reveal HMM computation boundaries related to their probabilities and floating-point representation. The performance differences at each stage of computation are summarized in a comparison between hardware acceleration using the MLEAU and a typical software implementation on an ARM or Intel processor.
37

Cabodi, G., A. Garbo, C. Loiacono, S. Quer, and G. Francini. "Efficient Complex High-Precision Computations on GPUs without Precision Loss." Journal of Circuits, Systems and Computers 26, no. 12 (2017): 1750187. http://dx.doi.org/10.1142/s0218126617501870.

Abstract:
General-purpose computing on graphics processing units is the utilization of a graphics processing unit (GPU) to perform computation in applications traditionally handled by the central processing unit. Many attempts have been made to implement well-known algorithms on embedded and mobile GPUs. Unfortunately, these applications are computationally complex and often require high precision arithmetic, whereas embedded and mobile GPUs are designed specifically for graphics, and thus are very restrictive in terms of input/output, precision, programming style and primitives available. This paper studies how to implement efficient and accurate high-precision algorithms on embedded GPUs adopting the OpenGL ES language. We discuss the problems arising during the design phase, and we detail our implementation choices, focusing on the SIFT and ALP key-point detectors. We transform standard, i.e., single (or double) precision floating-point computations, to reduced-precision GPU arithmetic without precision loss. We develop a desktop framework to simulate Gaussian Scale Space transforms on all possible target embedded GPU platforms, and with all possible range and precision arithmetic. We illustrate how to re-engineer standard Gaussian Scale Space computations to mobile multi-core parallel GPUs using the OpenGL ES language. We present experiments on a large set of standard images, proving how efficiency and accuracy can be maintained on different target platforms. To sum up, we present a complete framework to minimize future programming effort, i.e., to easily check, on different embedded platforms, the accuracy and performance of complex algorithms requiring high-precision computations.
APA, Harvard, Vancouver, ISO, and other styles
38

Bisoyi, Abhyarthana, and Aruna Tripathy. "Design of a Novel Fused Add-Sub Module for IEEE 754-2008 Floating Point Unit in High Speed Applications." Communications on Applied Electronics 7, no. 33 (2020): 1–7. http://dx.doi.org/10.5120/cae2020652854.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Chang, Yisong, Jizeng Wei, Wei Guo, and Jizhou Sun. "A high performance, area efficient TTA-like vertex shader architecture with optimized floating point arithmetic unit for embedded graphics applications." Microprocessors and Microsystems 37, no. 6-7 (2013): 725–38. http://dx.doi.org/10.1016/j.micpro.2012.06.003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Moroz, Leonid V., Volodymyr V. Samotyy, and Oleh Y. Horyachyy. "Modified Fast Inverse Square Root and Square Root Approximation Algorithms: The Method of Switching Magic Constants." Computation 9, no. 2 (2021): 21. http://dx.doi.org/10.3390/computation9020021.

Full text
Abstract:
Many low-cost platforms that support floating-point arithmetic, such as microcontrollers and field-programmable gate arrays, do not include fast hardware or software methods for calculating the square root and/or reciprocal square root. Typically, such functions are implemented using direct lookup tables or polynomial approximations, with a subsequent application of the Newton–Raphson method. Other, more complex solutions include high-radix digit-recurrence and bipartite or multipartite table-based methods. In contrast, this article proposes a simple modification of the fast inverse square root method that has high accuracy and relatively low latency. Algorithms are given in C/C++ for single- and double-precision numbers in the IEEE 754 format for both square root and reciprocal square root functions. These are based on the switching of magic constants in the initial approximation, depending on the input interval of the normalized floating-point numbers, in order to minimize the maximum relative error on each subinterval after the first iteration—giving 13 correct bits of the result. Our experimental results show that the proposed algorithms provide a fairly good trade-off between accuracy and latency after two iterations for numbers of type float, and after three iterations for numbers of type double when using fused multiply–add instructions—giving almost complete accuracy.
APA, Harvard, Vancouver, ISO, and other styles
41

KARNER, HERBERT, MARTIN AUER, and CHRISTOPH W. UEBERHUBER. "MULTIPLY-ADD OPTIMIZED FFT KERNELS." Mathematical Models and Methods in Applied Sciences 11, no. 01 (2001): 105–17. http://dx.doi.org/10.1142/s0218202501000775.

Full text
Abstract:
Modern computer architecture provides a special instruction — the fused multiply-add (FMA) instruction — to perform both a multiplication and an addition operation at the same time. In this paper newly developed radix-2, radix-3, and radix-5 FFT kernels that efficiently take advantage of this powerful instruction are presented. If a processor is provided with FMA instructions, the radix-2 FFT algorithm introduced has the lowest complexity of all Cooley–Tukey radix-2 algorithms. All floating-point operations are executed as FMA instructions. Compared to conventional radix-3 and radix-5 kernels, the new radix-3 and radix-5 kernels greatly improve the utilization of FMA instructions, which results in a significant reduction in complexity. In general, the advantages of the FFT algorithms presented in this paper are their low arithmetic complexity, their high efficiency, and their striking simplicity. Numerical experiments show that FFT programs using the new kernels clearly outperform even the best conventional FFT routines.
APA, Harvard, Vancouver, ISO, and other styles
42

Büscher, Nils, Daniel Gis, Volker Kühn, and Christian Haubelt. "On the Functional and Extra-Functional Properties of IMU Fusion Algorithms for Body-Worn Smart Sensors." Sensors 21, no. 8 (2021): 2747. http://dx.doi.org/10.3390/s21082747.

Full text
Abstract:
In this work, four sensor fusion algorithms for inertial measurement unit data to determine the orientation of a device are assessed regarding their usability in a hardware restricted environment such as body-worn sensor nodes. The assessment is done for both the functional and the extra-functional properties in the context of human operated devices. The four algorithms are implemented in three data formats: 32-bit floating-point, 32-bit fixed-point and 16-bit fixed-point and compared regarding code size, computational effort, and fusion quality. Code size and computational effort are evaluated on an ARM Cortex M0+. For the assessment of the functional properties, the sensor fusion output is compared to a camera generated reference and analyzed in an extensive statistical analysis to determine how data format, algorithm, and human interaction influence the quality of the sensor fusion. Our experiments show that using fixed-point arithmetic can significantly decrease the computational complexity while still maintaining a high fusion quality and all four algorithms are applicable for applications with human interaction.
APA, Harvard, Vancouver, ISO, and other styles
43

Tiwari, Sugandha, Neel Gala, Chester Rebeiro, and V. Kamakoti. "PERI." ACM Transactions on Architecture and Code Optimization 18, no. 3 (2021): 1–26. http://dx.doi.org/10.1145/3446210.

Full text
Abstract:
Owing to the failure of Dennard’s scaling, the past decade has seen a steep growth of prominent new paradigms leveraging opportunities in computer architecture. Two technologies of interest are Posit and RISC-V. Posit was introduced in mid-2017 as a viable alternative to IEEE-754, and RISC-V provides a commercial-grade open source Instruction Set Architecture (ISA). In this article, we bring these two technologies together and propose a Configurable Posit Enabled RISC-V Core called PERI. The article provides insights on how the Single-Precision Floating Point (“F”) extension of RISC-V can be leveraged to support posit arithmetic. We also present the implementation details of a parameterized and feature-complete posit Floating Point Unit (FPU). The configurability and the parameterization features of this unit generate optimal hardware, which caters to the accuracy and energy/area tradeoffs imposed by the applications, a feature not possible with IEEE-754 implementation. The posit FPU has been integrated with the RISC-V compliant SHAKTI C-class core as an execution unit. To further leverage the potential of posit, we enhance our posit FPU to support two different exponent sizes (with posit-size being 32-bits), thereby enabling multiple-precision at runtime. To enable the compilation and execution of C programs on PERI, we have made minimal modifications to the GNU C Compiler (GCC), targeting the “F” extension of the RISC-V. We compare posit with IEEE-754 in terms of hardware area, application accuracy, and runtime. We also present an alternate methodology of integrating the posit FPU with the RISC-V core as an accelerator using the custom opcode space of RISC-V.
APA, Harvard, Vancouver, ISO, and other styles
44

Wei, Xin, Wenchao Liu, Lei Chen, Long Ma, He Chen, and Yin Zhuang. "FPGA-Based Hybrid-Type Implementation of Quantized Neural Networks for Remote Sensing Applications." Sensors 19, no. 4 (2019): 924. http://dx.doi.org/10.3390/s19040924.

Full text
Abstract:
Recently, extensive convolutional neural network (CNN)-based methods have been used in remote sensing applications, such as object detection and classification, and have achieved significant improvements in performance. Furthermore, there is substantial demand for hardware implementations in real-time remote sensing processing applications. However, the operation and storage processes in floating-point models hinder the deployment of networks in hardware implementations with limited resource and power budgets, such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). To solve this problem, this paper focuses on optimizing the hardware design of CNN with low bit-width integers by quantization. First, a symmetric quantization scheme-based hybrid-type inference method was proposed, which uses the low bit-width integer to replace floating-point precision. Then, a training approach for the quantized network is introduced to reduce accuracy degradation. Finally, a processing engine (PE) with a low bit-width is proposed to optimize the hardware design of FPGA for remote sensing image classification. Besides, a fused-layer PE is also presented for state-of-the-art CNNs equipped with Batch-Normalization and LeakyRelu. The experiments performed on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset using a graphics processing unit (GPU) demonstrate that the accuracy of the 8-bit quantized model drops by about 1%, which is an acceptable accuracy loss. The accuracy result tested on FPGA is consistent with that of GPU. As for the resource consumptions of FPGA, the Look Up Table (LUT), Flip-flop (FF), Digital Signal Processor (DSP), and Block Random Access Memory (BRAM) are reduced by 46.21%, 43.84%, 45%, and 51%, respectively, compared with that of floating-point implementation.
APA, Harvard, Vancouver, ISO, and other styles
45

Huang, Yi, and Clemens Gühmann. "Temperature estimation of induction machines based on wireless sensor networks." Journal of Sensors and Sensor Systems 7, no. 1 (2018): 267–80. http://dx.doi.org/10.5194/jsss-7-267-2018.

Full text
Abstract:
In this paper, a fourth-order Kalman filter (KF) algorithm is implemented in the wireless sensor node to estimate the temperatures of the stator winding, the rotor cage and the stator core in the induction machine. Three separate wireless sensor nodes are used as the data acquisition systems for different input signals. Six Hall sensors are used to acquire the three-phase stator currents and voltages of the induction machine. All of them are processed to root-mean-square (rms) values in amperes and volts. A rotary encoder is mounted for the rotor speed and a Pt-1000 is used for the temperature of the coolant air. The processed signals in physical units are transmitted wirelessly to the host wireless sensor node, where the KF is implemented with fixed-point arithmetic in Contiki OS. Time-division multiple access (TDMA) is used to make the wireless transmission more stable. Compared to the floating-point implementation, the fixed-point implementation has the same estimation accuracy at only about one-fifth of the computation time. The temperature estimation system can work under any work condition as long as there are currents through the machine. It can also be rebooted for estimation even when wireless transmission has collapsed or packages are missing.
APA, Harvard, Vancouver, ISO, and other styles
46

Zeleneva, І. Ya, Т. V. Golub, T. S. Diachuk, and А. Ye Didenko. "CONVEYOR MODEL AND IMPLEMENTATION OF THE REAL NUMBERS ADDER ON FPGA." ELECTRICAL AND COMPUTER SYSTEMS 33, no. 109 (2020): 21–31. http://dx.doi.org/10.15276/eltecs.33.109.2020.3.

Full text
Abstract:
The purpose of these studies is to develop an effective structure and internal functional blocks of a digital computing device – an adder – that performs addition and subtraction operations on floating-point numbers presented in IEEE Std 754TM-2008 format. To improve the characteristics of the adder, the circuit uses pipelining, that is, division into levels, each of which performs a specific action on the numbers. This allows addition/subtraction operations to be performed on several numbers at the same time, which increases the performance of calculations and also makes the adder suitable for use in modern synchronous circuits. Each block of the pipeline structure of the adder on FPGA is synthesized as a separate project of a digital functional unit, and thus the overall task is divided into separate subtasks, which facilitates experimental testing and phased debugging of the entire device. Experimental studies were performed using the Quartus II EDA software. The developed circuit was modeled on FPGAs of the Stratix III and Cyclone III families. An analogue of the developed circuit was a functionally similar device from Altera. A comparative analysis is made and reasoned conclusions are drawn that the performance improvement is achieved due to the pipeline structure of the adder. Implementation of arithmetic over floating-point numbers on programmable logic integrated circuits, in particular on FPGA, has such advantages as flexibility of use and low production costs, and also provides the opportunity to solve problems for which there are no ready-made solutions in the form of standard devices presented on the market. The developed adder has a wide scope, since most modern computing devices need to process floating-point numbers.
The proposed pipeline model of the adder is quite simple to implement on the FPGA and can be an alternative to using built-in multipliers and processor cores in cases where the complex functionality of these devices is redundant for a specific task.
APA, Harvard, Vancouver, ISO, and other styles
47

Bělík, Pavel, HeeChan Kang, Andrew Walsh, and Emma Winegar. "On the Dynamics of Laguerre’s Iteration Method for Finding the nth Roots of Unity." International Journal of Computational Mathematics 2014 (November 26, 2014): 1–16. http://dx.doi.org/10.1155/2014/321585.

Full text
Abstract:
Previous analyses of Laguerre’s iteration method have provided results on the behavior of this popular method when applied to the polynomials p_n(z) = z^n − 1, n ∈ N. In this paper, we summarize known analytical results and provide new results. In particular, we study symmetry properties of the Laguerre iteration function and clarify the dynamics of the method. We show analytically and demonstrate computationally that for each n ≥ 5 the basin of attraction to the roots is a subset of an annulus that contains the unit circle and whose Lebesgue measure shrinks to zero as n → ∞. We obtain a good estimate of the size of the bounding annulus. We show that the boundary of the basin of convergence exhibits fractal nature and quasi self-similarity. We also discuss the connectedness of the basin for large values of n. We also numerically find some short finite cycles on the boundary of the basin of convergence for n = 5, ..., 8. Finally, we demonstrate that when using the floating point arithmetic and the general formulation of the method, convergence occurs even from starting values outside of the basin of convergence due to the loss of significance during the evaluation of the iteration function.
APA, Harvard, Vancouver, ISO, and other styles
48

Kaur, Prabhjot, Rajiv Ranjan, Raminder Preet Pal Singh, and Onkar Singh. "Double Precision Floating Point Arithmetic Unit Implementation- A Review." International Journal of Engineering Research and Technology (IJERT) 4, no. 07 (2015). http://dx.doi.org/10.17577/ijertv4is070766.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Kaur, Prabhjot, Ankur Sharma, and Raminder Preet Pal Singh. "FPGA Implementation of Double Precision Floating Point Arithmetic Unit." International Journal Of Engineering And Computer Science, October 16, 2015. http://dx.doi.org/10.18535/ijecs/v4i9.76.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

"Improved Architectures for Fused Floating Point Add-Subtract Unit." International Journal of Science and Research (IJSR) 4, no. 12 (2015): 496–98. http://dx.doi.org/10.21275/v4i12.nov152018.

Full text
APA, Harvard, Vancouver, ISO, and other styles