# Implementation of IEEE 754 Floating Point Multiplier using Partition Technique and Ling-Enhance Adder

Deepak Mishra<sup>1</sup>, Dr. Payal Suhane<sup>2</sup>

M. Tech. Scholar, Department of Electronics and Communication Engineering, Vidhyapeeth Institute of Science and Technology, Bhopal, M.P.<sup>1</sup>

Director, Vidhyapeeth Institute of Science and Technology, Bhopal, M.P.<sup>2</sup>

Abstract: Advances in VLSI and embedded-system technology have created a growing demand for high-speed, low-power processors. Processor speed depends heavily on the performance of its multiplier and adder. Despite the complexity of floating point arithmetic, its use in hardware keeps increasing, which makes high-speed adder architectures important. Several adder architectures have been developed to increase adder efficiency. In this paper, we introduce an architecture for a high-speed IEEE 754 floating point multiplier using a carry select adder (CSA). We introduce two carry-select-based designs. These designs are implemented on the Xilinx Virtex device family.

Keywords: IEEE754, Single Precision Floating Point (SP FP), Double Precision Floating Point (DP FP), Binary to Excess-1 Converter

## 1. Introduction

Real numbers represented in binary format are known as floating point numbers. The IEEE-754 standard classifies floating point formats into binary and decimal interchange formats. Floating point multipliers are very important in DSP applications. This paper focuses on the double precision normalized binary interchange format. Figure 1 shows the IEEE-754 single and double precision binary formats. In double precision, the sign (s) is represented with one bit, and the exponent (e) and fraction (m, or mantissa) are represented with eleven and fifty-two bits respectively. For a number to be normalized, it must have a '1' in the MSB of the significand (the hidden bit) and a biased exponent greater than zero and smaller than 2047. The real number is represented by equations (1) and (2).

$$Z = (-1)^{s} \times 2^{(E-Bias)} \times (1.M) \tag{1}$$

$$Value = (-1)^{signbit} \times 2^{(Exponent-1023)} \times (1.Mantissa) \tag{2}$$

Biasing makes the values of exponents within an unsigned range suitable for high speed comparison.

| Sign Bit | Biased Exponent | Significand |
|----------|-----------------|-------------|
| 1-bit    | 8/11-bit        | 23/52-bit   |

Figure 1: IEEE 754 Single Precision and Double Precision Floating Point Format
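The field layout of Figure 1 and equation (2) can be checked in software. The following is a minimal sketch, assuming Python's standard `struct` module for access to the bit pattern; the function names `decode_double` and `value_of_normalized` are illustrative and not part of the paper.

```python
import struct

def decode_double(x: float):
    """Split an IEEE 754 double into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                      # 1-bit sign
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)      # 52-bit fraction (mantissa) field
    return sign, exponent, fraction

def value_of_normalized(sign: int, exponent: int, fraction: int) -> float:
    """Evaluate equation (2) for a normalized double (0 < exponent < 2047)."""
    if not 0 < exponent < 2047:
        raise ValueError("not a normalized number")
    significand = 1 + fraction / (1 << 52)  # 1.Mantissa, hidden bit restored
    return (-1) ** sign * 2.0 ** (exponent - 1023) * significand

if __name__ == "__main__":
    s, e, m = decode_double(-6.5)
    print(s, e, hex(m), value_of_normalized(s, e, m))
```

For a normalized input, rebuilding the value from its three fields reproduces the original number exactly.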

#### IEEE 754 Standard Floating Point Multiplication Algorithm

A brief overview of floating point multiplication is given below [5-6].

- The sign bits S<sub>1</sub> and S<sub>2</sub> are XORed together; the result is the sign bit of the final product.
- The exponent bits E<sub>1</sub> and E<sub>2</sub> are added together and the bias value is subtracted from the sum. This gives the exponent field of the final product.
- The significand bits Sig<sub>1</sub> and Sig<sub>2</sub> of both operands are multiplied, including their hidden bits.
- The product of the significands is normalized and the exponent is changed accordingly. After normalization, the leading "1" becomes the hidden bit.

The multiplication algorithm above is shown in Figure 2.



Figure 2: IEEE754 SP FP and DP FP Multiplier Structure, NE: Normalized exponent, NS: Normalized Significand
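The four steps of the algorithm can be sketched as a bit-level software model for single precision. This is an illustrative sketch, not the paper's VHDL design: the names `fp_multiply`, `float_to_bits` and `bits_to_float` are hypothetical, the product is truncated rather than rounded, and special cases (zero, subnormals, infinity, NaN, overflow, underflow) are deliberately left out.

```python
import struct

BIAS = 127        # single precision exponent bias
FRAC_BITS = 23    # stored fraction bits, hidden bit excluded

def float_to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp_multiply(a_bits: int, b_bits: int) -> int:
    """Multiply two normalized SPFP operands given as 32-bit patterns.

    Assumptions of this sketch: truncation instead of rounding, and no
    handling of zero, subnormal, infinity, NaN, overflow or underflow.
    """
    s1, e1, m1 = a_bits >> 31, (a_bits >> 23) & 0xFF, a_bits & 0x7FFFFF
    s2, e2, m2 = b_bits >> 31, (b_bits >> 23) & 0xFF, b_bits & 0x7FFFFF

    sign = s1 ^ s2                          # step 1: XOR of the sign bits
    exponent = e1 + e2 - BIAS               # step 2: add exponents, subtract bias
    sig1 = (1 << FRAC_BITS) | m1            # restore hidden bits (24-bit operands)
    sig2 = (1 << FRAC_BITS) | m2
    product = sig1 * sig2                   # step 3: 48-bit significand product

    if product >> 47:                       # step 4: product in [2, 4), normalize
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF   # drop hidden bit, truncate low bits
    return (sign << 31) | (exponent << 23) | fraction

if __name__ == "__main__":
    r = fp_multiply(float_to_bits(3.0), float_to_bits(-1.25))
    print(bits_to_float(r))
```

Because of the truncation, the model matches the exact product only when the result fits in 24 significand bits, which is the case for the small values used here.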

#### Parallel Adder

A parallel adder accepts all operand bits simultaneously, which increases the addition speed. It uses multiple full adders, each adding the two corresponding bits of the two binary numbers and the carry bit from the previous adder, and producing a sum bit and a carry bit for the next stage. The carries produced by the individual adders ripple through the chain, i.e. the carry bit produced by one adder is one of the inputs of the adder in the succeeding stage. For this reason it is also known as the Ripple Carry Adder (RCA). A generalized diagram of the parallel adder is shown in Figure 3.



Figure 3: Parallel Adder (n=7 for SPFP and n=10 for DPFP)
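The rippling of the carry from stage to stage can be illustrated with a short behavioral model. This is a sketch only; `full_adder` and `ripple_carry_add` are names chosen here, and the model reproduces the logic, not the gate delays.

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a: int, b: int, width: int, cin: int = 0):
    """Add two width-bit numbers; the carry ripples from bit 0 upward."""
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry   # (width-bit sum, final carry-out)

if __name__ == "__main__":
    print(ripple_carry_add(0b10000001, 0b01111111, 8))
```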

An n-bit parallel adder has one half adder and n-1 full adders if the last carry bit is required. In the exponent adder of the IEEE 754 multiplier, however, the last carry-out is not required, so an XOR gate can be used in place of the last full adder. This reduces both the area occupied by the circuit and the delay of the calculation. For the exponent adders of the SPFP and DPFP multipliers, we simulate 8-bit and 11-bit parallel adders respectively, as shown in Figure 4.



Figure 4: Modified Parallel Adder (n=7 for SPFP and n=10 for DPFP)
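The modification can be sketched as follows: a half adder at bit 0, full adders in the middle, and only the XOR (sum) part of a full adder at the most significant bit, so no final carry is generated. The function name `modified_parallel_add` is an assumption of this sketch, and the model requires a width of at least 2.

```python
def modified_parallel_add(a: int, b: int, width: int) -> int:
    """Add two width-bit numbers and discard the final carry-out."""
    if width < 2:
        raise ValueError("width must be at least 2")
    # Stage 0: half adder (no carry input)
    a0, b0 = a & 1, b & 1
    total = a0 ^ b0
    carry = a0 & b0
    # Stages 1 .. width-2: ordinary full adders
    for i in range(1, width - 1):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    # Last stage: XOR only, the carry-out logic is omitted
    top = width - 1
    total |= (((a >> top) & 1) ^ ((b >> top) & 1) ^ carry) << top
    return total

if __name__ == "__main__":
    print(modified_parallel_add(200, 100, 8))   # sum modulo 2**8
```

The result equals the ordinary sum modulo 2^width, which is what remains once the last carry-out is dropped.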

#### Carry Skip Adder

This adder has less delay than the ripple carry adder. It uses carry-skip logic: when every bit position in a block of adder stages propagates its carry, the incoming carry bypasses the block. The carry skip circuitry uses two gates, an AND gate and an OR gate. Because the carry need not ripple through each stage, the delay improves. It is also known as the carry bypass adder. A generalized diagram of the carry skip adder is shown in Figure 5.



Figure 5: Carry Skip Adder
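A behavioral sketch of the skip logic is given below. The block size of 4 and the name `carry_skip_add` are assumptions made for illustration; the model shows the AND-OR skip path but, being software, cannot show the delay advantage.

```python
def carry_skip_add(a: int, b: int, width: int, block: int = 4, cin: int = 0):
    """Carry skip addition: returns (width-bit sum, carry-out)."""
    carry = cin
    total = 0
    for lo in range(0, width, block):
        hi = min(lo + block, width)
        block_cin = carry
        propagate_all = 1                      # AND of all propagate signals
        for i in range(lo, hi):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi                        # bit-level propagate
            total |= (p ^ carry) << i
            carry = (ai & bi) | (p & carry)    # ripple inside the block
            propagate_all &= p
        # Skip path: AND gate (all propagate and carry-in), then OR gate
        carry = carry | (propagate_all & block_cin)
    return total, carry

if __name__ == "__main__":
    print(carry_skip_add(0b00001111, 0b00000001, 8))
```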

#### Carry Select Adder

A carry select adder uses multiplexers along with RCAs; the carry is used as the select input to choose the correct sum bits and carry bit, which gives the adder its name. Two RCAs calculate the sum bits simultaneously for the same operand bits, assuming the two possible carry inputs, '0' and '1'. Once the actual carry input is known, the multiplexer selects the correct output of the two. The multiplexer delay is therefore part of the adder's delay. A generalized diagram of the carry select adder is shown in Figure 6.

Adders are the basic building blocks of most ALUs (arithmetic logic units) used in digital signal processing and various other applications. Many types of adders are available today and more are being developed. The half adder and the full adder are the two basic types; almost all other adders are built from different arrangements of these two. A half adder adds two bits and produces a sum bit and a carry bit, whereas a full adder adds three bits and produces a sum bit and a carry bit.



Figure 6: Carry Select Adder
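The dual-RCA selection scheme can be modeled in a few lines. This is a sketch under assumptions: the block size of 4 and the names `rca` and `carry_select_add` are chosen here for illustration, and for uniformity even the first block is computed for both carry inputs.

```python
def rca(a: int, b: int, width: int, cin: int):
    """Ripple carry adder used inside each block: returns (sum, carry_out)."""
    carry, total = cin, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry

def carry_select_add(a: int, b: int, width: int, block: int = 4, cin: int = 0):
    """Carry select addition with two RCAs per block."""
    total, carry = 0, cin
    for lo in range(0, width, block):
        w = min(block, width - lo)
        mask = (1 << w) - 1
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = rca(a_blk, b_blk, w, 0)   # result assuming carry-in = 0
        sum1, c1 = rca(a_blk, b_blk, w, 1)   # result assuming carry-in = 1
        # Multiplexer: the actual incoming carry selects one of the two results
        blk_sum, carry = (sum1, c1) if carry else (sum0, c0)
        total |= blk_sum << lo
    return total, carry

if __name__ == "__main__":
    print(carry_select_add(130, 125, 8))
```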

## 2. PROPOSED DESIGN

In the IEEE 754 floating point representation, the 8-bit exponent field of one single precision (SP FP) operand must be added to the 8-bit exponent of the other, and likewise for the 11-bit exponents in double precision (DP FP), in order to multiply floating point numbers as explained earlier. Ragini et al. [10] used a parallel adder for adding the exponent bits in the floating point multiplication algorithm. We propose the use of an 8-bit modified CSA with dual RCA and an 8-bit modified CSA with RCA and BEC (binary to excess-1 converter) for adding the exponent bits. We found that the 8-bit modified CSA with RCA and BEC improves area over the 8-bit modified CSA with dual RCA.
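The difference between the two adders can be sketched in software: in the BEC variant, the second RCA (carry-in '1') is replaced by a binary to excess-1 converter that adds one to the output of the first RCA. The code below is an illustrative model, not the paper's VHDL; the names `bec` and `carry_select_add_bec` and the block size of 4 are assumptions.

```python
def rca(a: int, b: int, width: int, cin: int = 0):
    """Ripple carry adder: returns (sum, carry_out)."""
    carry, total = cin, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry

def bec(x: int, width: int) -> int:
    """Binary to excess-1 converter: (x + 1) mod 2**width from XOR/AND logic."""
    out = 0
    lower_all_ones = 1                 # AND of all lower input bits
    for i in range(width):
        xi = (x >> i) & 1
        out |= (xi ^ lower_all_ones) << i
        lower_all_ones &= xi
    return out

def carry_select_add_bec(a: int, b: int, width: int, block: int = 4, cin: int = 0):
    """Carry select addition with one RCA and one BEC per block."""
    total, carry = 0, cin
    for lo in range(0, width, block):
        w = min(block, width - lo)
        mask = (1 << w) - 1
        sum0, c0 = rca((a >> lo) & mask, (b >> lo) & mask, w, 0)
        word0 = (c0 << w) | sum0       # {carry, sum} for carry-in = 0
        word1 = bec(word0, w + 1)      # {carry, sum} for carry-in = 1
        word = word1 if carry else word0   # multiplexer
        total |= (word & mask) << lo
        carry = word >> w
    return total, carry

if __name__ == "__main__":
    print(carry_select_add_bec(130, 126, 8))
```

The BEC needs fewer gates than a second ripple carry adder of the same width, which is consistent with the area improvement reported for the RCA-and-BEC design.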

#### Sign bit calculation

The same strategy gives the sign bit of the product for both the SP FP and the DP FP multiplier: the sign bits of the two operands are XORed together. If the resulting bit is '1', the final product is a negative number. If the resulting bit is '0', the final product is a positive number.

#### Exponent bit calculation

Add the exponent bits of both operands together, then subtract the bias value (127 for SPFP and 1023 for DPFP) from the sum. This result may not yet be the exponent of the final product: after the significand multiplication, the product has to be normalized, and the exponent is adjusted according to the normalization. The adjusted exponent gives the exponent bits of the final product.
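As a worked illustration with operands chosen here (they are not taken from the paper), consider the SPFP product $3.0 \times 3.0$, where each operand is $1.5 \times 2^{1}$ with biased exponent $E_1 = E_2 = 128$:

$$E_1 + E_2 - Bias = 128 + 128 - 127 = 129$$

$$1.5 \times 1.5 = 2.25 = 1.125 \times 2^{1} \;\Rightarrow\; E_3 = 129 + 1 = 130$$

$$Value = 1.125 \times 2^{(130-127)} = 9$$

The significand product is at least 2, so normalization shifts it right by one position and the exponent is incremented by one.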

#### Significand bit calculation

The significand bits, including the hidden bit, must be multiplied, but the operand length is a problem. Each operand is 24 bits in the SP FP representation and 53 bits in the DP FP representation, which gives product values of 48 bits and 106 bits respectively. In this paper we break the operands into groups and multiply the groups. This gives several product terms, which are added together after shifting each one according to which part of one operand was multiplied by which part of the other. We decompose the significand bits of both operands into four groups and multiply each group of one operand by each group of the other, which gives 16 product terms. We then add all of them together, shifting each term to the left according to which groups of the operands are involved in it.

#### Partition Multiplier

```
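-- Algorithm for partition method: 8x8 multiply from four 4x4 products
-- (entity and architecture names are illustrative)
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity partition_mult is
  port (t1 : in  STD_LOGIC_VECTOR (7 downto 0);
        t2 : in  STD_LOGIC_VECTOR (7 downto 0);
        t3 : out STD_LOGIC_VECTOR (15 downto 0));
end partition_mult;
architecture behavioral of partition_mult is
  signal h1, h2, h3, h4     : unsigned (3 downto 0);   -- 4-bit operand halves
  signal su1, su2, su3, su4 : unsigned (7 downto 0);   -- 4x4 partial products
  signal ad1, ad2, ad3, ad4 : unsigned (15 downto 0);  -- aligned partial products
begin
  h1 <= unsigned(t1(3 downto 0));
  h2 <= unsigned(t1(7 downto 4));
  h3 <= unsigned(t2(3 downto 0));
  h4 <= unsigned(t2(7 downto 4));
  su1 <= h1 * h3;
  su2 <= h1 * h4;
  su3 <= h2 * h3;
  su4 <= h2 * h4;
  -- Shift each product by the weight of the halves it came from
  ad1 <= "00000000" & su1;
  ad2 <= "0000" & su2 & "0000";
  ad3 <= "0000" & su3 & "0000";
  ad4 <= su4 & "00000000";
  t3 <= std_logic_vector(ad1 + ad2 + ad3 + ad4);
end behavioral;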
```
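The same shift-and-add scheme generalizes to any number of equal-width groups. The sketch below is a software model written for illustration; `partition_multiply` is a hypothetical name, and the operand width is assumed to be divisible by the number of groups (how a 53-bit DP FP significand is grouped is not specified here, so that case is left out).

```python
def partition_multiply(a: int, b: int, width: int, groups: int):
    """Multiply two width-bit operands by partitioning each into groups.

    Returns (product, number_of_partial_products).
    """
    if width % groups:
        raise ValueError("width must be divisible by the number of groups")
    g = width // groups                      # bits per group
    mask = (1 << g) - 1
    a_parts = [(a >> (i * g)) & mask for i in range(groups)]
    b_parts = [(b >> (j * g)) & mask for j in range(groups)]

    total, terms = 0, 0
    for i, ap in enumerate(a_parts):
        for j, bp in enumerate(b_parts):
            # Group i of a times group j of b carries weight 2**((i + j) * g)
            total += (ap * bp) << ((i + j) * g)
            terms += 1
    return total, terms

if __name__ == "__main__":
    # Two groups of 4 bits: the 8x8 case of the listing (4 product terms)
    print(partition_multiply(0xAB, 0xCD, 8, 2))
    # Four groups of 6 bits: a 24-bit SP FP significand (16 product terms)
    print(partition_multiply(0xC00000, 0xC00000, 24, 4))
```

With two groups the model yields the four product terms of the 8-bit listing, and with four groups it yields the 16 product terms described for the significand multiplication.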

## 3. SIMULATION RESULT

VHDL is highly flexible owing to its architecture, allowing designers, electronic design automation companies and the semiconductor industry to experiment with new language concepts while keeping design costs low and data interoperable.

Having designed the different arrangements, we now proceed to the software synthesis of these designs using VHDL. In the following sections, we obtain the desired outputs using separate VHDL codes for each design. The codes of the designs are given in the Appendix. The designs were successfully implemented and synthesized with the Xilinx ISE design suite 6.2i.

#### Number of 4-input LUTs

LUT stands for look-up table. A look-up table replaces complex mathematical calculations with stored results and thereby reduces processing time. Look-up tables are used in many complex applications such as signal processing, image processing, device modeling and other digital processing.

#### Number of Slices

If devices are connected in parallel, the arrangement is called an array of devices. Generally, each slice comprises a number of look-up tables. As the number of slices increases, the area increases, so as few slices as possible should be used for better area and speed.

#### Propagation Delay

Ideally, the output of a digital circuit should switch from level 0 to level 1, or from level 1 to level 0, in zero time. In practice, it takes a finite time to switch output levels. The time required to change output levels is called the output switching time, and it is defined separately for switching from level 0 to level 1 and from level 1 to level 0.

The propagation delay of a device is the time interval between the application of an input pulse and the occurrence of the resulting output pulse. Propagation delay is an important characteristic of logic circuits because it limits the speed at which they can operate. The shorter the propagation delay, the higher the speed of the circuit, and vice versa.

#### Number of IOBs

Input/output buffers (IOBs) are related to the fan-in and fan-out of the circuit. The number of gates depends on the number of IOBs, so the number of IOBs should be kept low for a low propagation delay.

The SPFP multiplier is a combinational logic circuit with inputs E1, E2, M1, M2, S1, S2 and outputs E3, M3, S3, depending on the requirement. Figure 7 shows the technology schematic view of the SPFP multiplier.



Figure 7: View Technology Schematic of SPFP Multiplier



Figure 8: RTL View of SPFP Multiplier



Figure 9: Output Waveform of SPFP Multiplier

```
Device utilization summary:
Selected Device : 2vp2fg256-7
 Number of Slices:            121  out of   1408     8%
 Number of 4 input LUTs:      226  out of   2816     8%
 Number of bonded IOBs:        96  out of    140    68%
 Number of MULT18X18s:         16  out of     12   133% (*)
Timing Summary:
Speed Grade: -7
   Minimum period: No path found
   Minimum input arrival time before clock: No path found
   Maximum output required time after clock: No path found
   Maximum combinational path delay: 33.970ns
```

Figure 10: Device Utilization summary of SPFP Multiplier

The DPFP multiplier is a combinational logic circuit with inputs E1, E2, M1, M2, S1, S2 and outputs E3, M3, S3, depending on the requirement. Figure 11 shows the technology schematic view of the DPFP multiplier.



Figure 11: View Technology Schematic of DPFP Multiplier



Figure 12: RTL View of DPFP Multiplier



Figure 13: Output Waveform of DPFP Multiplier

| Device utilization summary (Selected Device: 2vp2fg256-7) | Used | Available | Utilization |
|-----------------------------------------------------------|------|-----------|-------------|
| Number of Slices                                          | 377  | 1408      | 26%         |
| Number of 4 input LUTs                                    | 682  | 2816      | 24%         |
| Number of bonded IOBs                                     | 192  | 140       | 137% (*)    |
| Number of MULT18X18s                                      | 16   | 12        | 133% (*)    |

| Timing summary (Speed Grade: -7)         | Value         |
|------------------------------------------|---------------|
| Minimum period                           | No path found |
| Minimum input arrival time before clock  | No path found |
| Maximum output required time after clock | No path found |
| Maximum combinational path delay         | 70.700 ns     |

Figure 14: Device Utilization summary of Double Precision Floating Point Multiplier

Table I: Comparison Result

| Parameter | Previous SPFP Algorithm | Implemented SPFP Multiplier using Partition Method | Previous DPFP Algorithm | Implemented DPFP Multiplier using Partition Method |
|-----------|-------------------------|----------------------------------------------------|-------------------------|----------------------------------------------------|
| Number of Slice LUTs | 705 | 226 | 5153 | 682 |
| Number of Bonded Input/Output Blocks (IOBs) | 96 | 96 | 192 | 192 |
| Maximum Combinational Path Delay | 44.823 ns | 33.97 ns | 83.169 ns | 70.70 ns |
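The relative improvement implied by the comparison table can be computed directly from its entries. The snippet below is a small illustrative calculation; the dictionary layout and the helper `reduction_percent` are choices made here, while the numbers are the table values.

```python
# (previous, implemented) pairs taken from the comparison table
results = {
    "SPFP": {"luts": (705, 226), "delay_ns": (44.823, 33.97)},
    "DPFP": {"luts": (5153, 682), "delay_ns": (83.169, 70.70)},
}

def reduction_percent(previous: float, implemented: float) -> float:
    """Relative reduction of the implemented design versus the previous one."""
    return 100.0 * (previous - implemented) / previous

if __name__ == "__main__":
    for name, row in results.items():
        lut_gain = reduction_percent(*row["luts"])
        delay_gain = reduction_percent(*row["delay_ns"])
        print(f"{name}: LUTs reduced by {lut_gain:.1f}%, delay reduced by {delay_gain:.1f}%")
```

On these figures the LUT count falls by roughly 68% (SPFP) and 87% (DPFP), and the maximum combinational path delay by roughly 24% and 15%.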

## 4. CONCLUSION

IEEE 754 standardizes two basic formats for representing floating point numbers, namely single precision and double precision floating point. Floating point arithmetic has vast applications in many areas such as robotics and DSP. The delay and the area required by the hardware are the two key factors that need to be considered. Here we present a single precision floating point multiplier using two different adders, namely the modified CSA with dual RCA and the modified CSA with RCA and BEC.

Of the two adders, the modified CSA with RCA and BEC has the lower maximum combinational path delay (MCPD). It also takes fewer slices, i.e. it occupies less area than the other adder.

## REFERENCES

- [1] S. Ross Thompson and James E. Stine, "A Ling-Enhanced Adder for IEEE-compliant Floating-Point Multiplication", IEEE 2020.
- [2] Paldurai.K and Dr.K.Hariharan "FPGA Implementation of Delay Optimized Single Precision Floating point Multiplier", 2015 International Conference on Advanced Computing and Communication Systems (ICACCS- 2015), Jan. 05-07-2015, Coimbatore, INDIA.
- [3] Irine Padma B.T and Suchitra. K, "Pipelined Floating Point Multiplier Based On Vedic Multiplication Technique," International Journal of Innovative Research in Science, Engineering and Technology (IJIRSET), ISSN: 2347-6710, Volume-3, Special Issue -5, July 2014.
- [4] R. Sai Siva Teja and A. Madhusudhan,"FPGA Implementation of Low- Area Floating Point Multiplier Using Vedic Mathematics", International Journal of Emerging Technology and Advanced Engineering (IJETAE), Volume-3, Issue -12, December 2013, pp.362-366.
- [5] Priyanka Koneru, Tinnanti Sreenivasu, and Addanki Purna Ramesh, "Asynchronous Single Precision Floating Point Multiplier Using Verilog HDL," International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE), ISSN:2278-909X, Volume-2, Issue -11, November 2014, pp.885-887.
- [6] I. V. Vaibhav, K. V. Saicharan, B. Sravanthi and D. Srinivasulu, "VHDL Implementation of Floating Point Multiplier using Vedic Mathematics", International Conference on Electrical, Electronics and Communications (ICEEC), ISBN-978-93-81693-66-03, June 2014 pp.110-115
- [7] Ms. Meenu S.Ravi and Mr. Ajit Saraf, "Analysis and study of different multipliers to design floating point MAC units for digital signal processing applications", International Journal of Research in Advent Technology, (IJRAT), ISSN:2321-9637, Volume-2, Issue-3, March 2014, pp.264-267.
- [8] D. Monniaux, "The pitfalls of verifying floating-point computations", ACM Transaction Programming Language System, Vol. 30, No. 3, pp. 1-12, May 2008.
- [9] Soumya Havaldar, K S Gurumurthy, "Design of Vedic IEEE 754 Floating Point Multiplier", IEEE International Conference on Recent Trends in Electronics Information Communication Technology, May 20-21, 2016, India.
- [10] Ragini Parte and Jitendra Jain, "Analysis of Effects of using Exponent Adders in IEEE- 754 Multiplier by VHDL", International Conference on Circuit, Power and Computing Technologies (ICCPCT), 2015 IEEE.
- [11] Ross Thompson and James E. Stine, "An IEEE 754 Double-Precision Floating-Point Multiplier for Denormalized and Normalized Floating-Point Numbers", International conference on IEEE 2015.
- [12] Purna Ramesh Addanki, Venkata Nagaratna Tilak Alapati and Mallikarjuna Prasad Avana, "An FPGA based High Speed IEEE-754 double precision floating point Adder/Subtractor and Multiplier using Verilog", in International Journal of Advance Science and Technology, vol. 52, March 2013.

- [13] Shashank Suresh, Spiridon F. Beldianu and Sotirios G. Ziavras "FPGA and ASIC square root designs for high performance and power efficiency", in 24th IEEE International conference on Application specific-systems, architecture and processors, June 2013.
- [14] M. K. Jaiswal and R. C. C. Cheung, "High Performance FPGA Implementation of Double Precision Floating Point Adder/Subtractor", in International Journal of Hybrid Information Technology, Vol. 4, No. 4, October 2011.