# Area-Delay Efficient LPF and HPF designing in 2-D DWT using CSD Technique and BK Adder

# <sup>1</sup>Gaurav Kumar Mishra and <sup>2</sup>Prof. Amrita Pahadia

<sup>1</sup>M. Tech. Scholar, Department of Electronics and Communication, RKDF College of Engineering, Bhabha University, Bhopal

<sup>2</sup>Guide, Department of Electronics and Communication, RKDF College of Engineering, Bhabha University, Bhopal

#### Abstract—

The discrete wavelet transform (DWT) is a mathematical technique that provides a new method for signal processing; it analyzes both the low and high sub-bands with equal priority at every decomposition level. Owing to useful features such as an adaptive time-frequency window, lower aliasing distortion, and efficient computational complexity, it is widely used in signal and image processing applications, and the 2-D DWT in particular is widely used in image and video compression. The flipping scheme, however, introduces design complexities in some DWT structures. In the proposed work, we therefore implement a Brent-Kung (BK) adder together with the canonical signed digit (CSD) technique, which provides a multiplier-less implementation and works for every bit. The proposed CSD and BK adder based 2-D DWT algorithm performs well compared with the previous algorithm: the proposed architecture reduces the chip area, the computation time, and the maximum combinational path delay.

Keywords: 2-D DWT, CSD, Low-pass Sub-band (LPSB), High-pass Sub-band (HPSB), VHDL Simulation

## I. INTRODUCTION

Advances in technology are making computers more capable day by day, and the use of digital images is growing with them. With this comes the serious issue of storing and transferring the large volume of data that uncompressed images represent. Multimedia data (graphics, audio, and video) requires large storage capacity and transmission bandwidth. Despite rapid growth in mass storage density, processor speed, and the performance of digital communication systems, the demand for data storage capacity and transmission bandwidth continues to exceed the capabilities of available technologies [1].

The Discrete Wavelet Transform (DWT) has found many applications in digital signal processing because of its efficient computation and its properties, which are well suited to non-stationary signal analysis [2]. It has become one of the most widely used techniques for image compression. An image is a kind of redundant data, i.e., it contains the same information when viewed from certain perspectives. Data compression techniques make it possible to remove some of this redundant information. Image compression reduces the size in bytes of a graphics file without degrading the quality of the image to an unacceptable level, and the reduction in file size allows more images to be stored in a given amount of disk or memory space [3]-[5].

In this work, we have implemented the compression algorithm using the Haar wavelet, both in the VHDL programming language and in MATLAB. In both cases we have taken an 8×8 matrix, with the objective of keeping the basic concepts easy to follow. The wavelet transform has shown great promise for data and image compression, and the latest JPEG2000 standard is also based on this transform coding algorithm [6]-[8]. In this paper, only grey-scale images are considered. However, the wavelet transform and compression techniques are equally applicable to color images with three color components: the transform is applied to each color component independently, and the result is treated as an array of vector-valued wavelet coefficients.

## II. DISCRETE WAVELET TRANSFORM

The basic idea of the 2-D architecture is similar to that of the 1-D architecture. A 2-D DWT can be seen as a 1-D wavelet transform along the rows followed by a 1-D wavelet transform along the columns, as illustrated in Figure 1. The 2-D DWT operates in a straightforward manner by inserting an array transposition between the two 1-D DWTs. The rows of the array are processed first, with only one level of decomposition. This essentially divides the array into two vertical halves: the first half stores the average coefficients, while the second half stores the detail coefficients. The process is then repeated along the columns, resulting in four sub-bands (see Figure 1) within the array, defined by the filter outputs.
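The row-then-column procedure can be illustrated with a minimal sketch. It assumes the simple Haar average/difference pair as the 1-D filter (the introduction mentions the Haar wavelet; the proposed architecture itself uses 9/7 coefficients), and the function names `haar_1d`, `transpose`, and `haar_2d_one_level` are illustrative, not taken from the authors' VHDL or MATLAB code.

```python
def haar_1d(row):
    """One-level 1-D Haar step: averages in the first half, details in the second."""
    pairs = [(row[i], row[i + 1]) for i in range(0, len(row), 2)]
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def haar_2d_one_level(image):
    """Transform the rows, transpose, transform again, and transpose back."""
    rows_done = [haar_1d(r) for r in image]
    cols_done = [haar_1d(c) for c in transpose(rows_done)]
    return transpose(cols_done)


# Illustrative 8x8 test matrix (assumed data, not from the paper).
image = [[float(8 * r + c) for c in range(8)] for r in range(8)]
coeffs = haar_2d_one_level(image)

half = len(image) // 2
ll = [row[:half] for row in coeffs[:half]]  # top-left quadrant: approximation
```

The top-left quadrant `ll` holds the approximation sub-band; the other three quadrants hold the detail sub-bands.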



The LL sub-band represents an approximation of the original image; the LL1 sub-band can be considered a 2:1 subsampled (both horizontally and vertically) version of the original image.

The other three sub-bands HL1, LH1, and HH1 contain higher frequency detail information (mostly local discontinuities in the edges of the image). This process is repeated for as many levels of decomposition as are desired. The JPEG2000 standard specifies five levels of decomposition [1], although three are usually considered acceptable in hardware.

## III. PROPOSED ARCHITECTURE

In a digital system, particular attention is given to the performance of the system and cost. Effective performance is usually achieved at a higher cost. However, with a moderate increase in the hardware, better performance can be obtained. In the course of a computation, addition and multiplication are the fundamental operations that are performed frequently. The speed with which these operations are performed has a great impact on the overall performance of the digital system. Since the beginning of the digital computers, many fast algorithms for the basic arithmetic operations have been developed and implemented [12, 13].

There has been continuous research and development towards the newer algorithms. The main reason for the emerging algorithms is the rapid change in the technology used to implement these arithmetic operations. Besides the dependence on the technology used to implement the algorithm, it is the unique feature of the algorithm that affects the performance of the arithmetic operator [14].

In this flow diagram, the binary input is connected to the serial-in serial-out register. All integers are applied in binary form in the DWT design. The parallel input depends on the word length, i.e., a binary input with a word length of (3 down to 0) implies an input range of 0 to 15.



Figure 2: Block Diagram of 9/7 Wavelet Coefficient based Discrete Wavelet Transform

If the LPS coefficients $h_0$, $h_1$, $h_2$, $h_3$, and $h_4$ are multiplied by $u_1$, $u_2$, $u_3$, $u_4$, and $u_5$, the multiplier-less 1-D DWT LPS output is

$$Y_{LPS} = \begin{bmatrix} h_0 & h_1 & h_2 & h_3 & h_4 \end{bmatrix} \bullet \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}$$

Where,

$$u_1 = X(n) + X(n-8)$$

$$u_2 = X(n-1) + X(n-7)$$

$$u_3 = X(n-2) + X(n-6)$$

$$u_4 = X(n-3) + X(n-5)$$

$$u_5 = X(n-4)$$

$$Y_{LPS} = \begin{bmatrix} 77 & 34 & -10 & -2 & 3 \end{bmatrix} \bullet \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}$$
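The folding of the nine input samples into five sums can be checked with a short sketch. It follows the pairing of coefficients and inputs exactly as written above and assumes a list `window` in which `window[k]` holds $X(n-k)$; the function names are illustrative only.

```python
H = [77, 34, -10, -2, 3]  # integer LPS coefficients as given in the text


def folded_inputs(window):
    """Return [u1, ..., u5] for a 9-sample window where window[k] = X(n-k)."""
    if len(window) != 9:
        raise ValueError("expected exactly 9 samples")
    return [
        window[0] + window[8],
        window[1] + window[7],
        window[2] + window[6],
        window[3] + window[5],
        window[4],
    ]


def y_lps_folded(window):
    """Five multiplications after folding the symmetric sample pairs."""
    return sum(h * u for h, u in zip(H, folded_inputs(window)))


def y_lps_direct(window):
    """Reference: the equivalent 9-tap symmetric filter without folding."""
    taps = H + H[3::-1]  # h0, h1, h2, h3, h4, h3, h2, h1, h0
    return sum(t * x for t, x in zip(taps, window))


window = list(range(1, 10))  # illustrative samples, not from the paper
print(folded_inputs(window), y_lps_folded(window), y_lps_direct(window))
```

Folding halves the number of coefficient products, which is why the later bit-level expansion only needs the five values $u_1$ to $u_5$.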

So,

$$Y_{LPS} = \begin{bmatrix} 01001101 & 00100010 & 11110110 & 11111110 & 00000011 \end{bmatrix} \bullet \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}$$
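The binary words above are the 8-bit two's-complement forms of the integer coefficients. A minimal sketch of that conversion follows; the helper names `to_twos_complement` and `from_twos_complement` are illustrative, and the 8-bit width is taken from the word length shown above.

```python
def to_twos_complement(value, bits=8):
    """Return the `bits`-wide two's-complement bit string of a signed integer."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(word):
    """Inverse conversion: bit string back to a signed integer."""
    raw = int(word, 2)
    return raw - (1 << len(word)) if word[0] == "1" else raw


coefficients = [77, 34, -10, -2, 3]
words = [to_twos_complement(c) for c in coefficients]
print(words)
```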

Arranging the bits of all the LPS coefficients row by row, from the least significant bit (top row) to the most significant bit (bottom row), gives:

$$Y_{H} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix} \bullet \begin{bmatrix} u_{1} \\ u_{2} \\ u_{3} \\ u_{4} \\ u_{5} \end{bmatrix}$$

Each row is passed through a look-up table, which replaces the coefficient bits with the corresponding sum of inputs:

$$Y_{H} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix} \bullet \begin{bmatrix} u_{1} \\ u_{2} \\ u_{3} \\ u_{4} \\ u_{5} \end{bmatrix} = \begin{bmatrix} u_{1} + u_{5} \\ u_{2} + u_{3} + u_{4} + u_{5} \\ u_{1} + u_{3} + u_{4} \\ u_{1} + u_{4} \\ u_{3} + u_{4} \\ u_{2} + u_{3} + u_{4} \\ u_{1} + u_{3} + u_{4} \\ u_{3} + u_{4} \end{bmatrix}$$
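The bit-plane matrix can be turned back into the filter output with shifts and additions only. The text does not spell out how the eight row sums are combined, so the sketch below assumes the usual two's-complement weighting: row $b$ carries weight $2^b$, except the most significant row, which carries weight $-2^7$. The function names are illustrative.

```python
COEFFS = [77, 34, -10, -2, 3]
BITS = 8


def bit_planes(coeffs, bits=BITS):
    """Row b holds bit b (LSB first) of each coefficient's two's-complement word."""
    mask = (1 << bits) - 1
    return [[((c & mask) >> b) & 1 for c in coeffs] for b in range(bits)]


def shift_add_output(u, coeffs=COEFFS, bits=BITS):
    """Multiplier-less output: per-row sums of inputs, weighted by powers of two."""
    total = 0
    for b, row in enumerate(bit_planes(coeffs, bits)):
        partial = sum(x for bit, x in zip(row, u) if bit)  # look-up-table entry
        if b == bits - 1:
            total -= partial << b  # sign bit has negative weight (assumed weighting)
        else:
            total += partial << b
    return total


u = [6, 6, 6, 6, 3]  # illustrative folded inputs, not from the paper
direct = sum(c * x for c, x in zip(COEFFS, u))
print(shift_add_output(u), direct)
```

Under this assumption the shift-and-add result matches the direct product $77u_1 + 34u_2 - 10u_3 - 2u_4 + 3u_5$ for any integer inputs.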

## IV. BRENT KUNG ADDER

BK adders are efficient alternatives when performance is more important than the cost of implementation. The implementation cost can, however, be reduced by the regularity of the design. Ling adders are a variation of the BK adders and achieve a significant hardware saving. The associated delays can be reduced by using a simple principle of the group-generated carry signal. The new carry recurrence reduces the logic depth of the carry computation in the BK structures. Many recent works on parallel prefix adders have presented the Ling recurrence as a prefix computation to obtain a high-performance adder. Ling adders in the prefix tree lead to a speed improvement over existing high-performance fast adders.



Figure 3: Block Diagram of Brent Kung Adder
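The prefix structure of a Brent-Kung adder can be modelled at the bit level. The following is a behavioural sketch under stated assumptions (unsigned operands, no carry-in, word length a power of two), not the authors' VHDL design; `brent_kung_add` is an illustrative name.

```python
def brent_kung_add(a, b, width=4):
    """Add two unsigned `width`-bit integers using a Brent-Kung prefix tree.

    Returns the (width + 1)-bit sum. `width` must be a power of two.
    """
    if width < 1 or width & (width - 1):
        raise ValueError("width must be a power of two")

    # Bitwise propagate and generate signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]  # group signals, updated in place by the tree

    def combine(hi, lo):
        """Prefix operator: merge group `lo` into the adjacent higher group `hi`."""
        G[hi] = G[hi] | (P[hi] & G[lo])
        P[hi] = P[hi] & P[lo]

    levels = width.bit_length() - 1
    for d in range(levels):  # up-sweep: build power-of-two sized groups
        step = 1 << (d + 1)
        for i in range(step - 1, width, step):
            combine(i, i - (1 << d))
    for d in range(levels - 2, -1, -1):  # down-sweep: fill in remaining prefixes
        step = 1 << (d + 1)
        for i in range(3 * (1 << d) - 1, width, step):
            combine(i, i - (1 << d))

    carries = [0] + G[:-1]  # carry into bit i is the group generate of bits i-1..0
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total | (G[width - 1] << width)  # append the carry-out bit


print(brent_kung_add(0b0011, 0b0101))
```

For 4-bit operands the result has five bits, which corresponds to the `a[3:0]`, `b[3:0]`, and `s[4:0]` ports used in the simulation section.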

## V. SIMULATION RESULT

The synthesis results of the 2-D DWT using the MDA and BKA techniques are shown in this section.

This section presents the VHDL test benches and a comparison of the 2-D DWT architecture with the existing architecture. The 2-D DWT architecture consists of a shift register (SR), adders of different bit widths, and the wavelet coefficients. Figure 4 shows the VHDL test bench of the SR for the 9/7 wavelet coefficients; x[3:0] is the input and y1 to y8 are the outputs of the SR.



Figure 4: VTS for 4-bit SR

Figure 5 shows the VHDL test bench of the 4-bit BKA; a[3:0] and b[3:0] are the inputs, and s[4:0] and yl[11:0] are the outputs of the BKA.



Figure 5: VTS for 4-bit BK\_adder

Figure 6 shows the VHDL test bench of the 1-D DWT; e[3:0] is the input, and yl[11:0] and yh[11:0] are the outputs of the LPF and HPF of the 1-D DWT.



Figure 6: VTS for 1-D DWT

Figure 7 shows the RTL view of the second-level DWT. It contains all the components of the 2-D DWT: the shift registers, the D flip-flops, and the BK adders. This RTL schematic depends on the view technology.



Figure 7: VTS for 2-D DWT



Figure 8: RTL for 2-D DWT

Figure 9 shows the waveform of the second-level DWT. Here the input is '0011', and the outputs of both filters are obtained: 'yh' is '1111101000000000000' for the high-pass filter output and 'yl' is '000001100000000000' for the low-pass filter output.



Figure 9: VHDL Test-bench in 2-D DWT

Table I: Synthesis Report for 9/7 2-D DWT using MDA and BKA Technique

| Parameter                  | Used           | Available |
|----------------------------|----------------|-----------|
| Selected Device            | 4vfx12sf363-12 |           |
| Number of Slices           | 514            | 5472      |
| Number of Slice Flip Flops | 224            | 10944     |
| 4-input LUTs               | 899            | 10944     |
| Number of IOBs             | 44             | 240       |
| Minimum Period             | 8.741 ns       |           |
| Maximum Frequency          | 114.409 MHz    |           |
| MCPD                       | 17.905 ns      |           |
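The figures in Table I can be cross-checked with simple arithmetic: the maximum frequency is the reciprocal of the minimum period, and each resource count can be expressed as a percentage of what the device offers. The sketch below uses only the values from the table; the variable and function names are illustrative.

```python
# Values copied from Table I.
MIN_PERIOD_NS = 8.741
RESOURCES = {
    "Slices": (514, 5472),
    "Slice Flip Flops": (224, 10944),
    "4-input LUTs": (899, 10944),
    "IOBs": (44, 240),
}


def max_frequency_mhz(period_ns):
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / period_ns


def utilization_percent(resources):
    """Percentage of each available resource that the design uses."""
    return {
        name: 100.0 * used / available
        for name, (used, available) in resources.items()
    }


freq = max_frequency_mhz(MIN_PERIOD_NS)
usage = utilization_percent(RESOURCES)
for name, pct in usage.items():
    print(f"{name}: {pct:.2f}% used")
print(f"Maximum frequency: {freq:.3f} MHz")
```

The computed frequency agrees with the reported 114.409 MHz to within the rounding of the period to three decimal places.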

## VI. CONCLUSION

The MDA technique is used in this work because it provides an approach to multiplier-less implementation: the design contains adders and shift registers and is free of multipliers.

Finally, we have designed the 1-D and 2-D DWT using the BK adder and the MDA technique, which provide better efficiency and show better results than the previous design.

DWT is an important technique in multimedia applications. It is not only a key algorithm of signal processing but has also led to revolutions in image and video coding algorithms. Many DWT architectures exist for signal transforms, including flipping, folded, and pipelined types, and each structure has its own advantages and disadvantages. An efficient architecture design for the DWT in JPEG 2000 therefore remains an important area of research to explore.

## REFERENCES

- [1] Jhilam Jana, Sayan Tripathi, Ritesh Sur Chowdhury, Akash Bhattacharya and Jaydeb Bhaumik, "An Area Efficient VLSI Architecture for 1-D and 2-D Discrete Wavelet Transform (DWT) and Inverse Discrete Wavelet Transform (IDWT)", Devices for Integrated Circuit, IEEE 2021.
- [2] Zhang, W., Wu, C., Zhang, P. and Liu, Y., "An Internal Folded Hardware-Efficient Architecture for Lifting-Based Multi-Level 2-D 9/7 DWT", 2019, Applied Sciences, 9(21), p.4635.
- [3] Samit Kumar Dubey, Arvind Kumar Kourav and Shilpi Sharma, "High Speed 2-D Discrete Wavelet Transform using Distributed Arithmetic and Kogge Stone Adder Technique", International Conference on Communication and Signal Processing, April 6-8, 2017, India.
- [4] Rakesh Biswas, Siddarth Reddy Malreddy and Swapna Banerjee, "A High Precision-Low Area Unified Architecture for Lossy and Lossless 3D Multi-Level Discrete Wavelet Transform", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 45, No. 5, pp. 01-11, May 2017.
- [5] Mamatha I, Shikha Tripathi and Sudarshan TSB, "Pipelined Architecture for Filter Bank based 1-D DWT", International Conference on Signal Processing and Integrated Networks (SPIN), pp. 47-52, May 2016.
- [6] Maurizio Martina, Guido Masera, Massimo Ruo Roch and Gianluca Piccinini, "Result-Biased Distributed-Arithmetic-Based Filter Architectures for Approximately Computing the DWT", IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 62, No. 8, pp. 2103-2113, August 2015.
- [7] Basant Kumar Mohanty, Pramod Kumar Meher, "Memory-Efficient High-Speed Convolution-based Generic Structure for Multilevel 2-D DWT", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 23, No. 2, pp. 353-363, February 2013.
- [8] Basant K. Mohanty, Anurag Mahajan, Pramod K. Meher, "Area- and Power-Efficient Architecture for High-Throughput Implementation of Lifting 2-D DWT", IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 59, No. 7, pp. 434-438, July 2012.
- [9] Chengjun Zhang, Chunyan Wang, M. Omair Ahmad, "A Pipeline VLSI Architecture for High-Speed Computation of the 1-D Discrete Wavelet Transform", IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 57, No. 10, pp. 2729-2740, October 2010.
- [10] Chengjun Zhang, Chunyan Wang, and M. Omair Ahmad, "A Pipeline VLSI Architecture for High-Speed Computation of the 1-D Discrete Wavelet Transform", IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 57, No. 10, pp. 2729-2740, October 2010.
- [11] S. M. M. Rahman, M. O. Ahmad, and M. N. S. Swamy, "A New Statistical Detector for DWT-Based Additive Image Watermarking using the Gauss-Hermite Expansion", IEEE Transactions on Image Processing, Vol. 18, No. 8, pp. 1782-1796, August 2009.
- [12] P. K. Meher, B. K. Mohanty and J. C. Patra, "Hardware-Efficient Systolic-Like Modular Design for Two-Dimensional Discrete Wavelet Transform", IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 55, No. 2, pp. 1021-1029, February 2008.
- [13] Chao Cheng, Keshab K. Parhi, "High-Speed VLSI Implementation of 2-D Discrete Wavelet Transform", IEEE Transactions on Signal Processing, Vol. 56, No. 1, pp. 393-403, January 2008.
- [14] C. C. Cheng, C.-T. Huang, C.-Y. Cheng, C.-J. Lian and L.-G. Chen, "On-chip Memory Optimization Scheme for VLSI Implementation of Line-Based 2-D Discrete Wavelet Transform", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 7, pp. 814-822, July 2007.
- [15] M. Martina and G. Masera, "Multiplier-less, Folded 9/7-5/3 Wavelet VLSI Architecture", IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 54, No. 9, pp. 770-774, September 2007.