# VLSI Architecture for 1-D and 2-D DWT using Canonic Signed Digit Technique

Shivam Singh<sup>1</sup>, Prof. Satyarth Tiwari<sup>2</sup>

<sup>1</sup>M. Tech. Scholar, Department of Electronics and Communication, Bhabha Engineering Research Institute, Bhopal

<sup>2</sup>Guide, Department of Electronics and Communication, Bhabha Engineering Research Institute, Bhopal

Abstract: Several architectures have been suggested for efficient VLSI implementation of the 2-D DWT for real-time applications. Multipliers are found to consume a large share of the chip area and to increase the complexity of the DWT architecture. A multiplier-less hardware implementation reduces chip area, lowers hardware complexity and increases the computational throughput of the DWT architecture. In this paper, 1-D and 2-D DWT architectures based on the canonic signed digit (CSD) technique are presented for area-delay-efficient realization of the multilevel 2-D DWT. We demonstrate that CSD is a very efficient architecture with adders as the main component, free of ROM, multiplication, and subtraction. The simulation was performed using XILINX 14.5i, and the simulated parameters, i.e. number of slices, look-up tables and delay, were calculated.

Keywords: 1-D DWT, 2-D DWT, CSD, Xilinx Software

## I. INTRODUCTION

The DWT is computationally intensive, and most of its applications demand real-time processing. One way of achieving high-speed performance is to use a fast computational algorithm on a general-purpose computer. Another way is to exploit the parallelism inherent in the computation for concurrent processing by a set of parallel processors. However, it is not cost-effective to use a general-purpose computer for a specific application. In addition, an implementation on a general-purpose computer requires more space, more power and more computation time.
The development of very large scale integration (VLSI) technology enables digital signal processing (DSP) system designers to build high-performance, low-cost and low-power systems on a single chip. VLSI systems offer great potential for concurrency and an enormous amount of computing power within a small area [1, 2]. Computation is very cheap, since hardware is not an obstacle in a VLSI system. Non-localized communication, however, is not only expensive but also demands high power dissipation. Thus, a high degree of parallelism and nearest-neighbor communication are crucial for realizing a high-performance VLSI system [3]. With this in view, high-performance application-specific VLSI systems have evolved rapidly in recent years. Special-purpose VLSI systems maximize processing concurrency through parallel and pipelined processing and provide a cost-effective alternative for real-time applications. Therefore, the 2-D DWT is currently implemented in VLSI systems to meet the timing requirements of real-time applications.

Accordingly, several design schemes have been suggested over the last two decades for efficient implementation of the 2-D DWT in a VLSI system. Researchers have adopted different algorithm formulations, mapping schemes and architectural design methods to reduce the computation time, arithmetic complexity or memory complexity of 2-D DWT structures. However, the area-delay performance of the existing structures changes only marginally. This is mainly due to the memory complexity, which forms a major hardware component of folded 2-D DWT structures [4].

Nowadays, most of the information in computer processing is handled online. This online information is either graphical or pictorial in nature, and its storage and communication requirements are immense [5]. Hence, methods of compressing the data prior to storage and transmission are of significant practical and commercial interest.
Image compression means reducing the amount of redundant data required to represent a digital image. Mathematically, digital image compression can be defined as the transformation of a 2-D pixel array into a statistically uncorrelated data set. The transformation is applied to the image prior to storage and transmission of the digital image data. The compressed image is reconstructed by the process of decompression [6]. The decompressed image can be the original image or an approximation of it. Image compression is the technology for handling the increased spatial resolution of today's imaging sensors and evolving broadcast television standards. It plays an important role in many diverse applications, including tele-video conferencing, remote sensing, document and medical imaging, facsimile transmission, and the control of remotely piloted vehicles in military, space and hazardous waste management applications [7]. The list of applications that rely on the efficient manipulation, storage and transmission of different types of digital images, such as binary, gray-scale and color images, is ever expanding.

## II. DISCRETE WAVELET TRANSFORM

The multiresolution analysis capability and time-frequency localization properties of the DWT have established it as a powerful tool for various applications, such as signal analysis, image compression and numerical analysis, as stated by Mallat. This has led various research groups to develop algorithms and hardware architectures to implement the DWT.

Figure 1: 3-level DWT of an image

In the standard convolution method for the DWT, several finite impulse response (FIR) filters are applied in parallel to compute the high-pass and low-pass filter coefficients. Mallat's pyramid algorithm can be used to represent the wavelet coefficients of an image in several spatial orientations.
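As a minimal sketch of the convolution method described above, the Python snippet below filters a 1-D signal with a low-pass and a high-pass FIR filter and keeps every second output sample, using the $x(2n - i)$ indexing that is customary for decimated filter banks. The function name `analysis_1d`, the zero padding at the signal boundary and the Haar-like average/difference taps are assumptions chosen for illustration only; they are not the filters or the boundary handling of the proposed architecture.

```python
def analysis_1d(x, h, g):
    """One level of a convolution-based 1-D DWT.

    x : list of input samples
    h : low-pass FIR taps
    g : high-pass FIR taps
    Returns (low, high), each with len(x) // 2 samples.
    """
    def filter_and_decimate(taps):
        out = []
        for n in range(len(x) // 2):
            acc = 0.0
            for i, coeff in enumerate(taps):
                k = 2 * n - i              # decimation by 2 is folded into the index
                if 0 <= k < len(x):        # assumption: zero padding outside the signal
                    acc += coeff * x[k]
            out.append(acc)
        return out

    return filter_and_decimate(h), filter_and_decimate(g)


# Illustrative taps (assumption): average and difference of two neighbours.
low, high = analysis_1d([4, 6, 10, 12], [0.5, 0.5], [0.5, -0.5])
print(low)   # [2.0, 8.0]
print(high)  # [2.0, 2.0]
```

Applying the same routine first along the rows and then along the columns of an image gives the separable 2-D decomposition into four subbands.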
The architectures are generally folded and can be broadly classified into serial and parallel architectures, as discussed in [7]. The architecture discussed there implements the filter bank structure efficiently using digit-serial pipelining. This architecture forms the basis for the hardware implementation of subband decomposition using the convolutional DWT for JPEG 2000. A typical scheme in which the DWT decomposes the input image is shown in Fig. 1. Each decomposition level shown in Fig. 2 comprises two stages: stage 1 performs horizontal filtering, and stage 2 performs vertical filtering. In the first-level decomposition, the input image of size N × N is decomposed into four subbands, LL, HH, LH and HL, where L denotes low frequency and H denotes high frequency. Each of the four subbands is of size N/2 × N/2. The LL subband carries more information than the other subbands, because the L component is the average value of the pixels and the H component is the difference value of the pixels. The HH subband carries the least information. The subbands are derived as follows:

$$x_{LL}^{J}(n_1, n_2) = \sum_{i_1=0}^{K-1} \sum_{i_2=0}^{K-1} h(i_1)\,h(i_2)\,x_{LL}^{J-1}(2n_1 - i_1,\ 2n_2 - i_2)$$

$$x_{LH}^{J}(n_1, n_2) = \sum_{i_1=0}^{K-1} \sum_{i_2=0}^{K-1} h(i_1)\,g(i_2)\,x_{LL}^{J-1}(2n_1 - i_1,\ 2n_2 - i_2)$$

$$x_{HL}^{J}(n_1, n_2) = \sum_{i_1=0}^{K-1} \sum_{i_2=0}^{K-1} g(i_1)\,h(i_2)\,x_{LL}^{J-1}(2n_1 - i_1,\ 2n_2 - i_2)$$

$$x_{HH}^{J}(n_1, n_2) = \sum_{i_1=0}^{K-1} \sum_{i_2=0}^{K-1} g(i_1)\,g(i_2)\,x_{LL}^{J-1}(2n_1 - i_1,\ 2n_2 - i_2)$$

where $x_{LL}$ is the 2-D input image, $J$ is the decomposition level, $K$ is the filter length, and $h$ and $g$ are the low-pass and high-pass filter coefficients, respectively.

Analytical and iterative reconstruction algorithms are the two approaches used in computed tomography for the examination of image quality.
An analytical model attempts to find a direct solution for reconstructing the image from the unknown projections. The analytical algorithm is limited when the projections are incomplete and the views are sparse. In iterative reconstruction, the image estimate is progressively updated towards an improved solution. To support iterative image reconstruction, many approaches have been presented in the literature. Among these techniques, the projection-based method is efficient and distortion-free.

## III. PROPOSED METHODOLOGY

In the DWT, the biorthogonal wavelets are implemented using the lifting scheme, which is constructed in the spatial domain. In the lifting scheme, three main steps are generally performed: split, predict and update. The input image samples $x(n)$ are partitioned into odd and even samples in the split block. Filtering is required for the odd and even samples to suppress unwanted signal components. The lifting scheme is performed according to the type of filter used. A scaling step is used to obtain the low-pass subbands of the odd and even samples. In the lifting scheme, the filter implementation is converted into matrix multiplications.

Figure 2: Multiplier-based 1-D DWT with 9/7 coefficients

Image compression is performed efficiently using the lifting scheme, and hardware utilization is significantly reduced by using the filters.

Inner product computation can be expressed by CSD. The DWT formulation using the convolution scheme can be expressed as an inner product, whereas the 1-D DWT formulation given in (1)–(2) cannot. Although the convolution DWT demands more arithmetic resources than the lifting DWT, the convolution DWT is considered here in order to take advantage of the CSD-based design. The CSD formulation of the convolution-based DWT using the 5/3 biorthogonal filter is presented here.
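To make the CSD idea concrete, the following Python sketch converts an integer constant into canonic signed digit form (digits in {-1, 0, +1}, with no two adjacent non-zero digits) and then multiplies by that constant using only shifts, additions and subtractions. The function names `to_csd` and `csd_multiply` are hypothetical, and the sketch assumes non-negative integer constants; it is a software illustration of the number representation, not a model of the proposed hardware.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or +1, and no two adjacent digits are non-zero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value % 4 == 1 -> digit +1, value % 4 == 3 -> digit -1
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply integer x by a CSD-encoded constant using shifts and add/subtract only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


print(to_csd(7))                   # [-1, 0, 0, 1], i.e. 7 = 8 - 1
print(csd_multiply(5, to_csd(7)))  # 35
```

For example, the constant 7 (binary 111, three non-zero bits) becomes 8 - 1 in CSD, so a multiplication by 7 needs one shift and one subtraction instead of two additions. In hardware, a negative digit is commonly realized by an adder operating on the two's complement of the shifted operand.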
According to (1) and (2), the 5/3 wavelet filter computation in convolution form is expressed as

$$Y_L = \sum_{i=0}^{4} h(i) X_n(i)$$

$$Y_H = \sum_{i=0}^{2} g(i) X_n(i)$$

where $\{h(i)\}$ and $\{g(i)\}$ are the low-pass and high-pass filter coefficients of the 5/3 wavelet filter, $Y_L$ is the low-pass filter output and $Y_H$ is the high-pass filter output.

Figure 3: Block Diagram of 5/3 1-D DWT using CSD Technique

where

- B: Buffer
- D: Delay flip-flop
- A<sub>1</sub>: First output of the LUT
- A<sub>2</sub>: Second output of the LUT, with a '0' bit added
- A<sub>n</sub>: N-th output of the LUT, with (N-1) zero bits added

## IV. SIMULATION RESULT

The simulation was performed using XILINX 14.5i and the ModelSim simulator. VHDL descriptions consist of primary design units and secondary design units. The primary design units are the Entity and the Package. The secondary design units are the Architecture and the Package Body. Secondary design units are always related to a primary design unit. Libraries are collections of primary and secondary design units. The primary and secondary designs of the 9/7 1-D DWT are shown in Fig. 4 and Fig. 5.
Figure 4: Primary design of 1-D DWT including 9/7 coefficients

Figure 5: Secondary design of 1-D DWT including 9/7 coefficients

```
Device utilization summary:
Selected Device : 6vcx75tff484-2

Slice Logic Utilization:
 Number of Slice Registers:
 Number of Slice LUTs:                  212 out of 46560    0%
    Number used as Logic:                   out of 46560
    Number used as Memory:
       Number used as SRL:

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:    231
   Number with an unused Flip Flop:
   Number with an unused LUT:            19 out of 231
   Number of fully used LUT-FF pairs:       out of          6%
   Number of unique control sets:

 Number of IOs:
 Number of bonded IOBs:                     out of         12%

Timing Summary:
Speed Grade: -2
   Minimum period: 1.380ns (Maximum Frequency: 724.638MHz)
   Minimum input arrival time before clock: 0.471ns
   Maximum output required time after clock: 11.439ns
   Maximum combinational path delay: 11.127ns
```

Figure 6: Summary of 1-D DWT including 9/7 coefficients

Figure 7: Primary design of 2-D DWT including 9/7 coefficients

Figure 8: Secondary design of 2-D DWT including 9/7 coefficients

```
Device utilization summary:
Selected Device : 6vcx75tff484-2

Slice Logic Utilization:
 Number of Slice Registers:
 Number of Slice LUTs:                  740 out of 46560
    Number used as Logic:                   out of

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:
   Number with an unused Flip Flop:     607 out of 834     72%
   Number with an unused LUT:               out of 834     11%
   Number of fully used LUT-FF pairs:       out of
   Number of unique control sets:

IO Utilization:
 Number of IOs:
 Number of bonded IOBs:                     out of

Timing Summary:
Speed Grade: -2
   Minimum period: 6.916ns (Maximum Frequency: 144.596MHz)
   Minimum input arrival time before clock: 6.847ns
   Maximum output required time after clock: 13.177ns
   Maximum combinational path delay: 13.108ns
```

Figure 9: Summary of 2-D DWT including 9/7 coefficients

## V. CONCLUSION

In this paper, a CSD-based architecture for the computation of the 1-D and 2-D DWT is presented.
The proposed CSD-based 1-D DWT structure uses significantly fewer logic resources than comparable existing multiplier-less designs and has a shorter bit-cycle period. It is concluded that the CSD-based 2-D DWT provides better results than the previous design.

## REFERENCES

- [1] Yuan-Ho Chen, Chih-Wen Lu, Szi-Wen Chen, Ming-Han Tsai, Shinn-Yn Lin, and Rou-Shayn Chen, "VLSI Implementation of QRS Complex Detector Based on Wavelet Decomposition," IEEE Access, 2022.
- [2] Jhilam Jana, Sayan Tripathi, Ritesh Sur Chowdhury, Akash Bhattacharya, and Jaydeb Bhaumik, "An Area Efficient VLSI Architecture for 1-D and 2-D Discrete Wavelet Transform (DWT) and Inverse Discrete Wavelet Transform (IDWT)," Devices for Integrated Circuit, IEEE, 2021.
- [3] W. Yan, Y. Ji, L. Hu, T. Zhou, Y. Zhao, Y. Liu, and Y. Li, "A resource efficient, robust QRS detector using data compression and time-sharing architecture," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2021, pp. 1–5.
- [4] Z. Zhang, Q. Yu, J. Li, X.-Z. Wang, and N. Ning, "A 12-bit dynamic tracking algorithm-based SAR ADC with real-time QRS detection," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 9, pp. 2923–2933, Sep. 2020.
- [5] J. Li, A. Ashraf, B. Cardiff, R. C. Panicker, Y. Lian, and D. John, "Low power optimisations for IoT wearable sensors based on evaluation of nine QRS detection algorithms," IEEE Open J. Circuits Syst., vol. 1, pp. 115–123, 2020.
- [6] B. Mishra, N. Arora, and Y. Vora, "Wearable ECG for real time complex P-QRS-T detection and classification of various arrhythmias," in Proc. 11th Int. Conf. Commun. Syst. Netw. (COMSNETS), Jan. 2019, pp. 870–875.
- [7] G. Da Poian, C. J. Rozell, R. Bernardini, R. Rinaldo, and G. D. Clifford, "Matched filtering for heart rate estimation on compressive sensing ECG measurements," IEEE Trans. Biomed. Eng., vol. 65, no. 6, pp. 1349–1358, Jun. 2018.
- [8] T. Tekeste, H. Saleh, B. Mohammad, and M. Ismail, "Ultra-low power QRS detection and ECG compression architecture for IoT healthcare devices," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 2, pp. 669–679, Feb. 2018.
- [9] C.-L. Chen and C.-T. Chuang, "A QRS detection and R point recognition method for wearable single-lead ECG devices," Sensors, vol. 17, no. 9, p. 1969, Aug. 2017.
- [10] Rakesh Biswas, Siddarth Reddy Malreddy, and Swapna Banerjee, "A High Precision-Low Area Unified Architecture for Lossy and Lossless 3D Multi-Level Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 45, No. 5, pp. 01-11, May 2017.
- [11] Mamatha I, Shikha Tripathi, and Sudarshan TSB, "Pipelined Architecture for Filter Bank based 1-D DWT," International Conference on Signal Processing and Integrated Networks (SPIN), pp. 47-52, May 2016.
- [12] Maurizio Martin, Guido Masera, Massimo Ruo Roch, and Gianluca Piccinini, "Result-Biased Distributed-Arithmetic-Based Filter Architectures for Approximately Computing the DWT," IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 62, No. 8, pp. 2103-2113, August 2015.
- [13] Basant Kumar Mohanty and Pramod Kumar Meher, "Memory-Efficient High-Speed Convolution-based Generic Structure for Multilevel 2-D DWT," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 23, No. 2, pp. 353-363, February 2013.
- [14] Basant K. Mohanty, Anurag Mahajan, and Pramod K. Meher, "Area- and Power-Efficient Architecture for High-Throughput Implementation of Lifting 2-D DWT," IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 59, No. 7, pp. 434-438, July 2012.
- [15] Chengjun Zhang, Chunyan Wang, and M. Omair Ahmad, "A Pipeline VLSI Architecture for High-Speed Computation of the 1-D Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 57, No. 10, pp. 2729-2740, October 2010.