# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (An ISO 3297: 2007 Certified Organization) Vol. 4, Issue 4, April 2015 # Highly Optimized APC-OMS Based Recursive Least Square (RLS) Adaptive Filter with Fast Convergence Rate Sivasakthi S <sup>1</sup>, Kalaimani K <sup>2</sup> Associate Professor, Dept. of EEE, Krishnasamy College of Engineering and Technology, Cuddalore, Tamilnadu, India<sup>1</sup> PG Student [EST], Dept. of EEE, Krishnasamy college of Engineering and Technology, Cuddalore, Tamilnadu, India <sup>2</sup> **ABSTRACT**: The Recursive Least Square (RLS) adaptive filter is the most popular and widely used adaptive filter not only because of its simplicity but also because of its satisfactory convergence performance. Memory based structures are used in many kind of Digital Signal Processing (DSP) application. Memory-based structures are better performance in area minimization compare with multiply-accumulate structures and have many other advantages such as reduced latency. The Anti-Symmetric Product Coding (APC) and Odd-Multiple-Storage (OMS) techniques were used for Look-Up-Table (LUT) in adaptive FIR filter. Memory-based structure such as APC and OMS techniques are used for efficient low power design using LUT partitioning. Hence, the combination of these two techniques provides reduction in LUT size to one-fourth in adaptive FIR filter when compared with the conventional Look -Up -Table (LUT) of adaptive FIR filter. **KEYWORDS**: Anti-symmetric product coding, Odd multiple storage, Look-up-table, Recursive least square ,Adaptive FIR filter. ### I. INTRODUCTION A conventional look-up-table (LUT)-based multiplier is shown in Table.1, where A is a fixed coefficient, and X is an input word to be multiplied with A. Assuming X to be a positive binary number of word length L, there can be 2L possible values of X, and accordingly, there can be 2L possible values of product $C = A \cdot X$ . Therefore, for memory-based multiplication, an LUT of 2L words, consisting of pre computed product values corresponding to all possible values of X, is conventionally used. The product word $A \cdot Xi$ is stored at the location Xi for $0 \le Xi \le 2L - 1$ , such that if an L-bit binary value of Xi is used as the address for the LUT, then the corresponding product value $A \cdot Xi$ is available as its output. #### A. LUT's in FPGAs Three ways of using RAM blocks as multipliers are available. They are basic single look-up table multiplier, partial product multiplier, and RAM based constant coefficient multiplier. For the Accelerator devices, the single look-uptable approach can create a very fast, but narrow, 4-bit multiplier. ### B. Basic Look-Up Table (LUT)-Based Multipliers A basic LUT-based multiplier is simply a look-up table with the addresses arranged so that part of the address is the multiplicand and the other part is the multiplier. The data width should be set to the sum of the address width to accommodate the product. # International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (An ISO 3297: 2007 Certified Organization) ## Vol. 4, Issue 4, April 2015 Fig.1 Storage and Demonstrates A 4×4 Bit Multiplier Since the memory block in the accelerator is synchronous, this configuration will result in a synchronous multiplier. The multiplier's clock frequency is only limited by the data access time of the memory. This approach is more efficient than implementing multipliers in gates, but it can consume a large amount of memory. The amount of memory required increases with the square of the bit width. ### C. PARTIAL PRODUCT MULTIPLIERS One way to mitigate the amount of memory required is to use partial product multiplication. This technique combines the look-up table approach with elements of longhand multiplication. For example, to multiply $24 \times 43 = 1032$ using longhand, we simplify the problem into the sum of 4 multiplication functions and three addition functions $(4\times3 + ((2\times3)\times10)) + ((4\times4) + ((2\times4)\times10)\times10) = 1032$ . Fig. 2 Constant Coefficient Multiplier #### D. CONSTANT COEFFICIENT MULTIPLIER A third approach to using memory blocks as multipliers is employing a constant coefficient multiplier. In many cases, especially in Digital Signal Processing (DSP) applications, the multiplicand remains constant and only the multiplier varies. Although can create a constant coefficient multiplier using pure logic, implementing this function in RAM creates a much faster multiplier and uses very few logic gates. Fig.3 Block of 256×16 Bit Words # International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (An ISO 3297: 2007 Certified Organization) ### Vol. 4, Issue 4, April 2015 This type of multiplier scales linearly with the width of the values being multiplied. While a basic look-up table 8×8 multiplier uses one block of 65536×16 bit words, 128 Kbytes of storage, and the partial product look-up table multiplier uses four blocks of 256×8 bit words, 1 Kbyte, the constant coefficient multiplier requires only one block of 256×16 bit words, 0.5 Kbyte, and does not incur the cost of the additional logic and delay incurred by using the partial product multiplier. #### II. DESIGN METHODOLOGY #### A. APC FOR LUT OPTIMIZATION In addition, the sum of product values corresponding to these two input values on the same row is 32A. Let the product values on the second and fourth columns of a row be u and v, respectively. Since one can write u = [(u + v)/2 - (v - u)/2] and v = [(u + v)/2 + (v - u)/2], for (u + v) = 32A, we can have $$u = 16A - \left\lceil \frac{v-u}{2} \right\rceil \quad v = 16A + \left\lceil \frac{v-u}{2} \right\rceil.$$ $\begin{array}{c} x_0 \longrightarrow \\ x_1 \longrightarrow \\ x_2 \longrightarrow \\ x_2 \longrightarrow \\ x_3 \longrightarrow \\ CONTROL \\ X_3 \longrightarrow \\ CIRCUIT \\ X_4 \longrightarrow \\ CIRCUIT \\ X_4 \longrightarrow \\ CIRCUIT \\ X_5 \longrightarrow \\ CONTROL \\ X_7 \longrightarrow \\ CIRCUIT \\ X_8 \longrightarrow \\ CONTROL \\ X_8 \longrightarrow \\ CIRCUIT \\ X_9 CIRCUIT \\ X_9 \longrightarrow \\ CIRCUIT \\ CIRCUIT \\ X_9 \longrightarrow \\ CIRCUIT CIR$ Fig.4 Proposed APC-OMS combined LUT design Product word = $16A + (\text{sign value}) \times (\text{APC word})$ where sign value = 1 for x4 = 1 and sign value = -1 for x4 = 0. The product value for X = (10000) corresponds to APC value "zero," which could be derived by resetting the LUT output, instead of storing that in the LUT. Fig.5 LUT-based multiplier for L = 5 using the APC technique #### III. MODIFIED OMS FOR LUT OPTIMIZATION for the multiplication of any binary word X of size L, with a fixed coefficient A, instead of storing all the 2L possible values of C = A · X, only (2L/2) words corresponding to the odd multiples of A may be stored in the LUT, while all the even multiples of A could be derived by left-shift operations of one of those odd multiples. ### A. APC words for different input value Based on the above assumptions, the LUT for the multiplication of an *L*-bit input with a *W*-bit coefficient could be designed by the following strategy. 2387 # International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (An ISO 3297: 2007 Certified Organization) ## Vol. 4, Issue 4, April 2015 | Input, X | product<br>values | Input, X | product<br>values | address $x_3'x_2'x_1'x_0'$ | APC<br>words | |-----------|-------------------|----------|-------------------|----------------------------|--------------| | 00001 | A | 11111 | 31A | 1 1 1 1 | 15A | | 00010 | 2A | 11110 | 30A | 1 1 1 0 | 14A | | 00011 | 3A | 11101 | 29A | 1 1 0 1 | 13A | | 00100 | 4A | 11100 | 28A | 1 1 0 0 | 12A | | 00101 | 5A | 11011 | 27A | 1 0 1 1 | 11A | | 00110 | 6A | 11010 | 26A | 1 0 1 0 | 10A | | 00111 | 7A | 11001 | 25A | 1 0 0 1 | 9A | | 01000 | 8A | 11000 | 24A | 1 0 0 0 | 8A | | 01001 | 9A | 10111 | 23A | 0 1 1 1 | 7A | | 01010 | 10A | 10110 | 22A | 0 1 1 0 | 6A | | 01011 | 11A | 10101 | 21A | 0 1 0 1 | 5A | | 01100 | 12A | 10100 | 20A | 0 1 0 0 | 4A | | 0 1 1 0 1 | 13A | 10011 | 19A | 0 0 1 1 | 3A | | 0 1 1 1 0 | 14A | 10010 | 18A | 0 0 1 0 | 2A | | 01111 | 15A | 10001 | 17A | 0 0 0 1 | A | | 10000 | 16A | 10000 | 16A | 0 0 0 0 | 0 | Table.1 APC words for different input values - 1)A memory unit of [(2L/2) + 1] words of (W + L)-bit width is used to store the product values, where the first (2L/2) words are odd multiples of A, and the last word is zero. - 2) A barrel shifter for producing a maximum of (L-1) left shifts is used to derive all the even multiples of A. - 3) The L-bit input word is mapped to the (L-1)-bit address of the LUT by an address encoder, and control bits for the barrel shifter are derived by a control circuit. #### D. OMS-based design of the LUT of APC words | input $X'$ $x'_3x'_2x'_1x'_0$ | product<br>value | # of<br>shifts | shifted input, X'' | stored APC<br>word | address $d_3d_2d_1d_0$ | |-------------------------------|------------------|----------------|--------------------|--------------------|------------------------| | 0 0 0 1 | A | 0 | | P0 = A | 0000 | | 0 0 1 0 | $2 \times A$ | 1 | 0001 | | | | 0 1 0 0 | $4 \times A$ | 2 | 0001 | | | | 1 0 0 0 | $8 \times A$ | 3 | | | | | 0 0 1 1 | 3A | 0 | | P1 = 3A | 0 0 0 1 | | 0 1 1 0 | $2 \times 3A$ | 1 | 0011 | | | | 1 1 0 0 | $4 \times 3A$ | 2 | | | | | 0 1 0 1 | 5A | 0 | 0101 | P2 = 5A | 0 0 1 0 | | 1 0 1 0 | $2 \times 5A$ | 1 | 0101 | | | | 0 1 1 1 | 7A | 0 | 0111 | P3 = 7A | 0 0 1 1 | | 1 1 1 0 | $2 \times 7A$ | 1 | 0111 | | | | 1 0 0 1 | 9A | 0 | 1001 | P4 = 9A | 0100 | | 1 0 1 1 | 11A | 0 | 1011 | P5 = 11A | 0101 | | 1 1 0 1 | 13A | 0 | 1101 | P6 = 13A | 0110 | | 1 1 1 1 | 15A | 0 | 1111 | P7 = 15A | 0111 | Table .2 OMS-based design of the LUT of APC words The word to be stored for X = (00000) is not 0 but 16A, which we can obtain from A by four left shifts using a barrel shifter. However, if 16A is not derived from A, only a maximum of three left shifts is required to obtain all other even multiples of A. A maximum of three bit shifts can be implemented by a two-stage logarithmic barrel shifter, but the implementation of four shifts requires a three-stage barrel shifter. For X = (10000), the APC word "0" is derived by resetting the LUT output, by an active-high RESET signal given by RESET = (x0 + x1 + x2 + x3) · x4. 2388 # International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (An ISO 3297: 2007 Certified Organization) Vol. 4, Issue 4, April 2015 #### IV. PROPOSED LUT OPTIMIZATION SCHEME #### A. LUT Multiplier Using APC The structure and function of the LUT-based multiplier for L=5 using the APC technique. It consists of a four-input LUT of 16 words to store the APC values of product words as given in the sixth column of Table.1, except on the last row, where 2A is stored for input X=(00000) instead of storing a "0" for input X=(10000). Besides, it consists of an address-mapping circuit and an add/subtract circuit. #### **B. LUT USING MODIFIED OMS** The proposed APC–OMS combined design of the LUT for L=5 and for any coefficient width W. It consists of an LUT of nine words of (W+4)-bit width, a four-to-nine-line address decoder, a barrel shifter, an address generation circuit, and a control circuit for generating the RESET signal and control word (s1s0) for the barrel shifter while 2A is stored for input X=(00000) at LUT address "1000," as specified in Table 3.2. The decoder takes the 4-bit address from the address generator and generates nine word-select signals, i.e., $\{wi, \text{ for } 0 \le i \le 8\}$ , to select the referenced word from the LUT. Fig .6 (a) sign modification LUT output. (b) Address-generation circuit. The control bits s0 and s1 to be used by the barrel shifter to produce the desired number of shifts of the LUT output are generated by the control circuit, according to the relations $$s_0 = \overline{x_0 + \overline{(x_1 + \overline{x_2})}}$$ $s_1 = \overline{(x_0 + x_1)}$ . Note that (s1s0) is a 2-bit binary equivalent of the required number of shifts specified in Tables 3.1 and 3.2. ### C. BLOCK DIAGRAM OF ADAPTIVE FIR FILTER Fig.7 Block Diagram for Adaptive Filter Adaptive filters differ from other filters such as FIR and IIR in the sense that - The coefficients are not determined by a set of desired specifications - The coefficients are not fixed. - With adaptive filters the specifications are not known and change with time. # International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (An ISO 3297: 2007 Certified Organization) ### Vol. 4, Issue 4, April 2015 • Applications include: process control, medical instrumentation, speech processing, echo and noise calculation and channel equalization. #### V. RESULTS AND DISCUSSION #### **MODULES:** A. Performance of RLSB. Power analyser C. Synthesis report #### A. PERFORMANCE OF RLS Fig. 8 Simulated output In the fig.8, shows the Convergence performance of system identification with RLS. #### **B.POWER ANALYZER** Fig. 9 Flow summary report In the fig.9, show the power analyser with the help of QUARATUS-II software. #### **POWER ANALYZER** | OVERCOME | VALUES | | |-----------------|---------|--| | Existing System | 60.52Mw | | | Proposed System | 34.08mW | | Table.1 power analyzer # International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (An ISO 3297: 2007 Certified Organization) Vol. 4, Issue 4, April 2015 In this tabulation.1, show the different between existing system and proposed system. #### D. SYNTHESIS REPORT Fig.10 RTL Schematic report for efficient FIR filter In this fig.10 show the area minimization with the help of APC-OMS techniques. #### VI. CONCLUSION In this paper simulation is carried out by using the APC-OMS based multiplier to reduce total logical area consumption. The Anti-Symmetric Product Coding (APC) and Odd-Multiple-Storage (OMS) techniques were used for Look-Up-Table (LUT) FIR filter. Hence, the combination of these two techniques provides reduction in LUT size to one-fourth in adaptive FIR filter, it improving the convergence performance of system identification by using Recursive Least Square approach, synthesis the performance of power and timing analyzer. #### REFERENCES - 1. J.-I. Guo, C.-M. Liu, and C.-W. Jen, "The efficient memory-based VLSI array design for DFT and DCT," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 39, no. 10, pp. 723–733, Oct. 1992 - 2. H.-R. Lee, C.-W. Jen, "On the design automation of the memory-based VLSI architectures for FIR filters," *IEEE Trans. Consum. Electron.*, vol. 39, no. 3, Aug. 1993 - 3. D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "A systolic array architecture for the discrete sine transform," *IEEE Trans. Signal Process.*, vol. 50, no. 9, pp. 2347–2354, Sep. 2002. - 4. H.-C. Chen, J.-I. Guo, T.-S. Chang, and C.-W. Jen, "A memory-efficient realization of cyclic convolution and its application to discrete cosine transform," *IEEE Trans. Circuits Syst. Video Technol.*, vol. 15, no. 3, pp. 445–453, Mar. 2005 - D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, "Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 52, no. 6, pp. 1125–1137, Jun. 2005 - 6. P. K. Meher, "Systolic designs for DCT using a low-complexity concurrent convolutional formulation," *IEEE Trans. Circuits Syst. Video Technol.*, vol. 16, no. 9, pp. 1041–1050, Sep. 2006. - 7. P. K. Meher, "Memory-based hardware for resource-constrained digital signal processing systems," in *Proc. 6th Int. Conf. ICICS*, Dec. 2007, pp. 1–4. #### **BIOGRAPHY** Sivasakthi.S has received his B.E. (EEE) in the year 1999 from Velammal Engineering College, Chennai, Madras University, Tamil Nadu, And he received M.E. (Power System Engineering) in the year 2008 from Annamalai University, Chidhambaram, Tamil Nadu. At present he is pursuing HOD/EEE in Krishnasamy College of Engineering and Technology, Cuddalore, Tamil Nadu, affliated to Anna University, Chennai. 2390 # International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (An ISO 3297: 2007 Certified Organization) Vol. 4, Issue 4, April 2015 Kalaimani.K has received her B.E. (ECE) in the year 2012 from Vivekanandha Institute of Engineering and Technology for Women, Tiruchengode, Tamil Nadu, affliated to Anna University, Chennai. At present she is pursuing M.E. (Embedded System Technologies) in Krishnasamy College of Engineering and Technology, Cuddalore, Tamil Nadu, affliated to Anna University, Chennai. Her research interest lies in the areas of VLSI.