

> (An ISO 3297: 2007 Certified Organization) Website: <u>www.ijareeie.com</u> Vol. 6, Issue 4, April 2017

# Design of Efficient Multiplier to Mitigate Performance Degradation Caused by Aging Effects

Swetha H<sup>1</sup>, Dr.S.Rajkumar<sup>2</sup>

M.Tech Student (VLSI Design), Dept. of ECE, NCERC, Thrissur, Kerala, India<sup>1</sup>

Head of Dept. of ECE, NCERC, Thrissur, Kerala, India<sup>2</sup>

**ABSTRACT:** VLSI circuits must deliver high-speed computation and high performance with low power and small area. Aging effects degrade multiplier speed, and in the long term the system may fail due to timing violations. This paper introduces a high-performance multiplier design that accounts for the aging effect by using an aging-aware block. The proposed design is implemented using a multi-precision column-bypassing multiplier. The experimental results show that the proposed architecture with multi-precision column-bypassing multipliers achieves a performance improvement compared with the variable-latency column-bypassing multiplier and the fixed-latency column-bypassing multiplier.

**KEYWORDS:** Aging Aware block, Aging effects, Multi precision, Dynamic voltage and frequency scaling, Parallel processing, Variable latency, Timing violation, High Performance, Column bypassing multiplier.

## I. INTRODUCTION

Digital multipliers are among the most critical arithmetic functional units in many applications, such as the Fourier transform, discrete cosine transforms, and digital filtering. The throughput of these applications depends on multipliers, and if the multipliers are too slow, the performance of the entire circuit is reduced, because the multiplier is generally the slowest element in the system. Hence, the multi-precision variable-latency design is proposed to reduce the power consumption, area, and timing waste of traditional circuits.

The bit width of a multiplier equals the bit width of the largest operand of the application that the processor executes. Most of the time, however, the operands do not occupy the maximum width, so resources are used unnecessarily, which results in power loss. Combining a multi-precision (MP) multiplier with dynamic voltage scaling (DVS) can provide a dramatic reduction in power consumption by adjusting the voltage according to the circuit's run-time workload rather than fixing it to cater for the worst-case situation.

The variable-latency technique divides the circuit into two parts: shorter paths and longer paths. Shorter paths can execute correctly in one cycle, whereas longer paths need two cycles. When shorter paths are activated frequently, the average latency of variable-latency designs is better than that of traditional designs. Here, latency is the delay from an input to the system to the desired outcome.

It is also well known that multipliers consume most of the power in DSP computations. Hence, low-power column-bypassing multipliers have been proposed to reduce both delay and power consumption. The delay and power reduction depends on the input bit coefficients: if an input bit coefficient is zero, the corresponding row or column of adders need not be activated. Furthermore, negative bias temperature instability (NBTI) occurs when a pMOS transistor is under negative bias ($V_{gs} = -V_{dd}$).

The corresponding effect on an nMOS transistor is positive bias temperature instability (PBTI), which occurs when an nMOS transistor is under positive bias. Compared with the NBTI effect, the PBTI effect is much smaller on oxide/poly-gate transistors and is therefore usually ignored. To mitigate the aging effects occurring in the circuit, Adaptive Hold Logic (AHL) is used here.




#### **II. LITERATURE SURVEY**

Nowadays, circuits and systems are growing in speed and complexity, and a successful IC design has become a complex optimization problem with several considerations such as silicon area, speed, testability, design effort, and power dissipation. The traditional design methodology assumes that the electrical and physical properties of transistors are deterministic and hence predictable over the lifetime of the device. With continuous silicon technology scaling into the nanometer range for better circuit operation speed, integration density, and power consumption, transistor properties are no longer deterministic. This is caused by the temporal reliability degradation due to bias temperature instability.

## A. BIAS TEMPERATURE INSTABILITY

Bias Temperature Instability (BTI) has gained a lot of attention due to its increasingly adverse impact in nanometer CMOS technologies. It is a shift in the threshold voltage ($V_{th}$) that appears after a bias voltage has been applied to a MOS gate at elevated temperature, and it increases the $V_{th}$ of MOS transistors. The $V_{th}$ increase that occurs in a pMOS transistor under negative gate stress is referred to as Negative Bias Temperature Instability (NBTI), and the one that occurs in an nMOS transistor under positive gate stress is known as Positive Bias Temperature Instability (PBTI). The impact of NBTI or PBTI can become more significant depending on the dielectric type. For a MOS transistor, there are two BTI phases:

- a. Stress phase.
- b. Relaxation phase.

These two phases differ by the gate biasing (i.e.  $V_{DD}$  or  $-V_{DD}$ ) of the MOS transistors.



Figure 1: Bias Temperature Instability Phases

## **B.TWIN PRECISION**

The technique used is twin precision. Precision is defined as the number of digits used to represent a number. Twin precision means using two different precisions within a single multiplication: the bit width of each operand is divided into two equal halves. For example, an 8-bit operand is split by the twin-precision technique into two 4-bit operands. This technique provides the same power reduction as operand guarding, and the area overhead is also reduced, but it incurs a small delay penalty.






Figure 2: Twin precision 8x8 Multiplier
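The operand split described above can be illustrated with a small bit-level model. This is a minimal sketch, assuming that an 8x8 partial-product array is reused for two independent 4x4 multiplications by forcing the cross partial products (one bit from the low half, one from the high half) to zero. The function names `twin_precision_multiply` and `pack` are hypothetical and are not taken from the paper.

```python
def twin_precision_multiply(a: int, b: int, n: int = 8) -> int:
    """Model an n x n partial-product array that computes two (n/2)-bit products."""
    half = n // 2
    result = 0
    for i in range(n):              # bit index into operand a
        for j in range(n):          # bit index into operand b
            # Keep a partial product only when both bits come from the same half;
            # cross terms are forced to zero (assumed twin-precision behavior).
            same_half = (i < half) == (j < half)
            if same_half and (a >> i) & 1 and (b >> j) & 1:
                result += 1 << (i + j)
    return result


def pack(high: int, low: int, half: int = 4) -> int:
    """Place two half-width operands side by side in one full-width word."""
    return (high << half) | low


a = pack(0b1011, 0b0110)            # high half 11, low half 6
b = pack(0b0101, 0b1111)            # high half 5,  low half 15
product = twin_precision_multiply(a, b)
low_product = product & 0xFF        # product of the two low halves
high_product = product >> 8         # product of the two high halves
```

Because a 4x4 product never exceeds 225, the low result fits in the lower 8 output bits and does not disturb the high result in this model.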

#### **III. METHODOLOGY**

#### A. COLUMN BYPASS MULTIPLIER

Bypassing multipliers are a modification of normal array multipliers. When the input data contain many zeros, the bypassing method can reduce dynamic power consumption. In the column-bypassing multiplier, the path delay of an operation is strongly tied to the number of zeros in the multiplicand. Traditional filter designs using bypassing multipliers do not consider the variable-latency technique.

Moreover, a bypassing multiplier using variable latency that considers the aging effect and can adjust dynamically has not yet been developed.



Figure 3: Column Bypass Multiplier

A column-bypassing multiplier is an improvement on the normal array multiplier (AM). In the low-power column-bypassing multiplier design, the full adder (FA) operations are disabled if the corresponding bit in the multiplicand is 0. Supposing the inputs are $1010_2 \times 1111_2$, it can be seen that for the FAs in the first and third diagonals, two of the three input bits are 0: the carry bit from the upper-right FA and the partial product $a_i b_i$. Therefore, the carry output of the adders in both diagonals is 0, and the output sum bit is simply equal to the third input bit, which is the sum output of the upper FA. Hence, the FA is modified by adding two tri-state gates and one multiplexer. The multiplicand bit $a_i$ can be used as the selector of the multiplexer to decide the output of the FA, and $a_i$ can also be used as the selector of the tri-state gates to turn off the input path of the FA. If $a_i$ is 0, the inputs of the FA are disabled, and the sum bit of the current FA is equal to the sum bit from its upper FA, thus reducing the power consumption of the multiplier. If $a_i$ is 1, the normal sum result is selected.
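The bypass rule can be sketched behaviorally. The following is not a gate-level model of the modified FA; it only shows that skipping the adder column of every zero multiplicand bit leaves the product unchanged. The function name `column_bypass_multiply` and the assumption that each multiplicand bit gates `n - 1` adder cells are illustrative choices, not details given in the text.

```python
def column_bypass_multiply(a: int, b: int, n: int = 4):
    """Return (product, bypassed_cells) for an unsigned n x n multiplication.

    a is the multiplicand whose bits a_i select bypassing; b is the multiplicator.
    """
    product = 0
    bypassed_cells = 0
    for i in range(n):
        a_i = (a >> i) & 1
        if a_i == 0:
            # Bypass: the column contributes nothing, so the running sum
            # passes through unchanged and its adder cells stay idle.
            bypassed_cells += n - 1   # assumed number of adder cells per column
            continue
        product += b << i             # active column adds the shifted multiplicator
    return product, bypassed_cells


# The example from the text: 1010 (multiplicand) times 1111 (multiplicator).
product, bypassed = column_bypass_multiply(0b1010, 0b1111)
```

For the inputs used in the text, bits $a_0$ and $a_2$ are zero, so two columns are bypassed while the result is still $10 \times 15$.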




## B. MULTI PRECISION MULTIPLIER

Increased demand for portable yet high-performance multimedia and communication products imposes strict constraints on the power consumption of individual components. Among these components, multipliers perform one of the most frequently encountered arithmetic operations in DSPs. Since multipliers are the slowest element in the system, the performance of a system depends on its multipliers. High-precision multipliers also consume a large amount of area in DSP kits. Therefore, it is important to optimize the speed and performance of a multiplier.

The MP multiplier system comprises four modules:

- The MP multiplier.
- The frequency scaling unit (FSU), implemented using a voltage-controlled oscillator (VCO). Its function is to generate the required operating frequency of the multiplier.
- The voltage scaling unit (VSU), implemented using a voltage dithering technique to limit silicon area overhead. Its function is to dynamically generate the supply voltage so as to minimize power consumption.
- The dynamic voltage/frequency management unit (VFMU), which receives the user requirements (throughput). The VFMU sends control signals to the VSU and FSU to generate the required power supply voltage and clock frequency for the MP multiplier.

The proposed multiplier combines not only MP and DVS but also parallel processing (PP).

## **PROPOSED ARCHITECTURE**

## A. AGING-AWARE VARIABLE LATENCY MULTIPLIER DESIGN

The proposed aging-aware multiplier architecture includes two m-bit inputs (m is a positive number), one 2m-bit output, one row- or column-bypassing array multiplier, 2m 1-bit Razor flip-flops, and an AHL circuit. In the proposed architecture, the column-bypassing multiplier can be examined by the number of zeros in either the multiplicand or the multiplicator to predict whether the operation requires one cycle or two cycles to complete. When input patterns are random, the numbers of zeros and ones in the multiplicator and multiplicand follow a normal distribution. Razor flip-flops can be used to detect whether timing violations occur before the next input pattern arrives.



Figure 4: Multi precision Variable Latency Multiplier with Adaptive Hold Logic
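The one-cycle or two-cycle prediction described above can be sketched as a simple zero count. This is a minimal illustration, assuming the decision is made by comparing the number of zeros in the multiplicand with a threshold. The names `predict_cycles` and `min_zeros` are hypothetical, and the threshold value is not specified in the text. For uniformly random m-bit inputs the zero count is binomially distributed, which approaches a normal distribution as m grows.

```python
from math import comb


def count_zeros(value: int, m: int) -> int:
    """Number of zero bits in the m-bit representation of value."""
    masked = value & ((1 << m) - 1)
    return m - bin(masked).count("1")


def predict_cycles(multiplicand: int, m: int, min_zeros: int) -> int:
    """Predict 1 cycle when enough columns are bypassed, otherwise 2 cycles."""
    return 1 if count_zeros(multiplicand, m) >= min_zeros else 2


def one_cycle_probability(m: int, min_zeros: int) -> float:
    """Probability that a uniformly random m-bit input has at least min_zeros zeros."""
    favorable = sum(comb(m, k) for k in range(min_zeros, m + 1))
    return favorable / 2 ** m


sparse = predict_cycles(0b1010000000000001, m=16, min_zeros=8)  # 13 zeros
dense = predict_cycles(0xFFFF, m=16, min_zeros=8)               # 0 zeros
```

Raising `min_zeros` makes the prediction more conservative: fewer patterns are treated as one-cycle patterns, so fewer timing violations are expected at the cost of a higher average latency.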

The performance of processors in DSP systems depends mainly on their multipliers. This paper presents a power- and area-efficient multi-precision multiplier with decreased delay. Because multipliers are key components in DSPs, microprocessors, FIR filters, and similar systems, a slow multiplier adversely affects the performance of the whole system. The main aim of this work is therefore to increase the speed of the multiplier, and some compression techniques were incorporated for this purpose. The multiplier also enables parallel processing, so that higher-precision multiplications can be performed. The speed of a multiplier relies on the generation of partial products, and compression techniques are suggested here to improve that speed. In addition, the supply voltage is scaled and the frequency is managed to generate the required power supply voltage and clock frequency for the MP multiplier. This flexible multiplier, which combines variable-precision processing with voltage and frequency management, can be used efficiently to reduce circuit power consumption and delay.

Initially, the multiplier operates at a standard supply voltage of 3.3 V. If the Razor flip-flops of the multiplier do not report any errors, the supply voltage can be reduced. This is achieved through the VFMU, which sends control signals to the VSU to lower the supply voltage level. When the feedback provided by the Razor flip-flops indicates timing errors, the scaling of the power supply is stopped. The proposed multiplier combines not only MP and parallel processing (PP) but also DVS with an operand scheduling technique. PP can be used to increase the throughput or to reduce the supply voltage level for low-power operation.
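The voltage scaling loop described above can be sketched as follows. Only the 3.3 V starting point comes from the text. The step size, the lower bound, and the callback `has_timing_error` (standing in for the Razor flip-flop feedback) are assumptions made for illustration, and the sketch assumes the supply stays at the last error-free level once an error is reported. Voltages are kept in millivolts to avoid floating-point rounding.

```python
from typing import Callable


def scale_supply_voltage(
    has_timing_error: Callable[[int], bool],
    v_start_mv: int = 3300,   # standard supply voltage from the text (3.3 V)
    v_min_mv: int = 1000,     # assumed lower bound
    step_mv: int = 100,       # assumed scaling step
) -> int:
    """Lower the supply step by step until the error feedback fires, then stop."""
    voltage = v_start_mv
    while voltage - step_mv >= v_min_mv:
        candidate = voltage - step_mv
        if has_timing_error(candidate):
            break             # timing errors reported: stop scaling
        voltage = candidate   # no errors: accept the lower supply level
    return voltage


# Hypothetical circuit that starts to violate timing below 2.0 V.
settled_mv = scale_supply_voltage(lambda v_mv: v_mv < 2000)
```

In this model the loop also ends when the assumed lower bound is reached, even if no error has been reported.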

#### **B.** VARIABLE-LATENCY DESIGN

Traditional circuits use the critical path delay as the overall circuit clock cycle in order to perform correctly. However, the probability that the critical paths are activated is low. In most cases, the path delay is shorter than the critical path. For these noncritical paths, using the critical path delay as the overall cycle period results in significant timing waste. Hence, the variable-latency design was proposed to reduce the timing waste of traditional circuits. The variable-latency design divides the circuit into two parts: shorter paths and longer paths. Shorter paths can execute correctly in one cycle, whereas longer paths need two cycles to execute. When shorter paths are activated frequently, the average latency of variable-latency designs is better than that of traditional designs.
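The timing argument above can be made concrete with a short expression. As a sketch, assume a fraction $p$ of input patterns activates only shorter paths, the variable-latency design runs with a cycle period $T_{vl}$, and the traditional design uses the critical path delay $T_{cp}$ as its cycle period. These symbols are introduced here for illustration and do not appear in the original text, and re-execution penalties are ignored.

$$
T_{avg} = p \cdot T_{vl} + (1 - p) \cdot 2T_{vl} = (2 - p)\,T_{vl}
$$

Under these assumptions the variable-latency design has the lower average latency whenever $(2 - p)\,T_{vl} < T_{cp}$, which is why frequent activation of shorter paths (large $p$) is the favorable case.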

#### C. RAZOR FLIP-FLOP

A 1-bit Razor flip-flop contains a main flip-flop, a shadow latch, an XOR gate, and a multiplexer. The main flip-flop captures the execution result of the combinational circuit using the normal clock signal, and the shadow latch captures the execution result using a delayed clock signal, which is slower than the normal clock signal.



If the latched bit of the shadow latch is different from that of the main flip-flop, this means the path delay of the current operation exceeds the cycle period and the main flip-flop catches an incorrect result.

If errors occur, the Razor flip-flop sets the error signal to 1 to notify the system to re-execute the operation and to notify the AHL circuit that an error has occurred. We use Razor flip-flops to detect whether an operation that is considered to be a one-cycle pattern can really finish in one cycle. If not, the operation is re-executed with two cycles. Although the re-execution may seem costly, the overall cost is low because the re-execution frequency is low.
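The error detection of a 1-bit Razor flip-flop can be sketched with a timing-only model. This is a simplification under stated assumptions: a signal that has not settled when it is sampled is read as a stale value, and the shadow latch samples a fixed delay after the main flip-flop. The function name `razor_sample` and all parameter names and values are hypothetical.

```python
def razor_sample(path_delay_ns: float, cycle_ns: float, shadow_delay_ns: float,
                 correct_bit: int, stale_bit: int = 0):
    """Return (main_bit, shadow_bit, error) for one 1-bit Razor flip-flop."""
    # Main flip-flop samples at the normal clock edge.
    main_bit = correct_bit if path_delay_ns <= cycle_ns else stale_bit
    # Shadow latch samples later, on the delayed clock.
    shadow_bit = correct_bit if path_delay_ns <= cycle_ns + shadow_delay_ns else stale_bit
    # XOR gate: a mismatch means the main flip-flop captured an incorrect result.
    error = main_bit ^ shadow_bit
    return main_bit, shadow_bit, error


on_time = razor_sample(path_delay_ns=6.0, cycle_ns=7.0, shadow_delay_ns=2.0, correct_bit=1)
too_slow = razor_sample(path_delay_ns=8.0, cycle_ns=7.0, shadow_delay_ns=2.0, correct_bit=1)
cycles_used = 2 if too_slow[2] else 1   # an error triggers two-cycle re-execution
```

In this model an error is raised only when the late value differs from the stale one, which matches the XOR comparison in the description above.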

#### **IV. RESULT AND DISCUSSION**

The design is modeled in VHDL using Xilinx ISE. The 32x32 variable-latency and fixed-latency column-bypassing multipliers are compared with the 32x32 multi-precision column-bypassing multiplier. The low-power multi-precision variable-latency multiplier with AHL contains a column-bypassing multiplier, Razor flip-flops, and an adaptive hold logic circuit. The table below compares the multi-precision variable-latency, variable-latency, and fixed-latency column-bypassing multipliers.

#### 32X32 MULTIPLIER

| LATENCY                          | POWER (mW) | AREA (gate count) | DELAY (ns) |
|----------------------------------|------------|-------------------|------------|
| FIXED LATENCY                    | 245        | 17,853            | 10.001     |
| VARIABLE LATENCY                 | 191        | 17,252            | 6.237      |
| MULTI PRECISION VARIABLE LATENCY | 182        | 4,355             | 6.237      |

Figure 7: Simulation result of various 32x32 multipliers
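The relative savings implied by the table can be worked out directly. The short script below only restates the tabulated values as percentage reductions against the fixed-latency design. The dictionary keys and function names are illustrative.

```python
# Values copied from the 32x32 comparison table.
results = {
    "fixed": {"power_mw": 245.0, "area_gates": 17853, "delay_ns": 10.001},
    "variable": {"power_mw": 191.0, "area_gates": 17252, "delay_ns": 6.237},
    "multi_precision_variable": {"power_mw": 182.0, "area_gates": 4355, "delay_ns": 6.237},
}


def reduction_percent(baseline: float, value: float) -> float:
    """Percentage by which value is lower than baseline."""
    return 100.0 * (baseline - value) / baseline


def savings_vs_fixed(design: str) -> dict:
    """Percentage reduction of each metric relative to the fixed-latency design."""
    base = results["fixed"]
    return {
        metric: round(reduction_percent(base[metric], results[design][metric]), 1)
        for metric in base
    }


mp_savings = savings_vs_fixed("multi_precision_variable")
# mp_savings == {"power_mw": 25.7, "area_gates": 75.6, "delay_ns": 37.6}
```

Relative to the fixed-latency design, the multi-precision variable-latency multiplier therefore uses about 25.7% less power and about 75.6% fewer gates, with about 37.6% less delay.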

## V. CONCLUSION

Multipliers are key components in digital circuits. This paper proposes an aging-aware multi-precision variable-latency multiplier design with Adaptive Hold Logic. A multi-precision multiplier is chosen because it minimizes area and power consumption. The multiplier is able to adjust the Adaptive Hold Logic to mitigate performance degradation due to increased delay. In a fixed-latency design the number of clock cycles is fixed, so timing violations can occur; variable-latency multipliers therefore have less delay than fixed-latency ones. The multi-precision multiplier has less area and power than the fixed-latency and variable-latency multipliers.
