

## International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(An ISO 3297: 2007 Certified Organization)

Vol. 4, Issue 10, October 2015

# An Efficient Implementation of 12 Multiple DFT/IDFT in Tms320c6670 Processor

S. Narayana Reddy<sup>1</sup>, D. Sai Priya<sup>2</sup>

Associate professor, Dept. of ECE, Sri Venkateswara University College of Engineering , Tirupati, Andhra Pradesh, India<sup>1</sup>

PG Student [Signal processing], Dept. of ECE, Sri Venkateswara University College of Engineering , Tirupati, Andhra  $Pradesh \ , India^2$ 

**ABSTRACT**: LTE adopts SC-FDMA over OFDMA in uplink. Unlike OFDMA the data has to be DFT pre-coded before applying to IFFT. The size of DFT employed for pre-coding is same as the amount of bandwidth (in terms of subcarriers) allocated to user in uplink.DFT pre-coding has to be accomplished in bare minimal time. Bandwidth allocated to each use will be in terms of resource blocks, where each resource block constitutes 12 sub carriers and 0.5 milliseconds time.DFT pre-coding has to be implemented for these 12 multiples. In this paper, it proposes an algorithm for the implementation of these 12 multiple DFT'S with minimum number of computations. Standard DIT-FFT and DIF-FFT algorithms are efficient in computing discrete fourier transform of a finite length sequence whose length can be expressed in powers of 2, but these algorithms can't be employed to compute non radix-2 length DFT'S. Hence we propose a algorithm to evaluate these 12 multiple DFT'S. The algorithm is implemented using tms320c6670 processor.

**KEYWORDS:** LTE, cycles, OFDMA, SC-FDMA, DFT.

#### **I.INTRODUCTION**

LTE is the promising revolutionary change in the cellular technology. It offers a data rate of 100Mbps in the downlink and 50Mbps in uplink. In the uplink scenario, the bandwidth associated with each user will be of 12 multiples. So, in the context of 20MHz spectrum there will be 100 PRBs per each frame which implies 1200 subcarriers as each PRB is associated with 12 subcarriers. Therefore, the pre-coding of size 1200 has to be performed. Pre-coder cannot be implemented using radix-2 algorithms. So, Prime Factor Algorithm and Winogard Fourier transform Algorithm can be used to implement these 12 multiple DFT's with less number of computations. So, to compute DFT of size N where N is non radix-2, PFA (Prime Factor Algorithm) and WFTA (Winogard Fourier Transform Algorithm) are widely used. WFTA requires more computations and storage. Hence, Prime factor algorithm is used for evaluating 12 multiple DFT'S. The TMS320C6670 DSP processor consists of FFTC to accomplish the task of pre-coding for the 12 multiples. In order to perform the pre- coding, the capability of the TI co-processor can be invoked. And this Co-processor must be carefully configured to fulfil the task.

LTE adopts SC-FDMA over OFDMA in uplink so as to improve the power efficiency as the terminal user devices which are small in size cannot afford to transmit the data with large amount of power. The basic significant difference between OFDMA and SCFDMA is that, the data to be transmitted will be pre-coded before being transmitted. The size of DFT employed for pre-coding is same as the amount of bandwidth (in terms of sub carriers) allocated to user in the uplink. The bandwidth that is allocated to each user will be in terms of resource block which are assigned to the respective user. Each resource block consists of 12 subcarriers per one slot in a single radio frame.

In this paper, it proposes efficient algorithm to compute the 12 multiple DFT'S with minimum computation time. we consider this prime factor algorithm to compute DFT for various input size. Efficiency can be known based on the number of cycles consumed. Here, We find the cycles consumed for floating point c code, later fixed point c code, simd instructions and also using the Texas Instrument FFT co-processor. The aim is to find the method in which the time consumption is less, this is given by the number of cycles consumed. The time factor plays a major role because in LTE each SC-FDMA symbol occupies 66.67 micro seconds of time. Hence, generation of each symbol has to be



8319

# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(An ISO 3297: 2007 Certified Organization)

## Vol. 4, Issue 10, October 2015

accomplished with in 66.67micro seconds at transmitter. Therefore, DFT pre-coding which is a part symbol generation should be accomplished in minimal time.

## II.SYSTEM MODEL AND ASSUMPTIOM



## LTE frame structure

#### III.PRIME FACTOR ALGORITHM IMPLEMENTATION

Prime Factor Algorithm was first introduced by Goods, and subsequently developed by Kolba and Parks. Consider the length of the sequence for which the DFT has to be evaluated be N. Let  $N=r_1\times r_2$  and assume that  $r_1$  and  $r_2$  are mutually prime (i.e., the greatest common divisor of  $r_1$  and  $r_2$ , denoted by GCD  $(r_1,r_2)$ , is equal to 1). The discrete Fourier transform (DFT) of a sequence  $x_k$ , k=[0,1,...N-1] is defined as,

$$X_n = \sum_{k=0}^{N-1} x_k W_N^{nk} \quad , n = [0, 1, \dots N - 1]$$
 (2.1)

Where

$$W_N^{nk} = e^{-j2\pi nk/N}$$

Then, the indices n and k of Eq. (2.1) can be expressed as,

$$k \equiv k_1 r_1 + k_2 r_2 \pmod{N},$$
  $k_1 = [0,1, \dots, r_2 - 1]$   $k_2 = [0,1, \dots, r_1 - 1]$  (2.2)

$$n \equiv n_1 s_1 r_2 + n_2 s_2 r_1 \pmod{N}, \quad n = [0, 1 \dots N - 1]$$

$$n_1 \equiv n \pmod{r_1} \qquad n_1 = [0, 1, \dots r_1 - 1]$$

$$n_2 \equiv n \pmod{r_2} \qquad n_2 = [0, 1, \dots r_2 - 1]$$
(2.3)

and  $s_1$ ,  $s_2$  are solutions of

$$s_1 r_2 \equiv 1 \pmod{r_1}$$
  
$$s_2 r_1 \equiv 1 \pmod{r_2}$$

The equation 2.1 can be simplified by representing the sequences  $X_k$  and  $x_n$  as two dimensional arrays as,

$$X(n_1, n_2) = \sum_{k_2=0}^{r_1-1} W_{r_1}^{n_1 k_2} A(n_2, k_2)$$
 (2.4)



## International Journal of Advanced Research in Electrical, **Electronics and Instrumentation Engineering**

(An ISO 3297: 2007 Certified Organization)

#### Vol. 4, Issue 10, October 2015

$$A(n_2, k_2) = \sum_{k_1=0}^{r_2-1} W_{r_2}^{n_2 k_1} x(k_1, k_2)$$

$$W_r^{nk} = e^{-\frac{j2\pi nk}{r}} = \cos\left(\frac{2\pi nk}{r}\right) - j\sin\left(\frac{2\pi nk}{r}\right)$$
(2.5)

$$W_r^{nk} = e^{\frac{j2\pi nk}{r}} = \cos\left(\frac{2\pi nk}{r}\right) - j\sin\left(\frac{2\pi nk}{r}\right)$$
 (2.6)

Equation (2.6) is called twiddle factor.

#### IV.IMPLEMENTATION OF PRIME FACTOR ALGORITHM

In the hardware implementation process, the significant metric that has to be considered is consumption of cycles. The time taken by a particular algorithm in real time can be derived from these cycles. These cycles must be divided with the clock frequency to get the time.

In the LTE uplink perspective, each SC-FDMA symbol constitutes 66.67 usec irrespective of the spectrum. All the functional blocks required to generate these symbols must take bare minimal time to complete the whole generation process with in 66.67µsec.therefor, DFT pre coding which is one of the functional block of SC-FDMA must take minimal time. The significant part of PFA which consumes more cycles is the evaluation of the twiddle factors. For a N point discrete sequence,  $N(r_1 + r_2)$  are required. Computation of single twiddle factor consumes approximately 140cycles for TMS320C6670 DSP processor. Then the total no. of cycles for all the twiddle factors will consume  $140N(r_1 + r_2)$ . For example consider 600 point (i.e. 50 resource blocks) discrete sequence, then the total no. of cycles required for evaluation of twiddle factors are 4.2M results in 4.2msec for 1GHz clock frequency of TMS320C6670 DSP processor. But these many number of cycles is very huge in context of LTE

Here in this case, look up tables can be solution which does not involves any computations. For small length sequence, it will not rise any problems but if the sequence length increases (say 600), then the size of look up table must be increased which will again lead to more storage and accessing issues.

Hence, in order to reduce these cycles further, deviation is required to evaluate the twiddle factors from the conventional process and must switch to the alternatives like implementation IIR filters, Taylor series, Parabola method and Line method. But Taylor series has a problem of convergence and requires more number of computations. The Conventional sinusoid extended to single cycle will be similar to parabola and hence parabola is designed so as to match with the sinusoid. Similarly, in line method also a single cycle sinusoid can be approximated with a set of lines. And these approximation of line and parabola for sinusoids are designed in such a way that they minimum average error between them. But here in these two methods, wrapping of phase must be done which consumes huge number of computations. So, implementation of IIR filter is chosen so as to compute these twiddle factors as they take very less number of cycles. Implementing IIR filter for evaluating the twiddle factors is discussed as below:

1.IIR filter implementation of cosine

$$h[n] = cos[w_0n] \xrightarrow{Z \text{ transform}} H(z) = \frac{Y(z)}{X(z)} = \frac{1 - z^{-1}cos(w_0)}{1 - 2z^{-1}cos(w_0) + z^{-2}}$$
$$y[n] - 2cos(w_0)y[n-1] + y[n-2] = x[n] - cos(w_0)x[n-1]$$

When x[n] is impulse, then y[n] = h[n].



## International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(An ISO 3297: 2007 Certified Organization)

## Vol. 4, Issue 10, October 2015



$$y[n]=\cos(w_0n)$$

$$y[0] = \cos(w_0 0) = 1$$
  
 $y[1] = \cos(w_0 1) = \cos(w_0)$   
 $y[n] = \cos(w_0 n) = 2\cos(w_0) y[n-1] - y[n-2]$ 

2.IIR filter implementation of sine

$$h[n] = sin[w_0 n] \xrightarrow{Z \; transform} H(z) = \frac{Y(z)}{X(z)} = \frac{z^{-1} sin[(w_0)]}{1 - 2z^{-1} \cos(w_0) + z^{-2}}$$

$$y[n] - 2\cos(w_0)y[n-1] + y[n-2] = \sin(w_0)x[n-1]$$
 when x[n] is impulse, then y[n]=h[n]

$$y[n]=\sin(w_0 n)$$

$$y[0] = sin(w_0 0) = 0$$
  
 $y[1] = sin(w_0 1) = sin(w_0)$   
 $y[n] = sin(w_0 n) = 2 cos(w_0) y[n-1] - y[n-2]$ 



Some approximation is desperately needed to evaluate the first sample of  $\cos(w_0 n)$  and  $\sin(w_0 n)$  to reduce cycles. Till date, approximation of these sinusoidal curves is accomplished by using line equations. As an alternative, approximation of sinusoidal curves can also be done by using parabola.

#### V. RESULT AND DISCUSSION

1. Computation of bountiful twiddle factors in PFA to perform DFT preceding consumes many cycles. Hence the evaluation of twiddle factors using conventional method consumes more cycles in LTE context as time is a major constraint. Computation of these twiddle factors is handled by implanting IIR filters as above. In this method, there is



# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(An ISO 3297: 2007 Certified Organization)

## Vol. 4, Issue 10, October 2015

need of evaluating first sample and this can be accomplished by incorporating either line method or parabola method. Here in these methods the phase of conventional sinusoid is wrapped to 90° and then approximated with the line and parabola methods. In order to compare these two methods of approximation of conventional sinusoids, root mean square is taken under consideration as metric. And this RMS values are evaluated for both the methods.

Table: Comparison of cycles for approximation methods

| Approximation | No. of          | No. of    | No. of      | Total no. of |
|---------------|-----------------|-----------|-------------|--------------|
| method        | multiplications | additions | comparisons | cycles       |
| Parabola      | 3               | 2         | 0           | 8            |
| Line          | 2               | 2         | 2           | 8            |

And the results of approximation of conventional cosine with the parabola and line extended from 0 to 90° is shown in the Figure as below





Also, the results of approximation of conventional sine with the parabola and line extended from 0 to  $90^{\circ}$  is shown in below



# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(An ISO 3297: 2007 Certified Organization)

Vol. 4, Issue 10, October 2015





Root Mean Square error is considered as comparison metric, The RMS value of sine and cosine are as below,

Sin:

Line=0.0214 Parabola=0.0210

Cos:

Line=0.0229 Parabola=0.0185

- 2. Implementation of PFA in real time hardware like TMS320C6670 DSP processor ensures a step by step procedure. While implementing any algorithm, process starts with developing floating point program and then fixed point arithmetic has to be constructed and then SIMD instructions must be invoked to minimize cycles there by reducing the time.
- 3. Fixed point Arithmetic resolves unnecessary storage issues. Metrics considered for performing fixed point coding are precision and number of bits. For adding two fixed point no's i.e., X (a; b) and Y (c; d) make sure that a = b and b = d and the result is stored in Z (a + 1; b). Multiplication of two no's i.e. (a; b) and Y (c; d) yields the result in Z (a + c; b + d).
- 4. TMS320C6670 Processor supports a set of SIMD instructions with the inclusion of respective header files. Basic operations like packing, multiplication, addition, subtraction, shifting etc., are defined with corresponding instruction to consume minimum no. of cycles. Some examples are pack2, cmpy, dadd2 etc.
- 5. Invoking Texas instrument co-processor, TMS320C6670 DSP Processor comprises of FFTC (Fast Fourier Transform Co-processor) the capability of this co-processor can be invoked in order to compute the DFT of these 12 multiples. The registers are configured so as to compute the DFT's. It also facilitates certain features like input and output scaling, Addition of CP etc., FFTC is taking 21micro seconds for both 600 and 1200 point DFT's.



# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(An ISO 3297: 2007 Certified Organization)

Vol. 4, Issue 10, October 2015

## TIME TAKEN BY PROCESSOR AT DIFFERENT LEVELS FOR DIFFERENT INPUT SIZES

|                                                   | 12-point | 600-point | 1200-point |
|---------------------------------------------------|----------|-----------|------------|
| Floating Point<br>C-Code                          | 25μs     | 517µs     | 1500μs     |
| Fixed Point<br>C-Code                             | 12μs     | 347µs     | 967μs      |
| SIMD<br>Instructions<br>in Fixed point<br>C-Code  | 1.75μs   | 115µs     | 335μs      |
| Using Texas<br>Instrument<br>FFT Co-<br>processor | 21μs     | 21μs      | 21μs       |

#### VARIATION OF TIME WITH THE FT SIZE



#### **VI.CONCLUSION**

Computation of bountiful twiddle factors must be accomplished by engaging some alternatives like implementation of IIR filters. Computation of small number of twiddle factors can be accomplished by employing approximation of sinusoids either by parabola or line. If the value precision matters then approximation of sinusoids is employed by more number of lines. But if cycles are significant then approximation of sinusoids can be done by



# International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

(An ISO 3297: 2007 Certified Organization)

Vol. 4, Issue 10, October 2015

parabola. Use of lookup tables for twiddle factors are efficient provided the length of DFT is fixed and it is efficient than any other approximations.

The cycles can be reduced further if the changes were made in the PFA. Use of intermediate two dimensional matrices can be replaced with three dimensional matrices which would cut down the complex computations, thereby reducing the cycles.

#### REFERENCES

- $[1]\ FAROOQ\ KHAN\ (2009),\ LTE\ for\ 4G\ Mobile\ Broadband\ Air\ Interface\ Technologies\ and\ Performance,\ Cambridge\ University\ New\ York.$
- [2] 3GPP2 TSG C.S0024-0 v2.0, cdma2000 High Rate Packet Data Air Interface Specification.
- [3] 3GPP TSG RAN TR 25.848 v4.0.0, Physical Layer Aspects of UTRA High Speed Downlink Packet Access.
- [4] 3GPP2TSG C.S0002-C v1.0, Physical Layer Standard for cdma2000 Spread Spectrum Systems, Release C.
- [5] IEEE Std 802.16e-2005, Air Interface for Fixed and Mobile Broadband Wireless Access Systems.
- [6] 3GPP TSG RAN TR 25.912 v7.2.0, Feasibility Study for Evolved Universal Terrestrial Radio Access (UTRA) and Universal Terrestrial Radio Access Network (UTRAN).
- [7] 3GPP2 TSG C.S0084-001-0 v2.0, Physical Layer for Ultra Mobile Broadband (UMB) Air Interface Specification.
- [8] 3GPP TSG RAN TR 25.913 v7.3.0, Requirements for Evolved Universal Terrestrial Radio Access (UTRA) and Universal Terrestrial Radio Access Network (UTRAN).