download - Xia et al. / J Zhejiang Univ Sci A 2009...

Xia et al. / J Zhejiang Univ Sci A 2009 10(7):1067-1074

New method for high performance multiply-accumulator design*

Bing-jie XIA, Peng LIU†‡, Qing-dong YAO
(Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China)
Received July 27, 2008; Revision accepted Oct. 28, 2008; Crosschecked Apr. 27, 2009

Abstract: This study presents a new method of 4-pipelined high-performance split multiply-accumulator (MAC) architecture, which is capable of supporting multiple precisions and was developed for media processors. To speed up the design further, a novel partial product compression circuit based on interleaved adders and a modified hybrid partial product reduction tree (PPRT) scheme are proposed. The MAC can perform 1-way 32-bit, 4-way 16-bit signed/unsigned multiply or multiply-accumulate operations and 2-way parallel multiply add (PMADD) operations at a high frequency of 1.25 GHz under worst-case conditions and 1.67 GHz under typical-case conditions. Compared with the MAC in the 32-bit microprocessor without interlocked piped stages (MIPS), the proposed design shows a great advantage in speed. Moreover, an improvement of up to 32% in throughput is achieved. The MAC design has been fabricated with Taiwan Semiconductor Manufacturing Company (TSMC) 90-nm CMOS standard cell technology and has passed a functional test.

Key words: Multiply-accumulator (MAC), Pipeline, Compressor, Partial product reduction tree (PPRT), Split structure
doi: 10.1631/jzus.A0820566 Document code: A CLC number: TP332

INTRODUCTION

The multiply-accumulate operation is one of the basic arithmetic operations extensively used in modern digital signal processing (DSP). Most DSP arithmetic, such as digital filtering, convolution and the fast Fourier transform (FFT), requires high-performance multiply-accumulate operations. The multiply-accumulator (MAC) unit always lies in the critical path that determines the speed of the overall hardware system. Therefore, a high-speed MAC that is capable of supporting multiple precisions and parallel operations is highly desirable.

The existing MAC implementation methods in the literature can be generally classified into three categories. The first category is the recursive MAC method (Clark et al., 2001; Liao and Roberts, 2002), which builds wider vector elements out of several narrower ones and then adds the multiple results together. It works iteratively, passing the data back through the unit over more than one cycle. This method saves hardware resources but requires several clock cycles per operation. The second category is the parallel MAC method (Perri et al., 2005; Parandeh-Afshar et al., 2006; MIPS Technologies Inc., 2006; 2007), implemented by unrolling the iterative loop of the recursive MAC method, which achieves high speed at the cost of hardware resources.
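The trade-off between the first two categories can be illustrated with a small behavioural model. The sketch below is not taken from the paper or from the cited designs; it assumes unsigned operands, a hypothetical 16x16 multiplier `mul16`, and a simplified cost model in which each reuse of that multiplier counts as one clock cycle. The names `recursive_mac32` and `parallel_mac32` are illustrative only.

```python
MASK16 = 0xFFFF

def mul16(a, b):
    """Hypothetical narrow multiplier: unsigned 16x16 -> 32-bit product."""
    return (a & MASK16) * (b & MASK16)

def split16(x):
    """Split an unsigned 32-bit value into (low half, high half)."""
    return x & MASK16, (x >> 16) & MASK16

def recursive_mac32(acc, a, b):
    """Compute acc + a*b by reusing ONE narrow multiplier.

    Each pass through the loop models one clock cycle in which the
    data goes back through the same unit (assumed cost model).
    """
    cycles = 0
    for i, a_part in enumerate(split16(a)):
        for j, b_part in enumerate(split16(b)):
            # Partial product weighted by the position of its halves.
            acc += mul16(a_part, b_part) << (16 * (i + j))
            cycles += 1
    return acc, cycles

def parallel_mac32(acc, a, b):
    """Same result with the loop unrolled: four multipliers side by side,
    modelled as a single cycle at four times the multiplier hardware."""
    a_lo, a_hi = split16(a)
    b_lo, b_hi = split16(b)
    p_ll = mul16(a_lo, b_lo)
    p_lh = mul16(a_lo, b_hi)
    p_hl = mul16(a_hi, b_lo)
    p_hh = mul16(a_hi, b_hi)
    return acc + p_ll + ((p_lh + p_hl) << 16) + (p_hh << 32), 1

if __name__ == "__main__":
    a, b, acc = 0x12345678, 0x9ABCDEF0, 5
    print(recursive_mac32(acc, a, b))
    print(parallel_mac32(acc, a, b))
```

Both functions return the same accumulated value; only the modelled cycle count (four versus one) and the implied number of multipliers differ, which is the speed-versus-hardware trade-off described above. Signed operands and real pipeline timing are deliberately left out of this sketch.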

This note was uploaded on 09/09/2011 for the course EE 3193 taught by Professor Halenlee during the Spring '10 term at NYU Poly.
