113D_1_EE113D_CR2_C54xPgmg - Digital versus Analog Signal...


Digital versus Analog Signal Processing

The advantages of using digital signal processing over analog signal processing are:
1) Less expensive, and often fewer components
2) More reliable and deterministic performance
3) More flexible and easily adjustable by reprogramming
4) Higher precision (the more bits you use to represent signals, the higher the accuracy you get)
5) Higher noise immunity
6) Wider range of applications (like adaptive signal processing)

The disadvantages of using digital signal processing over analog signal processing are:
1) Limited bandwidth (to prevent aliasing, the bandwidth is limited to the folding frequency Fs/2)
2) Quantization of signals is necessary (and can be considered as an additional noise source)
3) Due to quantization and overflow, a stable system can become unstable and create oscillations (which are called limit cycles and overflow oscillations, respectively)

Hardware versus Software Digital Signal Processing Implementation
1) A DSP chip is less expensive than a CPU.
2) A DSP is simpler and it can run faster.
3) A DSP can perform real-time signal processing in applications up to higher frequencies.
4) A DSP has a higher amount of parallelism.

Applications

Typical digital signal processing tasks are:
1) Digital filtering: Finite Impulse Response (FIR) and Recursive (IIR) digital filtering
2) Signal processing: Compression, Expansion, Averaging
3) Data processing: Encrypting/Scrambling, Encoding, Decoding
4) Numeric processing: Scalar, vector and matrix arithmetic, functional computations
5) Modulation: Amplitude, Frequency, Phase
6) Spectral analysis: Fast Fourier Transform (FFT), Discrete Fourier Transform (DFT)

Table 1-1.
Typical Applications for the TMS320 DSPs

Automotive: Adaptive ride control, Antiskid brakes, Cellular telephones, Digital radios, Engine control, Navigation and global positioning, Vibration analysis, Voice commands, Anticollision radar
General-Purpose: Adaptive filtering, Convolution, Correlation, Digital filtering, Fast Fourier transforms, Hilbert transforms, Waveform generation, Windowing
Instrumentation: Digital filtering, Function generation, Pattern matching, Phase-locked loops, Seismic processing, Spectrum analysis, Transient analysis
Consumer: Digital radios/TVs, Educational toys, Music synthesizers, Pagers, Power tools, Radar detectors, Solid-state answering machines
Control: Disk drive control, Engine control, Laser printer control, Motor control, Robotics control, Servo control
Graphics/Imaging: 3-D rotation, Animation/digital maps, Homomorphic processing, Image compression/transmission, Image enhancement, Pattern recognition, Robot vision, Workstations
Medical: Diagnostic equipment, Fetal monitoring, Hearing aids, Patient monitoring, Prosthetics, Ultrasound equipment
Industrial: Numeric control, Power-line monitoring, Robotics, Security access
Military: Image processing, Missile guidance, Navigation, Radar processing, Radio frequency modems, Secure communications, Sonar processing
Voice/Speech: Speaker verification, Speech enhancement, Speech recognition, Speech synthesis, Speech vocoding, Text-to-speech, Voice mail
Telecommunications: 1200- to 33 600-bps modems, Adaptive equalizers, ADPCM transcoders, Cellular telephones, Channel multiplexing, Data encryption, Digital PBXs, Digital speech interpolation (DSI), DTMF encoding/decoding, Echo cancellation, Faxing, Line repeaters, Personal communications systems (PCS), Personal digital assistants (PDA), Speakerphones, Spread spectrum communications, Video conferencing, X.25 packet switching

Block Diagram

Figure 2-1.
Block Diagram of TMS320C54x Internal Hardware
[Figure: block diagram showing the system control interface, the program address generation logic (PAGEN), the data address generation logic (DAGEN, with ARAU0 and ARAU1), the memory and external interface, the peripheral interface, the bus pairs PAB/PB, CAB/CB, DAB/DB and EAB/EB, the EXP encoder, the T register, the barrel shifter, and the MSW/LSW select.]

Architectural Overview

The central components are:
1) Program Bus (PB)
2) Data Buses (CB, DB, EB)
3) Address Buses (PAB, CAB, DAB, EAB)
4) Internal Memory
5) Central Processing Unit (CPU) with Arithmetic Logic Unit (ALU)
6) Program Address Generation Logic (PAGEN)
7) Data Address Generation Logic (DAGEN)
8) On-Chip Peripherals
9) System Control Interface

Bus Structure
1) The program bus (PB) carries instruction code and immediate data from program memory.
2) Three data buses (CB, DB, EB) interconnect to various elements such as the CPU.

Table 2-1. Bus Usage for Read and Write Accesses

Access type                   Address buses            Data buses
Program read                  PAB                      PB
Program write                 PAB                      EB
Data single read              DAB                      DB
Data dual read                CAB, DAB                 CB, DB
Data long (32-bit) read       CAB (hw), DAB (lw)       CB (hw), DB (lw)
Data single write             EAB                      EB
Data read/data write          DAB, EAB                 DB, EB
Dual read/coefficient read    PAB, CAB, DAB            PB, CB, DB
Peripheral read               DAB                      DB
Peripheral write              EAB                      EB
Legend: hw = high 16-bit word, lw = low 16-bit word

Central Processing Unit (CPU)

The '54x Central Processing Unit (CPU) contains the following parts:
1) 40-bit arithmetic logic unit (ALU)
2) Two 40-bit accumulators (A and B)
3) Barrel shifter
4) 17 x 17-bit multiplier
5) 40-bit adder
6) Compare, select, and store unit (CSSU)
7) Data address generation unit (DAGEN)
8) Program address generation unit (PAGEN)

Figure 4-4. ALU Functional Diagram
[Figure: ALU with inputs from CB15-CB0, DB15-DB0, the T register and the shifter output (40 bits); control and status bits SXM, OVM, C16, C, OVA/OVB, ZA/ZB, TC. Legend: A = accumulator A, B = accumulator B, C = CB data bus, D = DB data bus, M = MAC unit, S = barrel shifter, T = T register, U = ALU.]

Arithmetic Logic Unit (ALU)

The ALU uses one of the following inputs:
1) 16-bit immediate value
2) 16-bit word from data memory
3) 16-bit value in the temporary register T
4) Two 16-bit words from data memory
5) 32-bit word from data memory
6) 40-bit word from either accumulator A or B

ALU inputs take several forms from several sources.
The X input source to the ALU is either of two values:
1) The shifter output (32-bit or 16-bit data-memory operand or shifted accumulator value)
2) A data-memory operand from data bus DB
The Y input source to the ALU is any of three values:
1) The value in one of the accumulators (A or B)
2) A data-memory operand from data bus CB
3) The value in the T register

Digital versus Analog Signal Processing

Digital signal processing is an area that has developed rapidly over the past 30 years. This rapid development is a result of the significant advances in digital computer technology and integrated-circuit fabrication. The digital computers of 30 years ago were relatively large and expensive, and as a consequence, their use was limited to general-purpose, non-real-time (off-line) computations.
The rapid development in integrated-circuit technology, from small-scale through medium- and large-scale to today's very-large-scale integration (VLSI) of electronic circuits, has led to powerful, smaller, faster and cheaper digital computers and special-purpose digital hardware. These inexpensive and relatively fast digital circuits have made it possible to construct highly sophisticated digital systems capable of performing complex digital signal processing functions and tasks, which are usually too difficult and/or too expensive to be performed by analog signal processing systems. Therefore, many of the signal processing tasks that were conventionally performed by analog processing are realized today by less expensive and often more reliable digital hardware.

Not only do digital circuits yield cheaper and more reliable systems for signal processing, they have other advantages as well. In particular, digital processing hardware allows programmable operations. Through software, one can more easily modify the signal processing functions to be performed by the hardware. Thus digital hardware and associated software provide a greater degree of flexibility in system design. Also, there is often a higher order of precision achievable with digital hardware and software compared with analog circuits and analog signal processing.

Note that digital signal processing is not the proper solution for every signal processing problem. For example, for signals with extremely wide bandwidths that are required to be processed in real time, digital circuits have insufficient speed to perform the signal processing. Another drawback of digital processing of analog signals is that it always requires an A/D converter to start with.
Conversion of an analog signal to digital form, accomplished by sampling the analog signal and quantizing the samples, results in a distortion of the signal that prevents us from reconstructing the original analog signal exactly from the quantized samples.

In summary: The advantages of using digital signal processing over analog signal processing are:
1) Less expensive, and often fewer components
2) More reliable and deterministic performance
3) More flexible and easily adjustable by reprogramming
4) Higher precision (the more bits you use to represent the signals, the higher the accuracy you get)
5) Higher noise immunity
6) Wider range of applications (like adaptive signal processing, in which the coefficients are adjusted automatically)

The disadvantages of using digital signal processing over analog signal processing are:
1) Limited bandwidth (to prevent aliasing, the bandwidth is limited to the folding frequency Fs/2)
2) Quantization of signals is necessary (and can be considered as an additional noise source)
3) Due to quantization and overflow, a stable system can become unstable and create oscillations (which are called limit cycles and overflow oscillations, respectively)

Hardware versus Software Digital Signal Processing Implementation

Digital signal processing can be performed on special-purpose hardware (called a digital signal processor (DSP)), or can be implemented in software on general-purpose hardware (called a central processing unit (CPU)). Implementing a digital signal processing algorithm on special-purpose hardware (DSP) provides exactly the same results as on general-purpose hardware (CPU), assuming that they use the same number of bits and the same format to represent the digital signals. So what are the differences between DSP and CPU? For one thing, a DSP chip is less expensive than a CPU, because the DSP has a more limited instruction set (RISC). This means that the architecture of a DSP is simpler and it can run faster.
Running at a higher speed, a DSP can perform real-time signal processing in applications up to higher frequencies.

The precision of a DSP is determined by the number of bits used to represent the digital signals. The chips which we use in the lab are the Texas Instruments TMS320C542 DSP chips. They have data paths which are 16 bits wide, so they provide a dynamic range of 96 dB. Moreover, intermediate results are held in 40-bit accumulators, which provide a dynamic range of 240 dB.

Another main advantage of DSP over CPU is the amount of parallelism. Each on-chip execution unit of a DSP (the arithmetic logic unit (ALU), the program address generation unit (PAGEN) and the data address generation unit (DAGEN)), the memory, and the peripherals operate independently and in parallel with the other units through a sophisticated bus system. The ALU, PAGEN, and DAGEN all operate in parallel, so that an instruction prefetch, a multiplication, an addition, two data moves and two address-pointer updates can be executed in a single instruction cycle. This parallelism allows an FIR filter to be executed in only one cycle per filter tap, the theoretical minimum for a single-processor architecture. At the same time, the system control interface and the peripheral interface can send and/or receive data. On a CPU all of these tasks have to be executed separately.

Moreover, the DSPs which we use provide on-chip serial and parallel interfaces which make external communication more flexible. In addition, our chip has a sophisticated debugging system that allows simple, inexpensive and speed-independent access to the internal registers for debugging. It tells application programmers exactly what the status is within the registers, memory locations, buses, and the last few instructions that were executed.

Applications

Digital signal processing has a wide range of applications.
Typical areas of application are automotive processing, consumer electronics (such as digital TV), high-speed control, general-purpose signal processing, graphics and image processing, industrial applications, instrumentation, medical and military electronics, telecommunications, and voice and speech processing.

Typical digital signal processing tasks are:
1) Digital filtering: Finite Impulse Response (FIR) and Recursive (IIR) digital filtering, Matched filters (correlators), Adaptive filters, Equalizers
2) Signal processing: Compression (such as linear predictive coding of speech signals), Expansion, Averaging
3) Data processing: Encrypting/Scrambling, Encoding (such as Trellis coding), Decoding (such as Viterbi decoding)
4) Numeric processing: Scalar, vector and matrix arithmetic, functional computations (like sin(x), cos(x), exp(x)), nonlinear functions, pseudo-random number generators
5) Modulation: Amplitude, Frequency, Phase
6) Spectral analysis: Fast Fourier Transform (FFT), Discrete Fourier Transform (DFT), Sine/Cosine Transform, Moving Average (MA) modeling, Auto Regression (AR) modeling, ARMA modeling

1.1.2 Typical Applications for the TMS320 Family

Table 1-1 lists some typical applications for the TMS320 family of DSPs. The TMS320 DSPs offer more adaptable approaches to traditional signal-processing problems such as vocoding and filtering than standard microprocessor/microcomputer devices. They also support complex applications that often require multiple operations to be performed simultaneously.

Table 1-1. Typical Applications for the TMS320 DSPs

Architectural Overview

We will start with an overview of the architectural structure of the '54x, which comprises the central processing unit (CPU), memory, and on-chip peripherals.
The '54x DSP uses an advanced modified Harvard architecture that maximizes processing power with eight buses. Separate program and data spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, three reads and one write can be performed in a single cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In addition, data can be transferred between data and program spaces. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle. Also, the '54x includes the control mechanisms to manage interrupts, repeated operations, and function calling. Figure 2-1 shows a functional block diagram of the TMS320C54x, which includes the principal blocks and bus structure.

[Figure 2-1: functional block diagram of the TMS320C54x.]

The central components are:
1) Program Bus (PB)
2) Data Buses (CB, DB, EB)
3) Address Buses (PAB, CAB, DAB, EAB)
4) Internal Memory
5) Central Processing Unit (CPU) with Arithmetic Logic Unit (ALU)
6) Program Address Generation Logic (PAGEN)
7) Data Address Generation Logic (DAGEN)
8) On-Chip Peripherals
9) System Control Interface

We will now give a brief description of each of the central components.

Bus Structure

The '54x architecture is built around eight major 16-bit buses (four program/data buses and four address buses):
The program bus (PB) carries the instruction code and immediate operands from program memory.
Three data buses (CB, DB, EB) interconnect to various elements such as the CPU, data address generation logic (DAGEN), program address generation logic (PAGEN), on-chip peripherals, and data memory. The CB and DB carry the operands that are read from data memory; the EB carries the data to be written to memory.
Four address buses (PAB, CAB, DAB, and EAB) carry the addresses for instruction execution.

The '54x can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1 in the data address generation logic DAGEN). The program bus (PB) can carry data operands stored in program space (for instance, a coefficient table) to the multiplier and adder for multiply/accumulate (MAC) operations or to a destination in data space for data move instructions (MVPD and READA). This capability, in conjunction with the feature of dual-operand read, supports the execution of single-cycle 3-operand instructions such as the FIRS (linear-phase FIR) instruction.

The '54x also has an on-chip bidirectional bus for accessing on-chip peripherals; this bus is connected to DB and EB through the bus exchanger in the CPU interface. Accesses that use this bus can require two or more cycles for reads and writes, depending on the peripheral's structure. Table 2-1 summarizes the buses used by various types of accesses.

Table 2-1. Bus Usage for Read and Write Accesses

Access type                   Address buses            Data buses
Program read                  PAB                      PB
Program write                 PAB                      EB
Data single read              DAB                      DB
Data dual read                CAB, DAB                 CB, DB
Data long (32-bit) read       CAB (hw), DAB (lw)       CB (hw), DB (lw)
Data single write             EAB                      EB
Data read/data write          DAB, EAB                 DB, EB
Dual read/coefficient read    PAB, CAB, DAB            PB, CB, DB
Peripheral read               DAB                      DB
Peripheral write              EAB                      EB
Legend: hw = high 16-bit word, lw = low 16-bit word

Central Processing Unit (CPU)

The '54x Central Processing Unit (CPU) contains the following parts:
1) 40-bit arithmetic logic unit (ALU)
2) Two 40-bit accumulators (A and B)
3) Barrel shifter
4) 17 x 17-bit multiplier
5) 40-bit adder
6) Compare, select, and store unit (CSSU)
7) Data address generation unit (DAGEN)
8) Program address generation unit (PAGEN)

Arithmetic Logic Unit (ALU)

Figure 4-4. ALU Functional Diagram
[Figure: ALU with inputs from the CB and DB data buses, the T register and the shifter output (40 bits); control and status bits SXM, OVM, C16, C, OVA/OVB, ZA/ZB, TC.]

The 40-bit ALU implements a wide range of arithmetic and logical functions, most of which execute in a single clock cycle. After an operation is performed in the ALU, the result is usually transferred to a destination accumulator (accumulator A or B). Instructions that perform memory-to-memory operations (ADDM, ANDM, ORM, and XORM) are exceptions.

The ALU uses one of the following inputs:
1) 16-bit immediate value
2) 16-bit word from data memory
3) 16-bit value in the temporary register T
4) Two 16-bit words from data memory
5) 32-bit word from data memory
6) 40-bit word from either accumulator A or B

ALU inputs take several forms from several sources.
The X input source to the ALU is either of two values:
1) The shifter output (a 32-bit or 16-bit data-memory operand or a shifted accumulator value)
2) A data-memory operand from data bus DB
The Y input source to the ALU is any of three values:
1) The value in one of the accumulators (A or B)
2) A data-memory operand from data bus CB
3) The value in the T register

When a 16-bit data-memory operand is fed through data bus CB or DB, the 40-bit ALU input is constructed in one of two ways:
1) If bits 15 through 0 contain the data-memory operand, bits 39 through 16 are zero filled (SXM = 0) or sign extended (SXM = 1).
2) If bits 31 through 16 contain the data-memory operand, bits 15 through 0 are zero filled, and bits 39 through 32 are either zero filled (SXM = 0) or sign extended (SXM = 1).

Overflow Handling

The ALU saturation logic prevents a result from overflowing by keeping the result at a maximum (positive or negative) value. This feature is useful in filter calculations to prevent overflow oscillations. The ALU saturation is applied when the overflow mode bit (OVM) in status register ST1 is set. When a result overflows:
If OVM = 0, the accumulators are loaded with the result without modification.
If OVM = 1, the accumulators are loaded with either the most positive 32-bit value (00 7FFF FFFFh) or the most negative 32-bit value (FF 8000 0000h), depending on the direction of the overflow.

Carry Bit

The ALU has an associated carry bit (C) that is affected by most arithmetic ALU instructions, including rotate and shift operations. The carry bit supports efficient computation of extended-precision arithmetic operations. The carry bit is not affected by loading the accumulator, performing logical operations, or executing other nonarithmetic or control instructions, so it can be used for overflow management.
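The sign-extension and overflow rules above can be sketched in a few lines of Python. This is only an illustrative model, not the device's implementation: the function names are hypothetical, the exact result is passed in as an unbounded Python integer, and "overflow" is assumed to mean that the result falls outside the signed 32-bit range, which is consistent with the two saturation values quoted in the text.

```python
MASK40 = (1 << 40) - 1  # a 40-bit ALU input or accumulator pattern


def sign_extend_operand(word16, sxm):
    """Place a 16-bit operand in bits 15-0 of a 40-bit ALU input.

    Bits 39-16 are zero filled when sxm is 0 and sign extended when sxm is 1.
    """
    word16 &= 0xFFFF
    if sxm and (word16 & 0x8000):
        return (word16 | ~0xFFFF) & MASK40  # upper 24 bits set to ones
    return word16                           # upper 24 bits left at zero


def load_accumulator(result, ovm):
    """Return the 40-bit pattern loaded into an accumulator.

    `result` is the exact signed result (assumption: overflow means the
    result does not fit in a signed 32-bit value).
    """
    overflow = not (-(1 << 31) <= result <= (1 << 31) - 1)
    if ovm and overflow:
        # Saturate to the most positive or most negative 32-bit value.
        return 0x007FFFFFFF if result > 0 else 0xFF80000000
    return result & MASK40                  # loaded without modification
```

For instance, `load_accumulator(0x7FFFFFFF + 1, ovm=1)` yields the pattern 00 7FFF FFFFh, while the same call with `ovm=0` keeps the unsaturated value 00 8000 0000h, which is the situation the guard bits are meant to absorb.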
Dual 16-bit Mode

For arithmetic operations, the ALU can operate in a special dual 16-bit arithmetic mode that performs two 16-bit operations (for instance, two additions or two subtractions) in one cycle.

Accumulators

Accumulators A and B (see Figure 2-1) store the output from the ALU or the multiplier/adder block. They can also provide a second input to the ALU; accumulator A can be an input to the multiplier/adder.

Figure 4-5. Accumulator A
[Figure: fields AG (guard bits, 39-32), AH (high-order bits, 31-16), AL (low-order bits, 15-0).]
Figure 4-6. Accumulator B
[Figure: fields BG (guard bits, 39-32), BH (high-order bits, 31-16), BL (low-order bits, 15-0).]

Each accumulator is divided into three parts:
1) Low-order word (bits 0 - 15)
2) High-order word (bits 16 - 31)
3) Guard bits (bits 32 - 39)

The guard bits are used as a head margin for computations. This head margin allows you to prevent some overflows in iterative computations such as autocorrelation. Instructions are provided for storing the guard bits, for storing the high- and the low-order accumulator words in data memory, and for transferring 32-bit accumulator words to and from data memory. Also, either of the accumulators can be used as temporary storage for the other. The only difference between accumulators A and B is that bits 32 - 16 of A can be used as an input to the multiplier in the multiplier/adder unit.

Storing Accumulator Contents

You can store accumulator contents in data memory by using the STH, STL, STLM, and SACCD instructions or by using parallel-store instructions. To store the 16 most-significant bits (MSBs) of the accumulator in memory with a shift, use the STH, SACCD, and parallel-store instructions. For right-shift operations, bits from AG and BG shift into AH and BH, respectively. For left-shift operations, bits from AL and BL shift into AH and BH, respectively.
To store the 16 LSBs of the accumulator in memory with a shift, use the STL instruction. For right-shift operations, bits from AH and BH shift into AL and BL, respectively, and the least-significant bits (LSBs) are lost.
For left-shift operations, the bits in AL and BL are filled with zeros. Since the shift operations are performed in the shifter, the contents of the accumulator remain unchanged. Example 4-3 shows the result of accumulator store operations with shift; it assumes that accumulator A = 0FF 4321 1234h.

Example 4-3. Accumulator Store With Shift
    STH A, 8, TEMP     ; TEMP = 2112h
    STH A, -8, TEMP    ; TEMP = FF43h
    STL A, 8, TEMP     ; TEMP = 3400h
    STL A, -8, TEMP    ; TEMP = 2112h

Saturation Upon Accumulator Store

The data in an accumulator can be saturated before storing it in memory. The saturation is performed after the shift operation. The following steps are performed when saturating upon accumulator store:
1) The 40-bit data is shifted (right or left) depending on the instruction. The shift is the same as described before and depends on the value of the SXM bit.
2) The 40-bit value is saturated to a 32-bit value. The saturation depends on the value of the SXM bit (if SXM = 0 the number is always assumed to be positive):
If SXM = 0, then 7FFF FFFFh is generated if the 40-bit value is greater than or equal to 7FFF FFFFh.
If SXM = 1, then 7FFF FFFFh is generated if the 40-bit value is greater than or equal to 7FFF FFFFh, and 8000 0000h is generated if the 40-bit value is less than 8000 0000h.
3) The data is stored in memory depending on the instruction (either 16-bit LSB, 16-bit MSB, or 32-bit data).
The accumulator remains unchanged during the process.

Application-Specific Instructions

Each accumulator is dedicated to specific operations in application-specific instructions with parallel operations. These include symmetrical FIR filter operations using the FIRS instruction, adaptive filter operations using the LMS instruction, Euclidean distance calculations using the SQDST instruction, and other parallel operations:
1) FIRS performs operations for symmetric FIR filters by using multiply/accumulates (MACs) in parallel with additions.
2) LMS performs a MAC and a parallel add with rounding to efficiently update the coefficients in an FIR filter.
3) SQDST performs a MAC and a subtraction in parallel to calculate Euclidean distance.

Barrel Shifter

Figure 4-7. Barrel Shifter Functional Diagram
[Figure: barrel shifter with data input from DB15-DB0 and the accumulators; shift count taken from T (-16 through 31 range), ASM(4-0) (-16 through 15 range), or the instruction register.]

The barrel shifter is used for scaling operations such as:
1) Prescaling an input data-memory operand or the accumulator value before an ALU operation
2) Performing a logical or arithmetic shift of the accumulator value
3) Normalizing the accumulator
4) Postscaling the accumulator before storing the accumulator value into data memory

The 40-bit shifter is connected as follows.
The input is connected to:
1) DB for a 16-bit data input operand
2) DB and CB for a 32-bit data input operand
3) Either one of the two 40-bit accumulators A and B
The output is connected to:
1) One of the ALU inputs
2) The EB bus through the MSW/LSW write select unit

The barrel shifter can produce a left shift of 0 to 31 bits and a right shift of 0 to 16 bits on the input data. The shift requirements are defined in the shift count field of the instruction, the shift count field (ASM) of status register ST1, or the temporary register T (when it is designated as a shift count register).

The shift count determines how many bits to shift. Positive shift values correspond to left shifts, whereas negative values correspond to right shifts. The shift count is specified as a two's-complement value in several ways, depending on the instruction type. An immediate operand, the accumulator shift mode (ASM) field of ST1, or T can be used to define the shift count:

A 4- or 5-bit immediate value specified in the operand of an instruction represents a shift count value in the -16 to 15 range. For example:
    ADD A, -4, B       ; Add accumulator A (right-shifted 4 bits) to accumulator B
                       ; (one word, one cycle)
    SFTL A, +8         ; Shift (logical) accumulator A eight bits left
                       ; (one word, one cycle)

The ASM value represents a shift count value in the -16 to 15 range and can be loaded by the LD instruction (with an immediate operand or with a data-memory operand). For example:
    ADD A, ASM, B      ; Add accumulator A to accumulator B
                       ; with a shift specified by ASM

The barrel shifter and the exponent encoder normalize the values in an accumulator in a single cycle. The least-significant bits (LSBs) of the output are filled with 0s, and the most-significant bits (MSBs) can be either zero filled or sign extended (all ones for negative numbers), depending on the state of the sign-extension mode bit (SXM) in ST1. Additional shift capabilities enable the processor to perform numerical scaling, bit extraction, extended arithmetic and overflow prevention operations.

Multiplier/Adder Unit

Figure 4-8. Multiplier/Adder Functional Diagram
[Figure: multiplier/adder with inputs from CB15-CB0, DB15-DB0, the T register and the accumulators; status bits OVA/OVB and ZA/ZB.]

The multiplier/adder unit performs a 17 x 17-bit two's-complement multiplication with a 40-bit addition in a single instruction cycle. The multiplier/adder block consists of several elements: a multiplier, an adder, signed/unsigned input control logic, fractional control logic, a zero detector, a rounder (using two's-complement arithmetic), overflow/saturation logic, and a 16-bit temporary storage register (T). The multiplier has two inputs: one input is selected from T, a data-memory operand D, or accumulator A; the other is selected from program memory P, data memory D, accumulator A, or an immediate value C.

The fast, on-chip multiplier allows the '54x to perform operations such as convolution, correlation, and filtering efficiently. In addition, the multiplier and ALU together execute multiply/accumulate (MAC) computations and ALU
operations in parallel in a single instruction cycle. This function is used in determining the Euclidean distance and in implementing symmetrical and adaptive least-mean-square (LMS) filters, which are required for complex DSP algorithms.

The multiplier output can be shifted left by one bit to compensate for the extra sign bit generated by multiplying two 16-bit two's-complement numbers in fractional mode. The adder in the multiplier/adder unit contains a zero detector, a rounder (in two's complement), and overflow/saturation logic. Rounding consists of adding 2^15 to the result and then clearing the lower 16 bits of the destination accumulator. Rounding is performed in some multiply, multiply/accumulate (MAC), and multiply/subtract (MAS) instructions when the suffix R is included with the instruction. The LMS instruction also rounds to minimize quantization errors in updated coefficients.

Multiplier Input Sources

The XM input source to the multiplier is any of the following values:
1) The temporary register (T)
2) A data-memory operand from data bus DB
3) Accumulator A bits 32 - 16
The YM input source to the multiplier is any of the following values:
1) A data-memory operand from data bus DB
2) A data-memory operand from data bus CB
3) A program-memory operand from program bus PB
4) Accumulator A bits 32 - 16

Table 4-5 shows how the multiplier inputs are obtained for several instructions. There are a total of nine combinations of multiplier inputs that are actually used.

Table 4-5. Multiplier Input Selection for Several Instructions
Case 1: MPY #1234h, A
Case 2: MPY[R] *AR2, A
Case 3: MPYA B
Case 4: MACP *AR2, pmad, A
Case 5: MPY *AR2, *AR3, B
Case 6: SQUR *AR2, B
Case 7: MPYA *AR2
Case 8: FIRS *AR2, *AR3, pmad
Case 9: SQUR A, B

For instructions using T as one input, the second input may be obtained as an immediate value, from data memory via a data bus (DB), or from accumulator A.
For instructions using single data-memory operand addressing, one operand is fed into the multiplier via DB. The second operand may come from T, as an immediate value or from program memory via PB, or from accumulator A. For instructions using dual data-memory operand addressing, DB and CB carry the data into the multiplier.

Multiply/Accumulate (MAC) Instructions

MAC instructions use the multiplier's computational bandwidth to simultaneously process two operands. Multiple arithmetic operations can be performed in a single cycle by the multiplier/adder unit.

Compare, Select, and Store Unit (CSSU)

Figure 4-9. Compare, Select, and Store Unit (CSSU)
[Figure: CSSU with inputs from accumulator A and accumulator B and output to EB15-EB0.]

The compare, select, and store unit (CSSU) performs maximum comparisons between the accumulator's high and low words, and selects the larger word in the accumulator to store into data memory.

Data Addressing

The '54x offers seven basic data addressing modes (which will be explained in more detail):
1) Immediate addressing uses the instruction to encode a fixed value.
2) Absolute addressing uses the instruction to encode a fixed address.
3) Accumulator addressing uses accumulator A to access a location in program memory as data.
4) Direct addressing uses seven bits of the instruction to encode the lower seven bits of an address. The seven bits are used with the data-page pointer (DP) or the stack pointer (SP) to determine the actual memory address.
5) Indirect addressing uses the auxiliary registers to access memory.
6) Memory-mapped register addressing uses the memory-mapped registers without modifying either the current data-page pointer (DP) value or the current stack pointer (SP) value.
7) Stack addressing manages adding and removing items from the system stack.
During the execution of instructions using direct, indirect, or memory-mapped register addressing, the data-address generation logic (DAGEN) computes the address of data-memory operands.

Immediate Addressing

In immediate addressing, the instruction syntax contains the specific value of the operand. The syntax for immediate addressing uses a number sign (#) immediately preceding the value or symbol to indicate that it is an immediate value. For example, to load accumulator A with the value 80 in hexadecimal, you would write:
    LD #80h, A

Absolute Addressing

There are four types of absolute addressing:
1) Data-memory address (dmad) addressing: MVDK, MVDM, MVKD, MVMD
2) Program-memory address (pmad) addressing: FIRS, MACD, MACP, MVDP, MVPD
3) Port address (PA) addressing: PORTR, PORTW
4) *(lk) addressing

To copy the value contained at the address labeled SAMPLE to the memory location pointed to by AR5, you write:
    MVKD SAMPLE, *AR5
To copy a word in the program-memory location labeled TABLE to a data-memory location specified by AR7, you write:
    MVPD TABLE, *AR7
To copy a value from the I/O port at port address FIFO to a data-memory location specified by AR5, you write:
    PORTR FIFO, *AR5
To load accumulator A with the value contained in address BUFFER in data space, you write:
    LD *(BUFFER), A

Accumulator Addressing

Accumulator addressing uses the accumulator as an address. This addressing mode is used to address program memory as data. READA transfers a word from a program-memory location specified by accumulator A to a data-memory location, and WRITEA transfers a word from a data-memory location to a program-memory location specified by accumulator A.

Direct Addressing

In direct addressing mode, the instruction contains the lower seven bits of the data-memory address (dma). The 7-bit dma is an address offset that is combined with a base address, taken from either the data-page pointer (DP) or the stack pointer (SP), to form a 16-bit data-memory address.
Using this form of addressing, you can access any of 128 locations in random order without changing the DP or the SP. The syntax for direct addressing uses a symbol or number to specify the offset value. For example, to add the contents of the memory location SAMPLE to accumulator B, provided that the correct base address is in DP or SP, you would write:
    ADD SAMPLE, B

Indirect Addressing

In indirect addressing, any location in the 64K-word data space can be accessed via a 16-bit address contained in an auxiliary register. The '54x has eight 16-bit auxiliary registers (AR0-AR7). Indirect addressing is used mainly when there is a need to step through sequential locations in memory in fixed-size steps. When memory is addressed with indirect addressing, the auxiliary register and the address can be optionally modified by a decrement, an increment, an offset, or an index. Special modes offer circular and bit-reversed addressing.

Indirect addressing is flexible enough not only to read or write a single 16-bit data operand from memory with one instruction, but also to access two data-memory locations with one instruction. Accesses of two data-memory locations include reads of two independent memory locations, reads and writes of two consecutive memory locations, and a read of one memory location combined with a write to another memory location.

Single-Operand Address Modifications

You can modify the addresses you use in instructions before or after they are accessed, or you can leave them unchanged. You can modify them by incrementing or decrementing the address by 1, adding a 16-bit offset, or indexing with the value in AR0. These three types of action, combined with taking the action either before or after the access, plus the ways of leaving the address unchanged, make a total of 16 addressing types, each assigned to a value of MOD, the 4-bit modification field in the encoding of an instruction using indirect addressing.
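The address modifications described above can be modelled in a few lines. The Python sketch below is a toy model, not '54x code: the class and function names are hypothetical, and only a handful of the modification types are shown (unchanged, post-increment, post-decrement, pre-increment, and post-index by AR0), each annotated with the corresponding '54x assembler operand syntax. Addresses are assumed to wrap at 16 bits.

```python
class AuxRegister:
    """Toy model of one 16-bit auxiliary register (ARx)."""

    def __init__(self, value: int = 0):
        self.value = value & 0xFFFF

    def unchanged(self) -> int:
        """*ARx: use the address, leave ARx as it is."""
        return self.value

    def post_increment(self) -> int:
        """*ARx+: use the address, then add 1 to ARx."""
        addr = self.value
        self.value = (self.value + 1) & 0xFFFF
        return addr

    def post_decrement(self) -> int:
        """*ARx-: use the address, then subtract 1 from ARx."""
        addr = self.value
        self.value = (self.value - 1) & 0xFFFF
        return addr

    def pre_increment(self) -> int:
        """*+ARx: add 1 to ARx first, then use the new address."""
        self.value = (self.value + 1) & 0xFFFF
        return self.value

    def post_index(self, ar0: int) -> int:
        """*ARx+0: use the address, then add the value of AR0 to ARx."""
        addr = self.value
        self.value = (self.value + ar0) & 0xFFFF
        return addr


def sum_block(memory: dict, start: int, count: int) -> int:
    """Step through `count` consecutive words with post-increment and sum them."""
    ar = AuxRegister(start)
    total = 0
    for _ in range(count):
        total += memory[ar.post_increment()]
    return total


if __name__ == "__main__":
    memory = {0x100: 1, 0x101: 2, 0x102: 3}
    print(sum_block(memory, start=0x100, count=3))  # sums the three words
```

The point of the model is that the address update is a side effect of the access itself, so a loop that walks a buffer needs no separate pointer-arithmetic step.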
Table 5-4 lists the types of single data-memory operand addressing, along with the value of MOD, the assembler syntax, and the function for each type.

Table 5-4. Indirect Addressing Types With a Single Data-Memory Operand

MOD Field   Operand Syntax   Function                            Description
0000 (0)    *ARx             addr = ARx                          ARx contains the data-memory address.
0001 (1)    *ARx-            addr = ARx; ARx = ARx - 1           After access, the address in ARx is decremented.
0010 (2)    *ARx+            addr = ARx; ARx = ARx + 1           After access, the address in ARx is incremented.
0011 (3)    *+ARx            addr = ARx + 1; ARx = ARx + 1       The address in ARx is incremented before its use.
0100 (4)    *ARx-0B          addr = ARx; ARx = B(ARx - AR0)      After access, AR0 is subtracted from ARx with reverse carry (rc) propagation.
0101 (5)    *ARx-0           addr = ARx; ARx = ARx - AR0         After access, AR0 is subtracted from ARx.
0110 (6)    *ARx+0           addr = ARx; ARx = ARx + AR0         After access, AR0 is added to ARx.
0111 (7)    *ARx+0B          addr = ARx; ARx = B(ARx + AR0)      After access, AR0 is added to ARx with reverse carry (rc) propagation.
1000 (8)    *ARx-%           addr = ARx; ARx = circ(ARx - 1)     After access, the address in ARx is decremented with circular addressing.
1001 (9)    *ARx-0%          addr = ARx; ARx = circ(ARx - AR0)   After access, AR0 is subtracted from ARx with circular addressing.

Program Memory Addressing

Program memory is usually addressed on a '54x device with the program counter (PC). With some instructions, however, absolute addressing may be used to access data items that have been stored in program memory. The program counter (PC), which is used to fetch individual instructions, is loaded by the program-address generation logic (PAGEN). Typically, the PAGEN increments the PC as sequential instructions are fetched. However, the PAGEN may load the PC with a non-sequential value as a result of some instructions or other operations. Operations that cause a discontinuity include branches, calls, returns, conditional operations, single-instruction repeats, multiple-instruction repeats, reset, and interrupts. For calls and interrupts, the current program counter (PC) is saved onto
the stack, which is referenced by the stack pointer (SP). When the called function or interrupt service routine is finished, the PC value that was saved is restored from the stack via a return instruction.

Pipeline Operation

[Example 7-2. Branch Instruction in the Pipeline: figure not reproduced]

An instruction pipeline consists of a sequence of operations that occur during the execution of an instruction. The '54x pipeline has six levels:
1) Program prefetch. The program address bus (PAB) is loaded with the address of the next instruction to be fetched.
2) Program fetch. An instruction word is fetched from the program bus (PB) and loaded into the instruction register (IR). This completes an instruction fetch sequence that consists of this and the previous cycle.
3) Decode. The contents of the instruction register (IR) are decoded to determine the type of memory access operation and the control sequence at the data-address generation unit (DAGEN) and the CPU.
4) Access. DAGEN outputs the read operand's address on the data address bus DAB. If a second operand is required, the other data address bus CAB is also loaded with an appropriate address. Auxiliary registers in indirect addressing mode and the stack pointer (SP) are also updated. This is considered the first stage of the two-stage operand read sequence.
5) Read. The read data operand(s), if any, are read from the data buses DB and CB. This completes the two-stage operand read sequence. At the same time, the two-stage operand write sequence begins. The data address of the write operand, if any, is loaded into the data write address bus (EAB). For memory-mapped registers, the read data operand is read from memory and written into the selected memory-mapped registers using the DB.
6) Execute. The operand write sequence is completed by writing the data using the data write bus (EB). The instruction is executed in this phase.
At each level, an independent operation occurs. Because these operations are independent, from one to six instructions can be active in any given cycle, each instruction at a different stage of completion. Typically, the pipeline is full with a sequential set of instructions, each at one of the six stages. When a program counter (PC) discontinuity occurs, such as during a branch, call, or return, one or more stages of the pipeline may be temporarily unused.

On-Chip Peripherals

The '54x device has these on-chip peripheral options:
1) General-purpose I/O pins
2) Software-programmable wait-state generator
3) Programmable bank-switching logic
4) Host port interface (HPI)
5) Hardware timer
6) Clock generator
7) Serial ports (synchronous serial ports, buffered serial ports, time-division multiplexed (TDM) serial ports)

External Bus Interface

The '54x can address up to 64K words of data memory, 64K words of program memory, and up to 64K words of 16-bit parallel I/O ports. Accesses to either external memory or I/O ports take place through the external interface. Individual space-select signals (DS, PS, and IS) allow selection of physically separate spaces.

End of Lecture 1

In this first lecture I have only discussed the architecture of the TMS320C54x internal hardware. Next time I will discuss the programming of the DSP chip. Be aware that our DSP chip is implemented on a modular board which includes an A/D (analog-to-digital) and a D/A (digital-to-analog) converter. With this board we can receive analog signals from a wave generator, digitally process them, and create an analog output signal which we can display on an oscilloscope. Moreover, the board is connected to a PC for easy programming and debugging of the DSP chip. In the next lecture we will discuss this topic in more detail. This completes the first lecture. I will now hand out the first homework assignment, which is due next week.

This note was uploaded on 11/06/2010 for the course EE 113 taught by Professor Walker during the Spring '08 term at UCLA.
