DSP routines are dominated by multiply-accumulate computations. A code sequence might be:
Example 13.1

            ; initialize
    LOOP    LDR   r0, [r3], #4      ; get next data value
            LDR   r1, [r4], #4      ; and next coefficient
            MLA   r5, r0, r1, r5    ; accumulate next term
            SUBS  r2, r2, #1        ; decrement loop counter
            BNE   LOOP              ; loop if not finished

This loop requires seven instruction fetches (including two in the branch shadow) and two data fetches. On a standard ARM core the multiply takes a data-dependent number of computation cycles, evaluating two bits per cycle. If we assume 16-bit coefficients and order the multiplication so that the coefficient determines the number of cycles, the multiply will require eight internal cycles. Each load also requires an internal cycle.

If the data and coefficients are always in on-chip memory, the loop takes 19 clock cycles (7 instruction fetches + 2 data fetches + 8 multiply cycles + 2 load cycles) if executed from on-chip RAM, with 14 additional wait state cycles (two for each of the seven instruction fetches) if executed from two wait state off-chip RAM, giving 33 cycles in total. The use of on-chip RAM therefore speeds the loop up by around 75%. Note that this speed-up is a lot less than the factor of three that might be expected from simply comparing memory access speeds.

Exercise 13.1.1 Estimate the power saving that results from using the on-chip RAM. (Assume that on-chip accesses cost 2 nJ and off-chip accesses 10 nJ.)

Repeat the estimates assuming the external memory is restricted to 16 bits wide.

Repeat the estimates for both 32- and 16-bit external memory assuming that the processor supports Thumb mode (and the program is rewritten to use Thumb instructions) and high-speed multiplication.

Survey the designs presented in this chapter and summarize the power-saving techniques employed on the system chips described here.

Firstly, all the system chips display a high level of integration which saves power whenever two modules on the same chip exchange data. All the chips are based around an ARM core which is, itself, very power-efficient. Though none of the chips has enough on-chip memory to remove...
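The cycle estimate for the multiply-accumulate loop in Example 13.1 can be checked with a few lines of Python. This is only a sketch of the arithmetic given in the text: the function and parameter names are hypothetical, and the energy function is a first-order assumption that counts memory accesses only, using the per-access figures quoted in Exercise 13.1.1 and ignoring all other sources of power consumption.

```python
def loop_cycles(instr_fetches=7, data_fetches=2, coeff_bits=16,
                bits_per_cycle=2, loads=2, instr_wait_states=0):
    """Clock cycles for one iteration of the multiply-accumulate loop.

    Assumes (as in the text) that data and coefficients are always in
    on-chip memory, so only instruction fetches can incur wait states.
    """
    multiply_internal = coeff_bits // bits_per_cycle  # 16 bits at 2 bits/cycle = 8
    load_internal = loads                             # one internal cycle per load
    base = instr_fetches + data_fetches + multiply_internal + load_internal
    return base + instr_fetches * instr_wait_states


def access_energy_nj(instr_fetches=7, data_fetches=2,
                     instr_access_nj=2.0, data_access_nj=2.0):
    """Memory-access energy per iteration in nJ (assumed first-order model)."""
    return instr_fetches * instr_access_nj + data_fetches * data_access_nj


on_chip = loop_cycles(instr_wait_states=0)
off_chip = loop_cycles(instr_wait_states=2)
speedup = off_chip / on_chip

print(on_chip, off_chip, round(speedup, 2))
print(access_energy_nj(), access_energy_nj(instr_access_nj=10.0))
```

Under these assumptions the ratio of 33 to 19 cycles is about 1.74, which matches the "around 75%" speed-up quoted in the text. The energy figures are one possible starting point for Exercise 13.1.1 rather than a worked answer.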
This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
- Spring '09