Since load multiple instructions transfer word


(Clocks Per Instruction - a measure of how much work a processor does in a clock cycle).

Improved performance

The rationale which leads from a requirement for higher performance to the need to increase the number of pipeline stages from three (as in the ARM7TDMI) to five, and to change the memory interface to employ separate instruction and data memories, was discussed in Section 4.2 on page 78.

The 5-stage ARM9TDMI pipeline owes a lot to the StrongARM pipeline described in Section 12.3 on page 327. (The StrongARM has not been included as a core in this chapter as it has limited applicability as a stand-alone core.)

The ARM9TDMI core organization was illustrated in Figure 4.4 on page 81. The principal difference between the ARM9TDMI and the StrongARM core (illustrated in Figure 12.8 on page 330) is that while StrongARM has a dedicated branch adder which operates in parallel with the register read stage, ARM9TDMI uses the main ALU for branch target calculations. This gives ARM9TDMI an additional clock cycle penalty for a taken branch, but results in a smaller and simpler core, and avoids a very critical timing path (illustrated in Figure 12.9 on page 331) which is present in the StrongARM design. The StrongARM was designed for a particular process technology where this timing path could be carefully managed, whereas the ARM9TDMI is required to be readily portable to new processes where such a critical path could easily compromise the maximum usable clock rate.

The operation of the 5-stage ARM9TDMI pipeline is illustrated in Figure 9.7 on page 261, where it is compared with the 3-stage ARM7TDMI pipeline. The figure shows how the major processing functions of the processor are redistributed across the additional pipeline stages in order to allow the clock frequency to be doubled (approximately) on the same process technology.

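The trade-off between a longer taken-branch penalty and a higher usable clock rate can be put in rough numbers. The sketch below is a minimal model, not ARM data: the function names, the branch mix, the penalties of one and two cycles, and both clock rates are assumptions chosen only for illustration. The one point taken from the text is that calculating branch targets in the main ALU costs one additional clock cycle per taken branch.

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, taken_penalty_cycles):
    """Average clocks per instruction once taken-branch stalls are included."""
    return base_cpi + branch_fraction * taken_fraction * taken_penalty_cycles


def run_time_ns(instructions, cpi, clock_mhz):
    """Execution time in nanoseconds: instructions x CPI x clock period."""
    clock_period_ns = 1000.0 / clock_mhz
    return instructions * cpi * clock_period_ns


# Every figure below is an illustrative assumption, not a measured value.
BRANCH_FRACTION = 0.2   # assumed share of instructions that are branches
TAKEN_FRACTION = 0.5    # assumed share of those branches that are taken
INSTRUCTIONS = 1_000_000

# Design A: dedicated branch adder, assumed 1-cycle taken-branch penalty.
cpi_a = effective_cpi(1.0, BRANCH_FRACTION, TAKEN_FRACTION, taken_penalty_cycles=1)
# Design B: branch target from the main ALU, one extra cycle per taken branch.
cpi_b = effective_cpi(1.0, BRANCH_FRACTION, TAKEN_FRACTION, taken_penalty_cycles=2)

# Assume the critical path in design A limits its clock on a new process.
time_a = run_time_ns(INSTRUCTIONS, cpi_a, clock_mhz=180.0)
time_b = run_time_ns(INSTRUCTIONS, cpi_b, clock_mhz=200.0)

print(f"design A: CPI {cpi_a:.2f}, run time {time_a:.0f} ns")
print(f"design B: CPI {cpi_b:.2f}, run time {time_b:.0f} ns")
```

In this model the simpler design comes out ahead whenever its clock rate exceeds that of the other design by more than the ratio cpi_b / cpi_a, which is about 9% with the assumed branch mix. Different workloads would shift that break-even point.
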
Redistributing the execution functions (register read, shift, ALU, register write) is not all that is required to achieve this higher clock rate. The processor must also be able to access the instruction memory in h...