lculation.

Pipeline features

Of particular note in this pipeline structure are:

- The need for three register read ports to enable register-controlled shifts and store with base plus index addressing to issue in one cycle.
- The need for two register write ports to allow load with auto-index to issue in one cycle.
- The address incrementer in the execute stage to support load and store multiple instructions.
- The large number of sources for the next PC value.

This last point reflects the many ways the ARM architecture allows the PC to be modified as a result of its visibility as r15 in the register bank.

PC modification

The most common source of the next PC is the PC incrementer in the fetch stage; this value is available at the start of the next cycle, allowing one instruction to be fetched each cycle.

The next most common source of the PC is the result of a branch instruction. The dedicated branch displacement adder computes the target address during the instruction decode stage, causing a taken branch to incur a one-cycle penalty in addition to the cycle taken to execute the branch itself. Note that the displacement addition can take place in parallel with the instruction decode since the offset is a fixed field within the instruction; if the instruction turns out not to be a branch the computed target is simply discarded.

Figure 12.8 StrongARM core pipeline organization.

Figure 12.9 StrongARM loop test pipeline behaviour.

The operation of the pipeline during this sequence is shown in Figure 12.9. Note how the condition codes become valid just in time to avoid increasing the branch penalty; the instruction immediately following the branch is fetched and discarded, but then the processor begins fetching from the branch target. The target address is generated concurrently with the condition codes that determine whether or not it should be used.
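The cost of the one-cycle taken-branch penalty can be estimated with a rough cycle count. The following is a minimal sketch, not a model of the real StrongARM pipeline: the function name loop_cycles and its parameters are hypothetical, and it assumes every instruction (including the branch) issues in one cycle, the loop ends in a conditional branch back to its top, and pipeline fill and all other stalls are ignored.

```python
def loop_cycles(body_instructions: int, iterations: int, taken_penalty: int = 1) -> int:
    """Estimate the cycles taken by a loop that ends in a backward branch.

    Assumptions (illustrative only): single-cycle issue for every
    instruction, one extra cycle per taken branch, no other stalls.
    """
    if body_instructions < 0 or iterations < 1:
        raise ValueError("need a non-negative body size and at least one iteration")
    per_iteration = body_instructions + 1   # loop body plus the branch itself
    taken_branches = iterations - 1         # the final pass falls through
    return per_iteration * iterations + taken_penalty * taken_branches


# A two-instruction body plus its branch, run ten times:
# 3 issue cycles per pass, plus one discarded fetch for each of the 9 taken branches.
print(loop_cycles(body_instructions=2, iterations=10))
```

Under these assumptions the penalty matters most for short loops, where the single discarded fetch is a large fraction of each pass.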
The same one-cycle penalty applies to branch and link instructions, which take the branch in the same way but use the execute and write stages to compute pc + 4 and write it to r14, the...
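The speculative target calculation and the link value can be sketched in software. This is an illustrative model only: the function name decode_branch is hypothetical, the field positions follow the standard ARM B/BL encoding (bits 27 to 25 equal to 101, bit 24 as the link bit, a signed 24-bit word offset in bits 23 to 0), the target is taken as the branch address plus 8 plus the offset shifted left by two, and "pc + 4" is assumed here to mean the address of the instruction following the branch. The condition field is ignored.

```python
MASK32 = 0xFFFFFFFF


def decode_branch(instruction: int, address: int):
    """Return (target, link) for an ARM B/BL word, or None if it is not a branch.

    link is None for a plain B and the return address for BL.
    """
    # The offset is a fixed field, so the addition can start before the
    # instruction is known to be a branch (done "in parallel with decode").
    offset = instruction & 0x00FFFFFF
    if offset & 0x00800000:                  # sign-extend the 24-bit field
        offset -= 1 << 24
    speculative_target = (address + 8 + (offset << 2)) & MASK32

    is_branch = ((instruction >> 25) & 0b111) == 0b101
    if not is_branch:
        return None                          # computed target is simply discarded

    has_link = ((instruction >> 24) & 1) == 1
    link = (address + 4) & MASK32 if has_link else None   # value written to r14
    return speculative_target, link


print(decode_branch(0xEB000000, 0x8000))     # BL with a zero offset field
print(decode_branch(0xE1A00000, 0x8000))     # a data-processing word, not a branch
```

Computing the target before the branch test mirrors the hardware's approach of doing the addition unconditionally and throwing the result away when it is not needed.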
This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
- Spring '09