ARM.SoC.Architecture



...ne hazards' on page 22.) Forwarding paths allow results to be passed between stages as soon as they are available, and the 5-stage ARM pipeline requires each of the three source operands to be forwarded from any of three intermediate result registers as shown in Figure 4.4 on page 81.

There is one case where, even with forwarding, it is not possible to avoid a pipeline stall. Consider the following code sequence:

    LDR  rN, [ . . ]     ; load rN from somewhere
    ADD  r2, r1, rN      ; and use it immediately

The processor cannot avoid a one-cycle stall as the value loaded into rN only enters the processor at the end of the buffer/data stage and it is needed by the following instruction at the start of the execute stage. The only way to avoid this stall is to encourage the compiler (or assembly language programmer) not to put a dependent instruction immediately after a load instruction.

Since the 3-stage pipeline ARM cores are not adversely affected by this code sequence, existing ARM programs will often use it. Such programs will run correctly on 5-stage ARM cores, but could probably be rewritten to run faster by simply reordering the instructions to remove these dependencies.

Figure 4.4 ARM9TDMI 5-stage pipeline organization.

PC generation

The behaviour of r15, as seen by the programmer and described in 'PC behaviour' on page 78, is based on the operational characteristics of the 3-stage ARM pipeline. The 5-stage pipeline reads the instruction operands one stage earlier in the pipeline, and would naturally get a different value (PC+4 rather than PC+8). As this would lead to unacceptable code incompatibilities, however, the 5-stage pipeline ARMs all 'emulate' the behaviour of the older 3-stage designs.
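The load-use stall described earlier, and the effect of reordering, can be illustrated with a small model. This is only a sketch under stated assumptions, not a description of the real interlock logic: it assumes every instruction otherwise takes one cycle, that forwarding hides all other dependencies, and that the only penalty is one cycle when an instruction reads the register loaded by the LDR immediately before it. The `Instr` type, the `load_use_stalls` function, and the register choices (r3 standing in for rN, r0 as the load base, and an unrelated SUB) are hypothetical and do not come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str               # mnemonic, e.g. "LDR" or "ADD"
    dest: str             # destination register
    sources: tuple = ()   # registers read by this instruction


def load_use_stalls(program):
    """Count one-cycle stalls caused by using a loaded value immediately.

    Assumption: a stall occurs only when an instruction reads the
    destination of the LDR directly before it; all other dependencies
    are taken to be covered by forwarding.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "LDR" and prev.dest in cur.sources:
            stalls += 1
    return stalls


# Hypothetical sequence: r3 plays the role of rN from the text.
dependent = [
    Instr("LDR", "r3", ("r0",)),        # load r3 from somewhere
    Instr("ADD", "r2", ("r1", "r3")),   # and use it immediately
    Instr("SUB", "r5", ("r4", "r6")),   # unrelated work
]

# Same instructions, with the unrelated SUB moved into the slot after the load.
reordered = [dependent[0], dependent[2], dependent[1]]

print(load_use_stalls(dependent))   # 1
print(load_use_stalls(reordered))   # 0
```

Moving the SUB is legal here only because it neither reads nor writes any register used by the LDR or the ADD; a compiler or programmer would need to check this before reordering real code.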
Referring to Figure 4.4, the incremented PC value from the fetch stage is fed directly to the register file in the decode stage, bypassing the pipeline register between the two stages. PC+4 for the next instruction is equal to PC+8 for the current...
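The relationship between the two PC values can be checked with a few lines of arithmetic. The sketch below assumes 4-byte ARM instructions fetched sequentially and treats "PC" as the address of the instruction in question; the function names and the example address are hypothetical and are not taken from the text.

```python
INSTRUCTION_BYTES = 4  # assumption: 32-bit ARM instructions, sequential fetch


def r15_three_stage(instr_addr):
    """Value of r15 the programmer sees, as defined by the 3-stage pipeline."""
    return instr_addr + 2 * INSTRUCTION_BYTES   # PC+8


def r15_one_stage_earlier(instr_addr):
    """Value obtained if r15 is read one stage earlier, with no correction."""
    return instr_addr + INSTRUCTION_BYTES       # PC+4


addr = 0x8000                          # arbitrary example address
next_addr = addr + INSTRUCTION_BYTES   # address of the following instruction

# PC+4 for the next instruction equals PC+8 for the current one.
assert r15_one_stage_earlier(next_addr) == r15_three_stage(addr)

print(hex(r15_three_stage(addr)))         # 0x8008
print(hex(r15_one_stage_earlier(addr)))   # 0x8004
```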