The pc path which forwards pc 4 from the fetch stage

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: link register. It also applies to the normal subroutine return instruction, MOV pc, lr, where the target address comes from the register file rather than from the branch displacement adder, but it is still available at the end of the decode stage. If the return address must be computed, for example when returning from an exception, there is a two-cycle penalty since the ALU result is only available at the end of the execute stage, and where the PC is loaded from memory (from a jump table, or subroutine return from stack) there is a three-cycle penalty. Forwarding paths The execution pipeline includes three forwarding paths to each register operand to avoid stalls when read-after-write hazards occur. Values are forwarded from: 1. the ALU result; 2. data loaded from the data cache; 3. the buffered ALU result. These paths remove all data-dependency stalls except when a loaded data value is used by the following instruction, in which case a single cycle stall is required. 332 ARM CPU Cores Abort recovery It might seem that one of these paths could be avoided if the ALU result were written into the register file in the following stage rather than buffering it and delaying the write to the last stage. The merit of the delayed scheme is that data aborts can occur during the data cache access, perhaps requiring remedial action to recover the base register value (for instance when the base register is overwritten in a load multiple sequence before the fault is generated). This scheme not only allows recovery but supports the cleanest recovery mechanism, leaving the base register value as it was at the start of the instruction and removing the need for the exception handler to undo any auto-indexing. A feature of particular note is the StrongARM's multiplication unit which, despite the processor's high clock rate, computes at the rate of 12 bits per cycle, giving the product of two 32-bit operands in one to three clock cycles. The high-speed multiplier gives StrongARM considerable potential in applications which require significant digital signal processing performance....
View Full Document

Ask a homework question - tutors are online