The execute stage uses a combination of improved

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: ycle, so the ARM9TDMI moved to a Harvard memory organization to release more bandwidth. The ARM9TDMI uses its instruction memory on (almost) every clock cycle. Although its data memory is only around 50% loaded, it is hard to exploit this to improve its CPI. The instruction bandwidth must be increased somehow. The approach adopted in the ARMIOTDMI is to employ 64-bit memories. This effectively removes the instruction bandwidth bottleneck and enables a number of CPI-improving features to be added to the processor organization: Branch prediction: the ARMIOTDMI branch prediction logic goes beyond what is required simply to maintain the pipeline efficiency as discussed above. Because instructions are fetched at a rate of two per clock cycle, the branch pre diction unit (which is in the fetch pipeline stage) can often recognize branches before they are issued and effectively remove them from the instruction stream, reducing the cycle cost of the branch to zero. The ARMIOTDMI employs a static branch prediction mechanism: conditional branches that branch backwards are predicted to be taken; those that branch forwards are predicted not to be taken. Non-blocking load and store execution: a load or store instruction that cannot complete in a single memory cycle, either because it is referencing slow memory or because it is transferring multiple registers, does not stall the execution pipe line until an operand dependency arises. The 64-bit data memory enables load and store multiple instructions to transfer two registers in each clock cycle. The non-blocking load and store logic requires independent register file read and write ports, and the 64-bit load and store multiple instructions require two of each. The ARMIOTDMI register bank therefore has four read ports and three write ports. Taken together these features enable the ARMIOTDMI to achieve a dhrystone 2.1 MIPS per MHz figure of 1.25, which may be compared with 0.9 for the ARM7TDMI and 1.1 for the ARM9TDMI. These figures are a direct reflection of the respectiv...
View Full Document

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.

Ask a homework question - tutors are online