Reducing the cpi again repeating the argument

...val of the second word by half a clock cycle allows a 32-bit bus to be used and can save area since routing a 32-bit bus requires less area than routing a 64-bit bus.

Core organization

The ARM8 processor consists of a prefetch unit and an integer datapath. The prefetch unit is responsible for fetching instructions from memory and buffering them (in order to exploit the double-bandwidth memory). It then supplies up to one instruction per clock cycle to the integer unit, along with its PC value. The prefetch unit is responsible for branch prediction and uses static prediction based on the branch direction (backwards branches are predicted 'taken', whereas forwards branches are predicted 'not taken') to attempt to guess where the instruction stream will go; the integer unit will compute the exact stream and issue corrections to the prefetch unit where necessary. The overall organization of the core is shown in Figure 9.5 on page 258.

The double-bandwidth memory will normally be on-chip, in the form of a cache memory on a general-purpose device such as the ARM810 (described in Section 12.2 on page 323) or as addressable RAM in an embedded application. The remaining memory may comprise conventional, single-bandwidth devices, but without some fast memory in the system the ARM8 core will show little benefit over the simpler ARM cores.

Pipeline organization

The processor uses a 5-stage pipeline with the prefetch unit occupying the first stage and the integer unit using the remaining four stages:

1. Instruction prefetch.
2. Instruction decode and register read.
3. Execute (shift and ALU).
4. Data memory access.
5. Write-back results.

Figure 9.5 ARM8 processor core organization.

Integer unit organization

The organization of the ARM8 integer unit is illustrated in Figure 9.6 on page 259.
The instruction stream and associated PC values are supplied by the prefetch unit through the interface shown at the top of the figure; the system control coprocessor connects through the dedicated coprocessor...
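
The static prediction rule described above (backwards branches predicted 'taken', forwards branches predicted 'not taken') can be illustrated with a small sketch. This is not the ARM8 hardware logic. The function names, the 4-byte instruction size, and the choice to treat a branch to its own address as backwards are all assumptions made for illustration, and the ARM convention of computing offsets relative to a PC that runs ahead of the branch is ignored.

```python
from typing import Optional

INSTR_BYTES = 4  # assumption: fixed 32-bit instructions


def predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static direction-based prediction: backwards means taken.

    Assumption: a branch to its own address counts as backwards.
    """
    return target_pc <= branch_pc


def predicted_next_pc(branch_pc: int, target_pc: int) -> int:
    """Address the prefetch unit would fetch from after the branch."""
    if predict_taken(branch_pc, target_pc):
        return target_pc
    return branch_pc + INSTR_BYTES


def correction(branch_pc: int, target_pc: int,
               actual_taken: bool) -> Optional[int]:
    """Model the integer unit checking the prediction.

    Returns None when the prediction was right, otherwise the
    address the prefetch unit should restart from.
    """
    if predict_taken(branch_pc, target_pc) == actual_taken:
        return None
    return target_pc if actual_taken else branch_pc + INSTR_BYTES
```

A loop-closing branch at 0x100 that jumps back to 0x80 is predicted taken, so a correction is needed only on the final iteration, when the loop exits and fetching must resume at 0x104.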

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
