In addition a number of organizational features were


...correct, exactly the right instruction sequence is fetched from memory; if it turns out to be wrong and the branch should not have been taken, the branch is executed as an 'unbranch' instruction to return to the previous sequential flow.

Branch Statistics

The effectiveness of the jump trace buffer depends on the statistics of typical branch behaviour. Typical figures are shown in Table 14.2.

Table 14.2  AMULET2 branch prediction statistics.

    Prediction algorithm   Correct   Incorrect   Redundant fetches
    Sequential             33%       67%         2 per branch (ave.)
    Trace buffer           67%       33%         1 per branch (ave.)

In the absence of the jump trace buffer, the default sequential fetch pattern is equivalent to predicting that all branches are not taken. This is correct for one-third of all branches, and incorrect for the remaining two-thirds. A jump trace buffer with around 20 entries reverses these proportions, correctly predicting around two-thirds of all branches and mispredicting or failing to predict around one-third.

Although the depth of prefetching beyond a branch is non-deterministic, a prefetch depth of around three instructions is observed on AMULET2. The fetched instructions are used when the branch is predicted correctly, but are discarded when the branch is mispredicted or not predicted. The jump trace buffer therefore reduces the average number of redundant fetches per branch from two to one. Since branches occur around once every five instructions in typical code, the jump trace buffer may be expected to reduce the instruction fetch bandwidth by around 20% and the total memory bandwidth by 10% to 15%.

Where the system performance is limited by the available memory bandwidth, this saving translates directly into improved performance; in any case it represents a power saving due to the elimination of redundant activity.

'Halt'

Unlike some other microprocessors, the ARM does not have an explicit 'Halt' instruction. Instead, when a program can find no more useful work to do it usually enters an idle loop, executing:

    B    .        ; loop until interrupted

Here the '.' denotes the current PC, so the branch target is the branch instruction itself, and the progra...
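
As a rough check on the figures in the Branch Statistics section, the sketch below recomputes the redundant fetches per branch and the resulting fetch saving. It assumes a simple model that the text does not spell out: every mispredicted or unpredicted branch discards the full prefetch depth of about three instructions, and one branch occurs per five executed instructions. The function names are illustrative only and do not come from the source.

```python
def redundant_fetches_per_branch(mispredict_rate, prefetch_depth=3):
    """Average number of discarded prefetches per branch.

    Assumption: each mispredicted (or unpredicted) branch throws away
    `prefetch_depth` instructions; correctly predicted branches waste none.
    """
    return mispredict_rate * prefetch_depth


def fetch_saving(instructions_per_branch=5,
                 redundant_before=2.0,
                 redundant_after=1.0):
    """Return the fetch saving expressed two ways.

    First value: saved fetches relative to executed (useful) instructions.
    Second value: saved fetches relative to all fetches made without the
    jump trace buffer (useful plus redundant).
    """
    saved = redundant_before - redundant_after
    relative_to_useful = saved / instructions_per_branch
    relative_to_total = saved / (instructions_per_branch + redundant_before)
    return relative_to_useful, relative_to_total


if __name__ == "__main__":
    # Sequential fetch mispredicts about two-thirds of branches,
    # the jump trace buffer about one-third (Table 14.2).
    sequential = redundant_fetches_per_branch(2 / 3)
    trace_buffer = redundant_fetches_per_branch(1 / 3)
    print(f"sequential:   {sequential:.1f} redundant fetches per branch")
    print(f"trace buffer: {trace_buffer:.1f} redundant fetches per branch")

    useful, total = fetch_saving(5, sequential, trace_buffer)
    print(f"saving vs executed instructions: {useful:.0%}")
    print(f"saving vs all fetches:           {total:.0%}")
```

Under these assumptions the model reproduces the two and one redundant fetches per branch given in Table 14.2. The one saved fetch per branch is 20% of the five executed instructions, in line with the "around 20%" quoted above, and roughly 14% of the seven fetches made per branch without the buffer; the exact percentage depends on which baseline is used.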

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
