G22.2243-001 High Performance Computer Architecture
Lecture 4: Branch Prediction, Scoreboarding
September 28, 2004
Slide 2: Outline
- Announcements
  - Assignment 1 due back October 5th
- Reducing the impact of control hazards
  - Branch prediction buffers
  - Branch target buffers
  - Return address stacks
- Reducing the impact of data hazards
  - Scoreboarding
[Hennessy/Patterson CA:AQA (3rd Edition): Appendix A, Chapter 3]
Slide 3: (Review) Reducing Branch Stalls
- Baseline: stall the pipeline on a branch instruction
  - 3-cycle stall on the 5-stage RISC pipeline (2 if not taken)
- Faster stall pipeline
  - Move branch direction/target determination to the ID stage
  - 1-cycle stall
- Predict branch not taken, squash instructions as necessary
  - 0-cycle stall on not-taken branches, 1-cycle stall for taken branches
- Predict branch taken, squash instructions as necessary
  - Needs the target address, so 1-cycle stall
  - Can reduce to 0 with additional hardware: a Branch Target Buffer
- Branch delay slots
  - The instruction after the branch always executes
  - 0-cycle stall if a useful instruction can be found (from above/below the branch)
  - Otherwise, effectively a 1-cycle stall
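The per-branch penalties above can be summarized in a small helper function. This is an illustrative sketch (the enum and function names are not from the lecture), assuming the "fast" schemes resolve branches in the ID stage while the slow baseline resolves them later in the pipeline:

    /* Per-branch stall penalty, in cycles, for each static scheme on the
     * 5-stage pipeline described above (illustrative sketch). */
    enum scheme { STALL_SLOW, STALL_FAST, PREDICT_NOT_TAKEN, PREDICT_TAKEN, DELAYED };

    static int branch_penalty(enum scheme s, int taken, int delay_slot_filled)
    {
        switch (s) {
        case STALL_SLOW:        return taken ? 3 : 2; /* branch resolved late */
        case STALL_FAST:        return 1;             /* resolved in ID, always one bubble */
        case PREDICT_NOT_TAKEN: return taken ? 1 : 0; /* squash only when taken */
        case PREDICT_TAKEN:     return 1;             /* target unknown until ID (no BTB) */
        case DELAYED:           return delay_slot_filled ? 0 : 1;
        }
        return 0;
    }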
Slide 4: Evaluating Branch Alternatives
- Assumptions:
  - 14% of instructions are branches
  - 30% of branches are not taken
  - 50% of delay slots can be filled with useful instructions

  Scheduling scheme     Branch penalty   CPI    Speedup v. unpipelined   Speedup v. stall
  Slow stall pipeline   3                1.42   3.5                      1.0
  Fast stall pipeline   1                1.14   4.4                      1.26
  Predict taken         1                1.14   4.4                      1.26
  Predict not taken     0.7              1.10   4.5                      1.29
  Delayed branch        0.5              1.07   4.7                      1.34

- A compiler can reorder instructions to further improve speedup
- Pipeline speedup = Pipeline depth / (1 + Branch frequency × Branch penalty)
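The table can be reproduced (up to rounding) from the assumptions above. A minimal check, assuming an ideal CPI of 1 and a pipeline depth of 5; the 0.7 and 0.5 penalties come from 70% taken × 1 cycle and 50% unfilled delay slots × 1 cycle:

    #include <stdio.h>

    int main(void)
    {
        const double branch_freq = 0.14, depth = 5.0;
        const char  *name[]    = { "Slow stall", "Fast stall", "Predict taken",
                                   "Predict not taken", "Delayed branch" };
        const double penalty[] = { 3.0, 1.0, 1.0, 0.7, 0.5 };

        double cpi_stall = 1.0 + branch_freq * penalty[0];   /* baseline CPI = 1.42 */
        for (int i = 0; i < 5; i++) {
            double cpi = 1.0 + branch_freq * penalty[i];
            printf("%-18s CPI %.2f  speedup v. unpipelined %.2f  v. stall %.2f\n",
                   name[i], cpi, depth / cpi, cpi_stall / cpi);
        }
        return 0;
    }

Small differences against the last column of the table (e.g. 1.25 v. 1.26) are rounding artifacts.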
Slide 5: Importance of Avoiding Branch Stalls
- Crucial in modern microprocessors, which issue/execute multiple instructions every cycle
  - Need a steady stream of instructions to keep the hardware busy
  - Stalls due to control hazards dominate
- So far, we have looked at static schemes for reducing branch penalties
  - The same scheme applies to every branch instruction
- Potential for increased benefits from dynamic schemes
  - Can choose the most appropriate scheme separately for each instruction
    - Branches to the top of a loop behave differently (Taken) than "if (x == 0) return;" (Not Taken)
  - Can "learn" the appropriate scheme based on observed behavior
- Dynamic (hardware) branch prediction schemes
  - For both direction (T or NT) and target prediction
  - Key element of all modern microprocessors
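For concreteness, an illustrative C fragment (not from the lecture) containing the two kinds of branches mentioned above:

    /* The guard branch is rarely taken for typical inputs, while the loop-back
     * branch is taken on every iteration except the last.  A single static
     * policy mispredicts one of them; a per-branch dynamic predictor can learn
     * the right direction for each. */
    int sum(const int *a, int n)
    {
        if (n == 0)                     /* guard branch: usually Not Taken */
            return 0;
        int s = 0;
        for (int i = 0; i < n; i++)     /* loop-back branch: Taken n-1 times */
            s += a[i];
        return s;
    }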
Slide 6: Dynamic Branch Prediction (1): Branch Prediction (History) Buffer
- Small memory indexed by the low-order bits of the branch instruction's address
  - Stores a single bit of information: T or NT
  - Starts off as T, flips whenever a branch behaves opposite to the prediction
- For now, assume the buffer is accessed in the ID stage
  - Benefits grow for deeper pipelines and more complex branches
- Problems with this simple scheme
  - The prediction value may not correspond to the branch being considered
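A minimal sketch of such a buffer, assuming a 1024-entry table and word-aligned branch addresses (the size and names are illustrative assumptions, not from the lecture):

    #include <stdbool.h>
    #include <stdint.h>

    #define BPB_ENTRIES 1024                 /* power of two, so indexing is a mask */

    static bool bpb[BPB_ENTRIES];            /* one bit per entry: true = predict Taken */

    static void bpb_init(void)
    {
        for (int i = 0; i < BPB_ENTRIES; i++)
            bpb[i] = true;                   /* entries start off predicting Taken */
    }

    /* Index with the low-order bits of the branch address (word-aligned, so the
     * bottom two bits are dropped).  Different branches can alias to the same
     * entry, which is exactly the problem noted on the slide. */
    static unsigned bpb_index(uint32_t pc)
    {
        return (pc >> 2) & (BPB_ENTRIES - 1);
    }

    static bool bpb_predict(uint32_t pc)
    {
        return bpb[bpb_index(pc)];
    }

    /* Once the branch resolves, record its outcome; for a 1-bit scheme this is
     * the same as flipping the bit whenever the prediction was wrong. */
    static void bpb_update(uint32_t pc, bool taken)
    {
        bpb[bpb_index(pc)] = taken;
    }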