Branch Prediction Techniques

G22.2243-001 High Performance Computer Architecture
Lecture 4: Branch Prediction, Scoreboarding
September 28, 2004
Outline
• Announcements
  – Assignment 1 due back October 5th
• Reducing the impact of control hazards
  – Branch prediction buffers
  – Branch target buffers
  – Return address stacks
• Reducing the impact of data hazards
  – Scoreboarding
[Hennessy/Patterson CA:AQA (3rd Edition): Appendix A, Chapter 3]
(Review) Reducing Branch Stalls
• Baseline: stall the pipeline on a branch instruction
  – 3-cycle stall on the 5-stage RISC pipeline (2 if not taken)
• Faster stall pipeline
  – Move branch direction/target determination to the ID stage
  – 1-cycle stall
• Predict branch not taken, squash instructions as necessary
  – 0-cycle stall on not-taken branches, 1-cycle stall for taken
• Predict branch taken, squash instructions as necessary
  – Need the target address, so 1-cycle stall
  – Can reduce to 0 with additional hardware: Branch Target Buffer
• Branch delay slots
  – Instruction after the branch always executes
  – 0-cycle stall if a useful instruction can be found (from above/below the branch)
  – Else, 1-cycle stall (effectively)
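To make the per-branch costs above concrete, here is a small C sketch that tabulates the stall cycles each static scheme pays for a single branch on the 5-stage pipeline, assuming no branch target buffer. The enum, function, and flag names are illustrative choices, not part of the lecture.

```c
#include <stdio.h>
#include <stdbool.h>

/* Static branch-handling schemes from the slide (names are illustrative). */
typedef enum { SLOW_STALL, FAST_STALL, PREDICT_NOT_TAKEN,
               PREDICT_TAKEN, DELAYED_BRANCH } scheme_t;

/* Stall cycles one branch costs on the 5-stage RISC pipeline,
   assuming no branch target buffer. */
static int stall_cycles(scheme_t s, bool taken, bool delay_slot_filled) {
    switch (s) {
    case SLOW_STALL:        return taken ? 3 : 2;  /* branch resolved late */
    case FAST_STALL:        return 1;              /* resolved in ID stage */
    case PREDICT_NOT_TAKEN: return taken ? 1 : 0;  /* squash only if taken */
    case PREDICT_TAKEN:     return 1;              /* still need target address */
    case DELAYED_BRANCH:    return delay_slot_filled ? 0 : 1;
    }
    return 0;
}

int main(void) {
    printf("taken branch, predict-not-taken: %d stall(s)\n",
           stall_cycles(PREDICT_NOT_TAKEN, true, false));
    printf("delayed branch, slot filled:     %d stall(s)\n",
           stall_cycles(DELAYED_BRANCH, true, true));
    return 0;
}
```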
Evaluating Branch Alternatives
• Assumptions
  – 14% of instructions are branches
  – 30% of branches are not taken
  – 50% of delay slots can be filled with useful instructions

  Scheduling scheme     Branch penalty   CPI    Speedup v. unpipelined   Speedup v. stall
  Slow stall pipeline   3                1.42   3.5                      1.0
  Fast stall pipeline   1                1.14   4.4                      1.26
  Predict taken         1                1.14   4.4                      1.26
  Predict not taken     0.7              1.10   4.5                      1.29
  Delayed branch        0.5              1.07   4.7                      1.34

• A compiler can reorder instructions to further improve speedup

  Pipeline speedup = Pipeline depth / (1 + Branch frequency × Branch penalty)
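The table can be reproduced from the stated assumptions with a few lines of arithmetic. The C sketch below does so, using CPI = 1 + Branch frequency × Branch penalty and the speedup formula above; the per-scheme penalties are taken from the table, the names and layout are illustrative, and minor rounding differences from the slide's last column are possible.

```c
#include <stdio.h>

int main(void) {
    const double branch_freq    = 0.14;  /* 14% of instructions are branches */
    const double pipeline_depth = 5.0;   /* 5-stage pipeline, base CPI = 1 */

    /* Average branch penalty (cycles) for each scheme, as on the slide:
       predict not taken = 0.7 (70% taken x 1 cycle),
       delayed branch    = 0.5 (50% of slots unfilled x 1 cycle). */
    const char  *scheme[]  = { "Slow stall pipeline", "Fast stall pipeline",
                               "Predict taken", "Predict not taken",
                               "Delayed branch" };
    const double penalty[] = { 3.0, 1.0, 1.0, 0.7, 0.5 };

    const double cpi_stall = 1.0 + branch_freq * penalty[0];  /* baseline 1.42 */

    printf("%-22s %-8s %-6s %-15s %s\n",
           "Scheme", "Penalty", "CPI", "vs unpipelined", "vs stall");
    for (int i = 0; i < 5; i++) {
        double cpi = 1.0 + branch_freq * penalty[i];
        printf("%-22s %-8.1f %-6.2f %-15.1f %.2f\n",
               scheme[i], penalty[i], cpi,
               pipeline_depth / cpi, cpi_stall / cpi);
    }
    return 0;
}
```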
Importance of Avoiding Branch Stalls
• Crucial in modern microprocessors, which issue/execute multiple instructions every cycle
  – Need a steady stream of instructions to keep the hardware busy
  – Stalls due to control hazards dominate
• So far, we have looked at static schemes for reducing branch penalties
  – Same scheme applies to every branch instruction
• Potential for increased benefits from dynamic schemes
  – Can choose the most appropriate scheme separately for each instruction
    • Branches to the top of a loop have different behavior (Taken) than "if (x == 0) return;" (Not Taken)
  – Can "learn" the appropriate scheme based on observed behavior
  – Dynamic (hardware) branch prediction schemes
    • For both direction (T or NT) and target prediction
    • Key element of all modern microprocessors
Dynamic Branch Prediction (1): Branch Prediction (History) Buffer
• Small memory indexed by the low-order bits of the branch instruction's address
  – Stores a single bit of information: T or NT
    • Starts off as T, flips whenever a branch behaves opposite to the prediction
  – For now, assume it is supported in the ID stage
    • Benefits for larger pipelines, more complex branches
• Problems with this simple scheme
  – The prediction value may not correspond to the branch being considered
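A minimal C sketch of such a one-bit branch history buffer is given below. The table size, the power-of-two indexing, the initialization to Taken, and the function names are assumptions for illustration; real hardware differs in detail. Note that two branches whose addresses share the same low-order bits alias to the same entry, which is the "prediction may not correspond to this branch" problem noted above.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define BHT_ENTRIES 1024   /* illustrative size; must be a power of two */

/* One taken/not-taken bit per entry, indexed by the low-order bits of
   the branch instruction's address. All entries start out predicting
   Taken, as on the slide. */
static bool bht[BHT_ENTRIES];

static void bht_init(void) {
    for (int i = 0; i < BHT_ENTRIES; i++)
        bht[i] = true;                              /* start off as T */
}

static unsigned bht_index(uint32_t branch_pc) {
    return (branch_pc >> 2) & (BHT_ENTRIES - 1);    /* drop byte offset, keep low bits */
}

static bool bht_predict(uint32_t branch_pc) {
    return bht[bht_index(branch_pc)];
}

/* After the branch resolves: record the actual outcome, so the bit flips
   whenever the branch behaved opposite to the prediction. */
static void bht_update(uint32_t branch_pc, bool taken) {
    bht[bht_index(branch_pc)] = taken;
}

int main(void) {
    bht_init();
    uint32_t loop_branch = 0x00400080;   /* hypothetical branch address */

    /* A loop branch taken 9 times and then not taken once:
       the 1-bit scheme mispredicts only the final iteration here. */
    int mispredicts = 0;
    for (int iter = 0; iter < 10; iter++) {
        bool actually_taken = (iter < 9);
        if (bht_predict(loop_branch) != actually_taken)
            mispredicts++;
        bht_update(loop_branch, actually_taken);
    }
    printf("mispredictions: %d out of 10\n", mispredicts);
    return 0;
}
```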