2 - Advanced Pipelining

Module 3: Introduction to Advanced Pipelining

Review: Evaluating Branch Alternatives

Scheduling scheme    Branch penalty   CPI    Speedup v. unpipelined   Speedup v. stall
Stall pipeline       3                1.42   3.5                      1.0
Predict taken        1                1.14   4.4                      1.26
Predict not taken    1                1.09   4.5                      1.29
Delayed branch       0.5              1.07   4.6                      1.31

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$

Two-part solution:
- Determine whether the branch is taken or not sooner, AND
- Compute the taken-branch address earlier
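The speedup formula can be checked numerically. The sketch below is illustrative only: it assumes a 5-stage pipeline with an ideal CPI of 1 and a branch frequency of 14%. Neither figure appears on the slide; they are chosen because they reproduce the CPI column for the stall, predict-taken, and delayed-branch rows. The function names are hypothetical.

```python
# Assumed values, not stated on the slide.
ASSUMED_DEPTH = 5           # pipeline stages
ASSUMED_BRANCH_FREQ = 0.14  # fraction of instructions that are branches


def branch_cpi(branch_freq, branch_penalty):
    """CPI with an ideal CPI of 1 plus the average branch stall cycles."""
    return 1.0 + branch_freq * branch_penalty


def pipeline_speedup(depth, branch_freq, branch_penalty):
    """Pipeline depth / (1 + branch frequency * branch penalty)."""
    return depth / branch_cpi(branch_freq, branch_penalty)


if __name__ == "__main__":
    schemes = [("Stall pipeline", 3), ("Predict taken", 1), ("Delayed branch", 0.5)]
    for name, penalty in schemes:
        cpi = branch_cpi(ASSUMED_BRANCH_FREQ, penalty)
        speedup = pipeline_speedup(ASSUMED_DEPTH, ASSUMED_BRANCH_FREQ, penalty)
        print(f"{name:15s} CPI={cpi:.2f} speedup={speedup:.2f}")
```

Under these assumptions a flat 1-cycle penalty does not reproduce the predict-not-taken row (CPI 1.09), presumably because only taken branches pay the penalty under that scheme. The speedup columns in the table also appear to be rounded, so small differences from the computed values are expected.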
Review: Evaluating Branch Prediction

Two strategies:
- Direction-based prediction: predict backward branches taken, forward branches not taken
- Profile-based prediction: record branch behavior, then predict each branch based on a prior run

“Instructions between mispredicted branches” is a better metric than misprediction rate.

[Figure: instructions per mispredicted branch (log scale, 1 to 100000) for alvinn, compress, doduc, espresso, gcc, hydro2d, mdljsp2, ora, swm256, and tomcatv, comparing profile-based and direction-based prediction]
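The direction-based strategy and the "instructions between mispredicted branches" metric can be sketched as follows. This is a minimal illustration, not the method used to produce the slide's chart: the trace format, the addresses, and the function names are all hypothetical.

```python
def predict_taken(branch_pc, target_pc):
    """Direction-based static prediction: a branch whose target is at or
    before its own address (a backward branch) is predicted taken; a
    forward branch is predicted not taken."""
    return target_pc <= branch_pc


def instructions_per_misprediction(total_instructions, trace):
    """trace is an iterable of (branch_pc, target_pc, actually_taken).
    Returns instructions executed per mispredicted branch."""
    mispredicted = sum(
        1 for pc, target, taken in trace if predict_taken(pc, target) != taken
    )
    if mispredicted == 0:
        return float("inf")
    return total_instructions / mispredicted


if __name__ == "__main__":
    # Hypothetical trace: a loop-closing backward branch taken 9 times and
    # then falling through once, plus a forward branch taken 2 times in 10.
    trace = [(0x40, 0x10, True)] * 9 + [(0x40, 0x10, False)]
    trace += [(0x20, 0x30, True)] * 2 + [(0x20, 0x30, False)] * 8
    print(instructions_per_misprediction(200, trace))
```

In this made-up trace the backward branch is mispredicted once (on loop exit) and the forward branch twice, so 200 instructions yield 3 mispredictions.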

Review: Summary of Pipelining Basics

- Hazards limit performance
  - Structural: need more HW resources
  - Data: need forwarding, compiler scheduling
- Increasing the length of the pipe increases the impact of hazards; pipelining helps instruction bandwidth, not latency
- Interrupts, the instruction set, and FP make pipelining harder
- Compilers reduce the cost of data and control hazards
  - Load delay slots
  - Branch delay slots
  - Branch prediction
- Today: longer pipelines (R4000) => better branch prediction, more instruction parallelism?
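The point that pipelining helps instruction bandwidth but not latency can be illustrated with an idealized cycle count. This sketch assumes no hazards and one stage time per stage, and counts time in units of one stage time; the function names are hypothetical.

```python
def unpipelined_time(n_instructions, depth):
    """Each instruction runs start to finish before the next one begins."""
    return n_instructions * depth


def pipelined_time(n_instructions, depth):
    """The first instruction needs `depth` stage times; after that, one
    instruction completes every stage time (ideal case, no stalls)."""
    if n_instructions == 0:
        return 0
    return depth + n_instructions - 1


if __name__ == "__main__":
    n, depth = 1000, 5
    print("latency of one instruction (either design):", depth)
    print("unpipelined total:", unpipelined_time(n, depth))
    print("pipelined total:  ", pipelined_time(n, depth))
```

A single instruction still takes `depth` stage times in both cases; only the rate at which instructions complete improves.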
Case Study: MIPS R4000 (200 MHz)

8-stage pipeline:
- IF: first half of instruction fetch; PC selection happens here, as well as initiation of the instruction cache access.
- IS: second half of the instruction cache access.
- RF: instruction decode and register fetch, hazard checking, and instruction cache hit detection.
- EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
- DF: data fetch, first half of the data cache access.
- DS: second half of the data cache access.
- TC: tag check, to determine whether the data cache access hit.
- WB: write back for loads and register-register operations.
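The stage list above can be turned into a simple pipeline diagram. The sketch below assumes an ideal flow (one instruction enters IF per cycle, no stalls) and uses hypothetical helper names. It also counts how many later instructions have entered the pipeline by the time a branch reaches EX, where the stage list places condition evaluation; this ignores any delay slot or prediction that might hide those cycles.

```python
R4000_STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def stage_at(instr_index, cycle, stages=R4000_STAGES):
    """Stage occupied by instruction `instr_index` in `cycle` (both 0-based),
    assuming it enters the first stage in cycle `instr_index` and never
    stalls. Returns None if the instruction is not in the pipeline."""
    pos = cycle - instr_index
    return stages[pos] if 0 <= pos < len(stages) else None


def diagram(n_instructions, stages=R4000_STAGES):
    """One text row per instruction, one column per cycle."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        cells = [(stage_at(i, c, stages) or "").ljust(3) for c in range(n_cycles)]
        rows.append(f"i{i}: " + " ".join(cells).rstrip())
    return "\n".join(rows)


def fetched_before_resolution(resolve_stage="EX", stages=R4000_STAGES):
    """Number of younger instructions that have entered the first stage by
    the cycle in which a branch occupies `resolve_stage`."""
    return stages.index(resolve_stage)


if __name__ == "__main__":
    print(diagram(4))
    print("instructions fetched behind a branch resolved in EX:",
          fetched_before_resolution())
```

Because EX is the fourth stage, three younger instructions are already in flight when the branch condition is evaluated, which is one way to see why longer pipelines raise the cost of control hazards.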

This note was uploaded on 01/21/2012 for the course CSCI 593 taught by Professor Hamnes during the Spring '11 term at St. Cloud.
