07_superscalar

CIS 501 (Martin): Superscalar

CIS 501 Computer Architecture
Unit 7: Superscalar

Slides developed by Milo Martin & Amir Roth at the University of Pennsylvania with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, and David Wood.

This Unit: Superscalar Execution
• Idea of instruction-level parallelism
• Superscalar scaling issues
  • Multiple fetch and branch prediction
  • Dependence-checks & stall logic
  • Wide bypassing
  • Register file & cache bandwidth
• “Superscalar” vs VLIW/EPIC
[Figure: system stack showing App, System software, CPU, Mem, I/O]

Readings
• Textbook (MA:FSPTCM)
  • Sections 3.1, 3.2 (but not “Sidebar” in 3.2), 3.5.1
  • Sections 4.2, 4.3, 5.3.3

A Key Theme of CIS 501: Parallelism
• Previously: pipeline-level parallelism
  • Work on execute of one instruction in parallel with decode of next
• Next: instruction-level parallelism (ILP)
  • Execute multiple independent instructions fully in parallel
  • Today: multiple issue
  • Later:
    • Static & dynamic scheduling
    • Extract much more ILP
• Data-level parallelism (DLP)
  • Single-instruction, multiple data (one insn., four 64-bit adds)
• Thread-level parallelism (TLP)
  • Multiple software threads running on multiple cores
Scalar Pipeline and the Flynn Bottleneck
• So far we have looked at scalar pipelines
  • One instruction per stage
  • With control speculation, bypassing, etc.
  • Performance limit (aka “Flynn Bottleneck”) is CPI = IPC = 1
  • Limit is never even achieved (hazards)
  • Diminishing returns from “super-pipelining” (hazards + overhead)
[Figure: scalar pipeline with BP, I$, regfile, D$]

Multiple-Issue Pipeline
• Overcome this limit using multiple issue
  • Also called superscalar
  • Two instructions per stage at once, or three, or four, or eight…
  • “Instruction-Level Parallelism (ILP)” [Fisher, IEEE TC’81]
• Today, typically “4-wide” (Intel Core i7, AMD Opteron)
  • Some more (Power5 is 5-issue; Itanium is 6-issue)
  • Some less (dual-issue is common for simple cores)
[Figure: multiple-issue pipeline with BP, I$, regfile, D$]

A Typical Dual-Issue Pipeline
• Fetch an entire 16B or 32B cache block
  • 4 to 8 instructions (assuming 4-byte average instruction length)
  • Predict a single branch per cycle
• Parallel decode
  • Need to check for conflicting instructions
    • Output of I1 is an input to I2
  • Other stalls, too (for example, load-use delay)
[Figure: dual-issue pipeline with BP, I$, regfile, D$]

A Typical Dual-Issue Pipeline (continued)
• Multi-ported register file
  • Larger area, latency, power, cost, complexity
• Multiple execution units
  • Simple adders are easy, but bypass paths are expensive
• Memory unit
  • Single load per cycle (stall at decode) probably okay for dual issue
  • Alternative: add a read port to data cache
    • Larger area, latency, power, cost, complexity
[Figure: dual-issue pipeline with BP, I$, regfile, D$]
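
The decode-stage checks on the dual-issue slides can be sketched as a toy model. This is only an illustration under stated assumptions, not the pipeline from the slides: the names `Insn`, `can_pair`, and `issue_cycles` are hypothetical, instructions are classified only as "alu" or "load", and the model ignores branches, load-use delay, and cache misses. It pairs two adjacent instructions in one cycle unless the output of the first is an input to the second, or both are loads (a single data cache read port).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Insn:
    """Hypothetical instruction record for this sketch."""
    op: str                       # "alu" or "load" (assumed classes)
    dst: Optional[str]            # destination register, if any
    srcs: Tuple[str, ...] = ()    # source registers


def can_pair(i1: Insn, i2: Insn) -> bool:
    """True if i2 may issue in the same cycle as the older instruction i1."""
    # Dependence check: output of I1 is an input to I2.
    if i1.dst is not None and i1.dst in i2.srcs:
        return False
    # Structural check: assume only one load per cycle.
    if i1.op == "load" and i2.op == "load":
        return False
    return True


def issue_cycles(program: Sequence[Insn]) -> int:
    """Count issue cycles for an in-order, dual-issue toy pipeline."""
    cycles = 0
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2   # both instructions issue together
        else:
            i += 1   # the second instruction stalls at decode
        cycles += 1
    return cycles


program = [
    Insn("alu", "r1", ("r2", "r3")),
    Insn("alu", "r4", ("r1", "r5")),     # reads r1: cannot pair with the insn above
    Insn("load", "r6", ("r7",)),
    Insn("load", "r8", ("r9",)),
    Insn("alu", "r10", ("r11", "r12")),
    Insn("alu", "r13", ("r14", "r15")),
]

cycles = issue_cycles(program)
print(f"{len(program)} insns in {cycles} cycles, IPC = {len(program) / cycles}")
```

In this model the six instructions above take four issue cycles, an IPC of 1.5: better than the scalar limit of 1, but short of the dual-issue peak of 2 because of the one dependent pair and the odd instruction left at the end.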