Lecture 4 - Wide Issue


ECE565: Computer Architecture
Instructor: Vijay S. Pai
Fall 2011
Course administration: via Blackboard

Lots of Parallelism…
• Last unit: pipeline-level parallelism
  • Work on execute of one instruction in parallel with decode of next
• Next: instruction-level parallelism (ILP)
  • Execute multiple independent instructions fully in parallel
  • Today: limited multiple issue
  • Next week: dynamic scheduling
    • Extract much more ILP via out-of-order processing
• Data-level parallelism (DLP)
  • Single instruction, multiple data
  • Example: one instruction, four 16-bit adds (using 64-bit registers)
• Thread-level parallelism (TLP)
  • Multiple software threads running on multiple processors

This Unit: Multiple Issue/Static Scheduling
• Multiple issue scaling problems
  • Dependence checks
  • Bypassing
• Multiple issue designs
  • Statically-scheduled superscalar
  • VLIW/EPIC (IA64)
• Advanced static scheduling
• Advanced hardware technique
  • Grid processor
[Figure: system layer stack with labels Application, OS, Firmware, Compiler, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]

Scalar Pipeline and the Flynn Bottleneck
• Scalar pipelines
  • One instruction per stage
  – Performance limit (aka “Flynn Bottleneck”) is CPI = IPC = 1
  – Limit is never even achieved (hazards)
  – Diminishing returns from “super-pipelining” (hazards + overhead)
[Figure: scalar pipeline datapath with labels BP, I$, regfile, D$]

Multiple-Issue Pipeline
• Overcome this limit using multiple issue
  • Also sometimes called superscalar
  • Two instructions per stage at once, or three, or four, or eight…
  • “Instruction-Level Parallelism (ILP)” [Fisher]
[Figure: multiple-issue pipeline datapath with labels BP, I$, regfile, D$]

Superscalar Execution

Single-issue      1  2  3  4  5  6  7  8  9  10 11 12
ld [r1+0] -> r2   F  D  X  M  W
ld [r1+4] -> r3      F  D  X  M  W
ld [r1+8] -> r4         F  D  X  M  W
ld [r1+12] -> r5           F  D  X  M  W
add r2,r3 -> r6               F  D  X  M  W
add r4,r6 -> r7                  F  D  X  M  W
add r5,r7 -> r8                     F  D  X  M  W
ld [r8] -> r9                          F  D  X  M  W

Dual-issue        1  2  3  4  5  6  7  8  9  10 11
ld [r1+0] -> r2   F  D  X  M  W
ld [r1+4] -> r3   F  D  X  M  W
ld [r1+8] -> r4      F  D  X  M  W
ld [r1+12] -> r5     F  D  X  M  W
add r2,r3 -> r6         F  D  X  M  W
add r4,r6 -> r7         F  D  d* X  M  W
add r5,r7 -> r8            F  D  d* X  M  W
ld [r8] -> r9              F  s* D  d* X  M  W

Superscalar Challenges - Front End
• Wide instruction fetch
  • Modest: need multiple instructions per cycle
  • Aggressive: predict multiple branches, trace cache
• Wide instruction decode
  • Replicate decoders
• Wide instruction issue
  • Determine when instructions can proceed in parallel
  • Not all combinations possible
  • More complex stall logic: order N^2 for an N-wide machine
• Wide register read
  • One port for each register read
  • Example: a 4-wide superscalar needs >= 8 read ports

Superscalar Challenges - Back End
• Wide instruction execution
  • Replicate arithmetic units
  • Multiple cache ports
• Wide instruction register writeback
  • One write port per instruction that writes a register...
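
The front-end and back-end scaling points (order N^2 stall logic, one read port per register read, one write port per register-writing instruction) can be made concrete with a little arithmetic. The following is a minimal Python sketch and not part of the lecture: the function `issue_costs` and its parameter names are hypothetical, and it assumes every instruction reads up to two source registers and writes one. It counts the source-versus-destination comparisons needed to check dependences inside one issue group, along with the register file ports.

```python
def issue_costs(width, srcs_per_insn=2):
    """Rough hardware counts for one N-wide in-order issue group.

    Assumption (not from the slides): each instruction has up to
    `srcs_per_insn` register sources and one register destination.
    """
    # Each source of an instruction is compared against the destination
    # of every older instruction in the same group: N*(N-1)/2 pairs.
    pairs = width * (width - 1) // 2
    return {
        "cross_checks": srcs_per_insn * pairs,   # grows as order N^2
        "read_ports": srcs_per_insn * width,     # one port per register read
        "write_ports": width,                    # one port per register write
    }


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, issue_costs(n))
```

Under these assumptions a 4-wide machine needs 8 read ports, which agrees with the slide's example, and 12 cross-checks. Going to 8-wide doubles the ports but raises the cross-checks to 56, which is why the stall logic is the part that scales worst.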
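
The Superscalar Execution example (eight instructions finishing in 12 cycles single-issue versus 10 cycles dual-issue) can be reproduced with a small timing model. This is a simplified sketch under stated assumptions, not the lecture's own method: a five-stage F/D/X/M/W in-order pipeline with full bypassing, ALU results usable by the next cycle's X stage, load results usable one cycle later than that, and no cache misses or branches. The names `PROGRAM`, `execute_cycles`, and `total_cycles` are made up for illustration.

```python
from collections import defaultdict

# (opcode, source registers, destination register), copied from the slide.
PROGRAM = [
    ("ld",  ["r1"],       "r2"),
    ("ld",  ["r1"],       "r3"),
    ("ld",  ["r1"],       "r4"),
    ("ld",  ["r1"],       "r5"),
    ("add", ["r2", "r3"], "r6"),
    ("add", ["r4", "r6"], "r7"),
    ("add", ["r5", "r7"], "r8"),
    ("ld",  ["r8"],       "r9"),
]


def execute_cycles(program, width):
    """Return the cycle in which each instruction enters the X stage."""
    ready = defaultdict(int)  # register -> first cycle its value can feed X
    used = defaultdict(int)   # cycle -> instructions already entering X
    x_cycles = []
    prev_x = 0
    for i, (op, srcs, dst) in enumerate(program):
        fetch = i // width + 1           # `width` instructions fetched per cycle
        x = max(fetch + 2, prev_x)       # after F and D; never pass an older one
        x = max([x] + [ready[s] for s in srcs])  # wait for bypassed operands
        while used[x] >= width:          # at most `width` instructions per cycle
            x += 1
        used[x] += 1
        # Assumed latencies: ALU result after X, load result after M.
        ready[dst] = x + (2 if op == "ld" else 1)
        x_cycles.append(x)
        prev_x = x
    return x_cycles


def total_cycles(program, width):
    """Cycle in which the last instruction completes W (M and W follow X)."""
    return execute_cycles(program, width)[-1] + 2


if __name__ == "__main__":
    for width in (1, 2):
        print(width, execute_cycles(PROGRAM, width), total_cycles(PROGRAM, width))
```

With `width=1` the model gives 12 cycles and with `width=2` it gives 10, so doubling the issue width saves only two cycles here. The chain of dependent adds (r6 to r7 to r8 to the final load) serializes the back half of the program, which is the behavior the d* stalls in the dual-issue diagram show.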

This note was uploaded on 01/10/2012 for the course ECE 565 taught by Professor Pai during the Fall '11 term at Purdue.
