Lecture 4 - Wide Issue

ECE565: Computer Architecture
Instructor: Vijay S. Pai
Fall 2011
Course administration: via Blackboard

Lots of Parallelism
- Last unit: pipeline-level parallelism
    - Work on the execute stage of one instruction in parallel with the decode of the next
- Next: instruction-level parallelism (ILP)
    - Execute multiple independent instructions fully in parallel
    - Today: limited multiple issue
    - Next week: dynamic scheduling
        - Extract much more ILP via out-of-order processing
- Data-level parallelism (DLP)
    - Single-instruction, multiple data
    - Example: one instruction, four 16-bit adds (using 64-bit registers)
- Thread-level parallelism (TLP)
    - Multiple software threads running on multiple processors

This Unit: Multiple Issue/Static Scheduling
- Multiple issue scaling problems
    - Dependence checks
    - Bypassing
- Multiple issue designs
    - Statically-scheduled superscalar
    - VLIW/EPIC (IA64)
- Advanced static scheduling
- Advanced hardware technique
    - Grid processor
[Figure: system stack: Application, OS, Firmware, Compiler, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]

Scalar Pipeline and the Flynn Bottleneck
- Scalar pipelines
    - One instruction per stage
- Performance limit (aka Flynn Bottleneck) is CPI = IPC = 1
    - Limit is never even achieved (hazards)
    - Diminishing returns from super-pipelining (hazards + overhead)
[Figure: scalar pipeline datapath (BP, I$, regfile, D$)]

Multiple-Issue Pipeline
- Overcome this limit using multiple issue
    - Also sometimes called superscalar
    - Two instructions per stage at once, or three, or four, or eight
    - "Instruction-Level Parallelism (ILP)" [Fisher]
[Figure: multiple-issue pipeline datapath (BP, I$, regfile, D$)]

Superscalar Execution

Single-issue     1  2  3  4  5  6  7  8  9  10 11 12
ld [r1+0]→r2     F  D  X  M  W
ld [r1+4]→r3        F  D  X  M  W
ld [r1+8]→r4           F  D  X  M  W
ld [r1+12]→r5             F  D  X  M  W
add r2,r3→r6                 F  D  X  M  W
add r4,r6→r7                    F  D  X  M  W
add r5,r7→r8                       F  D  X  M  W
ld [r8]→r9                            F  D  X  M  W

Dual-issue       1  2  3  4  5  6  7  8  9  10 11
ld [r1+0]→r2     F  D  X  M  W
ld [r1+4]→r3     F  D  X  M  W
ld [r1+8]→r4        F  D  X  M  W
ld [r1+12]→r5       F  D  X  M  W
add r2,r3→r6           F  D  X  M  W
add r4,r6→r7           F  D  d* X  M  W
add r5,r7→r8              F  D  d* X  M  W
ld [r8]→r9                F  s* D  d* X  M  W

(d*, s*: stall cycles)

Superscalar Challenges - Front End
- Wide instruction fetch
    - Modest: need multiple instructions per cycle
    - Aggressive: predict multiple branches, trace cache
- Wide instruction decode
    - Replicate decoders
- Wide instruction issue
    - Determine when instructions can proceed in parallel
    - Not all combinations possible
    - More complex stall logic: order N^2 for an N-wide machine
- Wide register read
    - One port for each register read
    - Example: a 4-wide superscalar needs >= 8 read ports

Superscalar Challenges - Back End
- Wide instruction execution
    - Replicate arithmetic units
    - Multiple cache ports
- Wide instruction register writeback
    - One write port per instruction that writes a register...
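The DLP example on the "Lots of Parallelism" slide (one instruction performing four 16-bit adds on 64-bit registers) can be mimicked in software with the "SIMD within a register" (SWAR) trick. The sketch below is illustrative only: SIMD hardware achieves the same effect by breaking the adder's carry chain at lane boundaries, and the helper names `pack16`, `unpack16` and `packed_add16` are hypothetical. It assumes lane 0 is the least-significant 16 bits and that each lane wraps around modulo 2^16.

```python
MASK64 = (1 << 64) - 1
HIGH_BITS = 0x8000_8000_8000_8000  # bit 15 of each 16-bit lane
LOW_BITS = 0x7FFF_7FFF_7FFF_7FFF   # bits 0..14 of each 16-bit lane


def pack16(lanes):
    """Pack four 16-bit values into one 64-bit word (lane 0 = least significant)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFFFF) << (16 * i)
    return word


def unpack16(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


def packed_add16(a, b):
    """Add the four 16-bit lanes of a and b independently, wrapping mod 2**16."""
    # Summing only bits 0..14 of each lane can at most carry into bit 15
    # of the same lane, so no carry leaks into the neighbouring lane.
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    # Bit 15 of each lane = a15 XOR b15 XOR carry-in; the carry out of
    # bit 15 is discarded, which gives per-lane wraparound.
    return (low_sum ^ ((a ^ b) & HIGH_BITS)) & MASK64


if __name__ == "__main__":
    x = pack16([1, 2, 0xFFFF, 0x7FFF])
    y = pack16([1, 3, 1, 1])
    print([hex(v) for v in unpack16(packed_add16(x, y))])
    # -> ['0x2', '0x5', '0x0', '0x8000']
```

Note how lane 2 (0xFFFF + 1) wraps to 0 without disturbing lane 3, which is exactly what a single packed-add instruction guarantees.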
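To see where the dual-issue speedup on the "Superscalar Execution" slide comes from, the sketch below computes the cycle in which each instruction reaches X under a deliberately simple model of an in-order, N-wide F-D-X-M-W pipeline. The model's rules are assumptions chosen for illustration, not taken from the lecture: instructions are fetched in groups of N, issue stays in program order with at most N instructions in X per cycle, an ALU result is bypassed to the next cycle's X, and load data is available only after M. The names `Insn` and `execute_cycles` are hypothetical, and the model does not distinguish between kinds of stall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    text: str
    dest: str
    srcs: tuple
    is_load: bool


# Instruction sequence from the "Superscalar Execution" slide.
PROGRAM = [
    Insn("ld [r1+0] -> r2", "r2", ("r1",), True),
    Insn("ld [r1+4] -> r3", "r3", ("r1",), True),
    Insn("ld [r1+8] -> r4", "r4", ("r1",), True),
    Insn("ld [r1+12] -> r5", "r5", ("r1",), True),
    Insn("add r2,r3 -> r6", "r6", ("r2", "r3"), False),
    Insn("add r4,r6 -> r7", "r7", ("r4", "r6"), False),
    Insn("add r5,r7 -> r8", "r8", ("r5", "r7"), False),
    Insn("ld [r8] -> r9", "r9", ("r8",), True),
]


def execute_cycles(program, width):
    """Return the X-stage cycle of each instruction (hypothetical model)."""
    ready = {}      # register -> earliest cycle a consumer can be in X
    schedule = []
    prev_x, issued_in_prev_x = 0, 0
    for i, insn in enumerate(program):
        fetch = i // width + 1                 # groups of `width` fetched per cycle
        x = max(fetch + 2, prev_x,             # F and D first; never pass an older insn
                *(ready.get(r, 0) for r in insn.srcs))
        if x == prev_x and issued_in_prev_x == width:
            x += 1                             # all X slots in that cycle are taken
        issued_in_prev_x = issued_in_prev_x + 1 if x == prev_x else 1
        prev_x = x
        # Full bypassing: ALU results feed the next cycle's X; load data
        # is only available after M, one cycle later.
        ready[insn.dest] = x + (2 if insn.is_load else 1)
        schedule.append(x)
    return schedule


if __name__ == "__main__":
    for width in (1, 2):
        xs = execute_cycles(PROGRAM, width)
        last_w = xs[-1] + 2                    # X, then M, then W
        print(f"{width}-wide: done in cycle {last_w}, IPC = {len(PROGRAM) / last_w:.2f}")
    # -> 1-wide: done in cycle 12, IPC = 0.67
    # -> 2-wide: done in cycle 10, IPC = 0.80
```

Under these assumptions the model reproduces the 12-cycle single-issue timeline and finishes the same sequence in 10 cycles with two instructions per cycle. On a sequence this short, pipeline fill and the add/ld dependence chain keep IPC far below the peak of N, consistent with the slide's point that the issue-width limit is never reached because of hazards.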
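The "Wide instruction issue" bullets state that stall logic grows as order N^2. A minimal sketch of why, assuming each instruction has up to two register sources and one destination: every source of an instruction must be compared with the destination of each older instruction in the same issue group, giving 2 * N(N-1)/2 = N(N-1) comparisons for a full N-wide group. The function names are hypothetical; hardware performs all these comparisons in parallel with dedicated comparators, and this sketch ignores the additional checks against older instructions already in the pipeline.

```python
def intra_group_raw_deps(group):
    """Find read-after-write dependences inside one issue group.

    `group` is a list of (dest, srcs) pairs in program order. Each source of
    instruction j is compared with the destination of every older instruction
    i < j in the group; a match means j cannot issue alongside i.
    Returns (list of dependent (i, j) pairs, number of register comparisons).
    """
    deps, comparisons = [], 0
    for j, (_, srcs) in enumerate(group):
        for i in range(j):
            matches = [src == group[i][0] for src in srcs]
            comparisons += len(matches)
            if any(matches):
                deps.append((i, j))
    return deps, comparisons


def comparator_count(width, srcs_per_insn=2):
    """Comparisons needed for a full group of `width` instructions: O(width**2)."""
    return srcs_per_insn * width * (width - 1) // 2


if __name__ == "__main__":
    # The last four instructions of the slide's example, treated as one 4-wide group.
    group = [("r6", ("r2", "r3")), ("r7", ("r4", "r6")),
             ("r8", ("r5", "r7")), ("r9", ("r8",))]
    print(intra_group_raw_deps(group))                 # -> ([(0, 1), (1, 2), (2, 3)], 9)
    print([comparator_count(w) for w in (2, 4, 8)])    # -> [2, 12, 56]
```

Doubling the width from 4 to 8 more than quadruples the comparator count, which is why the slides list dependence checks among the multiple-issue scaling problems.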
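The "4-wide superscalar >= 8 read ports" example follows from simple counting. The hypothetical helper below assumes every instruction may read two registers and write one in the same cycle, so an N-wide machine that never stalls on register-file ports needs 2N read ports and, per the "Back End" slide, one write port per register-writing instruction.

```python
def regfile_ports(width, reads_per_insn=2, writes_per_insn=1):
    """Worst-case register-file ports so `width` instructions never conflict.

    Assumes (for illustration) every instruction may read `reads_per_insn`
    registers and write `writes_per_insn` registers in the same cycle.
    """
    return width * reads_per_insn, width * writes_per_insn


if __name__ == "__main__":
    for w in (1, 2, 4, 8):
        reads, writes = regfile_ports(w)
        print(f"{w}-wide: {reads} read ports, {writes} write ports")
```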