ece475-l16 - ECE 475/CS 416 Computer Architecture -...

ECE 475/CS 416 Computer Architecture - Exploiting ILP: Software Approaches

Edward Suh
Computer Systems Laboratory
suh@csl.cornell.edu

ECE 475/CS 416 - Computer Architecture, Fall 2007, Prof. Suh

Announcements

Review

* In order to achieve CPI < 1 (or IPC > 1), a processor must be able to issue multiple instructions per cycle
* Two approaches for multiple-issue processors:
  • Superscalar processors
    – Hazard detection in HW
  • Very Long Instruction Word (VLIW) processors
    – Hazard detection in SW (compiler)

Little's Law

Parallelism = Throughput * Latency, or

  $N = T \times L$

where N is the parallelism (operations in flight), T is the throughput per cycle, and L is the latency in cycles.

[Figure: one operation drawn against the axes "Latency in Cycles" and "Throughput per Cycle"]
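
As a rough illustration of Little's Law, the sketch below sums throughput times latency over each class of functional unit. The function name `required_parallelism` is hypothetical, and the unit counts and latencies are those of the example machine on the following slide; the assumption is that every unit accepts one new operation per cycle.

```python
def required_parallelism(units):
    """Return the number of operations that must be in flight to keep
    every unit busy.

    units: list of (count, latency_in_cycles), one entry per class of
    functional unit. Each class contributes throughput * latency,
    assuming each unit starts one operation per cycle.
    """
    return sum(count * latency for count, latency in units)


# (count, latency): integer, load/store, floating-point units
machine = [(2, 1), (2, 3), (2, 4)]

print(required_parallelism(machine))  # 2*1 + 2*3 + 2*4 = 16
```

Under these assumptions the machine needs 16 independent operations in flight to sustain its peak of six instructions per cycle.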

Example

* How much instruction-level parallelism (ILP) is required to keep the machine pipelines busy?

[Figure: machine pipelines, one box per pipeline stage, with latency in cycles along one axis]
  • Two integer units, single-cycle latency
  • Two load/store units, three-cycle latency
  • Two floating-point units, four-cycle latency
  • Max throughput: six instructions per cycle

Basic Pipeline Scheduling

* Goal
  • find independent instructions that can be overlapped in the pipeline
  • separate dependent instructions by a distance in clock cycles equal to the latency of the source instruction
* Assumptions
  • basic five-stage pipeline
  • one-cycle delayed branches
  • fully pipelined/replicated FUs

Plain (unscheduled) loop, 9 cycles/element:

    Loop: fld   f0,0($1)
          stall
          fadd  f4,f0,f2
          stall
          stall
          fst   f4,0($1)
          addi  $1,$1,-8
          bne   $1,$2,Loop
          stall
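
The stall counts in the listing can be reproduced with a small model of a single-issue, in-order pipeline. This is only a sketch: the `STALLS_AFTER` table is inferred from the stalls shown on the slides (one after `fld`, two between `fadd` and `fst`), the function name is hypothetical, and registers `$1`/`$2` are written `r1`/`r2`. The `scheduled` order is the one shown on the next slide.

```python
# Stall cycles a consumer must wait after its producer issues.
# Assumed values, inferred from the stalls in the slide listings.
STALLS_AFTER = {"fld": 1, "fadd": 2, "addi": 0}


def cycles_per_iteration(instrs):
    """Count cycles for one loop iteration.

    instrs: list of (op, dest_register_or_None, source_registers).
    Models single issue, in-order execution, one branch delay slot.
    """
    ready = {}   # register -> earliest cycle a consumer may issue
    cycle = -1   # issue cycle of the previous instruction
    for op, dest, srcs in instrs:
        # Issue after the previous instruction and after all operands are ready.
        issue = max([cycle + 1] + [ready.get(r, 0) for r in srcs])
        if dest is not None:
            ready[dest] = issue + 1 + STALLS_AFTER[op]
        cycle = issue
    # A trailing branch leaves its delay slot unfilled: one wasted cycle.
    if instrs[-1][0] == "bne":
        cycle += 1
    return cycle + 1


plain = [
    ("fld", "f0", ["r1"]),
    ("fadd", "f4", ["f0", "f2"]),
    ("fst", None, ["f4", "r1"]),
    ("addi", "r1", ["r1"]),
    ("bne", None, ["r1", "r2"]),
]

scheduled = [
    ("fld", "f0", ["r1"]),
    ("addi", "r1", ["r1"]),
    ("fadd", "f4", ["f0", "f2"]),
    ("bne", None, ["r1", "r2"]),
    ("fst", None, ["f4", "r1"]),  # fills the branch delay slot
]

print(cycles_per_iteration(plain))      # 9
print(cycles_per_iteration(scheduled))  # 6
```

The model places the one remaining stall of the scheduled loop just before the `fst` rather than before the `bne`, which gives the same cycle count.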

Basic Pipeline Scheduling (continued)

Scheduled loop, 6 cycles/element:

    Loop: fld   f0,0($1)
          addi  $1,$1,-8
          fadd  f4,f0,f2
          stall
          bne   $1,$2,Loop
          fst   f4,8($1)      # fills the branch delay slot; offset is 8 because addi has already run

Loop Unrolling

* Replicate loop body several times
* Use different regs in each iteration
* Two loops, for n iterations and an unroll factor of k
  – loop #1 (original) executes n%k times
  – loop #2 (unrolled) executes n/k times
* Advantages
  • more ILP (scheduling)
  • fewer overhead instructions
* Disadvantages
  • code size (increased i-cache misses)
  • register pressure

Scheduled and unrolled loop (k=2), 4 cycles/element:

    Loop: fld   f0,0($1)
          fld   f6,-8($1)
          fadd  f4,f0,f2
          fadd  f8,f6,f2
          addi  $1,$1,-16
          fst   f4,16($1)     # result of the first fadd
          bne   $1,$2,Loop
          fst   f8,8($1)      # result of the second fadd, in the delay slot

Note: the compiler needs to prove that the iterations are independent.
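
The two-loop structure can be sketched in a high-level language. The example below is an illustration only, not the compiler's actual output: `add_scalar_unrolled` is a hypothetical name, the unroll factor is fixed at 2, and the array is walked upward rather than downward as in the assembly.

```python
K = 2  # unroll factor, matching k=2 on the slide


def add_scalar_unrolled(x, s):
    """Add s to every element of list x in place, unrolled by K = 2."""
    n = len(x)
    i = 0
    # Loop #1 (original body): runs n % K times to absorb leftover elements.
    for _ in range(n % K):
        x[i] = x[i] + s
        i += 1
    # Loop #2 (unrolled body): runs n // K times, two elements per trip.
    # Separate temporaries play the role of the different registers
    # used by each copy of the body.
    for _ in range(n // K):
        a = x[i] + s
        b = x[i + 1] + s
        x[i] = a
        x[i + 1] = b
        i += 2
    return x


print(add_scalar_unrolled([1.0, 2.0, 3.0, 4.0, 5.0], 0.5))
```

The transformation is valid here because each element is read and written independently of every other element, which is the property the compiler has to prove.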

Loop Unrolling – Superscalar

* Unroll more aggressively to amortize loop overhead
  • greater register pressure
* Balance integer and floating-point operations
  • possibly across original loop iterations
* Example: two-issue processor
  • LD/ST/Branch/Int ALU (1 cycle)
  • FPU (3 cycles)

This note was uploaded on 02/19/2008 for the course ECE 4750 taught by Professor Suh during the Fall '07 term at Cornell University (Engineering School).
