08_scheduling-4up

CIS 501 Computer Architecture
Unit 8: Static and Dynamic Scheduling
CIS 501 (Martin): Scheduling
Slides originally developed by Drew Hilton, Amir Roth and Milo Martin at University of Pennsylvania

This Unit: Static & Dynamic Scheduling
• Pipelining and superscalar review
• Code scheduling
  • To reduce pipeline stalls
  • To increase ILP (insn level parallelism)
• Two approaches
  • Static scheduling by the compiler
  • Dynamic scheduling by the hardware
[Figure: system layers (App, System software, CPU, Mem, I/O)]

Readings
• Textbook (MA:FSPTCM)
  • Sections 3.3.1 – 3.3.4 (but not “Sidebar:”)
  • Sections 5.0-5.2, 5.3.3, 5.4, 5.5
• Paper
  • “Memory Dependence Prediction using Store Sets” by Chrysos & Emer

Pipelining Review
• Increases clock frequency by staging instruction execution
• “Scalar” pipelines have a best-case CPI of 1
• Challenges:
  • Data and control dependencies further worsen CPI
  • Data: even with full bypassing, load-to-use stalls remain
  • Control: use branch prediction to mitigate the penalty
• Big win, done by all processors today
• How many stages (depth)?
  • Five stages is a pretty good minimum
  • Intel Pentium II/III: 12 stages
  • Intel Pentium 4: 22+ stages
  • Intel Core 2: 14 stages
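
The CPI points in the Pipelining Review slide can be made concrete with a small calculation. The following is a minimal Python sketch, not taken from the slides: the function name `effective_cpi`, its parameters, and every number in the example call are illustrative assumptions. It models a scalar pipeline as a best-case CPI of 1 plus the average stall cycles per instruction from load-to-use stalls and branch mispredictions.

```python
def effective_cpi(load_frac, load_use_frac, branch_frac,
                  mispredict_rate, mispredict_penalty, load_use_penalty=1):
    """Best-case scalar CPI of 1 plus average stall cycles per instruction.

    All parameters are hypothetical workload/pipeline characteristics:
      load_frac          fraction of instructions that are loads
      load_use_frac      fraction of loads immediately followed by a consumer
      branch_frac        fraction of instructions that are branches
      mispredict_rate    fraction of branches that are mispredicted
      mispredict_penalty cycles lost per misprediction
      load_use_penalty   cycles lost per load-to-use stall (1 with full bypassing)
    """
    data_stalls = load_frac * load_use_frac * load_use_penalty
    control_stalls = branch_frac * mispredict_rate * mispredict_penalty
    return 1.0 + data_stalls + control_stalls


# Made-up example mix: 25% loads (half followed by a dependent insn),
# 20% branches, 5% mispredicted, 2-cycle misprediction penalty.
cpi = effective_cpi(load_frac=0.25, load_use_frac=0.5,
                    branch_frac=0.2, mispredict_rate=0.05,
                    mispredict_penalty=2)
print(f"effective CPI = {cpi:.3f}")
```

With these assumed numbers the data term contributes 0.125 cycles per instruction and the control term 0.02, which illustrates why reducing load-use stalls through scheduling can matter more than it first appears.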

Pipeline Diagram
• Use compiler scheduling to reduce load-use stall frequency
• “d*” is data dependency, “s*” is structural hazard, “p*” is propagation hazard (only n instructions per stage)

insn              1  2  3  4  5  6  7  8  9
add  $3 ← $2,$1   F  D  X  M  W
lw   $4 ← 4($3)      F  D  X  M  W
addi $6 ← $4,1          F  D  d* X  M  W
sub  $8 ← $3,$1            F  p* D  X  M  W

insn              1  2  3  4  5  6  7  8  9
add  $3 ← $2,$1   F  D  X  M  W
lw   $4 ← 4($3)      F  D  X  M  W
sub  $8 ← $3,$1         F  D  X  M  W
addi $6 ← $4,1             F  D  X  M  W

Superscalar Pipeline Review
• Execute two or more instructions per cycle
• Challenges:
  • wide fetch (branch prediction harder, misprediction more costly)
  • wide decode (stall logic)
  • wide execute (more ALUs)
  • wide bypassing (more possible bypassing paths)
  • Finding enough independent instructions (and filling delay slots)
• How many instructions per cycle max (width)?
  • Really simple, low-power cores are still single-issue (most ARMs)
  • Even low-power cores are dual-issue (ARM A8, Intel Atom)
  • Most desktop/laptop chips are three-issue or four-issue (Core i7)
  • A few 5 or 6-issue chips have been built (IBM Power4, Itanium II)

Superscalar Pipeline Diagrams - Ideal

scalar            1  2  3  4  5  6  7  8  9  10 11 12
lw  0(r1) → r2    F  D  X  M  W
lw  4(r1) → r3       F  D  X  M  W
lw  8(r1) → r4          F  D  X  M  W
add r14,r15 → r6           F  D  X  M  W
add r12,r13 → r7              F  D  X  M  W
add r17,r16 → r8                 F  D  X  M  W
lw  0(r18) → r9                     F  D  X  M  W

2-way superscalar 1  2  3  4  5  6  7  8  9  10 11 12
lw  0(r1) → r2    F  D  X  M  W
lw  4(r1) → r3    F  D  X  M  W
lw  8(r1) → r4       F  D  X  M  W
add r14,r15 → r6     F  D  X  M  W
add r12,r13 → r7        F  D  X  M  W
add r17,r16 → r8        F  D  X  M  W
lw  0(r18) → r9            F  D  X  M  W

Superscalar Pipeline Diagrams - Realistic

scalar            1  2  3  4  5  6  7  8  9  10 11 12
lw  0(r1) → r2    F  D  X  M  W
lw  4(r1) → r3       F  D  X  M  W
lw  8(r1) → r4          F  D  X  M  W
add r4,r5 → r6             F  D  d* X  M  W
add r2,r3 → r7                F  p* D  X  M  W
add r7,r6 → r8                      F  D
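
The effect of compiler scheduling in the Pipeline Diagram slide (moving `sub` between `lw` and `addi`) can be checked with a short model. This is a minimal Python sketch under stated assumptions, not code from the course: it assumes a scalar, in-order, five-stage pipeline with full bypassing, where the only stall is a one-cycle load-to-use stall between adjacent instructions. The names `Insn`, `load_use_stalls`, and `total_cycles` are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Insn(NamedTuple):
    op: str                 # mnemonic, e.g. "lw" or "add"
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def load_use_stalls(program: Sequence[Insn]) -> int:
    """Count one-cycle stalls: a load immediately followed by a consumer."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "lw" and prev.dest in cur.srcs:
            stalls += 1
    return stalls


def total_cycles(program: Sequence[Insn], depth: int = 5) -> int:
    """Cycles until the last instruction leaves a scalar in-order pipeline."""
    if not program:
        return 0
    return depth + len(program) - 1 + load_use_stalls(program)


# The four instructions from the slide, in original program order.
original = [
    Insn("add", "$3", ("$2", "$1")),
    Insn("lw", "$4", ("$3",)),
    Insn("addi", "$6", ("$4",)),
    Insn("sub", "$8", ("$3", "$1")),
]
# Scheduled order: the independent sub fills the slot after the load.
scheduled = [original[0], original[1], original[3], original[2]]

print(total_cycles(original), total_cycles(scheduled))
```

Under this model the original order finishes in cycle 9 and the scheduled order in cycle 8, matching the two diagrams. The model ignores branches, structural hazards, and multi-cycle operations, so it is only a check on the load-use case shown in the slide.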

This note was uploaded on 10/19/2011 for the course CIS 501 taught by Professor Martin during the Fall '10 term at UPenn.
