08_scheduling

08_scheduling - CIS 501 Computer Architecture Unit 8 Static...

CIS 501 Computer Architecture
Unit 8: Static and Dynamic Scheduling

CIS 501 (Martin): Scheduling
Slides originally developed by Drew Hilton, Amir Roth and Milo Martin at University of Pennsylvania

This Unit: Static & Dynamic Scheduling
• Code scheduling
  • To reduce pipeline stalls
  • To increase ILP (insn level parallelism)
• Static scheduling by the compiler
  • Approach & limitations
• Dynamic scheduling in hardware
  • Register renaming
  • Instruction selection
  • Handling memory operations
[Figure: system stack diagram with labels CPU, Mem, I/O, System software, App]

Readings
• Textbook (MA:FSPTCM)
  • Sections 3.3.1-3.3.4 (but not "Sidebar:")
  • Sections 5.0-5.2, 5.3.3, 5.4, 5.5
• Paper to read for in-class discussion:
  • "The MIPS R10000 Superscalar Microprocessor" by Kenneth Yeager
• Paper for group discussion and questions:
  • "Memory Dependence Prediction using Store Sets" by Chrysos & Emer

Code Scheduling & Limitations

Code Scheduling
• Scheduling: act of finding independent instructions
  • "Static" done at compile time by the compiler (software)
  • "Dynamic" done at runtime by the processor (hardware)
• Why schedule code?
  • Scalar pipelines: fill in load-to-use delay slots to improve CPI
  • Superscalar: place independent instructions together
    • As above, load-to-use delay slots
    • Allow multiple-issue decode logic to let them execute at the same time

Compiler Scheduling
• Compiler can schedule (move) instructions to reduce stalls
• Basic pipeline scheduling: eliminate back-to-back load-use pairs
• Example code sequence: a = b + c; d = f - e;
  • sp stack pointer, sp+0 is "a", sp+4 is "b", etc…

Before:
    ld r2,4(sp)
    ld r3,8(sp)
    add r3,r2,r1   //stall
    st r1,0(sp)
    ld r5,16(sp)
    ld r6,20(sp)
    sub r5,r6,r4   //stall
    st r4,12(sp)

After:
    ld r2,4(sp)
    ld r3,8(sp)
    ld r5,16(sp)
    add r3,r2,r1   //no stall
    ld r6,20(sp)
    st r1,0(sp)
    sub r5,r6,r4   //no stall
    st r4,12(sp)

Compiler Scheduling Requires
• Large scheduling scope
  • Independent instruction to put between load-use pairs
  + Original example: large scope, two independent computations
  - This example: small scope, one computation
• One way to create larger scheduling scopes?
  • Loop unrolling

Before:
    ld r2,4(sp)
    ld r3,8(sp)
    add r3,r2,r1   //stall
    st r1,0(sp)

After:
    ld r2,4(sp)
    ld r3,8(sp)
    add r3,r2,r1   //stall
    st r1,0(sp)

Scheduling Scope Limited by Branches
r1 and r2 are inputs

    loop:
      jz r1, not_found
      ld [r1+0] -> r3
      sub r2, r3 -> r4
      jz r4, found
      ld [r1+4] -> r1
      jmp loop

Legal to move load up past branch?...
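The load-use stalls marked in the Compiler Scheduling example can be checked mechanically. The sketch below is a hypothetical model, not something from the slides: it assumes a scalar in-order pipeline with a one-cycle load-to-use delay, so an instruction stalls only when it reads a register written by the load immediately before it. The helper names `parse` and `count_load_use_stalls` are made up for illustration, and the operand order follows the slides' convention (destination last for add/sub).

```python
def parse(insn):
    """Return (opcode, registers written, registers read) for one instruction.

    Assumed syntax, following the slides:
      ld  rD,off(base)    st  rS,off(base)    add/sub rS1,rS2,rD
    """
    op, rest = insn.split(None, 1)
    ops = [o.strip() for o in rest.split(",")]
    if op in ("ld", "st"):
        reg, addr = ops
        base = addr[addr.index("(") + 1 : addr.index(")")]
        if op == "ld":
            return op, {reg}, {base}
        return op, set(), {reg, base}
    src1, src2, dest = ops
    return op, {dest}, {src1, src2}


def count_load_use_stalls(program):
    """Count back-to-back load-use pairs (assumed: one stall cycle each)."""
    parsed = [parse(insn) for insn in program]
    stalls = 0
    for (prev_op, prev_dests, _), (_, _, cur_srcs) in zip(parsed, parsed[1:]):
        if prev_op == "ld" and prev_dests & cur_srcs:
            stalls += 1
    return stalls


# The two schedules from the slide for: a = b + c; d = f - e;
BEFORE = [
    "ld r2,4(sp)", "ld r3,8(sp)", "add r3,r2,r1", "st r1,0(sp)",
    "ld r5,16(sp)", "ld r6,20(sp)", "sub r5,r6,r4", "st r4,12(sp)",
]
AFTER = [
    "ld r2,4(sp)", "ld r3,8(sp)", "ld r5,16(sp)", "add r3,r2,r1",
    "ld r6,20(sp)", "st r1,0(sp)", "sub r5,r6,r4", "st r4,12(sp)",
]

if __name__ == "__main__":
    print("before:", count_load_use_stalls(BEFORE))
    print("after: ", count_load_use_stalls(AFTER))
```

Under these assumptions the original order has two load-use stalls and the scheduled order has none, matching the //stall and //no stall annotations. Running the same check on the four-instruction small-scope example reports one stall, since there is no independent instruction to place between the load and the add.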

This note was uploaded on 12/06/2011 for the course CIS 501 taught by Professor Milo Martin during the Fall '08 term at UPenn.
