Tomasulo Scheduling

Computer Organization and Design: The Hardware/Software Interface

CS152 Computer Architecture and Engineering
Lecture 15
Dynamic Scheduling: Tomasulo
March 29, 2004
John Kubiatowicz
( lecture slides:

3/29/04 ©UCB Spring 2004 CS152 / Kubiatowicz

Lec15.2 Review: Compiler techniques for parallelism

° Loop unrolling
  - Multiple iterations of loop in software
  - Amortizes loop overhead over several iterations
  - Gives more opportunity for scheduling around stalls
° Software Pipelining
  - Take one instruction from each of several iterations of the loop
  - Software overlapping of loop iterations
  - Today will show hardware overlapping of loop iterations
° Very Long Instruction Word machines (VLIW)
  - Multiple operations coded in single, long instruction
  - Requires sophisticated compiler to decide which operations can be done in parallel
  - Trace scheduling: find common path and schedule code as if branches didn't exist (+ add "fixup code")
° All of these require additional registers

Lec15.3 Review: Software Pipelining

° Observation: if iterations from loops are independent, then can get more ILP by taking instructions from different iterations
° Software pipelining: reorganizes loops so that each iteration is made from instructions chosen from different iterations of the original loop (≈ Tomasulo in SW)

[Figure: Iteration 0 through Iteration 4 of the original loop, with one software-pipelined iteration marked across them]

Lec15.4 Review: Software Pipelining Example

Before: Unrolled 3 times

    LD    F0,0(R1)
    ADDD  F4,F0,F2
    SD    0(R1),F4
    LD    F6,-8(R1)
    ADDD  F8,F6,F2
    SD    -8(R1),F8
    LD    F10,-16(R1)
    ADDD  F12,F10,F2
    SD    -16(R1),F12
    SUBI  R1,R1,#24
    BNEZ  R1,LOOP

After: Software Pipelined

    SD    0(R1),F4     ; Stores M[i]
    ADDD  F4,F0,F2     ; Adds to M[i-1]
    LD    F0,-16(R1)   ; Loads M[i-2]
    SUBI  R1,R1,#8
    BNEZ  R1,LOOP

° Symbolic Loop Unrolling
  - Maximize result-use distance
  - Less code space than unrolling
  - Fill & drain pipe only once per loop vs. once per each unrolled iteration in loop unrolling

[Figure: overlapped ops plotted against time, one panel for SW Pipeline and one for Loop Unrolled]
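The software-pipelined loop above can be mirrored in a high-level language to make the fill, kernel, and drain phases explicit. The following is a minimal sketch, not the lecture's code: the function names `original_loop` and `software_pipelined` are hypothetical, the array is walked upward instead of downward through R1, and the prologue and epilogue (which the slide does not list) are written out under the assumption that they simply start and finish the three overlapped iterations.

```python
def original_loop(mem, c):
    """One load, one add, one store per iteration (the un-pipelined loop)."""
    out = list(mem)
    for i in range(len(out)):
        f0 = out[i]      # LD
        f4 = f0 + c      # ADDD
        out[i] = f4      # SD
    return out


def software_pipelined(mem, c):
    """Each kernel pass stores iteration i, adds for i+1, and loads for i+2."""
    out = list(mem)
    n = len(out)
    if n < 3:
        # Too short to overlap three iterations; fall back to the plain loop.
        return original_loop(mem, c)

    # Prologue (fill the pipe): start iterations 0 and 1.
    f0 = out[0]          # load for iteration 0
    f4 = f0 + c          # add for iteration 0
    f0 = out[1]          # load for iteration 1

    # Kernel: same order as the slide (SD, ADDD, LD).
    for i in range(n - 2):
        out[i] = f4      # SD: finishes iteration i
        f4 = f0 + c      # ADDD: middle of iteration i+1
        f0 = out[i + 2]  # LD: starts iteration i+2

    # Epilogue (drain the pipe): finish the last two iterations.
    out[n - 2] = f4
    f4 = f0 + c
    out[n - 1] = f4
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(software_pipelined(data, 10.0) == original_loop(data, 10.0))
```

The store comes first in the kernel so that the value being stored was computed a full pass earlier, which is the "maximize result-use distance" point on the slide.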
Lec15.5 Review: Software Pipelining with Loop Unrolling in VLIW

    Memory          Memory          FP                FP      Int. op/
    reference 1     reference 2     operation 1       op. 2   branch            Clock
    LD F0,-48(R1)   ST 0(R1),F4     ADDD F4,F0,F2                               1
    LD F6,-56(R1)   ST -8(R1),F8    ADDD F8,F6,F2             SUBI R1,R1,#24    2
    LD F10,-40(R1)  ST 8(R1),F12    ADDD F12,F10,F2           BNEZ R1,LOOP      3

° Software pipelined across 9 iterations of original loop
  In each iteration of above loop, we:
  - Store to m,m-8,m-16 (iterations I-3,I-2,I-1)
  - Compute for m-24,m-32,m-40 (iterations I,I+1,I+2)
  - Load from m-48,m-56,m-64 (iterations I+3,I+4,I+5)
° 9 results in 9 cycles, or 1 clock per iteration
° Average: 3.3 ops per clock, 66% efficiency

Note: Need fewer registers for software pipelining (only using 7 registers here, was using 15)

Lec15.6 Review: advanced pipelining issues

° How do we prevent WAR and WAW hazards?
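Returning to the VLIW schedule on Lec15.5: the offsets in clock 3 (-40 and 8) look out of step with the addresses listed in the bullets (m-64 and m-16) until the SUBI in clock 2 is taken into account. The sketch below checks that arithmetic. It assumes that operations issued in a clock after the SUBI see the decremented R1, while operations in the same clock still see the old value; the names `MEM_OPS` and `effective_offsets` are hypothetical.

```python
# (clock, operation, offset from R1), copied from the VLIW table on Lec15.5.
MEM_OPS = [
    (1, "LD", -48), (1, "ST", 0),
    (2, "LD", -56), (2, "ST", -8),
    (3, "LD", -40), (3, "ST", 8),
]
SUBI_CLOCK = 2    # SUBI R1,R1,#24 issues in clock 2
SUBI_AMOUNT = 24


def effective_offsets(mem_ops):
    """Express every address relative to R1 at loop entry (m on the slide)."""
    result = {"LD": [], "ST": []}
    for clock, op, offset in mem_ops:
        # Assumption: only clocks after the SUBI observe R1 - 24.
        adjust = -SUBI_AMOUNT if clock > SUBI_CLOCK else 0
        result[op].append(offset + adjust)
    return result


print(effective_offsets(MEM_OPS))
# → {'LD': [-48, -56, -64], 'ST': [0, -8, -16]}
```

Under that assumption the table and the bullets agree: stores go to m, m-8, m-16 and loads come from m-48, m-56, m-64.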

This note was uploaded on 01/29/2008 for the course CS 152 taught by Professor Kubiatowicz during the Spring '04 term at Berkeley.
