Compiler Optimizations (continued), Dynamic Scheduling

Computer Organization and Design: The Hardware/Software Interface

CS152 Computer Architecture and Engineering
Lecture 14
Static Scheduling (Continued)
Dynamic Scheduling: Scoreboards

March 17th, 2004
John Kubiatowicz (www.cs.berkeley.edu/~kubitron)
lecture slides: http://inst.eecs.berkeley.edu/~cs152/

3/17/04 ©UCB Spring 2004 CS152 / Kubiatowicz Lec14.2

The Big Picture: Where are We Now?

° The Five Classic Components of a Computer: Control and Datapath (together the Processor), Memory, Input, Output
° Today's Topics:
  – Recap last lecture/Review
  – Scoreboard
  – Administrivia
  – Tomasulo scheduling algorithm
  – Tomasulo loop unrolling

Lec14.3

Recall: Can we somehow make CPI closer to 1?

° Let's assume full pipelining.
° Possible delay slots around a 4-cycle multiply instruction:

[Figure: pipeline diagram with stages Fetch, Decode, Ex1, Ex2, Ex3, Ex4, WB. A multf $F0,$F2,$F4 is followed by three delay slots (delay-1, delay-2, delay-3) before a dependent addf $F6,$F10,$F0. The other instructions shown are ld $F0,0($r5), multf $F4,$F0,$F3, and sw $F0,4($R2). Markers show the earliest forwarding point for 4-cycle instructions and for 1-cycle instructions.]

Lec14.4

Recall: Revised FP Loop Minimizing Stalls

Instruction           Execute latency   Instruction           Use latency
producing result      in cycles         using result          in cycles
FP ALU op             4                 Another FP ALU op     3
FP ALU op             4                 Store double          2
Load double           2                 FP ALU op             1

    1 Loop: LD    F0,0(R1)
    2       stall
    3       ADDD  F4,F0,F2
    4       SUBI  R1,R1,8
    5       BNEZ  R1,Loop    ;delayed branch
    6       SD    8(R1),F4   ;altered when moved past SUBI

° Swap BNEZ and SD by changing the address of SD.
° 6 clocks for 5 instructions: CPI = 6/5 = 1.2
° Unroll the loop 4 times to make the code faster?
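The 6-clock count for the revised loop can be checked with a small model. The following is a minimal sketch, not the lecture's own method: it assumes an in-order, single-issue pipeline, reads "use latency" as the number of cycles that must separate a producer from a dependent consumer, and assumes a use latency of 0 for any producer/consumer pair the table does not list (for example integer op to branch). The names `USE_LATENCY`, `schedule`, and `revised_loop` are hypothetical.

```python
# Use latencies from the slide's table. Pairs that are not listed default to 0,
# which is an assumption of this sketch, not something the slide states.
USE_LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
}


def schedule(instrs):
    """Return the issue cycle of each instruction (in-order, single issue).

    Each instruction is a tuple (kind, dest_register_or_None, source_registers).
    """
    last_writer = {}  # register -> (producer kind, producer issue cycle)
    issue_cycles = []
    cycle = 0
    for kind, dest, sources in instrs:
        cycle += 1  # earliest slot after the previous instruction
        for reg in sources:
            if reg in last_writer:
                producer_kind, producer_cycle = last_writer[reg]
                latency = USE_LATENCY.get((producer_kind, kind), 0)
                # Stall until enough cycles separate producer and consumer.
                cycle = max(cycle, producer_cycle + 1 + latency)
        issue_cycles.append(cycle)
        if dest is not None:
            last_writer[dest] = (kind, cycle)
    return issue_cycles


revised_loop = [
    ("load",   "F0", ["R1"]),        # LD   F0,0(R1)
    ("fp_alu", "F4", ["F0", "F2"]),  # ADDD F4,F0,F2
    ("int",    "R1", ["R1"]),        # SUBI R1,R1,8
    ("branch", None, ["R1"]),        # BNEZ R1,Loop
    ("store",  None, ["R1", "F4"]),  # SD   8(R1),F4 (fills the branch delay slot)
]

issue = schedule(revised_loop)
total = issue[-1]
print(issue, total, total / len(revised_loop))
```

Under these assumptions the sketch prints `[1, 3, 4, 5, 6] 6 1.2`: the only stall is the one cycle between LD and ADDD, and moving SD into the delay slot hides the two-cycle ADDD-to-store latency behind SUBI and BNEZ.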
Lec14.5

Recall: Unrolled Loop That Minimizes Stalls

    1  Loop: LD    F0,0(R1)
    2        LD    F6,-8(R1)
    3        LD    F10,-16(R1)
    4        LD    F14,-24(R1)
    5        ADDD  F4,F0,F2
    6        ADDD  F8,F6,F2
    7        ADDD  F12,F10,F2
    8        ADDD  F16,F14,F2
    9        SD    0(R1),F4
    10       SD    -8(R1),F8
    11       SD    -16(R1),F12
    12       SUBI  R1,R1,#32
    13       BNEZ  R1,Loop
    14       SD    8(R1),F16   ; 8-32 = -24

14 clock cycles, or 3.5 per iteration
CPI = 14/14 = 1

° What assumptions were made when the code was moved?
  – OK to move the store past SUBI even though SUBI changes the register
  – OK to move loads before stores: do we get the right data?
  – When is it safe for the compiler to make such changes?
° When is it safe to move instructions?

Lec14.6

Getting CPI < 1: Issuing Multiple Instructions/Cycle

° Two main variations: Superscalar and VLIW
° Superscalar: varying number of instructions/cycle (1 to 6)
  – Parallelism and dependencies determined/resolved by HW
  – IBM PowerPC 604, Sun UltraSparc, DEC Alpha 21164, HP 7100
° Very Long Instruction Words (VLIW): fixed number of instructions (16)
  – Parallelism determined by compiler
  – Pipeline is exposed; compiler must schedule delays to get the right result
° Explicit Parallel Instruction Computer (EPIC)/Intel
  – 128-bit packets containing 3 instructions (can execute sequentially)
  – Can link 128-bit packets together to allow more parallelism
  – Compiler determines parallelism, HW checks dependencies and forwards/stalls

Lec14.7

° Superscalar DLX: 2 instructions, 1 FP & 1 anything else
  – Fetch 64 bits/clock cycle; Int on left, FP on right
  – Can only issue 2nd instruction if 1st instruction issues
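Returning to the unrolled loop on Lec14.5: the question of whether the moved store still writes the right location can be checked by simulation. The sketch below is only an illustration under stated assumptions. Memory is modeled as a Python dict keyed by byte address, the element count is assumed to be a multiple of 4, and the loop is assumed to end when R1 reaches 0. The function names `rolled` and `unrolled` and the test data are hypothetical.

```python
def rolled(mem, r1, f2):
    """Scheduled one-element-per-iteration loop (SD sits in the branch delay slot)."""
    while True:
        f0 = mem[r1]          # LD   F0,0(R1)
        f4 = f0 + f2          # ADDD F4,F0,F2
        r1 -= 8               # SUBI R1,R1,8
        mem[r1 + 8] = f4      # SD   8(R1),F4   offset adjusted: 0 + 8
        if r1 == 0:           # BNEZ R1,Loop
            return mem


def unrolled(mem, r1, f2):
    """Loop unrolled 4 times; assumes the element count is a multiple of 4."""
    while True:
        f0 = mem[r1]          # LD   F0,0(R1)
        f6 = mem[r1 - 8]      # LD   F6,-8(R1)
        f10 = mem[r1 - 16]    # LD   F10,-16(R1)
        f14 = mem[r1 - 24]    # LD   F14,-24(R1)
        f4 = f0 + f2          # ADDD F4,F0,F2
        f8 = f6 + f2          # ADDD F8,F6,F2
        f12 = f10 + f2        # ADDD F12,F10,F2
        f16 = f14 + f2        # ADDD F16,F14,F2
        mem[r1] = f4          # SD   0(R1),F4
        mem[r1 - 8] = f8      # SD   -8(R1),F8
        mem[r1 - 16] = f12    # SD   -16(R1),F12
        r1 -= 32              # SUBI R1,R1,#32
        mem[r1 + 8] = f16     # SD   8(R1),F16  8 - 32 = -24 from the old R1
        if r1 == 0:           # BNEZ R1,Loop
            return mem


# Hypothetical test data: 8 doubles at byte addresses 8, 16, ..., 64.
initial = {addr: float(addr) for addr in range(8, 72, 8)}
after_rolled = rolled(dict(initial), 64, 1.5)
after_unrolled = unrolled(dict(initial), 64, 1.5)
print(after_rolled == after_unrolled)
```

Both versions leave memory in the same state here because no load in an iteration reads an address that an earlier store in the same iteration writes. That independence is the kind of condition a compiler would have to establish before reordering loads and stores this way.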

This note was uploaded on 01/29/2008 for the course CS 152 taught by Professor Kubiatowicz during the Spring '04 term at University of California, Berkeley.
