VLIW_notes - BASIC LIW/VLIW MULTIPLE INDEPENDENT RISC...

EE557 Michel Dubois USC 2007 Chapter4.12 3/3/07

BASIC LIW/VLIW

MULTIPLE INDEPENDENT RISC INSTRUCTIONS --called ops-- ARE PACKAGED IN A VLI or VLIW --called instructions--
INDEPENDENT FUNCTIONAL UNITS WITH NO HAZARD DETECTION
  MAY HAVE SOME FORWARDING
COMPILER IS RESPONSIBLE FOR ISSUING/SCHEDULING INSTRUCTIONS
EXAMPLE:
  1 INT op OR BRANCH
  2 FP ops
  2 MEMORY REFERENCE ops
  EACH op TAKES 16 TO 24 BITS
  TOTAL INSTRUCTION IS 80-120 BITS FOR THESE 5 ops (112-168 BITS FOR 7 ops)
USE LOCAL AND GLOBAL COMPILER SCHEDULING ALGORITHMS

EE557 Michel Dubois USC 2007 Chapter4.13 3/3/07

VLIW ARCHITECTURE

[FIGURE: ONE PC DRIVES FIVE PIPELINES, ONE PER op SLOT]
  LD/ST    D  E  M  WB
  LD/ST    D  E  M  WB
  FPop1    D  E  E  E  E  WB
  FPop2    D  E  E  E  E  WB
  INT/BR   D  E  WB

WITH OR WITHOUT FORWARDING
CYCLIC SCHEDULING: LOOP UNROLLING AND SOFTWARE PIPELINING (APPLICABLE TO LOOPS)
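
Since the compiler, not the hardware, decides which ops share an instruction, a small scheduling sketch may help make the idea concrete. The Python below is a minimal greedy list scheduler, not the algorithm of any particular compiler. The slot mix (2 MEM, 2 FP, 1 INT) follows the example above. The latencies, the `Op` class, and the names `schedule` and `unrolled_loop` are assumptions introduced only for this illustration.

```python
from dataclasses import dataclass, field

# Slots per VLIW instruction, following the example mix in the notes.
SLOTS = {"MEM": 2, "FP": 2, "INT": 1}

# ASSUMED latencies: a consumer may issue this many cycles after its producer.
LATENCY = {"MEM": 2, "FP": 3, "INT": 1}


@dataclass
class Op:
    name: str
    unit: str                                  # "MEM", "FP" or "INT"
    deps: list = field(default_factory=list)   # names of producer ops


def schedule(ops):
    """Greedy list scheduling: place each op in the earliest cycle where its
    operands are ready and a slot of its unit type is still free.
    Ops must be listed so that producers come before consumers."""
    unit_of = {op.name: op.unit for op in ops}
    issue = {}   # op name -> issue cycle (1-based)
    used = {}    # (cycle, unit) -> number of slots already taken
    for op in ops:
        cycle = 1
        for dep in op.deps:
            cycle = max(cycle, issue[dep] + LATENCY[unit_of[dep]])
        while used.get((cycle, op.unit), 0) >= SLOTS[op.unit]:
            cycle += 1                         # slot busy, try the next instruction
        used[(cycle, op.unit)] = used.get((cycle, op.unit), 0) + 1
        issue[op.name] = cycle
    return issue


def unrolled_loop(copies):
    """Body of a load / FP add / store loop, unrolled `copies` times.
    Loop overhead (index update and branch) is left out of this sketch."""
    ops = []
    for i in range(copies):
        ops.append(Op(f"LD{i}", "MEM"))
        ops.append(Op(f"ADDD{i}", "FP", [f"LD{i}"]))
        ops.append(Op(f"SD{i}", "MEM", [f"ADDD{i}"]))
    return ops


if __name__ == "__main__":
    issue = schedule(unrolled_loop(7))
    for name, cycle in sorted(issue.items(), key=lambda kv: kv[1]):
        print(cycle, name)
```

Under the assumed latencies, seven copies of the loop body fit in 9 instructions, with several MEM and FP slots and almost every INT slot left empty. The hardware never checks for hazards here: correctness rests entirely on the cycle numbers chosen by `schedule`.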
EE557 Michel Dubois USC 2007 Chapter4.14 3/3/07

VLIW--LOOP UNROLLING

PROBLEMS
  CODE SIZE
  EMPTY SLOTS
  REGISTER PRESSURE
  LOCKSTEP
  BINARY COMPATIBILITY (BINARY TRANSLATION/EMULATION)
  COST OF BRANCHES
  ILP ILP ILP ILP...

UNROLL LOOP 7 TIMES--VLIW PROGRAM:

Clock  MemOp1            MemOp2            FPOp1              FPOp2              Int/BROp
1      LD F0,0(R1)       LD F6,-8(R1)
2      LD F10,-16(R1)    LD F14,-24(R1)
3      LD F18,-32(R1)    LD F22,-40(R1)    ADDD F4,F0,F2      ADDD F8,F6,F2
4      LD F26,-48(R1)                      ADDD F12,F10,F2    ADDD F16,F14,F2
5                                          ADDD F20,F18,F2    ADDD F24,F22,F2
6      SD 0(R1),F4       SD -8(R1),F8      ADDD F28,F26,F2
7      SD -16(R1),F12    SD -24(R1),F16                                          SUBI R1,R1,#56
8      SD 24(R1),F20     SD 16(R1),F24                                           BNEZ R1,LOOP
9      SD 8(R1),F28

EE557 Michel Dubois USC 2007 Chapter4.15 3/3/07

VLIW--SOFTWARE PIPELINING

Loop:  LD F0,0(R1)      O1
       ADDD F4,F0,F2    O2
       SD 0(R1),F4      O3

1 LOOP ITERATION PER CLOCK

        ITE1  ITE2  ITE3  ITE4  ITE5  ITE6
INST1   O1
INST2   --    O1
INST3   O2    --    O1
INST4   --    O2    --    O1
INST5   --    --    O2    --    O1
INST6   O3    --    --    O2    --    O1     KERNEL
INST7         O3    --    --    O2    --
INST8               O3    --    --    O2

KERNEL CODE (WE HAVE 6 ITERATIONS BETWEEN LD AND SD):

Clock  MemOp1         MemOp2          FPOp1            FPOp2  Int/BROp
1      LD F0,0(R1)    SD 48(R1),F4    ADDD F4,F0,F2    NOOP   LOOPBR R1,R4,CLK1

LOOPBR compares R1 and R2, loops back if not equal and increments RRB (the rotating register base)
Speedup is 9!!!!
USE ROTATING REGISTERS. REGISTER ROTATION IS A FORM OF REGISTER RENAMING
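
The iteration/instruction table for software pipelining can be generated mechanically, which may make the position of the kernel easier to see. The sketch below assumes one new iteration starts every clock and takes the cycle offsets of O1, O2 and O3 from the ITE1 column (0, 2 and 5). The function names `pipeline_table`, `kernel_rows` and `render` are hypothetical helpers written for this illustration only.

```python
# Cycle offset of each op inside one iteration, read off the ITE1 column
# (O1 = LD, O2 = ADDD, O3 = SD). One iteration is ASSUMED to start per clock.
OFFSETS = {"O1": 0, "O2": 2, "O3": 5}


def pipeline_table(iterations, offsets=OFFSETS):
    """rows[inst][ite] is the op that iteration `ite` issues in instruction
    `inst` (both 0-based), or None when that iteration issues nothing there."""
    depth = max(offsets.values()) + 1          # instructions spanned by one iteration
    rows = [[None] * iterations for _ in range(iterations + depth - 1)]
    for ite in range(iterations):
        for op, off in offsets.items():
            rows[ite + off][ite] = op          # iteration `ite` starts at instruction `ite`
    return rows


def kernel_rows(rows, offsets=OFFSETS):
    """Indices of the instructions that contain every op of the loop body."""
    wanted = set(offsets)
    return [i for i, row in enumerate(rows)
            if wanted <= {op for op in row if op is not None}]


def render(rows):
    """Plain-text view; '--' marks any cell in which no op is issued."""
    lines = []
    for i, row in enumerate(rows, start=1):
        cells = " ".join(f"{op or '--':>3}" for op in row)
        lines.append(f"INST{i:<3}{cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    table = pipeline_table(6)
    print(render(table))
    print("kernel instructions:", [i + 1 for i in kernel_rows(table)])
```

With six iterations, `kernel_rows(pipeline_table(6))` returns `[5]`, i.e. INST6, the only instruction holding O1, O2 and O3 at once. The rows before it form the prologue and the rows after it the epilogue. With more iterations the kernel row simply repeats, which is why a single looping instruction can stand in for it.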