c F16,-24(R1) substitution & R1,R1,#-32 simplification R1,R2,Loop

S. Ziavras

Loop Processing/Unrolling (6)
• Execution time: 28 clock cycles
  – Each L.D: 1 stall
  – Each ADD.D: 2
  – DADDUI: 1
  – BNE: 1
  – Instr. issue cycles: 14
• 7 cycles per element (for 4 elements)
• Worse than the scheduled version of the original loop!
• BETTER SOLUTION: Schedule the unrolled loop
• Both phases are included in optimizing compilers

Scheduling the Unrolled Loop
Loop: L.D     F0,0(R1)
      L.D     F6,-8(R1)
      L.D     F10,-16(R1)
      L.D     F14,-24(R1)
      ADD.D   F4,F0,F2
      ADD.D   F8,F6,F2
      ADD.D   F12,F10,F2
      ADD.D   F16,F14,F2
      S.D     F4,0(R1)
      S.D     F8,-8(R1)
      DADDUI  R1,R1,#-32
      S.D     F12,16(R1)
      BNE     R1,R2,Loop
      S.D     F16,8(R1)     ;8-32=-24
Execution time: 14 clock cycles (3.5 clock cycles per element)

Loop Unrolling & Pipeline Scheduling With Static Multiple Issue
• Assume a 2-issue statically scheduled superscalar MIPS pipeline
  – 2 instrs. can be issued per clock cycle 1...
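The 28-cycle and 14-cycle figures for the unrolled loop can be checked with a small in-order issue model. The sketch below is only illustrative: the `STALLS` table, the tuple encoding of instructions, and the function `loop_cycles` are assumptions introduced here, with stall values chosen to match the per-instruction counts on the slide (1 after each L.D, 2 after each ADD.D, 1 after DADDUI, 1 after an unfilled branch). It assumes a single-issue pipeline with one branch delay slot and ignores everything except true data dependences.

```python
# Assumed extra stall cycles between a producer and a dependent consumer.
# These values are not given in the excerpt; they are picked to reproduce
# the stall counts listed on the slide.
STALLS = {
    ("L.D", "ADD.D"): 1,    # load double -> FP add
    ("ADD.D", "S.D"): 2,    # FP add -> store double
    ("DADDUI", "BNE"): 1,   # integer add -> branch
}


def loop_cycles(instrs):
    """Clock cycles for one pass over the loop body (in-order, single issue).

    Each instruction is (opcode, destination register or None, source registers).
    """
    last_writer = {}  # register -> (opcode, issue cycle) of its latest producer
    cycle = 0
    for op, dest, sources in instrs:
        issue = cycle + 1  # earliest slot after the previous instruction
        for reg in sources:
            if reg in last_writer:
                p_op, p_cycle = last_writer[reg]
                issue = max(issue, p_cycle + 1 + STALLS.get((p_op, op), 0))
        if dest is not None:
            last_writer[dest] = (op, issue)
        cycle = issue
    if instrs[-1][0] == "BNE":
        cycle += 1  # branch delay slot left empty costs one more cycle
    return cycle


def ld(fd):
    return ("L.D", fd, ("R1",))


def add(fd, fs):
    return ("ADD.D", fd, (fs, "F2"))


def sd(fs):
    return ("S.D", None, (fs, "R1"))


DADDUI = ("DADDUI", "R1", ("R1",))
BNE = ("BNE", None, ("R1", "R2"))

# (loaded register, sum register) for each of the 4 unrolled iterations
pairs = [("F0", "F4"), ("F6", "F8"), ("F10", "F12"), ("F14", "F16")]

# Unrolled but unscheduled: L.D / ADD.D / S.D per element, then DADDUI, BNE
unscheduled = [i for s, d in pairs for i in (ld(s), add(d, s), sd(d))]
unscheduled += [DADDUI, BNE]

# Unrolled and scheduled, in the order shown on the slide
scheduled = [ld(s) for s, _ in pairs] + [add(d, s) for s, d in pairs]
scheduled += [sd("F4"), sd("F8"), DADDUI, sd("F12"), BNE, sd("F16")]

for name, body in (("unscheduled", unscheduled), ("scheduled", scheduled)):
    total = loop_cycles(body)
    print(name, total, total / len(pairs))
# unscheduled 28 7.0
# scheduled 14 3.5
```

Both listings contain the same 14 instructions; the scheduled order simply places independent work between each producer and its consumer, so every stall disappears and the last S.D fills the branch delay slot.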
