

…
        S.D     F16,-24(R1)
        DADDUI  R1,R1,#-32
        BNE     R1,R2,Loop
(substitution & simplification)

S. Ziavras

Loop Processing/Unrolling (6)

• Execution time: 28 clock cycles
  – Each L.D: 1 stall
  – Each ADD.D: 2
  – DADDUI: 1
  – BNE: 1
  – Instr. issue cycles: 14
• 7 cycles per element (for 4 elements)
• Worse than the scheduled version of the original loop!
• BETTER SOLUTION: Schedule the unrolled loop
• Both phases are included in optimizing compilers

Scheduling the Unrolled Loop

Loop:   L.D     F0,0(R1)
        L.D     F6,-8(R1)
        L.D     F10,-16(R1)
        L.D     F14,-24(R1)
        ADD.D   F4,F0,F2
        ADD.D   F8,F6,F2
        ADD.D   F12,F10,F2
        ADD.D   F16,F14,F2
        S.D     F4,0(R1)
        S.D     F8,-8(R1)
        DADDUI  R1,R1,#-32
        S.D     F12,16(R1)
        BNE     R1,R2,Loop
        S.D     F16,8(R1)       ;8-32=-24

• Execution time: 14 clock cycles
• 3.5 clock cycles per element

Loop Unrolling & Pipeline Scheduling With Static Multiple Issue

• Assume a 2-issue statically scheduled superscalar MIPS pipeline
  – 2 instrs. can be issued per clock cycle 1...
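The cycle counts quoted on the slides for the unrolled loop (28 unscheduled, 14 scheduled) and the adjusted store offsets (8-32=-24) can be checked with a short Python sketch. It is only an illustration: the per-instruction stall table is copied from the slide's breakdown, while the helper names (`unrolled_loop`, `cycles_unscheduled`, `cycles_scheduled`, `adjusted_offset`) are hypothetical, and the scheduled count simply assumes that the reordering hides every stall, as the 14-cycle figure implies.

```python
# Stall cycles charged after each instruction in the unscheduled unrolled
# loop, taken from the slide's breakdown (S.D is assumed to add no stall).
STALLS = {"L.D": 1, "ADD.D": 2, "S.D": 0, "DADDUI": 1, "BNE": 1}

# Amount DADDUI adds to R1 when the loop is unrolled four times.
R1_STEP = -32


def unrolled_loop(copies):
    """Opcode sequence of the loop body unrolled `copies` times."""
    body = []
    for _ in range(copies):
        body += ["L.D", "ADD.D", "S.D"]
    return body + ["DADDUI", "BNE"]


def cycles_unscheduled(instrs):
    """One issue cycle per instruction plus its stall cycles."""
    return sum(1 + STALLS[op] for op in instrs)


def cycles_scheduled(instrs):
    """Assumption: the schedule hides all stalls, so only issue cycles remain."""
    return len(instrs)


def adjusted_offset(original_offset, step=R1_STEP):
    """Offset for an S.D moved after DADDUI, so it still hits the same address."""
    return original_offset - step


loop = unrolled_loop(4)
print(cycles_unscheduled(loop), cycles_unscheduled(loop) / 4)  # 28 7.0
print(cycles_scheduled(loop), cycles_scheduled(loop) / 4)      # 14 3.5
print(adjusted_offset(-16), adjusted_offset(-24))              # 16 8
```

The last line mirrors the slide's comment: a store that used offset -24 before R1 was decremented by 32 must use offset 8 afterwards, because 8-32=-24.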