7- SW-tech-for-ILP

7- SW-tech-for-ILP - SW techniques for ILP Review: Unrolled...

Info iconThis preview shows pages 1–3. Sign up to view the full content.

View Full Document Right Arrow Icon

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: SW techniques for ILP Review: Unrolled Loop that Minimizes Stalls for Scalar 1 Loop: LD F0,0(R1) 2 LD F6,-8(R1) 3 LD F10,-16(R1) 4 LD F14,-24(R1) 5 ADDD F4,F0,F2 6 ADDD F8,F6,F2 7 ADDD F12,F10,F2 8 ADDD F16,F14,F2 9 SD 0(R1),F4 10 SD -8(R1),F8 11 SD 16(R1),F1 11 SD -16(R1),F12 12 SUBI R1,R1,#32 13 BNEZ R1,LOOP 14 SD 8(R1),F16 ; 8-32 = -24 14 clock cycles, or 3.5 per iteration Loop Unrolling in Superscalar Integer instruction FP instruction Clock cycle Loop: LD F0,0(R1) 1 LD F6,-8(R1) 2 LD F10,-16(R1) ADDD F4,F0,F2 3 LD F14,-24(R1) ADDD F8,F6,F2 4 LD F18,-32(R1) ADDD F12,F10,F2 5 SD 0(R1),F4 ADDD F16,F14,F2 6 SD -8(R1),F8 ADDD F20,F18,F2 7 SD -16(R1),F12 8 SD -24(R1),F16 9 SUBI R1,R1,#40 10 BNEZ R1,LOOP 11 BNEZ R1,LOOP 11 SD -32(R1),F20 12 Unrolled 5 times to avoid delays (+1 due to SS) 12 clocks, or 2.4 clocks per iteration (1.5X) Multiple Issue Challenges While Integer/FP split is simple for the HW, get CPI of 0.5 only for programs with: Exactly 50% FP operations No hazard No hazards If more instructions issue at same time, greater...
View Full Document

Page1 / 7

7- SW-tech-for-ILP - SW techniques for ILP Review: Unrolled...

This preview shows document pages 1 - 3. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online