
5.8. REDUCING LOOP OVERHEAD

[Figure 5.21: Scheduling of Operations for Three-Way Unrolled Integer Sum with Bounded Resource Constraints. In principle, the procedure can achieve a CPE of 1.0. The measured CPE, however, is 1.33. The diagram schedules the load, addl, cmpl, and jl operations of Iteration 3 and Iteration 4.]

... shows that once we reach iteration 3 (i ), the operations would follow a regular pattern. The operations of iteration 4 (i ) have the same timings, but shifted by three cycles. Since each iteration processes three elements, this would indeed yield a CPE of 1.0.

Our measurement for this function shows a CPE of 1.33; that is, we require four cycles per iteration rather than three. Evidently, some resource constraint that we did not account for in our analysis delays the computation by one additional cycle per iteration. Nonetheless, this performance represents an improvement over the code without loop unrolling.

Measuring the performance for differen...
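To make the CPE arithmetic above concrete, here is a minimal sketch of a three-way unrolled integer sum in C. It is not the textbook's own listing: the names `sum_unroll3` and `cpe` are hypothetical, and a plain `int` array stands in for whatever vector abstraction the original function uses. The helper `cpe` simply divides the cycles per loop iteration by the number of elements handled in that iteration.

```c
#include <stddef.h>

/* Hypothetical sketch: sum an int array, three elements per loop iteration. */
int sum_unroll3(const int *data, size_t length)
{
    int sum = 0;
    size_t i = 0;

    /* Main loop: one index update and one compare/branch per three elements. */
    for (; i + 2 < length; i += 3) {
        sum += data[i] + data[i + 1] + data[i + 2];
    }

    /* Finish any remaining elements one at a time. */
    for (; i < length; i++) {
        sum += data[i];
    }
    return sum;
}

/* Cycles per element, given the cycles taken by one unrolled iteration. */
double cpe(double cycles_per_iteration, int elements_per_iteration)
{
    return cycles_per_iteration / elements_per_iteration;
}
```

Under these assumptions, `cpe(3.0, 3)` gives the ideal figure of 1.0, while `cpe(4.0, 3)` gives roughly 1.33, matching the measured value of four cycles per three-element iteration.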

This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
