[Figure 5.21 diagram omitted: timing of the load, addl, cmpl, and jl operations for iterations 3 and 4.]

Figure 5.21: Scheduling of Operations for Three-Way Unrolled Integer Sum with Bounded Resource Constraints. In principle, the procedure can achieve a CPE of 1.0. The measured CPE, however, is 1.33.

5.8. REDUCING LOOP OVERHEAD

... shows that once we reach iteration 3 (i = 6), the operations would follow a regular pattern. The operations of iteration 4 (i = 9) have the same timings, but shifted by three cycles. This would indeed yield a CPE of 1.0, since each iteration processes three elements. Our measurement for this function shows a CPE of 1.33; that is, we require four cycles per iteration. Evidently, some resource constraint we did not account for in our analysis delays the computation by one additional cycle per iteration. Nonetheless, this performance represents an improvement over the code without loop unrolling.

Measuring the performance for differen...
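
To make the discussion of three-way unrolling concrete, the following is a minimal sketch in C of an integer sum written both ways, plus the arithmetic that relates cycles per iteration to CPE. It is not the book's own listing: the names sum_simple, sum_unroll3, and cpe are hypothetical, and the accumulator is widened to long only to avoid overflow in the illustration.

```c
#include <stddef.h>

/* Baseline (hypothetical name): one element per loop iteration, so the
 * index update, compare, and branch are paid once per element. */
long sum_simple(const int *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Three-way unrolled version (hypothetical name): each iteration of the
 * main loop adds three elements, so the loop overhead is paid once per
 * three elements. */
long sum_unroll3(const int *data, size_t n)
{
    long sum = 0;
    size_t i = 0;

    /* Main loop: runs only while three full elements remain.
     * Writing the bound as i + 2 < n avoids unsigned underflow of n - 2. */
    for (; i + 2 < n; i += 3)
        sum = sum + data[i] + data[i + 1] + data[i + 2];

    /* Cleanup loop: handles the 0, 1, or 2 leftover elements. */
    for (; i < n; i++)
        sum += data[i];

    return sum;
}

/* Cycles per element (CPE) from the cycles one loop iteration takes and
 * the number of elements that iteration processes. */
double cpe(double cycles_per_iteration, int elements_per_iteration)
{
    return cycles_per_iteration / elements_per_iteration;
}
```

With three elements per iteration, cpe(3.0, 3) gives the idealized 1.0 and cpe(4.0, 3) gives roughly 1.33, matching the one extra cycle per iteration that the measurement reveals. Actual timings depend on the processor and compiler, so the sketch illustrates the transformation rather than reproducing the measured results.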