unrolling. Our measurements for this procedure give a CPE of 2.20 for integer data and 3.50 for floating point.

A. Explain why any version of any inner product procedure cannot achieve a CPE better than 2.

B. Explain why the performance for floating point did not improve with loop unrolling.

Homework Problem 5.10 [Category 1]:

Write a version of the inner product procedure described in Problem 5.8 that uses four-way loop unrolling and two-way parallelism. Our measurements for this procedure give a CPE of 2.25 for floating-point data. Describe two factors that limit the performance to a CPE of at best 2.0.

Homework Problem 5.11 [Category 2]:

You’ve just joined a programming team that is trying to develop the world’s fastest factorial routine. Starting with recursive factorial, they’ve converted the code to use iteration:

    int fact(int n)
    {
        int i;
        int result = 1;

        for (i = n; i > 0; i--)
            result = result * i;
        return result;
    }

By doing so, they have reduced the number of CP...
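
As an illustration of the technique named in Problem 5.10, the following is a minimal sketch of an inner product with four-way loop unrolling and two-way parallelism. It is not the procedure from Problem 5.8, which does not appear in this excerpt: the function name `inner4x2` and the use of plain `double` arrays with an explicit length are assumptions made here for illustration.

```c
#include <stddef.h>

/* Hypothetical sketch (names and signature are assumptions, not the
 * textbook's code): inner product of u and v, each of length n. */
double inner4x2(const double *u, const double *v, size_t n)
{
    size_t i = 0;
    double sum0 = 0.0;   /* accumulator for elements i and i+2 */
    double sum1 = 0.0;   /* accumulator for elements i+1 and i+3 */

    /* Four-way unrolling: each iteration consumes four elements.
     * Two-way parallelism: the two accumulators form independent
     * dependency chains, so their additions can overlap. */
    for (; i + 3 < n; i += 4) {
        sum0 += u[i]     * v[i];
        sum1 += u[i + 1] * v[i + 1];
        sum0 += u[i + 2] * v[i + 2];
        sum1 += u[i + 3] * v[i + 3];
    }

    /* Finish the 0 to 3 elements left over when n is not a multiple of 4. */
    for (; i < n; i++)
        sum0 += u[i] * v[i];

    /* Combine the two partial sums. */
    return sum0 + sum1;
}
```

Splitting the sum across two accumulators changes the order in which the additions are performed, so for floating-point data the result can differ slightly in rounding from a strictly sequential sum.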
