Two major differences are that 1 vram output is

... unrolling. Our measurements for this procedure give a CPE of 2.20 for integer data and 3.50 for floating point.

A. Explain why any version of any inner product procedure cannot achieve a CPE better than 2.

B. Explain why the performance for floating point did not improve with loop unrolling.

Homework Problem 5.10 [Category 1]:

Write a version of the inner product procedure described in Problem 5.8 that uses four-way loop unrolling and two-way parallelism. Our measurements for this procedure give a CPE of 2.25 for floating-point data. Describe two factors that limit the performance to a CPE of at best 2.0.

CHAPTER 5. OPTIMIZING PROGRAM PERFORMANCE

Homework Problem 5.11 [Category 2]:

You’ve just joined a programming team that is trying to develop the world’s fastest factorial routine. Starting with recursive factorial, they’ve converted the code to use iteration:

    int fact(int n)
    {
        int i;
        int result = 1;

        for (i = n; i > 0; i--)
            result = result * i;
        return result;
    }

By doing so, they have reduced the number of CP...
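To make the terms in Problem 5.10 concrete, here is a minimal sketch of what "four-way loop unrolling and two-way parallelism" could look like for an inner product. The procedure from Problem 5.8 is not reproduced in this excerpt, so the function name `inner4x2` and the use of plain `float` arrays (rather than whatever vector abstraction the original uses) are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-in for the inner product of Problem 5.8.
 * Assumption: plain float arrays of length n. */
float inner4x2(const float *u, const float *v, size_t n)
{
    size_t i;
    size_t limit = n - (n % 4);  /* largest multiple of 4 not exceeding n */
    float sum0 = 0.0f;           /* accumulates elements i and i+2 */
    float sum1 = 0.0f;           /* accumulates elements i+1 and i+3 */

    /* Four-way unrolling: four elements per iteration.
     * Two-way parallelism: two independent accumulators, so the
     * two chains of additions do not depend on each other. */
    for (i = 0; i < limit; i += 4) {
        sum0 += u[i]     * v[i];
        sum1 += u[i + 1] * v[i + 1];
        sum0 += u[i + 2] * v[i + 2];
        sum1 += u[i + 3] * v[i + 3];
    }

    /* Finish any elements left over when n is not a multiple of 4 */
    for (; i < n; i++)
        sum0 += u[i] * v[i];

    return sum0 + sum1;
}
```

Splitting the sum across two accumulators changes the order in which floating-point additions are performed, so for general data the result can differ slightly in rounding from a single-accumulator loop.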