The inner loops of the class routines figure 645a and


ctions. These functions often spend most of their time in a few loops. So focus on the inner loops of the core functions and ignore the rest.

2. Minimize the number of cache misses in each inner loop. All other things being equal, such as the total number of loads and stores, loops with better miss rates will run faster.

To see how this works in practice, consider the sumvec function from Section 6.2.

    int sumvec(int v[N])
    {
        int i, sum = 0;

        for (i = 0; i < N; i++)
            sum += v[i];
        return sum;
    }

Is this function cache-friendly? First, notice that there is good temporal locality in the loop body with respect to the local variables i and sum. In fact, because these are local variables, any reasonable optimizing compiler will cache them in the register file, the highest level of the memory hierarchy.

Now consider the stride-1 references to vector v. In general, if a cache has a block size of $B$ bytes, then a stride-$k$ reference pattern (where $k$ is expressed in words) results in an average of $\min(1, (\text{wordsize} \times k)/B)$ misses per loop iterati...
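The miss-rate formula for a stride-k pattern can be checked with a small sketch. The function names misses_per_iter and count_cold_misses are hypothetical and not from the text. The counting routine also assumes a cold cache, an array that starts on a block boundary, and no evictions during the scan, so it is only a rough model of a real cache.

```c
#include <assert.h>

/* Hypothetical helper: average misses per loop iteration for a stride-k
 * reference pattern, min(1, (wordsize * k) / B), with B in bytes. */
double misses_per_iter(int wordsize, int k, int block_size)
{
    assert(wordsize > 0 && k > 0 && block_size > 0);
    double m = (double)(wordsize * k) / block_size;
    return m < 1.0 ? m : 1.0;
}

/* Hypothetical helper: count cold misses for n_iters stride-k accesses.
 * Assumption: the array starts on a block boundary and nothing is evicted,
 * so a miss occurs exactly when an access lands in a new block. */
long count_cold_misses(long n_iters, int wordsize, int k, int block_size)
{
    assert(wordsize > 0 && k > 0 && block_size > 0);
    long misses = 0;
    long prev_block = -1;               /* no block loaded yet */

    for (long i = 0; i < n_iters; i++) {
        long addr = i * (long)k * wordsize;   /* byte offset of v[i*k] */
        long block = addr / block_size;
        if (block != prev_block) {            /* first touch of this block */
            misses++;
            prev_block = block;
        }
    }
    return misses;
}
```

Under these assumptions, with 4-byte words and 32-byte blocks (values chosen only for illustration), a stride-1 scan such as sumvec gives 4/32 = 0.125 misses per iteration, that is, one miss for every eight elements. A stride of 8 words or more touches a new block on every access, so the rate saturates at 1.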