s partition into three equivalence classes, denoted by the pair of arrays that are accessed in the inner loop.

a total of two misses per iteration. Notice that interchanging the loops has decreased the amount of spatial locality compared to the Class routines.

The routines in Figure 6.45(e) and (f) present an interesting tradeoff. With two loads and a store, they require one more memory operation than the routines. On the other hand, since the inner loop scans both of its arrays row-wise with a stride-1 access pattern, the miss rate on each array is only 0.25 misses per iteration, for a total of 0.50 misses per iteration.

Figure 6.47 summarizes the performance of different versions of matrix multiply on a Pentium III Xeon system. The graph plots the measured number of CPU cycles per inner-loop iteration as a function of array size (n).
Figure 6.47: Pentium III Xeon matrix multiply performance. [Plot: cycles per inner-loop iteration (0 to 60) versus array size n (25 to 400) for the kji, jki, kij, ikj, jik, and ijk versions.]

There are a number of interesting points t...
This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.