50 pentium iii xeon blocked matrix multiply

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: rst elems elements of an integer array with a stride of stride. The run function is a wrapper that calls the test function and returns the measured read throughput. The fcyc2 function in line 18 (not shown) estimates the running time of the test function, in CPU cycles, using the à -best measurement scheme described in Chapter 9. Notice that the size argument to the run function is in units of bytes, while the corresponding elems argument to 328 CHAPTER 6. THE MEMORY HIERARCHY the test function is in units of words. Also, notice that line 19 computes MB/s as ½¼ bytes/s, as opposed to ¾¾¼ bytes/s. The size and stride arguments to the run function allow us to control the degree of locality in the resulting read sequence. Smaller values of size result in a smaller working set size, and thus more temporal locality. Smaller values of stride result in more spatial locality. If we call the run function repeatedly with different values of size and stride, then we can recover a two-dimensional function of read bandwidth versus temporal and spatial locality called the memory mountain. Figure 6...
View Full Document

Ask a homework question - tutors are online