This preview shows page 1. Sign up to view the full content.
Unformatted text preview: d from L2. Thus the read throughput for strides of at least 8 words is a constant rate determined by the rate that cache blocks can be transferred from L2 into L1. To summarize our discussion of the memory mountain: The performance of the memory system is not characterized by a single number. Instead, it is a mountain of temporal and spatial locality whose elevations 6.6. PUTTING IT TOGETHER: THE IMPACT OF CACHES ON PROGRAM PERFORMANCE
800 700 read throughput (MB/s) 600 500 one access per cache line 400 300 200 100 0 s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16 stride (words) 331 Figure 6.44: A slope of spatial locality. The graph shows a slice through Figure 6.42 with size=256 KB. can vary by over an order of magnitude. Wise programmers try to structure their programs so that they run in the peaks instead of the valleys. The aim is to exploit temporal locality so that heavily used words are fetched from the L1 cache, and to exploit spatial locality so that as many words as possible are accessed from a single L1 cache line. Practice Problem 6.18:
The memory mountain in Figur...
View Full Document
This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
- Spring '10
- The American