Others look for ways to use caches and locality to


rformance. For sizes up to and including 16 KB, the working set fits entirely in the L1 d-cache, and thus reads are served from L1 at the peak throughput of about 1 GB/s. For sizes up to and including 256 KB, the working set fits entirely in the unified L2 cache.

[Figure: read throughput (MB/s), 0 to 1200, plotted against working set size (bytes), 8m down to 1k, with the main memory, L2 cache, and L1 cache regions marked.]

Figure 6.43: Ridges of temporal locality in the memory mountain. The graph shows a slice through Figure 6.42 with stride=1.

Larger working set sizes are served primarily from main memory. The drop in read throughput between 256 KB and 512 KB is interesting. Since the L2 cache is 512 KB, we might expect the drop to occur at 512 KB instead of 256 KB. The only way to be sure is to perform a detailed cache simulation, but we suspect the reason lies in the fact that the Pentium III L2 cache is a unified cache that holds both instructions and data. What we might be seeing is the effect of conflict misses between instructi...
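
The kind of measurement behind a slice like Figure 6.43 can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: the function names `read_working_set` and `read_throughput_mbs`, the `reps` parameter, and the use of the standard `clock()` timer are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: read `elems` longs with the given stride.
 * Returning the sum gives the reads an observable result. */
long read_working_set(const long *data, size_t elems, size_t stride)
{
    assert(stride > 0);              /* a zero stride would never terminate */
    long sum = 0;
    for (size_t i = 0; i < elems; i += stride)
        sum += data[i];
    return sum;
}

/* Hypothetical helper: estimate read throughput in MB/s for a working set
 * of `bytes` bytes. Returns 0.0 if the run is too short for clock() to
 * resolve, so callers should raise `reps` for small working sets. */
double read_throughput_mbs(const long *data, size_t bytes,
                           size_t stride, int reps)
{
    size_t elems = bytes / sizeof(long);
    size_t reads = (elems + stride - 1) / stride;   /* reads per pass */
    volatile long sink = 0;                         /* keeps results live */

    sink += read_working_set(data, elems, stride);  /* warm the caches */

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink += read_working_set(data, elems, stride);
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    double bytes_read = (double)reps * (double)reads * sizeof(long);
    return bytes_read / secs / 1e6;
}
```

Calling `read_throughput_mbs` with stride 1 over working set sizes from 1 KB up to 8 MB would give one throughput value per size, which is the shape of data the figure plots. Absolute numbers depend on the machine, the compiler, and the optimization level, and an aggressive optimizer may need extra care to keep the repeated reads from being folded away.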