lecture18 - CIS 450 Computer Architecture and Organization...

CIS 450 – Computer Architecture and Organization
Lecture 18: Cache Memories (cont.)
Mitch Neilsen (neilsen@ksu.edu)
219D Nichols Hall

Topics
- Impact of caches on performance
- The memory mountain

Cache Memories
Cache memories are small, fast SRAM-based memories managed automatically in hardware.
- They hold frequently accessed blocks of main memory.
The CPU looks first for data in L1, then in L2, then in main memory.
Typical system structure:
[Figure: block diagram with the labels CPU chip, register file, ALU, L1 cache, L2 tags, L2 data, SRAM port, bus interface, system bus, I/O bridge, memory bus, and main memory.]
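Because the CPU tries L1, then L2, then main memory, the average cost of an access depends on how often each level misses. The following is a minimal sketch of that arithmetic. The function name `avg_access_cycles` and all latencies and miss rates are hypothetical illustrations, not values from the lecture.

```c
#include <assert.h>

/* Average cycles per access for two cache levels in front of main memory.
 * Every access pays the L1 time; accesses that miss in L1 also pay the
 * L2 time; accesses that miss in both also pay the memory time.
 * l2_miss_rate is the fraction of L2 lookups that miss (a local rate).
 * All inputs are hypothetical, not measured values. */
double avg_access_cycles(double l1_cycles, double l1_miss_rate,
                         double l2_cycles, double l2_miss_rate,
                         double mem_cycles)
{
    assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
    assert(l2_miss_rate >= 0.0 && l2_miss_rate <= 1.0);
    return l1_cycles
         + l1_miss_rate * (l2_cycles + l2_miss_rate * mem_cycles);
}
```

With the made-up inputs `avg_access_cycles(1, 0.10, 10, 0.50, 100)`, the expression works out to 1 + 0.10 * (10 + 0.50 * 100) = 7 cycles per access, which shows how even a modest L1 miss rate dominates the average.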

Writing Cache Friendly Code
Repeated references to variables are good (temporal locality).
Stride-1 reference patterns are good (spatial locality).
Examples:
- cold cache, 4-byte words, 4-word cache blocks

    int sum_array_rows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Miss rate = 1/4 = 25%

    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Miss rate = 100%

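The two miss rates can be checked with a small simulation. This is a sketch under stated assumptions: it models a hypothetical direct-mapped cache with 16 sets, uses the 4-byte words and 4-word blocks from the slide, and picks a 64 x 64 array so that the array is much larger than the cache. The cache model, its size, and the function names are illustrative, not part of the lecture.

```c
#include <assert.h>

#define M 64            /* rows (hypothetical size) */
#define N 64            /* columns (hypothetical size) */
#define WORD_BYTES 4    /* 4-byte words, as on the slide */
#define BLOCK_BYTES 16  /* 4-word cache blocks, as on the slide */
#define NUM_SETS 16     /* hypothetical: 256-byte direct-mapped cache */

struct cache {
    long tag[NUM_SETS];
    int valid[NUM_SETS];
    long accesses, misses;
};

/* Start from a cold (empty) cache. */
static void cache_reset(struct cache *c)
{
    int s;
    for (s = 0; s < NUM_SETS; s++)
        c->valid[s] = 0;
    c->accesses = c->misses = 0;
}

/* Record one read of the byte address addr. */
static void cache_access(struct cache *c, long addr)
{
    long block = addr / BLOCK_BYTES;
    int set = (int)(block % NUM_SETS);
    long tag = block / NUM_SETS;

    c->accesses++;
    if (!c->valid[set] || c->tag[set] != tag) {
        c->misses++;            /* miss: load the block into its set */
        c->valid[set] = 1;
        c->tag[set] = tag;
    }
}

/* Row-major traversal, same loop order as sum_array_rows. */
double miss_rate_rows(void)
{
    struct cache c;
    int i, j;

    cache_reset(&c);
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            cache_access(&c, ((long)i * N + j) * WORD_BYTES);
    assert(c.accesses > 0);
    return (double)c.misses / (double)c.accesses;
}

/* Column-major traversal, same loop order as sum_array_cols. */
double miss_rate_cols(void)
{
    struct cache c;
    int i, j;

    cache_reset(&c);
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            cache_access(&c, ((long)i * N + j) * WORD_BYTES);
    assert(c.accesses > 0);
    return (double)c.misses / (double)c.accesses;
}
```

Under these assumptions `miss_rate_rows()` returns 0.25 and `miss_rate_cols()` returns 1.0. The column-wise figure depends on the array being large relative to the cache: with a small enough array, blocks loaded during one column would still be resident for the next.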
The Memory Mountain
Read throughput (read bandwidth)
- Number of bytes read from memory per second (MB/s)
Memory mountain
- Measured read throughput as a function of spatial and temporal locality.
- Compact way to characterize memory system performance.

Memory Mountain Test Function

    /* The test function */
    void test(int elems, int stride)
    {
        int i, result = 0;
        volatile int sink;

        for (i = 0; i < elems; i += stride)
            result += data[i];
        sink = result; /* So compiler doesn't optimize away the loop */
    }

    /* Run test(elems, stride) and return read throughput (MB/s) */
    double run(int size, int stride, double Mhz)
    {
        double cycles;
        int elems = size / sizeof(int);

        test(elems, stride);                     /* warm up the cache */
        cycles = fcyc2(test, elems, stride, 0);  /* call test(elems,stride) */
        return (size / stride) / (cycles / Mhz); /* convert cycles to MB/s */
    }

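The return expression in `run` packs a unit conversion into one line: `size / stride` is the number of bytes read, and `cycles / Mhz` is the elapsed time in microseconds, so their quotient is bytes per microsecond, which equals MB/s when 1 MB is taken as 10^6 bytes. A minimal sketch that spells this out follows. The function name `read_throughput_mbs` is hypothetical, and unlike the slide's expression it divides in floating point, not with integer division.

```c
#include <assert.h>

/* Convert a cycle count into read throughput in MB/s (1 MB = 10^6 bytes).
 * size   : working set size in bytes
 * stride : stride in array elements
 * cycles : measured cycle count for one traversal
 * mhz    : clock frequency in MHz (cycles per microsecond) */
double read_throughput_mbs(int size, int stride, double cycles, double mhz)
{
    assert(stride > 0 && cycles > 0.0 && mhz > 0.0);

    double bytes_read = (double)size / stride;  /* bytes actually touched */
    double microseconds = cycles / mhz;         /* elapsed time */
    return bytes_read / microseconds;           /* bytes per us == MB/s */
}
```

As a made-up example, a 4096-byte working set read with stride 1 in 2200 cycles on a 550 MHz clock takes 4 microseconds, giving 4096 / 4 = 1024 MB/s.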
Memory Mountain Main Routine

    /* mountain.c - Generate the memory mountain. */
    #include <stdio.h>   /* printf */
    #include <stdlib.h>  /* exit */

    #define MINBYTES (1 << 10)  /* Working set size ranges from 1 KB */
    #define MAXBYTES (1 << 23)  /* ... up to 8 MB */
    #define MAXSTRIDE 16        /* Strides range from 1 to 16 */
    #define MAXELEMS MAXBYTES/sizeof(int)

    int data[MAXELEMS];         /* The array we'll be traversing */

    int main()
    {
        int size;    /* Working set size (in bytes) */
        int stride;  /* Stride (in array elements) */
        double Mhz;  /* Clock frequency */

        init_data(data, MAXELEMS); /* Initialize each element in data to 1 */
        Mhz = mhz(0);              /* Estimate the clock frequency */
        for (size = MAXBYTES; size >= MINBYTES; size >>= 1) {
            for (stride = 1; stride <= MAXSTRIDE; stride++)
                printf("%.1f\t", run(size, stride, Mhz));
            printf("\n");
        }
        exit(0);
    }
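The routine above relies on the helpers `init_data`, `mhz`, and `fcyc2`, which are not shown on the slides. For experimenting without them, here is a rough stand-in that times the same kind of strided read loop with the standard C `clock()` function. It is a sketch under assumptions, not the lecture's measurement method: `clock()` reports CPU time at coarse resolution, so the loop is repeated until at least 10 ms have elapsed, and the names `buf`, `read_pass`, and `measure_mbs` are hypothetical.

```c
#include <assert.h>
#include <time.h>

#define MAX_BYTES (1 << 22)                    /* hypothetical 4 MB limit */
#define MAX_ELEMS (MAX_BYTES / (int)sizeof(int))

static int buf[MAX_ELEMS];   /* the array to traverse */
static volatile int sink;    /* volatile store keeps the reads alive */

/* One pass over the first elems elements with the given stride. */
static void read_pass(int elems, int stride)
{
    int i, result = 0;

    for (i = 0; i < elems; i += stride)
        result += buf[i];
    sink = result;
}

/* Estimate read throughput in MB/s (1 MB = 10^6 bytes) for a working
 * set of size bytes read with the given stride (in elements). */
double measure_mbs(int size, int stride)
{
    assert(size >= (int)sizeof(int) && size <= MAX_BYTES && stride > 0);

    int elems = size / (int)sizeof(int);
    /* Elements touched per pass, rounded up, times the element size. */
    double bytes_per_pass =
        (double)((elems + stride - 1) / stride) * (double)sizeof(int);
    long reps = 1;
    int i;

    for (i = 0; i < elems; i++)   /* initialize; also touches the data */
        buf[i] = 1;

    for (;;) {
        long r;
        clock_t start = clock();

        for (r = 0; r < reps; r++)
            read_pass(elems, stride);

        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (secs >= 0.01)         /* long enough to trust the timer */
            return bytes_per_pass * (double)reps / secs / 1e6;
        reps *= 2;                /* too short: double the work, retry */
    }
}
```

Calling `measure_mbs(size, stride)` inside the same pair of loops over `size` and `stride` would print a comparable table, although the absolute numbers depend on the machine, the compiler, and the optimization level.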

The Memory Mountain
[Figure: memory mountain plot. Axes: throughput (MB/sec, 0 to 1200), stride (words, s1 to s15), working set size (bytes, tick labels 2k to 8m). Annotations: Slopes of Spatial Locality, Ridges of Temporal Locality; regions labeled L1, L2, and mem.]
Pentium III, 550 MHz
- 16 KB on-chip L1 d-cache
- 16 KB on-chip L1 i-cache
- 512 KB off-chip unified L2 cache

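The regions labeled L1, L2, and mem correspond to working sets that fit in the L1 d-cache, fit in the L2 cache, or fit in neither. A minimal sketch of that classification, using the Pentium III cache sizes listed on the slide, is given below. The enum and function names are hypothetical, and the hard thresholds are a simplification: the L2 is unified, so the data compete with instructions for space and the measured ridges are not perfectly sharp.

```c
#include <assert.h>

#define L1_DCACHE_BYTES (16L * 1024)   /* 16 KB L1 d-cache (from the slide) */
#define L2_CACHE_BYTES  (512L * 1024)  /* 512 KB unified L2 (from the slide) */

enum mem_level { LEVEL_L1, LEVEL_L2, LEVEL_MEM };

/* Smallest level of the hierarchy that can hold the whole working set. */
enum mem_level serving_level(long working_set_bytes)
{
    assert(working_set_bytes > 0);

    if (working_set_bytes <= L1_DCACHE_BYTES)
        return LEVEL_L1;
    if (working_set_bytes <= L2_CACHE_BYTES)
        return LEVEL_L2;
    return LEVEL_MEM;
}
```

For the working set sizes on the plot's axis, this puts 2k and 8k on the L1 ridge, 32k, 128k, and 512k on the L2 ridge, and 2m and 8m in the main-memory region.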
X86-64 Memory Mountain
[Figure: memory mountain plot. Axes: read throughput (MB/s, 0 to 6000), stride (words, s1 to s29), working set size (bytes, tick labels 4k to 128m). Annotations: Slopes of Spatial Locality, Ridges of Temporal Locality; regions labeled L1, L2, and Mem.]
Pentium Xeon x86-64, 3.2 GHz
- 12 Kuop on-chip L1 trace cache
- 16 KB on-chip L1 d-cache
- 1 MB off-chip unified L2 cache
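The slopes of spatial locality reflect how many of the words in each fetched block a strided scan actually uses. As a rough model, assuming the working set is too large to stay cached between passes, a stride of k words misses on about k * word size / block size of its reads, capped at 100%. The sketch below encodes that approximation. The function name is hypothetical, and the block size is a parameter because the slides do not state it for these machines.

```c
#include <assert.h>

/* Approximate miss rate of a strided scan over data that does not stay
 * in the cache between passes: each fetched block serves
 * block_bytes / (stride_words * word_bytes) reads, so the miss rate is
 * the reciprocal of that, and never more than 1. */
double stride_miss_rate(int stride_words, int word_bytes, int block_bytes)
{
    assert(stride_words > 0 && word_bytes > 0 && block_bytes > 0);

    double rate = (double)stride_words * word_bytes / block_bytes;
    return rate < 1.0 ? rate : 1.0;
}
```

With 4-byte words and 16-byte blocks, `stride_miss_rate(1, 4, 16)` gives 0.25 and `stride_miss_rate(4, 4, 16)` gives 1.0. Beyond the stride at which every read lands in a new block, the rate stays at 1.0, which is consistent with the mountain flattening out at large strides.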

Camaro.cis.ksu.edu Memory Mountain (Intel(R) Xeon(TM) CPU 2.66GHz)
[Figure: memory mountain plot; stride axis tick labels s1, s5, s9, s13, s17, s21, s25.]

This note was uploaded on 04/09/2008 for the course CIS 450 taught by Professor Mitch Neilsen during the Spring '08 term at Kansas State University.
