CSE361S-CacheMemory-slides

Writing Cache Friendly Code

- Repeated references to variables are good (temporal locality)
- Stride-1 reference patterns are good (spatial locality)

Examples (cold cache, 4-byte words, 4-word cache blocks):

    int sumarrayrows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Miss rate = 1/4 = 25%

    int sumarraycols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Miss rate = 100%

The Memory Mountain

- Read throughput (read bandwidth)
    - Number of bytes read from memory per second (MB/s)
- Memory mountain
    - Measured read throughput as a function of spatial and temporal locality.
    - Compact way to characterize memory system performance.

Memory Mountain Test Function

    /* The test function */
    void test(int elems, int stride)
    {
        int i, result = 0;
        volatile int sink;

        for (i = 0; i < elems; i += stride)
            result += data[i];
        sink = result; /* So compiler doesn't optimize away the loop */
    }

    /* Run test(elems, stride) and return read throughput (MB/s) */
    double run(int size, int stride, double Mhz)
    {
        double cycles;
        int elems = size / sizeof(int);

        test(elems, stride);                     /* warm up the cache */
        cycles = fcyc2(test, elems, stride, 0);  /* call test(elems,stride) */
        return (size / stride) / (cycles / Mhz); /* convert cycles to MB/s */
    }

Memory Mountain Main Routine

    /* mountain.c - Generate the memory mountain. */
    #include <stdio.h>
    #include <stdlib.h>

    /* init_data(), mhz(), and fcyc2() are helper routines not shown on these slides. */

    #define MINBYTES (1 << 10)  /* Working set size ranges from 1 KB */
    #define MAXBYTES (1 << 23)  /* ... up to 8 MB */
    #define MAXSTRIDE 16        /* Strides range from 1 to 16 */
    #define MAXELEMS MAXBYTES/sizeof(int)

    int data[MAXELEMS];         /* The array we'll be traversing */

    int main()
    {
        int size;        /* Working set size (in bytes) */
        int stride;      /* Stride (in array elements) */
        double Mhz;      /* Clock frequency */

        init_data(data, MAXELEMS); /* Initialize each element in data to 1 */
        Mhz = mhz(0);              /* Estimate the clock frequency */
        for (size = MAXBYTES; size >= MINBYTES; size >>= 1) {
            for (stride = 1; stride <= MAXSTRIDE; stride++)
                printf("%.1f\t", run(size, stride, Mhz));
            printf("\n");
        }
        exit(0);
    }

The Memory Mountain

[Figure: memory mountain for a Pentium III Xeon, 550 MHz, with 16 KB on-chip L1 d-cache, 16 KB on-chip L1 i-cache, and 512 KB off-chip unified L2 cache. Axes: read throughput (MB/s, 0 to 1200), stride (words, s1 to s15), working set size (bytes, 2k to 8m). Annotations: ridges of temporal locality (L1, L2, mem) and slopes of spatial locality.]

[Figure: memory mountain for a Pentium 4, 2.4 GHz. Axes: read throughput (MB/s, 0 to 5000, shaded in 500 MB/s bands), stride (words, s2 to s16), working set size (bytes, 2k to 8m).]
Ridges of Temporal Locality

- Slice through the memory mountain with stride=1
This document was uploaded on 11/13/2009.
