Lec14-Cache_measurement

# It's also a nice example of a program that is highly


Stephen Chong, Harvard University

[Figure: memory mountain, Pentium Xeon. Surviving axis labels: "…put (MB/s)", "Size (bytes)".]

[Figure: memory mountain, Opteron. Surviving region labels: L1, L2, Mem.]

Topics for today
• Cache performance metrics
• Discovering your cache's size and performance
• The "Memory Mountain"
• Matrix multiply, six ways
• Blocked matrix multiplication
• Exploiting locality in your programs

Matrix Multiplication Example
• Matrix multiplication is heavily used in numeric and scientific applications.
• It's also a nice example of a program that is highly sensitive to cache effects.
• Multiply two N x N matrices
• O(N^3) total operations
• Read N values for each source element
• Sum up N values for each destination

    /* The matrices are passed as flat double* buffers in row-major order,
       so element [i][k] of a is a[i*n + k]. */
    void mmm(double *a, double *b, double *c, int n) {
        int i, j, k;
        double sum;                  /* variable sum held in register */
        /* ijk */
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++) {
                sum = 0.0;
                for (k = 0; k < n; k++)
                    sum += a[i*n + k] * b[k*n + j];   /* a[i][k] * b[k][j] */
                c[i*n + j] = sum;                     /* c[i][j] */
            }
        }
    }

Matrix Multiplication Example
• Each iteration of the two outer loops (ijk order) computes one element of the result as the dot product of row i of the first matrix and column j of the second.
• For i = 0, j = 0: 4×3 + 2×2 + 7×5 = 51

    | 4 2 7 |   | 3 0 1 |   | 51 . . |
    | 1 8 2 | × | 2 4 5 | = |  . . . |
    | 6 0 1 |   | 5 9 1 |   |  . . . |

Miss Rate Analysis for Matrix Multiply
• Assume:
  • Line size = 32B (big enough for four 64-bit "double" values)
  • Matrix dimension N is very large
  • Cache is not big enough to hold multiple rows
• Analysis Method:
  • Look at access pattern of inner loop

[Figure: A (row i, inner index k) × B (inner index k, column j) = C (row i, column j).]

Layout of C Arrays in Memory (review)
• C arrays allocated in row-major order
  • Each row in contiguous memory locations
• Stepping through columns in one row:
  • for (i = 0; i < N; i++) sum += a[0][i];
  • Accesses successive elements
  • Compulsory miss rate: (8 bytes per double) / (block size of cache)
• Stepping through rows in one column:
  • for (i = 0; i < n; i++) sum += a[i][0];
  • Accesses distant elements —...
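
The two traversals above can be compared with a small counting model. This is only a sketch under simplifying assumptions that are not from the slides: the cache is modelled as holding a single line, the array starts on a line boundary, and the helper names `count_misses` and `miss_rate` are hypothetical. It counts how often a fixed-stride walk moves to a different cache line.

```c
#include <stddef.h>

/* Toy model (an assumption, not from the lecture): the cache holds exactly
 * one line, and the array starts on a line boundary. An access misses
 * whenever it falls in a different line than the previous access. */
size_t count_misses(size_t stride_bytes, size_t n_accesses, size_t line_bytes)
{
    size_t misses = 0;
    size_t prev_line = (size_t)-1;   /* sentinel: nothing cached yet */

    for (size_t i = 0; i < n_accesses; i++) {
        /* Index of the cache line that holds byte offset i * stride_bytes. */
        size_t line = (i * stride_bytes) / line_bytes;
        if (line != prev_line) {
            misses++;
            prev_line = line;
        }
    }
    return misses;
}

/* Fraction of accesses that miss under the same toy model. */
double miss_rate(size_t stride_bytes, size_t n_accesses, size_t line_bytes)
{
    if (n_accesses == 0)
        return 0.0;
    return (double)count_misses(stride_bytes, n_accesses, line_bytes)
           / (double)n_accesses;
}
```

With 8-byte doubles and the 32B line size assumed earlier, `miss_rate(8, 1024, 32)` models the walk along one row and gives 8/32 = 0.25, matching the compulsory miss rate formula. For the walk down one column of a 1024 x 1024 matrix the stride is a full row, so `miss_rate(8 * 1024, 1024, 32)` gives 1.0: every access lands in a new line.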