Lecture 10

# Optimizations: University of Washington, blocked matrix

- i-cache and d-cache: 32 KB, 8-way, access: 4 cycles
- L2 unified cache: 256 KB, 8-way, access: 11 cycles
- L3 unified cache: 8 MB, 16-way, access: 30-40 cycles
- Block size: 64 bytes for all caches.

University of Washington

## Memory and Caches

- Cache basics
- Principle of locality
- Memory hierarchies
- Cache organization
- Program optimizations that consider caches

Caches and Program Optimizations

## Optimizations for the Memory Hierarchy

- Write code that has locality
  - Spatial: access data contiguously
  - Temporal: make sure accesses to the same data are not too far apart in time
- How to achieve?
  - Proper choice of algorithm
  - Loop transformations

## Example: Matrix Multiplication

```c
/* Result matrix, zero-initialized (calloc takes the element count first) */
c = (double *) calloc(n*n, sizeof(double));

/* Multiply n x n matrices a and b */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k]*b[k*n + j];
}
```

[Figure: c = a * b, with row index i and column index j marked]

## Cache Miss Analysis

Assume:

- Matrix elements are doubles
- Cache block = 64 bytes = 8 doubles
- Cache size C << n (much smaller than n)

First iteration:

- n/8 + n = 9n/8 misses (omitting matrix c): n/8 for the row of a, which is read contiguously, plus n for the column of b, which is read with stride n
- Afterwards in cache: [Figure (schematic): cached data, 8 wide]

Other iterations:

- Again: n/8 + n = 9n/8 misses (omitting matrix c)

Total misses:

- $\frac{9n}{8} \cdot n^2 = \frac{9}{8} n^3$

## Blocked Matrix Multiplication

```c
/* Result matrix, zero-initialized (calloc takes the element count first) */
c = (double *) calloc(n*n, sizeof(double));

/* Multiply n x n matrices a and b.
   B is the block size; as written, the loops require that B divides n. */
void mmm(double *a, double *b, double *c, int n) {
    int i, j, k;
    int i1, j1, k1;  /* indices inside a block */
    for (i = 0; i < n; i+=B)
        for (j = 0; j < n; j+=B)
            for (k = 0; k < n; k+=B)
                /* B x B mini matrix multiplications */
                for (i1 = i; i1 < i+B; i1++)
                    for (j1 = j; j1 < j+B; j1++)
                        for (k1 = k; k1 < k+B; k1++)
                            c[i1*n + j1] += a[i1*n + k1]*b[k1*n + j1];
}
```

[Figure: c = a * b computed block by block, with indices i1 and j1 marked; block size B x B]

## Cache Miss Analysis

Assume:

- Cache block = 64 bytes = 8 doubles
- Cache size C << n (much smaller than n)
- Three block...
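
The naive and blocked loop nests in this section should produce the same result matrix, and a small harness can check that. The following is a minimal sketch, not the lecture's own code: the names `mmm_naive`, `mmm_blocked`, and `mmm_max_diff`, the parameter `bs` (standing in for the block size B), and the test-data pattern are all assumptions made for illustration. It also assumes that `bs` divides `n`, as the blocked loops in the slides do.

```c
#include <assert.h>
#include <stdlib.h>

/* Naive triple loop, same access pattern as the unblocked version. */
void mmm_naive(const double *a, const double *b, double *c, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}

/* Blocked version; bs plays the role of B (hypothetical parameter name).
   Precondition: bs is positive and divides n. */
void mmm_blocked(const double *a, const double *b, double *c, int n, int bs) {
    assert(bs > 0 && n % bs == 0);
    for (int i = 0; i < n; i += bs)
        for (int j = 0; j < n; j += bs)
            for (int k = 0; k < n; k += bs)
                /* bs x bs mini matrix multiplication */
                for (int i1 = i; i1 < i + bs; i1++)
                    for (int j1 = j; j1 < j + bs; j1++)
                        for (int k1 = k; k1 < k + bs; k1++)
                            c[i1*n + j1] += a[i1*n + k1] * b[k1*n + j1];
}

/* Hypothetical helper: multiplies two n x n test matrices with both
   versions and returns the largest absolute difference between the
   results, or -1.0 if an allocation fails. */
double mmm_max_diff(int n, int bs) {
    size_t count = (size_t)n * (size_t)n;
    double *a  = malloc(count * sizeof *a);
    double *b  = malloc(count * sizeof *b);
    double *c1 = calloc(count, sizeof *c1);  /* results start at zero */
    double *c2 = calloc(count, sizeof *c2);
    double max_diff = -1.0;

    if (a && b && c1 && c2) {
        /* Small integer entries keep every partial sum exact in a double. */
        for (size_t x = 0; x < count; x++) {
            a[x] = (double)(x % 7);
            b[x] = (double)(x % 5);
        }
        mmm_naive(a, b, c1, n);
        mmm_blocked(a, b, c2, n, bs);

        max_diff = 0.0;
        for (size_t x = 0; x < count; x++) {
            double d = c1[x] - c2[x];
            if (d < 0) d = -d;
            if (d > max_diff) max_diff = d;
        }
    }
    free(a); free(b); free(c1); free(c2);
    return max_diff;
}
```

Calling `mmm_max_diff(64, 8)` from a `main` function returns 0.0 when both versions agree. The comparison can be exact here only because the test entries are small integers; with arbitrary floating-point data a tolerance-based comparison would be the safer choice. The harness checks correctness only and says nothing about miss counts or run time.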
