Lecture 10

# Caches and Program Optimizations (University of Washington): Blocked Matrix Multiplication

Unformatted text preview: i ­cache and d ­cache: 32 KB, 8 ­way, Access: 4 cycles L2 uniﬁed cache: 256 KB, 8 ­way, Access: 11 cycles L3 uniﬁed cache: 8 MB, 16 ­way, Access: 30 ­40 cycles Block size: 64 bytes for all caches. University of Washington Memory and Caches           Cache basics Principle of locality Memory hierarchies Cache organiza?on Program op?miza?ons that consider caches Caches and Program Op?miza?ons University of Washington Op?miza?ons for the Memory Hierarchy   Write code that has locality   Spa:al: access data con:guously   Temporal: make sure access to the same data is not too far apart in :me   How to achieve?   Proper choice of algorithm   Loop transforma:ons Caches and Program Op?miza?ons University of Washington Example: Matrix Mul?plica?on c = (double *) calloc(sizeof(double), n*n); /* Multiply n x n matrices a and b */ void mmm(double *a, double *b, double *c, int n) { int i, j, k; for (i = 0; i < n; i++) for (j = 0; j < n; j++) for (k = 0; k < n; k++) c[i*n + j] += a[i*n + k]*b[k*n + j]; } j c = i a b * Caches and Program Op?miza?ons University of Washington Cache Miss Analysis   Assume:   Matrix elements are doubles   Cache block = 64 bytes = 8 doubles   Cache size C << n (much smaller than n)   n First itera?on:   n/8 + n = 9n/8 misses (omiqng matrix c) = * = *   Aeerwards in cache: (schema:c) 8 wide Caches and Program Op?miza?ons University of Washington Cache Miss Analysis   Assume:   Matrix elements are doubles   Cache block = 64 bytes = 8 doubles   Cache size C << n (much smaller than n)   n Other itera?ons:   Again: n/8 + n = 9n/8 misses (omiqng matrix c) = * 8 wide   Total misses:   9n/8 * n2 = (9/8) * n3 Caches and Program Op?miza?ons University of Washington Blocked Matrix Mul?plica?on c = (double *) calloc(sizeof(double), n*n); /* Multiply n x n matrices a and b */ void mmm(double *a, double *b, double *c, int n) { int i, j, k; for (i = 0; i < n; i+=B) for (j = 0; j < n; j+=B) for (k = 0; k 
< n; k+=B) /* B x B mini matrix multiplications */ for (i1 = i; i1 < i+B; i1++) for (j1 = j; j1 < j+B; j1++) for (k1 = k; k1 < k+B; k1++) c[i1*n + j1] += a[i1*n + k1]*b[k1*n + j1]; } j1 c = i1 a b * Block size B x B Caches and Program Op?miza?ons University of Washington Cache Miss Analysis   Assume:   Cache block = 64 bytes = 8 doubles   Cache size C << n (much smaller than n)   Three block...