Lec14-Cache_measurement



Stephen Chong, Harvard University

…no spatial locality!

- Compulsory miss rate = 100%

## Matrix Multiplication (ijk)

```c
/* ijk */
for (i=0; i<n; i++) {
  for (j=0; j<n; j++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}
```

Inner loop: A is read row-wise at `(i,*)`, B column-wise at `(*,j)`, and C stays fixed at `(i,j)`.

- 2 loads, 0 stores per iteration
- Assume a cache line size of 32 bytes, so 4 doubles per line
- Misses per iteration: A = 0.25, B = 1, C = 0; total = 1.25

## Cache miss analysis

*[Schematic for the ijk inner loop: the first iteration, and the cache contents after the first iteration, with each cached block 4 doubles wide; A row-wise, B column-wise, C fixed.]*

Each miss brings a block of 4 doubles into the cache. A block of row i of A serves 4 consecutive iterations of the inner loop, so A misses once every 4 iterations. Each b[k][j] lies in a different row of B, so every iteration touches a new block; the other 3 doubles in that block belong to neighboring columns and go unused in this pass, and for large n those blocks are evicted before later iterations of the j loop can reuse them.

## Matrix Multiplication (jik)

```c
/* jik */
for (j=0; j<n; j++) {
  for (i=0; i<n; i++) {
    sum = 0.0;
    for (k=0; k<n; k++)
      sum += a[i][k] * b[k][j];
    c[i][j] = sum;
  }
}
```

Inner loop: A is read row-wise at `(i,*)`, B column-wise at `(*,j)`, and C stays fixed at `(i,j)`.

- Same as ijk, with only the order of the two outer loops swapped
- 2 loads, 0 stores per iteration
- Assume a cache line size of 32 bytes, so 4 doubles per line
- Misses per iteration: A = 0.25, B = 1, C = 0; total = 1.25

## Matrix Multiplication (kij)

```c
/* kij */
for (k=0; k<n; k++) {
  for (i=0; i<n; i++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];  /* accumulates: c must start zeroed */
  }
}
```

Inner loop: A is fixed at `(i,k)`, while B is read row-wise at `(k,*)` and C is updated row-wise at `(i,*)`.

- 2 loads, 1 store per iteration
- Assume a cache line size of 32 bytes, so 4 doubles per line
- Misses per iteration: A = 0, B = 0.25, C = 0.25; total = 0.5

## Matrix Multiplication (ikj)

```c
/* ikj */
for (i=0; i<n; i++) {
  for (k=0; k<n; k++) {
    r = a[i][k];
    for (j=0; j<n; j++)
      c[i][j] += r * b[k][j];  /* accumulates: c must start zeroed */
  }
}
```

Inner loop: A is fixed at `(i,k)`, while B is read row-wise at `(k,*)` and C is updated row-wise at `(i,*)`.

- Same as kij, with only the order of the two outer loops swapped
- 2 loads, 1 store per iteration
- Assume a cache line size of 32 bytes, so 4 doubles per line
- Misses per iteration: A = 0 …
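The slides compare loop orders that all compute the same product c = a × b. As a cross-check, here is a minimal C99 sketch that packages the four slide loops as functions and verifies that they agree. The function names (`mm_ijk`, `mm_jik`, `mm_kij`, `mm_ikj`, `same`, `loop_orders_agree`) and the test values are hypothetical; the sketch assumes n-by-n row-major `double` matrices passed as C99 variably modified array parameters. Unlike the slide fragments, `mm_kij` and `mm_ikj` zero `c` first, because their inner loops accumulate with `+=`.

```c
#include <assert.h>
#include <string.h>

/* The four loop orders from the slides, each computing c = a * b for
   n-by-n row-major matrices (C99 variably modified array parameters). */

void mm_ijk(int n, double a[n][n], double b[n][n], double c[n][n]) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;             /* accumulator kept in a register */
            for (int k = 0; k < n; k++)
                sum += a[i][k] * b[k][j]; /* A row-wise, B column-wise */
            c[i][j] = sum;                /* C fixed: one store per inner loop */
        }
}

void mm_jik(int n, double a[n][n], double b[n][n], double c[n][n]) {
    for (int j = 0; j < n; j++)           /* ijk with the outer loops swapped */
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

void mm_kij(int n, double a[n][n], double b[n][n], double c[n][n]) {
    memset(c, 0, sizeof(double) * n * n); /* the += below needs c zeroed */
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++) {
            double r = a[i][k];           /* A fixed for the whole inner loop */
            for (int j = 0; j < n; j++)
                c[i][j] += r * b[k][j];   /* B and C both row-wise */
        }
}

void mm_ikj(int n, double a[n][n], double b[n][n], double c[n][n]) {
    memset(c, 0, sizeof(double) * n * n);
    for (int i = 0; i < n; i++)           /* kij with the outer loops swapped */
        for (int k = 0; k < n; k++) {
            double r = a[i][k];
            for (int j = 0; j < n; j++)
                c[i][j] += r * b[k][j];
        }
}

/* Returns 1 if x and y are element-wise equal, else 0. */
static int same(int n, double x[n][n], double y[n][n]) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (x[i][j] != y[i][j]) return 0;
    return 1;
}

/* Runs all four orders on integer-valued inputs (exact in double, so the
   results must match exactly) and returns 1 if they agree. Intended for
   small n only: the matrices live on the stack. */
int loop_orders_agree(int n) {
    assert(n > 0);                        /* zero-size VLAs are undefined */
    double a[n][n], b[n][n], ref[n][n], c[n][n];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            a[i][j] = i + 2 * j;          /* arbitrary small integers */
            b[i][j] = (i * j) % 5 - 2;
        }
    mm_ijk(n, a, b, ref);
    mm_jik(n, a, b, c);
    if (!same(n, ref, c)) return 0;
    mm_kij(n, a, b, c);
    if (!same(n, ref, c)) return 0;
    mm_ikj(n, a, b, c);
    return same(n, ref, c);
}
```

For example, called from a `main` function, `loop_orders_agree(32)` returns 1, and `mm_ikj` applied to {{1, 2}, {3, 4}} and {{5, 6}, {7, 8}} fills `c` with {{19, 22}, {43, 50}}.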

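The per-iteration miss counts on the slides (1.25 for ijk and jik, 0.5 for kij and ikj) can be reproduced with a small simulation. This is a sketch under stated assumptions, not a measurement of real hardware: it models a hypothetical fully associative LRU cache of 8 lines of 32 bytes, uses n = 64 so that a row or column of a matrix does not fit in the cache (the "large n" case), replays only the references made inside the innermost loop (`sum` and `r` are assumed to stay in registers), and places A, B and C back to back at hypothetical addresses starting at 0. The names `Cache`, `cache_access`, `elem` and `misses_per_iteration` are illustrative.

```c
#include <assert.h>

#define LINE_BYTES 32  /* 32-byte lines = 4 doubles, as on the slides */
#define NUM_LINES 8    /* hypothetical: tiny fully associative LRU cache */

typedef struct {
    long tag[NUM_LINES];      /* line number = byte address / LINE_BYTES */
    long last_use[NUM_LINES]; /* logical time of the last access (for LRU) */
    int valid[NUM_LINES];
    long clock;               /* incremented on every access */
    long misses;
} Cache;

/* Touches one byte address; on a miss, loads its line into a free slot
   or evicts the least recently used line. */
static void cache_access(Cache *c, long addr) {
    long line = addr / LINE_BYTES;
    c->clock++;
    for (int w = 0; w < NUM_LINES; w++)
        if (c->valid[w] && c->tag[w] == line) {
            c->last_use[w] = c->clock;    /* hit: refresh recency */
            return;
        }
    c->misses++;
    int victim = 0;
    for (int w = 0; w < NUM_LINES; w++) {
        if (!c->valid[w]) { victim = w; break; }  /* prefer a free slot */
        if (c->last_use[w] < c->last_use[victim]) victim = w;
    }
    c->valid[victim] = 1;
    c->tag[victim] = line;
    c->last_use[victim] = c->clock;
}

/* Byte address of element (row, col) of a row-major n-by-n double matrix. */
static long elem(long base, int n, int row, int col) {
    return base + ((long)row * n + col) * (long)sizeof(double);
}

/* Replays the inner-loop references of one loop order from the slides
   ("ijk", "jik", "kij" or "ikj"; assumed to be a permutation of "ijk")
   and returns misses per inner-loop iteration. */
double misses_per_iteration(const char *order, int n) {
    assert(order[2] == 'k' || order[2] == 'j');  /* inner loop k or j only */
    Cache cache = {0};
    long size = (long)n * n * (long)sizeof(double);
    long A = 0, B = size, C = 2 * size;          /* hypothetical layout */
    int v[3], idx[3];                 /* loop counters; idx = {i, j, k} */
    for (v[0] = 0; v[0] < n; v[0]++)
        for (v[1] = 0; v[1] < n; v[1]++)
            for (v[2] = 0; v[2] < n; v[2]++) {
                for (int lvl = 0; lvl < 3; lvl++)
                    idx[order[lvl] - 'i'] = v[lvl];  /* 'i'->0, 'j'->1, 'k'->2 */
                int i = idx[0], j = idx[1], k = idx[2];
                if (order[2] == 'k') {    /* sum += a[i][k] * b[k][j]; */
                    cache_access(&cache, elem(A, n, i, k));
                    cache_access(&cache, elem(B, n, k, j));
                } else {                  /* c[i][j] += r * b[k][j]; */
                    cache_access(&cache, elem(C, n, i, j));  /* load c */
                    cache_access(&cache, elem(B, n, k, j));  /* load b */
                    cache_access(&cache, elem(C, n, i, j));  /* store c */
                }
            }
    return (double)cache.misses / ((double)n * n * n);
}
```

Under these assumptions, `misses_per_iteration("ijk", 64)` and `misses_per_iteration("jik", 64)` return 1.25, while `"kij"` and `"ikj"` return 0.5, matching the slide totals. With a small n or a large cache the rows and columns would fit and the counts would drop, which is why the slide analysis is framed for large n.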
## This note was uploaded on 10/19/2012 for the course CS 61 taught by Professor Eddie Kohler during the Fall '12 term at Carnegie Mellon.
