Making this code run fast requires two types of

Source: code/mem/matmult/mm.c

    for (k = 0; k < n; k++)
        for (j = 0; j < n; j++) {
            r = B[k][j];
            for (i = 0; i < n; i++)
                C[i][j] += A[i][k]*r;
        }

(c) Version jki. (d) Version kji.

    for (k = 0; k < n; k++)
        for (i = 0; i < n; i++) {
            r = A[i][k];
            for (j = 0; j < n; j++)
                C[i][j] += r*B[k][j];
        }

(e) Version kij.

    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++) {
            r = A[i][k];
            for (j = 0; j < n; j++)
                C[i][j] += r*B[k][j];
        }

(f) Version ikj.

Figure 6.45: Six versions of matrix multiply.

Matrix multiply version (class)   ijk & jik (AB)   jki & kji (AC)   kij & ikj (BC)
Loads per iter                    2                2                2
Stores per iter                   0                1                1
A misses per iter                 0.25             1.00             0.00
B misses per iter                 1.00             0.00             0.25
C misses per iter                 0.00             1.00             0.25
Total misses per iter             1.25             2.00             0.50

Figure 6.46: Analysis of matrix multiply inner loops.
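The loop orderings in Figure 6.45 differ only in how they walk through memory, so they should all compute the same product. The following is a minimal sketch that checks this for the kji, kij, and ikj versions. The matrix size `N`, the `matrix` typedef, the fill pattern, and the helper `mm_versions_agree` are illustrative assumptions and are not taken from mm.c.

```c
#include <assert.h>
#include <string.h>

#define N 8                      /* illustrative size (assumption) */
typedef double matrix[N][N];     /* hypothetical fixed-size type for this sketch */

/* Version kji: the inner loop steps down a column of A and C (stride N). */
void mm_kji(matrix A, matrix B, matrix C, int n)
{
    int i, j, k;
    double r;
    assert(n >= 0 && n <= N);
    for (k = 0; k < n; k++)
        for (j = 0; j < n; j++) {
            r = B[k][j];
            for (i = 0; i < n; i++)
                C[i][j] += A[i][k] * r;
        }
}

/* Version kij: the inner loop steps along a row of B and C (stride 1). */
void mm_kij(matrix A, matrix B, matrix C, int n)
{
    int i, j, k;
    double r;
    assert(n >= 0 && n <= N);
    for (k = 0; k < n; k++)
        for (i = 0; i < n; i++) {
            r = A[i][k];
            for (j = 0; j < n; j++)
                C[i][j] += r * B[k][j];
        }
}

/* Version ikj: same inner loop as kij, with the two outer loops swapped. */
void mm_ikj(matrix A, matrix B, matrix C, int n)
{
    int i, j, k;
    double r;
    assert(n >= 0 && n <= N);
    for (i = 0; i < n; i++)
        for (k = 0; k < n; k++) {
            r = A[i][k];
            for (j = 0; j < n; j++)
                C[i][j] += r * B[k][j];
        }
}

/* Returns 1 if all three versions produce identical results, 0 otherwise. */
int mm_versions_agree(void)
{
    static matrix A, B, C1, C2, C3;
    int i, j;

    /* Small integer entries keep every product and sum exact in a double. */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = i - j;
        }

    /* Each version accumulates into C, so C must start at zero. */
    memset(C1, 0, sizeof C1);
    memset(C2, 0, sizeof C2);
    memset(C3, 0, sizeof C3);

    mm_kji(A, B, C1, N);
    mm_kij(A, B, C2, N);
    mm_ikj(A, B, C3, N);

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (C1[i][j] != C2[i][j] || C1[i][j] != C3[i][j])
                return 0;
    return 1;
}
```

Calling `mm_versions_agree()` from a test driver should return 1. Timing the three routines at much larger sizes is the natural next step for observing the miss counts in Figure 6.46, though the measured gap will depend on the machine's cache parameters.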
