optimal cache placement policy.

    for (i = 0; i < dim; i++)
        dot_prod += a[i] * b[i];

2- Consider the problem of multiplying two dense matrices of dimension 4K x 4K (each row of the matrix takes 16 KB of storage). What is the peak achievable performance using a three-loop dot-product based formulation? (Assume that matrices are laid out in a row-major fashion.)

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            for (k = 0; k < dim; k++)
                c[i][j] += a[i][k] * b[k][j];
...
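The two loop nests in the problem statement can be sketched as compilable C under a few assumptions that are not in the original: elements are taken to be 4-byte `float`s (consistent with a 4K-element row occupying 16 KB), matrices are stored as flat row-major arrays, and the function names `dot_product`, `matmul_ijk`, and `b_stride_bytes` are hypothetical. The last helper only illustrates why the access pattern matters: in the innermost loop, consecutive reads of `b[k][j]` are one full row apart in memory. It does not answer the peak-performance question, which depends on machine parameters not shown in this preview.

```c
#include <stddef.h>

/* Dot product of two length-dim vectors (the single-loop kernel). */
float dot_product(const float *a, const float *b, size_t dim)
{
    float dot_prod = 0.0f;
    for (size_t i = 0; i < dim; i++)
        dot_prod += a[i] * b[i];
    return dot_prod;
}

/* Three-loop, dot-product based multiply: c += a * b.
 * All matrices are dim x dim, row-major, stored in flat arrays
 * (an assumption; the original uses c[i][j] notation). */
void matmul_ijk(const float *a, const float *b, float *c, size_t dim)
{
    for (size_t i = 0; i < dim; i++)
        for (size_t j = 0; j < dim; j++)
            for (size_t k = 0; k < dim; k++)
                c[i * dim + j] += a[i * dim + k] * b[k * dim + j];
}

/* Byte distance between b[k][j] and b[k+1][j] in row-major layout:
 * one full row. The innermost loop walks b with this stride. */
size_t b_stride_bytes(size_t dim, size_t elem_size)
{
    return dim * elem_size;
}
```

With `dim = 4096` and 4-byte elements, `b_stride_bytes` gives 16384 bytes, matching the 16 KB row size stated in the problem, while `a[i][k]` is read with unit stride.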
- Spring '09
- Central processing unit, CPU cache, Cache algorithms, peak achievable performance