optimal cache placement policy.

    for (i = 0; i < dim; i++)
        dot_prod += a[i] * b[i];

2. Consider the problem of multiplying two dense matrices of dimension 4K x 4K (each row of the matrix takes 16 KB of storage). What is the peak achievable performance using a three-loop, dot-product-based formulation? (Assume that the matrices are laid out in row-major fashion.)

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            for (k = 0; k < dim; k++)
                c[i][j] += a[i][k] * b[k][j];

...
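To make the memory access pattern in problem 2 concrete, here is a minimal sketch of the three-loop formulation on flat row-major arrays. It rests on assumptions that the problem does not state outright: "4K" is read as 4096, and elements are taken to be 4 bytes wide (a `float` on typical platforms), which is what a 16 KB row of 4K elements implies. The names `elem_t`, `row_stride_bytes`, and `matmul_dot` are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4-byte elements, since a 4K-element row is said to take 16 KB. */
typedef float elem_t;

/* Distance in bytes between vertically adjacent elements (same column,
   consecutive rows) of a row-major dim x dim matrix. */
size_t row_stride_bytes(size_t dim) {
    return dim * sizeof(elem_t);
}

/* c += a * b for row-major dim x dim matrices stored in flat arrays. */
void matmul_dot(size_t dim, const elem_t *a, const elem_t *b, elem_t *c) {
    assert(a && b && c);
    for (size_t i = 0; i < dim; i++) {
        for (size_t j = 0; j < dim; j++) {
            elem_t sum = c[i * dim + j];
            for (size_t k = 0; k < dim; k++) {
                /* a is read along row i (unit stride);
                   b is read down column j (stride of one full row). */
                sum += a[i * dim + k] * b[k * dim + j];
            }
            c[i * dim + j] = sum;
        }
    }
}
```

Under these assumptions, `row_stride_bytes(4096)` is 16384, so each step of the inner loop jumps 16 KB through `b`, while `a` is traversed contiguously. That strided column walk is the access pattern the question asks you to account for when estimating achievable performance.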
 Spring '09
 Rjyosy
