# 324_Book


```c
... 0;

    /* kk and jj select one bsize x bsize block of B at a time */
    for (kk = 0; kk < en; kk += bsize) {
        for (jj = 0; jj < en; jj += bsize) {
            for (i = 0; i < n; i++) {
                /* update the 1 x bsize sliver C[i][jj .. jj+bsize-1] */
                for (j = jj; j < jj + bsize; j++) {
                    sum = C[i][j];
                    for (k = kk; k < kk + bsize; k++) {
                        sum += A[i][k]*B[k][j];
                    }
                    C[i][j] = sum;
                }
            }
        }
    }
}
```

code/mem/matmult/bmm.c

Figure 6.48: Blocked matrix multiply. A simple version that assumes that the array size (n) is an integral multiple of the block size (bsize).

[Figure 6.49 panel labels. A: use 1 × bsize row sliver bsize times. B: use bsize × bsize block n times in succession. C: update successive elements of 1 × bsize row sliver.]

Figure 6.49: Graphical interpretation of blocked matrix multiply. The innermost (j, k) loop pair multiplies a 1 × bsize sliver of A by a bsize × bsize block of B and accumulates into a 1 × bsize sliver of C.

Blocking can make code harder to read, but it can also pay big performance dividends. Figure 6.50 shows the performance of two versions of blocked matrix multiply on a Pent...
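The start of the Figure 6.48 listing is cut off in this preview, so the following is only a sketch of how the blocked loop nest might be completed and checked, not the original bmm.c. The names `mm_blocked` and `mm_naive`, the fixed dimension `N`, the `array` typedef, the zeroing of C, and the definition of `en` are all assumptions made for illustration. Only the five nested loops are taken from the excerpt.

```c
#include <assert.h>

/* Assumption: a fixed dimension and array type; the excerpt does not show them. */
#define N 8
typedef double array[N][N];

/* Plain triple loop, used as a reference result for the blocked version. */
void mm_naive(array A, array B, array C, int n)
{
    int i, j, k;

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            double sum = 0.0;
            for (k = 0; k < n; k++) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }
}

/* Blocked multiply: the loop nest follows the excerpt; the setup is assumed. */
void mm_blocked(array A, array B, array C, int n, int bsize)
{
    int i, j, k, kk, jj;
    double sum;
    int en;

    /* The simple version requires n to be a multiple of bsize. */
    assert(bsize > 0 && n <= N && n % bsize == 0);

    /* Assumption: en is the largest multiple of bsize not exceeding n. */
    en = bsize * (n / bsize);

    /* Assumption: C starts at zero because the loops below accumulate into it. */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            C[i][j] = 0.0;
        }
    }

    for (kk = 0; kk < en; kk += bsize) {
        for (jj = 0; jj < en; jj += bsize) {
            for (i = 0; i < n; i++) {
                /* 1 x bsize sliver of A times a bsize x bsize block of B */
                for (j = jj; j < jj + bsize; j++) {
                    sum = C[i][j];
                    for (k = kk; k < kk + bsize; k++) {
                        sum += A[i][k] * B[k][j];
                    }
                    C[i][j] = sum;
                }
            }
        }
    }
}
```

Calling `mm_blocked(A, B, C, N, 4)` and `mm_naive(A, B, C_ref, N)` on the same inputs should give matching results. Each `C[i][j]` is visited once per `kk` block, which is why the partial sum is reloaded from C before the innermost loop and stored back after it.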

## This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
