324_Book

# Linking understanding linking will enable you to

...large chunks called blocks. (In this context, the term "block" refers to an application-level chunk of data, not a cache block.) The program is structured so that it loads a chunk into the L1 cache, does all the reads and writes that it needs to on that chunk, then discards the chunk, loads in the next chunk, and so on.

Blocking a matrix multiply routine works by partitioning the matrices into submatrices and then exploiting the mathematical fact that these submatrices can be manipulated just like scalars. For example, if $n = 8$, then we could partition each matrix into four $4 \times 4$ submatrices:

$$
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
=
\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
$$

where

$$
\begin{aligned}
C_{11} &= A_{11}B_{11} + A_{12}B_{21} \\
C_{12} &= A_{11}B_{12} + A_{12}B_{22} \\
C_{21} &= A_{21}B_{11} + A_{22}B_{21} \\
C_{22} &= A_{21}B_{12} + A_{22}B_{22}
\end{aligned}
$$

Figure 6.48 shows one version of blocked matrix multiplication, which we call the `bijk` version. The basic idea behind this code is to partition $A$ and $C$ into $1 \times \text{bsize}$ row slivers and to partition $B$ into $\text{bsize} \times \text{bsize}$ blocks. The innermost $(j, k)$ loop pair multiplies a sliver of $A$ by a block of $B$ and accumulates the result into a sliver of $C$. The $i$ loop iterates through $n$ row slivers of $A$ and...
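The figure referenced above is not part of this preview, so the following is only a minimal sketch of the blocking idea it describes, not the figure's actual code. The function names `matmul_naive` and `matmul_blocked`, the flat row-major array layout, and the exact loop order are assumptions made for illustration. The sketch also assumes that `bsize` divides `n` evenly.

```c
#include <assert.h>

/* Hypothetical helpers: all matrices are n x n doubles stored
 * row-major in flat arrays, so element (i, j) lives at index i*n + j. */

/* Reference version: plain triple loop computing C = A * B. */
void matmul_naive(const double *A, const double *B, double *C, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Blocked version: computes C = A * B one bsize x bsize block of B
 * at a time. Assumption: bsize > 0 and bsize divides n. */
void matmul_blocked(const double *A, const double *B, double *C,
                    int n, int bsize)
{
    assert(bsize > 0 && n % bsize == 0);

    /* C accumulates partial sums, so it must start at zero. */
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0;

    /* (kk, jj) selects the block of B with rows kk..kk+bsize-1
     * and columns jj..jj+bsize-1. */
    for (int kk = 0; kk < n; kk += bsize) {
        for (int jj = 0; jj < n; jj += bsize) {
            /* For each row i: multiply a 1 x bsize sliver of A by the
             * current block of B, accumulating into a 1 x bsize
             * sliver of C. */
            for (int i = 0; i < n; i++) {
                for (int j = jj; j < jj + bsize; j++) {
                    double sum = C[i * n + j];
                    for (int k = kk; k < kk + bsize; k++)
                        sum += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = sum;
                }
            }
        }
    }
}
```

The block of B selected by `(kk, jj)` is reused for every row `i` before the code moves on to the next block, which is where the cache benefit is expected to come from when the block fits in the L1 cache. A reasonable way to check the sketch is to run both functions on the same inputs and compare the results element by element.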

## This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
