The memory hierarchy


If we permute the loops and make some other minor code changes, we can create the six functionally equivalent versions of matrix multiply shown in Figure 6.45. Each version is uniquely identified by the ordering of its loops.

At a high level, the six versions are quite similar. If addition is associative, then each version computes an identical result. Each version performs $O(n^3)$ total operations and an identical number of adds and multiplies. Each of the $n^2$ elements of $A$ and $B$ is read $n$ times. Each of the $n^2$ elements of $C$ is computed by summing $n$ values. However, if we analyze the behavior of the innermost loop iterations, we find that there are differences in the number of accesses and the locality. For the purposes of our analysis, let's make the following assumptions:

- Each array is an $n \times n$ array of double, with sizeof(double) == 8.
- There is a single cache with a 32-byte block size ($B = 32$).
- The array size $n$ is so large that a single matrix row does not fit in the L1 cache.
- The compiler...
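To make the idea of loop permutation concrete, here is a minimal sketch in C of two of the six orderings. Figure 6.45 is not reproduced in this preview, so the code is an assumed reconstruction, not the figure itself: the names `mm_ijk` and `mm_kij`, the small fixed size `N`, and the choice to zero `c` inside `mm_kij` are all illustrative. Under the assumptions listed above, a 32-byte block holds 32 / 8 = 4 doubles, which is why the stride of the innermost loop matters.

```c
#include <stddef.h>

/* Hypothetical small size for illustration; the analysis assumes n is large. */
#define N 4

/* Ordering ijk: the inner loop walks a row of a (stride 1)
   and a column of b (stride N). */
void mm_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;            /* local accumulator */
            for (size_t k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}

/* Ordering kij: the inner loop walks a row of b and a row of c,
   both with stride 1. Zeroing c first is one example of the
   "minor code changes" a permuted version may need (an assumption here). */
void mm_kij(double a[N][N], double b[N][N], double c[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            c[i][j] = 0.0;

    for (size_t k = 0; k < N; k++) {
        for (size_t i = 0; i < N; i++) {
            double r = a[i][k];          /* reused across the inner loop */
            for (size_t j = 0; j < N; j++)
                c[i][j] += r * b[k][j];
        }
    }
}
```

Both functions perform the same N * N * N multiply-add steps; only the order in which memory is touched differs. Calling either one with the same `a` and `b` should fill `c` with the same product, which makes it easy to compare the two orderings under a timer or a cache simulator.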

This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
