NOT_USEFUL-lecture-03



Computing dt and ds

• A is walked in row-major order in the inner loop
• ds(A) = 3 (why?)
• Note: ds does not depend on N → A always has spatial locality
• How many iterations does it take to return to the same element of A?
• What is dt(A)?

    for i ∈ [0 : 1 : N − 1]
      for j ∈ [0 : 1 : M − 1]
        for k ∈ [0 : 1 : K − 1]
          C_ij = C_ij + A_ik * B_kj

Fig. 1. Naïve MMM Code

[Figure: cropped excerpt from the source paper next to a diagram of matrices A, B, and C with index labels i, j, k, n. Legible fragments of the excerpt: "Figure 1 shows the standard three [...] matrix multiplication. As is well-known [...] permutable because the loops can be [...] without changing the result [3]. To refe[...] will specify the loop order from outer [...] the ijk version is the one with the i lo[...] k loop innermost. Without loss of gene[...] that the matrix is stored in row-major o[...]"]

Tuesday, February 2, 2010

Miss rates for B and C

• What is dt(B)?
• What is ds(B)?
• What about for C?
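The questions above about returning to the same element can be explored with a small experiment. The following is a minimal Python sketch, not the lecture's own code: `naive_mmm` transcribes the ijk loop nest of Fig. 1, and the hypothetical helper `iterations_between_reuse` counts how many innermost-loop iterations pass between two touches of the same element of A, B, or C. It counts iterations only; how the slides define dt and ds precisely (for example in distinct memory accesses) is not recoverable from this preview, so treat the mapping to dt as an assumption.

```python
def naive_mmm(A, B, C):
    """ijk matrix multiply from Fig. 1: C[i][j] += A[i][k] * B[k][j]."""
    n, inner, m = len(A), len(B), len(B[0])
    for i in range(n):
        for j in range(m):
            for k in range(inner):
                C[i][j] += A[i][k] * B[k][j]
    return C


def iterations_between_reuse(N, M, K):
    """Hypothetical helper: for each matrix, collect the set of gaps
    (in innermost-loop iterations) between consecutive touches of the
    same element under the ijk loop order."""
    last_seen = {}                       # (matrix name, index) -> iteration
    gaps = {"A": set(), "B": set(), "C": set()}
    t = 0                                # innermost-loop iteration counter
    for i in range(N):
        for j in range(M):
            for k in range(K):
                touched = (("A", (i, k)), ("B", (k, j)), ("C", (i, j)))
                for name, idx in touched:
                    key = (name, idx)
                    if key in last_seen:
                        gaps[name].add(t - last_seen[key])
                    last_seen[key] = t
                t += 1
    return gaps


if __name__ == "__main__":
    print(iterations_between_reuse(3, 4, 5))
```

With N = 3, M = 4, K = 5 the gaps come out as K = 5 for A, M·K = 20 for B, and 1 for C: an element of A is revisited on the next j iteration, an element of B on the next i iteration, and an element of C on the very next k iteration.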
Putting it all together

$$
\mathrm{miss}_{ijk,A}(N, b, C) =
\begin{cases}
N^2/b & bN + N \le C \\
N^3/b & \text{otherwise}
\end{cases}
$$

$$
\mathrm{miss}_{ijk,B}(N, b, C) =
\begin{cases}
N^2/b & N^2 + 2N \le C \\
N^3/b & bN + N \le C \\
N^3 & \text{otherwise}
\end{cases}
$$

$$
\mathrm{miss}_{ijk,C}(N, b, C) = N^2/b
$$

$$
\mathrm{miss}_{ijk}(N, b, C) =
\begin{cases}
3N^2/b & N^2 + 2N \le C \\
N^3/b + 2N^2/b & bN + N \le C \\
(b + 1)N^3/b & \text{otherwise}
\end{cases}
$$

$$
\mathrm{ratio}_{ijk}(N, b, C) =
\begin{cases}
3/(4bN) & N \le \sqrt{C} \\
1/(4b) & N \le C/(b + 1) \\
(b + 1)/(4b) & \text{otherwise}
\end{cases}
$$

Miss ratios for other orders

$$
\mathrm{ratio}_{jki}(N, b, C) =
\begin{cases}
3/(4bN) & N \le \sqrt{C} \\
1/(4b) & N \le C/(2b) \\
1/2 & \text{otherwise}
\end{cases}
$$

$$
\mathrm{ratio}_{kij}(N, b, C) =
\begin{cases}
3/(4bN) & N \le \sqrt{C} \\
1/(4b) & N \le C/2 \\
1/(2b) & \text{otherwise}
\end{cases}
$$

Performance regimes

• Large-cache regime
  • All matrices exhibit temporal and spatial locality
  • Miss rate decreases as matrix size gets larger
  • For all orders, occu...
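As a rough check of the piecewise miss counts for the ijk order, the formulas can be transcribed into Python and evaluated in each cache regime. This is only a sketch under stated assumptions: the function names are made up for illustration, and the denominator 4N³ (four memory accesses per innermost iteration) is inferred from the slides' own ratios, for example (3N²/b) ÷ (3/(4bN)) = 4N³, rather than stated in the preview.

```python
def miss_ijk(N, b, C):
    """Total misses for the ijk order, transcribed from the slides.

    N: matrix dimension, b: cache-line size in elements,
    C: cache capacity in elements.
    """
    if N * N + 2 * N <= C:           # large cache: everything stays resident
        return 3 * N * N / b
    if b * N + N <= C:               # medium cache: B loses temporal locality
        return N**3 / b + 2 * N * N / b
    return (b + 1) * N**3 / b        # small cache (leading-order terms only)


def ratio_ijk(N, b, C):
    """Miss ratio, assuming 4 * N**3 total accesses (an inference, see above)."""
    return miss_ijk(N, b, C) / (4 * N**3)


if __name__ == "__main__":
    b, C = 4, 1000
    for N in (10, 100, 1000):        # one sample per regime for these b, C
        print(N, miss_ijk(N, b, C), ratio_ijk(N, b, C))
```

In the medium regime the computed ratio sits slightly above 1/(4b), because the slide's closed form drops the lower-order 2N²/b term; the large-cache and small-cache regimes reproduce 3/(4bN) and (b + 1)/(4b) exactly.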