rs until N ≥ sqrt(C)
Medium-cache regime
• One matrix starts incurring capacity misses
• Others still enjoy locality
Small-cache regime
• Two matrices start suffering capacity misses
Tuesday, February 2, 2010
Predicting performance
• Itanium 2 architecture:
  • Line size: 16 doubles
  • Cache size (L2): 32K doubles
ratio_ijk(N, b, C) =
    3/(4bN)         if N ≤ sqrt(C)
    1/(4b)          if N ≤ C/(b + 1)
    (b + 1)/(4b)    otherwise
• Predicted miss rates:
  • decrease while N ≤ 181
  • 1.5625% while N ≤ 1927
  • 26.5625% afterwards
Miss ratios
[Plot: L2 miss rate (%) from 0 to 8 versus N (elements) from 0 to 350; curves for k inner, i inner, and j inner]
Miss ratios
[Plot: L2 miss rate from 0 to 60 versus N (elements) from 0 to 4000; curves for k inner, i inner, and j inner]
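The predicted miss rates on the "Predicting performance" slide can be reproduced from the piecewise model. The following is a minimal Python sketch, assuming "32K doubles" means 32 × 1024 doubles; the function name `predicted_miss_ratio` is made up for illustration and does not come from the slides.

```python
import math


def predicted_miss_ratio(n, b=16, c=32 * 1024):
    """Piecewise miss-ratio model for the ijk loop order, as on the slide.

    n: matrix dimension N
    b: cache line size in doubles (16 on the slide)
    c: cache size in doubles (assumed here to be 32 * 1024)
    """
    if n <= math.sqrt(c):      # large-cache regime: ratio falls as N grows
        return 3 / (4 * b * n)
    if n <= c / (b + 1):       # medium-cache regime: constant 1/(4b)
        return 1 / (4 * b)
    return (b + 1) / (4 * b)   # small-cache regime: constant (b + 1)/(4b)


if __name__ == "__main__":
    # Regime boundaries for the slide's parameters.
    print(math.floor(math.sqrt(32 * 1024)))        # 181
    print((32 * 1024) // (16 + 1))                 # 1927
    # Plateau values quoted on the slide.
    print(f"{predicted_miss_ratio(1000):.4%}")     # 1.5625%
    print(f"{predicted_miss_ratio(3000):.4%}")     # 26.5625%
```

With b = 16 and C = 32768 the two thresholds come out at 181 and 1927, which matches the N ≤ 181 and N ≤ 1927 bounds listed under the predicted miss rates.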
Loop permutation
• Loop permutation clearly an important transformation
  • Can lead to massive performance improvements
• How do we determine when loop permutation is legal?
• How do we automatically generate permuted code?
  • Straightforward for some loops (like MMM)
  • Much harder for other loops
• How do we know if loop permutation will be useful?
  • Don't want to change ikj loop into jki loop!
• Are there other transformations we might want to perform?
• The next set of lectures will answer these questions
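To make the ikj versus jki warning concrete, here is a rough Python sketch that counts cache misses for MMM under any loop order. Everything about the setup is an assumption for illustration and not taken from the slides: row-major layout, a fully associative LRU cache, the three matrices stored back to back, and the hypothetical helper name `count_misses` with its toy default sizes.

```python
from collections import OrderedDict
from itertools import product


def count_misses(order, n, line=4, cache_lines=16):
    """Count misses for C[i][j] += A[i][k] * B[k][j] under a loop order.

    order: string such as "ikj"; the first letter is the outermost loop.
    Assumed model: row-major arrays, fully associative LRU cache holding
    `cache_lines` lines of `line` elements each.
    """
    cache = OrderedDict()          # line tag -> True, kept in LRU order
    misses = 0
    accesses = 0

    def touch(addr):
        nonlocal misses, accesses
        accesses += 1
        tag = addr // line
        if tag in cache:
            cache.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            cache[tag] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used

    base = {"A": 0, "B": n * n, "C": 2 * n * n}
    # product() varies its last index fastest, so order[2] is the inner loop.
    for idx in product(range(n), repeat=3):
        v = dict(zip(order, idx))
        i, j, k = v["i"], v["j"], v["k"]
        touch(base["A"] + i * n + k)
        touch(base["B"] + k * n + j)
        touch(base["C"] + i * n + j)
    return misses, accesses


if __name__ == "__main__":
    for order in ("ikj", "ijk", "jki"):
        m, a = count_misses(order, 16)
        print(order, f"{m / a:.3f}")
```

Under these assumptions the j-inner order ikj walks rows of B and C, while the i-inner order jki walks columns of A and C, so jki should report a much higher miss ratio. The exact numbers depend on the toy cache parameters and say nothing about real Itanium 2 behavior.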
This document was uploaded on 04/05/2014.
 Spring '14
