rs until N ≥ sqrt(C)

Medium-cache regime
• One matrix starts incurring capacity misses
• Others still enjoy locality

Small-cache regime
• Two matrices start suffering capacity misses

Tuesday, February 2, 2010

Predicting performance
• Itanium 2 architecture:
  • Line size: 16 doubles
  • Cache size (L2): 32K doubles

$$
\mathrm{ratio}_{ijk}(N, b, C) =
\begin{cases}
3/(4bN) & N \le \sqrt{C} \\
1/(4b) & N \le C/(b+1) \\
(b+1)/(4b) & \text{otherwise}
\end{cases}
$$

• Predicted miss rates:
  • decrease while N ≤ 181
  • 1.5625% while N ≤ 1927
  • 26.5625% afterwards

Miss ratios
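The predicted rates quoted for the Itanium 2 can be reproduced by evaluating the piecewise model ratio_ijk(N, b, C) directly. The following is a minimal sketch, not code from the lecture: the function name `predicted_miss_ratio` and the constant names are hypothetical, and "32K doubles" is taken to mean 32 × 1024 = 32768 doubles, which matches the thresholds N ≤ 181 and N ≤ 1927 on the slide.

```python
import math


def predicted_miss_ratio(n, b, c):
    """Predicted miss ratio for the ijk loop order on n x n matrices.

    n: matrix dimension (elements per row)
    b: cache line size, in doubles
    c: cache capacity, in doubles
    """
    if n <= math.sqrt(c):        # large-cache regime
        return 3 / (4 * b * n)
    if n <= c / (b + 1):         # medium-cache regime
        return 1 / (4 * b)
    return (b + 1) / (4 * b)     # small-cache regime


# Itanium 2 L2 parameters from the slide (assuming 32K = 32 * 1024).
LINE_DOUBLES = 16
CACHE_DOUBLES = 32 * 1024

for n in (100, 181, 182, 1927, 1928, 4000):
    ratio = predicted_miss_ratio(n, LINE_DOUBLES, CACHE_DOUBLES)
    print(f"N = {n:4d}: predicted miss ratio = {100 * ratio:.4f}%")
```

The sample values of N straddle the two regime boundaries, so the printed ratios show where the model switches from one case to the next.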
[Figure: L2 miss rate (%) vs. N (elements), for N from 0 to 350, with one curve each for the k-inner, i-inner, and j-inner loop orders]

Miss ratios
[Figure: L2 miss rate vs. N (elements), for N from 0 to 4000, with one curve each for the k-inner, i-inner, and j-inner loop orders]

Loop permutation
• Loop permutation clearly an important transformation
  • Can lead to massive performance improvements
• How do we determine when loop permutation is legal?
• How do we automatically generate permuted code?
  • Straightforward for some loops (like MMM)
  • Much harder for other loops
• How do we know if loop permutation will be useful?
  • Don’t want to change ikj loop into jki loop!
• Are there other transformations we might want to perform?
• The next set of lectures will answer these questions
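To make the MMM case concrete, here is a small sketch of the six loop orders applied to the same multiply. It is an illustration only: the function `matmul` and its `order` parameter are hypothetical names, and pure Python will not reproduce the cache behavior discussed in these slides. It only checks that every permutation computes the same product on one small input, which is the sense in which permutation is "straightforward" for MMM. It is not a general legality test.

```python
from itertools import permutations, product


def matmul(A, B, order="ijk"):
    """Multiply square matrices A and B with the loops nested in `order`.

    order[0] names the outermost loop index and order[2] the innermost.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    # product() varies its last position fastest, so it walks the
    # iteration space exactly as three nested loops in `order` would.
    for values in product(range(n), repeat=3):
        idx = dict(zip(order, values))
        i, j, k = idx["i"], idx["j"], idx["k"]
        C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Run all six loop orders on the same inputs.
results = {"".join(p): matmul(A, B, "".join(p)) for p in permutations("ijk")}
for name, C in results.items():
    print(name, C)
```

Every order visits the same (i, j, k) triples, and for a fixed (i, j) the k values are always visited in increasing order, so each element of C accumulates its terms in the same sequence regardless of nesting.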
