NOT_USEFUL-lecture-03

…rs until N ≥ sqrt(C)

Medium-cache regime
• One matrix starts incurring capacity misses
• Others still enjoy locality

Small-cache regime
• Two matrices start suffering capacity misses

Tuesday, February 2, 2010

Predicting performance
• Itanium 2 architecture:
  • Line size b: 16 doubles
  • L2 cache size C: 32K doubles (32,768)
• Miss-ratio model for the ijk loop order:

$$
\mathrm{ratio}_{ijk}(N, b, C) =
\begin{cases}
\dfrac{3}{4bN} & \text{if } N \le \sqrt{C} \\[4pt]
\dfrac{1}{4b} & \text{if } \sqrt{C} < N \le \dfrac{C}{b+1} \\[4pt]
\dfrac{b+1}{4b} & \text{otherwise}
\end{cases}
$$

• Predicted miss rates:
  • Decrease while N ≤ 181 (sqrt(32768) ≈ 181.02)
  • 1.5625% = 1/(4 · 16) while N ≤ 1927 (32768/17 ≈ 1927.5)
  • 26.5625% = 17/64 afterwards

Miss ratios
[Figure: L2 miss rate (%) vs. N (elements), N from 0 to 350; curves for k inner, i inner, j inner]

Miss ratios
[Figure: L2 miss rate vs. N (elements), N from 0 to 4000; curves for k inner, i inner, j inner]

Loop permutation
• Loop permutation is clearly an important transformation
  • Can lead to massive performance improvements
• How do we determine when loop permutation is legal?
• How do we automatically generate permuted code?
  • Straightforward for some loops (like MMM)
  • Much harder for other loops
• How do we know if loop permutation will be useful?
  • We don't want to change an ikj loop into a jki loop!
• Are there other transformations we might want to perform?
• The next set of lectures will answer these questions
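The piecewise model on the Predicting performance slide can be evaluated directly to reproduce the Itanium 2 thresholds. What follows is a minimal Python sketch: the function name `ratio_ijk` mirrors the slide's notation, and `b` (line size) and `c` (cache size) are assumed to be measured in doubles, as they are on the slide.

```python
import math

def ratio_ijk(n: int, b: int, c: int) -> float:
    """Predicted L2 miss ratio of ijk matrix multiply, per the piecewise model.

    n: matrix dimension; b: cache line size in doubles; c: cache size in doubles.
    """
    if n <= math.sqrt(c):
        # Large-cache regime: 3/(4bN) keeps falling as N grows
        return 3 / (4 * b * n)
    if n <= c / (b + 1):
        # Medium-cache regime: one matrix incurs capacity misses
        return 1 / (4 * b)
    # Small-cache regime: two matrices suffer capacity misses
    return (b + 1) / (4 * b)

# Itanium 2 parameters from the slide: 16-double lines, 32K-double L2
LINE, L2 = 16, 32 * 1024

for n in (100, 181, 182, 1927, 1928, 4000):
    print(f"N = {n:4d}: {100 * ratio_ijk(n, LINE, L2):.4f}%")
```

The regime boundaries produce the slide's thresholds. Since sqrt(32768) ≈ 181.02 and 32768/17 ≈ 1927.5, N = 182 and N = 1928 are the first sizes in the medium and small regimes.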
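The loop permutation questions can be made concrete for MMM. Below is a hypothetical Python illustration (the names `mmm`, `inner_strides`, and `REFS` are not from the lecture). It runs all six loop orders of C[i][j] += A[i][k] * B[k][j] and checks that they agree. That result is consistent with permutation being legal for MMM, though a test is not a proof. It also classifies each array's access pattern in the innermost loop, assuming row-major (C-style) storage. Python lists do not reproduce the cache behavior; the stride classification only captures the reasoning behind it.

```python
from itertools import permutations, product

# Each array reference in C[i][j] += A[i][k] * B[k][j], as (row index, column index)
REFS = {"A": ("i", "k"), "B": ("k", "j"), "C": ("i", "j")}

def mmm(a, b, order="ijk"):
    """Multiply square matrices a and b with loops nested in `order` (outermost first)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    # product() advances its last position fastest, so order[-1] is the innermost loop
    for idx in product(range(n), repeat=3):
        pos = dict(zip(order, idx))
        i, j, k = pos["i"], pos["j"], pos["k"]
        c[i][j] += a[i][k] * b[k][j]
    return c

def inner_strides(order):
    """Classify each array's access in the innermost loop, assuming row-major storage."""
    inner = order[-1]
    kinds = {}
    for name, (row, col) in REFS.items():
        if inner == col:
            kinds[name] = "unit stride"   # walks along a row
        elif inner == row:
            kinds[name] = "stride N"      # walks down a column
        else:
            kinds[name] = "invariant"     # same element every iteration
    return kinds

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
# Every permutation of the three loops computes the same product
results = {"".join(p): mmm(A, B, "".join(p)) for p in permutations("ijk")}
assert all(r == [[19, 22], [43, 50]] for r in results.values())

for order in ("ijk", "ikj", "jki"):
    print(order, inner_strides(order))
```

For ikj, B and C are traversed with unit stride in the innermost loop while A is invariant. For jki, A and C are both walked down columns (stride N), which is consistent with the slide's warning against turning an ikj loop into a jki loop.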