6.6 Putting It Together: The Impact of Caches on Program Performance

... to notice about this graph:

- For large values of n, the fastest version runs three times faster than the slowest version, even though each performs the same number of floating-point arithmetic operations.
- Versions with the same number and locality of memory accesses have roughly the same measured performance.
- The two versions with the worst memory behavior, in terms of the number of accesses and misses per iteration, run significantly slower than the other four versions, which have fewer misses or fewer accesses, or both.
- The routines of one class (2 memory accesses and 1.25 misses per iteration) perform somewhat better on this particular machine than the routines of another class (3 memory accesses and 0.5 misses per iteration), which trade off an additional memory reference for a lower miss rate. The point is that cache misses are not the whole story when it comes to performance. The number of memory accesses is also important, and in many cases, finding the best performance involves a tradeoff between the two.

Problems 6.32...
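As a rough illustration of the accesses-versus-misses tradeoff described in the excerpt, here is a minimal C sketch of two loop orderings of n x n matrix multiplication. The excerpt does not show the routines it measures, so the function names `mm_ijk` and `mm_kij`, the flat row-major layout, and the choice of these two orderings are assumptions made for illustration. The miss figures in the comments additionally assume that a cache block holds four doubles and that n is too large for a full row to stay cached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering 1: the inner loop runs over k.
 * Per inner iteration: 2 loads (a and b), no stores, because the
 * running sum can stay in a register.
 * a is read along a row (stride 1), b down a column (stride n).
 * Under the stated assumptions that is about 0.25 + 1.0 = 1.25
 * misses per iteration. */
void mm_ijk(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] += sum;
        }
    }
}

/* Hypothetical ordering 2: the inner loop runs over j.
 * Per inner iteration: 2 loads (b and c) plus 1 store (c),
 * so 3 memory accesses.
 * Both b and c are walked along rows (stride 1), so under the
 * stated assumptions that is about 0.25 + 0.25 = 0.5 misses
 * per iteration. */
void mm_kij(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t k = 0; k < n; k++) {
        for (size_t i = 0; i < n; i++) {
            double r = a[i * n + k]; /* loaded once per inner loop */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += r * b[k * n + j];
        }
    }
}
```

Both functions accumulate into `c`, so the caller is expected to zero it first. They perform the same multiplications and additions; only the order of memory references differs, which is why their running times can diverge on a real machine. Which one is faster depends on the cache and memory system, so timing both on the target machine is the only reliable way to tell.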
This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.