0 513 understanding memory performance 255


CHAPTER 5. OPTIMIZING PROGRAM PERFORMANCE

...oad unit. This leads to the following comparison between the achieved performance and the theoretical limits (values are CPE):

                        Integer           Floating Point
    Method              +       *         +       *
    Achieved            1.06    1.25      1.50    2.00
    Theoretical Limit   1.00    1.00      1.00    2.00

In this table, we have chosen the combination of unrolling and parallelism that achieves the best performance for each case. We have been able to get close to the theoretical limit for integer sum and product and for floating-point product. Some machine-dependent factor limits the achieved CPE for floating-point addition to 1.50 rather than the theoretical limit of 1.00.

5.11 Putting it Together: Summary of Results for Optimizing Combining Code

We have now considered six versions of the combining code, some of which had multiple variants. Let us pause to take a look at the overall effect of this effort, and how our code would do on a different machine. Figure 5.27 shows the measured performance for all of our routines plus several other variants. As can be seen, we achieve maximum performance for the integer sum by simply unrolling the loop many times, wher...
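To make "a combination of unrolling and parallelism" concrete, here is a minimal sketch for the integer sum case. It is not the text's own combining routine: the function names `sum_simple` and `sum_unrolled2x2`, the plain `long` array interface, and the choice of an unrolling factor of 2 with 2 accumulators are all assumptions made for illustration. The best factors on a given machine would have to be found by measurement.

```c
#include <stddef.h>

/* Baseline (hypothetical name): one element per iteration, one accumulator.
 * Each addition depends on the result of the previous one. */
long sum_simple(const long *data, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Illustrative variant (hypothetical name): unrolled by 2, with 2 accumulators.
 * acc0 and acc1 form independent dependency chains, so a processor with
 * enough functional units may be able to overlap the two additions. */
long sum_unrolled2x2(const long *data, size_t n)
{
    long acc0 = 0;
    long acc1 = 0;
    size_t i = 0;

    /* Main loop: two elements per iteration.
     * Testing i + 1 < n avoids computing n - 1, which would wrap for n == 0. */
    for (; i + 1 < n; i += 2) {
        acc0 += data[i];
        acc1 += data[i + 1];
    }

    /* Cleanup loop: handles the final element when n is odd. */
    for (; i < n; i++)
        acc0 += data[i];

    /* Combine the partial results. */
    return acc0 + acc1;
}
```

For an array holding 1 through 7, both functions return 28. The two-accumulator form relies on addition being associative and commutative. That holds for integer arithmetic, but a floating-point version of the same transformation can round differently from the sequential loop, so it should be applied there only when that difference is acceptable.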

This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
