

Chapter 5. Optimizing Program Performance

… the assembly code for the inner loop and its translation into operations.

Assembly instructions           Execution unit operations
.L49:
  addl (%eax,%edx,4),%ecx       load (%eax, %edx.0, 4)    → t.1a
                                addl t.1a, %ecx.0c        → %ecx.1a
  addl 4(%eax,%edx,4),%ecx      load 4(%eax, %edx.0, 4)   → t.1b
                                addl t.1b, %ecx.1a        → %ecx.1b
  addl 8(%eax,%edx,4),%ecx      load 8(%eax, %edx.0, 4)   → t.1c
                                addl t.1c, %ecx.1b        → %ecx.1c
  addl $3,%edx                  addl $3, %edx.0           → %edx.1
  cmpl %esi,%edx                cmpl %esi, %edx.1         → cc.1
  jl .L49                       jl-taken cc.1

As mentioned earlier, loop unrolling by itself helps only the integer-sum case, since our other cases are limited by the latency of the functional units. For integer sum, three-way unrolling allows us to combine three elements with six integer/branch operations (three additions into %ecx, the index update, the comparison, and the branch), as shown in Figure 5.20. With two functional units for these operations, each iteration needs at least three cycles and processes three elements, so we could potentially achieve a CPE of 1.0.

[Figure 5.21: scheduling of the load and addl operations for iterations i=6 and i=9 (values t.3a, t.3b, t.3c, %ecx.2c, %edx.2) over cycles 5 to 15]
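As a rough illustration, here is a minimal C sketch of a three-way unrolled integer sum whose main loop has the same shape as the assembly above: three additions into one accumulator (the role of %ecx), one index update by 3 (the role of %edx), a comparison, and a branch. The function names `sum3` and `cpe_bound`, and the cleanup loop for lengths that are not a multiple of 3, are assumptions made for illustration, not code from the text. `cpe_bound` computes only the idealized throughput limit used in the argument above: operations per iteration divided by the number of units, divided by elements per iteration.

```c
#include <assert.h>

/* Hypothetical sketch: integer sum unrolled three ways. The names sum3 and
 * cpe_bound are illustrative; they do not come from the text. */
int sum3(const int *data, int n)
{
    assert(n >= 0);
    int acc = 0;        /* accumulator, the role %ecx plays in the assembly */
    int limit = n - 2;  /* i < limit  <=>  i + 2 < n: three elements remain */
    int i;              /* index, the role %edx plays in the assembly */

    /* Unrolled loop: three additions, one index update by 3,
       one comparison, one conditional branch per iteration. */
    for (i = 0; i < limit; i += 3) {
        acc += data[i];
        acc += data[i + 1];
        acc += data[i + 2];
    }

    /* Cleanup loop for the 0 to 2 elements left when n is not a multiple of 3. */
    for (; i < n; i++)
        acc += data[i];

    return acc;
}

/* Idealized throughput limit on cycles per element (CPE): ops_per_iter
   operations share `units` functional units, and each iteration handles
   elems_per_iter elements. Latency and other constraints are ignored. */
double cpe_bound(int ops_per_iter, int units, int elems_per_iter)
{
    return (double)ops_per_iter / units / elems_per_iter;
}
```

With these definitions, summing the seven values 1 through 7 gives 28 (two unrolled iterations plus one cleanup step), and `cpe_bound(6, 2, 3)` evaluates to 1.0, matching the estimate for six integer/branch operations on two units. Because the bound ignores latency, it is a best case rather than a prediction; the single accumulator also keeps the additions in one sequential chain, as the %ecx.1a, %ecx.1b, %ecx.1c dependences in the operations show.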

This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
