23 it is hard to imagine why the pointer code


has even begun. We can also see the effect of pipelining. Each iteration requires at least seven cycles from start to end, but successive iterations are completed every 4 cycles. Thus, the effective processing rate is one iteration every 4 cycles, giving a CPE of 4.0.

The four-cycle latency of integer multiplication constrains the performance of the processor for this program. Each imull operation must wait until the previous one has completed, since it needs the result of ...

[Figure: timing diagram over cycles 1 to 15 for Iterations 1 to 3 (i=0, i=1, i=2); each iteration shows load, imull, incl, cmpl, and jl operations on renamed values such as t.1, %ecx.1, %edx.1, and cc.1.]

Figure 5.15: Scheduling of Operations for Integer Multiplication with Unlimited Number of Execution Units. The 4 cycle latency of the multiplier is the performance-limiting resource.
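The pipelining arithmetic in this passage can be sketched in a few lines of C. This is only a rough model, not code from the text: the names `completion_cycle` and `average_cpe` are hypothetical, and the model assumes that iteration k finishes at cycle 7 + 4k, which is one reading of "at least seven cycles from start to end" combined with "completed every 4 cycles".

```c
#include <assert.h>

/* Assumed model parameters, taken from the figures quoted in the text:
 * one iteration takes 7 cycles end to end, and the integer multiplier
 * has a latency of 4 cycles. */
enum { ITER_LATENCY = 7, MUL_LATENCY = 4 };

/* Cycle at which iteration k (0-based) completes, assuming unlimited
 * execution units so that the chain of dependent multiplies is the only
 * thing delaying successive iterations. Hypothetical helper. */
long completion_cycle(long k) {
    assert(k >= 0);
    return ITER_LATENCY + MUL_LATENCY * k;
}

/* Average cycles per element over the first n iterations under the same
 * assumptions. The 7-cycle startup cost is amortized as n grows, so the
 * value approaches the multiplier latency. Hypothetical helper. */
double average_cpe(long n) {
    assert(n > 0);
    return (double)completion_cycle(n - 1) / (double)n;
}
```

Under these assumptions the first three iterations complete at cycles 7, 11, and 15, and the average over many iterations tends toward 4 cycles per element, in line with the CPE of 4.0 stated above.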