If the prediction is incorrect the processor must


x1 to compute the final result. To see how this code yields improved performance, let us consider the translation of the loop into operations for the case of integer multiplication:

    Assembly Instructions               Execution Unit Operations

    .L151:
    imull (%eax,%edx,4),%ecx            load (%eax, %edx.0, 4)     ->  t.1a
                                        imull t.1a, %ecx.0         ->  %ecx.1
    imull 4(%eax,%edx,4),%ebx           load 4(%eax, %edx.0, 4)    ->  t.1b
                                        imull t.1b, %ebx.0         ->  %ebx.1
    addl $2,%edx                        addl $2, %edx.0            ->  %edx.1
    cmpl %esi,%edx                      cmpl %esi, %edx.1          ->  cc.1
    jl .L151                            jl-taken cc.1

Figure 5.25 shows a graphical representation of these operations for the first iteration (i = 0). As this diagram illustrates, the two multiplications in the loop are independent of each other. One has register %ecx as its source and destination (corresponding to program variable x0), while the other has register %ebx as its source and destination (corresponding to program variable x1). The second multiplication can start just one cycle after the first. This makes use of the pipelining capabilities of both the load unit and the integer multiplier. Figure 5....
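The assembly loop above corresponds to C source code along the following lines. This is only a minimal sketch, not the original listing: the function name product_2x2 and the parameters data and length are assumptions made for illustration, while the accumulators x0 and x1 are the program variables named in the text. It unrolls the loop by two and keeps two independent running products, so that neither multiplication has to wait for the other.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions, not from the text).
 * Multiplies all elements of data[0..length-1] using a loop unrolled by two
 * with two independent accumulators, x0 and x1. */
int product_2x2(const int *data, int length)
{
    int x0 = 1;               /* product of even-indexed elements (%ecx above) */
    int x1 = 1;               /* product of odd-indexed elements  (%ebx above) */
    int limit = length - 1;   /* last index at which a pair still fits */
    int i;

    /* Main loop: two elements per iteration. The two multiplications
     * do not depend on each other, so they can overlap in the pipeline. */
    for (i = 0; i < limit; i += 2) {
        x0 *= data[i];
        x1 *= data[i + 1];
    }

    /* Finish any remaining element when length is odd. */
    for (; i < length; i++) {
        x0 *= data[i];
    }

    /* Combine the two partial products to compute the final result. */
    return x0 * x1;
}
```

For example, calling product_2x2 on the seven values 1 through 7 accumulates x0 = 1 * 3 * 5 * 7 = 105 and x1 = 2 * 4 * 6 = 48, and returns 105 * 48 = 5040. Splitting the product this way relies on integer multiplication being associative and commutative, so the order in which the elements are combined does not change the result.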

This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
