5.10. ENHANCING PARALLELISM

... x1 to compute the final result. To see how this code yields improved performance, let us consider the translation of the loop into operations for the case of integer multiplication:
    Assembly Instructions            Execution Unit Operations
    .L151:
    imull (%eax,%edx,4),%ecx         load (%eax, %edx.0, 4)     →  t.1a
                                     imull t.1a, %ecx.0         →  %ecx.1
    imull 4(%eax,%edx,4),%ebx        load 4(%eax, %edx.0, 4)    →  t.1b
                                     imull t.1b, %ebx.0         →  %ebx.1
    addl $2,%edx                     addl $2, %edx.0            →  %edx.1
    cmpl %esi,%edx                   cmpl %esi, %edx.1          →  cc.1
    jl .L151                         jl-taken cc.1

Figure 5.25 shows a graphical representation of these operations for the first iteration (i = 0). As this diagram illustrates, the two multiplications in the loop are independent of each other. One has register %ecx as its source and destination (corresponding to program variable x0), while the other has register %ebx as its source and destination (corresponding to program variable x1). The second multiplication can start just one cycle after the first. This makes use of the pipelining capabilities of both the load unit and the integer multiplier. Figure 5....
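For reference, the kind of C loop that could compile to the assembly above can be sketched as follows. This is only a minimal illustration, not the original listing, which is not part of this excerpt: the function names product_two_accumulators and product_one_accumulator are hypothetical, the input is assumed to be a plain int array, and the values are assumed small enough that the product does not overflow. The variables x0 and x1 play the roles described in the text (%ecx and %ebx).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline: one accumulator, so every multiplication
 * depends on the result of the previous one. */
int product_one_accumulator(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    int x = 1;
    for (size_t i = 0; i < n; i++) {
        x *= data[i];
    }
    return x;
}

/* Hypothetical sketch of two-way unrolling with two accumulators.
 * x0 collects the even-indexed elements and x1 the odd-indexed ones,
 * so the two multiplications in the loop body do not depend on each
 * other and can overlap in a pipelined multiplier. */
int product_two_accumulators(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    int x0 = 1;  /* corresponds to %ecx in the text */
    int x1 = 1;  /* corresponds to %ebx in the text */
    size_t i;

    /* Process two elements per iteration. */
    for (i = 0; i + 1 < n; i += 2) {
        x0 *= data[i];
        x1 *= data[i + 1];
    }

    /* Pick up the last element when n is odd. */
    for (; i < n; i++) {
        x0 *= data[i];
    }

    /* Combine x0 and x1 to compute the final result. */
    return x0 * x1;
}
```

Both functions return the same product for inputs that do not overflow; the difference lies only in the dependency structure. In the single-accumulator version each multiplication must wait for the previous one, whereas in the two-accumulator version the chains through x0 and x1 are separate, which is what lets the second multiplication start one cycle after the first.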
This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.