[Figure 5.25 diagram omitted. Operations shown: load, load, imull, imull, addl $2, %edx, cmpl %esi, %edx, jl-taken. Values shown: %edx.0, %edx.1, %ecx.0, %ecx.1, %ebx.0, %ebx.1, t.1a, t.1b, cc.1.]

Figure 5.25: Operations for First Iteration of Inner Loop of Two-Way Unrolled, Two-Way Parallel Integer Multiplication. The two multiplication operations are logically independent.

... the results at the end. For example, let $P_n$ denote the product of elements $a_0, a_1, \ldots, a_{n-1}$:

$$P_n = \prod_{i=0}^{n-1} a_i$$

Assuming $n$ is even, we can also write this as $P_n = PE_n \times PO_n$, where $PE_n$ is the product of the elements with even indices, and $PO_n$ is the product of the elements with odd indices:

$$PE_n = \prod_{i=0}^{n/2-1} a_{2i}, \qquad PO_n = \prod_{i=0}^{n/2-1} a_{2i+1}$$

Figure 5.24 shows code that uses this method. It uses both two-way loop unrolling, to combine more elements per iteration, and two-way parallelism, accumulating elements with even index in variable x0 and elements with odd index in variable x1. As before, we include a second loop to accumulate any remaining array elements for the case where the vector length is not a multiple of 2. We then apply the combining operation to x0 and...
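The method described above can be illustrated with a minimal sketch. This is not the code of Figure 5.24, which is not reproduced in this excerpt. It assumes a plain array of long values rather than the text's vector type, and the function name product_2x2 is hypothetical. Only the accumulator names x0 and x1 come from the text.

```c
#include <stddef.h>

/* Hypothetical sketch: product of a[0..n-1] using two-way loop
 * unrolling and two-way parallelism (two independent accumulators).
 * Assumes the product fits in a long; overflow is not handled. */
long product_2x2(const long *a, size_t n)
{
    long x0 = 1;                  /* product of even-index elements */
    long x1 = 1;                  /* product of odd-index elements */
    size_t limit = n - (n % 2);   /* largest even count <= n */
    size_t i;

    /* Combine two elements per iteration. The two multiplications do
     * not depend on each other, so they can proceed in parallel. */
    for (i = 0; i < limit; i += 2) {
        x0 *= a[i];
        x1 *= a[i + 1];
    }

    /* Second loop: pick up a leftover element when n is odd. */
    for (; i < n; i++)
        x0 *= a[i];

    /* Combine the two partial products at the end. */
    return x0 * x1;
}
```

Because x0 and x1 form separate dependency chains, neither multiplication has to wait for the result of the other within an iteration. For an empty array the sketch returns 1, the identity for multiplication.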