…ing saved to avoid recomputation.

Practice Problem 5.4:
The following shows the code generated from a variant of combine6 that uses eight-way loop unrolling and four-way parallelism.
```
.L152:
    addl (%eax),%ecx
    addl 4(%eax),%esi
    addl 8(%eax),%edi
    addl 12(%eax),%ebx
    addl 16(%eax),%ecx
    addl 20(%eax),%esi
    addl 24(%eax),%edi
    addl 28(%eax),%ebx
    addl $32,%eax
    addl $8,%edx
    cmpl -8(%ebp),%edx
    jl .L152
```

A. What program variable has been spilled onto the stack?
B. At what location on the stack?
C. Why is this a good choice of which value to spill?

With floating-point data, we want to keep all of the local variables in the floating-point register stack. We also need to keep the top of the stack available for loading data from memory. Since the register stack holds eight values, this limits us to a degree of parallelism of at most 7.

5.11. PUTTING IT TOGETHER: SUMMARY OF RESULTS FOR OPTIMIZING COMBINING CODE
| Function | Page | Method                  |
|----------|------|-------------------------|
| combine1 | 211  | Abstract unoptimized    |
| combine1 | 211  | Abstract -O2            |
| combine2 | 212  | Move vec length         |
| combine3 | 217  | Direct data access      |
| combine4 | 219  | Accumulate in temporary |
| combine5 | 234  | Unroll …                |
| combine6 | 241  | Unroll …                |
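As a point of reference for Practice Problem 5.4, here is a minimal C sketch of a loop that could plausibly compile to code like the listing there: a combine6-style integer sum with eight-way loop unrolling and four-way parallelism. This is an assumption, not the book's own listing. The function name `combine6_8x4` is hypothetical, it takes a plain `int` array and length instead of the book's vector abstraction, and it uses 0 as the identity for addition. Each of the four accumulators receives every fourth element, mirroring how the eight `addl` instructions in the listing rotate through four destination registers.

```c
/* Hypothetical combine6 variant: 8-way unrolling, 4 accumulators.
 * Assumption: operates on a plain int array rather than the book's
 * vector type, and computes a sum (identity element 0). */
void combine6_8x4(const int *data, int length, int *dest)
{
    int limit = length - 7;            /* last start index with 8 elements left */
    int x0 = 0, x1 = 0, x2 = 0, x3 = 0; /* four independent partial sums */
    int i;

    /* Main loop: eight elements per iteration, two per accumulator. */
    for (i = 0; i < limit; i += 8) {
        x0 += data[i];
        x1 += data[i + 1];
        x2 += data[i + 2];
        x3 += data[i + 3];
        x0 += data[i + 4];
        x1 += data[i + 5];
        x2 += data[i + 6];
        x3 += data[i + 7];
    }

    /* Finish the remaining (fewer than eight) elements. */
    for (; i < length; i++)
        x0 += data[i];

    /* Combine the partial sums. */
    *dest = (x0 + x1) + (x2 + x3);
}
```

Because the four partial sums do not depend on one another, their additions can proceed in parallel. Regrouping integer additions this way yields the same total as a sequential loop; a floating-point version could round differently, and on IA32 it would be limited to at most seven accumulators by the register stack, as described in the text.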