C figure 530 linked list functions these illustrate


ing saved to avoid recomputation.

Practice Problem 5.4: The following shows the code generated from a variant of combine6 that uses eight-way loop unrolling and four-way parallelism.

    .L152:
      addl (%eax),%ecx
      addl 4(%eax),%esi
      addl 8(%eax),%edi
      addl 12(%eax),%ebx
      addl 16(%eax),%ecx
      addl 20(%eax),%esi
      addl 24(%eax),%edi
      addl 28(%eax),%ebx
      addl $32,%eax
      addl $8,%edx
      cmpl -8(%ebp),%edx
      jl .L152

A. What program variable has been spilled onto the stack?
B. At what location on the stack?
C. Why is this a good choice of which value to spill?

With floating-point data, we want to keep all of the local variables in the floating-point register stack. We also need to keep the top of stack available for loading data from memory. This limits us to a degree of parallelism less than or equal to 7.

5.11. PUTTING IT TOGETHER: SUMMARY OF RESULTS FOR OPTIMIZING COMBINING CODE

Function   Page   Method
combine1   211    Abstract unoptimized
combine1   211    Abstract -O2
combine2   212    Move vec length
combine3   217    Direct data access
combine4   219    Accumulate in temporary
combine5   234    Unroll ¢
combine6   241    Unroll ¢...
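To make the loop structure in Practice Problem 5.4 easier to follow, here is a rough C sketch of what eight-way unrolling with four-way parallelism might look like for integer addition. It is not the book's combine6 source: the function name sum_unroll8x4, the plain int array interface, and the variable names acc0 through acc3 and limit are all assumptions made for illustration. The four accumulators play the role of the four registers that receive the addl results in the listing, and each one is updated twice per iteration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the combine6 variant discussed above:
   sums n ints with eight-way loop unrolling and four independent
   accumulators (four-way parallelism). Names are illustrative only. */
int sum_unroll8x4(const int *data, size_t n)
{
    int acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;
    size_t limit = n - n % 8;   /* largest multiple of 8 not exceeding n */

    /* Main loop: eight elements per iteration, two per accumulator. */
    for (; i < limit; i += 8) {
        acc0 += data[i];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
        acc0 += data[i + 4];
        acc1 += data[i + 5];
        acc2 += data[i + 6];
        acc3 += data[i + 7];
    }

    /* Cleanup loop: handle the remaining 0 to 7 elements one at a time. */
    for (; i < n; i++)
        acc0 += data[i];

    /* Combine the partial sums once, after both loops have finished. */
    return acc0 + acc1 + acc2 + acc3;
}
```

In this sketch the loop needs a data pointer, an index, a loop bound, and four accumulators, which is more values than IA32 has general-purpose registers to spare. That pressure is why a compiler may keep one of them in memory, as the listing's stack reference suggests.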