previous one is completed. Our code cannot take advantage of this capability, even with loop unrolling, since we are accumulating the value as a single variable x. We cannot compute a new value of x until the preceding computation has completed. As a result, the processor will stall, waiting to begin a new operation until the current one has completed. This limitation shows clearly in Figures 5.15 and 5.17. Even with unbounded processor resources, the multiplier can only produce a new result every four clock cycles. Similar limitations occur with floating-point addition (three cycles) and multiplication (five cycles).

5.10.1 Loop Splitting
For a combining operation that is associative and commutative, such as integer addition or multiplication, we can improve performance by splitting the set of combining operations into two or more parts and combining the results at the end.
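
As a minimal sketch of this idea, the two C functions below compute the same integer product. The names product_unrolled and product_split, the long element type, and the array-plus-length interface are assumptions made for illustration and are not taken from the text. The first function unrolls the loop by two but still funnels every multiply through the single accumulator x, so each multiply must wait for the one before it. The second keeps two independent accumulators, one for even-indexed and one for odd-indexed elements, and combines them once at the end. The inputs are assumed to be small enough that the product does not overflow.

```c
#include <stddef.h>

/* Unrolled by two, single accumulator: each multiply depends on the
 * previous value of x, so the multiplies cannot overlap. */
long product_unrolled(const long *data, size_t n)
{
    long x = 1;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        x = x * data[i];
        x = x * data[i + 1];
    }
    /* Handle a leftover element when n is odd. */
    for (; i < n; i++)
        x = x * data[i];
    return x;
}

/* Unrolled by two, two accumulators: x0 and x1 form independent
 * dependency chains, so their multiplies may proceed in parallel. */
long product_split(const long *data, size_t n)
{
    long x0 = 1;   /* product of even-indexed elements */
    long x1 = 1;   /* product of odd-indexed elements  */
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        x0 = x0 * data[i];
        x1 = x1 * data[i + 1];
    }
    /* Handle a leftover element when n is odd. */
    for (; i < n; i++)
        x0 = x0 * data[i];

    /* Combine the two partial results. */
    return x0 * x1;
}
```

Both functions return the same value only because integer multiplication is associative and commutative. For floating-point data, reordering the operations this way can change rounding, so the results may differ slightly.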
[Figure: Execution Unit Operations, including load (%eax, %edx.0, 4); imull t.1a, %ecx.0; load 4(%eax, %edx.0, 4); imull t.1b, %eb...]
This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.