Observe how the cycle times generally decline as we



... previous one is completed. Our code cannot take advantage of this capability, even with loop unrolling, since we are accumulating the value as a single variable x. We cannot compute a new value of x until the preceding computation has completed. As a result, the processor will stall, waiting to begin a new operation until the current one has completed.

This limitation shows clearly in Figures 5.15 and 5.17. Even with unbounded processor resources, the multiplier can only produce a new result every four clock cycles. Similar limitations occur with floating-point addition (three cycles) and multiplication (five cycles).

5.10.1 Loop Splitting

For a combining operation that is associative and commutative, such as integer addition or multiplication, we can improve performance by splitting the set of combining operations into two or more parts and combining

[Page 242, Chapter 5: Optimizing Program Performance. Figure residue, execution unit operations: load (%eax, %edx.0, 4); imull t.1a, %ecx.0; load 4(%eax, %edx.0, 4); imull t.1b, %eb...; addl %edx.0 -> %edx.1]
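
The loop-splitting idea of Section 5.10.1 can be sketched in C as follows. This is a minimal illustration, not the book's own code: the function names product_single and product_split, the int element type, and the even/odd split are assumptions made for the example. The first function accumulates into a single variable x, so each multiplication must wait for the previous one. The second keeps two independent accumulators, so consecutive multiplications do not depend on each other and may overlap in a pipelined multiplier. The sketch also assumes the product stays small enough that signed integer overflow does not occur.

```c
#include <stddef.h>

/* Baseline (hypothetical name): one accumulator x.
 * Every multiply depends on the result of the previous multiply. */
int product_single(const int *data, size_t n)
{
    int x = 1;
    for (size_t i = 0; i < n; i++)
        x *= data[i];
    return x;
}

/* Split version (hypothetical name): two accumulators.
 * x0 combines the even-indexed elements and x1 the odd-indexed ones,
 * so the two multiplies in one iteration are independent of each other. */
int product_split(const int *data, size_t n)
{
    int x0 = 1;
    int x1 = 1;
    size_t i;

    /* Process two elements per iteration while a full pair remains. */
    for (i = 0; i + 1 < n; i += 2) {
        x0 *= data[i];
        x1 *= data[i + 1];
    }

    /* Handle the leftover element when n is odd. */
    for (; i < n; i++)
        x0 *= data[i];

    /* Merge the partial products; this reordering is valid only because
     * the operation is associative and commutative. */
    return x0 * x1;
}
```

Both functions return the same value for the same input, for example 120 for the array {1, 2, 3, 4, 5}. Whether the split version actually runs faster depends on the processor and compiler, so any speedup should be measured rather than assumed.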

This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
