...cant delay occurs getting the processor restarted at the correct location.

5.8 Reducing Loop Overhead
The performance of combine4 for integer addition is limited by the fact that each iteration contains four instructions, with only two functional units capable of performing them. Only one of these four instructions operates on the program data. The others are part of the loop overhead of computing the loop index and testing the loop condition.

We can reduce overhead effects by performing more data operations in each iteration, via a technique known as loop unrolling. The idea is to access and combine multiple array elements within a single iteration. The resulting program requires fewer iterations, leading to reduced loop overhead. Figure 5.19 shows a version of our combining code using three-way loop unrolling. The first loop steps through the array three elements at a time. That is, the loop index i is incremented by three on each iteration, and the combining operation is applied to array elements i, i+1, and i+2 in a single iteration.
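As a minimal sketch of the three-way unrolling described above (not a reproduction of Figure 5.19), the following C function sums a plain array of integers. The name sum_unrolled3, the use of a raw int array in place of the text's vector type, and the second clean-up loop for leftover elements are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not from the text): sums n ints using
 * three-way loop unrolling. */
int sum_unrolled3(const int *data, size_t n)
{
    int sum = 0;
    size_t i = 0;

    /* Main loop: the index advances by three per iteration, and
     * elements i, i+1, and i+2 are combined in one pass, so the
     * index update and loop test are paid once per three elements. */
    for (; i + 2 < n; i += 3)
        sum += data[i] + data[i + 1] + data[i + 2];

    /* Clean-up loop: handles the zero, one, or two elements left
     * over when n is not a multiple of three. */
    for (; i < n; i++)
        sum += data[i];

    return sum;
}
```

The condition i + 2 < n is used rather than i < n - 2 because n is unsigned here, and n - 2 would wrap around for arrays shorter than two elements.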
This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.