For integer and oating point product however we

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: cant delay occurs getting the processor restarted at the correct location. 5.8 Reducing Loop Overhead The performance of combine4 for integer addition is limited by the fact that each iteration contains four instructions, with only two functional units capable of performing them. Only one of these four instructions operates on the program data. The others are part of the loop overhead of computing the loop index and testing the loop condition. We can reduce overhead effects by performing more data operations in each iteration, via a technique known as loop unrolling. The idea is to access and combine multiple array elements within a single iteration. The resulting program requires fewer iterations, leading to reduced loop overhead. Figure 5.19 shows a version of our combining code using three-way loop unrolling. The first loop steps through the array three elements at a time. That is, the loop index i is incremented by three on each iteration, and the combining operation is applied to array elements , · ½, and · ¾ in a single iteration. 234 CHAPTER 5. OPTIMIZING PROGRAM PER...
View Full Document

This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.

Ask a homework question - tutors are online