
...er code (Figure 5.22).

code/opt/combine.c
```c
/* Unroll loop by 2, 2-way parallelism */
void combine6(vec_ptr v, data_t *dest)
{
    int length = vec_length(v);
    int limit = length-1;
    data_t *data = get_vec_start(v);
    data_t x0 = IDENT;
    data_t x1 = IDENT;
    int i;

    /* Combine 2 elements at a time */
    for (i = 0; i < limit; i+=2) {
        x0 = x0 OPER data[i];
        x1 = x1 OPER data[i+1];
    }

    /* Finish any remaining elements */
    for (; i < length; i++) {
        x0 = x0 OPER data[i];
    }
    *dest = x0 OPER x1;
}
```

Figure 5.24: Unrolling Loop by 2 and Using Two-Way Parallelism. This approach makes use of the pipelining capability of the functional units.

5.10 Enhancing Parallelism

At this point, our programs are limited by the latency of the functional units. As the third column in Figure 5.12 shows, however, several functional units of the processor are pipelined, meaning that they can start on a new operation before the...
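To try the idea of Figure 5.24 outside the book's full code base, the following is a minimal, self-contained sketch. It assumes stand-ins that are not the book's exact definitions: data_t is double, OPER is multiplication with IDENT 1.0, and the vector type (here called vec_rec/vec_ptr with a len and a data field) plus the helpers combine_seq and combine6_sketch are hypothetical names chosen for illustration. combine_seq keeps a single accumulator, so each multiplication must wait for the previous one; combine6_sketch splits the work into even-index and odd-index accumulators, giving the processor two independent dependency chains that a pipelined multiplier can overlap.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the book's vector abstraction:
   data_t is double, OPER is multiplication, IDENT is its identity. */
typedef double data_t;
#define IDENT 1.0
#define OPER *

typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

static long vec_length(vec_ptr v) { return v->len; }
static data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Baseline: one accumulator, so every OPER depends on the previous result. */
void combine_seq(vec_ptr v, data_t *dest)
{
    long length = vec_length(v);
    data_t *data = get_vec_start(v);
    data_t acc = IDENT;
    for (long i = 0; i < length; i++)
        acc = acc OPER data[i];
    *dest = acc;
}

/* Unroll by 2 with two independent accumulators: x0 combines the
   even-index elements and x1 the odd-index elements, so the two
   chains of OPER can be in flight in a pipelined unit at once. */
void combine6_sketch(vec_ptr v, data_t *dest)
{
    long length = vec_length(v);
    long limit = length - 1;
    data_t *data = get_vec_start(v);
    data_t x0 = IDENT;
    data_t x1 = IDENT;
    long i;

    for (i = 0; i < limit; i += 2) {
        x0 = x0 OPER data[i];
        x1 = x1 OPER data[i + 1];
    }
    /* When length is odd, exactly one element is left over. */
    for (; i < length; i++)
        x0 = x0 OPER data[i];
    *dest = x0 OPER x1;
}
```

For the input {1.5, 2.0, 4.0, 0.5, 3.0}, both functions produce 18.0: combine6_sketch computes x0 = 1.5 * 4.0 * 3.0 = 18.0 and x1 = 2.0 * 0.5 = 1.0. These values were picked so every intermediate product is exact in double precision; with arbitrary floating-point inputs, regrouping the multiplications this way can change the last bits of the result.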