30 this function computes the length of a linked list

                               Integer            Floating Point
Method                         +        *         +        *
[...]                          42.06    41.86     41.44    160.00
[...]                          31.25    33.25     31.25    143.00
[...]                          20.66    21.25     21.15    135.00
[...]                           6.00     9.00      8.00    117.00
[...]                           2.00     4.00      3.00      5.00
[...]                           1.50     4.00      3.00      5.00
[...]                           1.06     4.00      3.00      5.00
Unroll [?], Parallelism [?]     1.50     2.00      2.00      2.50
Unroll [?], Parallelism [?]     1.50     2.00      1.50      2.50
Unroll [?], Parallelism [?]     1.25     1.25      1.50      2.00
Worst:Best                     39.7     33.5      27.6      80.0

Figure 5.27: Comparative Result for All Combining Routines. The best performing version is shown in bold face.

5.10.3 Limits to Parallelism

For our benchmarks, the main performance limitations are due to the capabilities of the functional units. As Figure 5.12 shows, the integer multiplier and the floating-point adder can only initiate a new operation every clock cycle. This, plus a similar limitation on the load unit, limits these cases to a CPE of 1.0. The floating-point multiplier can only initiate a new operation every two clock cycles. This limits this case to a CPE of 2.0. Integer sum is limited to a CPE of 1.0, due to the limitations of the l...
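
As an illustration of the kind of routine the "Unroll, Parallelism" rows measure, here is a minimal sketch of an integer sum that unrolls the loop by two and keeps two independent accumulators. The function name sum_unroll2_par2, the element type long, and the factors of two are assumptions chosen for the example; they are not taken from the excerpt. The two accumulators form separate dependency chains, so the adds can overlap until the issue rate of the functional units or the load unit becomes the limit.

```c
#include <stddef.h>

/* Hypothetical example: sum data[0..n-1] using 2x loop unrolling
 * and two independent accumulators (two-way parallelism). */
long sum_unroll2_par2(const long *data, size_t n)
{
    long acc0 = 0;   /* accumulates elements at even offsets */
    long acc1 = 0;   /* accumulates elements at odd offsets */
    size_t i = 0;

    /* Main loop: two elements per iteration, one per accumulator.
     * The two additions do not depend on each other. */
    for (; i + 1 < n; i += 2) {
        acc0 += data[i];
        acc1 += data[i + 1];
    }

    /* Cleanup loop: handles the last element when n is odd. */
    for (; i < n; i++)
        acc0 += data[i];

    /* Combine the two partial sums. */
    return acc0 + acc1;
}
```

Even with more accumulators, a routine of this shape still performs one load per element, which is consistent with the load-unit limit described above.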