

...abstract version of the function has a CPE of 54 for both integer and floating-point data. By applying the same sort of transformations we used to turn the abstract program combine1 into the more efficient combine4, we get the following code:

```c
/* Accumulate in temporary */
void inner4(vec_ptr u, vec_ptr v, data_t *dest)
{
    int i;
    int length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;

    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;
}
```

Our measurements show that this function requires 3.11 cycles per iteration for integer data. The assembly code for the inner loop is:

```asm
# udata in %esi, vdata in %ebx, i in %edx, sum in %ecx, length in %edi
.L24:                           # loop:
    movl  (%esi,%edx,4),%eax    #   Get udata[i]
    imull (%ebx,%edx,4),%eax    #   Multiply by vdata[i]
    addl  %eax,%ecx             #   Add to sum
    incl  %edx                  #   i++
    cmpl  %edi,%edx             #   Compare i:length
    jl    .L24                  #   If <, goto loop
```

Assume that integer multiplication is perfo...
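To make the transformations concrete, the following is a small self-contained sketch that compiles on its own. It assumes `data_t` is `int` (matching the integer measurement) and supplies a hypothetical minimal `vec_rec` type with `vec_length`, `get_vec_start`, and a bounds-checked `get_vec_element`, modeled on the vector interface that `inner4` uses; these definitions are illustrative, not the book's source. The `inner1` function is likewise a hypothetical reconstruction of what an abstract, combine1-style inner product might look like. Placing it next to `inner4` makes the three changes visible side by side: the length computation is moved out of the loop test, elements are read directly from the arrays instead of through a checked call, and the running sum lives in a local variable rather than being updated through `dest`.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: integer data, as in the 3.11-cycle measurement. */
typedef int data_t;

/* Hypothetical minimal vector type: a length plus the element array. */
typedef struct {
    int len;
    data_t *data;
} vec_rec, *vec_ptr;

/* Hypothetical helper: build a vector holding a copy of n elements. */
vec_ptr new_vec(const data_t *src, int n) {
    vec_ptr v = malloc(sizeof(vec_rec));
    assert(v != NULL);
    v->len = n;
    v->data = malloc((n > 0 ? n : 1) * sizeof(data_t));
    assert(v->data != NULL);
    for (int i = 0; i < n; i++)
        v->data[i] = src[i];
    return v;
}

void free_vec(vec_ptr v) { free(v->data); free(v); }

int vec_length(vec_ptr v) { return v->len; }

data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Bounds-checked access: returns 0 if index is out of range, else 1. */
int get_vec_element(vec_ptr v, int index, data_t *dest) {
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* Hypothetical abstract version in the style of combine1: the length is
   re-read on every loop test, each element costs a checked call, and the
   sum is updated through dest on every iteration. */
void inner1(vec_ptr u, vec_ptr v, data_t *dest) {
    *dest = 0;
    for (int i = 0; i < vec_length(u); i++) {
        data_t uval = 0, vval = 0;
        get_vec_element(u, i, &uval);
        get_vec_element(v, i, &vval);
        *dest = *dest + uval * vval;
    }
}

/* inner4 as given in the text: length hoisted out of the loop, direct
   array access, and the sum accumulated in a local temporary. */
void inner4(vec_ptr u, vec_ptr v, data_t *dest) {
    int i;
    int length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = (data_t) 0;
    for (i = 0; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }
    *dest = sum;
}
```

Both functions compute the same value (for example, 32 for the vectors {1, 2, 3} and {4, 5, 6}); only the work done per element differs, which is what the drop from a CPE of 54 to 3.11 cycles per iteration reflects.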

This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
