324_Book

# E value of zero in the data section this is followed


...chapters 5 and 6 to the problem of optimizing code for a memory-intensive application. Consider a procedure to copy and transpose the elements of an $N \times N$ matrix of type int. That is, for source matrix $S$ and destination matrix $D$, we want to copy each element $s_{i,j}$ to $d_{j,i}$. This code can be written with a simple loop:

```c
void transpose(int *dst, int *src, int dim)
{
    int i, j;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[j*dim + i] = src[i*dim + j];
}
```

where the arguments to the procedure are pointers to the destination (dst) and source (src) matrices, as well as the matrix size $N$ (dim).

Making this code run fast requires two types of optimizations. First, although the routine does a good job exploiting the spatial locality of the source matrix, it does a poor job for large values of $N$ with the destination matrix. Second, the code generated by GCC is not very efficient. Looking at the assembly code, one sees that the inner loop requires 10 instructions, 5 of which reference memory: one for the source, one for the destination, and three to read local variables fr...
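The locality problem with the destination matrix can be illustrated with a blocked (tiled) traversal. The sketch below is an assumption about one common remedy, not necessarily the optimization the truncated text goes on to describe. The name `transpose_blocked` and the `block` parameter are hypothetical and introduced only for illustration; the baseline `transpose` is repeated so the two can be compared.

```c
#include <assert.h>

/* Baseline from the text: reads src row by row, but writes dst with a
   stride of dim ints, so consecutive writes land far apart in memory. */
void transpose(int *dst, int *src, int dim)
{
    int i, j;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[j*dim + i] = src[i*dim + j];
}

/* Hypothetical blocked variant (illustrative only): the matrix is walked
   in block x block tiles, so the strided writes to dst stay within a
   small region before the loop moves on to the next tile. */
void transpose_blocked(int *dst, const int *src, int dim, int block)
{
    int ii, jj, i, j;

    assert(block > 0);
    for (ii = 0; ii < dim; ii += block) {
        /* Clamp the tile so sizes that do not divide dim still work. */
        int i_end = (ii + block < dim) ? ii + block : dim;
        for (jj = 0; jj < dim; jj += block) {
            int j_end = (jj + block < dim) ? jj + block : dim;
            for (i = ii; i < i_end; i++)
                for (j = jj; j < j_end; j++)
                    dst[j*dim + i] = src[i*dim + j];
        }
    }
}
```

Both routines produce the same result for any positive `block`; a suitable tile size depends on the cache geometry of the target machine and would need to be measured rather than assumed.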

## This note was uploaded on 09/02/2010 for the course ELECTRICAL 360 taught by Professor Schultz during the Spring '10 term at BYU.
