o a desire to remain compatible with earlier 486 and Pentium processors, the compiler does not take advantage of these new features. In our experiments, we used the handwritten assembly code shown above. A version using GCC’s facility to embed assembly code within a C program (Section 3.15) required 17.1 cycles due to poorer quality code generation.

Unfortunately, there is not much a C programmer can do to improve the branch performance of a program, except to recognize that data-dependent branches incur a high cost in terms of performance. Beyond this, the programmer has little control over the detailed branch structure generated by the compiler, and it is hard to make branches more predictable. Ultimately, we must rely on a combination of good code generation by the compiler to minimize the use of conditional branches, and effective branch prediction by the processor to reduce the number of branch mispredictions.

5.13 Understanding Memory Performance
All of the code we have written,...