These facilities are valuable for debugging



... be maintained by zero extending 16-bit operands. Specifically, the C code in the following example needs neither sign extension nor prefixes for operand-size overrides.

    static short int a, b;
    if (a == b) {
        ...
    }

• Code for comparing these 16-bit operands might be:

    U Pipe              V Pipe
    xor  eax, eax       xor  ebx, ebx     ; 1
    movw ax, [a]                          ; 2 (prefix) + 1
    movw bx, [b]                          ; 4 (prefix) + 1
    cmp  eax, ebx                         ; 6

Of course, this can be done only under certain circumstances, but those circumstances tend to be quite common. It does not work if the compare is for greater than, less than, greater than or equal, and so on, or if the values in EAX or EBX are to be used in another operation where sign extension is required.

The P6 family processors provide special support for the XOR reg, reg instruction where both operands point to the same register, recognizing that clearing a register does not depend on the old value of the register. Additionally, special support is provided for the specific code sequence above to avoid the partial stall.

CODE OPTIMIZATION

The following straightforward method may be slower on Pentium® processors.

    movsw eax, a        ; 1 prefix + 3
    movsw ebx, b        ; 5
    cmp   ebx, eax      ; 9

However, the P6 family processors have improved the performance of the MOVZX instructions to reduce the prevalence of partial stalls. Code written specifically for the P6 family processors should use the MOVZX instructions.

• Compares. Use the TEST instruction when comparing a value in a register with 0. TEST essentially ANDs the operands together without writing to a destination register. If a value is ANDed with itself and the result sets the zero condition flag, the value was zero. TEST is preferred over an AND instruction because AND writes the result register, which may subsequently cause an AGI or an artificial output dependence on the P6 family processors. TEST is better than CMP .., 0 because the instruction size is smaller.
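The restriction on the zero-extension comparison described above (it is valid for equality tests but not for ordered compares) can be checked with a small C sketch. The function names here are hypothetical and chosen only for illustration; the sketch models the register contents in portable C rather than reproducing the processor's instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "XOR reg, reg" followed by a 16-bit load:
   the upper 16 bits of the 32-bit register are zero. */
uint32_t zero_extend16(int16_t v) {
    return (uint32_t)(uint16_t)v;
}

/* Model of a sign-extending load of the same 16-bit value. */
int32_t sign_extend16(int16_t v) {
    return (int32_t)v;
}

/* Equality computed on the zero-extended 32-bit values. */
int equal_zero_ext(int16_t a, int16_t b) {
    return zero_extend16(a) == zero_extend16(b);
}

/* "Less than" computed on the zero-extended values (unsigned compare). */
int less_zero_ext(int16_t a, int16_t b) {
    return zero_extend16(a) < zero_extend16(b);
}

/* "Less than" computed on the sign-extended values (signed compare). */
int less_sign_ext(int16_t a, int16_t b) {
    return sign_extend16(a) < sign_extend16(b);
}

/* Self-check: equality survives zero extension, ordering does not. */
void zero_ext_self_check(void) {
    assert(equal_zero_ext(-1, -1));   /* 0xFFFF == 0xFFFF */
    assert(!equal_zero_ext(-1, 1));   /* 0xFFFF != 0x0001 */
    assert(less_sign_ext(-1, 1));     /* -1 < 1 as signed values */
    assert(!less_zero_ext(-1, 1));    /* 0xFFFF is not below 0x0001 */
}
```

Two 16-bit values are equal exactly when their zero-extended forms are equal, because zero extension is one-to-one. A negative value, however, zero-extends to a large positive number, so a relational compare on the zero-extended registers can give a different answer from the signed compare the C source asks for.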
Use the TEST instruction when comparing the result of a boolean AND with an immediate constant for equality or inequality if the register is EAX (for example, if (avar & 8) { }). On the Pentium® processor, the TEST instruction is a 1-clock pairable instruction when the form is TEST EAX, imm or TEST reg, reg. Other forms of TEST take 2 clocks and do not pair.

• Address Calculations. Pull address calculations into load and store instructions. Internally, memory reference instructions can have 4 operands: a relocatable load-time constant, an immediate constant, a base register, and a scaled index register. (In the segmented model, a segment register may constitute an additional operand in the linear address calculation.) In many cases, several integer instructions can be eliminated by fully using the operands of memory references.

• Clearing a Register. The preferred sequence to move zero to a register is XOR reg, reg. This sequence saves code space but sets the condition codes. In contexts where the condition codes must be preserved, use MOV reg, 0.

• Integer Divide. Typically, an integer divide is preceded by a CDQ instruction. (Divide instructions use EDX:EAX as the dividend, and CDQ sets up EDX.) It is better to copy EAX into EDX, then right shift EDX 31 places to sign extend. On the Pentium® processor, the copy/shift takes the same number of clocks as CDQ, but the copy/shift scheme allows two other instructions to execute at the same time. If the value is known to be positive, use XOR EDX, EDX. On the P6 family processors, the CDQ instruction is faster, because CDQ is a single micro-op instruction as opposed to two instructions for the copy/shift sequence.

• Prolog Sequences. Be careful to avoid AGIs in the procedure and function prolog sequences due to register ESP. Since PUSH can pair with other PUSH instructions, saving callee-saved registers on entry to functions should use these instructions. If possible, load parameters before decrementing ESP.
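The Integer Divide item above relies on the copy/shift sequence leaving the same value in EDX that CDQ would. The following C sketch checks that equivalence under stated assumptions: the function names are hypothetical, and because right-shifting a negative signed value is implementation-defined in C, the arithmetic shift by 31 is emulated with unsigned arithmetic instead of the `>>` operator on a signed operand.

```c
#include <assert.h>
#include <stdint.h>

/* EDX as CDQ would set it: the upper 32 bits of EAX sign-extended to 64 bits. */
uint32_t cdq_high(int32_t eax) {
    uint64_t wide = (uint64_t)(int64_t)eax;   /* well-defined modular conversion */
    return (uint32_t)(wide >> 32);
}

/* EDX after "copy EAX into EDX, then shift EDX right arithmetically by 31".
   The logical shift isolates the sign bit (0 or 1); subtracting it from zero
   replicates that bit across all 32 positions, as an arithmetic shift would. */
uint32_t copy_shift_high(int32_t eax) {
    uint32_t edx = (uint32_t)eax;             /* copy */
    uint32_t sign = edx >> 31;                /* 0 for non-negative, 1 for negative */
    return (uint32_t)0 - sign;                /* 0x00000000 or 0xFFFFFFFF */
}

/* Self-check over a few representative dividends. */
void cdq_self_check(void) {
    const int32_t samples[] = { 0, 1, -1, 7, -7, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(cdq_high(samples[i]) == copy_shift_high(samples[i]));
    }
}
```

For a non-negative dividend both functions return zero, which is why the text allows XOR EDX, EDX when the value is known to be positive.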
In routines that do not call other routines (leaf routines), use ESP as the base register to free up EBP. If you are not using the 32-bit flat model, remember that EBP cannot be used as a general purpose base register because it references the stack segment.

• Avoid Compares with Immediate Zero. Often when a value is compared with zero, the operation producing the value sets condition codes that can be tested directly by a Jcc instruction. The most notable exceptions are the MOV and LEA instructions. In these cases, use the TEST instruction.

• Epilog Sequence. If only 4 bytes were allocated in the stack frame for the current function, instead of incrementing the stack pointer by 4, use POP instructions to prevent AGIs. For the Pentium® processor, use two pops for eight bytes.

CHAPTER 15
DEBUGGING AND PERFORMANCE MONITORING

The Intel Architecture provides extensive debugging facilities for use in debugging code and monitoring code execution and processor performance. These facilities are valuable for debugging applications software, system software, and multitasking operating systems. The debugging support is accessed through the debug registers (DB0 through DB7) and two model-specific regi...

This note was uploaded on 06/07/2013 for the course ECE 1234 taught by Professor Kwhon during the Spring '10 term at University of California, Berkeley.
