...red in spite of the previously described general integer-instruction rules. These special pairs overcome register dependencies, and most involve implicit reads/writes to the ESP register or implicit writes to the condition codes:

• Stack Pointer.

    push reg/imm ; push reg/imm
    push reg/imm ; call
    pop reg      ; pop reg

• Condition Codes.

    cmp ; jcc
    add ; jne

Note that the special pairs that consist of PUSH/POP instructions may have only immediate or register operands, not memory operands.

Restrictions On Pair Execution

Some integer-instruction pairs may be issued simultaneously but will not execute in parallel:

• Data-Cache Conflict. If both instructions access the same data-cache memory bank, the second request (V-pipe) must wait for the first request to complete. A bank conflict occurs when bits 2 through 4 of the two physical addresses are the same. A bank conflict results in a 1-clock penalty on the V-pipe instruction.

• Inter-Pipe Concurrency. Parallel execution of integer instruction pairs preserves memory-access ordering. A multiclock instruction in the U-pipe will execute alone until its last memory access.

For example, the following instructions add the contents of the register and the value at the memory location, then put the result in the register. An add with a memory operand takes 2 clocks to execute. The first clock loads the value from the data cache, and the second clock performs the addition. Since there is only one memory access in the U-pipe instruction, the add in the V-pipe can start in the same clock.

    add eax, mem1
    add ebx, mem2    ; 1
    (add) (add)      ; 2   2-cycle

The following instructions add the contents of the register to the memory location and store the result at the memory location. An add with a memory result takes 3 clocks to execute. The first clock loads the value, the second performs the addition, and the third stores the result. When paired, the last clock of the U-pipe instruction overlaps with the first clock of the V-pipe instruction execution.

    add mem1, eax          ; 1
    (add)                  ; 2
    (add) add mem2, ebx    ; 3
    (add)                  ; 4
    (add)                  ; 5

14-16 CODE OPTIMIZATION

No other instructions may begin execution until the instructions already executing have completed.

To expose the opportunities for scheduling and pairing, it is better to issue a sequence of simple instructions rather than a complex instruction that takes the same number of clocks. The simple instruction sequence can take advantage of more issue slots. The load/store style of code generation requires more registers and increases code size. This impacts Intel486™ processor performance, although only as a second-order effect. To compensate for the extra registers needed, extra effort should be put into register allocation and instruction scheduling so that extra registers are only used when parallelism increases.

MMX™ INSTRUCTION PAIRING GUIDELINES

This section specifies guidelines and restrictions for pairing MMX™ instructions with each other and with integer instructions.

Pairing Two MMX™ Instructions

The following restrictions apply when pairing two MMX™ instructions:

• Two MMX™ instructions that both use the MMX™ shifter unit (pack, unpack, and shift instructions) are not pairable because there is only one MMX™ shifter unit. Shift operations may be issued in either the U-pipe or the V-pipe, but cannot be executed in both pipes in the same clock.
• Two MMX™ instructions that both use the MMX™ multiplier unit (PMULL, PMULH, PMADD type instructions) are not pairable because there is only one MMX™ multiplier unit. Multiply operations may be issued in either the U-pipe or the V-pipe, but cannot be executed in both pipes in the same clock.
• MMX™ instructions that access either memory or a general-purpose register can be issued in the U-pipe only. Do not schedule these instructions to the V-pipe, as they will wait and be issued in the next pair of instructions (and to the U-pipe).
• The MMX™ destination register of the U-pipe instruction should not match the source or destination register of the V-pipe instruction (dependency check).
• The EMMS instruction is not pairable with other instructions.
• If either the TS flag or the EM flag in control register CR0 is set, MMX™ instructions cannot be executed in the V-pipe.

Pairing an Integer Instruction in the U-Pipe With an MMX™ Instruction in the V-Pipe

Use the following guidelines for pairing an integer instruction in the U-pipe and an MMX™ instruction in the V-pipe:

• The MMX™ instruction is not the first MMX™ instruction following a floating-point instruction.
• The V-pipe MMX™ instruction does not access either memory or a general-purpose register.
• The U-pipe integer instruction is a pairable U-pipe integer instruction (see Table 14-2).

Pairing an MMX™ Instruction in the U-Pipe with an Integer Instruction in the V-Pipe

Use the following guidelines for pairing an MMX™ instruction in the U-pipe and an integer instruction in the V-pipe:

• The U-pipe MMX™ instruction does not access either memory or a general-purpose register.
• The V-pipe instruction is a pairable integer V-pipe instruction (see Table 14-2).

14.5.2. Pipelining Guidelines
The term pipelining refers to the practice of scheduling instructions in the instruction stream to reduce processor stalls due to register, data, or data-cache dependencies. The effect of pipelining on code execution is highly dependent on the family of Intel...
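
As a rough illustration of the pairing restrictions listed earlier in this section, the following Python sketch models two of them: the data-cache bank-conflict test (bits 2 through 4 of the two physical addresses are the same) and the restrictions on pairing two MMX™ instructions. It is only a model under stated assumptions, not a description of how the processor implements these checks. The names bank_conflict, MmxOp, and mmx_pair_ok, and the unit labels "shift", "mul", and "alu", are hypothetical and chosen for this example. The CR0 TS/EM condition is left out.

```python
from dataclasses import dataclass

# Bits 2 through 4 of a physical address (mask 0b11100 = 0x1C).
BANK_MASK = 0b11100


def bank_conflict(addr_u: int, addr_v: int) -> bool:
    """Return True when bits 2-4 of the two physical addresses are the same."""
    return ((addr_u ^ addr_v) & BANK_MASK) == 0


@dataclass(frozen=True)
class MmxOp:
    """Hypothetical description of one MMX instruction (illustration only)."""
    name: str                         # mnemonic, e.g. "paddw"
    dest: str                         # MMX destination register, e.g. "mm0"
    srcs: tuple = ()                  # MMX source registers
    unit: str = "alu"                 # assumed labels: "shift", "mul", or "alu"
    touches_mem_or_gpr: bool = False  # accesses memory or a general-purpose register


def mmx_pair_ok(u: MmxOp, v: MmxOp) -> bool:
    """Check the MMX/MMX pairing restrictions for a (U-pipe, V-pipe) candidate pair."""
    # EMMS is not pairable with other instructions.
    if "emms" in (u.name, v.name):
        return False
    # Only one shifter unit and one multiplier unit exist.
    if u.unit == v.unit and u.unit in ("shift", "mul"):
        return False
    # Memory or general-purpose register accesses are issued in the U-pipe only.
    if v.touches_mem_or_gpr:
        return False
    # Dependency check: the U-pipe destination must not be a V-pipe source or destination.
    if u.dest == v.dest or u.dest in v.srcs:
        return False
    return True


# Example candidates (operands are made up for this sketch).
add_a = MmxOp("paddw", dest="mm0", srcs=("mm0", "mm1"))
sub_b = MmxOp("psubw", dest="mm2", srcs=("mm2", "mm3"))
shift_a = MmxOp("psllw", dest="mm4", srcs=("mm4",), unit="shift")
unpack_b = MmxOp("punpcklbw", dest="mm5", srcs=("mm5", "mm6"), unit="shift")

independent_pair = mmx_pair_ok(add_a, sub_b)          # no shared unit or register
shifter_clash = mmx_pair_ok(shift_a, unpack_b)        # both need the single shifter
same_bank = bank_conflict(0x1000, 0x1020)             # bits 2-4 are equal
different_bank = bank_conflict(0x1000, 0x1004)        # bit 2 differs
```

In this model, independent_pair and same_bank evaluate to True, while shifter_clash and different_bank evaluate to False. A scheduler built along these lines could reorder instructions until adjacent candidates pass the check, but the actual issue behavior depends on details that are not captured here.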