IntelSoftwareDevelopersManual



... dynamic allocation, check that the compiler aligns doubleword or quadword values on 8-byte boundaries. If the compiler does not implement this alignment, use the following technique to align doublewords and quadwords for optimum code execution:

1. Allocate memory equal to the size of the array or structure plus 4 bytes.
2. Use a bitwise AND to make sure that the array is aligned, for example:

    double a[5];
    double *p, *newp;
    /* malloc requires <stdlib.h>; uintptr_t requires <stdint.h> */
    p = (double *)malloc((sizeof(double) * 5) + 4);
    /* round up to an 8-byte boundary: add 4, then clear the low 3 bits */
    newp = (double *)(((uintptr_t)p + 4) & ~(uintptr_t)7);

The 4 bytes of padding are sufficient only if malloc returns memory that is at least 4-byte aligned. Pass the original pointer p, not newp, to free.

CODE OPTIMIZATION

14.5. INSTRUCTION SCHEDULING OVERVIEW

On all Intel Architecture processors, the scheduling (arrangement) of instructions in the instruction stream can have a significant effect on execution speed. For example, when executing code on a Pentium® or later Intel Architecture processor, two 1-clock instructions that do not have register or data dependencies between them can generally be executed in parallel (in a single clock) if they are paired, that is, placed adjacent to one another in the instruction stream. Likewise, a long-latency instruction such as a floating-point instruction can often be executed in parallel with a sequence of 1-clock integer instructions or shorter-latency floating-point instructions if the instructions are scheduled appropriately in the instruction stream.

The following sections describe two aspects of scheduling that can provide improved performance in Intel Architecture processors: pairing and pipelining. Pairing is generally used to optimize the execution of integer and MMX™ instructions; pipelining is generally used to optimize the execution of MMX™ and floating-point instructions.

14.5.1. Instruction Pairing Guidelines

The microarchitecture for the Pentium® family of processors (with and without MMX™ technology) contains two instruction execution pipelines: the U-pipe and the V-pipe.
These pipelines can execute two Intel Architecture instructions in parallel (during the same clock or clocks) if the two instructions are pairable. Pairable instructions are those that, when they appear adjacent to one another in the instruction stream, will normally be executed in parallel. By ordering a code sequence so that pairable instructions occur sequentially whenever possible, code can be optimized to take advantage of the Pentium® processor’s two-pipe microarchitecture.

NOTE
Pairing of instructions improves Pentium® processor performance significantly. It does not slow, and sometimes improves, the performance of P6 family processors.

The following subsections describe the Pentium® processor pairing rules for integer, MMX™, and floating-point instructions. The pairing rules are grouped into types, as follows:

• General pairing rules.
• Integer instruction pairing rules.
• MMX™ instruction pairing rules.
• Floating-point instruction pairing rules.

14.5.1.1. GENERAL PAIRING RULES

The following are general rules for instruction pairing in code written to run on Pentium® processors:

• Unpairable instructions are always executed in the U-pipe.
• For paired instructions to execute in parallel, the first instruction of the pair must fall on an instruction boundary that forces the instruction to be executed in the U-pipe. The following placements of an instruction in the instruction stream will force an instruction to be executed in the U-pipe:
  - If the first instruction of a pair of pairable instructions is the first instruction in a block of code, the first instruction will be executed in the U-pipe and the second of the pair will be executed in the V-pipe, resulting in parallel execution of the two instructions.
  - If the first instruction of a pair of pairable instructions follows an unpairable instruction in the instruction stream, the first of the pairable instructions will be executed in the U-pipe and the second of the pair in the V-pipe, resulting in parallel execution.
  - After one pair of instructions has been executed in parallel, subsequent pairs will also be executed in parallel until an unpairable instruction is encountered.
• Parallel execution of paired instructions will not occur if:
  - The next two instructions are not pairable instructions.
  - The next two instructions have some type of register contention (implicit or explicit). There are some special exceptions to this rule where register contention can occur with pairing (see “Special Pairs” in Section 14.5.1.2., “Integer Pairing Rules”).
  - The instructions are not both in the instruction cache. An exception that permits pairing is when the first instruction is a 1-byte instruction.
  - The processor is operating in single-step mode.
• Instructions that have data dependencies should be separated by at least one other instruction.
• Pentium® processors without MMX™ technology do not execute a set of paired instructions if either instruction is longer than 7 bytes; Pentium® processors with MMX™ technology do not execute a set...

This note was uploaded on 06/07/2013 for the course ECE 1234 taught by Professor Kwhon during the Spring '10 term at University of California, Berkeley.
