11-36 Vol. 1 PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

…on Reference Manual.

The non-temporal store instructions (MOVNTI, MOVNTPD, MOVNTPS, MOVNTDQ, MOVNTQ, MASKMOVQ, and MASKMOVDQU) minimize cache pollution when writing non-temporal data to memory (see Section 10.4.6.2, "Caching of Temporal vs. Non-Temporal Data," and Section 10.4.6.1, "Cacheability Control Instructions"). They prevent non-temporal data from being written into processor caches on a store operation. These instructions are implementation specific, so programmers may have to tune their applications for each IA-32 processor implementation to take advantage of them.

Besides reducing cache pollution, the use of weakly ordered memory types can be important under certain data-sharing relationships, such as a producer-consumer relationship. Weakly ordered memory can make assembling data more efficient, but care must be taken to ensure that the consumer obtains the data that the producer intended. Some common usage models that may be affected in this way by weakly ordered stores are:

• Library functions that use weakly ordered memory to write results
• Compiler-generated code that writes weakly ordered results
• Hand-crafted code

The degree to which a consumer of data knows that the data is weakly ordered can vary for these cases. As a result, the SFENCE or MFENCE instruction should be used to ensure ordering between routines that produce weakly ordered data and routines that consume the data. SFENCE and MFENCE provide a performance-efficient way to ensure ordering: they guarantee that every store instruction that precedes the fence in program order is globally visible before any store instruction that follows the fence.

11.6.14 Effect of Instruction Prefixes on the SSE/SSE2 Instructions
Table 11-3 describes the effects of instruction prefixes on SSE and SSE2 instructions. (Table 11-3 also applies to SIMD integer and SIMD floating-point instructions in SSE3.) Unpredictable behavior can range from prefixes being treated as a reserved operation on one generation of IA-32 processors to generating an invalid opcode exception on another generation of processors. See also "Instruction Prefixes" in Chapter 2 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A, for a complete description of instruction prefixes.

NOTE
Some SSE/SSE2/SSE3 instructions have two-byte opcodes that are either 2 bytes or 3 bytes in length. Two-byte opcodes that are 3 bytes in length consist of a mandatory prefix (F2H, F3H, or 66H), followed by 0FH and an opcode byte. For example, ADDPS is encoded with the two opcode bytes 0FH 58H, while ADDPD, ADDSS, and ADDSD use the same two bytes preceded by the mandatory prefix 66H, F3H, or F2H, respectively. See Table 11-3.

Table 11-3. Effect of Prefixes on SSE, SSE2, and SSE3 Instructions
Prefix Type                  | Effect on SSE, SSE2 and SSE3 Instructions
-----------------------------|---------------------------------------------------
Address Size Prefix (67H)    | Affects instructions with a memory operand.
                             | Reserved for instructions without a memory
                             | operand and may result in unpredictable behavior.
Operand Size (66H)           | Se...
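The mandatory-prefix rule in the NOTE and the 67H row of Table 11-3 can be illustrated with a small decoder for one opcode family. The following is a minimal C sketch, not a real disassembler: it assumes the 0F 58 /r family (0F 58 = ADDPS, 66 0F 58 = ADDPD, F3 0F 58 = ADDSS, F2 0F 58 = ADDSD), and the names decode_0f58 and struct sse_decode are hypothetical. It deliberately rejects encodings with more than one of 66H/F2H/F3H and does not handle REX or other prefixes. Following Table 11-3, a 67H prefix is flagged as reserved when the ModR/M byte selects a register operand (mod = 11B), since then there is no memory operand for it to affect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct sse_decode {
    const char *mnemonic;   /* "addps", "addpd", "addss", or "addsd" */
    size_t opcode_len;      /* 2 (0F 58) or 3 (mandatory prefix + 0F 58) */
    bool has_67;            /* address-size prefix (67H) present */
    bool prefix67_reserved; /* 67H used without a memory operand */
};

/* Simplified sketch: decodes only the 0F 58 /r family and returns false
 * otherwise. REX and other legacy prefixes are not handled, and encodings
 * with more than one of 66H/F2H/F3H are rejected rather than resolved. */
bool decode_0f58(const unsigned char *p, size_t len, struct sse_decode *out)
{
    assert(p != NULL && out != NULL);
    unsigned char mandatory = 0;
    size_t i = 0;
    memset(out, 0, sizeof *out);

    /* Consume 67H prefixes and at most one of 66H/F2H/F3H, in any order. */
    while (i < len &&
           (p[i] == 0x67 || p[i] == 0x66 || p[i] == 0xF2 || p[i] == 0xF3)) {
        if (p[i] == 0x67) {
            out->has_67 = true;
        } else {
            if (mandatory != 0)
                return false;   /* multiple candidates: out of scope here */
            mandatory = p[i];
        }
        i++;
    }

    /* Require the opcode bytes 0F 58 followed by a ModR/M byte. */
    if (i + 3 > len || p[i] != 0x0F || p[i + 1] != 0x58)
        return false;
    unsigned char modrm = p[i + 2];

    /* The mandatory prefix selects the instruction. */
    switch (mandatory) {
    case 0x00: out->mnemonic = "addps"; break;
    case 0x66: out->mnemonic = "addpd"; break;
    case 0xF3: out->mnemonic = "addss"; break;
    case 0xF2: out->mnemonic = "addsd"; break;
    }
    out->opcode_len = (mandatory != 0) ? 3 : 2;

    /* ModR/M.mod == 3 means a register operand, i.e., no memory operand. */
    out->prefix67_reserved = out->has_67 && (modrm >> 6) == 3;
    return true;
}
```

For example, the bytes 66 0F 58 C1 decode as "addpd" with a 3-byte opcode, while 67 0F 58 C1 sets prefix67_reserved because ModR/M C1H names a register rather than memory. Compile as C99 or later; no SSE hardware is needed, since the sketch only inspects bytes.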