ia-32_volume1_basic-arch



...on Reference Manual.

The non-temporal store instructions (MOVNTI, MOVNTPD, MOVNTPS, MOVNTDQ, MOVNTQ, MASKMOVQ, and MASKMOVDQU) minimize cache pollution when writing non-temporal data to memory (see Section 10.4.6.2, "Caching of Temporal vs. Non-Temporal Data," and Section 10.4.6.1, "Cacheability Control Instructions"). They prevent non-temporal data from being written into processor caches on a store operation. These instructions are implementation specific, so programmers may have to tune their applications for each IA-32 processor implementation to take advantage of them.

11-36 Vol. 1 PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

Besides reducing cache pollution, the use of weakly-ordered memory types can be important under certain data-sharing relationships, such as a producer-consumer relationship. Using weakly-ordered memory can make assembling data more efficient, but care must be taken to ensure that the consumer obtains the data that the producer intended. Some common usage models that may be affected in this way by weakly-ordered stores are:

- Library functions that use weakly-ordered memory to write results
- Compiler-generated code that writes weakly-ordered results
- Hand-crafted code

The degree to which a consumer of data knows that the data is weakly ordered can vary across these cases. As a result, the SFENCE or MFENCE instruction should be used to ensure ordering between routines that produce weakly-ordered data and routines that consume the data. SFENCE and MFENCE provide a performance-efficient way to ensure ordering by guaranteeing that every store instruction that precedes SFENCE/MFENCE in program order is globally visible before any store instruction that follows the fence.

11.6.14 Effect of Instruction Prefixes on the SSE/SSE2 Instructions

Table 11-3 describes the effects of instruction prefixes on SSE and SSE2 instructions.
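The producer-consumer pattern described above can be sketched in C with compiler intrinsics. This is a minimal sketch, assuming an x86-64 GCC or Clang toolchain that provides the SSE2 intrinsics in <emmintrin.h> and C11 <stdatomic.h>. `_mm_stream_si32` emits MOVNTI and `_mm_sfence` emits SFENCE; the functions `produce` and `consume` and the `ready` flag are hypothetical names chosen for illustration. The producer fences its non-temporal stores before publishing the flag, so a consumer that observes the flag also observes the data.

```c
#include <emmintrin.h>  /* SSE2: _mm_stream_si32 (MOVNTI); also pulls in _mm_sfence */
#include <stdatomic.h>  /* C11 atomics for the hypothetical "ready" flag */
#include <stddef.h>

/* Hypothetical producer: writes n results with non-temporal (weakly-ordered)
   stores, then executes SFENCE so those stores are globally visible before
   the ordinary store that publishes the ready flag. */
void produce(int *out, size_t n, atomic_int *ready)
{
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&out[i], (int)(i * i));  /* MOVNTI: bypasses the caches */
    _mm_sfence();                                /* order the NT stores before the flag */
    atomic_store_explicit(ready, 1, memory_order_release);
}

/* Hypothetical consumer: waits until the flag is set, then reads the data.
   In real code this would typically run on another thread or processor. */
long consume(const int *in, size_t n, atomic_int *ready)
{
    while (atomic_load_explicit(ready, memory_order_acquire) == 0)
        ;  /* spin until the producer publishes */
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += in[i];
    return sum;
}
```

Without the `_mm_sfence()` call, the flag store could become visible before the streamed data, which is exactly the hazard the text warns about for library, compiler-generated, and hand-crafted code.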
(Table 11-3 also applies to SIMD integer and SIMD floating-point instructions in SSE3.) Unpredictable behavior can range from prefixes being treated as a reserved operation on one generation of IA-32 processors to generating an invalid-opcode exception on another generation of processors. See also "Instruction Prefixes" in Chapter 2 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A, for a complete description of instruction prefixes.

NOTE
Some SSE/SSE2/SSE3 instructions use two-byte opcodes (0FH followed by an opcode byte) that are encoded in either 2 or 3 bytes. A 3-byte encoding consists of a mandatory prefix (F2H, F3H, or 66H), 0FH, and an opcode byte. See Table 11-3.

Table 11-3. Effect of Prefixes on SSE, SSE2, and SSE3 Instructions

Prefix Type                  Effect on SSE, SSE2, and SSE3 Instructions
Address Size Prefix (67H)    Affects instructions with a memory operand.
                             Reserved for instructions without a memory operand and may result in unpredictable behavior.
Operand Size (66H)           Se...
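The encoding rules in the note and in the Address Size Prefix row can be illustrated with two small helpers. For example, ADDPS is encoded 0F 58, while ADDPD, ADDSS, and ADDSD use the same opcode byte behind the mandatory prefixes 66H, F3H, and F2H. This is a simplified sketch, assuming 32-bit encodings with no REX or other legacy prefixes; `sse_opcode_length` and `modrm_has_memory_operand` are hypothetical names, and the first helper only pattern-matches bytes (66H in front of a non-SSE opcode is an ordinary operand-size prefix, which it does not try to distinguish).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length of an SSE-style opcode at the start of buf.
   Returns 2 for "0F op", 3 for "66/F2/F3 0F op", and 0 if neither form
   matches. REX and other prefixes are deliberately ignored. */
size_t sse_opcode_length(const uint8_t *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0x0F)
        return 2;
    if (len >= 3 && (buf[0] == 0x66 || buf[0] == 0xF2 || buf[0] == 0xF3)
        && buf[1] == 0x0F)
        return 3;
    return 0;
}

/* The ModR/M byte follows the opcode. A mod field (bits 7:6) of 11b selects
   a register operand; any other value encodes a memory operand, which is the
   case in which an address-size prefix (67H) has a defined effect. */
bool modrm_has_memory_operand(uint8_t modrm)
{
    return (modrm >> 6) != 0x3;
}
```

For instance, `0F 58 C1` (ADDPS xmm0, xmm1) has a 2-byte opcode and a register-only ModR/M byte, so a 67H prefix on it would fall under the "reserved" case, whereas `F2 0F 58 01` (ADDSD xmm0, [ecx]) has a 3-byte opcode and a memory operand.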

This note was uploaded on 10/01/2013 for the course CPE 103 taught by Professor Watlins during the Winter '11 term at Mississippi State.
