Unformatted text preview: destination operand. The PMOVMSKB (move byte mask) instruction creates an 8-bit mask from the packed byte integers in an MMX register and stores the result in the low byte of a generalpurpose register. The mask contains the most significant bit of each byte in the MMX register. (When operating on 128-bit operands, a 16-bit mask is created.) The PMULHUW (multiply packed unsigned word integers and store high result) instruction performs a SIMD unsigned multiply of the words in the two source operands and returns the high word of each result to an MMX register. The PSADBW (compute sum of absolute differences) instruction computes the SIMD absolute differences of the corresponding unsigned byte integers in two source operands, sums the differences, and stores the sum in the low word of the destination operand. The PSHUFW (shuffle packed word integers) instruction shuffles the words in the source operand according to the order specified by an 8-bit immediate operand and returns the result to the destination operand. 10.4.5 MXCSR State Management Instructions The MXCSR state management instructions (LDMXCSR and STMXCSR) load and save the state of the MXCSR register, respectively. The LDMXCSR instruction loads the MXCSR register from memory, while the STMXCSR instruction stores the contents of the register to memory. 10.4.6 Cacheability Control, Prefetch, and Memory Ordering Instructions SSE extensions introduce several new instructions to give programs more control over the caching of data. They also introduces the PREFETCHh instructions, which provide the ability to prefetch data to a specified cache level, and the SFENCE instruction, which enforces program ordering on stores. These instructions are described in the following sections. Vol. 1 10-17 PROGRAMMING WITH STREAMING SIMD EXTENSIONS (SSE) 10.4.6.1 Cacheability Control Instructions The following three instructions enable data from the MMX and XMM registers to be stored to memory using a non-temporal hint. The non-temporal hint directs the processor to when possible store the data to memory without writing the data into the cache hierarchy. See Section 10.4.6.2, "Caching of Temporal vs. Non-Temporal Data," for information about non-temporal stores and hints. The MOVNTQ (store quadword using non-temporal hint) instruction stores packed integer data from an MMX register to memory, using a non-temporal hint. The MOVNTPS (store packed single-precision floating-point values using nontemporal hint) instruction stores packed floating-point data from an XMM register to memory, using a non-temporal hint. The MASKMOVQ (store selected bytes of quadword) instruction stores selected byte integers from an MMX register to memory, using a byte mask to selectively write the individual bytes. This instruction also uses a non-temporal hint. 10.4.6.2 Caching of Temporal vs. Non-Temporal Data Data referenced by a program can be temporal (data will be used again) or nontemporal (data will be referenced once and...
View Full Document
- Winter '11