…packed double-precision floating-point values from an XMM register into memory
Non-temporal store of double quadword from an XMM register into memory
Non-temporal store of a doubleword from a general-purpose register into memory

5.7 SSE3 INSTRUCTIONS

The SSE3 extensions offer 13 instructions that accelerate performance of Streaming SIMD Extensions technology, Streaming SIMD Extensions 2 technology, and x87-FP math capabilities. These instructions can be grouped into the following categories:

• One x87-FPU instruction used in integer conversion
• One SIMD integer instruction that addresses unaligned data loads
• Two SIMD floating-point packed ADD/SUB instructions
• Four SIMD floating-point horizontal ADD/SUB instructions
• Three SIMD floating-point LOAD/MOVE/DUPLICATE instructions
• Two thread synchronization instructions

Vol. 1 5-25 INSTRUCTION SET SUMMARY

SSE3 instructions can only be executed on Intel 64 and IA-32 processors that support SSE3 extensions. Support for these instructions can be detected with the CPUID instruction. See the description of the CPUID instruction in Chapter 3, "Instruction Set Reference, A-M," of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A.

The sections that follow describe each subgroup.
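Below is a minimal sketch of the CPUID check. It assumes GCC or Clang on x86, because the `<cpuid.h>` header, `__get_cpuid`, and the `bit_SSE3` mask come from the compiler and are not defined by this manual. SSE3 support is reported in CPUID leaf 01H, ECX bit 0. The helper name `cpu_has_sse3` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <cpuid.h>   /* GCC/Clang-specific header; x86 targets only */

/* Hypothetical helper: reports whether CPUID.01H:ECX bit 0 (SSE3) is set. */
static bool cpu_has_sse3(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the processor does not implement leaf 1. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    return (ecx & bit_SSE3) != 0;  /* bit_SSE3 is (1 << 0) */
}
```

A program would normally call this once at startup and fall back to a non-SSE3 code path when it returns false. GCC and Clang also provide `__builtin_cpu_supports("sse3")`, which reports the same feature bit.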
5.7.1 SSE3 x87-FP Integer Conversion Instruction

FISTTP    Behaves like the FISTP instruction but uses truncation, irrespective of the rounding mode specified in the floating-point control word (FCW).
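The sketch below makes the difference from FISTP concrete. It uses GCC/Clang extended inline assembly on x86 (AT&T syntax) and assumes a 32-bit `int`, so it uses the m32int forms `fistpl` and `fisttpl`. All helper names are hypothetical. `x87_set_rc` rewrites the rounding-control field (FCW bits 11:10) so that the two conversions can also be compared under a non-default rounding mode.

```c
#include <assert.h>

/* Sketch only: x86 + GCC/Clang extended asm (AT&T syntax), 32-bit int assumed.
   Helper names are hypothetical. */

/* FISTP m32int: rounds st(0) according to FCW.RC, stores it, pops the stack. */
static int x87_fistp(double x)
{
    int r;
    /* "t" places x in st(0); the "st" clobber tells the compiler it is popped. */
    __asm__ volatile ("fistpl %0" : "=m"(r) : "t"(x) : "st");
    return r;
}

/* FISTTP m32int (SSE3): always truncates toward zero, ignoring FCW.RC. */
static int x87_fisttp(double x)
{
    int r;
    __asm__ volatile ("fisttpl %0" : "=m"(r) : "t"(x) : "st");
    return r;
}

/* Sets FCW.RC (bits 11:10): 0 = nearest, 1 = down, 2 = up, 3 = toward zero.
   Returns the previous control word so the caller can restore it. */
static unsigned short x87_set_rc(unsigned int rc)
{
    unsigned short old, cw;
    __asm__ volatile ("fnstcw %0" : "=m"(old));
    cw = (unsigned short)((old & ~0x0C00u) | ((rc & 3u) << 10));
    __asm__ volatile ("fldcw %0" : : "m"(cw) : "memory");
    return old;
}

/* Reloads a previously saved x87 control word. */
static void x87_restore_cw(unsigned short cw)
{
    __asm__ volatile ("fldcw %0" : : "m"(cw) : "memory");
}
```

Under the default round-to-nearest mode, FISTP converts 2.7 to 3 and FISTTP gives 2. After switching to round-up, FISTP converts 2.1 to 3, and FISTTP still gives 2. Because FISTTP matches C's truncating `(int)` conversion, compilers can use it without temporarily rewriting the FCW.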
5.7.2 SSE3 Specialized 128-Bit Unaligned Data Load Instruction

LDDQU    Special 128-bit unaligned load designed to avoid cache line splits.
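Here is a minimal sketch of an unaligned 16-byte load through the `_mm_lddqu_si128` intrinsic from `<pmmintrin.h>`, assuming GCC or Clang. The `target("sse3")` attribute lets this one function use SSE3 without compiling the whole file with `-msse3`. The helper name `load16_lddqu` is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <pmmintrin.h>  /* SSE3 intrinsics; also provides the SSE2 ones */

/* Hypothetical helper: copies 16 bytes from src, which need not be 16-byte aligned. */
__attribute__((target("sse3")))
static void load16_lddqu(unsigned char dst[16], const unsigned char *src)
{
    __m128i v = _mm_lddqu_si128((const __m128i *)src);  /* LDDQU */
    _mm_storeu_si128((__m128i *)dst, v);                 /* unaligned store */
}
```

For ordinary write-back memory, the result is the same 16 bytes an ordinary unaligned load would return. The cache-line-split advantage described above is a performance property and does not show up in the loaded value.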
5.7.3 SSE3 SIMD Floating-Point Packed ADD/SUB Instructions

ADDSUBPS    Performs single-precision addition on the second and fourth pairs of 32-bit data elements within the operands; single-precision subtraction on the first and third pairs.
ADDSUBPD    Performs double-precision addition on the second pair of quadwords, and double-precision subtraction on the first pair.
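The alternating subtract/add pattern fits complex multiplication on interleaved {re, im} data. That use is assumed here for illustration; this section does not state it. The sketch assumes GCC/Clang and `<pmmintrin.h>`. It multiplies two pairs of single-precision complex numbers with ADDSUBPS (`_mm_addsub_ps`) and also calls ADDSUBPD (`_mm_addsub_pd`) directly. It borrows MOVSLDUP/MOVSHDUP (`_mm_moveldup_ps`, `_mm_movehdup_ps`) from the SSE3 LOAD/MOVE/DUPLICATE group. The function names are hypothetical.

```c
#include <assert.h>
#include <pmmintrin.h>

/* Hypothetical helper: out[k] = a[k] * b[k] for two complex floats stored
   interleaved as {re0, im0, re1, im1}. */
__attribute__((target("sse3")))
static void cmul2(float out[4], const float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 b_re = _mm_moveldup_ps(vb);   /* {br0, br0, br1, br1} (MOVSLDUP) */
    __m128 b_im = _mm_movehdup_ps(vb);   /* {bi0, bi0, bi1, bi1} (MOVSHDUP) */
    __m128 a_sw = _mm_shuffle_ps(va, va, _MM_SHUFFLE(2, 3, 0, 1)); /* {ai0, ar0, ai1, ar1} */
    __m128 t1 = _mm_mul_ps(va, b_re);    /* {ar*br, ai*br, ...} */
    __m128 t2 = _mm_mul_ps(a_sw, b_im);  /* {ai*bi, ar*bi, ...} */
    /* ADDSUBPS: lanes 0 and 2 subtract, lanes 1 and 3 add, giving
       re = ar*br - ai*bi and im = ai*br + ar*bi. */
    _mm_storeu_ps(out, _mm_addsub_ps(t1, t2));
}

/* Hypothetical helper: out = {a0 - b0, a1 + b1} via ADDSUBPD. */
__attribute__((target("sse3")))
static void addsub2(double out[2], const double a[2], const double b[2])
{
    _mm_storeu_pd(out, _mm_addsub_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```

For example, multiplying (1+2i)(5+6i) and (3+4i)(7+8i) this way yields {-7, 16, -11, 52}, that is, -7+16i and -11+52i.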
5.7.4 SSE3 SIMD Floating-Point Horizontal ADD/SUB Instructions

HADDPS    Performs a single-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the third and fourth elements of the first operand; the third by adding the first and second elements of the second operand; and the fourth by adding the third and fourth elements of the second operand.
HSUBPS    Performs a single-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from …
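This sketch exercises both horizontal instructions through `_mm_hadd_ps` and `_mm_hsub_ps`, assuming GCC/Clang and `<pmmintrin.h>`. The comments spell out the lane order described above. `hsum4` shows a common reduction idiom that sums four lanes with two HADDPS steps. The function names are hypothetical.

```c
#include <assert.h>
#include <pmmintrin.h>

/* Hypothetical helper: applies HADDPS and HSUBPS to the same pair of vectors. */
__attribute__((target("sse3")))
static void hadd_hsub_ps(float add_out[4], float sub_out[4],
                         const float a[4], const float b[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(add_out, _mm_hadd_ps(va, vb)); /* {a0+a1, a2+a3, b0+b1, b2+b3} */
    _mm_storeu_ps(sub_out, _mm_hsub_ps(va, vb)); /* {a0-a1, a2-a3, b0-b1, b2-b3} */
}

/* Hypothetical helper: sums all four lanes with two HADDPS steps. */
__attribute__((target("sse3")))
static float hsum4(const float v[4])
{
    __m128 x = _mm_loadu_ps(v);
    x = _mm_hadd_ps(x, x);   /* {v0+v1, v2+v3, v0+v1, v2+v3} */
    x = _mm_hadd_ps(x, x);   /* every lane = (v0+v1) + (v2+v3) */
    return _mm_cvtss_f32(x); /* extract lane 0 */
}
```

Passing the same register as both operands, as `hsum4` does, is what lets two horizontal adds collapse four lanes into one total.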
