ia-32_volume1_basic-arch

Between packed and scalar single precision and double


ked double-precision floating-point values from an XMM register into memory
Non-temporal store of double quadword from an XMM register into memory
Non-temporal store of a doubleword from a general-purpose register into memory

5.7 SSE3 INSTRUCTIONS

The SSE3 extensions offer 13 instructions that accelerate performance of Streaming SIMD Extensions technology, Streaming SIMD Extensions 2 technology, and x87-FP math capabilities. These instructions can be grouped into the following categories:

• One x87 FPU instruction used in integer conversion
• One SIMD integer instruction that addresses unaligned data loads
• Two SIMD floating-point packed ADD/SUB instructions
• Four SIMD floating-point horizontal ADD/SUB instructions

Vol. 1 5-25 INSTRUCTION SET SUMMARY

• Three SIMD floating-point LOAD/MOVE/DUPLICATE instructions
• Two thread synchronization instructions

SSE3 instructions can only be executed on Intel 64 and IA-32 processors that support SSE3 extensions. Support for these instructions can be detected with the CPUID instruction. See the description of the CPUID instruction in Chapter 3, "Instruction Set Reference, A-M," of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2A.

The sections that follow describe each subgroup.
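The CPUID-based detection mentioned above can be sketched as a bit test on the ECX value that CPUID returns for leaf 01H. This is a minimal illustration, not the manual's own code: Python cannot execute CPUID directly, so the helper names and the sample ECX values are hypothetical, and the caller is assumed to have obtained ECX by some other means. The bit positions (bit 0 for SSE3, bit 3 for MONITOR/MWAIT) come from the CPUID feature-flag layout, not from this excerpt.

```python
# Bit positions in ECX after executing CPUID with EAX = 01H.
SSE3_BIT = 0      # CPUID.01H:ECX bit 0 reports SSE3 support
MONITOR_BIT = 3   # CPUID.01H:ECX bit 3 reports MONITOR/MWAIT support


def has_sse3(ecx: int) -> bool:
    """Return True if the SSE3 feature flag is set in the given ECX value."""
    return bool((ecx >> SSE3_BIT) & 1)


def has_monitor_mwait(ecx: int) -> bool:
    """Return True if the MONITOR/MWAIT feature flag is set in ECX."""
    return bool((ecx >> MONITOR_BIT) & 1)


if __name__ == "__main__":
    sample_ecx = 0b1001  # hypothetical value with bits 0 and 3 set
    print(has_sse3(sample_ecx), has_monitor_mwait(sample_ecx))
```

The thread synchronization instructions are checked separately here because they have their own feature flag; a set SSE3 bit alone should not be taken as proof that they are available.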
5.7.1 SSE3 x87-FP Integer Conversion Instruction

FISTTP    Behaves like the FISTP instruction but uses truncation, irrespective of the rounding mode specified in the floating-point control word (FCW)

5.7.2 SSE3 Specialized 128-bit Unaligned Data Load Instruction

LDDQU     Special 128-bit unaligned load designed to avoid cache line splits

5.7.3 SSE3 SIMD Floating-Point Packed ADD/SUB Instructions

ADDSUBPS  Performs single-precision addition on the second and fourth pairs of 32-bit data elements within the operands; single-precision subtraction on the first and third pairs

ADDSUBPD  Performs double-precision addition on the second pair of quadwords, and double-precision subtraction on the first pair

5.7.4 SSE3 SIMD Floating-Point Horizontal ADD/SUB Instructions

HADDPS    Performs a single-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the third and fourth elements of the first operand; the third by adding the first and second elements of the second operand; and the fourth by adding the third and fourth elements of the second operand.

HSUBPS    Performs a single-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from...
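The lane patterns described for FISTTP, ADDSUBPS, ADDSUBPD, and HADDPS can be illustrated with a small scalar emulation. This is a sketch under stated assumptions, not how the hardware or any library implements them: the function names are hypothetical, registers are modeled as Python lists with the "first" element at index 0, Python floats are double precision (so single-precision rounding is not modeled), and exceptions and out-of-range conversions are ignored. HSUBPS is left out because its description is cut off in the text above.

```python
import math


def fisttp(x: float) -> int:
    """FISTTP-style conversion: always truncate toward zero."""
    return math.trunc(x)


def fistp_round_to_nearest(x: float) -> int:
    """FISTP-style conversion, assuming the FCW selects round-to-nearest-even.

    Python's round() also resolves ties to the even integer.
    """
    return round(x)


def addsubps(dst, src):
    """Four lanes: subtract in the first and third, add in the second and fourth."""
    return [d - s if i % 2 == 0 else d + s
            for i, (d, s) in enumerate(zip(dst, src))]


def addsubpd(dst, src):
    """Two lanes: subtract in the first pair, add in the second pair."""
    return [dst[0] - src[0], dst[1] + src[1]]


def haddps(a, b):
    """Horizontal add: sums of adjacent elements of a, then of b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    print(fisttp(-2.7), fistp_round_to_nearest(-2.7))  # -2 -3
    print(addsubps(a, b))                              # [-9.0, 22.0, -27.0, 44.0]
    print(addsubpd([1.0, 2.0], [0.5, 0.5]))            # [0.5, 2.5]
    print(haddps(a, b))                                # [3.0, 7.0, 30.0, 70.0]
```

The FISTTP comparison shows why the instruction is useful for C-style float-to-integer casts: the result is truncated regardless of the current rounding mode, so the control word does not have to be changed around the conversion.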

This note was uploaded on 10/01/2013 for the course CPE 103 taught by Professor Watlins during the Winter '11 term at Mississippi State.
