ia-32_volume1_basic-arch

For SIMD and x87 programming the FXSAVE and FXRSTOR


st operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from the third element of the second operand.

HSUBPS OperandA, OperandB
  OperandA (128 bits, four data elements): 3a, 2a, 1a, 0a
  OperandB (128 bits, four data elements): 3b, 2b, 1b, 0b
  Result (stored in OperandA): 2b-3b, 0b-1b, 2a-3a, 0a-1a

The HADDPD instruction performs a double-precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the first and second elements of the second operand.

HADDPD OperandA, OperandB
  OperandA (128 bits, two data elements): 1a, 0a
  OperandB (128 bits, two data elements): 1b, 0b
  Result (stored in OperandA): 1b+0b, 1a+0a

The HSUBPD instruction performs a double-precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the second element of the second operand from the first element of the second operand.

HSUBPD OperandA, OperandB
  OperandA (128 bits, two data elements): 1a, 0a
  OperandB (128 bits, two data elements): 1b, 0b
  Result (stored in OperandA): 0b-1b, 0a-1a

12-6 Vol. 1 PROGRAMMING WITH SSE3 AND SUPPLEMENTAL SSE3

12.3.6 Two Thread Synchronization Instructions

The MONITOR instruction sets up an address range that is used to monitor write-back stores. MWAIT enables a logical processor to enter into an optimized state while waiting for a write-back store to the address range set up by MONITOR.
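Looking back at the horizontal instructions described before Section 12.3.6, the element pairing is easy to misread from the operand diagrams, so a scalar reference model may help. The sketch below is only an illustration in C: the type names vec4f and vec2d and the model_* helper functions are hypothetical, not Intel intrinsics, and index 0 is assumed to be the lowest element (0a or 0b in the diagrams).

```c
/* Scalar reference model of three SSE3 horizontal instructions.
   Hypothetical helpers for illustration; e[0] is the lowest element. */

typedef struct { float  e[4]; } vec4f;  /* 128 bits, four single-precision elements */
typedef struct { double e[2]; } vec2d;  /* 128 bits, two double-precision elements  */

/* HSUBPS: result (high to low) = 2b-3b, 0b-1b, 2a-3a, 0a-1a */
static vec4f model_hsubps(vec4f a, vec4f b)
{
    vec4f r;
    r.e[0] = a.e[0] - a.e[1];  /* 0a-1a */
    r.e[1] = a.e[2] - a.e[3];  /* 2a-3a */
    r.e[2] = b.e[0] - b.e[1];  /* 0b-1b */
    r.e[3] = b.e[2] - b.e[3];  /* 2b-3b */
    return r;
}

/* HADDPD: result (high to low) = 1b+0b, 1a+0a */
static vec2d model_haddpd(vec2d a, vec2d b)
{
    vec2d r;
    r.e[0] = a.e[0] + a.e[1];  /* 1a+0a */
    r.e[1] = b.e[0] + b.e[1];  /* 1b+0b */
    return r;
}

/* HSUBPD: result (high to low) = 0b-1b, 0a-1a */
static vec2d model_hsubpd(vec2d a, vec2d b)
{
    vec2d r;
    r.e[0] = a.e[0] - a.e[1];  /* 0a-1a */
    r.e[1] = b.e[0] - b.e[1];  /* 0b-1b */
    return r;
}
```

For example, with OperandA holding 1.5 in element 0 and 2.5 in element 1, and OperandB holding 10.0 and 20.0, model_haddpd yields 4.0 in element 0 and 30.0 in element 1. On compilers that ship the SSE3 header pmmintrin.h, the intrinsics _mm_hsub_ps, _mm_hadd_pd and _mm_hsub_pd map to these instructions; the model here deliberately avoids them so that it compiles on any target.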
MONITOR and MWAIT take their inputs in general-purpose registers. The registers used by MONITOR and MWAIT must be initialized properly; these instructions do not modify register contents.

12.4 WRITING APPLICATIONS WITH SSE3 EXTENSIONS

The following sections give guidelines for writing application programs and operating-system code that use SSE3 instructions.

12.4.1 Guidelines for Using SSE3 Extensions

The following guidelines describe how to maximize the benefits of using SSE3 extensions:

- Ensure that the processor supports SSE3 extensions.
- Ensure that your operating system supports SSE/SSE2/SSE3 extensions. (Operating system support for the SSE extensions implies support for the SSE2 extensions and for the x87 and SIMD instructions of the SSE3 extensions.)
- Ensure that your operating system supports MONITOR and MWAIT.
- Employ the optimization and scheduling techniques described in the Intel 64 and IA-32 Architectures Optimization Reference Manual (see Section 1.4, "Related Literature").

12.4.2 Checking for SSE3 Support

Before an application attempts...
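The first guideline in Section 12.4.1, confirming that the processor supports SSE3, is commonly done by reading the feature flags that CPUID leaf 01H returns in ECX (bit 0 for SSE3, bit 3 for MONITOR/MWAIT, bit 9 for SSSE3). The following C sketch is an illustration rather than the manual's own procedure: the helper names are hypothetical, and it assumes a GCC or Clang toolchain that provides cpuid.h and __get_cpuid. It covers processor support only; operating-system support has to be established separately.

```c
#include <stdbool.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>          /* GCC/Clang helper header (toolchain assumption) */
#define HAVE_CPUID_H 1
#endif

/* Feature flags reported in ECX by CPUID leaf 01H. */
#define CPUID1_ECX_SSE3    (UINT32_C(1) << 0)
#define CPUID1_ECX_MONITOR (UINT32_C(1) << 3)
#define CPUID1_ECX_SSSE3   (UINT32_C(1) << 9)

/* Hypothetical decoding helpers: pure functions over an ECX value. */
static bool ecx_has_sse3(uint32_t ecx)    { return (ecx & CPUID1_ECX_SSE3) != 0; }
static bool ecx_has_monitor(uint32_t ecx) { return (ecx & CPUID1_ECX_MONITOR) != 0; }
static bool ecx_has_ssse3(uint32_t ecx)   { return (ecx & CPUID1_ECX_SSSE3) != 0; }

/* Returns ECX from CPUID leaf 01H, or 0 (no features reported) when
   CPUID cannot be queried on this target or toolchain. */
static uint32_t read_cpuid1_ecx(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (uint32_t)ecx;
#endif
    return 0;
}
```

A caller would typically evaluate ecx_has_sse3(read_cpuid1_ecx()) once at start-up and fall back to a non-SSE3 code path when it returns false. Keeping the bit decoding separate from the CPUID query is a design choice made here so that the decoding can be exercised on any machine.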

This note was uploaded on 10/01/2013 for the course CPE 103 taught by Professor Watlins during the Winter '11 term at Mississippi State.
