PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

is set). A SIMD floating-point exception flag that is set when the corresponding exception is unmasked will not generate a fault; only the next occurrence of that unmasked exception will generate a fault.

-- An application which checks the x87 FPU status word to determine if any masked exception flags were set during an x87 FPU library call will also need to check the MXCSR register to detect a similar occurrence of a masked exception flag being set during an SSE/SSE2/SSE3 library call.

11.6 WRITING APPLICATIONS WITH SSE/SSE2 EXTENSIONS

The following sections give some guidelines for writing application programs and operating-system code that use the SSE and SSE2 extensions. Because SSE and SSE2 extensions share the same state and perform companion operations, these guidelines apply to both sets of extensions.

Chapter 12 in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, discusses the interface to the processor for context switching as well as other operating system considerations when writing code that uses SSE/SSE2/SSE3 extensions.

11.6.1 General Guidelines for Using SSE/SSE2 Extensions

The following guidelines describe how to take full advantage of the performance gains available with the SSE and SSE2 extensions:

- Ensure that the processor supports the SSE and SSE2 extensions.
- Ensure that your operating system supports the SSE and SSE2 extensions. (Operating system support for the SSE extensions implies support for the SSE2 extensions and vice versa.)
- Use stack and data alignment techniques to keep data properly aligned for efficient memory use.
- Use the non-temporal store instructions offered with the SSE and SSE2 extensions.
- Employ the optimization and scheduling techniques described in the Intel Pentium 4 Optimization Reference Manual (see Section 1.4, "Related Literature," for the order number for this manual).

Footnote 1: SSE3 refers to ADDSUBPD, ADDSUBPS, HADDPD, HADDPS, HSUBPD and HSUBPS; the only other SSE3 instruction that can raise floating-point exceptions is FISTTP: it can generate x87 FPU invalid operation and inexact result exceptions.

11.6.2 Checking for SSE/SSE2 Support

Before an application attempts to use the SSE and/or SSE2 extensions, it should check that they are present on the processor and that the operating system supports them. The application can make this check by following these steps:

1. Check that the processor supports the CPUID instruction by attempting to execute the CPUID instruction. If the processor does not support the CPUID instruction, it will generate an invalid-opcode exception (#UD).
2. Check that the processor supports the SSE and/or SSE2 extensions (true if CPUID.01H:EDX.SSE[bit 25] = 1 and/or CPUID.01H:EDX.SSE2[bit 26] = 1).
3. Check that the processor supports the FXSAVE and FXRSTOR instructions (true if CPUID.01H:EDX.FXSR[bit 24] = 1).
4. Check that the operating system supports the FXSAVE and FXRSTOR instructions....
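
Steps 2 and 3 above can be sketched in C as follows. This is a minimal illustration, not the manual's own code: it assumes a GCC- or Clang-compatible toolchain that ships the <cpuid.h> helper header, and the names simd_features, decode_leaf1_edx and read_leaf1_edx are hypothetical. Only the bit positions (24, 25, 26 in CPUID.01H:EDX) come from the text. Steps 1 and 4 are not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* assumption: GCC/Clang helper header is available */
#endif

/* Bit positions in CPUID.01H:EDX, as listed in steps 2 and 3. */
enum {
    CPUID_EDX_FXSR = 24,  /* FXSAVE/FXRSTOR supported */
    CPUID_EDX_SSE  = 25,  /* SSE extensions supported */
    CPUID_EDX_SSE2 = 26   /* SSE2 extensions supported */
};

/* Hypothetical result type for this sketch. */
struct simd_features {
    bool fxsr;
    bool sse;
    bool sse2;
};

/* Decode the three feature bits from a raw CPUID.01H:EDX value. */
static struct simd_features decode_leaf1_edx(uint32_t edx)
{
    struct simd_features f;
    f.fxsr = (edx >> CPUID_EDX_FXSR) & 1u;
    f.sse  = (edx >> CPUID_EDX_SSE)  & 1u;
    f.sse2 = (edx >> CPUID_EDX_SSE2) & 1u;
    return f;
}

/*
 * Read EDX from CPUID leaf 01H.
 * Returns false when the leaf cannot be queried (including when the
 * code is built for a non-x86 target); *edx_out is then left untouched.
 */
static bool read_leaf1_edx(uint32_t *edx_out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    *edx_out = edx;
    return true;
#else
    (void)edx_out;
    return false;
#endif
}
```

A caller would typically invoke read_leaf1_edx first and, only if it returns true, pass the value to decode_leaf1_edx and test the sse, sse2 and fxsr fields before selecting an SSE/SSE2 code path. Keeping the decoding separate from the CPUID query makes the bit logic checkable with fixed inputs on any machine.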
This note was uploaded on 10/01/2013 for the course CPE 103 taught by Professor Watlins during the Winter '11 term at Mississippi State.