ia-32_instruction-set-ref_a-m

The __m128 data type holds either four packed single-precision floating-point values or a scalar single-precision floating-point value. The __m128d data type holds two packed double-precision floating-point values or a scalar double-precision floating-point value. The __m128i data type can hold sixteen byte, eight word, four doubleword, or two quadword integer values.

The compiler aligns __m128, __m128d, and __m128i local and global data to 16-byte boundaries on the stack. To align integer, float, or double arrays, use the declspec statement as described in the Intel C/C++ compiler documentation. See http://www.intel.com/support/performancetools/.

The __m128, __m128d, and __m128i data types are not basic ANSI C data types, and therefore some restrictions are placed on their usage:

- Use __m128, __m128d, and __m128i only on the left-hand side of an assignment, as a return value, or as a parameter. Do not use them in other arithmetic expressions such as "+" and ">>".
- Do not initialize __m128, __m128d, and __m128i with literals; there is no way to express 128-bit constants.
- Use __m128, __m128d, and __m128i objects in aggregates, such as unions (for example, to access the float elements) and structures. The address of these objects may be taken.
- Use __m128, __m128d, and __m128i data only with the intrinsics described in this user's guide.

See Appendix C, "Intel C/C++ Compiler Intrinsics and Functional Equivalents," in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2B, for more information on using intrinsics.

The compiler aligns __m128, __m128d, and __m128i local data to 16-byte boundaries on the stack. Global __m128 data is also aligned on 16-byte boundaries. (To align float arrays, you can use the alignment declspec described in the following section.)

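
The usage rules above can be illustrated with a minimal sketch in C. It assumes a compiler that provides the SSE header <xmmintrin.h>. The names vec4_view, add4, lane, and is_aligned16 are hypothetical helpers introduced only for this illustration; they are not part of the intrinsics set. The sketch passes __m128 values as parameters and return values, does arithmetic through an intrinsic rather than "+", and reads individual float elements through a union.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_add_ps */

/* Hypothetical helper type: overlays one __m128 with its four floats. */
typedef union {
    __m128 v;
    float  f[4];
} vec4_view;

/* Adds two packed vectors with an intrinsic instead of the C "+" operator. */
static __m128 add4(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

/* Returns element i of v (0 is the lowest element) by reading the union. */
static float lane(__m128 v, int i)
{
    vec4_view u;
    assert(i >= 0 && i < 4);
    u.v = v;
    return u.f[i];
}

/* Returns 1 when the address p is a multiple of 16, otherwise 0. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}
```

Because the union contains an __m128 member, a local vec4_view object should itself land on a 16-byte boundary, which is_aligned16 can confirm at run time.
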
Because the new instruction set treats the SIMD floating-point registers in the same way whether you are using packed or scalar data, there is no __m32 data type to represent scalar data as you might expect. For scalar operations, you should use the __m128 objects and the "scalar" forms of the intrinsics; the compiler and the processor implement these operations with 32-bit memory references.

The suffixes ps and ss are used to denote "packed single" and "scalar single" precision operations. The packed floats are represented in right-to-left order, with the lowest word (right-most) being used for scalar operations: [z, y, x, w]. To explain how memory storage reflects this, consider the following example.

The operation:

    float a[4] = { 1.0, 2.0, 3.0, 4.0 };
    __m128 t = _mm_load_ps(a);

Produces the same result as follows:

    __m128 t = _mm_set_ps(4.0, 3.0, 2.0, 1.0);

In other words:

    t = [ 4.0, 3.0, 2.0, 1.0 ]

Where the "scalar" element is 1.0.

Some intrinsics are "composites" because they require more than one instruction to implement them. You should be familiar with the hardware features pro...
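
The element ordering and the packed versus scalar distinction described above can be checked with a short sketch in C, again assuming <xmmintrin.h> is available. The helper names load4, store4, add_low, and low are hypothetical. One deliberate substitution: the sketch uses the unaligned forms _mm_loadu_ps and _mm_storeu_ps rather than _mm_load_ps, so that the float arrays do not need 16-byte alignment; the element order is the same in both forms.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_storeu_ps, _mm_add_ss */

/* Loads a[0..3]; a[0] becomes the lowest ("scalar") element.
   Unaligned load, so a needs no special alignment. */
static __m128 load4(const float a[4])
{
    assert(a != NULL);
    return _mm_loadu_ps(a);
}

/* Writes the four elements of v to out[0..3], lowest element first. */
static void store4(__m128 v, float out[4])
{
    assert(out != NULL);
    _mm_storeu_ps(out, v);
}

/* "ss" form: adds only the lowest elements of x and y;
   the upper three elements of the result are copied from x. */
static __m128 add_low(__m128 x, __m128 y)
{
    return _mm_add_ss(x, y);
}

/* Extracts the lowest ("scalar") element of v. */
static float low(__m128 v)
{
    return _mm_cvtss_f32(v);
}
```

Loading { 1.0, 2.0, 3.0, 4.0 } with load4 and building _mm_set_ps(4.0, 3.0, 2.0, 1.0) should store back to identical arrays, and low should return 1.0 for either value. Applying add_low to that vector and a vector of tens should change only the first stored element.
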