SSE-Slides-2

Floating Point Operations and Streaming SIMD Extensions
Advanced Topics
Spring 2009
Prof. Robert van Engelen
SIMD Short Vector Extensions
(HPC II, Spring 2009, 3/18/09)

- Using SIMD short vector extensions can result in large performance gains
  - Instruction set extensions execute fast
  - New wide registers to hold short vectors of ints, floats, doubles
  - Parallel operations on short vectors
- Typical vector length is 128 bits
  - Vector of 4 floats, 2 doubles, or 1 to 16 ints (128-bit down to 8-bit ints)
- Technologies:
  - MMX and SSE (Intel)
  - 3DNow! (AMD)
  - AltiVec (PowerPC)
  - PA-RISC MAX (HP)
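To make "parallel operations on short vectors" concrete, here is a minimal sketch in C using the SSE intrinsics from <xmmintrin.h>. The helper name add4_floats is hypothetical (not from the slides), and the sketch assumes an x86 compiler with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper, not from the slides:
   out[i] = x[i] + y[i] for i = 0..3, using one packed add. */
void add4_floats(const float *x, const float *y, float *out)
{
    __m128 vx  = _mm_loadu_ps(x);     /* load 4 floats, no alignment required */
    __m128 vy  = _mm_loadu_ps(y);
    __m128 sum = _mm_add_ps(vx, vy);  /* 4 single precision adds at once */
    _mm_storeu_ps(out, sum);          /* store the 4 results */
}
```

A single 128-bit register holds the four floats, so the one packed add replaces four scalar adds.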
SSE SIMD Technology History

Technology  First appeared             Description
MMX         Pentium with MMX           Introduced 8-byte packed integers
SSE         Pentium III                Added 16-byte packed single precision floating point numbers
SSE2        Pentium 4                  Added 16-byte packed double precision floating point numbers and integers
SSE3        Pentium 4 with HT          Added horizontal operations on packed single and double precision floating point
SSE4        Core 2 (Penryn) & Core i7  Added various instructions not specifically intended for multimedia
SSE5        AMD                        Added fused multiply-accumulate and permutation instructions, and precision control
SSE Instruction Set

- Eight 128-bit registers xmm0 … xmm7
- Each register packs
  - 16 bytes (8-bit int)
  - 8 words (16-bit int)
  - 4 doublewords (32-bit int)
  - 2 quadwords (64-bit int)
  - 4 floats (IEEE 754 single precision)
  - 2 doubles (IEEE 754 double precision)
- Note: integer operations are signed or unsigned
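The packing list above can be illustrated with a minimal sketch: the same 128 bits give different results depending on the element width and signedness the instruction assumes. The helper names are hypothetical (not from the slides); the intrinsics are the SSE2 ones from <emmintrin.h>, and a little-endian x86 target is assumed.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>

/* Hypothetical helpers, not from the slides. Each treats the same
   16 input bytes as a different packing of one 128-bit register. */

/* paddb: 16 independent 8-bit lanes, wraparound on overflow */
void add_bytes(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi8(va, vb));
}

/* paddd: 4 independent 32-bit lanes, carries propagate inside a lane */
void add_dwords(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* paddsb: signed 8-bit lanes, results saturate at -128 and 127 */
void add_bytes_sat_signed(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epi8(va, vb));
}

/* paddusb: unsigned 8-bit lanes, results saturate at 0 and 255 */
void add_bytes_sat_unsigned(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(va, vb));
}
```

Adding byte 0xFF to byte 0x01 wraps to 0x00 in byte lanes, but carries into the next byte when the same bits are treated as a doubleword. Adding 100 + 100 per byte saturates to 127 in the signed form and gives 200 in the unsigned form.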
SSE Instruction Set

- Instruction format: instruction<suffix> xmm, xmm/m128, [imm8/r32]
  - m128 is a 128-bit memory location (16-byte aligned address), imm8 is an 8-bit immediate operand, r32 is a 32-bit register operand
- Instruction suffix for floating-point operations:
  - ps: packed single precision float
  - pd: packed double precision float
  - ss: scalar (applies to lower data element) single precision float
  - sd: scalar (applies to lower data element) double precision float
- Instruction suffix for integer operations:
  - b: byte
  - w: word
  - d: doubleword
  - q: quadword
  - dq: double quadword
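The difference between the packed (ps) and scalar (ss) suffixes can be sketched with the intrinsics that typically compile to mulps and mulss. The helper names are hypothetical (not from the slides), and an x86 compiler with SSE enabled is assumed.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helpers, not from the slides. */

/* mulps: multiplies all four single precision elements */
void mul_packed(const float *x, const float *y, float *out)
{
    _mm_storeu_ps(out, _mm_mul_ps(_mm_loadu_ps(x), _mm_loadu_ps(y)));
}

/* mulss: multiplies only the lowest element; the upper three
   elements of the result are copied from the first operand */
void mul_scalar(const float *x, const float *y, float *out)
{
    _mm_storeu_ps(out, _mm_mul_ss(_mm_loadu_ps(x), _mm_loadu_ps(y)));
}
```

With x = {1, 2, 3, 4} and y = {10, 10, 10, 10}, the packed form yields {10, 20, 30, 40}, while the scalar form yields {10, 2, 3, 4}.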
SSE Data Movement

Little endian order: the word at the lowest address goes into the lowest element of the register.

Memory:   W7    W6    W5    W4    W3    W2    W1    W0
Address:  a+14  a+12  a+10  a+8   a+6   a+4   a+2   a

xmm0:     W7    W6    W5    W4    W3    W2    W1    W0

movdqa xmm0, [a]    Use when a is 16-byte aligned
movdqu xmm0, [a]    Use when a is not aligned (expensive!)
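The two loads can be sketched with the SSE2 intrinsics _mm_load_si128 and _mm_loadu_si128, which compilers typically emit as movdqa and movdqu. The helper names are hypothetical (not from the slides), and the assert is an added assumption that makes the alignment requirement explicit.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>

/* Hypothetical helpers, not from the slides. */

/* Aligned load (movdqa): a must be 16-byte aligned. */
__m128i load_aligned(const uint16_t *a)
{
    assert(((uintptr_t)a & 15) == 0);  /* misaligned movdqa would fault */
    return _mm_load_si128((const __m128i *)a);
}

/* Unaligned load (movdqu): works for any address. */
__m128i load_unaligned(const uint16_t *a)
{
    return _mm_loadu_si128((const __m128i *)a);
}

/* W0 is the word that was at the lowest address, a. */
uint16_t word0(__m128i v)
{
    return (uint16_t)_mm_extract_epi16(v, 0);
}

/* W7 is the word that was at the highest address, a+14. */
uint16_t word7(__m128i v)
{
    return (uint16_t)_mm_extract_epi16(v, 7);
}
```

Loading from a + 1 word (2 bytes past an aligned address) is a case that needs the unaligned form.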
SSE Data Movement

Instruction  Suffix  Description
movdqa               Move double quadword aligned
movdqu               Move double quadword unaligned
mova         ps, pd  Move single/double precision float aligned
movu         ps, pd  Move single/double precision float unaligned
movhl        ps      Move packed float high to low
movlh        ps      Move packed float low to high
movh         ps, pd  Move high packed float (single/double)
movl         ps, pd  Move low packed float (single/double)
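The movhl and movlh rows are the least obvious entries in the table. A minimal sketch of their effect uses the SSE intrinsics _mm_movehl_ps and _mm_movelh_ps, which typically compile to movhlps and movlhps. The helper names are hypothetical (not from the slides).

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helpers, not from the slides. */

/* movhlps: the high two floats of b move into the low half of the
   result; the high half of the result is taken from a. */
void move_high_to_low(const float *a, const float *b, float *out)
{
    _mm_storeu_ps(out, _mm_movehl_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* movlhps: the low two floats of b move into the high half of the
   result; the low half of the result is taken from a. */
void move_low_to_high(const float *a, const float *b, float *out)
{
    _mm_storeu_ps(out, _mm_movelh_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

With a = {0, 1, 2, 3} and b = {10, 11, 12, 13} (element 0 first), move_high_to_low gives {12, 13, 2, 3} and move_low_to_high gives {0, 1, 10, 11}.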

This note was uploaded on 03/08/2012 for the course CSE 721 taught by Professor Saday during the Winter '11 term at Ohio State.
