ia-32_volume1_basic-arch

• … decode 4 instructions per cycle, or 5 instructions per cycle with Macrofusion.
• Macrofusion fuses common sequences of two instructions into one decoded instruction (micro-op) to increase decoding throughput.
• Microfusion fuses common sequences of two micro-ops into one micro-op to improve retirement throughput.

2-14 Vol. 1 INTEL 64 AND IA-32 ARCHITECTURES

• Instruction queue provides caching of short loops to improve efficiency.
• Stack pointer tracker improves the efficiency of executing procedure/function entries and exits.
• Branch prediction unit employs dedicated hardware to handle different types of branches for improved branch prediction.
• Advanced branch prediction algorithm directs the instruction fetch unit to fetch instructions likely in the architectural code path for decoding.

2.2.3.2 Execution Core

The execution core of the Intel Core microarchitecture is superscalar and can process instructions out of order to increase the overall rate of instructions executed per cycle (IPC). The execution core employs the following features to improve execution throughput and efficiency:
• Up to six micro-ops can be dispatched to execute per cycle.
• Up to four instructions can be retired per cycle.
• Three full arithmetic logical units.
• SIMD instructions can be dispatched through three issue ports.
• Most SIMD instructions have 1-cycle throughput (including 128-bit SIMD instructions).
• Up to eight floating-point operations per cycle.
• Many long-latency computation operations are pipelined in hardware to increase overall throughput.
• Reduced exposure to data access delays using Intel Smart Memory Access.

2.2.4 SIMD Instructions

Beginning with the Pentium II and Pentium with Intel MMX technology processor families, five extensions have been introduced into the Intel 64 and IA-32 architectures to perform single-instruction multiple-data (SIMD) operations.
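The execution core's ceiling of "up to eight floating-point operations per cycle" can be turned into a peak rate with a short worked example. It rests on two assumptions that are not stated in this section: that the eight operations come from one 128-bit packed single-precision add (ADDPS) and one 128-bit packed single-precision multiply (MULPS) completing each cycle, four 32-bit lanes each; and a hypothetical 2.0 GHz core clock chosen only for illustration.

$$
\text{FLOP/cycle} = \underbrace{\tfrac{128}{32}}_{\text{ADDPS lanes}} + \underbrace{\tfrac{128}{32}}_{\text{MULPS lanes}} = 4 + 4 = 8
$$

$$
P_{\text{peak}} = 8\,\frac{\text{FLOP}}{\text{cycle}} \times 2.0\times10^{9}\,\frac{\text{cycle}}{\text{s}} = 1.6\times10^{10}\,\frac{\text{FLOP}}{\text{s}} = 16\ \text{GFLOP/s per core}
$$

Under the same assumptions, scalar code issuing one add and one multiply per cycle fills only one lane of each, giving 2 FLOP/cycle; the factor of four is what packed 128-bit SIMD with 1-cycle throughput buys.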
These extensions include the MMX technology, SSE extensions, SSE2 extensions, SSE3 extensions, and Supplemental Streaming SIMD Extensions 3 (SSSE3). Each of these extensions provides a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements. SIMD integer operations can use the 64-bit MMX or the 128-bit XMM registers. SIMD floating-point operations use the 128-bit XMM registers. Figure 2-4 shows a summary of the various SIMD extensions (MMX technology, SSE, SSE2, SSE3, and SSSE3), the data types they operate on, and how the data types are packed into MMX and XMM registers.

The Intel MMX technology was introduced in the Pentium II and Pentium with MMX technology processor families. MMX instructions perform SIMD operations on packed byte, word, or doubleword integers located in MMX registers. These instructions are useful in applications that operate on integer arrays and streams of integer data that lend themselves to SIMD processing.

SSE extensions were introduced in the Pentium III processor family. SSE instructions operate on packed single-precision float...
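As a concrete illustration of an operation on packed single-precision data, the sketch below adds four pairs of floats with one SSE instruction (ADDPS, reached through the `_mm_add_ps` compiler intrinsic from `<xmmintrin.h>`). The helper name `add_packed4` is hypothetical, and the scalar fallback, used when the compiler does not define `__SSE__` (for example on non-x86 targets), is an assumption added so the sketch builds anywhere.

```c
#include <assert.h>    /* assert: argument sanity check */
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3.
 * On an SSE target, all four single-precision lanes are added by one ADDPS. */
static void add_packed4(const float a[4], const float b[4], float out[4])
{
    assert(a && b && out);
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);      /* load 4 packed singles into an XMM register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb); /* ADDPS: 4 additions in one instruction */
    _mm_storeu_ps(out, vsum);         /* write the 4 results back to memory */
#else
    /* Scalar fallback (assumption: non-SSE target): 4 separate additions. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Called with {1, 2, 3, 4} and {10, 20, 30, 40}, it fills the output with {11, 22, 33, 44}. The unaligned load/store intrinsics are used so callers need not guarantee alignment; the aligned variants `_mm_load_ps` and `_mm_store_ps` require 16-byte-aligned addresses.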