ia-32_volume1_basic-arch



hed traces that are no longer needed.

2-12 Vol. 1
INTEL 64 AND IA-32 ARCHITECTURES

2.2.3 Intel Core™ Microarchitecture

Intel Core microarchitecture introduces the following features that enable high performance and power-efficient performance for single-threaded as well as multi-threaded workloads:

Intel Wide Dynamic Execution enables each processor core to fetch, dispatch, and execute at high bandwidth to support retirement of up to four instructions per cycle.
-- Fourteen-stage efficient pipeline
-- Three arithmetic logical units
-- Four decoders to decode up to five instructions per cycle
-- Macro-fusion and micro-fusion to improve front-end throughput
-- Peak issue rate of dispatching up to six micro-ops per cycle
-- Peak retirement bandwidth of up to four micro-ops per cycle
-- Advanced branch prediction
-- Stack pointer tracker to improve efficiency of executing function/procedure entries and exits

Intel Advanced Smart Cache delivers higher bandwidth from the second-level cache to the core, and optimal performance and flexibility for single-threaded and multi-threaded applications.
-- Large second-level cache of up to 4 MB with 16-way associativity
-- Optimized for multicore and single-threaded execution environments
-- 256-bit internal data path to improve bandwidth from the L2 cache to the first-level data cache

Intel Smart Memory Access prefetches data from memory in response to data access patterns and reduces cache-miss exposure of out-of-order execution.
-- Hardware prefetchers to reduce effective latency of second-level cache misses
-- Hardware prefetchers to reduce effective latency of first-level data cache misses
-- Memory disambiguation to improve efficiency of the speculative execution engine

Intel Advanced Digital Media Boost improves most 128-bit SIMD instructions with single-cycle throughput and floating-point operations.
-- Single-cycle throughput of most 128-bit SIMD instructions
-- Up to eight floating-point operations per cycle
-- Three issue ports available for dispatching SIMD instructions for execution

Intel Core 2 Extreme, Intel Core 2 Duo processors and Intel Xeon processor 5100 series implement two processor cores based on the Intel Core microarchitecture. The functionality of the subsystems in each core is depicted in Figure 2-3.

[Figure 2-3. The Intel Core Microarchitecture Pipeline Functionality. Blocks shown: Instruction Fetch and PreDecode; Instruction Queue; Microcode ROM; Decode; Rename/Alloc; Retirement Unit (Re-Order Buffer); Scheduler; execution units (ALU, Branch, MMX/SSE/FP Move; ALU, FAdd, MMX/SSE; ALU, FMul, MMX/SSE); Load; Store; L1D Cache and DTLB; Shared L2 Cache; FSB, up to 10.7 GB/s.]

2.2.3.1 The Front End

The front end of Intel Core microarchitecture provides several enhancements to feed the Intel Wide Dynamic Execution engine:

The instruction fetch unit prefetches instructions into an instruction queue to maintain a steady supply of instructions to the decode units.

Four-wide decode unit can...
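
The per-cycle peak figures quoted in section 2.2.3 can be turned into per-second upper bounds with simple arithmetic. The following is a minimal sketch, not a performance model: the clock frequency and the cache-line size are assumptions chosen for illustration and do not come from the text, and the helper names are hypothetical. Real throughput depends on the instruction mix, latencies, and memory behavior.

```python
# Per-cycle peaks quoted in the text above.
FP_OPS_PER_CYCLE = 8        # "up to eight floating-point operations per cycle"
RETIRE_PER_CYCLE = 4        # "retirement of up to four instructions per cycle"
L2_TO_L1D_PATH_BITS = 256   # "256-bit internal data path"

# Assumptions for illustration only (not stated in the text).
ASSUMED_CLOCK_HZ = 2_400_000_000  # hypothetical 2.4 GHz core clock
ASSUMED_LINE_BYTES = 64           # assumed cache-line size


def peak_per_second(per_cycle: int, clock_hz: int) -> int:
    """Upper bound on events per second for a given per-cycle peak."""
    return per_cycle * clock_hz


def cycles_to_transfer(num_bytes: int, path_bits: int) -> int:
    """Cycles to move num_bytes over a path_bits-wide path, rounded up.

    Counts only transfer cycles at full width; access latency is ignored.
    """
    bytes_per_cycle = path_bits // 8
    return -(-num_bytes // bytes_per_cycle)  # ceiling division


peak_flops = peak_per_second(FP_OPS_PER_CYCLE, ASSUMED_CLOCK_HZ)
peak_retired = peak_per_second(RETIRE_PER_CYCLE, ASSUMED_CLOCK_HZ)
line_cycles = cycles_to_transfer(ASSUMED_LINE_BYTES, L2_TO_L1D_PATH_BITS)

print(f"peak FP rate per core: {peak_flops / 1e9:.1f} GFLOP/s")
print(f"peak retirement rate:  {peak_retired / 1e9:.1f} G instr/s")
print(f"transfer cycles per {ASSUMED_LINE_BYTES}-byte line: {line_cycles}")
```

Changing `ASSUMED_CLOCK_HZ` rescales both per-second bounds linearly, which is why the manual states these limits per cycle rather than per second.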