hed traces that are no longer needed.

INTEL 64 AND IA-32 ARCHITECTURES

2.2.3 Intel Core™ Microarchitecture

Intel Core microarchitecture introduces the following features, which enable high, power-efficient performance for single-threaded as well as multithreaded workloads:

Intel Wide Dynamic Execution enables each processor core to fetch, dispatch, and execute at high bandwidth, supporting retirement of up to four instructions per cycle.
-- Fourteen-stage efficient pipeline
-- Three arithmetic logical units
-- Four decoders to decode up to five instructions per cycle
-- Macro-fusion and micro-fusion to improve front-end throughput
-- Peak issue rate of dispatching up to six micro-ops per cycle
-- Peak retirement bandwidth of up to four micro-ops per cycle
-- Advanced branch prediction
-- Stack pointer tracker to improve efficiency of executing function/procedure entries and exits

Intel Advanced Smart Cache delivers higher bandwidth from the second-level cache to the core, and optimal performance and flexibility for single-threaded and multi-threaded applications.
-- Large second-level cache of up to 4 MB with 16-way associativity
-- Optimized for multicore and single-threaded execution environments
-- 256-bit internal data path to improve bandwidth from L2 to first-level data cache

Intel Smart Memory Access prefetches data from memory in response to data access patterns and reduces cache-miss exposure of out-of-order execution.
-- Hardware prefetchers to reduce effective latency of second-level cache misses
-- Hardware prefetchers to reduce effective latency of first-level data cache misses
-- Memory disambiguation to improve efficiency of the speculative execution engine

Intel Advanced Digital Media Boost gives most 128-bit SIMD instructions single-cycle throughput and improves floating-point operations.
-- Single-cycle throughput of most 128-bit SIMD instructions
-- Up to eight floating-point operations per cycle
-- Three issue ports available for dispatching SIMD instructions for execution

Intel Core 2 Extreme, Intel Core 2 Duo processors and Intel Xeon processor 5100 series implement two processor cores based on the Intel Core microarchitecture. The functionality of the subsystems in each core is depicted in Figure 2-3.

[Figure 2-3 block labels: Instruction Fetch and PreDecode; Instruction Queue; Microcode ROM; Decode; Rename/Alloc; Retirement Unit (Re-Order Buffer); Scheduler; ALU, Branch, MMX/SSE/FP Move; ALU, FAdd, MMX/SSE; ALU, FMul, MMX/SSE; Load; Store; L1D Cache and DTLB; Shared L2 Cache; Up to 10.7 GB/s FSB]

Figure 2-3. The Intel Core Microarchitecture Pipeline Functionality

The Front End

The front end of Intel Core microarchitecture provides several enhancements to feed the Intel Wide Dynamic Execution engine:

Instruction fetch unit prefetches instructions into an instruction queue to maintain a steady supply of instructions to the decode units.

Four-wide decode unit can...
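
The memory-side figures quoted in Section 2.2.3 and in Figure 2-3 (4 MB, 16-way L2; 256-bit L2-to-L1D data path; up to 10.7 GB/s FSB) can be turned into rough capacity numbers with simple arithmetic. The Python sketch below is illustrative only: the 64-byte cache line, the 64-bit bus width, and the 1333 MT/s transfer rate are assumptions that the text above does not state, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the memory-side figures quoted above.
# Values marked ASSUMED are illustrative and are not given in the text.

L2_SIZE_BYTES = 4 * 1024 * 1024        # "up to 4 MB" second-level cache
L2_WAYS = 16                           # "16-way associativity"
LINE_BYTES = 64                        # ASSUMED cache-line size
L2_TO_L1D_PATH_BITS = 256              # "256-bit internal data path"
FSB_WIDTH_BYTES = 8                    # ASSUMED 64-bit front-side data bus
FSB_TRANSFERS_PER_SEC = 1_333_000_000  # ASSUMED transfer rate (1333 MT/s)


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Sets in a set-associative cache: size / (ways * line size)."""
    return size_bytes // (ways * line_bytes)


def bytes_per_cycle(path_bits: int) -> int:
    """Bytes a data path of the given width can move in one cycle."""
    return path_bits // 8


def bus_bandwidth_gb_per_s(width_bytes: int, transfers_per_sec: int) -> float:
    """Peak bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return width_bytes * transfers_per_sec / 1e9


if __name__ == "__main__":
    print("L2 sets:", cache_sets(L2_SIZE_BYTES, L2_WAYS, LINE_BYTES))
    print("L2->L1D bytes/cycle:", bytes_per_cycle(L2_TO_L1D_PATH_BITS))
    print("FSB peak GB/s:",
          round(bus_bandwidth_gb_per_s(FSB_WIDTH_BYTES, FSB_TRANSFERS_PER_SEC), 1))
```

Under these assumptions the L2 cache has 4096 sets, the internal data path moves 32 bytes per cycle, and the bus peak works out to about 10.7 GB/s, which matches the figure label. These are peak values, not sustained rates.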

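
The execution-side peak figures in Section 2.2.3 (up to six micro-ops dispatched per cycle, up to four micro-ops retired per cycle, up to eight floating-point operations per cycle) are upper bounds. As a rough sketch of how they combine, the Python below computes a peak floating-point rate and a lower bound on retirement time. The 2.0 GHz clock is a hypothetical value chosen for illustration, the function names are assumptions, and the text does not say which floating-point precision the eight-per-cycle figure refers to.

```python
# Upper-bound arithmetic for the execution-side figures quoted above.
# CLOCK_HZ is ASSUMED for illustration; it is not given in the text.

FLOPS_PER_CYCLE = 8            # "up to eight floating-point operations per cycle"
DISPATCH_UOPS_PER_CYCLE = 6    # "dispatching up to six micro-ops per cycle"
RETIRE_UOPS_PER_CYCLE = 4      # "retirement ... up to four micro-ops per cycle"
CLOCK_HZ = 2_000_000_000       # ASSUMED hypothetical core clock (2.0 GHz)


def peak_gflops(flops_per_cycle: int, clock_hz: int) -> float:
    """Peak floating-point rate per core, in GFLOP/s."""
    return flops_per_cycle * clock_hz / 1e9


def sustained_uops_per_cycle(dispatch_width: int, retire_width: int) -> int:
    """Steady-state micro-op rate cannot exceed the narrower of the two stages."""
    return min(dispatch_width, retire_width)


def min_cycles_to_retire(n_uops: int, retire_width: int) -> int:
    """Lower bound on cycles needed to retire n_uops (integer ceiling division)."""
    return (n_uops + retire_width - 1) // retire_width


if __name__ == "__main__":
    print("Peak GFLOP/s per core:", peak_gflops(FLOPS_PER_CYCLE, CLOCK_HZ))
    print("Sustained micro-ops/cycle bound:",
          sustained_uops_per_cycle(DISPATCH_UOPS_PER_CYCLE, RETIRE_UOPS_PER_CYCLE))
    print("Min cycles for 1000 micro-ops:",
          min_cycles_to_retire(1000, RETIRE_UOPS_PER_CYCLE))
```

With the assumed 2.0 GHz clock the peak is 16 GFLOP/s per core, and 1000 micro-ops need at least 250 cycles to retire. Dispatch can briefly exceed four micro-ops per cycle, but over a long run the four-wide retirement stage sets the ceiling; real code usually falls below these bounds because of dependencies, cache misses, and branch mispredictions.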