

INTEL 64 AND IA-32 ARCHITECTURES (Vol. 1)

The retirement unit then linearly searches the instruction pool for completed instructions that no longer have data dependencies with other instructions or unresolved branch predictions. When completed instructions are found, the retirement unit commits their results to memory and/or the IA-32 registers (the processor's eight general-purpose registers and eight x87 FPU data registers) in the order in which the instructions were originally issued, and then retires them from the instruction pool.

2.2.2 Intel NetBurst Microarchitecture

The Intel NetBurst microarchitecture provides:

• The Rapid Execution Engine
  - Arithmetic Logic Units (ALUs) run at twice the processor frequency
  - Basic integer operations can dispatch in one-half of a processor clock tick
• Hyper-Pipelined Technology
  - Deep pipeline to enable industry-leading clock rates for desktop PCs and servers
  - Frequency headroom and scalability to continue leadership into the future
• Advanced Dynamic Execution
  - Deep, out-of-order, speculative execution engine
      • Up to 126 instructions in flight
      • Up to 48 loads and 24 stores in pipeline(1)
  - Enhanced branch prediction capability
      • Reduces the misprediction penalty associated with deeper pipelines
      • Advanced branch prediction algorithm
      • 4K-entry branch target array
• New cache subsystem
  - First level caches
      • Advanced Execution Trace Cache stores decoded instructions
      • Execution Trace Cache removes decoder latency from main execution loops
      • Execution Trace Cache integrates the path of program execution flow into a single line
      • Low-latency data cache
  - Second level cache
      • Full-speed, unified 8-way Level 2 on-die Advanced Transfer Cache
      • Bandwidth and performance increase with processor frequency
• High-performance, quad-pumped bus interface to the Intel NetBurst microarchitecture system bus
  - Supports a quad-pumped, scalable bus clock to achieve up to 4X effective speed
  - Capable of delivering up to 8.5 GBytes of bandwidth per second
• Superscalar issue to enable parallelism
• Expanded hardware registers with renaming to avoid register name space limitations
• 64-byte cache line size (transfers data up to two lines per sector)

(1) Intel 64 and IA-32 processors based on the Intel NetBurst microarchitecture built on the 90 nm process can handle more than 24 stores in flight.

Figure 2-2 is an overview of the Intel NetBurst microarchitecture. This microarchitecture pipeline is made up of three sections: (1) the front end pipeline, (2) the out-of-order execution core, and (3) the retirement unit.

[Figure 2-2. The Intel NetBurst Microarchitecture. Blocks shown: System Bus, Bus Unit, optional 3rd Level Cache, 8-way 2nd Level Cache, 4-way 1st Level Cache, Front End (Fetch/Decode, Trace Cache, Microcode ROM), Out-Of-Order Execution Core, Retirement, BTBs/Branch Prediction, and Branch History Update. The legend distinguishes frequently used paths from less frequently used paths.]

The Front End Pipeline

The front end supplies instructions in program order to the...
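The in-order retirement step described at the start of this excerpt can be modeled in a few lines. The following Python sketch is a simplified model, not Intel's implementation: the names `PoolEntry` and `retire` are hypothetical, the pool is kept as a queue in issue order (so only the oldest entry is examined rather than the whole pool being searched), memory stores are not modeled, and the retirement width of 3 per call is an assumed default that this excerpt does not state.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class PoolEntry:
    """One instruction in a hypothetical instruction pool."""
    seq: int                      # position in original issue order
    dest: Optional[str]           # architectural register written, or None
    value: int = 0                # result produced by execution
    completed: bool = False       # True once execution has finished
    pending_branch: bool = False  # True while it depends on an unresolved branch prediction


def retire(pool: Deque[PoolEntry], arch_regs: Dict[str, int],
           width: int = 3) -> List[int]:
    """Retire up to `width` instructions per call, strictly in issue order.

    `pool` is kept in issue order, so only its head may retire. A completed
    instruction behind an unfinished one must wait, which keeps the
    committed architectural state in program order.
    """
    retired: List[int] = []
    while pool and len(retired) < width:
        head = pool[0]
        if not head.completed or head.pending_branch:
            break  # oldest instruction is not ready; nothing younger may retire
        if head.dest is not None:
            arch_regs[head.dest] = head.value  # commit the result
        retired.append(pool.popleft().seq)
    return retired


# Instruction 2 finishes before instruction 1 but cannot retire ahead of it.
pool = deque([
    PoolEntry(0, "eax", 5, completed=True),
    PoolEntry(1, "ebx", 7),
    PoolEntry(2, "ecx", 9, completed=True),
])
regs: Dict[str, int] = {}
print(retire(pool, regs), regs)  # [0] {'eax': 5}
pool[0].completed = True         # instruction 1 now finishes
print(retire(pool, regs), regs)  # [1, 2] {'eax': 5, 'ebx': 7, 'ecx': 9}
```

Because retirement stops at the first unfinished instruction, results produced out of order become architecturally visible only in the original issue order.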

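The bus bandwidth figure in the feature list follows from simple arithmetic: peak bandwidth equals the bus clock times the number of transfers per clock times the bytes moved per transfer. The sketch below assumes a 64-bit (8-byte) data bus and a bus clock of about 266.67 MHz; neither value is given in this excerpt, and `quad_pumped_bandwidth` is a hypothetical helper.

```python
def quad_pumped_bandwidth(bus_clock_hz: float, bus_width_bytes: int = 8,
                          transfers_per_clock: int = 4) -> float:
    """Peak bytes per second for a bus that transfers data several times per clock.

    Defaults assume a 64-bit (8-byte) data path and quad pumping
    (four transfers per bus clock).
    """
    return bus_clock_hz * transfers_per_clock * bus_width_bytes


# A bus clock of about 266.67 MHz, quad-pumped, gives about 1066.67 MT/s.
peak = quad_pumped_bandwidth(800e6 / 3)
print(f"{peak / 1e9:.2f} GB/s")  # 8.53 GB/s
```

Under these assumptions the result, about 8.53 x 10^9 bytes per second (decimal gigabytes), is consistent with the "up to 8.5 GBytes" figure; a slower bus clock scales the result linearly.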

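Register renaming, listed among the NetBurst features, can be sketched as a table that maps each architectural register name to an entry in a larger physical register file. The Python model below is illustrative only: `RenameTable` is a hypothetical name, the default of 128 physical registers is an assumption not taken from this excerpt, and freeing old physical registers at retirement is omitted.

```python
from typing import Dict, List, Tuple

# The eight IA-32 general-purpose registers (the text mentions eight).
ARCH_REGS = ["eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"]


class RenameTable:
    """Simplified register alias table: architectural -> physical registers."""

    def __init__(self, num_physical: int = 128) -> None:
        if num_physical < len(ARCH_REGS):
            raise ValueError("need at least one physical register per architectural register")
        # At reset, architectural register i lives in physical register i.
        self.mapping: Dict[str, int] = {r: i for i, r in enumerate(ARCH_REGS)}
        self.free: List[int] = list(range(len(ARCH_REGS), num_physical))

    def rename(self, dest: str, srcs: List[str]) -> Tuple[int, List[int]]:
        """Rename one instruction of the form dest <- op(srcs).

        Sources are looked up first so they see the most recent producer;
        the destination then gets a fresh physical register, so a later
        write to the same architectural name does not overwrite a value
        that older instructions may still need to read.
        """
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; renaming would stall")
        new_reg = self.free.pop(0)
        self.mapping[dest] = new_reg
        return new_reg, phys_srcs


rat = RenameTable(num_physical=16)
print(rat.rename("eax", ["ebx"]))  # (8, [1])
print(rat.rename("ecx", ["eax"]))  # (9, [8])   reads the new eax
print(rat.rename("eax", ["edx"]))  # (10, [3])  second write to eax gets its own register
```

The third call shows the point of renaming: the second write to `eax` receives physical register 10, so it does not disturb physical register 8, which the instruction writing `ecx` still reads as its source.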