Unformatted text preview: processor microarchitecture. The P6 processor microarchitecture was later enhanced with an on-die, Level 2 cache, called Advanced Transfer Cache. The microarchitecture is a three-way superscalar, pipelined architecture. Three-way superscalar means that by using parallel processing techniques, the processor is able on average to decode, dispatch, and complete execution of (retire) three instructions per clock cycle. To handle this level of instruction throughput, the P6 processor family uses a decoupled, 12-stage superpipeline that supports out-of-order instruction execution. Figure 2-1 shows a conceptual view of the P6 processor microarchitecture pipeline with the Advanced Transfer Cache enhancement. Vol. 1 2-7 INTEL 64 AND IA-32 ARCHITECTURES System Bus Frequently used Bus Unit Less frequently used 2nd Level Cache On-die, 8-way 1st Level Cache 4-way, low latency Front End Execution Instruction Cache Microcode ROM Fetch/ Decode Execution Out-of-Order Core Retirement Branch History Update BTSs/Branch Prediction
OM16520 Figure 2-1. The P6 Processor Microarchitecture with Advanced Transfer Cache Enhancement
To ensure a steady supply of instructions and data for the instruction execution pipeline, the P6 processor microarchitecture incorporates two cache levels. The Level 1 cache provides an 8-KByte instruction cache and an 8-KByte data cache, both closely coupled to the pipeline. The Level 2 cache provides 256-KByte, 512-KByte, or 1-MByte static RAM that is coupled to the core processor through a full clock-speed 64-bit cache bus. The centerpiece of the P6 processor microarchitecture is an out-of-order execution mechanism called dynamic execution. Dynamic execution incorporates three dataprocessing concepts: Deep branch prediction allows the processor to decode instructions beyond branches to keep the instruction pipeline full. The P6 processor family implements highly optimized branch prediction algorithms to predict the direction of the instruction. Dynamic data flow analysis requires real-time analysis of the flow of data through the processor to determine dependencies and to detect opportunities for out-of-order instruction execution. The out-of-order execution core can monitor many instructions and execute these instructions in the order that best optimizes 2-8 Vol. 1 INTEL 64 AND IA-32 ARCHITECTURES the use of the processor's multiple execution units, while maintaining the data integrity. Speculative execution refers to the processor's ability to execute instructions that lie beyond a conditional branch that has not yet been resolved, and ultimately to commit the results in the order of the original instruction stream. To make speculative execution possible, the P6 processor microarchitecture decouples the dispatch and execution of instructions from the commitment of results. The processor's out-of-order execution core uses data-flow analysis to execute all available instructions in the instruction pool and store the results in temporary registe...
View Full Document
- Winter '11
- X86, Intel corporation, 64-bit mode, fpu floating-point exception, FPU Control Instructions