ia-32_volume1_basic-arch

Intel NetBurst microarchitecture system bus supports



... out-of-order execution core. It performs a number of functions:

- Prefetches instructions that are likely to be executed
- Fetches instructions that have not already been prefetched
- Decodes instructions into micro-operations
- Generates microcode for complex instructions and special-purpose code
- Delivers decoded instructions from the execution trace cache
- Predicts branches using a highly advanced algorithm

The pipeline is designed to address common problems in high-speed, pipelined microprocessors. Two of these problems contribute to major sources of delays:

- time to decode instructions fetched from the target
- wasted decode bandwidth due to branches or branch targets in the middle of cache lines

Vol. 1 2-11 INTEL 64 AND IA-32 ARCHITECTURES

The operation of the pipeline's trace cache addresses these issues. Instructions are constantly being fetched and decoded by the translation engine (part of the fetch/decode logic) and built into sequences of μops called traces. At any time, multiple traces (representing prefetched branches) are being stored in the trace cache. The trace cache is searched for the instruction that follows the active branch. If the instruction also appears as the first instruction in a pre-fetched branch, the fetch and decode of instructions from the memory hierarchy ceases and the prefetched branch becomes the new source of instructions (see Figure 2-2).

The trace cache and the translation engine have cooperating branch prediction hardware. Branch targets are predicted based on their linear addresses using branch target buffers (BTBs) and fetched as soon as possible.

2.2.2.2 Out-Of-Order Execution Core

The out-of-order execution core's ability to execute instructions out of order is a key factor in enabling parallelism. This feature enables the processor to reorder instructions so that if one μop is delayed, other μops may proceed around it. The processor employs several buffers to smooth the flow of μops.
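The idea that independent μops may proceed around a delayed one can be illustrated with a toy scheduler. This is only a sketch under simplifying assumptions, not a model of the actual hardware: the `Uop` class, the `schedule` function, the latencies, and the four-μop program are all hypothetical, every dependency must name an earlier μop, and the default width of six is borrowed from the per-cycle dispatch figure quoted in this section.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    latency: int        # cycles until the result is available (illustrative)
    deps: tuple = ()    # names of earlier uops whose results this one needs


def schedule(uops, width=6):
    """Return {name: cycle in which the uop starts executing}.

    Each cycle, scan the waiting uops in program order and start up to
    `width` of them whose inputs are already available.
    """
    start, done = {}, {}
    pending = list(uops)            # kept in program order
    cycle = 0
    while pending:
        issued = []
        for u in pending:
            if len(issued) == width:
                break
            # Ready only if every producer has finished by this cycle.
            if all(d in done and done[d] <= cycle for d in u.deps):
                issued.append(u)
        for u in issued:
            start[u.name] = cycle
            done[u.name] = cycle + u.latency
            pending.remove(u)
        cycle += 1
    return start


# Hypothetical program: "add" waits on a slow load, "mul"/"sub" do not.
program = [
    Uop("load", 4),
    Uop("add", 1, ("load",)),
    Uop("mul", 1),
    Uop("sub", 1, ("mul",)),
]
starts = schedule(program)
print(starts)  # {'load': 0, 'mul': 0, 'sub': 1, 'add': 4}
```

In this run `mul` and `sub` start in cycles 0 and 1 even though they follow `add` in program order, while `add` waits until cycle 4 for the load result.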
The core is designed to facilitate parallel execution. It can dispatch up to six μops per cycle (this exceeds trace cache and retirement μop bandwidth). Most pipelines can start executing a new μop every cycle, so several instructions can be in flight at a time for each pipeline. A number of arithmetic logical unit (ALU) instructions can start at two per cycle; many floating-point instructions can start once every two cycles.

2.2.2.3 Retirement Unit

The retirement unit receives the results of the executed μops from the out-of-order execution core and processes the results so that the architectural state updates according to the original program order.

When a μop completes and writes its result, it is retired. Up to three μops may be retired per cycle. The Reorder Buffer (ROB) is the unit in the processor which buffers completed μops, updates the architectural state in order, and manages the ordering of exceptions. The retirement section also keeps track of branches and sends updated branch target information to the BTB. The BTB then purges pre-fetc...
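The in-order retirement rule described for the retirement unit can be sketched with a small queue model. It is a simplification under stated assumptions rather than a description of the real ROB: the `retire` function and the completion times are hypothetical, all μops are assumed to be in the buffer from cycle 0, and a μop is allowed to retire in the same cycle it completes. The width of three follows the per-cycle retirement limit given in the text.

```python
from collections import deque


def retire(completion_cycle, width=3):
    """Return the cycle in which each uop retires.

    completion_cycle[i] is the cycle in which uop i (program order)
    finishes executing. Only the oldest uop in the buffer may retire,
    and at most `width` uops retire per cycle.
    """
    rob = deque(enumerate(completion_cycle))   # oldest uop at the left
    retired = [None] * len(completion_cycle)
    cycle = 0
    while rob:
        count = 0
        # Retire from the head only; a completed younger uop must wait.
        while rob and count < width and rob[0][1] <= cycle:
            index, _ = rob.popleft()
            retired[index] = cycle
            count += 1
        cycle += 1
    return retired


# Hypothetical completion times: uops 1-3 finish before the older uop 0.
print(retire([4, 1, 1, 2, 5]))  # [4, 4, 4, 5, 5]
```

The μops that finished in cycles 1 and 2 are held until the oldest μop completes in cycle 4, and the three-per-cycle limit pushes the fourth μop to cycle 5, so the visible state changes in program order.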