The most common mechanism for controlling the communication of operands among the execution units is called register renaming. When an instruction that updates register r is decoded, a tag t is generated giving a unique identifier to the result of the operation. An entry (r, t) is added to a table maintaining the association between each program register and the tag for an operation that will update this register. When a subsequent instruction using register r as an operand is decoded, the operation sent to the Execution Unit will contain t as the source for the operand value. When some execution unit completes the first operation, it generates a result (v, t) indicating that the operation with tag t produced value v. Any operation waiting for t as a source will then use v as the source value. By this mechanism, values can be passed directly from one operation to another, rather than being written to and read from the register file. The renaming table only contains entries for registers having pending write operations. When a decoded instruction requires a register r, and there is no tag associated with this register, the operand is retrieved directly from the register file. With register renaming, an entire sequence of operations can be performed speculatively, even though the registers are updated only after the processor is certain of the branch outcomes.

224 CHAPTER 5. OPTIMIZING PROGRAM PERFORMANCE

Operation                  Latency   Issue Time
Integer Add                      1            1
Integer Multiply                 4            1
Integer Divide                  36           36
Floating-Point Add               3            1
Floating-Point Multiply          5            2
Floating-Point Divide           38           38
Load (Cache Hit)                 3            1
Store (Cache Hit)                3            1

Figure 5.12: Performance of Pentium III Arithmetic Operations. Latency represents the total number of cycles for a single operation. Issue time denotes the number of cycles between successive, independent operations. (Obtained from Intel literature.)

5.7.2 Functional Unit Performance

Figure 5.12 documents the performance of some of the basic operations for an Intel Pentium III. These timings are typical for other processors as well.
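
To make the two columns of Figure 5.12 concrete, the following sketch estimates how many cycles a run of n operations would take on a single functional unit. It is a simplified model, not something taken from the text: the function names `independent_cycles` and `dependent_cycles` are hypothetical, and the model assumes one unit per operation type, no other stalls, and that a dependent operation cannot start until the previous result is complete.

```python
# Latency and issue time in cycles, copied from Figure 5.12 (Pentium III).
TIMINGS = {
    "Integer Add": (1, 1),
    "Integer Multiply": (4, 1),
    "Integer Divide": (36, 36),
    "Floating-Point Add": (3, 1),
    "Floating-Point Multiply": (5, 2),
    "Floating-Point Divide": (38, 38),
    "Load (Cache Hit)": (3, 1),
    "Store (Cache Hit)": (3, 1),
}


def independent_cycles(op, n):
    """Idealized cycle count for n independent operations on one unit.

    Assumption: a new operation starts every `issue` cycles, and the
    last one finishes `latency` cycles after it starts.
    """
    latency, issue = TIMINGS[op]
    if n == 0:
        return 0
    return (n - 1) * issue + latency


def dependent_cycles(op, n):
    """Idealized cycle count for a chain of n operations in which each
    one needs the previous result, so each waits the full latency."""
    latency, _issue = TIMINGS[op]
    return n * latency


if __name__ == "__main__":
    for op in ("Floating-Point Multiply", "Integer Divide"):
        print(op, independent_cycles(op, 100), dependent_cycles(op, 100))
```

Under these assumptions, 100 independent floating-point multiplications take about 203 cycles, while a chain of 100 dependent ones takes 500. For integer division the two estimates coincide at 3600 cycles, because its issue time equals its latency.
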
Each operation is characterized by two cycle counts: the latency, indicating the total number of cycles the functional unit requires to complete the operation; and the issue time, indicating the number of cycles between successive, independent operations. The latencies range from one cycle for basic integer operations, to several cycles for loads, stores, integer multiplication, and the more common floating-point operations, to many cycles for division and other complex operations.

As the third column in Figure 5.12 shows, several functional units of the processor are pipelined, meaning that they can start on a new operation before the previous one is completed. The issue time indicates the number of cycles between successive operations for the unit. In a pipelined unit, the issue time is smaller than the latency. A pipelined functional unit is implemented as a series of stages, each of which performs part of the operation. For example, a typical floating-point adder contains three stages: one to process the exponent values, one to add the fractions, and one to round the final result. The operations can proceed