The most common mechanism for controlling the communication of operands among the execution units is called register renaming. When an instruction that updates register r is decoded, a tag t is generated giving a unique identifier to the result of the operation. An entry (r, t) is added to a table maintaining the association between each program register and the tag for an operation that will update this register. When a subsequent instruction using register r as an operand is decoded, the operation sent to the Execution Unit will contain t as the source for the operand value. When some execution unit completes the first operation, it generates a result (v, t) indicating that the operation with tag t produced value v. Any operation waiting for t as a source will then use v as the source value. By this mechanism, values can be passed directly from one operation to another, rather than being written to and read from the register file. The renaming table only contains entries for registers having pending write operations. When a decoded instruction requires a register r, and there is no tag associated with this register, the operand is retrieved directly from the register file. With register renaming, an entire sequence of operations can be performed speculatively, even though the registers are updated only after the processor is certain of the branch outcomes.

Operation                  Latency   Issue Time
Integer Add                      1            1
Integer Multiply                 4            1
Integer Divide                  36           36
Floating-Point Add               3            1
Floating-Point Multiply          5            2
Floating-Point Divide           38           38
Load (Cache Hit)                 3            1
Store (Cache Hit)                3            1

Figure 5.12: Performance of Pentium III Arithmetic Operations. Latency represents the total number of cycles for a single operation. Issue time denotes the number of cycles between successive, independent operations. (Obtained from Intel literature.)

5.7.2 Functional Unit Performance

Figure 5.12 documents the performance of some of the basic operations for an Intel Pentium III. These timings are typical for other processors as well.
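The two cycle counts defined in the caption of Figure 5.12 can be combined into a rough timing estimate. The sketch below is only an illustration under simplifying assumptions that are not part of the text: a single functional unit handles all the operations, nothing else in the processor limits performance, and the function names `independent_cycles` and `dependent_cycles` are hypothetical. It contrasts n independent operations, which can start one issue time apart, with a chain of n operations in which each needs the previous result and so must wait out the full latency.

```python
# Latency and issue time, in cycles, copied from Figure 5.12.
FIGURE_5_12 = {
    "integer add": (1, 1),
    "integer multiply": (4, 1),
    "integer divide": (36, 36),
    "floating-point add": (3, 1),
    "floating-point multiply": (5, 2),
    "floating-point divide": (38, 38),
    "load (cache hit)": (3, 1),
    "store (cache hit)": (3, 1),
}


def independent_cycles(op, n):
    """Estimated cycles for n independent operations on one unit.

    Assumption (not from the text): a new operation starts every
    issue_time cycles, and the last one finishes latency cycles
    after it starts.
    """
    latency, issue_time = FIGURE_5_12[op]
    if n <= 0:
        return 0
    return (n - 1) * issue_time + latency


def dependent_cycles(op, n):
    """Estimated cycles for a chain of n operations, each of which
    uses the result of the one before it, so none can overlap."""
    latency, _ = FIGURE_5_12[op]
    return max(n, 0) * latency
```

Under these assumptions, ten floating-point multiplications take an estimated 23 cycles when they are independent and 50 cycles when each depends on the previous one. For integer division, where the issue time equals the latency, both estimates are 360 cycles.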
Each operation is characterized by two cycle counts: the latency, indicating the total number of cycles the functional unit requires to complete the operation; and the issue time, indicating the number of cycles between successive, independent operations. The latencies range from one cycle for basic integer operations, to several cycles for loads, stores, integer multiplication, and the more common floating-point operations, to many cycles for division and other complex operations.

As the third column in Figure 5.12 shows, several functional units of the processor are pipelined, meaning that they can start on a new operation before the previous one is completed. The issue time indicates the number of cycles between successive operations for the unit. In a pipelined unit, the issue time is smaller than the latency. A pipelined functional unit is implemented as a series of stages, each of which performs part of the operation. For example, a typical floating-point adder contains three stages: one to process the exponent values, one to add the fractions, and one to round the final result. The operations can proceed