551-12-16-2002

Course: EECC 551, Fall 2009
School: RIT
Reduction of Data Hazard Stalls with Dynamic Scheduling

So far we have dealt with data hazards in instruction pipelines by:
- Result forwarding and bypassing, to reduce latency and hide or reduce the effect of true data dependence.
- Hazard detection hardware that stalls the pipeline, starting with the instruction that uses the result.
- Compiler-based static pipeline scheduling, which separates the dependent instructions to minimize actual hazards and stalls in the scheduled code.

Dynamic scheduling:
- Uses a hardware-based mechanism to rearrange instruction execution order at runtime to reduce stalls.
- Enables handling some cases where dependencies are unknown at compile time.
- Like the other pipeline optimizations above, a dynamically scheduled processor cannot remove true data dependencies; it only tries to avoid or reduce stalling.

(In Appendix A.8, Chapter 3.2)
EECC551 - Shaaban #1 lec # 4 Winter 2002 12-16-2002

Dynamic Pipeline Scheduling: The Concept

Dynamic pipeline scheduling overcomes the limitations of in-order execution by allowing out-of-order instruction execution. Instructions are allowed to start executing out of order as soon as their operands are available.

Example:
    DIVD  F0, F2, F4
    ADDD  F10, F0, F8
    SUBD  F12, F8, F14
- With in-order execution, SUBD must wait for DIVD to complete (DIVD stalls ADDD, which in turn blocks SUBD) before starting execution.
- With out-of-order execution, SUBD can start as soon as the values of its operands F8 and F14 are available.

This implies allowing out-of-order instruction commit (completion), which may lead to imprecise exceptions if an instruction issued earlier raises an exception. This is similar to pipelines with multi-cycle floating-point units.

Dynamic Pipeline Scheduling

Dynamic instruction scheduling is accomplished by:
- Dividing the Instruction Decode (ID) stage into two stages:
  - Issue: Decode instructions, check for structural hazards.
  - Read operands: Wait until data hazard conditions, if any, are resolved, then read operands when available.
  (All instructions pass through the issue stage in order, but they can be stalled or pass each other in the read operands stage.)
- In the instruction fetch (IF) stage, fetch an additional instruction every cycle into a latch, or several instructions into an instruction queue.
- Increase the number of functional units to meet the demands of the additional instructions in their EX stage.

Two dynamic scheduling approaches exist:
- Dynamic scheduling with a scoreboard, used first in the CDC 6600.
- The Tomasulo approach, pioneered by the IBM 360/91.

Dynamic Scheduling With A Scoreboard

- The scoreboard is a hardware mechanism that maintains an execution rate of one instruction per cycle by executing an instruction as soon as its operands are available and no hazard conditions prevent it.
- It replaces ID, EX, WB with four stages: ID1, ID2, EX, WB.
- Every instruction goes through the scoreboard, where a record of data dependencies is constructed (this corresponds to instruction issue).
- A system with a scoreboard is assumed to have several functional units, with their status information reported to the scoreboard.
- If the scoreboard determines that an instruction cannot execute immediately, it executes another waiting instruction, keeps monitoring the status of the hardware units, and decides when the instruction can proceed to execute.
- The scoreboard also decides when an instruction can write its results to registers (hazard detection and resolution are centralized in the scoreboard).
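The DIVD / ADDD / SUBD example from the concept slide can be replayed in a few lines of Python. This is only a minimal sketch under stated assumptions: the names `Instr`, `PROGRAM`, and `startable` are hypothetical, only RAW dependences are modeled, and functional-unit availability and latencies are ignored.

```python
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


# The three-instruction example from the slides.
PROGRAM = [
    Instr("DIVD", "F0", ("F2", "F4")),
    Instr("ADDD", "F10", ("F0", "F8")),
    Instr("SUBD", "F12", ("F8", "F14")),
]


def startable(program: List[Instr], started: Set[int], finished: Set[int],
              in_order: bool) -> List[str]:
    """Return the ops that may begin execution now (RAW hazards only)."""
    ready = []
    for i, ins in enumerate(program):
        if i in started:
            continue
        # Registers that an earlier, not-yet-finished instruction will write.
        pending = {program[j].dest for j in range(i) if j not in finished}
        if pending.isdisjoint(ins.srcs):
            ready.append(ins.op)
        elif in_order:
            break  # in-order: everything behind the stalled instruction waits
    return ready


# DIVD has started but not finished; ADDD needs its result in F0.
print(startable(PROGRAM, started={0}, finished=set(), in_order=True))   # []
print(startable(PROGRAM, started={0}, finished=set(), in_order=False))  # ['SUBD']
```

With in-order execution nothing can start while ADDD waits on F0. With out-of-order execution SUBD proceeds, because neither F8 nor F14 is produced by a pending instruction.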
(In Appendix A.8)

[Figure: The basic structure of a MIPS processor with a scoreboard]

Instruction Execution Stages with A Scoreboard

1. Issue (ID1): If a functional unit for the instruction is available, the scoreboard issues the instruction to the functional unit and updates its internal data structure; structural and WAW hazards are resolved here. (This replaces part of the ID stage in the conventional MIPS pipeline.)
2. Read operands (ID2): The scoreboard monitors the availability of the source operands. A source operand is available when no earlier active instruction will write it. When all source operands are available, the scoreboard tells the functional unit to read all operands from the registers (no forwarding supported) and start execution (RAW hazards are resolved here dynamically). This completes ID.
3. Execution (EX): The functional unit starts execution upon receiving operands. When the results are ready, it notifies the scoreboard (replaces EX, MEM in MIPS).
4. Write result (WB): Once the scoreboard senses that a functional unit has completed execution, it checks for WAR hazards and stalls the completing instruction if needed; otherwise the write back is completed.

Three Parts of the Scoreboard

1. Instruction status: Which of the 4 steps the instruction is in.
2. Functional unit status: Indicates the state of the functional unit (FU). Nine fields for each functional unit:
   Busy     Indicates whether the unit is busy or not
   Op       Operation to perform in the unit (e.g., + or -)
   Fi       Destination register
   Fj, Fk   Source-register numbers
   Qj, Qk   Functional units producing source registers Fj, Fk
   Rj, Rk   Flags indicating when Fj, Fk are ready (set to Yes after the operand is available to read)
3. Register result status: Indicates which functional unit will write to each register, if one exists.
   Blank when no pending instructions will write that register.

[Table: The Scoreboard: Detailed Pipeline Control]

A Scoreboard Example

The following code is run on the MIPS with a scoreboard given earlier, with:

Functional Unit (FU)       # of FUs   EX Latency
Integer                    1          0
Floating Point Multiply    2          10
Floating Point Add         1          2
Floating Point Divide      1          40

None of the functional units are pipelined.

    L.D    F6, 34(R2)
    L.D    F2, 45(R3)
    MUL.D  F0, F2, F4
    SUB.D  F8, F6, F2
    DIV.D  F10, F0, F6
    ADD.D  F6, F8, F2

Dependences marked on the slide: real data dependence (RAW), anti-dependence (WAR), output dependence (WAW).

Scoreboard Example: Cycle 1
FP Latency: Add = 2 cycles, Multiply = 10, Divide = 40

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Functional unit status:
Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = Yes
Mult1, Mult2, Add, Divide: Busy = No

Register result status (Clock 1): F6 = Integer

Scoreboard Example: Cycle 2

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Functional unit status:
Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = Yes
Mult1, Mult2, Add, Divide: Busy = No

Register result status (Clock 2): F6 = Integer

Issue second L.D?
No, stall on structural hazard.

Scoreboard Example: Cycle 3

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Functional unit status:
Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = Yes
Mult1, Mult2, Add, Divide: Busy = No

Register result status (Clock 3): F6 = Integer

Issue MUL.D? No, stall on structural hazard.

Scoreboard Example: Cycle 4

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Functional unit status:
Integer: Busy = Yes, Op = Load, Fi = F6, Fk = R2, Rk = Yes
Mult1, Mult2, Add, Divide: Busy = No

Register result status (Clock 4): F6 = Integer

Scoreboard Example: Cycle 5

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Functional unit status:
Integer: Busy = Yes, Op = Load, Fi = F2, Fk = R3, Rk = Yes
Mult1, Mult2, Add, Divide: Busy = No

Register result status (Clock 5): F2 = Integer
Scoreboard Example: Cycle 6

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6
MUL.D F0, F2, F4     6
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Functional unit status:
Integer: Busy = Yes, Op = Load, Fi = F2, Fk = R3, Rk = Yes
Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = No, Rk = Yes
Mult2, Add, Divide: Busy = No

Register result status (Clock 6): F0 = Mult1, F2 = Integer

Scoreboard Example: Cycle 7

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7
MUL.D F0, F2, F4     6
SUB.D F8, F6, F2     7
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Functional unit status:
Integer: Busy = Yes, Op = Load, Fi = F2, Fk = R3, Rk = Yes
Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = No, Rk = Yes
Add: Busy = Yes, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Qk = Integer, Rj = Yes, Rk = No
Mult2, Divide: Busy = No

Register result status (Clock 7): F0 = Mult1, F2 = Integer, F8 = Add

Read multiply operands?

Scoreboard Example: Cycle 8a (First half of cycle 8)

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7
MUL.D F0, F2, F4     6
SUB.D F8, F6, F2     7
DIV.D F10, F0, F6    8
ADD.D F6, F8, F2

Functional unit status:
Integer: Busy = Yes, Op = Load, Fi = F2, Fk = R3, Rk = Yes
Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = No, Rk = Yes
Add: Busy = Yes, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Qk = Integer, Rj = Yes, Rk = No
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Mult2: Busy = No

Register result status (Clock 8): F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide
Scoreboard Example: Cycle 8b (Second half of cycle 8)

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6
SUB.D F8, F6, F2     7
DIV.D F10, F0, F6    8
ADD.D F6, F8, F2

Functional unit status:
Mult1: Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Yes, Rk = Yes
Add: Busy = Yes, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Rj = Yes, Rk = Yes
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Integer, Mult2: Busy = No

Register result status (Clock 8): F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 9
FP Latency: Add = 2 cycles, Multiply = 10, Divide = 40

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9
SUB.D F8, F6, F2     7      9
DIV.D F10, F0, F6    8
ADD.D F6, F8, F2

Functional unit status:
Mult1: Time = 10, Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Yes, Rk = Yes
Add: Time = 2, Busy = Yes, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Rj = Yes, Rk = Yes
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Integer, Mult2: Busy = No

Register result status (Clock 9): F0 = Mult1, F8 = Add, F10 = Divide

Read operands for MUL.D & SUB.D? Issue ADD.D?

Scoreboard Example: Cycle 11

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9
SUB.D F8, F6, F2     7      9              11
DIV.D F10, F0, F6    8
ADD.D F6, F8, F2

Functional unit status:
Mult1: Time = 8, Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Yes, Rk = Yes
Add: Time = 0, Busy = Yes, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Rj = Yes, Rk = Yes
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Integer, Mult2: Busy = No

Register result status (Clock 11): F0 = Mult1, F8 = Add, F10 = Divide
Scoreboard Example: Cycle 12

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8
ADD.D F6, F8, F2

Functional unit status:
Mult1: Time = 7, Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Yes, Rk = Yes
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Integer, Mult2, Add: Busy = No

Register result status (Clock 12): F0 = Mult1, F10 = Divide

Read operands for DIV.D?

Scoreboard Example: Cycle 13

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8
ADD.D F6, F8, F2     13

Functional unit status:
Mult1: Time = 6, Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Yes, Rk = Yes
Add: Busy = Yes, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = Yes, Rk = Yes
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Integer, Mult2: Busy = No

Register result status (Clock 13): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 17

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8
ADD.D F6, F8, F2     13     14             16

Functional unit status:
Mult1: Time = 2, Busy = Yes, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Yes, Rk = Yes
Add: Busy = Yes, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = Yes, Rk = Yes
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes
Integer, Mult2: Busy = No

Register result status (Clock 17): F0 = Mult1, F6 = Add, F10 = Divide

Write result of ADD.D?
No, WAR hazard.

Scoreboard Example: Cycle 20

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8
ADD.D F6, F8, F2     13     14             16

Functional unit status:
Add: Busy = Yes, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = Yes, Rk = Yes
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Rj = Yes, Rk = Yes
Integer, Mult1, Mult2: Busy = No

Register result status (Clock 20): F6 = Add, F10 = Divide

Scoreboard Example: Cycle 21

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      21
ADD.D F6, F8, F2     13     14             16

Functional unit status:
Add: Busy = Yes, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = Yes, Rk = Yes
Divide: Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Rj = Yes, Rk = Yes
Integer, Mult1, Mult2: Busy = No

Register result status (Clock 21): F6 = Add, F10 = Divide

Scoreboard Example: Cycle 22

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      21
ADD.D F6, F8, F2     13     14             16                  22

Functional unit status:
Divide: Time = 40, Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Rj = Yes, Rk = Yes
Integer, Mult1, Mult2, Add: Busy = No

Register result status (Clock 22): F10 = Divide
Scoreboard Example: Cycle 61

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      21             61
ADD.D F6, F8, F2     13     14             16                  22

Functional unit status:
Divide: Time = 0, Busy = Yes, Op = Div, Fi = F10, Fj = F0, Fk = F6, Rj = Yes, Rk = Yes
Integer, Mult1, Mult2, Add: Busy = No

Register result status (Clock 61): F10 = Divide

Scoreboard Example: Cycle 62

Instruction status:
Instruction          Issue  Read operands  Execution complete  Write result
L.D   F6, 34(R2)     1      2              3                   4
L.D   F2, 45(R3)     5      6              7                   8
MUL.D F0, F2, F4     6      9              19                  20
SUB.D F8, F6, F2     7      9              11                  12
DIV.D F10, F0, F6    8      21             61                  62
ADD.D F6, F8, F2     13     14             16                  22

Functional unit status:
Integer, Mult1, Mult2, Add, Divide: Busy = No

Register result status (Clock 62): all registers blank

Instruction block done. We have: in-order issue, out-of-order execute and commit.

Dynamic Scheduling: The Tomasulo Algorithm

- Developed at IBM and first implemented in IBM's 360/91 mainframe in 1966, about 3 years after the debut of the scoreboard in the CDC 6600.
- Dynamically schedules the pipeline in hardware to reduce stalls.
- Differences between the IBM 360 and CDC 6600 ISAs:
  - IBM has only 2 register specifiers per instruction vs. 3 in the CDC 6600.
  - IBM has 4 FP registers vs. 8 in the CDC 6600.
- Current CPU architectures that can be considered descendants of the IBM 360/91, and that implement a variation of the Tomasulo algorithm, include:
  - RISC CPUs: Alpha 21264, HP 8600, MIPS R12000, PowerPC G4
  - RISC-core x86 CPUs: AMD Athlon, Pentium III, Pentium 4, Xeon

(In Chapter 3.2)

Tomasulo Algorithm Vs. Scoreboard

- Control and buffers are distributed with the Function Units (FU) vs. centralized in the scoreboard:
  - FU buffers are called "reservation stations"; they hold pending instructions, operands, and other instruction status info.
  - Reservation stations are sometimes referred to as "physical registers" or "renaming registers", as opposed to the architecture registers specified by the ISA.
- ISA registers in instructions are replaced by either values (if available) or pointers to reservation stations (RS) that will supply the value later:
  - This process is called register renaming.
  - Avoids WAR, WAW hazards.
  - Allows for hardware-based loop unrolling.
  - More reservation stations than ISA registers are possible, leading to optimizations that compilers can't achieve and preventing the number of ISA registers from becoming a bottleneck.
- Instruction results go (are forwarded) to FUs from RSs, not through registers, over a Common Data Bus (CDB) that broadcasts results to all FUs.
- Loads and stores are treated as FUs with RSs as well.
- Integer instructions can go past branches, allowing FP ops beyond the basic block in the FP queue.

IBM 360/91 Tomasulo-based Vs. CDC 6600 Scoreboard-based

- Functional units: pipelined (6 load, 3 store, 3 +, 2 x/÷) vs. multiple functional units, not pipelined (1 load/store, 1 +, 2 x, 1 ÷).
- Window size: 14 instructions vs. 5 instructions.
- Structural hazard: no issue vs. same.
- WAW: renaming avoids it vs. stall issue.
- WAR: renaming avoids it vs. stall completion.
- Results: broadcast from FU (implements forwarding) vs. write/read registers (forwarding not supported).
- Control: reservation stations vs. central scoreboard.

Dynamic Scheduling: The Tomasulo Approach

[Figure: The basic structure of a MIPS floating-point unit using Tomasulo's algorithm]

Reservation Station Fields

Op       Operation to perform in the unit (e.g., + or -)
Vj, Vk   Value of source operands S1 and S2. Store buffers have a single V field indicating the result to be stored.
Qj, Qk   Reservation stations producing source registers (value to be written). No ready flags as in the scoreboard; Qj, Qk = 0 => ready. Store buffers only have Qi for the RS producing the result.
A        Address information for loads or stores. Initially the immediate field of the instruction, then the effective address when calculated.
Busy     Indicates reservation station and FU are busy.

Register result status: Qi indicates which functional unit will write each register, if one exists. Blank (or 0) when no pending instructions exist that will write to that register.

Three Stages of Tomasulo Algorithm

1. Issue: Get instruction from the pending Instruction Queue.
   - Instruction is issued to a free reservation station (no structural hazard).
   - Selected RS is marked busy.
   - Control sends available instruction operand values (from ISA registers) to the assigned RS.
   - Operands not available yet are renamed to the RSs that will produce the operand (register renaming).
2. Execution (EX): Operate on operands.
   - When both operands are ready, start executing on the assigned FU.
   - If not all operands are ready, watch the Common Data Bus (CDB) for the needed result (forwarding is done via the CDB).
3. Write result (WB): Finish execution.
   - Write result on the Common Data Bus to all awaiting units.
   - Mark reservation station as available.

- Normal data bus: data + destination ("go to" bus).
- Common Data Bus (CDB): data + source ("come from" bus):
  - 64 bits for data + 4 bits for Functional Unit source address.
  - Write data to a waiting RS if the source matches the expected RS (that produces the result).
  - Does the result forwarding via broadcast to waiting RSs.

[Table: Steps in The Tomasulo Approach and The Requirements of Each Step]

Drawbacks of The Tomasulo Approach

- Implementation complexity:
  - Example: The implementation of the Tomasulo algorithm may have caused delays in the introduction of the 360/91, MIPS 10000, and IBM 620, among other CPUs.
- Many high-speed associative result stores using the CDB are required.
- Performance is limited by the Common Data Bus.
  - Possible solution: multiple CDBs; more Functional Unit and RS logic is then needed for parallel associative stores.

Tomasulo Approach Example

Using the same code used in the scoreboard example, to be run on the Tomasulo configuration given earlier:

Functional unit                    # of RSs   EX Latency
Integer                            1          0
Floating Point Multiply/Divide     2          10/40
Floating Point Add                 3          2

Pipelined functional units.

    L.D    F6, 34(R2)
    L.D    F2, 45(R3)
    MUL.D  F0, F2, F4
    SUB.D  F8, F6, F2
    DIV.D  F10, F0, F6
    ADD.D  F6, F8, F2

Dependences marked on the slide: real data dependence (RAW), anti-dependence (WAR), output dependence (WAW).

Tomasulo Example: Cycle 0

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Add1, Add2, Add3, Mult1, Mult2: Busy = No

Register result status (Clock 0): all registers blank

Tomasulo Example: Cycle 1

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1
L.D   F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Load buffers:
Load1: Busy = Yes, Address = 34+R2
Load2, Load3: Busy = No

Reservation stations:
Add1, Add2, Add3, Mult1, Mult2: Busy = No

Register result status (Clock 1): F6 = Load1

Tomasulo Example: Cycle 2

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1
L.D   F2, 45(R3)     2
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Load buffers:
Load1: Busy = Yes, Address = 34+R2
Load2: Busy = Yes, Address = 45+R3
Load3: Busy = No

Reservation stations:
Add1, Add2, Add3, Mult1, Mult2: Busy = No

Register result status (Clock 2): F2 = Load2, F6 = Load1
Note: Unlike the 6600, multiple loads can be outstanding.

Tomasulo Example: Cycle 3

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3
L.D   F2, 45(R3)     2
MUL.D F0, F2, F4     3
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Load buffers:
Load1: Busy = Yes, Address = 34+R2
Load2: Busy = Yes, Address = 45+R3
Load3: Busy = No

Reservation stations:
Mult1: Busy = Yes, Op = MULTD, Vk = R(F4), Qj = Load2
Add1, Add2, Add3, Mult2: Busy = No

Register result status (Clock 3): F0 = Mult1, F2 = Load2, F6 = Load1

Tomasulo Example: Cycle 4

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4
MUL.D F0, F2, F4     3
SUB.D F8, F6, F2     4
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Load buffers:
Load2: Busy = Yes, Address = 45+R3
Load1, Load3: Busy = No

Reservation stations:
Add1: Busy = Yes, Op = SUBD, Vj = M(34+R2), Qk = Load2
Mult1: Busy = Yes, Op = MULTD, Vk = R(F4), Qj = Load2
Add2, Add3, Mult2: Busy = No

Register result status (Clock 4): F0 = Mult1, F2 = Load2, F6 = M(34+R2), F8 = Add1

Load2 completing; what is waiting for it?

Tomasulo Example: Cycle 5

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4                   5
MUL.D F0, F2, F4     3
SUB.D F8, F6, F2     4
DIV.D F10, F0, F6    5
ADD.D F6, F8, F2

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Add1: Time = 2, Busy = Yes, Op = SUBD, Vj = M(34+R2), Vk = M(45+R3)
Mult1: Time = 10, Busy = Yes, Op = MULTD, Vj = M(45+R3), Vk = R(F4)
Mult2: Busy = Yes, Op = DIVD, Vk = M(34+R2), Qj = Mult1
Add2, Add3: Busy = No

Register result status (Clock 5): F0 = Mult1, F2 = M(45+R3), F6 = M(34+R2), F8 = Add1, F10 = Mult2
Tomasulo Example: Cycle 6

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4                   5
MUL.D F0, F2, F4     3
SUB.D F8, F6, F2     4
DIV.D F10, F0, F6    5
ADD.D F6, F8, F2     6

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Add1: Time = 1, Busy = Yes, Op = SUBD, Vj = M(34+R2), Vk = M(45+R3)
Add2: Busy = Yes, Op = ADDD, Vk = M(45+R3), Qj = Add1
Mult1: Time = 9, Busy = Yes, Op = MULTD, Vj = M(45+R3), Vk = R(F4)
Mult2: Busy = Yes, Op = DIVD, Vk = M(34+R2), Qj = Mult1
Add3: Busy = No

Register result status (Clock 6): F0 = Mult1, F2 = M(45+R3), F6 = Add2, F8 = Add1, F10 = Mult2

ADD.D is issued here, unlike in the scoreboard example.

Tomasulo Example: Cycle 7

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4                   5
MUL.D F0, F2, F4     3
SUB.D F8, F6, F2     4      7
DIV.D F10, F0, F6    5
ADD.D F6, F8, F2     6

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Add1: Time = 0, Busy = Yes, Op = SUBD, Vj = M(34+R2), Vk = M(45+R3)
Add2: Busy = Yes, Op = ADDD, Vk = M(45+R3), Qj = Add1
Mult1: Time = 8, Busy = Yes, Op = MULTD, Vj = M(45+R3), Vk = R(F4)
Mult2: Busy = Yes, Op = DIVD, Vk = M(34+R2), Qj = Mult1
Add3: Busy = No

Register result status (Clock 7): F0 = Mult1, F2 = M(45+R3), F6 = Add2, F8 = Add1, F10 = Mult2

RS Add1 completing; what is waiting for it?

Tomasulo Example: Cycle 10

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4                   5
MUL.D F0, F2, F4     3
SUB.D F8, F6, F2     4      7                   8
DIV.D F10, F0, F6    5
ADD.D F6, F8, F2     6      10

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Add2: Time = 0, Busy = Yes, Op = ADDD, Vj = M()-M(), Vk = M(45+R3)
Mult1: Time = 5, Busy = Yes, Op = MULTD, Vj = M(45+R3), Vk = R(F4)
Mult2: Busy = Yes, Op = DIVD, Vk = M(34+R2), Qj = Mult1
Add1, Add3: Busy = No

Register result status (Clock 10): F0 = Mult1, F2 = M(45+R3), F6 = Add2, F8 = M()-M(), F10 = Mult2

RS Add2 completing; what is waiting for it?
Tomasulo Example: Cycle 11

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4                   5
MUL.D F0, F2, F4     3
SUB.D F8, F6, F2     4      7                   8
DIV.D F10, F0, F6    5
ADD.D F6, F8, F2     6      10                  11

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Mult1: Time = 4, Busy = Yes, Op = MULTD, Vj = M(45+R3), Vk = R(F4)
Mult2: Busy = Yes, Op = DIVD, Vk = M(34+R2), Qj = Mult1
Add1, Add2, Add3: Busy = No

Register result status (Clock 11): F0 = Mult1, F2 = M(45+R3), F6 = (M-M)+M(), F8 = M()-M(), F10 = Mult2

Write back result of ADD.D in this cycle.

Tomasulo Example: Cycle 15

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4                   5
MUL.D F0, F2, F4     3      15
SUB.D F8, F6, F2     4      7                   8
DIV.D F10, F0, F6    5
ADD.D F6, F8, F2     6      10                  11

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Mult1: Time = 0, Busy = Yes, Op = MULTD, Vj = M(45+R3), Vk = R(F4)
Mult2: Busy = Yes, Op = DIVD, Vk = M(34+R2), Qj = Mult1
Add1, Add2, Add3: Busy = No

Register result status (Clock 15): F0 = Mult1, F2 = M(45+R3), F6 = (M-M)+M(), F8 = M()-M(), F10 = Mult2

Mult1 completing; what is waiting for it?

Tomasulo Example: Cycle 16

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4                   5
MUL.D F0, F2, F4     3      15                  16
SUB.D F8, F6, F2     4      7                   8
DIV.D F10, F0, F6    5
ADD.D F6, F8, F2     6      10                  11

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Mult2: Time = 40, Busy = Yes, Op = DIVD, Vj = M*F4, Vk = M(34+R2)
Add1, Add2, Add3, Mult1: Busy = No

Register result status (Clock 16): F0 = M*F4, F2 = M(45+R3), F6 = (M-M)+M(), F8 = M()-M(), F10 = Mult2
Only the Divide instruction remains.

Tomasulo Example: Cycle 57 (vs 62 cycles for scoreboard)

Instruction status:
Instruction          Issue  Execution complete  Write result
L.D   F6, 34(R2)     1      3                   4
L.D   F2, 45(R3)     2      4                   5
MUL.D F0, F2, F4     3      15                  16
SUB.D F8, F6, F2     4      7                   8
DIV.D F10, F0, F6    5      56                  57
ADD.D F6, F8, F2     6      10                  11

Load buffers:
Load1, Load2, Load3: Busy = No

Reservation stations:
Add1, Add2, Add3, Mult1, Mult2: Busy = No

Register result status (Clock 57): F0 = M*F4, F2 = M(45+R3), F6 = (M-M)+M(), F8 = M()-M(), F10 = M*F4/M

Instruction block done. Again we have: in-order issue, out-of-order execution and completion.

Tomasulo Loop Example

    Loop: L.D     F0, 0(R1)
          MUL.D   F4, F0, F2
          S.D     F4, 0(R1)
          DADDUI  R1, R1, #-8
          BNE     R1, R2, Loop    ; branch if R1 != R2

- Assume Multiply takes 4 clocks.
- Assume the first load takes 8 clocks (possibly due to a cache miss) and the second load takes 4 clocks (hit).
- Assume R1 = 80 initially.
- Assume the branch is predicted taken.
- We'll go over the execution of the first two loop iterations.

Loop Example Cycle 0

Instruction status:
Iteration  Instruction          Issue  Execution complete  Write result
1          L.D   F0, 0(R1)
1          MUL.D F4, F0, F2
1          S.D   F4, 0(R1)
2          L.D   F0, 0(R1)
2          MUL.D F4, F0, F2
2          S.D   F4, 0(R1)

Load buffers:
Load1, Load2, Load3: Busy = No

Store buffers:
Store1, Store2, Store3: Busy = No

Reservation stations:
Add1, Add2, Add3, Mult1, Mult2: Busy = No

Register result status (Clock 0, R1 = 80): all registers blank
F30 EECC551 - Shaaban #52 lec # 4 Winter 2002 12-16-2002 Loop Example Cycle 1 Instruction status Instruction j L.D F0 0 MUL.D F4 F0 S.D F4 0 L.D F0 0 MUL.D F4 F0 S.D F4 0 Reservation Stations Time Name 0 Add1 0 Add2 0 Add3 0 Mult1 0 Mult2 Register result status R1 Clock 1 80 k R1 F2 R1 R1 F2 R1 iteration 1 1 1 2 2 2 Issue 1 Execution Write complete Result Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: L.D MUL.D S.D DADDUI BNE Busy Yes No No No No No Address 80 Qi Busy Op No No No No No S1 Vj S2 Vk F0, 0(R1) F4,F0,F2 F4, 0(R1) R1, R1, #-8 R1,R2,loop F0 Qi Load1 F2 F4 F6 F8 F10 F12 ... F30 EECC551 - Shaaban #53 lec # 4 Winter 2002 12-16-2002 Loop Example Cycle 2 Instruction status Instruction j L.D F0 0 MUL.D F4 F0 S.D F4 0 L.D F0 0 MUL.D F4 F0 S.D F4 0 Reservation Stations Time Name 0 Add1 0 Add2 0 Add3 0 Mult1 0 Mult2 Register result status R1 Clock 2 80 k R1 F2 R1 R1 F2 R1 iteration 1 1 1 2 2 2 Issue 1 2 Execution Write complete Result Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: L.D MUL.D S.D DADDUI Load1 BNE Busy Yes No No No No No Address 80 Qi Busy Op No No No Yes MULTD No S1 Vj S2 Vk R(F2) F0, 0(R1) F4,F0,F2 F4, 0(R1) R1, R1, #-8 R1,R2,loop F0 Qi Load1 F2 F4 Mult1 F6 F8 F10 F12 ... F30 EECC551 - Shaaban #54 lec # 4 Winter 2002 12-16-2002 Loop Example Cycle 3 Instruction status Instruction j L.D F0 0 MUL.D F4 F0 S.D F4 0 L.D F0 0 MUL.D F4 F0 S.D F4 0 Reservation Stations Time Name 0 Add1 0 Add2 0 Add3 0 Mult1 0 Mult2 Register result status R1 Clock 3 80 k R1 F2 R1 R1 F2 R1 iteration 1 1 1 2 2 2 Issue 1 2 3 Execution Write complete Result Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: L.D MUL.D S.D DADDUI Load1 BNE Busy Address Yes 80 No No Qi Yes 80 Mult1 No No Busy Op No No No Yes MULTD No S1 Vj S2 Vk R(F2) F0, 0(R1) F4,F0,F2 F4, 0(R1) R1, R1, #-8 R1,R2,loop F0 Qi Load1 F2 F4 Mult1 F6 F8 F10 F12 ... 
F30 EECC551 - Shaaban #55 lec # 4 Winter 2002 12-16-2002 Loop Example Cycle 4 Instruction status Instruction j L.D F0 0 MUL.D F4 F0 S.D F4 0 L.D F0 0 MUL.D F4 F0 S.D F4 0 Reservation Stations Time Name 0 Add1 0 Add2 0 Add3 0 Mult1 0 Mult2 Register result status k R1 F2 R1 R1 F2 R1 iteration 1 1 1 2 2 2 Issue 1 2 3 Execution Write complete Result Load1 Load2 Load3 Store1 80 Store2 Store3 RS for j RS for k Qj Qk Code: L.D MUL.D S.D Load1 DADDUI BNE Busy Yes No No Yes No No Address 80 Qi Mult1 Busy Op No No No Yes MULTD No S1 Vj S2 Vk R(F2) F0, 0(R1) F4,F0,F2 F4, 0(R1) R1, R1, #-8 R1,R2,loop Clock 4 R1 72 F0 Qi Load1 F2 F4 Mult1 F6 F8 F10 F12 ... F30 EECC551 - Shaaban #56 lec # 4 Winter 2002 12-16-2002 Loop Example Cycle 5 Instruction status Instruction j L.D F0 0 MUL.D F4 F0 S.D F4 0 L.D F0 0 MUL.D F4 F0 S.D F4 0 Reservation Stations Time Name 0 Add1 0 Add2 0 Add3 0 Mult1 0 Mult2 Register result status R1 Clock 5 72 k R1 F2 R1 R1 F2 R1 iteration 1 1 1 2 2 2 Issue 1 2 3 Execution Write complete Result Load1 Load2 Load3 Store1 80 Store2 Store3 RS for j RS for k Qj Qk Code: L.D MUL.D S.D Load1 DADDUI BNE Busy Yes No No Yes No No Address 80 Qi Mult1 Busy Op No No No Yes MULTD No S1 Vj S2 Vk R(F2) F0, 0(R1) F4,F0,F2 F4, 0(R1) R1, R1, #-8 R1,R2,loop F0 Qi Load1 F2 F4 Mult1 F6 F8 F10 F12 ... F30 EECC551 - Shaaban #57 lec # 4 Winter 2002 12-16-2002 Loop Example Cycle 6 Instruction status Instruction j L.D F0 0 MUL.D F4 F0 S.D F4 0 L.D F0 0 MUL.D F4 F0 S.D F4 0 Reservation Stations Time Name 0 Add1 0 Add2 0 Add3 0 Mult1 0 Mult2 Register result status R1 Clock 6 72 k R1 F2 R1 R1 F2 R1 iteration 1 1 1 2 2 2 Issue 1 2 3 6 Execution Write complete Result Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: L.D MUL.D S.D Load1 DADDUI BNE Busy Yes Yes No Yes No No Address 80 72 Qi 80 Mult1 Busy Op No No No Yes MULTD No S1 Vj S2 Vk R(F2) F0, 0(R1) F4,F0,F2 F4, 0(R1) R1, R1, #-8 R1,R2,loop F0 Qi Load2 F2 F4 Mult1 F6 F8 F10 F12 ... 
F30 Note: F0 never sees Load1 result EECC551 - Shaaban #58 lec # 4 Winter 2002 12-16-2002 Loop Example Cycle 7 Instruction status Instruction j L.D F0 0 MUL.D F4 F0 S.D F4 0 L.D F0 0 MUL.D F4 F0 S.D F4 0 Reservation Stations Time Name 0 Add1 0 Add2 0 Add3 0 Mult1 0 Mult2 Register result status k R1 F2 R1 R1 F2 R1 iteration 1 1 1 2 2 2 Issue 1 2 3 6 7 S1 Vj Execution Write complete Result Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: L.D MUL.D S.D Load1 DADDUI Load2 BNE Busy Yes Yes No Yes No No Address 80 72 Qi 80 Mult1 Busy Op No No No Yes MULTD Yes MULTD S2 Vk R(F2) R(F2) F0, 0(R1) F4,F0,F2 F4, 0(R1) R1, R1, #-8 R1,R2,loop Clock 7 R1 72 F0 Qi Load2 F2 F4 Mult2 F6 F8 F10 F12 ... F30 EECC551 - Shaaban #59 lec # 4 Winter 2002 12-16-2002 Loop Example Cycle 8 Instruction status Instruction j L.D F0 0 MUL.D F4 F0 S.D F4 0 L.D F0 0 MUL.D F4 F0 S.D F4 0 Reservation Stations Time Name 0 Add1 0 Add2 0 Add3 0 Mult1 0 Mult2 Register result status R1 Clock 8 72 k R1 F2 R1 R1 F2 R1 iteration 1 1 1 2 2 2 Issue 1 2 3 6 7 8 S1 Vj Execution Write complete Result Load1 Load2 Load3 Store1 Store2 Store3 RS for j RS for k Qj Qk Code: L.D MUL.D S.D DADDUI Load1 BNE Load2 Busy Yes Yes No Yes Yes No Address 80 72 Qi 80 Mult1 72 Mult2 Busy Op No No No Yes MULTD Yes MULTD S2 Vk R(F2) R(F2) F0, 0(R1) F4,F0,F2 F4, 0(R1) R1, R1, #-8 R1,R2,loop F0 Qi Load2 F2 F4 Mult2 F6 F8 F10 F12 ... F30 EECC551 - Shaaban #60 lec # 4 Winter...
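The tag bookkeeping in the loop example can be illustrated with a minimal Python sketch. It assumes the issue cycles and station names shown on the Cycle 1 to Cycle 8 slides; the names issue_all, LOOP, waits_on and reg_status are hypothetical and not part of the lecture. Because no result is written back during these cycles in the slides, the sketch never clears a tag. It only shows why F0 never sees the Load1 result: each instruction captures its source tags at issue, before its own destination register is renamed.

```python
# Hypothetical sketch of Tomasulo issue-time tag bookkeeping (not the lecture's code).
# Issue cycles and station names are copied from the loop example slides.

def issue_all(program):
    """Issue instructions in program order.

    Returns (waits_on, reg_status):
      waits_on   maps each station to the producer tags it captured at issue
                 (Qj/Qk for a reservation station, Qi for a store buffer);
      reg_status is the register result status after the last issue.
    """
    reg_status = {}  # register -> station that will produce its value
    waits_on = {}    # station -> tags captured when the instruction issued
    for _cycle, station, dest, sources in program:
        # Source tags are read BEFORE the destination is renamed.
        waits_on[station] = [reg_status[r] for r in sources if r in reg_status]
        if dest is not None:
            # Later readers of dest will wait on this station instead.
            reg_status[dest] = station
    return waits_on, reg_status

# (issue cycle, station, destination register, source registers)
LOOP = [
    (1, "Load1",  "F0", []),            # L.D   F0, 0(R1)   iteration 1
    (2, "Mult1",  "F4", ["F0", "F2"]),  # MUL.D F4, F0, F2
    (3, "Store1", None, ["F4"]),        # S.D   F4, 0(R1)
    (6, "Load2",  "F0", []),            # L.D   F0, 0(R1)   iteration 2
    (7, "Mult2",  "F4", ["F0", "F2"]),  # MUL.D F4, F0, F2
    (8, "Store2", None, ["F4"]),        # S.D   F4, 0(R1)
]

waits_on, reg_status = issue_all(LOOP)
for station, tags in waits_on.items():
    print(station, "waits on", tags or "nothing")
print("Register result status:", reg_status)
```

Under these assumptions Mult1 keeps waiting on Load1 and Mult2 on Load2, while the register result status for F0 ends up pointing at Load2, matching the Cycle 8 slide.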
