Lec13c - 1 COMP 4300 Computer Architecture...

Slide 1
COMP 4300 Computer Architecture
Instruction-level parallelism: Tomasulo (cont.)
Dr. Xiao Qin
Auburn University
http://www.eng.auburn.edu/~xqin
[email protected]
Fall, 2010

Slide 2: Tomasulo's Algorithm
• Tracks when operands for instructions are available.
  – Minimizes RAW hazards.
  – Introduces register renaming to minimize WAW and WAR hazards.
• Supports the overlapped execution of multiple iterations of a loop.

Slide 3: Tomasulo Organization
[Block diagram. Labeled components:]
• FP Op Queue
• FP Registers
• Load Buffers (from memory): Load1, Load2, Load3, Load4, Load5, Load6
• Store Buffers (to memory)
• Reservation Stations: Add1, Add2, Add3 (feeding the FP adders); Mult1, Mult2 (feeding the FP multipliers)
• Common Data Bus (CDB)
Normal data bus: data + destination
Common data bus: data + source

Slide 4: Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execute: operate on operands (EX)
   When both operands ready then execute; if not ready, watch Common Data Bus for result
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available
• Normal data bus: data + destination ("go to" bus)
• Common data bus: data + source ("come from" bus)
  – 64 bits of data + 4 bits of Functional Unit source address
  – Write if matches expected Functional Unit (produces result)
  – Does the broadcast
• Example speed: 3 clocks for FP +, -; 10 for *; 40 clocks for /

Slide 5: Reservation Station Components
Op: Operation to perform in the unit (e.g., + or –)
Vj, Vk: Value of source operands
  – Store buffers have V field, result to be stored
Qj, Qk: Reservation stations producing source registers (value to be written)
  – Note: Qj, Qk = 0 => ready
  – Store buffers only have Qi for RS producing result
Busy: Indicates reservation station or FU is busy
Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instructions that will write that register.

Slide 6: Tomasulo Example
Instruction status:
Instruction   j     k     Issue   Exec Comp   Write Result
LD    F6      34+   R2
LD    F2      45+   R3
MULTD F0      F2    F4
SUBD  F8      F6    F2
DIVD  F10     F0    F6
ADDD  F6      F8    F2

Load buffers:
Name    Busy   Address
Load1   No
Load2   No
Load3   No

Reservation Stations:
Time   Name    Busy   Op   Vj (S1)   Vk (S2)   Qj (RS)   Qk (RS)
       Add1    No
       Add2    No
       Add3    No
       Mult1   No
       Mult2   No

Register result status:
Clock        F0   F2   F4   F6   F8   F10   F12   ...   F30
        FU

Annotations on the slide: Clock cycle counter; FU count down; Instruction stream; 3 Load/Buffers; 3 FP Adder R.S.; 2 FP Mult R.S.

Slide 7: Scoreboard Example
Instruction status:
Instruction   j     k     Issue   Read operand   Execution complete   Write Result
LD    F6      34+   R2    1       2              3                    4
LD    F2      45+   R3    5       6              7
MULTD F0      F2    F4    6
SUBD  F8      F6    F2    7
DIVD  F10     F0    F6
ADDD  F6      F8    F2

Functional unit status:
dest   S1   S2   FU for j   FU for k   Fj?   Fk?...
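
To make the three stages (issue, execute, write result) and the reservation station fields (Op, Vj, Vk, Qj, Qk, Busy) more concrete, here is a minimal Python sketch. It is not taken from the lecture: the names `Machine`, `Station`, and `OPS` are hypothetical, the register values are arbitrary, both loads are assumed to have already completed, and latencies and load/store buffers are not modeled. `None` stands in for the slides' "Q = 0 => ready" convention.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical operation table covering only the FP ops in the example.
OPS = {
    "ADDD": lambda a, b: a + b,
    "SUBD": lambda a, b: a - b,
    "MULTD": lambda a, b: a * b,
    "DIVD": lambda a, b: a / b,
}


@dataclass
class Station:
    """One reservation station with the Op, Vj, Vk, Qj, Qk, Busy fields."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None  # None plays the role of "Q = 0" (operand ready)
    qk: Optional[str] = None


class Machine:
    def __init__(self, station_names, regs):
        self.rs: Dict[str, Station] = {n: Station(n) for n in station_names}
        self.regs: Dict[str, float] = dict(regs)
        self.status: Dict[str, str] = {}  # register result status: reg -> tag

    def _operand(self, reg: str) -> Tuple[Optional[float], Optional[str]]:
        """Return (value, None) if the register is current, else (None, tag)."""
        tag = self.status.get(reg)
        return (self.regs[reg], None) if tag is None else (None, tag)

    def issue(self, kind, op, dest, src1, src2) -> Optional[str]:
        """Issue: take a free station of this kind, or None (structural hazard)."""
        for st in self.rs.values():
            if st.name.startswith(kind) and not st.busy:
                # Read sources before renaming dest, so "op F6, F6, ..." works.
                st.vj, st.qj = self._operand(src1)
                st.vk, st.qk = self._operand(src2)
                st.busy, st.op = True, op
                self.status[dest] = st.name  # rename dest to the station tag
                return st.name
        return None

    def execute(self, name) -> Optional[float]:
        """Execute: compute only once both operands have arrived."""
        st = self.rs[name]
        if not st.busy or st.qj is not None or st.qk is not None:
            return None
        return OPS[st.op](st.vj, st.vk)

    def write_result(self, name, value) -> None:
        """Write result: broadcast (tag, value) on the CDB, free the station."""
        for st in self.rs.values():
            if st.qj == name:
                st.vj, st.qj = value, None
            if st.qk == name:
                st.vk, st.qk = value, None
        for reg in [r for r, tag in self.status.items() if tag == name]:
            self.regs[reg] = value
            del self.status[reg]
        self.rs[name] = Station(name)


# The four FP instructions from the example, with made-up register values.
m = Machine(["Add1", "Add2", "Add3", "Mult1", "Mult2"],
            {"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 7.0, "F8": 0.0, "F10": 0.0})
mul = m.issue("Mult", "MULTD", "F0", "F2", "F4")
sub = m.issue("Add", "SUBD", "F8", "F6", "F2")
div = m.issue("Mult", "DIVD", "F10", "F0", "F6")   # F0 pending: Qj = Mult1
add = m.issue("Add", "ADDD", "F6", "F8", "F2")     # F8 pending: Qj = Add1
stalled = m.issue("Mult", "MULTD", "F12", "F2", "F4")  # both Mult stations busy
div_before = m.execute(div)          # operand still outstanding
m.write_result(mul, m.execute(mul))  # CDB broadcast wakes up DIVD
div_after = m.execute(div)
print(stalled, div_before, div_after, m.status)
```

In this sketch DIVD copies the current value of F6 into Vk at issue, so the later ADDD that renames F6 to `Add2` cannot disturb it; that is the WAR case register renaming is meant to remove. Waiting stations match on the source tag broadcast by `write_result`, mirroring the "come from" behavior of the common data bus.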