CSEE W4824 Homework 4 (Handout 14)
Prof. Luca Carloni
November 2, 2011

This homework is due at the beginning of class on Wednesday, November 16 (two-day extension). A correct answer without adequate explanation or derivation will have points deducted. To get full credit,
• write legibly or type, and
• show all work (label relevant items, show derivations, include explanations).

Problem 1. Tomasulo's Algorithm (15 points).
In this problem, you are to simulate the following MIPS code fragment on a processor architecture based on Tomasulo's algorithm:

            ADD    R2, R0, R0
            SUB.D  F8, F8, F9
            ADD.D  F7, F7, F9
            MUL.D  F3, F8, F7
            L.D    F4, 0(R2)
            SUB.D  F4, F4, F7
            S.D    F4, 0(R4)
            L.D    F6, 0(R8)
            MUL.D  F1, F9, F3
            BNE    R2, R0, target
            ADDI   R2, R2, #8
            MUL.D  F1, F7, F5
    target: MUL.D  F4, F1, F6
            L.D    F6, 0(R4)
            MUL.D  F2, F4, F6
            S.D    F4, 0(R2)
            ADD.D  F4, F1, F6

Make the following assumptions:
• Assume an architecture implementing Tomasulo's algorithm as in the figure above, which is the same as H&P Fig. 2.9. Although the figure does not show it, assume that one integer functional unit is present to execute ALU instructions.
• Assume that there is also a dedicated branch-execution unit and that the branch is resolved in the 'Execution' stage; therefore, it does not have to go through the 'Write' stage.
• Assume that there is no branch prediction, i.e., each subsequent instruction waits for the branch to be resolved before being issued.
• Assume that the bandwidth of the common data bus allows only one instruction per cycle to broadcast its result from a functional unit to the other functional units.
• Assume the following execution times:

    instruction          cycles
    integer ALU          1
    branch               1
    loads/stores         1
    FP addition          3
    FP multiplication    5

Do the following:
• (a) Analyze the code fragment carefully to determine which instructions are executed. Simulate the code fragment on the given architecture and fill out Table 1 for the instructions that are executed, indicating the clock cycle number(s) of each step in which the instruction is active (not stalled). Use a distinct letter of the alphabet as the unique identifier of each stall. For instance, suppose an FP addition should have been issued in the third cycle but was stalled, so that it is issued in the sixth cycle; it then stalls again before executing from the eighth to the tenth cycle, and finally writes its result in the eleventh cycle. Then you should write the following (where a and b denote the identifiers of the two different stalls):...
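The timing rules listed above can be sketched as a small Python model, which may help when cross-checking a hand-filled Table 1. This is a minimal sketch under assumptions that go beyond the handout, so treat its numbers as a lower bound rather than the expected answer. It assumes:
• unlimited reservation stations and functional units, so there are no structural stalls;
• a source operand is captured from the CDB broadcast, and execution starts the cycle after that broadcast (or the cycle after issue);
• CDB arbitration is oldest-first;
• stores and branches never use the CDB;
• there is no memory-dependence check between loads and stores, so, for example, L.D F6, 0(R4) is not made to wait for S.D F4, 0(R4).

The names `Instr`, `Timing`, `simulate`, and the kind labels are hypothetical. The reservation-station and buffer counts of the architecture in the figure can introduce stalls that this model ignores.

```python
from dataclasses import dataclass
from typing import Optional

# Execute-stage latencies in cycles, taken from the handout's table.
LATENCY = {"int": 1, "branch": 1, "mem": 1, "fpadd": 3, "fpmul": 5}


@dataclass
class Instr:
    text: str
    kind: str               # a key of LATENCY (hypothetical labels)
    dest: Optional[str]     # register written via the CDB (None for stores/branches)
    srcs: tuple             # registers read (base, value, or source operands)


@dataclass
class Timing:
    issue: int
    exec_start: int
    exec_end: int
    write: Optional[int]    # None when the instruction never uses the CDB


def simulate(program):
    """Simplified Tomasulo timing model; see the assumptions stated above."""
    timings = []
    producer = {}           # register -> index of its most recent older writer
    cdb_busy = set()        # cycles in which the single CDB is already taken
    next_issue = 1
    for i, ins in enumerate(program):
        issue = next_issue
        # Execution starts the cycle after issue, or the cycle after the last
        # pending source operand is broadcast on the CDB (RAW hazard).
        start = issue + 1
        for reg in ins.srcs:
            if reg in producer:
                start = max(start, timings[producer[reg]].write + 1)
        end = start + LATENCY[ins.kind] - 1
        write = None
        if ins.dest is not None:
            # One broadcast per cycle; older instructions are placed first,
            # so they win any conflict and younger ones slip to a later cycle.
            write = end + 1
            while write in cdb_busy:
                write += 1
            cdb_busy.add(write)
            # Renaming: later readers wait on this instruction's tag rather
            # than on the register name, so reuse of the name does not stall.
            producer[ins.dest] = i
        timings.append(Timing(issue, start, end, write))
        # No branch prediction: the next instruction issues only after the
        # branch has executed (resolved); otherwise issue one per cycle.
        next_issue = end + 1 if ins.kind == "branch" else issue + 1
    return timings


if __name__ == "__main__":
    # The first four instructions of the handout's fragment.
    prog = [
        Instr("ADD R2, R0, R0", "int", "R2", ("R0", "R0")),
        Instr("SUB.D F8, F8, F9", "fpadd", "F8", ("F8", "F9")),
        Instr("ADD.D F7, F7, F9", "fpadd", "F7", ("F7", "F9")),
        Instr("MUL.D F3, F8, F7", "fpmul", "F3", ("F8", "F7")),
    ]
    for ins, t in zip(prog, simulate(prog)):
        print(f"{ins.text:18} issue {t.issue:2}  "
              f"exec {t.exec_start}-{t.exec_end}  write {t.write}")
```

Under this model's assumptions, MUL.D F3, F8, F7 waits for F8 (broadcast in cycle 6) and F7 (broadcast in cycle 7). It therefore executes in cycles 8 to 12 and writes in cycle 13.

Placing each broadcast in program order at the first free CDB cycle gives the same result as cycle-by-cycle oldest-first arbitration. This holds because an instruction's readiness depends only on older instructions, and a younger instruction can never take a cycle in which an older one was ready.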
This note was uploaded on 11/12/2011 for the course CSEE 4824 taught by Professor Carloni during the Fall '11 term at Columbia.