fa09_cs433_hw2_sol - CS433: Computer Systems Organization...

CS433: Computer Systems Organization
Fall 2009
Homework 2
Assigned: Sept/15
Due in class Sept/29
Total points: 40 for undergraduate students, 44 for graduate students.

Instructions: Please write your name, NetID and an alias on your homework submissions for posting grades (if you don’t want your grades posted, then don’t write an alias). We will use this alias throughout the semester. Homeworks are due in class on the date posted.

Problem 1: Data dependence (8 points)

Here is an unusual loop. First, list the dependences (output, anti and true) and then rewrite the loop so that it is parallel.

    for (i = 1; i < 100; i = i + 1) {
        a[i] = b[i] + c[i];      // S1
        b[i] = a[i] + d[i];      // S2
        a[i + 1] = a[i] + e[i];  // S3
    }

There are five dependences in the loop.

1. True dependence from S1 to S2 on a.
2. Anti dependence from S1 to S2 on b.
3. Loop-carried true dependence from S3 to S2 on a.
4. Loop-carried true dependence from S3 to S3 on a.
5. Loop-carried output dependence from S3 to S1 on a.

To parallelize the loop, we simply “break” all of the loop-carried dependences. The loop-carried output dependence can be removed by the compiler through renaming, while the loop-carried true dependences can be removed if the compiler modifies the code. Here is the parallel version of the loop:

    for (i = 1; i < 100; i = i + 1) {
        a[i] = b[i] + c[i];      // S1
        b[i] = a[i] + d[i];      // S2
    }
    a[100] = a[99] + e[99];      // last S3, kept outside the loop

Looking at the original code, it should be apparent that S3 doesn’t do any useful work inside the loop: every value it writes to a[i + 1] is overwritten by S1 in the next iteration before anything reads it. The one exception is the final iteration (i = 99), whose write to a[100] is never overwritten, so that single assignment is kept after the loop. With it, this transformed version of the loop is functionally equivalent to the original loop.
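The equivalence claim can be checked mechanically. The following is a minimal sketch, not part of the original solution: it runs the original loop and an S3-free loop on the same inputs and compares the resulting arrays. The function names (`run_original`, `run_parallel`, `loops_agree`), the use of `int` arrays, and the input values are all assumptions chosen for illustration. The S3-free variant keeps one assignment after the loop, because the write to a[100] made by S3 in the last iteration (i = 99) is never overwritten by S1.

```c
#include <string.h>

#define N 101  /* arrays are indexed 1..100; index 0 is unused */

/* Original loop from the problem statement (S1, S2, S3). */
void run_original(int a[N], int b[N], const int c[N],
                  const int d[N], const int e[N]) {
    for (int i = 1; i < 100; i = i + 1) {
        a[i] = b[i] + c[i];      /* S1 */
        b[i] = a[i] + d[i];      /* S2 */
        a[i + 1] = a[i] + e[i];  /* S3 */
    }
}

/* S3 removed from the body, so no iteration depends on another. */
void run_parallel(int a[N], int b[N], const int c[N],
                  const int d[N], const int e[N]) {
    for (int i = 1; i < 100; i = i + 1) {
        a[i] = b[i] + c[i];      /* S1 */
        b[i] = a[i] + d[i];      /* S2 */
    }
    /* The only S3 write that survives the original loop (from i = 99). */
    a[100] = a[99] + e[99];
}

/* Hypothetical helper: run both versions on identical, arbitrary inputs
   and return 1 if the final contents of a and b match exactly. */
int loops_agree(void) {
    int a1[N], b1[N], a2[N], b2[N], c[N], d[N], e[N];
    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = 7 * i + 1;
        b1[i] = b2[i] = 3 * i - 5;
        c[i] = 2 * i;
        d[i] = 11 - i;
        e[i] = i * i;
    }
    run_original(a1, b1, c, d, e);
    run_parallel(a2, b2, c, d, e);
    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

Calling `loops_agree()` from a test or a small `main` returns 1 when the two versions leave identical results. Integer arrays are used here only so that the comparison is exact; the argument about which writes survive does not depend on the element type.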
Problem 2: Tomasulo's algorithm (12 points)

This exercise examines Tomasulo’s algorithm on a simple loop operation. Consider the following code fragment:

    LOOP: L.D    F2, 0(R1)
          L.D    F4, 8(R1)
          DIV.D  F6, F2, F4
          MUL.D  F8, F6, F6
          ADD.D  F6, F2, F4
          MUL.D  F10, F6, F6
          S.D    F8, 0(R1)
          S.D    F10, 8(R1)
          DADDI  R1, R1, 16
          BNEZ   R1, LOOP

1. The pipeline functional units are described by the following table:

    FU type              Cycles in EX   # of FUs   # of Reservation Stations
    Integer              1              1          5
    FP add/subtract      4              1          4
    FP multiply/divide   15             2          4

2. Functional units are NOT pipelined (i.e., if one instruction is using the functional unit, another instruction cannot enter it).
3. All stages except EX take one cycle to complete.
4. There is no forwarding between functional units. Both integer and floating point results are communicated through the CDB.
5. Memory accesses use the integer functional unit to perform effective address calculation. All loads and stores will access memory during the EX stage. Pipeline stage EX does both the effective address calculation and memory access for loads/stores.
6.
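As a reading aid for the functional-unit table, here is a small sketch that encodes it as data and looks up the EX latency of each instruction in the loop body. It is not a Tomasulo simulator and does not produce a schedule. The names `fu_class`, `fu_config`, `FU_TABLE`, `fu_for_opcode`, `ex_cycles_for_opcode`, and `sum_ex_cycles` are hypothetical, and routing DADDI and BNEZ to the integer unit is an assumption; the text above only states that loads and stores use it.

```c
#include <string.h>

/* Functional-unit classes, one per row of the table. */
enum fu_class { FU_INTEGER, FU_FP_ADD, FU_FP_MULDIV };

struct fu_config {
    int ex_cycles;             /* cycles spent in EX */
    int num_units;             /* number of functional units */
    int reservation_stations;  /* number of reservation stations */
};

/* Values copied from the table in the problem statement. */
static const struct fu_config FU_TABLE[] = {
    [FU_INTEGER]   = { 1, 1, 5 },
    [FU_FP_ADD]    = { 4, 1, 4 },
    [FU_FP_MULDIV] = { 15, 2, 4 },
};

/* Map an opcode from the loop body to its functional-unit class.
   L.D and S.D use the integer unit (item 5 above); sending DADDI
   and BNEZ there as well is an assumption of this sketch. */
enum fu_class fu_for_opcode(const char *op) {
    if (strcmp(op, "ADD.D") == 0)
        return FU_FP_ADD;
    if (strcmp(op, "MUL.D") == 0 || strcmp(op, "DIV.D") == 0)
        return FU_FP_MULDIV;
    return FU_INTEGER;
}

int ex_cycles_for_opcode(const char *op) {
    return FU_TABLE[fu_for_opcode(op)].ex_cycles;
}

/* Sum of EX latencies over one loop iteration. This is only the EX
   time if no two instructions overlapped, not a real schedule. */
int sum_ex_cycles(void) {
    static const char *body[] = {
        "L.D", "L.D", "DIV.D", "MUL.D", "ADD.D",
        "MUL.D", "S.D", "S.D", "DADDI", "BNEZ",
    };
    int total = 0;
    for (size_t i = 0; i < sizeof body / sizeof body[0]; i++)
        total += ex_cycles_for_opcode(body[i]);
    return total;
}
```

For example, `ex_cycles_for_opcode("DIV.D")` returns 15 and `ex_cycles_for_opcode("ADD.D")` returns 4. Keeping the table as data rather than hard-coding latencies makes it easy to see which instructions compete for the two multiply/divide units.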