hw2 - CS 6143 COMPUTER ARCHITECTURE II HOMEWORK II FALL...


CS 6143 COMPUTER ARCHITECTURE II
FALL 2010
HOMEWORK II

Polytechnic Institute of NYU          Page 1 of 20
Handout No : 5                        September 22, 2010
DUE : October 13, 2010

READ :
- Related portions of Chapters 2, 3 and Appendix A of the Hennessy book
- Related portions of Chapter 7 of the Jordan book

ASSIGNMENT: There are three problems. Solve all homework and exam problems as shown in class and in past exam solutions.

1) Consider the piece of code studied in Problem 4 of Homework I. This code, listed at the end of this problem, is for the DAXPY application we discussed in class.

Assume that this is machine model number 3 : the MIPS uses the Tomasulo algorithm of the Hennessy book, as discussed in class. Additional assumptions are as follows :

- The functional unit timings are as listed on page A-72 of the Hennessy book : ADD.D, MUL.D and DIV.D take 3, 11 and 41 clock periods, respectively.
- The number of reservation station buffers for FP operations is as given in class.
- There are enough CDB buses to eliminate bottlenecks.
- Branch instructions take 2 clock periods, but there is no delayed branch.
- There are enough functional units for integer instructions not to cause stalls.
- Store instructions complete in the WR stage.
- There is a perfect memory with no stalls.

In which clock period will the second iteration of the loop be completed, assuming there are just two iterations ? That is, what is the last clock period in which the Write-Result stage of an instruction from the second iteration is done ? Show the forwardings and the write-in-the-first-half-read-in-the-second-half cases among the instructions. To answer the question, continue with the following table :

Instruction           IF    ID    EX        WR
L.D   F0, 0(R1)       1     2     3-4       5
MUL.D F0, F0, F2      2     3     4/5-15    16
Continue              ...   ...   ...       ...
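The two rows already filled in above follow a simple pattern: an instruction executes for as many clock periods as its latency and writes its result in the clock period right after. The Python sketch below only reproduces that arithmetic for the two given rows. It is not a Tomasulo simulator and does not model reservation stations, forwarding, or the conventions shown in class. The names LATENCY, ex_window and write_result_cycle are hypothetical, the 2-clock-period load latency is an assumption read off the "3-4" EX entry, and reading "4/5-15" as "enters EX in clock period 4, starts executing in clock period 5" is likewise an interpretation.

```python
# Hypothetical helper: reproduces the two table rows given in the handout.
LATENCY = {
    "ADD.D": 3,    # stated in the handout
    "MUL.D": 11,   # stated in the handout
    "DIV.D": 41,   # stated in the handout
    "L.D": 2,      # assumption, read off the "3-4" EX entry
}


def ex_window(op, ex_start):
    """Return the (first, last) clock periods spent executing, inclusive."""
    return ex_start, ex_start + LATENCY[op] - 1


def write_result_cycle(op, ex_start):
    """Assume WR happens in the clock period right after execution ends."""
    return ex_window(op, ex_start)[1] + 1


# L.D starts executing in clock period 3; MUL.D starts executing in 5.
print(ex_window("L.D", 3), write_result_cycle("L.D", 3))      # (3, 4) 5
print(ex_window("MUL.D", 5), write_result_cycle("MUL.D", 5))  # (5, 15) 16
```

Deciding when each later instruction may start executing (operand availability, forwarding, structural limits) is the actual work of the problem and is deliberately left out of the sketch.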
loop :  L.D    F0, 0(R1)         ; load X[i]
        MUL.D  F0, F0, F2        ; multiply a * X[i]
        L.D    F4, 0(R2)         ; load Y[i]
        ADD.D  F0, F0, F4        ; add a * X[i] + Y[i]
        S.D    F0, 0(R2)         ; store Y[i]
        DADDI  R1, R1, #(-8)     ; decrement X index (immediate is decimal)
        DADDI  R2, R2, #(-8)     ; decrement Y index (immediate is decimal)
        BNEZ   R1, loop          ; loop if not done
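As a reading aid, the loop's effect on memory can be mirrored one instruction at a time in Python. This is a minimal sketch and not part of the handout: the function name daxpy_loop, the dictionary used as byte-addressed memory, and the example addresses and values are all hypothetical. It models only what the loop computes (Y[i] = a * X[i] + Y[i]), not its timing.

```python
def daxpy_loop(mem, f2, r1, r2):
    """Mirror the MIPS loop; mem maps byte addresses to doubles, f2 holds a."""
    while True:
        f0 = mem[r1]       # L.D   F0, 0(R1)
        f0 = f0 * f2       # MUL.D F0, F0, F2
        f4 = mem[r2]       # L.D   F4, 0(R2)
        f0 = f0 + f4       # ADD.D F0, F0, F4
        mem[r2] = f0       # S.D   F0, 0(R2)
        r1 -= 8            # DADDI R1, R1, #(-8)
        r2 -= 8            # DADDI R2, R2, #(-8)
        if r1 == 0:        # BNEZ  R1, loop falls through once R1 reaches 0
            break
    return mem


# Two iterations: X at addresses 8 and 16, Y at 108 and 116 (made-up layout).
mem = {8: 1.0, 16: 2.0, 108: 10.0, 116: 20.0}
daxpy_loop(mem, f2=3.0, r1=16, r2=116)
print(mem[108], mem[116])  # 13.0 26.0
```

Starting R1 at 16 gives exactly two iterations, because the loop exits when R1 reaches 0 after the second decrement.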

2) Consider the same DAXPY code given in Problem 1 above again. The MIPS is implemented as the scalar hardware-speculative Tomasulo algorithm machine discussed in class : machine model number 4. One (1) instruction can be committed per cycle.

a) Assume the following :

- The functional unit timings are as listed in Figure 2.2 on page 75 of the Hennessy book.
- The number of reservation station buffers for FP operations is as given in class.
- There is a Branch Unit in the EX stage for calculating the branch's effective address and determining the condition.
- There is also additional branch prediction hardware in and out of the pipeline.
- There are enough functional units for integer instructions not to cause stalls.
- The L1 cache memories take one clock period each and there are no cache misses.
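The effect of the one-commit-per-cycle limit can be sketched in Python. The sketch rests on two assumptions that are not taken from the handout and may differ from the model shown in class: commits happen strictly in program order, and an instruction can commit no earlier than the clock period after its WR. The function name commit_cycles is hypothetical, and the WR values in the usage line are made-up inputs for illustration, not a solution to the problem.

```python
def commit_cycles(wr_cycles):
    """Commit cycle per instruction for WR cycles listed in program order.

    Assumptions (not from the handout): in-order commit, at most one
    commit per clock period, and commit no earlier than WR + 1.
    """
    commits = []
    previous = 0
    for wr in wr_cycles:
        # Wait for both the instruction's own WR and the previous commit.
        cycle = max(wr + 1, previous + 1)
        commits.append(cycle)
        previous = cycle
    return commits


# Made-up WR cycles: the third instruction finishes early but still
# has to wait for the slower instruction ahead of it to commit.
print(commit_cycles([5, 16, 7, 19, 20]))  # [6, 17, 18, 20, 21]
```

The third entry shows the point of in-order commit: a result written in clock period 7 still commits in 18, behind the older instruction that commits in 17.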

This note was uploaded on 02/02/2011 for the course CS 6143 taught by Professor Hadimioglu during the Fall '10 term at NYU Poly.
