# Homework5 02 - LSU EE 4720 Homework 5 Solution Due 3...

Solution

To answer the questions below you need to use the PSE dataset viewer program. PSE (pronounced "see") runs on Solaris and Linux; you can use the computer accounts distributed in class to run it, and a Linux distribution may also be provided for running it on other systems. Procedures for setting up the class account and using PSE are at http://www.ece.lsu.edu/ee4720/proc.html; preliminary documentation for PSE is at http://www.ece.lsu.edu/ee4720/pse.pdf.

Near the beginning of the semester the performance of a program to compute π was evaluated with and without optimization. It's back, down below. Follow the instructions referred to above to view the execution of the optimized and unoptimized versions of the pi program running on a simulated 4-way dynamically scheduled superscalar machine with a 48-instruction reorder buffer. The datasets to use are pi_opt.ds and pi_noopt.ds.

(a) Based on the pipeline execution diagram (PED), compute the CPI of the main loop for a large number of iterations in the optimized version. Do not use the IPC displayed by PSE; instead base it on the PED. In your answer describe how the CPI was determined.

To find the precise CPI, first find a repeating pattern. Fortunately, once the branch predictor warms up and the ROB fills, each iteration is identical, so a unit of the repeating pattern is one iteration long. One such iteration (not the first) starts at cycle (time) 339 and the next starts at cycle 345, for a time of 6 cycles. There are 9 instructions (including the nop), so the CPI is $6/9 = 2/3$.

(b) Consider first the optimized version of the program. Would it run faster with a larger reorder buffer? Would it run faster on an 8-way superscalar machine? How else might the processor be modified to improve performance? Explain each answer.

An important feature to notice is that, except for the nop, instructions wait many cycles before executing. All of the waiting instructions are waiting for operands, so execution time is limited by the critical path through the code. (No instruction in the loop waits for a functional unit; there are enough for this loop.)

[PSE viewer screenshot; status line: Grid 20 insn X 5 cyc, Rank: 4/7, Pos. 1/7, 0.76 IPC over 38 cycles.]
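The CPI arithmetic in part (a) can be checked with a short Python sketch. The function name `loop_cpi` and its parameter names are illustrative assumptions, not part of PSE; the cycle numbers (339 and 345) and the instruction count (9) are the ones read from the PED in the solution.

```python
from fractions import Fraction


def loop_cpi(start_cycle, next_start_cycle, insn_per_iteration):
    """CPI of one steady-state loop iteration (hypothetical helper).

    start_cycle and next_start_cycle are the cycles at which two
    consecutive iterations begin, as read from the PED.
    """
    cycles_per_iteration = next_start_cycle - start_cycle
    return Fraction(cycles_per_iteration, insn_per_iteration)


# Values from the solution to part (a): iterations start at cycles 339
# and 345, and each iteration has 9 instructions (including the nop).
cpi = loop_cpi(339, 345, 9)
ipc = 1 / cpi  # IPC is the reciprocal of CPI

print(cpi, ipc)  # 2/3 3/2
```

Using `Fraction` keeps the result exact, so the 2/3 CPI appears as stated in the solution rather than as a rounded decimal. The corresponding steady-state IPC of 3/2 is well below the peak of 4 for a 4-way machine, which is consistent with the observation in part (b) that instructions spend many cycles waiting for operands.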
