Computer Science and Engineering 431 Spring 2017
PRACTICE EXAM SOLUTIONS
Tuesday, February 28th
Exam time: 50 minutes/part x 2 parts
Test Value: XX pts. Total possible points: YY (max score = ZZ%)

1. Understanding Performance (P pts)

Assume that machines A, B, C, and D implement the same ISA.

a) Machine A runs at frequency $F_A$ and Machine B runs at frequency $F_B$. For a given program P1 on input FOO, Machine A executes $I_A$ instructions with CPI $C_A$ and Machine B executes $I_B$ instructions with CPI $C_B$. If Machine A has twice the performance of Machine B when executing P1(FOO), what is the ratio of frequencies of Machine A/Machine B expressed in terms of the other variables?

If A has twice the performance of B, then $E(A) = E(B)/2$, where $E$ denotes execution time. Via the execution time equation (and some algebra):

$2 \left( I_A \cdot C_A \cdot \frac{1}{F_A} \right) = I_B \cdot C_B \cdot \frac{1}{F_B}$

$\frac{F_A}{F_B} = \frac{2 \, I_A \cdot C_A}{I_B \cdot C_B}$

b) For a program P2 on input QUUX, Machine C and Machine D have the same performance. For a different input to program P2, BAR, Machine C is twice as fast as Machine D despite the same number of instructions being executed on all machines for all inputs. Of instruction count, frequency and CPI, explain which factor(s) are most likely responsible for the difference in performance as a function of inputs to P2, and why. Describe at least one plausible cause for the difference.

There are multiple plausible answers here. We do, however, know that the ISA and instruction count are both given to be the same, so neither is where the divergence comes from. Similarly, we can assume that the frequencies of C and D do not change as a function of input. Thus, one or both of C and D have a different CPI when executing P2 as a function of the input. There are many possible causes for this.
Among them are: different cache hit rates as a function of input (for small inputs, differences in cache size/organization may not be exposed the same way as for large inputs), and different paths taken for the two inputs exacerbating microarchitectural differences (e.g., a slower functional unit is heavily used on only one of the two input paths). At its most extreme, different inputs can effectively cause the execution of a fundamentally different program, as with bzip2, where the same binary performs both compression and decompression.
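As a sanity check on Question 1, the execution time equation can be evaluated numerically. The sketch below is illustrative only: every number in it (instruction counts, CPIs, frequencies, miss rates, miss penalty) is a made-up assumption, and the function names and the simple "base CPI plus memory stall cycles" model are not part of the exam. It checks that the frequency ratio from part (a) yields a 2x performance gap, and shows one way the cache-miss cause from part (b) could make C twice as fast as D on only one input.

```python
import math


def execution_time(instructions, cpi, frequency_hz):
    """Execution time = instruction count * CPI * cycle time (1 / frequency)."""
    return instructions * cpi / frequency_hz


def frequency_ratio_for_speedup(i_a, c_a, i_b, c_b, speedup=2.0):
    """F_A / F_B required for A to be `speedup` times as fast as B."""
    return speedup * i_a * c_a / (i_b * c_b)


def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """Assumed simple model: base CPI plus average memory stall cycles."""
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_cycles


# Part (a): hypothetical instruction counts and CPIs for machines A and B.
i_a, c_a = 1_000_000, 1.5
i_b, c_b = 1_200_000, 2.0
f_b = 1.0e9  # hypothetical frequency for B, in Hz

ratio = frequency_ratio_for_speedup(i_a, c_a, i_b, c_b)
f_a = ratio * f_b
t_a = execution_time(i_a, c_a, f_a)
t_b = execution_time(i_b, c_b, f_b)

# Part (b): same instruction count and frequency on C and D; only CPI differs.
instructions, freq = 1_000_000, 1.0e9
base_cpi, refs, penalty = 1.0, 0.25, 100  # hypothetical parameters

# Input QUUX: assumed to fit in both caches, so neither machine misses.
cpi_c_quux = effective_cpi(base_cpi, refs, 0.0, penalty)
cpi_d_quux = effective_cpi(base_cpi, refs, 0.0, penalty)

# Input BAR: assumed to overflow D's smaller cache (4% miss rate) but not C's.
cpi_c_bar = effective_cpi(base_cpi, refs, 0.0, penalty)
cpi_d_bar = effective_cpi(base_cpi, refs, 0.04, penalty)

speedup_quux = (execution_time(instructions, cpi_d_quux, freq)
                / execution_time(instructions, cpi_c_quux, freq))
speedup_bar = (execution_time(instructions, cpi_d_bar, freq)
               / execution_time(instructions, cpi_c_bar, freq))

print(f"F_A / F_B = {ratio:.3f}, E(B) / E(A) = {t_b / t_a:.3f}")
print(f"C vs D speedup on QUUX = {speedup_quux:.3f}, on BAR = {speedup_bar:.3f}")
```

Because instruction count and frequency are identical on C and D, they cancel in the speedup ratio, leaving only the ratio of CPIs, which is the reasoning used in part (b).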
2. Pipelining and data hazards (P pts)

a) Consider the following sequence of instructions scheduled on a five-stage, scalar MIPS pipeline. In the table below, for each cycle, indicate the cycle an instruction completes a stage with a capital letter (FDXMW) and indicate stalls with a lower case letter (fdxmw) in a circle. Assume full forwarding/bypass networks. Indicate by drawing a vertical arrow when a value is bypassed from one instruction to another in the cycle that the forwarding occurs. For simplicity, elide W → D bypassing arrows. Use the grid on the following page. Assume all loads/stores are 1-cycle hits and that there are no exceptions.
