final09-solutions

final09-solutions - 1 Prof. Martin Tuesday, Dec. 22, 2009...


Prof. Martin
Tuesday, Dec. 22, 2009
CIS501 Computer Architecture
Final Exam Solutions

1. [7 Points] True or False. If a statement is false, briefly explain why by describing how the statement may most simply be made true. Unjustified (or poorly justified) false answers will be marked wrong. Simply stating the negation of the false statement is not sufficient justification. Please be specific!

(a) The Alpha 21264 architecture is very dynamic.
Answer: True.

(b) Moore's law predicts an exponential improvement in transistor switching speeds over time.
Answer: False. Moore's law predicts an exponential increase in the number of transistors per chip.

(c) The cost to manufacture a chip is proportional to the area of the chip.
Answer: False. Because yields decrease with chip size (due to defects), the cost of a chip is super-linear in the area of the chip.

(d) A compiler optimization can increase performance yet hurt CPI.
Answer: True.

(e) Clustering (such as that used by the Alpha 21264) is one approach for tackling the n² problem of dependence cross-checking logic.
Answer: False. Clustering tackles the n² bypassing and register-file port problems, but it does not address the dependence-checking issue.

(f) Assuming similar pipeline depths, high branch prediction accuracy is generally more important in a superscalar (multiple-issue) processor than in a scalar (single-issue) processor.
Answer: True.

(g) An easy way to exploit data parallelism in your programs is to call library code that has been optimized to use vector instructions.
Answer: True.

2. [8 Points] Multiplying Performance.

(a) Consider a computation that consists of calculating the sum of thousands of integers. Using a simple single-cycle datapath as a starting point, what are four techniques or approaches we discussed that can each increase performance by a factor of four or more (that is, together these four techniques could result in a 256x speedup)? (Single-word answers are sufficient.)
Answer: (1) pipelining, (2) instruction-level parallelism via superscalar execution, (3) data-level parallelism via vector instructions, and (4) coarse-grained parallelism using multicore. If only multithreading was given, it was worth one point (rather than two), as hardware multithreading is unlikely to provide a 4x performance improvement.

3. [11 Points] Performance & ISAs (Part 2). An alternative way of accelerating the computation from the previous question is to introduce a new three-input ADD3 instruction to your favorite ISA. This new instruction operates on normal 64-bit registers only and performs the computation A = B + C + D.

(a) What impact, if any, would adding the ADD3 instruction have on the following aspects of a pipelined datapath?
Answer: [6 Points]
Instruction fetch: None.
Stall logic: Need to check for the additional input register.
Bypassing: Additional bypassing datapaths for the additional input register.
Register file: Additional read port...
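To make the "stall logic" answer concrete, here is a minimal sketch, assuming a classic five-stage pipeline with full bypassing, where the only remaining data-hazard stall is a load followed immediately by a dependent instruction. The `Insn` type, the opcode strings, and `must_stall` are hypothetical names for illustration, not part of any real ISA or simulator. The point is that ADD3 needs no new kind of logic, just one more source-register comparator.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Insn:
    op: str                  # hypothetical opcode name, e.g. "ADD", "ADD3", "LD"
    dest: Optional[int]      # destination register number (None if no write)
    srcs: Tuple[int, ...]    # source registers: two for ADD, three for ADD3


def must_stall(decode: Insn, execute: Insn) -> bool:
    """Load-use check (assumes full bypassing otherwise): stall the instruction
    in Decode if the instruction in Execute is a load whose destination matches
    ANY of Decode's source registers. ADD3 simply contributes a third source,
    i.e. a third comparator in hardware."""
    if execute.op != "LD" or execute.dest is None:
        return False
    return any(src == execute.dest for src in decode.srcs)


load = Insn("LD", dest=5, srcs=(1,))
add3 = Insn("ADD3", dest=2, srcs=(3, 4, 5))  # third operand depends on the load
add = Insn("ADD", dest=2, srcs=(3, 4))       # no dependence on r5
print(must_stall(add3, load), must_stall(add, load))
```

Here the ADD3 stalls only because of its third input, which is exactly the comparison a two-input datapath would lack.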
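The answer to Question 1(c) says chip cost is super-linear in area because yield falls as area grows. A minimal sketch of that arithmetic follows, assuming a simple Poisson yield model (yield = exp(-defect_density × area)) and ignoring dies lost at the wafer edge. The function name and the wafer cost, wafer area, and defect density defaults are hypothetical, illustrative values, not figures from the exam.

```python
import math


def cost_per_good_die(die_area_cm2: float,
                      wafer_cost: float = 5000.0,      # hypothetical wafer cost
                      wafer_area_cm2: float = 700.0,   # hypothetical, roughly a 300 mm wafer
                      defects_per_cm2: float = 0.5) -> float:  # hypothetical defect density
    # Dies per wafer scales as 1/area (edge loss ignored in this sketch).
    dies_per_wafer = wafer_area_cm2 / die_area_cm2
    # Assumed Poisson yield model: fraction of dies with zero defects.
    yield_fraction = math.exp(-defects_per_cm2 * die_area_cm2)
    # Only working dies can be sold, so the wafer cost is spread over them.
    return wafer_cost / (dies_per_wafer * yield_fraction)


ratio = cost_per_good_die(2.0) / cost_per_good_die(1.0)
print(f"doubling die area multiplies cost by {ratio:.2f}x")
```

Under these assumptions the cost ratio for doubling the area is 2·exp(0.5), about 3.3x rather than 2x: the linear factor comes from fewer dies per wafer, and the extra exponential factor comes from the lower yield.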

This note was uploaded on 10/19/2011 for the course CS 501 taught by Professor Martin during the Fall '10 term at UPenn.
