University of California at Berkeley
College of Engineering
Computer Science Division, EECS

CS 152, Fall 1995, D. Patterson & R. Yung
Computer Architecture and Engineering
Midterm I

Your Name:
SID Number:
Discussion Section:
You may bring two pages of notes. You have 180 minutes. Please write your name on this cover sheet and also at the top left of each page. The point value of each question is indicated in brackets after it. Show your work. Write neatly and be well organized. Good luck!

    Problem   Score
    1
    2
    3
    4
    Total

Question #1: Technology and Performance [20 pts]
The cost/performance of two CPUs that run the same instruction set is to be examined. The first option is a Gallium Arsenide (GaAs) CPU. A 10 cm (about 4") diameter GaAs wafer costs $2000. The manufacturing process creates 4 defects per square cm. The CPU fabricated in this technology is expected to have a clock rate of 1000 MHz, with an average of 2.5 clock cycles per instruction (CPI) if we assume an infinitely fast memory system. The GaAs CPU die measures 1.0 cm by 1.0 cm.

The second option is a CMOS CPU. A 20 cm (about 8") diameter CMOS wafer costs $1000 and has 1 defect per square cm. The 1.0 cm by 2.0 cm CPU executes multiple instructions per clock cycle to achieve an average CPI of 0.75, again assuming an infinitely fast memory, while running at a clock rate of 200 MHz. (The CPU is larger because it has on-chip caches and executes multiple instructions per clock cycle.)

Assume α is 2.0 for both GaAs and CMOS. The wafer yields for GaAs and CMOS are 0.8 and 0.9, respectively. Most of this information is summarized in the following table:

            Wafer   Wafer   Wafer   Defects    Freq.   CPI    Die Area     Test Dies
            Diam.   Yield   Cost    (1/cm^2)   (MHz)          (cm x cm)    (per wafer)
            (cm)            ($)
    GaAs    10      0.80    $2000   4          1000    2.5    1.0 x 1.0    4
    CMOS    20      0.90    $1000   1          200     0.75   1.0 x 2.0    4

Hint: Here are two equations that may help:

$$\text{dies/wafer} = \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}} - \text{test dies per wafer}$$

$$\text{die yield} = \text{wafer yield} \times \left(1 + \frac{\text{defects per unit area} \times \text{die area}}{\alpha}\right)^{-\alpha}$$

a) Calculate the average execution time for each instruction with an infinitely fast memory. Which CPU is faster, and by what factor? Show your work. [6 pts]

b) How many seconds will each CPU take to execute a 1-billion-instruction program? [3 pts]

c) What is the cost of a GaAs die for this CPU? Repeat the calculation for a CMOS die. Show your work. [7 pts]

d) What is the ratio of the cost of the GaAs die to the cost of the CMOS die? [1 pt]

e) Based on the cost and performance ratios calculated above, what is the ratio of the cost/performance of the CMOS CPU to that of the GaAs CPU? [3 pts]

Question #2: Instruction Set Architecture and Register Sets [15 pts]
Imagine a modified MIPS instruction set with a register file consisting of 64 general-purpose registers rather than the usual 32. Assume that we still want to use a uniform instruction length of four bytes and that the total number of opcodes must remain unchanged. Also assume that you can expand and contract fields in an instruction, but that you cannot omit them.

a) How would the format of R-type (arithmetic and logical) instructions change? Label all the fields with their names and bit lengths. What is the consequence of this change? [4 pts]

b) How does this change the I-type instructions? What is the consequence of this change? [3 pts]

c) How does this change the J-type instructions? What is the consequence of this change? [3 pts]

d) Imagine we are translating machine code to use the larger register set. Give an example of an instruction that used to fit into the old format but is impossible to translate directly into a single instruction in the new format. Write a short sequence of instructions that could replace it. [5 pts]

Question #3: Left Shift vs. Multiply [20 pts]
Design a 16-bit left shifter that shifts 0 to 15 bits using only 4:1 multiplexors.

a) How many levels of 4:1 multiplexors are needed? Show your work. [4 pts]

b) If the delay per multiplexor is 2 ns, what is the speed of this shifter? (Assume zero delay for the wires.) [3 pts]

c) Draw the four leftmost bits of the multiplexors with the proper connections. (You might want to practice drawing it on the back of a page, and then transfer the final version here.) [6 pts]

d) Multiplies can be accomplished by left shifts if the multiplier is a power of two. Write the MIPS instruction(s) that perform a multiply via left shifts for a multiplier that is positive and a power of two. Assume the multiplicand is in register $4, the multiplier is in register $5, and the least significant 32 bits of the product should be left in $6. You can ignore the potential for overflow in the result register. [7 pts]

Extra credit [+4 pts]: In class, we've seen three versions of the multiply operation (ignoring Booth encoding). Approximately how many clock cycles does a multiply take using the fastest algorithm? Approximately how much faster are multiplies that use the shifting technique (assuming the multiplier is a power of two)?

Question #4: Enhancing the Single Cycle Datapath [30 pts]
Recall the 32-bit single-cycle control and datapath from class.

a) A MIPS instruction that would be useful to have for writing loops is Decrement and Branch if Not Zero (DBNZ). For example, the C loop:

    for (i = n; i != 0; i--) { <loop body> }

could be translated using DBNZ to:

              ld    R2, addrN      # i = n
              beq   R2, skiploop   # skip loop if n = 0
    loop:     ...
              <loop body>
              ...
              dbnz  R2, loop       # if --i != 0 goto loop
    skiploop: ...

MIPS currently does not have such an instruction (although other processors, such as the x86, do). What changes and additions would be needed to the single-cycle datapath to support DBNZ? Modify the datapath below to show that support. You do not need to write the control signals. [15 pts]

b) The basic datapath supports only 32-bit loads. Imagine we wanted to augment the instruction set with new I-type load instructions that do the following:
[Figure: LLUH and LULH data movement from a 32-bit Data Memory word (bits 31..16 and 15..0) into register Rd, with the vacated halfword filled with zeros.]

The Load From Lower To Upper Halfword (LLUH) instruction takes the least significant 16 bits from a 32-bit word in the data memory and places them in the most significant 16 bits of the indicated result register; the least significant bits are all zeroed. The Load From Upper To Lower Halfword (LULH) instruction takes the most significant 16 bits from a 32-bit data memory word and places them in the least significant 16 bits of the result register; the most significant bits are all zeroed. In both cases, the address is generated in the same way as in a normal LW instruction. Thus:

$$\text{LLUH: } Rd \leftarrow M[\text{offset} + \text{base}]_{15..0} \,\|\, 0^{16}$$

$$\text{LULH: } Rd \leftarrow 0^{16} \,\|\, M[\text{offset} + \text{base}]_{31..16}$$

What datapath changes must be made to support this style of 16-bit loads? Modify the datapath reproduced below to show that support. [15 pts] ...
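The register-transfer definitions of LLUH and LULH can be modeled in software to check exactly which bits move where. The following is a minimal Python sketch, not part of the original exam: the names `lluh`, `lulh`, and `mem_to_reg` are hypothetical, and the three-way selection in `mem_to_reg` illustrates only one possible way to think about the datapath change (a selector on the data-memory output choosing the unmodified word, the word moved left by 16 bits, or the word moved right by 16 bits).

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def lluh(mem_word: int) -> int:
    """LLUH: Rd <- M[offset + base][15..0] || 0^16."""
    low_half = mem_word & MASK16            # bits 15..0 of the memory word
    return (low_half << 16) & MASK32        # moved to bits 31..16; bits 15..0 are zero


def lulh(mem_word: int) -> int:
    """LULH: Rd <- 0^16 || M[offset + base][31..16]."""
    high_half = (mem_word >> 16) & MASK16   # bits 31..16 of the memory word
    return high_half                        # moved to bits 15..0; bits 31..16 are zero


def mem_to_reg(mem_word: int, op: str) -> int:
    """Hypothetical 3-input selector on the data-memory output (an assumption,
    not the exam's answer key): LW passes the word through unchanged, LLUH and
    LULH select fixed 16-bit rearrangements of the same word."""
    word = mem_word & MASK32
    if op == "lw":
        return word
    if op == "lluh":
        return lluh(word)
    if op == "lulh":
        return lulh(word)
    raise ValueError(f"unsupported load: {op}")


if __name__ == "__main__":
    word = 0x12345678
    for op in ("lw", "lluh", "lulh"):
        print(op, hex(mem_to_reg(word, op)))  # lw 0x12345678, lluh 0x56780000, lulh 0x1234
```

Because each new result is a fixed rearrangement of the loaded word, each selector input in this sketch corresponds to pure wiring plus zero fill rather than a general shifter.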