University of California, Berkeley
College of Engineering
Computer Science Division, EECS

Fall 1997, D.A. Patterson

Midterm I
October 8, 1997
CS152 Computer Architecture and Engineering

You are allowed to use a calculator and one 8.5" x 11" double-sided page of notes. You have 3 hours. Good luck!
Your Name:
SID Number:
Discussion Section:

Question    1      2      3      4      Total
Score       /20    /10    /10    /30    /70

Question 1
You are running a benchmark on your company's processor, which runs at 200 MHz and has these characteristics:

Instruction Type          Frequency (%)    Cycles
Arithmetic and Logical    40               1
Load and Store            30               2
Branches                  20               3
Floating Point            10               4

Your company is considering offering a cheaper, lower-performance version of the processor. Their plan is to remove some of the floating point hardware to reduce the die size of the chip. The new chip will run at the same clock speed. The wafer on which the chip is produced has a diameter of 10 cm, a cost of $1000, and 1 defect per cm^2. The manufacturing process results in a 90% wafer yield and a value of 2 for α. The current processor has a die size of 12 mm x 12 mm. After the changes, the die size will be 10 mm x 10 mm, and floating point instructions will take 12 cycles to execute.

Here are some equations you may find useful:

\[ \text{dies/wafer} = \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}} \]

\[ \text{die yield} = \text{wafer yield} \times \left(1 + \frac{\text{defects per unit area} \times \text{die area}}{\alpha}\right)^{-\alpha} \]

a) What is the CPI and MIPS rating of the original processor?
b) What is the CPI and MIPS rating of the smaller processor?
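As an illustration of how the CPI and MIPS figures in parts (a) and (b) are defined, the following Python sketch computes a frequency-weighted CPI from an instruction mix and converts it to a MIPS rating. The function names and the dictionary layout are illustrative assumptions, not part of the exam; the numbers are taken from the table and problem statement above.

```python
# Sketch: CPI is the frequency-weighted average of per-class cycle counts,
# and MIPS = clock rate in MHz / CPI. All names here are illustrative.

def cpi(mix):
    """mix maps an instruction class to (fraction of instructions, cycles)."""
    return sum(fraction * cycles for fraction, cycles in mix.values())

def mips(clock_mhz, cpi_value):
    """Millions of instructions per second for a given clock rate and CPI."""
    return clock_mhz / cpi_value

# Instruction mix of the original 200 MHz processor.
ORIGINAL = {
    "alu": (0.40, 1),
    "load_store": (0.30, 2),
    "branch": (0.20, 3),
    "fp": (0.10, 4),
}
# Smaller die: same mix, but floating point takes 12 cycles.
SMALLER = dict(ORIGINAL, fp=(0.10, 12))

if __name__ == "__main__":
    for name, mix in (("original", ORIGINAL), ("smaller", SMALLER)):
        c = cpi(mix)
        print(f"{name}: CPI = {c:.2f}, MIPS = {mips(200, c):.1f}")
```

The same two functions apply to any other mix and clock rate in this question, since only the table entries and the MHz value change.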
c) What is the old cost per (working) processor?
d) What is the new cost per (working) processor?
e) What is the percentage change in price per performance?
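The cost questions combine the two equations given earlier in this question. The Python sketch below shows one way to chain them, under stated assumptions: only whole dies count (the dies/wafer value is rounded down), cost per working die is wafer cost divided by the number of good dies, and testing and packaging costs are ignored. The function names are hypothetical, and `alpha` stands for the process parameter whose value is given as 2.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area over die area, minus a correction for partial dies at the edge."""
    area_term = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
    edge_term = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return area_term - edge_term

def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha):
    """Fraction of dies on the wafer that work."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def cost_per_working_die(wafer_cost, wafer_diameter_cm, die_side_mm,
                         wafer_yield=0.9, defects_per_cm2=1.0, alpha=2.0):
    """Assumption: cost = wafer cost / (whole dies per wafer * die yield)."""
    die_area_cm2 = (die_side_mm / 10.0) ** 2  # square die, mm converted to cm
    whole_dies = math.floor(dies_per_wafer(wafer_diameter_cm, die_area_cm2))
    good_fraction = die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha)
    return wafer_cost / (whole_dies * good_fraction)

if __name__ == "__main__":
    for side_mm in (12, 10):
        cost = cost_per_working_die(1000.0, 10.0, side_mm)
        print(f"{side_mm} mm die: ${cost:.2f} per working die")
```

Rounding the die count down is a modeling choice; using the unrounded value changes the result slightly.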
Your competitors produce a chip that runs at 250 MHz and has the following characteristics for the benchmark:

Instruction Type          Frequency (%)    Cycles
Arithmetic and Logical    40               1
Load and Store            30               3
Branches                  20               3
Floating Point            10               5

f) What is the CPI and MIPS rating of your competitor's processor for this benchmark?
g) Your company's advertising department wants to defend your company's motto ("The Appearance of Excellence") by advertising a higher MIPS rating for your flagship processor (the one with the larger die, with the floating point hardware intact) than your competitor's processor. They want you to write a benchmark that gives this result. Describe an instruction mix that would accomplish this marketing goal (give specific percentages of each instruction type).
h) Instead, you decide to improve the compiler that is used to compile this benchmark on your processor. Your compiler reduces the branches by 50%, but it increases the number of arithmetic and logical instructions by 25%; it does not affect the number of other instructions. What is your new CPI and MIPS?
i) Using the original instruction mix, which machine is faster? How much faster is it?

Question 2
The ALU presented in the book supported set less than (slt) using the sign bit of the adder (a < b <=> a - b < 0). Let's try the set less than operation using the values -7_ten and 6_ten. To make it simpler to follow the example, let's limit the binary representation to 4 bits: 1001_two and 0110_two.

    1001_two - 0110_two = 1001_two + 1010_two = 0011_two

The result suggests that -7 > 6, which is clearly wrong. Hence we must factor in overflow in the decision.

The schematic given below is the most significant bit of an ALU. Fix the set output to handle slt correctly. Explain your modifications and give the new function for the set output as well. Assume that the overflow signal is correct and can be used and that the gates available to you are: inverters, AND and OR. You can use the following blank page if you need more space.
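The 4-bit example above can be reproduced with a short Python sketch. It models subtraction as a + (not b) + 1 on 4-bit patterns, shows that the sign bit alone gives the wrong answer for -7 < 6, and shows one common way to factor in overflow: exclusive-OR the sign bit with the overflow signal. The helper names are hypothetical, and Python's `^` operator stands in for logic that would have to be built from inverters, AND and OR gates in the schematic.

```python
BITS = 4
MASK = (1 << BITS) - 1

def to_signed(pattern):
    """Interpret a BITS-wide pattern as a two's-complement integer."""
    return pattern - (1 << BITS) if pattern >> (BITS - 1) else pattern

def alu_sub(a, b):
    """Compute a - b as a + ~b + 1; return (result pattern, overflow flag)."""
    result = (a + (~b & MASK) + 1) & MASK
    sign_a, sign_b, sign_r = a >> (BITS - 1), b >> (BITS - 1), result >> (BITS - 1)
    # Subtraction overflows when the operands differ in sign and the
    # result's sign differs from the sign of a.
    overflow = int(sign_a != sign_b and sign_r != sign_a)
    return result, overflow

def slt_sign_only(a, b):
    """Set output taken directly from the adder's sign bit (the flawed version)."""
    result, _ = alu_sub(a, b)
    return result >> (BITS - 1)

def slt_with_overflow(a, b):
    """Set output corrected by the overflow signal (sign XOR overflow)."""
    result, overflow = alu_sub(a, b)
    return (result >> (BITS - 1)) ^ overflow

if __name__ == "__main__":
    a, b = 0b1001, 0b0110  # -7 and 6
    print(format(alu_sub(a, b)[0], "04b"), slt_sign_only(a, b), slt_with_overflow(a, b))
```

Checking all 256 pairs of 4-bit operands against an ordinary signed comparison is a quick way to confirm that a candidate set function is correct.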
[Figure: schematic of the most significant bit of the ALU. Labeled signals: A, B, Binvert, CarryIn, Operation, Less, ADDout, Result, Set, Overflow. Labeled blocks: adder (+), Overflow Detection.]

Question 3
The figure below presents the portion of the schematic of a full adder that calculates the carry out signal.

[Figure: carry-out logic of a full adder, with inputs A, B and CarryIn and output CarryOut.]

Assume the following characteristics for the gates:

AND2: Input load = 100 fF, propagation delay low-to-high TPlh = 0.2 ns, propagation delay high-to-low TPhl = 0.5 ns, load dependent delay TPlhf = TPhlf = 0.002 ns/fF.
OR2: Input load = 100 fF, propagation delay low-to-high TPlh = 0.5 ns, propagation delay high-to-low TPhl = 0.1 ns, load dependent delay TPlhf = TPhlf = 0.002 ns/fF.
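Gate parameters of this kind are typically combined with a linear delay model, in which a gate's delay is its fixed propagation delay plus the load dependent delay multiplied by the capacitance it drives. The Python sketch below is a minimal illustration under that assumption; the function name is hypothetical, and the example (an AND2 output rising while driving a single OR2 input) is chosen only to show the arithmetic, not to describe the schematic.

```python
def gate_delay_ns(intrinsic_ns, slope_ns_per_ff, load_ff):
    """Linear delay model: fixed delay plus load-dependent delay times load."""
    return intrinsic_ns + slope_ns_per_ff * load_ff

# Example: AND2 low-to-high transition (TPlh = 0.2 ns, 0.002 ns/fF)
# driving one OR2 input, which presents a 100 fF load.
and2_rising_ns = gate_delay_ns(0.2, 0.002, 100.0)

if __name__ == "__main__":
    print(f"AND2 rising delay into one OR2 input: {and2_rising_ns:.2f} ns")
```

The delay of a path is then the sum of the per-gate delays along it, using for each gate the transition direction and the load that apply at that point.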
Identify the critical path in this schematic and fully characterize its delay using the linear delay model. Assume that the last OR2 gate drives a capacitance of 300 fF. You can use the following blank page if you need more space.

Question 4
In October of 1996, Silicon Graphics introduced a new set of instructions known as MIPS Digital Media Extensions (MDMX). Similar to the Intel MMX, the MDMX specification uses a single instruction multiple data (SIMD) data path to perform parallel narrow data operations on bytes and halfwords within a single instruction. The MDMX has yet to be implemented on a commercially available microprocessor.

We will explore some MDMX ideas by extending the single cycle datapath discussed in class. Where the real MDMX uses 64-bit floating point registers, we will use the 32-bit integer registers to perform parallel operations on two half words ("Dual Halfs" instructions). Consider two pseudo-MDMX instructions (based on the real MDMX!), ADD.DH and MAX.DH. ADD.DH adds two 16-bit signed integers in parallel. MAX.DH is more unusual. It performs two simultaneous comparisons and stores the larger results in a third register. The register transfer operations are given below. R[x][0] refers to the half word in bits 15:0 of register x, and R[x][1] refers to bits 31:16.

INSTRUCTION rd, rs, rt

ADD.DH $r1, $r2, $r3
    R[rd][0] ← R[rs][0] + R[rt][0]
    R[rd][1] ← R[rs][1] + R[rt][1]
    PC ← PC + 4

MAX.DH $r1, $r2, $r3
    for i = 0, 1
    begin
        if (R[rs][i] < R[rt][i])
            then R[rd][i] ← R[rt][i]
            else R[rd][i] ← R[rs][i]
    end
    PC ← PC + 4
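The register transfer descriptions above can be mirrored in a small Python model that treats a register as a 32-bit integer holding two halfwords. This is only a behavioral sketch under two assumptions that the exam text does not state explicitly: each halfword sum wraps around modulo 2^16 (no saturation), and MAX.DH compares the halfwords as signed values. The function names are hypothetical.

```python
HALF_MASK = 0xFFFF

def half(word, i):
    """Return halfword i of a 32-bit word (i = 0: bits 15:0, i = 1: bits 31:16)."""
    return (word >> (16 * i)) & HALF_MASK

def to_signed16(h):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return h - 0x10000 if h & 0x8000 else h

def add_dh(rs, rt):
    """ADD.DH: two independent 16-bit additions; carries do not cross halves."""
    result = 0
    for i in (0, 1):
        result |= ((half(rs, i) + half(rt, i)) & HALF_MASK) << (16 * i)
    return result

def max_dh(rs, rt):
    """MAX.DH: per-halfword signed maximum of rs and rt."""
    result = 0
    for i in (0, 1):
        a, b = half(rs, i), half(rt, i)
        result |= (b if to_signed16(a) < to_signed16(b) else a) << (16 * i)
    return result

if __name__ == "__main__":
    print(hex(add_dh(0x0001FFFF, 0x00010001)))
    print(hex(max_dh(0xFFFF0005, 0x00010003)))
```

Masking each half separately is what distinguishes ADD.DH from an ordinary 32-bit add, where a carry out of bit 15 would propagate into the upper halfword.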
a) The single cycle processor developed in class is shown below. Make the necessary datapath modifications for the two MDMX instructions. You may use a 16-bit version of the 32-bit ALU. If you define your own component, be sure to specify its behavior. Label your control signals with descriptive names. To maximize your chances for partial credit, write down anything else that will help us evaluate your work (you do not need to specify the control functions until part b). You will be graded for correctness more than efficiency. The meanings of some control signals are given below.

nPCSel:  0 => PC ← PC + 4
         1 => PC ← PC + 4 + SignExt(Imm16) || 00
ALUCtr:  "add", "sub", "and", "or", "slt" (signed comparison)
ExtOp:   "zero", "sign"

Use the diagram below for scratch and neatly draw your final answer on the next page.
[Figure: single cycle datapath (scratch copy). Components: Instruction Fetch Unit, Register File (ports Ra, Rb, Rw; buses busA, busB, busW), 32-bit ALU with Zero output, Extender, Data Memory (WrEn, Adr, Data In), and multiplexers. Instruction fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]

[Figure: the same single cycle datapath, repeated on the next page for the final answer.]
b) For ADD.DH and MAX.DH, give the values of all control signals, including those you added in part (a). The control signals can be functions of other control signals, values labeled on the datapath, or don't cares. Use b[i] to denote bit i on bus b. You may use high level specifications such as if-then-else. The number of grids below is not an indication of the number of control signals you will need.

Control Line    ADD.DH    MAX.DH
nPCSel
RegDst
RegWr
ExtOp
ALUSrc
ALUCtr
MemWr
MemtoReg

Extra Credit. [5 points]

Sixteen 16-bit signed integers are stored in memory as follows:
0x00000000    (halfword 0)
0x00000002    (halfword 1)
0x00000004    (halfword 2)
    .
    .
0x0000001E    (halfword 15)

The following MIPS code (assuming no delay slots) finds the largest integer and stores the result in the lower 16 bits of $v0. Since we are dealing with half words, the final value of the upper 16 bits of $v0 is irrelevant.
         lh   $v0, 0x001E($zero)    # assume the last number is the largest
         addi $t0, $zero, 0x001C    # initialize the halfword pointer
LARGEST: lh   $s0, 0($t0)           # load next half word
         slt  $t1, $v0, $s0         # update $v0 if $s0 is larger
         beq  $t1, $zero, NEXT
         add  $v0, $zero, $s0
NEXT:    addi $t0, $t0, -2          # continue the search until pointer becomes negative
         slt  $t1, $t0, $zero
         beq  $t1, $zero, LARGEST
END:

Take advantage of the parallelism in MAX.DH to write a faster version of this code. Exactly how many instructions are executed in your code? What are the minimum and maximum number of instructions executed in the non-MDMX code given above?
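A small Python model of the scalar loop can help when counting executed instructions. It is a sketch under stated assumptions: the pointer steps down by 2 bytes per iteration (as the comment about the pointer becoming negative implies), each branch counts as one executed instruction whether or not it is taken, and the halfwords are given as ordinary signed Python integers. The function name and the list-based memory are hypothetical.

```python
def run_scalar_max(halfwords):
    """Model the non-MDMX loop; return (largest value, instructions executed)."""
    if len(halfwords) != 16:
        raise ValueError("expected exactly 16 halfwords")
    executed = 0
    v0 = halfwords[0x1E // 2]   # lh   $v0, 0x001E($zero)
    executed += 1
    t0 = 0x1C                   # addi $t0, $zero, 0x001C
    executed += 1
    while True:
        s0 = halfwords[t0 // 2]  # lh $s0, 0($t0)
        executed += 1
        executed += 2            # slt $t1, $v0, $s0 and beq $t1, $zero, NEXT
        if v0 < s0:
            v0 = s0              # move $s0 into $v0 (skipped when the branch is taken)
            executed += 1
        t0 -= 2                  # pointer update at NEXT
        executed += 3            # pointer update, slt $t1, $t0, $zero, and the loop beq
        if t0 < 0:
            return v0, executed

if __name__ == "__main__":
    ascending = list(range(1, 17))        # last element is already the largest
    descending = list(range(16, 0, -1))   # every earlier element is larger
    print(run_scalar_max(ascending), run_scalar_max(descending))
```

The two inputs in the usage example exercise the extremes: one where the update instruction never runs and one where it runs on every iteration.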