Midterm Solutions

1. I Amdahl-ighted with Tradeoffs (16 points): For the following design decisions, list a benefit and a drawback of the decision. Be brief, 1-2 sentences per benefit or drawback, but be specific in your answers. The first one is done for you as an example.

Example:
Design Decision: Choosing to use 32 registers instead of 64 registers in an ISA.
Benefit: Fewer bits required to specify a register address and simpler register file implementation.
Drawback: More register spilling may be required (i.e. more loads and possibly stores).

a. Design Decision: Using more complex addressing modes, like memory increment.
Benefit: [handwritten answer, illegible in the scan]
Drawback: [handwritten answer, partly legible: "more complex instructions ... CPI or CT"]

b. Design Decision: Using the single cycle datapath we covered in class instead of the multicycle datapath we covered in class.
Benefit: [handwritten answer, partly legible: "CPI is 1 ..."]
Drawback: [handwritten answer, partly legible: "CT will ..."]

c. Design Decision: Using a register-memory machine instead of a load-store machine.
Benefit: [handwritten answer, partly legible: "... can directly access memory"]
Drawback: [handwritten answer, illegible in the scan]

d. Design Decision: Using RISC instead of using CISC.
Benefit: [handwritten answer, illegible in the scan]
Drawback: [handwritten answer, illegible in the scan]

2. You Make the Call (12 points): For the following proposed design changes, indicate the impact on each component of execution time (i.e. CPI, cycle time, instruction count). Assume that we would optimize the datapath/control in cases where the change impacts machine organization. For each component, indicate whether the change could increase that component, could decrease that component, or whether the change will not impact the component and it will stay the same.
You should only circle one of these alternatives for each component of execution time. Assume the use of the multicycle datapath.

[The circled choices and handwritten margin notes are not recoverable from the scan; the three alternatives are listed for each component.]

a. We make use of a new low resistance material to reduce the latency of wires in our processor. The ISA does not change.
CPI: could increase / could decrease / will stay the same
Cycle time: could increase / could decrease / will stay the same
Instruction count: could increase / could decrease / will stay the same

b. We use two-address code instead of three-address code in the MIPS ISA. We still run the same programs, but of course must recompile for the new ISA.
CPI: could increase / could decrease / will stay the same
Cycle time: could increase / could decrease / will stay the same
Instruction count: could increase / could decrease / will stay the same

c. We extend the MIPS ISA with an instruction that can perform (a*b)+(c*d), which is currently implemented with three instructions: two multiplies and an add. We still run the same programs, but of course must recompile for the new ISA.
CPI: could increase / could decrease / will stay the same
Cycle time: could increase / could decrease / will stay the same
Instruction count: could increase / could decrease / will stay the same

d. We switch to a new compiler, one that reduces the number of loads and stores in our applications. We do not change the ISA or machine organization.
CPI: could increase / could decrease / will stay the same
Cycle time: could increase / could decrease / will stay the same
Instruction count: could increase / could decrease / will stay the same

3. Don't Blank Out (14 points): The following question assumes the use of the small subset of MIPS we covered in class. We will examine the performance of a processor on a particular application, which has the following breakdown: 15% beq and bne instructions, 20% loads, 10% stores, 5% jumps, and 50% R-types. Our processor has a 2 GHz clock. The application executes 1 billion instructions. For the following questions, fill in all blanks to get full credit.

a.
First, find the execution time of the application on this processor, assuming it uses the single cycle datapath we covered in class.
Instruction Count = 1 x 10^9
CPI = 1.0
Cycle Time = 0.5 ns
ET = 0.5 s

b. Second, we will try a new processor, based on the multicycle datapath we covered in class.
Instruction Count = 1 x 10^9
CPI = [handwritten, illegible in the scan]
Cycle Time = 0.5 ns
ET = [handwritten, illegible in the scan]

c. Third, we will try the multicycle datapath again but with a change to the MIPS ISA. We will implement memory indirect addressing using the memory indirect load (mil) instruction, which has the specification R[rt]=M[M[R[rs]+SE[I]]]. It will take 6 cycles to execute. We observe that 25% of all loads in our program write a value to the register file that is used only once: the value is used by a dependent load instruction that is not part of that original 25%. In other words, these loads are used to perform memory indirect addressing, and pairs of loads like this can be replaced by a single mil instruction. All other instructions are not impacted by this change. The clock rate is not impacted by this change.
Instruction Count = [handwritten, illegible in the scan]
CPI = [handwritten, illegible in the scan]
Cycle Time = 0.5 ns
ET = [handwritten, illegible in the scan]

4. C-Jal Later (20 points): Consider the single-cycle processor implementation. Your task will be to augment this datapath with a new instruction: the conditional jal instruction, which we will call the cjal instruction. This instruction will be an I-type instruction, and will have the following effect:

If (R[rs] != 0)
    PC = R[rt]
    R[31] = PC+4
else
    PC = PC+4

The immediate field should be ignored for this instruction. Implement cjal on the single cycle datapath. Implement your solution on the following two pages. All other instructions must still work correctly after your modifications. You should not add any new ALUs, register file ports, or ports to memory.
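The blanks in question 3 all follow from ET = Instruction Count x CPI x Cycle Time. The sketch below works through that arithmetic in Python. The instruction mix, clock rate, instruction count, and the 6-cycle mil come from the question; the per-class cycle counts for the multicycle datapath (3 for branches and jumps, 4 for stores and R-types, 5 for loads) are an assumption based on the usual textbook design, and the treatment of part c is one reading of the pairing described there. The names `MIX`, `CYCLES`, and `summarize` are illustrative only.

```python
# Instruction mix from question 3, as fractions of the 1 billion instructions.
MIX = {"branch": 0.15, "load": 0.20, "store": 0.10, "jump": 0.05, "rtype": 0.50}

# ASSUMPTION: per-class cycle counts for the multicycle datapath. These are the
# usual textbook values, not numbers recovered from the exam itself.
CYCLES = {"branch": 3, "load": 5, "store": 4, "jump": 3, "rtype": 4, "mil": 6}

CLOCK_HZ = 2e9      # 2 GHz, from the question
BASE_COUNT = 1e9    # 1 billion instructions, from the question


def summarize(mix, cycles, base_count=BASE_COUNT, clock_hz=CLOCK_HZ):
    """Return (instruction count, average CPI, execution time in seconds).

    `mix` holds each class as a fraction of the ORIGINAL instruction count,
    so the fractions may sum to less than 1 after an ISA change.
    """
    count = base_count * sum(mix.values())
    total_cycles = base_count * sum(frac * cycles[k] for k, frac in mix.items())
    return count, total_cycles / count, total_cycles / clock_hz


# Part a: single-cycle datapath, every instruction takes one cycle.
single = summarize(MIX, {k: 1 for k in MIX})

# Part b: multicycle datapath with the assumed cycle counts.
multi = summarize(MIX, CYCLES)

# Part c (ASSUMED reading): 25% of loads (5% of instructions) each pair with
# one other load, and every pair collapses into a single 6-cycle mil.
paired = 0.25 * MIX["load"]
mil_mix = dict(MIX, load=MIX["load"] - 2 * paired, mil=paired)
with_mil = summarize(mil_mix, CYCLES)

if __name__ == "__main__":
    for label, (count, cpi, seconds) in [("a", single), ("b", multi), ("c", with_mil)]:
        print(f"part {label}: IC={count:.3g}  CPI={cpi:.3f}  ET={seconds:.3f} s")
```

Under these assumptions the instruction count in part c drops to 0.95 billion while the average CPI stays close to that of part b, so the saving comes almost entirely from executing fewer instructions. Different cycle counts would change the numbers but not the method.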
[Figure: single-cycle datapath, to be modified for cjal]

[Table: Main Controller. Columns: Input or Output, Signal Name, R-format, and further instruction columns; the rows include ALUOp1 and ALUOp0. Cell values are not recoverable from the scan.]

[Table: ALU Controller. Columns include ALU Action and ALUCtrl; legible entries: AND (funct 100100) and SLT (funct 101010, ALUCtrl 0111).]

5. This Question is De-lay-Mux One (18 points): Assume for the rest of this problem that all logic gates have the following delays:

[Table of gate delays; not recoverable from the scan]

So a 2-input AND gate would have delay T and a 4-input OR gate would have delay 3T. Further assume that muxes have delay 4T. We are going to slightly modify the design of the full adder from what we assumed in class. We will still use two two-input XORs to implement the Sum logic:

Sum = Cin ^ (A ^ B)

We will create a 32-bit adder out of these full adders. We will use the 4-bit CLA that we covered in class as the basic building block of this design, and we will use it (as we did in class) to make 16-bit hierarchical CLAs (HCLA). But instead of connecting these in series to make a 32-bit adder, we will use carry select to speed up the 32-bit adder. The design will look as follows:

[Figure: 32-bit carry-select adder built from 16-bit HCLAs]

Your task is to find the maximal delay of this design, i.e. find the delays of S0-15, C32, and S16-31. The maximal delay of these three outputs will be the maximal delay of the design. Fill in the values in the table on the following page to receive full credit (and to help with possible partial credit).

[Table to fill in: nine rows worth 1, 1, 2, 1, 1, 2, 2, 2, and 2 points; row labels are not recoverable from the scan]

Find the maximum delay in terms of T of the 32-bit adder. Take the maximum of all output bits, including the sum bits (S0-S31) and the final carry out (C32). Show your work clearly in the table above. The two figures below are taken from the class notes, if you need to refer to them.
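To make the carry-select idea in question 5 concrete, here is a small behavioural sketch in Python. It is not the exam's gate-level design: `add16` is a hypothetical stand-in for one 16-bit HCLA, and `upper_sum_delay` only restates the general observation that the selected outputs are ready one mux delay (4T in this question) after the later of the select signal and the two speculative sums. The actual HCLA delays depend on the gate-delay table, which this sketch does not assume.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def add16(a, b, carry_in):
    """Stand-in for one 16-bit adder block: returns (sum16, carry_out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def carry_select_add32(x, y, carry_in=0):
    """Add two 32-bit values the carry-select way; returns (sum32, c32)."""
    x &= MASK32
    y &= MASK32
    # Lower half: one adder producing S0-15 and the carry C16.
    lo_sum, c16 = add16(x, y, carry_in)
    # Upper half: two adders work in parallel, one per possible value of C16.
    hi_if_0 = add16(x >> 16, y >> 16, 0)
    hi_if_1 = add16(x >> 16, y >> 16, 1)
    # The muxes pick S16-31 and C32 once the real C16 is known.
    hi_sum, c32 = hi_if_1 if c16 else hi_if_0
    return (hi_sum << 16) | lo_sum, c32


def upper_sum_delay(c16_delay, upper_hcla_delay, mux_delay=4):
    """Delay of the selected outputs, in units of T.

    The mux cannot settle until both its data inputs (the speculative sums)
    and its select input (C16) have arrived.
    """
    return max(c16_delay, upper_hcla_delay) + mux_delay


if __name__ == "__main__":
    total, carry = carry_select_add32(0x0000FFFF, 0x00000001)
    print(f"{total:#010x} carry={carry}")
```

The point of the structure is that the upper adders never wait for C16 to ripple in; only the final mux does.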
Maximal Delay: (2 points)

[Figure: 4-bit CLA, with inputs X0-X3 and Y0-Y3 and group propagate and generate outputs]

[Figure: 16-bit Hierarchical CLA built from 4-bit CLAs]

6. Multiadd MADness (20 points): You have noticed that your most commonly used application only requires 8 bits of precision for most operations, and that most of these operations are just add instructions. This motivates you to look at compressing your register storage to cleverly save on storage space, packing four 8-bit values into a single 32-bit register. So for example, register $t0 may currently contain the following 32 bits:

01011101111000011001100100011011

But we will interpret these bits as four distinct 8-bit signed values, as such:

01011101 | 11100001 | 10011001 | 00011011

Note that we still read out all 32 bits from the register file, but that we will want to selectively use certain bits from the 32. To manipulate these compressed registers, you implement a new add instruction, the multiadd (MAD), which will add two registers that have been compressed with four 8-bit values to produce an output to a register that will also store four 8-bit values. So for example, if register $t0 and $t1 currently contain the following bits:

$t0  01011101 11100001 10011001 00011011
$t1  00000000 10000000 00010001 01100001

And then we executed

mad $t2, $t1, $t0

we would see the following result in register $t2:

$t2  [value not captured in this preview]

Notice a few things about this. First, this instruction has done four independent add operations, each of which uses 8 bits. Second, the sum of 11100001 and 10000000 overflows the 8 bits of storage that we have. Note that this overflow does not impact the sum of 01011101 and 00000000. You may assume that overflow signals are handled without you worrying about them (we'll just OR all overflow flags together and let the OS sort it out, but you don't have to worry about doing that on the datapath or control).
Formally, the MAD uses the R-type format (the destination register is indicated by the rd field) and has the following impact:

R[rd] = (R[rs][31-24] + R[rt][31-24]) : (R[rs][23-16] + R[rt][23-16]) : (R[rs][15-8] + R[rt][15-8]) : (R[rs][7-0] + R[rt][7-0])

I am using the : operator here as a concatenation operator. You may use a different opcode than other R-type instructions, and may create a new control path through the FSM for this opcode.

HINT: To ensure that each of the four add operations does not interfere with the others, and to avoid modifying the ALU, you should use the ALU four different times to handle the four adds of the MAD.

Implement this instruction on the multicycle datapath on the following two pages. All other instructions must still work correctly after your modifications; only the MAD instruction will use the 8-bit compressed format. You should not add any new ALUs, register file ports, or ports to memory.

[Figure: multicycle datapath, to be modified for MAD]

[Figure: multicycle control FSM. States and asserted signals:]
Instruction fetch (start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
Jump completion: PCWrite, PCSource = 10
Memory access (load): MemRead, IorD = 1
Memory access (store): MemWrite, IorD = 1
R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
Write-back step: RegDst = 0, RegWrite, MemtoReg = 1
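As a behavioural reference for what MAD should compute, the following Python sketch adds the four 8-bit lanes independently, one lane per loop iteration, which loosely mirrors the hint about using the ALU four times. It models only the register-level result, not the datapath or FSM changes the question asks for. The function name `mad` and the returned overflow flag (the OR of the four per-lane signed overflows) are illustrative choices.

```python
def mad(rs, rt):
    """Add four packed 8-bit lanes independently.

    Returns (result, overflow), where overflow is the OR of the signed
    overflow flags of the four lane additions.
    """
    result = 0
    overflow = False
    for shift in (0, 8, 16, 24):          # one pass per 8-bit lane
        a = (rs >> shift) & 0xFF
        b = (rt >> shift) & 0xFF
        lane = (a + b) & 0xFF             # carry out of the lane is discarded
        # Signed overflow: operands share a sign bit and the sum's differs.
        if (~(a ^ b) & (a ^ lane)) & 0x80:
            overflow = True
        result |= lane << shift
    return result, overflow


if __name__ == "__main__":
    t0 = 0b01011101_11100001_10011001_00011011
    t1 = 0b00000000_10000000_00010001_01100001
    t2, ovf = mad(t1, t0)
    bits = format(t2, "032b")
    print(" ".join(bits[i:i + 8] for i in range(0, 32, 8)), "overflow:", ovf)
```

Masking each lane to 8 bits before and after the addition is what keeps a carry out of one lane from leaking into its neighbour, which is the interference the hint warns about.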
This note was uploaded on 04/18/2010 for the course CS 151B taught by Professor N/a during the Spring '10 term at UCLA.
