ee457_Final_su2005_sol

ee457_Final_su2005_sol - Summer 2005 EE457 Instructor:...


Summer 2005 EE457, Instructor: Gandhi Puvvada
Final Exam (35%), Date: 8/1/2005, Monday
Closed Book, Closed Notes; Calculators allowed
Time: 9:45 AM - 12:15 PM, GFS116

1 (20 points) 15 min.
Constant Adder to add SIX: You need to add six (six in decimal, 000110 in binary) to a 6-bit number X (X5X4X3X2X1X0) to produce a 7-bit result Z (Z6Z5Z4Z3Z2Z1Z0). Z0 is already produced. Produce Z1 and Z2 using simple heuristic techniques. Also produce C3 (carry 3) to go into the 3-bit incrementer on the left. Recall the incrementer design in your HW#8 and simplify the 3-bit incrementer on the left: delete either the p's (p5p4p3) or the g's (g5g4g3), write equations for the other (g's or p's), and then write equations for the C's (C6, C5, C4). Find the individual delay, in gate delays, for producing each of the 7 bits of Z (Z6Z5Z4Z3Z2Z1Z0).

To produce the sum bit "s" from two single bits "a" and "b", we use an XOR gate: s = a XOR b. Do you use (a XOR b) or (a XNOR b) to produce the sum bit "s" from summing "a", "b", and a "1"? Answer: (a XNOR b).
Please count the delay of an XOR gate or an XNOR gate as two gate delays.

[Figure: constant-adder diagram with inputs X5 to X0, carries C3 to C6, and outputs Z6 to Z0.]

A. You deleted the g's (p's / g's).
B. Equations for the remaining p's and the C's:
P3 = X3;  C4 = P3·C3 = X3·C3
P4 = X4;  C5 = P4·P3·C3 = X4·X3·C3
P5 = X5;  C6 = P5·P4·P3·C3 = X5·X4·X3·C3
Delay in gates: Z0 ____, Z1 ____, Z2 ____, Z3 ____, Z4 ____, Z5 3 gates, Z6 2 gates.

ee457_Final_su2005.fm 7/30/05, EE457 Final Exam - Summer 2005, © Copyright 2005 Gandhi Puvvada

2 (5 + 2 + 8 = 15 points)
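The non-linear pipeline question that follows asks for the greedy simple cycles and the MAL starting from an ICV. The Python sketch below mechanises that search; it is an illustration, not the course's prescribed method. The ICV is not legible in this copy, so C4C3C2C1 = 1010 (forbidden latencies 2 and 4) is an assumption, picked only because it reproduces the greedy cycles (1, 5) and (3) and the MAL of 3 recorded in the solution. All latencies larger than 4 are lumped into one edge labelled 5, which always returns to the ICV. The function names are hypothetical.

```python
def allowed_latencies(state: int, n: int) -> list:
    """Latencies k in 1..n whose collision bit C_k is 0, plus n+1 for 'any k > n'."""
    return [k for k in range(1, n + 1) if not (state >> (k - 1)) & 1] + [n + 1]


def next_state(state: int, k: int, icv: int, n: int) -> int:
    """Shift the current collision vector right by k and OR in the ICV."""
    if k > n:
        return icv  # all earlier initiations have left the pipeline
    return (state >> k) | icv


def greedy_cycles(icv: int, n: int) -> set:
    # Collect every state reachable from the ICV through any permissible latency.
    states, stack = {icv}, [icv]
    while stack:
        s = stack.pop()
        for k in allowed_latencies(s, n):
            t = next_state(s, k, icv, n)
            if t not in states:
                states.add(t)
                stack.append(t)

    cycles = set()
    for start in states:
        # Always take the minimum permissible latency until a state repeats.
        seen, path, s = {}, [], start
        while s not in seen:
            seen[s] = len(path)
            k = min(allowed_latencies(s, n))
            path.append(k)
            s = next_state(s, k, icv, n)
        cycle = path[seen[s]:]
        # Canonical rotation, so (5, 1) and (1, 5) count as the same cycle.
        cycles.add(min(tuple(cycle[i:] + cycle[:i]) for i in range(len(cycle))))
    return cycles


if __name__ == "__main__":
    icv = 0b1010  # ASSUMED ICV C4C3C2C1 = 1010
    cycles = greedy_cycles(icv, n=4)
    for c in sorted(cycles):
        print(c, "average latency:", sum(c) / len(c))
    print("lowest greedy-cycle average:", min(sum(c) / len(c) for c in cycles))
```

With the assumed ICV, the states reachable are 1010, 1111 and 1011; the greedy cycles come out as (1, 5) and (3), both averaging 3, which is the value the solution gives as the MAL.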
Non-linear pipeline: The ICV (Initial Collision Vector) for a non-linear pipeline is C4C3C2C1 = ____.
-- Complete the incomplete state diagram below,
-- find the greedy simple cycles,
-- and find the MAL (Minimum Achievable Latency).

[Figure: partially completed state diagram and reservation table (rows Square, Subtract3, Divide by 7).]

Greedy simple cycles: (i) (1, 5)  (ii) (3)
Average latencies: (i) 3  (ii) 3
MAL = 3

Complete the reservation table on the right to evaluate the function Y2. Note that raising to the power 4 can be done by squaring twice. Similarly, subtracting 6 can be done by subtracting 3 twice!

The DPU (datapath unit) on the side was designed to support the evaluation of the function Y1 = [(X² - 3) / 7]. Design a new DPU below (by adding muxes, etc.) such that the new DPU supports the evaluation of both the Y1 function and the Y2 function as defined in question 2.2 above.

3 (23 points) 10 min.
Parallel processors:
There is no MISD (SISD/SIMD/MISD/MIMD) system.
Lock-step execution characterizes the behavior of a SIMD (SISD/SIMD/MISD/MIMD) system.
The abbreviation RMW in RMW-race stands for READ-MODIFY-WRITE.
One way to solve the problem of the RMW-race is to keep the shared variables in GLOBAL (local/global) memory and LOCKING (locking/not locking) access to the memory until all three parts of RMW are done. Such an operation is called an atomic (atomic/molecular) operation.
The operating system may declare some areas of the global (global/local) memory as non-cacheable so that it can force all processors to access shared (shared/local) variables from that area.
A snoopy controller in a write-through cache-coherence system helps in snooping (watching) for write (read/write/both read and write) transactions from the other processors (other processors/same processor).
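The snooping rule just described (a snoopy controller in a write-through system watching bus transactions from other processors) can be modelled in a few lines. This Python sketch uses hypothetical class names (`Cache`, `System`) and assumes a write-invalidate, no-write-allocate write-through policy: every write goes on the bus and to main memory, and each snoopy controller invalidates its own valid copy when it sees a write from another processor. Each block keeps a VAL or INV state.

```python
from dataclasses import dataclass, field

VAL, INV = "VAL", "INV"


@dataclass
class Cache:
    """One processor's write-through cache: a VAL/INV state and a data copy per block."""
    pid: int
    state: dict = field(default_factory=dict)
    data: dict = field(default_factory=dict)

    def snoop(self, writer: int, block: int) -> None:
        # A write by ANOTHER processor makes our copy stale, so invalidate it.
        if writer != self.pid and self.state.get(block) == VAL:
            self.state[block] = INV


@dataclass
class System:
    memory: dict
    caches: list
    bus_writes: int = 0

    def read(self, pid: int, block: int) -> int:
        cache = self.caches[pid]
        if cache.state.get(block) != VAL:  # miss, or copy invalidated: refetch
            cache.data[block] = self.memory.get(block, 0)
            cache.state[block] = VAL
        return cache.data[block]

    def write(self, pid: int, block: int, value: int) -> None:
        self.bus_writes += 1               # write-through: every write uses the bus
        self.memory[block] = value
        cache = self.caches[pid]
        if cache.state.get(block) == VAL:  # update own copy (no write-allocate assumed)
            cache.data[block] = value
        for other in self.caches:          # every snoopy controller watches the bus
            other.snoop(pid, block)


if __name__ == "__main__":
    smp = System(memory={0: 10}, caches=[Cache(0), Cache(1)])
    smp.read(0, 0)
    smp.read(1, 0)
    smp.write(0, 0, 99)
    print(smp.caches[1].state[0])  # INV: P1's copy was invalidated by P0's write
    print(smp.read(1, 0))          # 99: P1 misses and refetches from memory
```

Only the writer's own copy stays valid; the other processor's next read misses and picks up the new value from main memory, which write-through has already updated.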
More writes appear on the bus in the case of a write-through (write-through/write-back) cache coherency system.
Each block (each block/the entire cache) maintains two-state status information (INV and VAL) in the case of a write-through (write-through/write-back) cache coherency system, whereas it maintains three-state status information (INV, RO, and RW) in the case of a write-back (write-through/write-back) cache coherency system.
If two processors attempt to write to the same block at the same time, the bus-arbitration system should allow only one of the two (only one of the two/none of the two/both) processors to go on to the bus.
In a write-back system, you see three W(i) in the diagram on the side. Circle the W(i) that [...] other processors by going on to the bus and announcing on the bus. [handwritten answer illegible]
The abbreviation MESI (in the MESI cache coherence protocol) stands for Modified, Exclusive, Shared, Invalid.

4 (____ points) ____ min.
Cache:
4.1 A complete cache datapath design (similar to the homework #6 cache questions) is given on the next page. Assume that we are NOT implementing any virtual memory here. Please read/analyze the design and arrive at the specs for this design. Please show brief calculations.
- Processor: 64-bit (32/64) data, byte-addressable (byte/16-bit word/32-bit word) processor
- Processor address: 29-bit (logical)
- Processor address space: 512 MBytes (2^29 bytes, A28 - A0)
- Main memory organization: 2-way (2-way/4-way) interleaved to facilitate efficient cache block transfer
- Cache mapping: set-associative (fully associative/set-associative/direct) mapping
- Total cache size: 32 KBytes
- Block size: 2 64-bit words = 16 bytes
- Total number of block frames in cache = 32 KBytes / 16 bytes = 2K = 2^11
- Set size: 4 blocks
- Number of sets = 2^11 / 4 = 2^9 = 512

Fill in the table below.
Field    Width     Address bits
TAG      16-bit    A28 - A13
SET      9-bit     A12 - A4
WORD     1-bit     A3
BYTE     3-bit     A2 - A0

Calculations: the 2^11 block frames are grouped 4 per set, giving 2^9 sets, so the SET field is 9 bits wide. The WORD and BYTE fields take A3 - A0 (two 64-bit words per block, eight bytes per word), so the TAG field is the remaining 29 - 9 - 4 = 16 bits.

4.2 Modify the cache design on the next page to follow direct mapping in place of the current set-associative mapping. All other specs remain the same. How many TAG RAMs in a direct mapping? 1 (1/2/4). The total number of block frames decides the block field. The block field is 11 bits, namely (A14 - A4). The remaining upper part is the TAG field. Hence the TAG field is 14 bits, namely (A28 - A15). The size of the TAG RAM is 2K × ____.

[Figure: cache datapath design referenced in Question 4.]

5 (____ points) 20 min.
Virtual memory:
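Questions 4 and 5 both come down to slicing an address into bit fields. The Python sketch below (the helper names `log2` and `split_fields` are hypothetical) recomputes the Question 4 cache fields from its specs (29-bit byte address, 32 KB cache, 16-byte blocks, 4 blocks per set), encodes the virtual-address splits of the virtual-memory question that follows (4 KB pages, 16-set TLB, 16K-entry page directory), and applies the TLB split to the address 02445688H from that question.

```python
def log2(n: int) -> int:
    """Exact base-2 logarithm of a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def split_fields(addr: int, fields: list) -> dict:
    """Split addr into named fields given as (name, width) pairs, most significant first."""
    out, shift = {}, 0
    for name, width in reversed(fields):
        out[name] = (addr >> shift) & ((1 << width) - 1)
        shift += width
    return out


# Question 4: 29-bit byte address, 32 KB cache, 16-byte blocks (two 64-bit words), 4-way.
ADDR_BITS, CACHE_BYTES, BLOCK_BYTES, WAYS = 29, 32 * 1024, 16, 4
frames = CACHE_BYTES // BLOCK_BYTES              # 2048 = 2^11 block frames
set_bits = log2(frames // WAYS)                  # 512 sets -> 9 bits
offset_bits = log2(BLOCK_BYTES)                  # word (1) + byte (3) = 4 bits
CACHE_4WAY = [("tag", ADDR_BITS - set_bits - offset_bits), ("set", set_bits),
              ("word", 1), ("byte", 3)]          # A28-A13 | A12-A4 | A3 | A2-A0
CACHE_DIRECT = [("tag", ADDR_BITS - log2(frames) - offset_bits), ("block", log2(frames)),
                ("word", 1), ("byte", 3)]        # A28-A15 | A14-A4 | A3 | A2-A0

# Question 5: 29-bit VA, 4 KB pages, 16-set TLB, 16K-entry page directory.
TLB_VIEW = [("tag", 29 - 4 - 12), ("set", 4), ("offset", 12)]   # VA28-16 | VA15-12 | VA11-0
PT_VIEW = [("dir", 14), ("pt2", 29 - 14 - 12), ("offset", 12)]  # VA28-15 | VA14-12 | VA11-0

if __name__ == "__main__":
    print(CACHE_4WAY)
    print(CACHE_DIRECT)
    print({k: hex(v) for k, v in split_fields(0x02445688, TLB_VIEW).items()})
```

The set-associative layout comes out as a 16-bit TAG over a 9-bit SET, and the direct-mapped variant as a 14-bit TAG over an 11-bit block field; for 02445688H the TLB SET field (VA15 - VA12) is 5.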
Processor and system: 29-bit virtual address, 26-bit physical address, 32-bit data, byte addressable.
TLB: 64-entry, 4-way set-associative, 16-set TLB ==> 4-bit SET field.
Page size: 4 KByte. Page table: 2-level organization, with a page directory of 16K (2^14) locations.

5.1 Divide the 29-bit virtual address, VA28 - VA0, into VPN and page offset first.
VPN = VA28 - VA12; Page offset = VA11 - VA0
For the page-table purpose, divide the VPN into page-directory and 2nd-level page-table indices.
Page-directory index = VA28 - VA15; 2nd-level page-table index = VA14 - VA12
For the TLB purpose, divide the VPN into TAG and SET fields.
TAG = VA28 - VA16; SET = VA15 - VA12

5.2 Divide the 26-bit physical address, PA25 - PA0, into PPFN and page offset.
PPFN = PA25 - PA12; Page offset = PA11 - PA0

5.3 Based on the above, how many comparators of what size are needed to "search" the TLB?
4 comparators, each 14 bits wide (13 bits of TAG + 1 valid bit).

5.4 On a TLB miss, to make space for a new entry, an existing TLB entry may have to be replaced using LRU (Least Recently Used) or some such algorithm. This happens
(a) definitely if the TLB is FULL: TRUE (TRUE/FALSE)
(b) even when the TLB is not full, if the specific SET is FULL: TRUE (TRUE/FALSE)
When we remove the old TLB entry, we do not need to flush anything (need to flush additional items, namely ... / do not need to flush anything).

5.5 On a page fault, to make space for a new page, an existing PAGE may have to be replaced using LRU (Least Recently Used) or some such algorithm. This happens
(a) definitely if the main memory is FULL: TRUE (TRUE/FALSE)
(b) even when main memory is not full, if the specific SET is FULL: FALSE (TRUE/FALSE)
When we remove the old PAGE, we need to flush additional items, namely the associated TLB entry (if any) and the cache blocks of that page, located by comparing the TAGs in the TAG RAM locations (need to flush additional items, namely ... / do not need to flush anything).

5.6 The two virtual addresses VA28 - VA0 = 02445688H and VA28 - VA0 = 12465482H map to the same set (the same set / different sets) in the TLB because their SET fields, VA15 - VA12, are 5H in both cases.

6 (____ points) ____ min.
Page tables built on demand by the OS (operating system): Reproduced below is your HW problem with a different table of sorted VPNs (8 VPNs).
4 KByte pages (12-bit page offset). 32-bit virtual address (VA31 - VA0) to 32-bit physical address (PA31 - PA0) translation: the 20-bit VPN (Virtual Page Number) VA31 - VA12 is translated to the 20-bit PPFN (Physical Page Frame Number) PA31 - PA12 through a 3-level page table. The 20-bit VPN is a 5-digit hex VPN, say PQRST. Assume that the entry width in these tables is always 32 bits. The top 8 bits, VA31 - VA24 (hex digits PQ), are used to index the A-Table. The next 8 bits, VA23 - VA16 (hex digits RS), are used to index the B-Table. The last 4 bits, VA15 - VA12 (hex digit T), are used to index the C-Table.

Questions:
1. There is only one (only one/one or more) top-level table in a multi-level page table system.
2. There are one or more of each (only one of each/one or more of each) B-level and C-level tables in a multi-level page table system.
3. If the operating system has brought the following 8 virtual pages, whose 5-digit hex addresses are shown below, how many tables of what kind (A/B/C type) and what size (256 words = 1 KByte) are created by the OS?
A-Table: 1 (256 words = 1 KByte); B-Tables: ____ (256 words each); C-Tables: ____ (16 words each)

[Table: the 8 VPNs in inserted order, with columns VA31 - VA24 (PQ), VA23 - VA16 (RS), VA15 - VA12 (T); entries not recoverable.]

7 (____ points) 30 min.
Pipelining: Modified lab #7 Part I:
The current lab #7 Part I block diagram has been reproduced on the next page for reference.
Modifications to Part I: Remove the overflow-checking requirement; an instruction no longer turns itself into a NOP based on overflow. Add a "GAP" stage between EX1 and EX2. The GAP stage does not process the data. It is needed because the VLSI layout placed the IF, RF, and EX1 stages at one corner of the chip and the EX2 and WB stages at a far corner of the chip.
On page 10, Miss Bruin has simply inserted the GAP stage, connected up the signals, and removed the Carryout (overflow) signals. She also added three more comparators to the comparator station. Answer the questions below before completing the design on page 10 and/or page 11. Page 11 contains a portion of page 10 with a lot of space to draw logic.
In the original completed design there ____ (was/were/was no/were no) bubble-injecting logic/mux(es) in the EX1 stage ____ (and/or) the EX2 stage. In the modified design, since the overflow-checking requirements are removed, ____.
In the GAP stage, we ____ (need/do not need) a Z_gap_Mux.
Forwarding help can be offered solely from the WB stage in both the original and the modified designs (in the original design only/in the modified design only/in both the original and the modified designs).
There are more (less/more) cases of dependency in the modified design resulting in stalling.
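Why the GAP stage creates more stalling cases can be made concrete. This Python sketch is illustrative only; it assumes, as the question states, that forwarding help is offered solely from the WB stage, so a dependent instruction in RF waits until its producer reaches WB. It further assumes one stage per cycle for the producer and ignores NOPs and the Z register. The stage lists and function names are hypothetical.

```python
ORIGINAL = ["IF", "RF", "EX1", "EX2", "WB"]
MODIFIED = ["IF", "RF", "EX1", "GAP", "EX2", "WB"]


def stall_stages(pipeline: list) -> list:
    """Stages in which a producer forces a dependent instruction in RF to stall,
    assuming forwarding into RF is available only from WB."""
    return pipeline[pipeline.index("RF") + 1:pipeline.index("WB")]


def stall_cycles(pipeline: list, distance: int) -> int:
    """Stall cycles for a consumer whose producer is `distance` instructions ahead
    (1 = the immediately preceding instruction), with no other hazards in between."""
    gap = pipeline.index("WB") - pipeline.index("RF")  # stages from RF to WB
    return max(0, gap - distance)


if __name__ == "__main__":
    for name, pipe in (("original", ORIGINAL), ("modified", MODIFIED)):
        print(name, stall_stages(pipe), [stall_cycles(pipe, d) for d in range(1, 5)])
```

Under these assumptions the original pipeline stalls 2, 1 and 0 cycles for producer distances 1 to 3, while the modified one stalls 3, 2, 1 and 0 cycles for distances 1 to 4: one more stage (GAP) in which a producer can force a stall.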
In the modified design we need to stall an instruction in the RF stage [if it isn't (is/isn't) a NOP and {if it depends for its X or Y on a non-NOP (NOP/non-NOP) instruction in EX1 or GAP or EX2 (write EX1 or GAP or EX2 or WB or a combination of them)}].
Dependency for the Z register does not cause stalling in both designs (in the original design only/in the modified design only/in both designs).

[Figure: lab #7 Part I block diagram and Miss Bruin's modified design with the GAP stage, plus an enlarged portion with space to draw logic.]

8 (____ points) ____ min.
Multi-cycle CPU design modification:
The 2nd Edition CU (state diagram) and the DPU are given on this and the next page.
DPU modification: The VLSI engineer just informed you (the architect!) that, for whatever reason, they had to use a memory built from static logic (from the 1st edition of the textbook), similar to the memory used in your lab #4. They also decided to remove the MDR (Memory Data Register).
Please make any changes needed to either the DPU or the CU (state diagram) or to both. Do not worry about [...].

[Figure: 2nd Ed. state diagram of the multi-cycle CU (instruction fetch, instruction decode/register fetch, memory address computation, R-type completion and other completion states), with settings of MemRead, IorD, IRWrite, ALUSrcA, ALUSrcB, ALUOp, PCWrite, PCWriteCond, PCSource, RegDst, RegWrite, and MemtoReg.]

[handwritten solution illegible]

[Figure: multi-cycle DPU (datapath unit).]
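Returning to Question 1: the constant-adder equations can be checked exhaustively against ordinary addition. In this Python sketch, Z1 = NOT X1, Z2 = X2 XNOR X1 and C3 = X2 OR X1 are derived here rather than read from the solution (those handwritten answers are not legible): bit 1 adds a 1 with no incoming carry, so its carry-out is X1, and bit 2 adds X2, a 1, and that carry. The C4 to C6 equations follow the solution, with the g's deleted and p_i = X_i. The function names are hypothetical.

```python
def add_six(x: list) -> list:
    """x = [X0, X1, ..., X5] (LSB first); returns [Z0, ..., Z6] from the bit equations."""
    x0, x1, x2, x3, x4, x5 = x
    z0 = x0                  # bit 0 adds 0
    z1 = 1 - x1              # bit 1 adds 1 with no carry in: Z1 = NOT X1
    z2 = 1 - (x2 ^ x1)       # bit 2 adds X2 + 1 + C2 (C2 = X1): Z2 = X2 XNOR X1
    c3 = x2 | x1             # majority(X2, 1, X1) = X2 OR X1
    # 3-bit incrementer on the left: all g's are 0, so only p_i = X_i remain.
    p3, p4, p5 = x3, x4, x5
    c4 = p3 & c3
    c5 = p4 & p3 & c3
    c6 = p5 & p4 & p3 & c3
    return [z0, z1, z2, p3 ^ c3, p4 ^ c4, p5 ^ c5, c6]  # Z6 = C6


def to_bits(value: int, width: int) -> list:
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list) -> int:
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    for value in range(64):
        assert from_bits(add_six(to_bits(value, 6))) == value + 6
    print("Z = X + 6 for all 64 inputs")
```

The XNOR in Z2 is the same observation as the XOR/XNOR question: adding a constant 1 to a + b inverts the plain XOR sum.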

This note was uploaded on 11/02/2008 for the course EE 457 taught by Professor Puvvada during the Fall '08 term at USC.
