spring99_midterm_soln



CS/ECE 552 Introduction to Computer Architecture
Midterm Exam
Tuesday, March 16, 1999
7:15-9:15 p.m.

Name: SOLUTION

Limit your answers to the space provided. Unnecessarily long answers will be penalized. If you use more space than is provided, you are probably doing something wrong. Use the back of each page for any scratch work. Write your last name on each page.

Problem 1. ______________ (out of 24 points)
Problem 2. ______________ (out of 14 points)
Problem 3. ______________ (out of 8 points)
Problem 4. ______________ (out of 12 points)
Problem 5. ______________ (out of 18 points)
Problem 6. ______________ (out of 24 points)
Total      ______________ (out of 100 points)

(1) Adder Design (24 points)

The figure below shows a 64-bit carry lookahead adder with three levels of lookahead. Some signals are shown in the figure to give you an idea of the structure of the adder; however, not all relevant signals are shown (for example, the carries from the first level lookahead to the single-bit adders).

[Figure: 64-bit carry lookahead adder. Labeled elements: Single Bit Adders, First Level Lookahead, Second Level Lookahead, Third Level Lookahead, 16-bit CLA blocks. Labeled signals: G^I_0, P^I_0, G^II_0, P^II_0, G^III_0, P^III_0, C_0.]

(a) Write the logic equations for the following signals, in the space provided. (The equations should be in terms of the inputs to the logic block for which they represent the outputs.) (10 points)

Let X_i, Y_i, C_i and S_i, with i = 0, 1, ..., 63, represent the individual bits of the two operands, the carries, and the sums, respectively. Let G^I_i and P^I_i, i = 0, ..., 63, represent the first level generates and propagates. Let G^II_j and P^II_j, j = 0, ..., 15, represent the second level generates and propagates, and let G^III_k and P^III_k, k = 0, ..., 3, represent the third level generates and propagates.

C_32    = G^III_1 + G^III_0.P^III_1 + P^III_0.P^III_1.C_0
C_8     = G^II_1 + G^II_0.P^II_1 + P^II_0.P^II_1.C_0
P^III_1 = P^II_7.P^II_6.P^II_5.P^II_4
G^III_2 = G^II_11 + G^II_10.P^II_11 + G^II_9.P^II_11.P^II_10 + G^II_8.P^II_11.P^II_10.P^II_9
G^I_3   = X_3.Y_3

(b) Assume that each gate delay is τ time units, and all gates are available with up to 5 inputs. Further assume that all X_i, all Y_i, and C_0 are ready at time 0. In the table below, show at what time the chosen signal value is ready. Use the comment column for any comments (comments will be used to determine partial credit in case of an incorrect answer). (6 points)

Signal   Time Ready   Comments
S_2      5τ           τ for G^I_i, P^I_i; 2τ for C_2; and 2τ for S_2
S_8      7τ           τ for G^I_i, P^I_i; 2τ for G^II_i, P^II_i; 2τ for C_8; and 2τ for S_8
C_32     7τ           τ for G^I_i, P^I_i; 2τ for G^II_i, P^II_i; 2τ for G^III_i, P^III_i; and 2τ for C_32

(c) Calculate the number of gates of each type in the lookahead logic (include all levels and do not count gates in the 64 single-bit adders). Show your work. (8 points)

There are a total of 16+4+1 = 21 CLA units. Each CLA unit generates a G, a P, and 3 carries. The equations for this logic are:

G  = g3 + g2.p3 + g1.p3.p2 + g0.p3.p2.p1     (3 AND gates and 1 OR gate)
P  = p3.p2.p1.p0                             (1 AND gate)
c3 = g2 + g1.p2 + g0.p1.p2 + p0.p1.p2.c0     (3 AND gates and 1 OR gate)
c2 = g1 + g0.p1 + p0.p1.c0                   (2 AND gates and 1 OR gate)
c1 = g0 + p0.c0                              (1 AND gate and 1 OR gate)

So each unit has 10 AND gates and 4 OR gates (total of 14 gates). So the overall lookahead logic has 10×21 = 210 AND gates and 4×21 = 84 OR gates.

(2) Short Questions (14 points)

(i) What are delayed branches? (3 points)

A branch instruction for which the instruction following the branch is unconditionally executed regardless of the outcome of the branch.

(ii) What problem do delayed branches solve, and how? (3 points)

Delayed branches solve the problem of branch hazards. By unconditionally executing an instruction following a branch, the pipeline can be kept full, i.e., without any bubbles, until the outcome of the branch is known.

(iii) The multicycle datapath that we studied had one ALU, which was used to increment the PC as well as for ALU operations (and address calculations), and branch target calculation. The single-cycle datapath had 3 separate adders/ALUs, one for incrementing the PC, one for ALU operations (and address calculation), and one for branch target calculation. Why are a different number of ALUs used in the two different datapaths? (4 points)

In the single-cycle datapath, we needed to perform 3 ALU operations in a clock cycle, and since a single ALU could only perform one operation in a cycle, 3 ALUs were needed. In the multicycle datapath, the same ALU could be used for all 3 functions, in different clock cycles.

(iv) Is it easier to have a pipelined implementation for an architecture which has fixed-sized instructions (e.g., 4 bytes) than for an architecture which has variable-sized instructions? Why or why not? (4 points)

It is easier to have a pipelined implementation for an architecture with fixed-size instructions because there is no need to decode an instruction (to find out its length) in order to be able to determine where the next instruction starts. This allows IF and ID to be overlapped.

(3) Machine Performance (8 points)

Consider two different implementations M1 and M2 of the same instruction set. There are four classes of instructions (A, B, C, and D) in the instruction set. M1 has a clock rate of 350 MHz and M2 has a clock rate of 450 MHz.
The fraction of all instructions that belong to a particular class, and the average number of cycles for each instruction in the two implementations, are as below:

Class   Percentage of instructions in class   CPI on M1   CPI on M2
A       10%                                   1           2
B       35%                                   2           2
C       25%                                   3           4
D       30%                                   4           4

Which implementation of the instruction set is faster, and by how much? Show your work.

Overall CPI on M1 = 0.1×1 + 0.35×2 + 0.25×3 + 0.3×4 = 2.75
Overall CPI on M2 = 0.1×2 + 0.35×2 + 0.25×4 + 0.3×4 = 3.1

Time on machine = CPI × cycle time × number of instructions.

(Time on M1) / (Time on M2) = (2.75 / (350×10^6)) / (3.1 / (450×10^6)) = 1.14

So M2 is 14% faster than M1.

(4) System Balance (12 points)

Consider a system with a CPU, connected to separate instruction and data memories, as shown in the figure below.

[Figure: CPU connected to a separate Instruction Memory and Data Memory.]

The system is executing the following MIPS code. The code is a loop, which you can assume iterates many times. Also assume that the beq instruction is taken 25% of the time.

$L6:    lw      $2,0($4)
        beq     $2,$0,$L4
        lw      $2,0($5)
        lw      $3,0($6)
        addu    $2,$2,$3
        sw      $2,0($4)
$L4:    addu    $4,$4,4
        addu    $6,$6,4
        addu    $5,$5,4
        addu    $7,$7,1
        slt     $2,$7,10000
        bne     $2,$0,$L6

Remember that in MIPS the instruction size is 4 bytes, and the word size is also 4 bytes.

(i) What bandwidth would the instruction memory need to have to be able to support a sustained instruction execution rate of 200 MIPS when executing the above program? Show your work. (8 points)

When the branch is not taken, there are 12 instructions executed and 4 memory references made per loop iteration. When the branch is taken, there are 8 instructions executed, and 1 memory reference made. On average, there are 0.75×12 + 0.25×8 = 11 instructions and 4×0.75 + 1×0.25 = 3.25 memory references per iteration. For each instruction we need to fetch 4 bytes, so to support a sustained instruction execution rate of 200 MIPS we need to have an instruction memory bandwidth of

200 MIPS × 4 bytes per instruction = 800 Mbytes per second

(ii) What bandwidth would the data memory need to have to be able to support a sustained instruction execution rate of 200 MIPS when executing the above program? Show your work. (4 points)

As above, there are 3.25 memory references per 11 instructions on average. Each memory reference is 4 bytes. So the bandwidth required from the data memory to support an execution rate of 200 MIPS is

200 × (3.25 / 11) × 4 = 236.36 Mbytes per second
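
The arithmetic in Problems 3 and 4 can be checked with a short script. This is only a sketch: the helper names (average_cpi, time_per_instruction) and variable names are made up for illustration, while the instruction mix, CPI values, clock rates, branch probability, and instruction counts are the ones given in the two problems.

```python
# Problem 3: weighted-average CPI and relative execution time.
# All numbers come from the problem statement; names are illustrative.
mix = {"A": 0.10, "B": 0.35, "C": 0.25, "D": 0.30}
cpi_m1 = {"A": 1, "B": 2, "C": 3, "D": 4}
cpi_m2 = {"A": 2, "B": 2, "C": 4, "D": 4}


def average_cpi(mix, cpi):
    """Overall CPI = sum over classes of (fraction of instructions x class CPI)."""
    return sum(mix[c] * cpi[c] for c in mix)


def time_per_instruction(avg_cpi, clock_hz):
    """Seconds per instruction = CPI x cycle time = CPI / clock rate."""
    return avg_cpi / clock_hz


overall_m1 = average_cpi(mix, cpi_m1)
overall_m2 = average_cpi(mix, cpi_m2)
# The instruction count is the same on both machines, so it cancels in the ratio.
ratio = time_per_instruction(overall_m1, 350e6) / time_per_instruction(overall_m2, 450e6)

# Problem 4: average instructions and data references per loop iteration.
P_TAKEN = 0.25          # probability that the beq is taken
instr_per_iter = (1 - P_TAKEN) * 12 + P_TAKEN * 8
refs_per_iter = (1 - P_TAKEN) * 4 + P_TAKEN * 1

RATE_MIPS = 200         # sustained execution rate, millions of instructions/s
BYTES_PER_WORD = 4      # MIPS instruction size and word size

instr_bw_mbytes = RATE_MIPS * BYTES_PER_WORD
data_bw_mbytes = RATE_MIPS * (refs_per_iter / instr_per_iter) * BYTES_PER_WORD

print(f"CPI M1 = {overall_m1:.2f}, CPI M2 = {overall_m2:.2f}, time ratio = {ratio:.2f}")
print(f"instruction bandwidth = {instr_bw_mbytes} Mbytes/s")
print(f"data bandwidth = {data_bw_mbytes:.2f} Mbytes/s")
```

Because the rate is expressed in millions of instructions per second, multiplying by bytes per access gives Mbytes per second directly, matching the units used in the answers above.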

(5) Pipelining (18 points)

The following figure shows a simple pipeline. This pipeline has a register file that WRITES during the FIRST half of the clock cycle and READS during the SECOND half of the clock cycle. Instructions can be stalled only in the instruction-fetch and instruction-decode stages.

[Figure: five-stage pipeline. IF (Instruction Memory), ID (Register File), EX (ALU), MEM (Data Memory), WB (Register File).]

(a) For the following instruction sequence, show the pipeline timing in the table below. Label pipeline bubbles caused by stalls with "bubble". (Note: You may not need all 16 cycles provided in the table.) (8 points)

load  r2 <- mem(r1)     ; LOAD
r2 <- r1 + r2           ; ADD1
r3 <- r1 + r3           ; ADD2
store mem(r4) <- r2     ; STORE

Cycle   1       2       3       4       5       6       7       8       9       10      11
IF      LOAD    ADD1    ADD2    ADD2    ADD2    STORE
ID              LOAD    ADD1    ADD1    ADD1    ADD2    STORE   STORE
EX                      LOAD    bubble  bubble  ADD1    ADD2    bubble  STORE
MEM                             LOAD    bubble  bubble  ADD1    ADD2    bubble  STORE
WB                                      LOAD    bubble  bubble  ADD1    ADD2    bubble  STORE

(b) Eliminate all of the bubbles caused by the instruction sequence in part (a) by reordering the instruction sequence and/or by adding bypass paths to the pipeline diagram on the previous page. Show the reordered instruction sequence in the space below, and any modifications to the pipeline diagram in the figure on the previous page. For your convenience, the original code sequence from the previous page is replicated below. Be sure to explain how your solution eliminates the bubbles that existed. (10 points)

load  r2 <- mem(r1)     ; LOAD
r2 <- r1 + r2           ; ADD1
r3 <- r1 + r3           ; ADD2
store mem(r4) <- r2     ; STORE

The first bubble after the LOAD instruction cannot be removed using a bypass. In order to eliminate it, the ADD1 and ADD2 instructions are reversed. Notice that they are independent of each other, so they can be moved in this way without affecting the semantics of the code segment. This creates an additional bubble before the STORE instruction, but it can be removed using bypasses. The two original bubbles that remain are removed by adding a bypass from both of the inputs into the MEM/WB pipeline register to both of the inputs of the ID/EX pipeline register. The new bubble that was introduced by the code motion is removed by adding a bypass from the input of the EX/MEM pipeline register to both of the inputs of the ID/EX pipeline register.

(6) Datapath (24 points)

In this problem you will extend the multi-cycle datapath as presented in the course text in Figures 5.33 and 5.34. These figures have been provided on the last page of this exam.

It is possible to decrease the execution time of a program by combining two or more instructions (that would normally take two or more instruction times to execute) into a single instruction that takes only one instruction time (or a fraction more than one) to execute. These hybrid instructions are chosen because the instructions they replace appear often as a group in programs, and because they can be implemented without extensive modification to the datapath.

Modify the datapath as necessary to allow the implementation of each instruction. Be sure to include complete information about any new signals that appear in the modified datapath. After the datapath is modified, fill in the list of control signals that must appear at each clock cycle in order to implement the instruction. To reduce the number of signals that you have to write, assume that the initial value of all signals is zero. Remember that if you assert a signal, you must de-assert it when necessary. The number of clock cycles needed for each instruction may vary. Use as many as you need, but make the implementation as efficient as possible. The signals for the first clock cycle already appear in the table. You should not have to modify them.

ALL OF YOUR WORK FOR EACH INSTRUCTION SHOULD APPEAR ONLY ON THE PAGE FOR THAT INSTRUCTION.

Note: In Figure 5.34, ALU stands for arithmetic logic unit, MDR for memory data register, PC for program counter, and IR for instruction register.
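
The signal convention stated above (every control signal starts at zero and holds its value until a later cycle changes it) can be sketched as a small state tracker. This is an illustrative model only: the function name expand_signals and the SIGNALS list are assumptions made for the example, and the two cycles shown use the fetch and decode signal values listed in this problem's solution tables.

```python
def expand_signals(cycles, all_signals):
    """Return the complete control-signal state for each clock cycle.

    cycles: list of dicts; each dict holds only the signals written in that cycle.
    all_signals: names of every control signal; each one starts at zero.
    """
    state = {name: 0 for name in all_signals}
    history = []
    for changes in cycles:
        state.update(changes)          # a signal keeps its old value unless rewritten
        history.append(dict(state))    # snapshot of the state during this cycle
    return history


# Signal names as they appear in the solution tables (list assumed complete
# for this sketch only).
SIGNALS = [
    "MemRd", "IorD", "IRWrite", "ALUSrcA", "ALUSrcB", "ALUOp",
    "PCWrite", "PCWriteCond", "PCSource", "RegDst", "RegWrite", "MemtoReg",
]

# Cycle 1: Instruction -> IR and PC = PC + 4.
fetch = {"MemRd": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
         "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00}
# Cycle 2: read registers and compute the branch target.
# MemRd, IRWrite, and PCWrite must be explicitly de-asserted here.
decode = {"MemRd": 0, "IRWrite": 0, "PCWrite": 0, "ALUSrcB": 0b11}

history = expand_signals([fetch, decode], SIGNALS)
for number, state in enumerate(history, start=1):
    asserted = {name: value for name, value in state.items() if value}
    print(f"cycle {number}: {asserted}")
```

If the second cycle omitted PCWrite=0, the tracker would show PCWrite still asserted in cycle 2, which is the kind of mistake the reminder about de-asserting signals is meant to prevent.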

(a) The DBNZ Rs,offset instruction decrements the value in register Rs (be sure to write it back to Rs) and branches to the target computed by adding the signed offset in bits [15-0] of the instruction to the PC, if the result of the decrement is non-zero. (Note: you may not need all the clock cycles provided in the table below.)

Circuitry added to the datapath:

In this space was Figure 5.33 from the course text.

The MUX on the B ALU input is expanded to allow input 4 to be a hard-wired "1". The MUX on the Write register input to the register file is expanded to allow input 2 to be bits [25-21] of the instruction register. This allows Rs to be written.

Control:

Note: Signals in parens are not needed (but should be included for clarity) because they were already set to that value. This includes those assumed to be zero at the start of the instruction.

Clock Cycle   Functional Description                      Signals
1             Instruction -> IR and PC=PC+4               MemRd=1, ALUSrcA=0, IorD=0, IRWrite=1, ALUSrcB=01, ALUOp=00, PCWrite=1, PCSource=00
2             Read Rs and Rt and compute branch target    MemRd=0, IRWrite=0, PCWrite=0, (ALUSrcA=0), ALUSrcB=11, (ALUOp=00)
3             Increment Rs and update the PC              ALUSrcA=1, ALUSrcB=100, (ALUOp=00), PCSource=01, PCWriteCond=1
4             Write back Rs                               RegDst=10, RegWrite=1, PCWriteCond=0
5             Finish                                      RegWrite=0

(b) The LWI Rs,immediate(Rt) instruction loads the word addressed by the sum of the sign-extended immediate and the value in Rt into register Rs. The value in register Rt is also auto-incremented so that the next execution of the instruction will load the next word in memory. (Note: you may not need all the clock cycles provided in the table below.)

Circuitry added to the datapath:

In this space was Figure 5.33 from the course text.

The MUX on the A ALU input is expanded to allow input 2 to be the output of the register file B output holding register. The MUX on the Write register input to the register file is expanded to allow input 2 to be bits [25-21] of the instruction register. This allows Rs to be written.

Control:

Note: Signals in parens are not needed (but should be included for clarity) because they were already set to that value. This includes those assumed to be zero at the start of the instruction.

Clock Cycle   Functional Description                      Signals
1             Instruction -> IR and PC=PC+4               MemRd=1, ALUSrcA=0, IorD=0, IRWrite=1, ALUSrcB=01, ALUOp=00, PCWrite=1, PCSource=00
2             Read Rt and compute branch target           MemRd=0, IRWrite=0, PCWrite=0, (ALUSrcA=0), ALUSrcB=11, (ALUOp=00)
3             Compute the load address                    ALUSrcA=10, (ALUSrcB=11), (ALUOp=00)
4             Get the value and compute the new Rt        MemRd=1, IorD=1, (ALUSrcA=10), ALUSrcB=01, (ALUOp=00)
5             Write the new Rt                            MemRd=0, (MemtoReg=0), (RegDst=0), RegWrite=1
6             Write the new Rs                            MemtoReg=1, RegDst=10, (RegWrite=1)
7             Finish                                      RegWrite=0
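
The architectural effect of the two hybrid instructions, as described in the problem text, can be sketched behaviorally. This is a hedged model, not the datapath itself: the function names, the dict-based register file and memory, and the 32-bit wrap-around are assumptions for illustration. Two further assumptions are marked in the comments: the DBNZ target follows the usual MIPS convention of (PC + 4) plus the offset scaled by 4, and LWI advances Rt by one 4-byte word.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def dbnz(regs, pc, rs, offset):
    """Decrement regs[rs]; return the next PC.

    Assumption: pc is the address of the DBNZ itself and the branch target is
    (pc + 4) + (sign-extended offset << 2), the usual MIPS branch convention.
    """
    regs[rs] = (regs[rs] - 1) & MASK32
    next_pc = pc + 4
    if regs[rs] != 0:                       # branch only if the result is non-zero
        next_pc += sign_extend_16(offset) << 2
    return next_pc & MASK32


def lwi(regs, mem, rs, rt, immediate):
    """Load mem[regs[rt] + immediate] into regs[rs], then advance regs[rt].

    Assumption: "next word" means an increment of 4 bytes. Rt is written before
    Rs, so the loaded value wins if rs == rt.
    """
    address = (regs[rt] + sign_extend_16(immediate)) & MASK32
    loaded = mem[address]
    regs[rt] = (regs[rt] + 4) & MASK32
    regs[rs] = loaded


# Usage: sum two words with LWI, counting down with DBNZ (hypothetical data).
regs = {1: 2, 2: 0x1000, 3: 0, 4: 0}
mem = {0x1000: 11, 0x1004: 22}
pc = 0x400
while True:
    lwi(regs, mem, rs=3, rt=2, immediate=0)    # at 0x400
    regs[4] += regs[3]                         # at 0x404 (an ordinary add)
    pc = dbnz(regs, pc=0x408, rs=1, offset=-3) # at 0x408; target is 0x400
    if pc != 0x400:
        break
print(regs[4], hex(regs[2]), hex(pc))
```

In this usage sketch the loop body runs twice: the counter in register 1 reaches zero on the second DBNZ, so control falls through to the instruction after the branch.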

This test prep was uploaded on 03/29/2008 for the course ECE 552 taught by Professor Wood during the Spring '05 term at University of Wisconsin.
