CS/ECE 552 Introduction to Computer Architecture
Midterm Exam
Tuesday, March 11, 2003, 7:15-9:15 p.m.

Name: _________________________________________________________________________

Limit your answers to the space provided. Unnecessarily long answers will be penalized. If you use more space than is provided, you are probably doing something wrong. Use the back of each page for any scratch work.

Problem 1. ______________ (out of 24 points)
Problem 2. ______________ (out of 14 points)
Problem 3. ______________ (out of 12 points)
Problem 4. ______________ (out of 14 points)
Problem 5. ______________ (out of 22 points)
Problem 6. ______________ (out of 14 points)
Total      ______________ (out of 100 points)

(1) Lookahead Adder Design (24 points)

The figure below shows a 64-bit carry lookahead adder with three levels of lookahead. Only some signals are shown in the figure. Others are omitted, for example the carries from the first-level lookahead to the single-bit adders.

[Figure: 64-bit adder built from four 16-bit CLA blocks and a third-level lookahead unit. Labeled parts: single-bit adders, first-level lookahead, second-level lookahead, third-level lookahead. Labeled signals include $G^{I}_{0}$, $P^{I}_{0}$, $G^{II}_{0}$, $P^{II}_{0}$, $G^{III}_{0}$, $P^{III}_{0}$, and $C_0$.]

(a) Let $X_i$, $Y_i$, $C_i$ and $S_i$, with $i = 0, 1, \ldots, 63$, represent the individual bits of the two operands, the carries, and the sums, respectively. Let $G^{I}_{i}$ and $P^{I}_{i}$, $i = 0, \ldots, 63$, represent the first-level generates and propagates. Let $G^{II}_{j}$ and $P^{II}_{j}$, $j = 0, \ldots, 15$, represent the second-level generates and propagates, and let $G^{III}_{k}$ and $P^{III}_{k}$, $k = 0, \ldots, 3$, represent the third-level generates and propagates. Write the logic equations for the following signals in the space provided. (10 points)

$C_{3}$ =
$C_{32}$ =
$C_{20}$ =
$P^{II}_{2}$ =
$G^{II}_{3}$ =
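One way to check hand-written lookahead equations is to model the adder in software. The Python sketch below assumes the conventional definitions $G^{I}_{i} = X_i Y_i$ and $P^{I}_{i} = X_i + Y_i$, and radix-4 grouping at every level (four bits per second-level group, four second-level groups per third-level group), which matches the index ranges given in part (a). The helper names `group`, `carries`, and `cla64` are hypothetical. Each carry is computed with the recurrence $C_{i+1} = G_i + P_i C_i$. Lookahead hardware instead flattens every carry into one two-level sum of products, but the Boolean value is the same.

```python
import random

def group(gs, ps):
    """Collapse the generate/propagate bits of one group (index 0 is the least
    significant position) into a group generate G and group propagate P."""
    G, P = 0, 1
    for g, p in zip(gs, ps):
        # The group generates a carry if this position generates one, or if it
        # propagates a carry generated lower in the group.
        G = g | (p & G)
        P = P & p
    return G, P

def carries(gs, ps, cin):
    """Carries into each position of a group, given the group's carry-in.
    Uses C[i+1] = G[i] + P[i]*C[i]; lookahead hardware flattens each carry
    into a single two-level sum of products with the same value."""
    out = [cin]
    for g, p in zip(gs[:-1], ps[:-1]):
        out.append(g | (p & out[-1]))
    return out

def cla64(x, y, c0=0):
    """Three-level, radix-4 carry-lookahead model of a 64-bit adder.
    Returns the 65-bit result (64 sum bits plus the carry-out)."""
    X = [(x >> i) & 1 for i in range(64)]
    Y = [(y >> i) & 1 for i in range(64)]
    g1 = [a & b for a, b in zip(X, Y)]   # first-level generates G^I_i
    p1 = [a | b for a, b in zip(X, Y)]   # first-level propagates P^I_i
    # Second level: 16 groups of 4 bits -> G^II_j, P^II_j
    g2, p2 = zip(*(group(g1[4*j:4*j + 4], p1[4*j:4*j + 4]) for j in range(16)))
    # Third level: 4 groups of 4 second-level groups -> G^III_k, P^III_k
    g3, p3 = zip(*(group(g2[4*k:4*k + 4], p2[4*k:4*k + 4]) for k in range(4)))
    # Carries flow back down: C0, C16, C32, C48 from the third level,
    c16 = carries(g3, p3, c0)
    # C_{16k+4m} from the second level,
    c4 = [c for k in range(4)
          for c in carries(g2[4*k:4*k + 4], p2[4*k:4*k + 4], c16[k])]
    # and every C_i from the first level.
    c = [ci for j in range(16)
         for ci in carries(g1[4*j:4*j + 4], p1[4*j:4*j + 4], c4[j])]
    s = [X[i] ^ Y[i] ^ c[i] for i in range(64)]   # single-bit adders
    G_all, P_all = group(g3, p3)
    c64 = G_all | (P_all & c0)                    # carry-out of the adder
    return sum(bit << i for i, bit in enumerate(s)) | (c64 << 64)

# Self-check against Python's integer addition.
for _ in range(1000):
    a, b, cin = random.getrandbits(64), random.getrandbits(64), random.getrandbits(1)
    assert cla64(a, b, cin) == a + b + cin
```

For example, `cla64(2**64 - 1, 1)` returns `2**64`: every first-level propagate is 1, so the carry generated at bit 0 reaches the carry-out through all three lookahead levels.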
(b) Now suppose that the same structure is to be used as an incrementer, i.e., to add 1 to an input number. How would the lookahead logic and single-bit addition units change? Show the design for the lookahead logic and single-bit addition units in the case of the incrementer. (8 points)

(c) Assume that each gate delay is time units, and all gates are available with up to 4 inputs. How much faster than the adder would the incrementer be? Show your work. (6 points)

(2) System Performance (14 points)

Recall that the speedup resulting from an enhancement is the time to complete a task without the enhancement divided by the time to complete the task using the enhancement. Use this relationship to answer the following questions. Be sure to write out your calculations clearly.

The base system spends 82% of its time computing and 18% of its time waiting for the disk. During the time that it is computing, the instruction mix and the average cycles per instruction (CPI) for each type are:

Type             Percent of all instructions executed   CPI
Integer          40%                                     1
Floating point   30%                                     5
Other            30%                                     2

(i) There are three modifications that can be made to the system. For each modification, compute the speedup resulting from the modification.
  a) The processor is replaced with a new one that reduces the total computation time by 35%. (3 points)
  b) The disk is replaced with a solid-state device that reduces the disk waiting time by 85%. (3 points)
  c) The processor is replaced with a new one that has improved floating-point performance. The average floating-point CPI is reduced to 3; all other aspects are unchanged. (3 points)

(ii) Which modification results in the best speedup? (1 point)

(iii) For the two modifications in part (i) that did not result in the best speedup, is it possible for them to achieve the speedup achieved by the modification in part (ii)? Show your work and explain your answer. (4 points)
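The arithmetic in Problem 2 can be sketched in a few lines of Python, treating speedup as execution time without the enhancement divided by execution time with it. This sketch rests on two assumptions: the instruction-mix table maps integer to CPI 1, floating point to CPI 5, and other to CPI 2 (the floating-point CPI must exceed 3 for modification (c) to be an improvement), and computation time scales with average CPI because clock rate and instruction count are unchanged in (c). The names `speedup`, `avg_cpi`, `MIX`, and the `s_*` variables are hypothetical.

```python
def speedup(time_without, time_with):
    """Speedup = execution time without the enhancement / time with it."""
    return time_without / time_with

COMPUTE, DISK = 0.82, 0.18   # fractions of the base execution time
BASE = COMPUTE + DISK        # normalized base execution time (1.0)

# Instruction mix: type -> (fraction of instructions, CPI).
# Assumption: CPIs 1 / 5 / 2 belong to integer / floating point / other.
MIX = {"integer": (0.40, 1), "floating point": (0.30, 5), "other": (0.30, 2)}

def avg_cpi(mix):
    """Weighted average CPI of an instruction mix."""
    return sum(frac * cpi for frac, cpi in mix.values())

# a) New processor: computation time drops by 35%.
s_a = speedup(BASE, COMPUTE * (1 - 0.35) + DISK)
# b) Solid-state device: disk waiting time drops by 85%.
s_b = speedup(BASE, COMPUTE + DISK * (1 - 0.85))
# c) Floating-point CPI drops to 3. With the same clock and instruction count,
#    computation time scales with the average CPI.
new_mix = {**MIX, "floating point": (0.30, 3)}
s_c = speedup(BASE, COMPUTE * avg_cpi(new_mix) / avg_cpi(MIX) + DISK)

# Upper bound for b): even eliminating all disk time leaves the computation.
s_b_limit = speedup(BASE, COMPUTE)

print(f"a) {s_a:.3f}   b) {s_b:.3f}   c) {s_c:.3f}   b) limit {s_b_limit:.3f}")
```

Because `s_b_limit` (about 1.22) is smaller than `s_a` (about 1.40), no disk improvement alone can match modification (a). Modification (c) can be checked the same way by solving for the floating-point CPI that would give `s_a`.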
(3) MIPS ISA (12 points)

(i) In the MIPS instruction set, the offset in a branch instruction is relative to the PC of the instruction following the branch, not to the PC of the branch itself. What is the reason for this choice? (3 points)

(ii) What are SET instructions in the MIPS architecture? How are they useful? (3 points)

(iii) Why are the offsets for branch instructions and the displacements for load and store instructions in the MIPS ISA (I-format) limited to 16 bits? (3 points)

(iv) Why is the branch offset shifted left by 2 bits while the displacement for loads and stores is not shifted? (3 points)

(4) Short Questions (14 points)

(i) What is system balance? How does the notion of system balance influence the design of a computer system? (3 points)

(ii) Many processors have 5- or 6-stage pipelines. A typical value for the CPI (cycles per instruction) in such processors is in the range of 1.0 to 1.5. Does this mean that the execution latency of most instructions is 1 or 2 clock cycles? Why, or why not? (3 points)

(iii) Why do conditional branches impact the performance of a pipelined implementation? (2 points)

(iv) Briefly describe 3 solutions to reduce the performance impact of conditional branch instructions in a pipelined implementation. (6 points)

(5) Pipelining (22 points)

The following figure shows a simple pipeline. This pipeline has a register file that WRITES during the FIRST half of the clock cycle and READS during the SECOND half of the clock cycle. Instructions can be stalled only in the instruction-fetch and instruction-decode stages. There is a bypass from the output of the EX stage (output of the EX/MEM latch) to the input of the logic in the EX stage.

(a) For the following instruction sequence, show the pipeline timing in the table below. Label pipeline bubbles caused by stalls "bubble". (Note: you may not need all 16 cycles provided in the table.) (10 points)
Cycles 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
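r3 <- r2 + r1              ; ADD1
load r2 <- mem(r3 + r1)    ; LOAD
r2 <- r2 + r1              ; ADD2
r4 <- r3 - r1              ; SUB
r1 <- r2 + r1              ; ADD3

[Figure: pipeline with stages IF (instruction memory), ID (register file; a hazard-detect unit asserts Stall and inserts 0s into the pipeline), EX (ALU, with a bypass into the EX stage), MEM (data memory), and WB (register-file write).]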
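The stall pattern for part (a) follows from the pipeline rules above, and the Python sketch below encodes them so a hand-drawn timing table can be checked. Its assumptions: an ALU result can be bypassed only to an instruction entering EX in the cycle right after the producer (EX/MEM latch to EX). Any other dependence, including every dependence on a load, must be satisfied by reading the register file in ID no earlier than the producer's WB cycle (write in the first half, read in the second). A stalled instruction is held in ID and its successor in IF, and there are no branches or structural hazards. `PROGRAM`, `operand_ready`, `schedule`, and `timing_table` are hypothetical names.

```python
# Hypothetical encoding of the Problem 5 sequence:
# (label, destination, source registers, is_load)
PROGRAM = [
    ("ADD1", "r3", ("r2", "r1"), False),
    ("LOAD", "r2", ("r3", "r1"), True),
    ("ADD2", "r2", ("r2", "r1"), False),
    ("SUB",  "r4", ("r3", "r1"), False),
    ("ADD3", "r1", ("r2", "r1"), False),
]

def operand_ready(e, prod_ex, prod_is_load):
    """Can a consumer enter EX in cycle e if its producer was in EX in prod_ex?"""
    if not prod_is_load and e == prod_ex + 1:
        return True  # EX/MEM latch -> EX bypass (only ALU results are there)
    # Otherwise the consumer must read the register file in its last ID cycle
    # (e - 1) no earlier than the producer's WB cycle (prod_ex + 2); the write
    # happens in the first half of that cycle and the read in the second half.
    return e - 1 >= prod_ex + 2

def schedule(program):
    """Return the EX cycle of each instruction (in order, stalls held in ID)."""
    ex, last_writer = [], {}
    for i, (_, dest, srcs, _) in enumerate(program):
        e = ex[-1] + 1 if ex else 3          # first instruction: IF 1, ID 2, EX 3
        producers = [last_writer[r] for r in srcs if r in last_writer]
        while not all(operand_ready(e, ex[j], program[j][3]) for j in producers):
            e += 1                           # one more stall cycle in ID
        ex.append(e)
        last_writer[dest] = i
    return ex

def timing_table(program):
    """Map each instruction to {cycle: stage} and list the idle EX cycles."""
    ex = schedule(program)
    id_start = [2] + ex[:-1]          # enter ID when the predecessor enters EX
    if_start = [1] + id_start[:-1]    # enter IF when the predecessor enters ID
    rows = []
    for i, (label, *_) in enumerate(program):
        row = {c: "IF" for c in range(if_start[i], id_start[i])}
        row.update({c: "ID" for c in range(id_start[i], ex[i])})
        row.update({ex[i]: "EX", ex[i] + 1: "MEM", ex[i] + 2: "WB"})
        rows.append((label, row))
    bubbles = sorted(set(range(ex[0], ex[-1])) - set(ex))  # idle EX cycles
    return rows, bubbles

rows, bubbles = timing_table(PROGRAM)
last = max(max(row) for _, row in rows)
print("Cycle " + "".join(f"{c:>5}" for c in range(1, last + 1)))
for label, row in rows:
    print(f"{label:<6}" + "".join(f"{row.get(c, ''):>5}" for c in range(1, last + 1)))
print("bubble in EX during cycles:", bubbles)
```

Under these assumptions the sketch places bubbles in EX during cycles 5, 6, and 9 (two for the load-use dependence of ADD2 on LOAD, one for ADD3, whose producer ADD2 is two instructions back and can no longer be bypassed), and ADD3 finishes WB in cycle 12.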
(b) In part (a) it was assumed that the register file writes in the first half of the clock cycle and reads in the second half of the cycle. Now assume that the register file is changed so that READS occur in the first half of the clock cycle and WRITES occur in the second half of the clock cycle. Draw an additional bypass and any additional hardware on the pipeline diagram so that the execution is the same as in part (a). Label the bypass "bypass 1". Also describe the bypass you added in the space below. (6 points)

(c) Eliminate ALL of the bubbles caused by the instruction sequence in part (a) by reordering the instructions (without changing the semantics, of course) and by adding an additional bypass path. Assume the original pipeline configuration, WITHOUT any of the modifications made in part (b). Show the reordered instruction sequence in the space below and draw the additional bypass on the pipeline diagram, labeled "bypass 2". For your convenience, the original code sequence is repeated below. (6 points)

r3 <- r2 + r1              ; ADD1
load r2 <- mem(r3 + r1)    ; LOAD
r2 <- r2 + r1              ; ADD2
r4 <- r3 - r1              ; SUB
r1 <- r2 + r1              ; ADD3

(6) Datapath (14 points)

In this problem you will extend the multi-cycle datapath as presented in the course text in Figures 5.33 and 5.34. These figures have been provided on the last page of this exam.

It is possible to decrease the execution time of a program by combining two or more instructions (that would normally take two or more instruction times to execute) into a single instruction that takes only one instruction time (or a fraction more than one) to execute. These hybrid instructions are chosen because the instructions they replace appear often as a group in programs, and because they can be implemented without extensive modification to the datapath.
Modify the datapath as necessary to allow the implementation of each instruction. Be sure to include complete information about any new signals that appear in the modified datapath. After the datapath is modified, fill in the list of control signals that must appear at each clock cycle in order to implement the instruction. To reduce the number of signals that you have to write, assume that the initial value of all signals is zero. Remember that if you assert a signal, you must de-assert it when necessary. The number of clock cycles needed for each instruction may vary. Use as many as you need, but make the implementation as efficient as possible. The signals for the first clock cycle already appear in the table. You should not have to modify them.

ALL OF YOUR WORK FOR EACH INSTRUCTION SHOULD APPEAR ONLY ON THE PAGE FOR THAT INSTRUCTION.

Note: In Figure 5.34, ALU stands for arithmetic logic unit, MDR for memory data register, PC for program counter, and IR for instruction register.

(i) On the next page, modify the datapath (and control signals, as necessary), and fill in the list of control signals to implement a LDINC Rs, offset(Rt) instruction. The LDINC Rs, offset(Rt) instruction loads the word addressed by the sum of the sign-extended offset (bits 15-0) and the value in Rt (bits 25-21) into register Rs (bits 20-16). The value in register Rt is also auto-incremented by 4 so that the next execution of the instruction will load the next word in memory. (Note: you may not need all the clock cycles provided in the table on the next page.) Points will be taken off for solutions that require more hardware, or take more time steps, than necessary. (14 points)

Clock Cycle | Functional Description              | Signals
1           | Instruction -> IR and PC = PC + 4   | MemRd=1, ALUSrcA=0, IorD=0, IRWrite=1, ALUSrcB=01, ALUOp=00, PCWrite=1, PCSource=00
2           |                                     |
3           |                                     |
4           |                                     |
5           |                                     |
6           |                                     |
7           |                                     |
8           |                                     |
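As a reference for what the modified datapath must accomplish, here is a behavioral Python sketch of LDINC using the field layout given above (Rt in bits 25-21, Rs in bits 20-16; the usual MIPS convention names these fields the other way around). It is not the cycle-by-cycle control sequence the question asks for. `ldinc` and `sign_extend16` are hypothetical helpers, the opcode bits are not decoded because the exam does not specify them, and the result when Rs and Rt are the same register is an assumption. Because the instruction writes two registers, a multicycle implementation with a single register-file write port needs two separate write cycles.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def ldinc(instr, regs, mem, pc):
    """Behavioral model of LDINC Rs, offset(Rt) with the exam's field layout.
    regs: list of 32 register values; mem: dict mapping word-aligned byte
    addresses to 32-bit words. Returns the next PC. The opcode field
    (bits 31-26) is not decoded here."""
    rt = (instr >> 21) & 0x1F                 # base register Rt, bits 25-21
    rs = (instr >> 16) & 0x1F                 # destination register Rs, bits 20-16
    offset = sign_extend16(instr)             # offset, bits 15-0
    addr = (regs[rt] + offset) & 0xFFFFFFFF   # address uses the OLD value of Rt
    word = mem[addr]
    if rt != 0:                               # register 0 is hard-wired to zero
        regs[rt] = (regs[rt] + 4) & 0xFFFFFFFF
    # Assumption: if Rs == Rt the loaded word wins (the exam leaves this open).
    if rs != 0:
        regs[rs] = word
    return pc + 4

# Example: LDINC R6, 8(R5) with R5 = 0x1000, executed twice.
regs = [0] * 32
regs[5] = 0x1000
mem = {0x1008: 42, 0x100C: 43}
instr = (5 << 21) | (6 << 16) | 8   # hypothetical encoding, opcode bits left 0
pc = ldinc(instr, regs, mem, 0x400)
pc = ldinc(instr, regs, mem, pc)
print(hex(pc), regs[6], hex(regs[5]))
```

The second execution reads the next word in memory because Rt was advanced by 4 after the first, which is the behavior the auto-increment is meant to provide.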