Name:_____________________________________

CS/ECE 552 Introduction to Computer Architecture
Midterm Exam
Thursday, March 23, 1995, 7:15-9:15 p.m.

Limit your answers to the space provided. Unnecessarily long answers will be penalized. If you use more space than is provided, you are probably doing something wrong. Use the back of each page for any scratch work. Write your last name on each page.

Problem 1: ______________ (out of 26 points)
Problem 2: ______________ (out of 12 points)
Problem 3: ______________ (out of 12 points)
Problem 4: ______________ (out of 14 points)
Problem 5: ______________ (out of 18 points)
Problem 6: ______________ (out of 18 points)
Total: ______________ (out of 100 points)

(1) Lookahead Comparator Design (26 points)

In class we learned how to build a fast adder using carry lookahead. You used carry lookahead adder (CLA) units in addition to single-bit adders. In this question, you are to design a lookahead comparator for 4-bit unsigned numbers. The comparator uses two kinds of boxes, as shown below:
- the same carry lookahead circuit that was used in the adder design (box CLA, with the logic equations given below);
- 4 special single-bit comparators (boxes S3, S2, S1, and S0).
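[Figure: a CLA box with inputs g3, p3, g2, p2, g1, p1, g0, p0 and c0 = 0, producing outputs G and P; below it, the single-bit comparators S3, S2, S1, S0, each with outputs e and f.]

CLA logic equations:

$P = p_3 p_2 p_1 p_0$
$G = g_3 + g_2 p_3 + g_1 p_2 p_3 + g_0 p_1 p_2 p_3$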
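The two CLA equations above can be evaluated directly. The Python sketch below is only an illustration and makes two assumptions:
- bit index 0 is the least significant position;
- the helper names `cla_group` and `cla_tree` are hypothetical.

`cla_tree` feeds the (G, P) outputs of each group of four boxes into a CLA box at the next level. This mirrors how a multi-level lookahead tree is built for a wide adder.

```python
from typing import List, Tuple


def cla_group(g: List[int], p: List[int]) -> Tuple[int, int]:
    """Group outputs of one CLA box from four (g_i, p_i) inputs.

    Implements G = g3 + g2 p3 + g1 p2 p3 + g0 p1 p2 p3 and P = p3 p2 p1 p0,
    with index 0 the least significant position.
    """
    assert len(g) == len(p) == 4
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    G = g3 | (g2 & p3) | (g1 & p2 & p3) | (g0 & p1 & p2 & p3)
    P = p3 & p2 & p1 & p0
    return G, P


def cla_tree(g: List[int], p: List[int]) -> Tuple[int, int]:
    """Reduce 4**k input pairs through k levels of CLA boxes.

    At each level, every group of four (g, p) pairs is replaced by the
    (G, P) of one CLA box; the loop stops when a single pair remains.
    """
    assert len(g) == len(p)
    while len(g) > 1:
        assert len(g) % 4 == 0, "input count must be a power of 4"
        groups = [cla_group(g[i:i + 4], p[i:i + 4]) for i in range(0, len(g), 4)]
        g = [G for G, _ in groups]
        p = [P for _, P in groups]
    return g[0], p[0]
```

For 64 input pairs, the loop in `cla_tree` runs three times: 16 boxes, then 4, then 1.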
Single-bit comparator (S) logic: each Si takes inputs ai and bi and produces ei = 1 if ai = bi, and fi = 1 if ai > bi.

(a) Give the logic equations for ei and fi. (4 points)

$e_i = a_i b_i + \overline{a_i}\,\overline{b_i}$
$f_i = a_i \overline{b_i}$

(b) On the above diagram, draw the connections between the various Si and the CLA. Also indicate the values of the outputs (G and P) for each of the cases $a_3a_2a_1a_0 > b_3b_2b_1b_0$, $a_3a_2a_1a_0 = b_3b_2b_1b_0$, and $a_3a_2a_1a_0 < b_3b_2b_1b_0$. (10 points)

In the figure above, ei is connected to pi and fi is connected to gi. The outputs are:

Case                  | G | P
a3a2a1a0 > b3b2b1b0   | 1 | 0
a3a2a1a0 = b3b2b1b0   | 0 | 1
a3a2a1a0 < b3b2b1b0   | 0 | 0

(c) Sketch (use the facing blank page), or describe in words, a schematic for comparing 64-bit positive numbers, A and B. For your design, also answer the following questions:
(i) Which outputs indicate the result of the comparison, and what are the values of the outputs for A < B, A = B, and A > B?
(ii) What is the delay through the 64-bit comparator (measured in gate delays, assuming NAND gates of 4 inputs each)?
(12 points)

For a 64-bit comparator, we will have 3 levels of lookahead:
- level 1 has 16 CLAs, each fed by four single-bit comparators;
- level 2 has 4 CLAs;
- level 3 has 1 CLA.

The G and P outputs of the level-3 CLA are the final outputs, with values corresponding to the table above. The total delay through the comparator is 8d: 2d through each level of the CLA tree (6d in total) plus 2d through the single-bit comparators.

(2) (12 points) We are interested in comparing the performance of two implementations of a machine, one with special floating-point hardware (Machine MFP), and one without special floating-point hardware (Machine MNFP).
Consider a program with the following mix of operations:

Operation                 | Frequency
Integer instructions      | 68%
Floating-point multiply   | 12%
Floating-point add        | 16%
Floating-point divide     | 4%

The number of clock cycles taken by each machine to execute instructions in each class is the following:

Instruction class         | Machine MFP | Machine MNFP
Integer instructions      | 1           | 1
Floating-point multiply   | 7           | 25
Floating-point add        | 6           | 20
Floating-point divide     | 20          | 50

(i) Both machines have a clock rate of 100 MHz. What is the native MIPS for each machine executing the above program? State any assumptions that you make. (8 points)

$\mathrm{CPI}_{MFP} = 0.68 \times 1 + 0.12 \times 7 + 0.16 \times 6 + 0.04 \times 20 = 3.28$
$\mathrm{CPI}_{MNFP} = 0.68 \times 1 + 0.12 \times 25 + 0.16 \times 20 + 0.04 \times 50 = 8.88$

$\mathrm{MIPS} = \dfrac{\text{clock rate (cycles per second)}}{\mathrm{CPI} \times 10^6}$, so $\mathrm{MIPS}_{MFP} = 100 / 3.28 = 30.49$ and $\mathrm{MIPS}_{MNFP} = 100 / 8.88 = 11.26$.

(ii) What is the relative MFLOPS of the two machines, i.e., the MFLOPS of machine MFP relative to the MFLOPS of machine MNFP, when executing the above program? State any assumptions that you make. (4 points)

Since the two machines are executing the same program, the numbers of instructions and of floating-point operations are the same on both. Therefore the relative MFLOPS is the same as their relative MIPS:

$\mathrm{MFLOPS}_{MFP} / \mathrm{MFLOPS}_{MNFP} = \mathrm{MIPS}_{MFP} / \mathrm{MIPS}_{MNFP} = 30.49 / 11.26 = 2.71$

(3) (12 points) Consider a system with a CPU connected to separate instruction and data memories, as shown in the figure below.
[Figure: the CPU connected to a separate Instruction Memory and Data Memory.]

The system is executing the following MIPS code. The code is a loop, which you can assume iterates many times. Also assume that the beq instruction is taken 10% of the time.

    $L6: lw    $2,0($4)
         beq   $2,$0,$L4
         lw    $2,0($5)
         lw    $3,0($6)
         addu  $2,$2,$3
         sw    $2,0($4)
    $L4: addu  $4,$4,4
         addu  $6,$6,4
         addu  $5,$5,4
         addu  $7,$7,1
         slt   $2,$7,10000
         bne   $2,$0,$L6

Suppose that the CPU can execute instructions infinitely fast. Of course, it must still fetch instructions from the (instruction) memory, and fetch and store data from and to the (data) memory. Also suppose that the bandwidth of the instruction memory is $32 \times 10^6$ bytes/sec, and the bandwidth of the data memory is $8 \times 10^6$ bytes/sec. In case you have forgotten, the instruction size is 4 bytes, and the word size is also 4 bytes.

(i) What is the best-case MIPS that the system can achieve when executing the above code? Show your work in order to receive credit. (8 points)

There are two paths through the loop:
- 90% of the time (beq not taken), we execute 12 instructions and make 4 data memory references (three lw and one sw).
- 10% of the time (beq taken), we execute 8 instructions and make 1 data reference.

Therefore:
- the average number of instructions per iteration is $0.9 \times 12 + 0.1 \times 8 = 11.6$;
- the average number of data references per iteration is $0.9 \times 4 + 0.1 \times 1 = 3.7$.

Each instruction and each data reference is 4 bytes.

Since the instruction and data memories are separate, each one limits the instruction rate independently:
- the instruction memory bandwidth allows $\dfrac{32 \times 10^6}{11.6 \times 4} \times 11.6 = 8 \times 10^6$ instructions per second;
- the data memory bandwidth allows $\dfrac{8 \times 10^6}{3.7 \times 4} \times 11.6 = 6.27 \times 10^6$ instructions per second.

The data memory is the bottleneck, so the best-case MIPS is determined by the data memory bandwidth, i.e., 6.27 MIPS.

(ii) Now suppose that there was only a single memory for both instructions and data, and that the bandwidth of that memory was $32 \times 10^6$ bytes/sec. What is the best-case MIPS when executing the above code? (4 points)

Now there is a single memory, which must service $(11.6 + 3.7) \times 4$ bytes per 11.6 instructions.
Therefore, the best-case instruction rate is $\dfrac{32 \times 10^6}{(11.6 + 3.7) \times 4} \times 11.6 = 6.07 \times 10^6$ instructions per second, i.e., 6.07 MIPS.

(4) Short Questions (14 points)

(i) Give two advantages of the following computer feature: In addition to the use of memory locations, hardware registers internal to the CPU are used to hold some operands. (4 points)

- Shorter addresses: it is easier to encode multiple addresses in a short instruction.
- Registers are smaller and faster than memory, so they can speed up access to frequently used operands.

(ii) Give one advantage of the following computer feature: Except with branch-class instructions, the next instruction to be executed is stored in the memory location at the next higher address. (3 points)

Program sequencing is easier: the address of the next instruction can be generated before the current instruction is fetched.

(iii) What is the primary implementation disadvantage of variable-sized opcodes? (3 points)

Decoding them is harder.

(iv) What is a 1-address instruction? How many operands can a 1-address instruction have? (4 points)

An instruction with only one operand specified explicitly. It can have more than one operand; the exact number depends upon the opcode. Operands not specified explicitly are implicit in the specification of the instruction.

(5) Short pipelining questions (18 points)

(i) What are delayed branches? Explain the use of delayed branches using a short code sequence. (5 points)

Branches in which the control transfer occurs only after the instruction that follows the branch (the instruction in the delay slot) has been executed, regardless of the outcome of the branch. A code sequence is not shown in this solution, but one can easily be worked out.

(ii) What problem do delayed branches solve? (3 points)

Delayed branches solve the problem of pipeline bubbles due to branch hazards.

(iii) What is data forwarding? (3 points)

Passing a register value from the output of the ALU, or the output of the memory, directly back to the input of the ALU, without waiting for it to be written to the register file.

(iv) What problem does data forwarding solve? (3 points)

Pipeline bubbles due to data hazards. The instruction that uses the register value does not have to wait until the value is written into the register file; this reduces or even eliminates pipeline bubbles due to such waiting.

(v) Is it easier to pipeline the execution of an instruction set with variable-sized instructions, or an instruction set with fixed-size instructions? Why? (4 points)

Fixed-size instructions. With fixed-size instructions, it is easy to determine the address of the next instruction before the current instruction is fetched (and decoded). This allows the fetch and decode processes to be pipelined. With variable-length instructions, the starting address of the next instruction is not known until the current instruction has been fetched and decoded.

(6) Datapath (18 points)

Consider the datapath shown in Figure 1. It is almost the same as the one in the H&P textbook for the multicycle implementation. The main difference is that there is the option to specify register 31 as the target register when writing to the registers.
The actions corresponding to the values of the control signals are as follows (IR is used in place of "Instruction Register"):

Table 1. 1-bit control lines and their actions

Signal   | Action if deasserted (0)          | Action if asserted (1)
MemRd    | None                              | MemData <- Memory[Read Address]
MemWr    | None                              | Memory[Write Address] <- Write Data
ALUSelA  | PC is the first ALU operand       | The register whose number is given by bits [25-21] of IR is the first ALU operand
RegWrite | None                              | Register[Write Register] <- Write Data
IorD     | PC supplies the address to memory | ALU's output is the address supplied to memory
IRWrite  | None                              | The value from memory (MemData) is written into IR
PCWrite  | None                              | PC is written; PCSource controls the input
PCSource | ALU's output is sent to PC        | Bits [31-28] of PC, bits [25-0] of IR, and 00 (2 bits) are concatenated in order (higher to lower) and fed as input to PC

Table 2. 2-bit control lines and their actions

Signal   | Value | Action
ALUSelB  | 0     | Second ALU input is the register given by bits [20-16] of IR
         | 1     | Second ALU input is 4
         | 2     | Second ALU input is the lower 16 bits of IR, sign-extended to 32 bits
ALUOp    | 0     | ALU does an add
         | 1     | ALU does a subtract
         | 2     | FUNC field of IR (bits [5-0]) specifies the ALU operation
MemtoReg | 0     | The write data fed to the registers comes from the memory
         | 1     | The write data fed to the registers comes from the ALU's output
         | 2     | The value fed to the registers is 0 or 1, depending on whether Neg is 0 or 1
RegDst   | 0     | Bits [20-16] of IR specify which register is written if RegWrite is active
         | 1     | Register 31 is written if RegWrite is active
         | 2     | Bits [15-11] of IR specify which register is written if RegWrite is active

Note that the ALU sets Neg to 1 when the result of the ALU operation is negative. Also note that since instructions are word-aligned, the 2 least significant bits of the jump address are always zero, so bits 25 to 0 of IR are mapped to bits 27 to 2 of the jump address.

You are asked to write the control sequence (i.e., the relevant control lines and their values) to fetch from memory and execute the following instructions. The rules are:
- To reduce the number of signals you have to write, assume that initially, at cycle 1, all signals are 0.
- Whenever you assert a signal during a cycle, be sure to deassert it during the next cycle if this is necessary.
- The number of cycles required by each instruction type may be different. You can use as many cycles as you need, but not more than 8 for each instruction.

(a) SW Rt, Rs(offset)

In this instruction, IR[31-26] = opcode, IR[25-21] = Rs, IR[20-16] = Rt, and IR[15-0] = offset. This instruction stores the contents of register Rt into the memory location whose address is computed by adding the 16-bit signed offset to the base register (Rs).

Clk 1: IorD = 0, MemRd = 1
Clk 2: IRWrite = 1
Clk 3: MemRd = 0, IRWrite = 0
Clk 4: ALUSelA = 1, ALUSelB = 2, ALUOp = ADD
Clk 5: MemWr = 1
Clk 6: MemWr = 0, ALUSelA = 0, ALUSelB = 1, ALUOp = ADD, PCSource = 0
Clk 7: PCWrite = 1
Clk 8: PCWrite = 0

(b) JAL address

In this instruction, IR[31-26] = opcode and IR[25-0] = address. This is a "jump and link" instruction. PC+4 is written into register 31, and execution then continues at the instruction whose address is given by the contents of the address field IR[25-0].

Clk 1: IorD = 0, MemRd = 1
Clk 2: IRWrite = 1
Clk 3: MemRd = 0, IRWrite = 0
Clk 4: ALUSelA = 0, ALUSelB = 1, ALUOp = ADD
Clk 5: MemtoReg = 1, RegDst = 1, RegWrite = 1
Clk 6: RegWrite = 0, PCSource = 1, PCWrite = 1
Clk 7: PCWrite = 0

(c) SLT Rd, Rs, Rt

In this instruction, IR[31-26] = opcode, IR[25-21] = Rs, IR[20-16] = Rt, and IR[15-11] = Rd. This is a "set on less than" instruction. The contents of register Rs are compared with the contents of register Rt. If [Rs] is less than [Rt], then 1 is written into register Rd; otherwise 0 is written into register Rd. ([R] means the contents of register R.)

Clk 1: IorD = 0, MemRd = 1
Clk 2: IRWrite = 1
Clk 3: MemRd = 0, IRWrite = 0
Clk 4: ALUSelA = 1, ALUSelB = 0, ALUOp = SUB (Neg becomes 1 if the result is negative)
Clk 5: MemtoReg = 2, RegDst = 2, RegWrite = 1
Clk 6: RegWrite = 0, ALUSelA = 0, ALUSelB = 1, ALUOp = ADD, PCSource = 0
Clk 7: PCWrite = 1
Clk 8: PCWrite = 0

[Figure 1. The datapath with its control signals (PC, memory, instruction register, register file, sign extension, ALU with Neg output, and the muxes controlled by IorD, RegDst, MemtoReg, ALUSelA, ALUSelB, and PCSource).]
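Two datapath behaviours are easy to model in a few lines: the jump-target concatenation selected by PCSource = 1, and the 0/1 value that MemtoReg = 2 writes for SLT. This is a hedged Python sketch, not part of the exam solution. It assumes:
- the function names `jump_target` and `slt_via_neg` are hypothetical;
- registers are 32-bit values held as non-negative Python integers;
- Neg is bit 31 of the 32-bit ALU result.

```python
MASK32 = 0xFFFFFFFF  # registers and ALU results are assumed to be 32 bits wide


def jump_target(pc: int, ir: int) -> int:
    """PC input when PCSource = 1: PC[31-28], IR[25-0], and 00, high to low."""
    upper = pc & 0xF0000000      # keep bits [31-28] of the current PC
    field = ir & 0x03FFFFFF      # address field IR[25-0]
    return upper | (field << 2)  # IR[25-0] lands in bits [27-2]; bits [1-0] are 00


def slt_via_neg(rs: int, rt: int) -> int:
    """Value written to Rd by SLT when ALUOp = SUB and MemtoReg = 2."""
    diff = (rs - rt) & MASK32    # ALU subtract, truncated to 32 bits
    neg = (diff >> 31) & 1       # Neg = sign bit of the ALU result
    return neg                   # MemtoReg = 2 writes 0 or 1 depending on Neg
```

Because Neg looks only at the sign of the 32-bit difference, the sketch returns 0 for Rs = 0x80000000 and Rt = 1, even though Rs < Rt as signed numbers. The subtraction overflows in that case, so any datapath that relies on Neg alone gives the wrong answer there.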