CS/ECE 552 Introduction to Computer Architecture
Midterm Exam
Wednesday, March 20, 1996, 7:15-9:15 p.m.

Name: SOLUTION

Limit your answers to the space provided. Unnecessarily long answers will be penalized. If you use more space than is provided, you are probably doing something wrong. Use the back of each page for any scratch work. Write your last name on each page.

Problem 1. ______________ (out of 23 points)
Problem 2. ______________ (out of 10 points)
Problem 3. ______________ (out of 12 points)
Problem 4. ______________ (out of 14 points)
Problem 5. ______________ (out of 20 points)
Problem 6. ______________ (out of 21 points)
Total      ______________ (out of 100 points)

(1) Adder Design (23 points)

Let $X_i$, $Y_i$, $C_i$ and $S_i$, with i = 0, 1, ..., 63, represent the individual bits of the two operands, the carries, and the sums, respectively. Let $G^{I}_{i}$ and $P^{I}_{i}$, i = 0, ..., 63, represent the first level generates and propagates. Let $G^{II}_{j}$ and $P^{II}_{j}$, j = 0, ..., 15, represent the second level generates and propagates, and let $G^{III}_{k}$ and $P^{III}_{k}$, k = 0, ..., 3, represent the third level generates and propagates.

The figure below shows a 64-bit carry lookahead adder with three levels of lookahead. Some signals are shown in the figure to give you an idea of the structure of the adder; however, not all relevant signals are shown (for example, the carries from the first level lookahead to the single-bit adders).

[Figure: 64-bit carry lookahead adder built from 16-bit CLA blocks, showing the single-bit adders, the second level lookahead and the third level lookahead, with sample signals $G^{II}_{0}$, $P^{II}_{0}$, $G^{III}_{0}$ and $P^{III}_{0}$.]

(a) Write the logic equations for the following signals, in the space provided: (15 points)

$G^{I}_{2} = X_{2} \cdot Y_{2}$

$P^{I}_{3} = X_{3} + Y_{3}$

$S_{2} = X_{2} \oplus Y_{2} \oplus C_{2}$

$G^{II}_{3} = G^{I}_{15} + G^{I}_{14} \cdot P^{I}_{15} + G^{I}_{13} \cdot P^{I}_{14} \cdot P^{I}_{15} + G^{I}_{12} \cdot P^{I}_{13} \cdot P^{I}_{14} \cdot P^{I}_{15}$

$G^{III}_{0} = G^{II}_{3} + G^{II}_{2} \cdot P^{II}_{3} + G^{II}_{1} \cdot P^{II}_{3} \cdot P^{II}_{2} + G^{II}_{0} \cdot P^{II}_{3} \cdot P^{II}_{2} \cdot P^{II}_{1}$

$P^{III}_{2} = P^{II}_{8} \cdot P^{II}_{9} \cdot P^{II}_{10} \cdot P^{II}_{11}$

$C_{4} = G^{II}_{0} + P^{II}_{0} \cdot C_{0}$

$C_{12} = G^{II}_{2} + G^{II}_{1} \cdot P^{II}_{2} + G^{II}_{0} \cdot P^{II}_{2} \cdot P^{II}_{1} + P^{II}_{2} \cdot P^{II}_{1} \cdot P^{II}_{0} \cdot C_{0}$

$C_{48} = G^{III}_{2} + G^{III}_{1} \cdot P^{III}_{2} + G^{III}_{0} \cdot P^{III}_{2} \cdot P^{III}_{1} + P^{III}_{2} \cdot P^{III}_{1} \cdot P^{III}_{0} \cdot C_{0}$

$C_{64} = G^{III}_{3} + G^{III}_{2} \cdot P^{III}_{3} + G^{III}_{1} \cdot P^{III}_{3} \cdot P^{III}_{2} + G^{III}_{0} \cdot P^{III}_{3} \cdot P^{III}_{2} \cdot P^{III}_{1} + P^{III}_{3} \cdot P^{III}_{2} \cdot P^{III}_{1} \cdot P^{III}_{0} \cdot C_{0}$
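The lookahead equations for problem (1)(a) can be checked numerically. The following is a minimal sketch, not part of the exam solution: it assumes the grouping described above (four first-level positions per second-level group, four second-level groups per third-level group), and the helper names `bits`, `group`, `lookahead_signals`, `selected_carries` and `reference_carry` are hypothetical. It evaluates $C_4$, $C_{12}$, $C_{48}$ and $C_{64}$ from the generate and propagate signals and compares them with the carries of ordinary integer addition.

```python
def bits(value, width=64):
    """Return the bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def group(g, p):
    """Combine four adjacent (generate, propagate) pairs into one group pair.

    g[3] and p[3] belong to the most significant position of the group.
    """
    gen = g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3]) | (g[0] & p[1] & p[2] & p[3])
    prop = p[0] & p[1] & p[2] & p[3]
    return gen, prop


def lookahead_signals(x, y):
    """Return the level II and level III (generate, propagate) lists."""
    xb, yb = bits(x), bits(y)
    g1 = [a & b for a, b in zip(xb, yb)]  # G^I_i = X_i . Y_i
    p1 = [a | b for a, b in zip(xb, yb)]  # P^I_i = X_i + Y_i
    level2 = [group(g1[4 * j:4 * j + 4], p1[4 * j:4 * j + 4]) for j in range(16)]
    g2, p2 = [g for g, _ in level2], [p for _, p in level2]
    level3 = [group(g2[4 * k:4 * k + 4], p2[4 * k:4 * k + 4]) for k in range(4)]
    g3, p3 = [g for g, _ in level3], [p for _, p in level3]
    return (g2, p2), (g3, p3)


def selected_carries(x, y, c0):
    """Evaluate C4, C12, C48 and C64 with the lookahead equations."""
    (g2, p2), (g3, p3) = lookahead_signals(x, y)
    c4 = g2[0] | (p2[0] & c0)
    c12 = (g2[2] | (g2[1] & p2[2]) | (g2[0] & p2[2] & p2[1])
           | (p2[2] & p2[1] & p2[0] & c0))
    c48 = (g3[2] | (g3[1] & p3[2]) | (g3[0] & p3[2] & p3[1])
           | (p3[2] & p3[1] & p3[0] & c0))
    c64 = (g3[3] | (g3[2] & p3[3]) | (g3[1] & p3[3] & p3[2])
           | (g3[0] & p3[3] & p3[2] & p3[1])
           | (p3[3] & p3[2] & p3[1] & p3[0] & c0))
    return {4: c4, 12: c12, 48: c48, 64: c64}


def reference_carry(x, y, c0, n):
    """Carry into bit n, taken from integer addition of the low n bits."""
    mask = (1 << n) - 1
    return ((x & mask) + (y & mask) + c0) >> n
```

Calling `selected_carries(x, y, c0)` and comparing each entry with `reference_carry(x, y, c0, n)` for n in 4, 12, 48, 64 should agree for any pair of 64-bit operands.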
(b) Assume that each gate delay is Δ time units, and all gates are available with up to 5 inputs. Further assume that all $X_i$, all $Y_i$, and $C_0$ are ready at time 0. In the table below, show at what time the chosen signal value is ready. Use the comment column for any comments (comments will be used to determine partial credit in case of an incorrect answer). (8 points)

Signal         Time Ready   Comments
$C_{12}$       5Δ           Δ for $G^{I}_{i}$, $P^{I}_{i}$; 2Δ in first level, and 2Δ in second level
$P^{III}_{3}$  3Δ           Δ for $P^{I}_{i}$; Δ for $P^{II}_{i}$; Δ for $P^{III}_{i}$
$S_{57}$       13Δ          Δ for $G^{I}_{i}$, $P^{I}_{i}$; 10Δ up and down lookahead tree; and 2Δ for $S_{57}$
$S_{2}$        5Δ           Δ for $G^{I}_{i}$, $P^{I}_{i}$; 2Δ for $C_{2}$, and 2Δ for $S_{2}$

(2) Barrel Shifter (10 points)

You are to design a barrel shifter, to carry out the following operations on 8-bit numbers: (i) rotate an input number left by B bit positions, and (ii) rotate an input number right by B bit positions, where 0 ≤ B ≤ 7.

(a) What type of multiplexor will you use as your building block? Show its design. (4 points)

[Figure: 4 x 1 MUX with select inputs S0, S1 and data inputs 11, 10, 01, 00.]

(b) Show the design of your barrel shifter. Be sure to show all the connections (for each bit), as well as the control signals. (6 points)

op = 0: left rotate; op = 1: right rotate. S(2:0) = rotate count.

The shifter has three stages of 4 x 1 MUXes, one MUX per bit per stage. The select inputs of every MUX are (op, s), where s is one bit of the rotate count: the first stage rotates by 1, the second by 2, and the third by 4. The inputs are a7..a0, the first-stage outputs are b7..b0, the second-stage outputs are c7..c0, and the results are r7..r0. Each entry below lists the MUX data inputs in the order 11, 10, 01, 00.

Bit   Stage 1 (a to b)     Stage 2 (b to c)     Stage 3 (c to r)
7     a0 a7 a6 a7 -> b7    b1 b7 b5 b7 -> c7    c3 c7 c3 c7 -> r7
6     a7 a6 a5 a6 -> b6    b0 b6 b4 b6 -> c6    c2 c6 c2 c6 -> r6
5     a6 a5 a4 a5 -> b5    b7 b5 b3 b5 -> c5    c1 c5 c1 c5 -> r5
4     a5 a4 a3 a4 -> b4    b6 b4 b2 b4 -> c4    c0 c4 c0 c4 -> r4
3     a4 a3 a2 a3 -> b3    b5 b3 b1 b3 -> c3    c7 c3 c7 c3 -> r3
2     a3 a2 a1 a2 -> b2    b4 b2 b0 b2 -> c2    c6 c2 c6 c2 -> r2
1     a2 a1 a0 a1 -> b1    b3 b1 b7 b1 -> c1    c5 c1 c5 c1 -> r1
0     a1 a0 a7 a0 -> b0    b2 b0 b6 b0 -> c0    c4 c0 c4 c0 -> r0

(3) System Balance (12 points)

Consider the computer system shown in the figure below.
[Figure: CPU, MEMORY and I/O DEVICE connected by a BUS.]

The bandwidths of the devices are:

Device        Bandwidth (bytes/sec)
Bus           8 × 10^7
Memory        8 × 10^7
I/O Device    8 × 10^5

(i) The CPU needs an average of 32 bits from memory, and 0.1 bits from the I/O device, to execute an instruction. What is the peak instruction execution rate of the CPU? Assume that all CPU functions can be carried out as fast as possible. Show your work. (6 points)

Memory can sustain $\frac{8 \times 8 \times 10^{7}}{32} = 20$ MIPS.

I/O device can sustain $\frac{8 \times 8 \times 10^{5}}{0.1} = 64$ MIPS.

Bus can sustain $\frac{8 \times 8 \times 10^{7}}{32.1} = 19.94$ MIPS.

The bus is the bottleneck, and the peak instruction execution rate of the CPU is 19.94 MIPS.

(ii) Suppose that the CPU has a clock speed of 100 MHz. What is the CPI when the system is executing at its peak instruction execution rate as in part (i)? (3 points)

$\mathrm{CPI} = \frac{100 \times 10^{6}}{19.94 \times 10^{6}} = 5.02$

(iii) Define the utilization of a resource to be the bandwidth of the resource used divided by the bandwidth available. Which component has the lowest utilization when the CPU is executing at its peak execution rate? What is the utilization? (3 points)

The I/O device has the least utilization (it has the most spare capacity).

Its utilization $= \frac{19.94 \times 10^{6} \times 0.1}{8 \times 8 \times 10^{5}} = 0.312$

(4) Pipelining (14 points)

Following is a simple pipeline, with a bypass from the output of the EX stage to the input of the EX stage. (Note: There is no bypass from the output of the MEM stage to the EX stage.) Assume that the register file is written in the first half and read in the second half of the clock cycle.

[Figure: five-stage pipeline with stages IF, ID, EX, MEM and WB, showing the instruction memory, the hazard detection logic (stall, insert 0s), the register file, the ALU with its EX bypass, and the data memory.]

(a) For the following instruction sequence, show the pipeline timing in the table below. (10 points)

load r2 <- mem(r1+0)     ; LOAD1
r3 <- r3 + r2            ; ADD
load r4 <- mem(r2+r3)    ; LOAD2
r4 <- r5 - r3            ; SUB

Cycle  1      2      3      4      5      6      7      8      9      10     11     12
IF     LOAD1  ADD    LOAD2  LOAD2  LOAD2  SUB
ID            LOAD1  ADD    ADD    ADD    LOAD2  SUB    SUB
EX                   LOAD1  bubble bubble ADD    LOAD2  bubble SUB
MEM                         LOAD1  bubble bubble ADD    LOAD2  bubble SUB
WB                                 LOAD1  bubble bubble ADD    LOAD2  bubble SUB

(b) In the above, we assumed that the register file is written in the first half and read in the second half of the clock cycle. What purpose does this assumption serve? Where in the pipeline timing table above did you make use of this information (list the cycles)? (4 points)

It helps prevent an extra stall in the pipeline when the instruction in the ID stage needs a register being produced by the instruction in the WB stage. Cycles: 5 and 8.

(5) Short Questions (20 points)

(i) Why is it hard to build a pipelined CPU for an instruction set with variable-length instructions? (4 points)

We don't know where the next instruction starts until the current instruction is decoded (and its length established). It is hard to overlap the fetch of the next instruction with the decode of the current instruction.

(ii) What is a Program Address Register used for? How is it related to a Program Counter? (4 points)

A Program Address Register (PAR) is used to sequence through a program. A Program Counter (PC) is a special case of a PAR, and is used when instructions to be executed consecutively are placed consecutively in the memory.

(iii) In the non-pipelined, multicycle datapath, we used one memory element, which held both instructions and data. When we decided to pipeline the datapath, we decided to have separate memories for instructions and data. Why is this so? (4 points)

In a non-pipelined datapath we needed to make only a single memory reference (instruction or data) in a clock cycle, and a single memory element sufficed. In a pipelined datapath, two memory requests can be made simultaneously, one for instructions (in the IF stage) and one for data (in the MEM stage), so separate memories are needed to prevent a performance loss.
(iv) The design team for the Floating Point Unit (FPU) of a CPU project is faced with two design options:

Option 1: Add special FP square root (FPSQR) hardware that will speed up this operation by a factor of 10. FPSQR is responsible for 20% of the execution time of a critical benchmark.

Option 2: Make all FP instructions run faster. FP instructions are responsible for a total of 50% of the execution time. The team believes that they can make all FP instructions run 2 times faster with the same effort as required for the fast square root of Option 1.

Which option is better, and why? (8 points)

Option 1 results in a new execution time of $0.8 + \frac{0.2}{10} = 0.82$, i.e., a speedup of 1.22.

Option 2 results in a new execution time of $0.5 + \frac{0.5}{2} = 0.75$, i.e., a speedup of 1.33.

Option 2 results in more speedup, and therefore is the better option.

(6) Datapath (21 points)

Consider the datapath shown in figure 1. It is almost the same as the one in the H&P textbook for the multicycle implementation. The actions corresponding to the values of the control signals are as follows (IR is used in place of "Instruction Register"):

Signal        Action if deasserted (0)              Action if asserted (1)
MemRd         None                                  Memdata <- Memory[Read Address]
MemWr         None                                  Memory[Write Address] <- Write Data
ALUSelA       PC is the first ALU operand           The register whose number is given by bits [25-21] of the instruction register is the first ALU operand
RegWrite      None                                  Register[Write Register] <- Write data
ZeroEnCond    Disables the use of Zero              Enables the use of Zero condition
IorD          PC supplies the address to memory     ALU's output is the address supplied to memory
IRWrite       None                                  The value from memory (MemData) is written into the Instruction register (IR)
PCWrite       None                                  PC is written; PCSource controls the input
TargetWrite   None                                  Target buffer gets written

Table 1. 1-bit control lines and their actions

Signal         Value   Action
ALUSelB        0       Second ALU input is the register given by bits [25-21] of IR.
               1       Second ALU input is the register given by bits [20-16] of IR.
               2       Second ALU input is 4.
               3       Second ALU input is the lower 16 bits of IR sign-extended to 32 bits.
ALUOp          0       ALU does an add.
               1       ALU does a subtract.
               2       ALU does an OR.
WritedataSel   0       The value fed to the registers as write data comes from the memory.
               1       The value fed to the registers as write data comes from the ALU output.
               2       The value fed to the registers as write data comes from the PC output.
RegDst         0       Bits [20-16] of IR specify which register is going to be written if WriteEnable is active.
               1       Register 31 gets written if WriteEnable is active.
               2       Bits [15-11] of IR specify which register is going to be written if WriteEnable is active.
PCSource       0       Value stored in Target is sent to PC.
               1       Output of ALU is sent to PC.
               2       Bits [25-0] from IR left shifted by 2, with bits [31-28] from the PC concatenated as high order bits, are sent to PC.
               3       Data read on Read Data1 port of Registers is sent to PC.

Table 2. 2-bit control lines and their actions

To reduce the number of signals you have to write, assume that initially, at cycle 1, all signals are 0. Whenever you assert a signal during a cycle, be sure to deassert it during the next cycle if this is necessary. The number of cycles required by each instruction type may be different. You can use as many cycles as you need but not more than 8 for each of the instructions.

(a) BLTZAL Rs, offset

In this instruction IR[31-26] = opcode, IR[25-21] = Rs and IR[15-0] = offset. This is a "Branch on less than zero and link". PC+4 gets written into register 31. If the contents of Rs are less than zero then the execution continues at the instruction which is given by (PC + (signed)offset). Otherwise, the execution continues from the next instruction (PC+4). Write the control sequence (i.e., the relevant control lines and their values) to fetch from memory and execute this instruction. (8 points)

Clk   Control Signals
1     IorD=0, MemRd=1
2     IRWrite=1, ALUSelA=0, ALUSelB=2, ALUOp=0, PCSource=1, PCWrite=1
3     MemRd=0, PCWrite=0, IRWrite=0, ALUSelA=0, ALUSelB=3, ALUOp=0, TargetWrite=1
4     TargetWrite=0, ALUSelA=1, ALUSelB=0, ALUOp=0, RegDest=1, WritedataSel=3, RegWrite=1
5     RegWrite=0, ZeroEnCond=1, PCSource=0, PCWrite=1
6     PCWrite=0
7
8

(b) SC Rt, offset(Rs)

In this instruction IR[31-26] = opcode, IR[25-21] = Rs, IR[20-16] = Rt and IR[15-0] = offset. This is a "store conditional" instruction. We wish to add this instruction to the single cycle datapath shown on the next page. Assume that a signal named LLbit (not shown in the datapath) is available to you (do not concern yourself with how this signal is generated). The semantics of the instruction are as follows. If LLbit is '1' then the contents of Rt are stored in memory location (Rs + (signed)offset) and Rt is set to 1. If LLbit is '0' then the store does not take place and a '0' is written into Rt. (Note: LLbit is just a single bit, hence you can't simply write it into a 32-bit register.)

(i) Add any necessary datapaths and control signals to the datapath shown on the next page for implementing the "store conditional" instruction. You can make changes on the given datapath itself. (7 points)

Changes are made on the datapath on the next page, in a darker line.

(ii) Write the control sequence (i.e., the relevant control lines and their values) to fetch from memory and execute this instruction. (6 points)

Clk   Control Signals
1     IorD=0, MemRd=1
2     IRWrite=1, ALUSelA=0, ALUSelB=2, ALUOp=0, PCSource=1, PCWrite=1
3     MemRd=0, PCWrite=0, IRWrite=0, ALUSelA=0, ALUSelB=3, ALUOp=0, TargetWrite=1
4     TargetWrite=0, ALUSelA=1, ALUSelB=3, ALUOp=0, MemWr=1, CondWr=1
5     MemWr=0, CondWr=0, RegDest=0, WritedataSel=2, RegWrite=1
6     RegWrite=0
7
8

Note that the ALU sets Zero to 1 when the result of the ALU operation is zero. Also, note that since the instructions are word aligned, the 2 least significant bits of "Jump Address" are always zero. So bits 25 to 0 of the IR register are mapped to bits 27 to 2 of "Jump Address".
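The mapping of IR bits into "Jump Address" can be written out as bit operations. This is a minimal sketch under the assumptions stated in the note above and in the PCSource = 2 entry of Table 2; the function name `jump_address` and its arguments are hypothetical and are not signals of the datapath.

```python
def jump_address(pc, ir):
    """Form the 32-bit jump target from PC[31-28] and IR[25-0]."""
    upper = pc & 0xF0000000          # bits [31-28] of the PC become the high order bits
    target = (ir & 0x03FFFFFF) << 2  # IR[25-0] land in bits [27-2]; bits [1-0] are zero
    return upper | target
```

Because of the left shift by 2, the result is always a multiple of 4, which matches the word alignment assumed in the note.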
[Figure: datapath with the PC, Memory, Instruction Register, Registers, ALU and Target blocks, the shift-left-by-2 and extend units, and the control signals of Tables 1 and 2; the additions for "store conditional" (LLbit, CondWr, Zero Extend) are drawn on the same figure.]

Figure 1. The Datapath with its control signals
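The architectural effect of the "store conditional" instruction in problem (6)(b) can be summarized in a few lines. This is a minimal sketch of the semantics quoted in the problem statement only, not of the datapath changes in Figure 1; the register list `regs`, the dictionary `memory` and the helpers `to_signed16` and `store_conditional` are hypothetical names introduced for illustration.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def store_conditional(regs, memory, rs, rt, offset, llbit):
    """Apply SC Rt, offset(Rs) to a register list and a dict-based memory."""
    address = (regs[rs] + to_signed16(offset)) & 0xFFFFFFFF
    if llbit:
        memory[address] = regs[rt]  # the store takes place only when LLbit is 1
    regs[rt] = llbit & 1            # Rt receives LLbit zero-extended to 32 bits
    return address
```

Writing `llbit & 1` into Rt mirrors the note in the problem that a single bit cannot be written directly into a 32-bit register and has to be extended first.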
This test prep was uploaded on 03/29/2008 for the course ECE 552 taught by Professor Wood during the Spring '05 term at University of Wisconsin.