test1-2004-sol - Test #1 (Solutions) Last name of student...



Test #1 (Solutions)

Last name of student:            Student ID number:            AM TA:
First name of student:           Email address:                PM TA:

Do This First:
1. Make sure that you have all the pages of the exam book.
2. Fill in the above information.

Exam Rules:
1. This is a 1 hour and 50 minute exam.
2. The syllabus includes all material covered through Session #11 (inclusive).
3. This is an Open Book and Notes Exam.
4. You are not allowed to consult any other student.
5. You may use a calculator, but not a laptop, palmtop or other such computer.
6. The last page is blank for your use.
7. If you need to make specific assumptions to answer a question, make sure to state them explicitly.

    Question                              Points
    1  Number Representations             10 points
    2  Instruction Set                    20 points
    3  Single- and multi-cycle CPUs       25 points
    4  Processor Pipeline                 20 points
    5  Computer Performance               25 points
    Total                                 100 points

ECSE-2660 Computer Architecture, Networks, and Operating Systems, Spring 2004

1. (10 points) Number Representations

a. (5 points) Subtract the two 8-bit signed integers below using binary arithmetic. Show your work, and indicate if an overflow occurs. Also, express the result both as a signed binary number and as a decimal number.

      1001 1011 (base 2)
    - 0101 0100 (base 2)

First, negate the second number by converting it to its 2's complement: 1010 1100. Then add:

      1001 1011      -101 (base 10)
    + 1010 1100       -84 (base 10)
    -----------
    1 0100 0111      overflow; -185 (base 10)

The true result, -185, lies outside the 8-bit signed range of -128 to 127. The 9-bit pattern 1 0100 0111 represents -185, while the low 8 bits alone would read as +71.

Grading Policy: 5 for correct answer, with overflow. 2 points for getting the correct 2's complement. 2 points for the correct decimal representation of all the numbers and the result.

b. (5 points) Add the following 32-bit single-precision IEEE 754 standard floating-point numbers.

    0x 8E60 0000
    0x 9028 0000

1. Converting the numbers to binary form (upper 16 bits shown; the lower 16 bits are zero):

    0x 8E60 0000 = 1000 1110 0110 0000 ... = -1.11 x 2^(28-127)
    0x 9028 0000 = 1001 0000 0010 1000 ... = -1.0101 x 2^(32-127)

   In both numbers the sign bit is 1; the exponent fields are 0001 1100 = 28 and 0010 0000 = 32.

2. Then we equalize the exponents of the two numbers: the significand of the number with the smaller exponent is shifted right until the two exponents are the same:

    -1.11 x 2^(28-127) = -0.000111 x 2^(32-127)

3. Now add the significands:

    -0.000111 x 2^(32-127) + (-1.0101 x 2^(32-127)) = -1.011011 x 2^(32-127)

4. Converting this number back to floating point:

    1001 0000 0011 0110 0... = 0x 9036 0000

Grading Policy: 1 point for each of steps #1, #3 and #4; 2 points for step #2 of equalizing the exponents.

2. (25 points) MIPS Instruction Set

Consider the following MIPS code residing in the memory:

    Line  Address         MIPS Mnemonic
    1     0x 0000 0300          add  $t1, $zero, $zero
    2     0x 0000 0304          add  $t3, $zero, $a1
    3     0x 0000 0308    foo:  slt  $s0, $a2, $t3
    4     0x 0000 030C          bne  $s0, $zero, exit
    5     0x 0000 0310          add  $t1, $t1, $t3
    6     0x 0000 0314          addi $t3, $t3, 3
    7     0x 0000 0318          j    foo
    8     0x 0000 031C    exit: add  $v0, $zero, $t1

2a. (10 points) Fill in the following table for each instruction corresponding to the line number. Provide the instruction format (i.e. I, J, or R), and also convert each instruction to binary. You will need to refer to Figure 3.13 on page 140.

    Line  Inst. Format (I, J, R)  Instruction in 32 bits binary
    1     R                       000000 00000 00000 01001 00000 100000
    2     R                       000000 00000 00101 01011 00000 100000
    3     R                       000000 00110 01011 10000 00000 101010
    4     I                       000101 10000 00000 00000 00011 000111
    5     R                       000000 01001 01011 01001 00000 100000
    6     I                       001000 01011 01011 00000 00000 000011
    7     J                       000010 00000 00000 00000 00011 000010
    8     R                       000000 00000 01001 00010 00000 100000

Grading: 1 pt for trying, +1 pt for each correct instruction format, +2 pts for each correct binary.

2b. (15 points) Assuming that registers $a1 and $a2 correspond to variables n and m respectively, explain what this MIPS code does. Hint: You may wish to write a C/C++ like pseudo-code first.
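The encodings in the 2a table above can be cross-checked with a short Python sketch. It is not part of the exam key: the helper names are hypothetical, and the register numbers, opcodes and funct codes are the standard MIPS values. One caveat: in standard MIPS a branch immediate is the word offset from PC+4, which for the bne on line 4 is (0x31C - 0x310) / 4 = 3, whereas the immediate field printed for line 4 in the table (0x00C7) equals the word address of exit (0x31C / 4). The sketch computes both so the difference is visible; which convention the graders intended is not stated in the source.

```python
# Register numbers for the names used in the listing (standard MIPS numbering).
REG = {"$zero": 0, "$v0": 2, "$a1": 5, "$a2": 6, "$t1": 9, "$t3": 11, "$s0": 16}

def r_type(funct, rd, rs, rt):
    """R format: op(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5)=0 | funct(6)."""
    return (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | funct

def i_type(op, rs, rt, imm):
    """I format: op(6) | rs(5) | rt(5) | immediate(16)."""
    return (op << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)

def j_type(op, target_addr):
    """J format: op(6) | word address of the target (26 bits)."""
    return (op << 26) | ((target_addr >> 2) & 0x03FFFFFF)

def show(word, widths=(6, 5, 5, 5, 5, 6)):
    """Render a 32-bit word as space-separated bit groups, as in the table."""
    bits, out, pos = format(word, "032b"), [], 0
    for w in widths:
        out.append(bits[pos:pos + w])
        pos += w
    return " ".join(out)

# Addresses taken from the listing.
FOO, BNE_PC, EXIT = 0x308, 0x30C, 0x31C

line3 = r_type(0x2A, "$s0", "$a2", "$t3")   # slt  $s0, $a2, $t3
line6 = i_type(0x08, "$t3", "$t3", 3)       # addi $t3, $t3, 3
line7 = j_type(0x02, FOO)                   # j    foo

# bne $s0, $zero, exit under the two conventions discussed above.
bne_offset = (EXIT - (BNE_PC + 4)) // 4     # PC-relative word offset
bne_relative = i_type(0x05, "$s0", "$zero", bne_offset)
bne_as_tabulated = i_type(0x05, "$s0", "$zero", EXIT >> 2)

for label, word in [("slt", line3), ("addi", line6), ("j", line7),
                    ("bne (PC-relative)", bne_relative),
                    ("bne (as tabulated)", bne_as_tabulated)]:
    print(f"{label:20s} {show(word)}")
```

The fixed 6-5-5-5-5-6 grouping in `show` mirrors how the table prints every row, even for I and J formats, so the output can be compared character by character.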
C code:

    t1 = 0;
    t3 = n;
    while (t3 <= m) {
        t1 += t3;
        t3 += 3;
    }
    v0 = t1;

This program calculates the sum n + (n+3) + (n+6) + ... + (n+3*k), where k is the largest integer such that (n+3*k) <= m. If n > m, the loop body never executes and the program returns 0.

Grading: 2 pts for trying; +3 pts for determining that the loop increases t3 by increments of 3; +3 pts for mentioning (n+3*k); 10 pts for totally correct answer.

3. (20 points) Single-Cycle and Multi-Cycle Processor Implementations

    Program A              Program B
    sub $3, $2, $1         lw  $6, 80($1)
    beq $4, $3, 36         sw  $3, 100($6)
    add $4, $3, $4         add $4, $3, $4
    beq $3, $2, 20         beq $3, $2, 50
    add $5, $5, $5         add $5, $5, $5
    beq $2, $3, 128        lw  $6, 400($5)

Two programs A and B, shown in the above table, are executed on two different machines. One of the machines is a single-cycle machine (from Figure 5.29 in the text) and the other is a multi-cycle machine (from Figure 5.33 in the text), sketches of which are provided below. Assume that the operational delay is 4 ns for memory, 3 ns for the instruction register, 4 ns for registers (register file), 4 ns for ALUs, and zero for all the other components in the architecture. Also assume that none of the branches in programs A and B are taken.
SINGLE-CYCLE MACHINE (Figure 5.29 of the text)
[Figure: single-cycle datapath with PC, instruction memory, register file, sign extend, ALU, ALU control and data memory; control signals RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite]

MULTI-CYCLE MACHINE (Figure 5.33 of the text)
[Figure: multi-cycle datapath with PC, memory, instruction register, memory data register, register file, A and B registers, ALU, ALU control and ALUOut; control signals PCWriteCond, PCWrite, PCSource, IorD, MemRead, MemWrite, MemtoReg, IRWrite, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst]

a) (10 pts) Find the execution time (in ns) of the two programs A and B on the single-cycle machine. Write your final answers into the table provided below.

CPU Execution time = Instruction count x CPI x Clock Cycle Time

Clock-cycle time = delay of the longest instruction, i.e. lw, which uses all the functional units (instruction memory, register read, ALU, data memory, register write). So the clock-cycle time for the single-cycle machine = 4+4+4+4+4 = 20 ns.

Each program has 6 instructions, and the CPI is always 1 for a single-cycle machine.

CPU Execution time A = CPU Execution time B = 6 x 20 = 120 ns.
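The single-cycle calculation above can be sketched in Python as a quick check. This is an illustration only: the names are hypothetical, and the list of units on the lw critical path is an assumption read off the 4+4+4+4+4 sum in the solution. The 3 ns instruction-register delay does not enter here, since the single-cycle datapath has no instruction register.

```python
# Unit delays from the problem statement, in ns.
DELAY_NS = {"memory": 4, "register_file": 4, "alu": 4}

# Assumed critical path of lw on the single-cycle datapath:
# instruction fetch, register read, ALU, data memory, register write-back.
LW_PATH = ("memory", "register_file", "alu", "memory", "register_file")

def single_cycle_clock_ns(path=LW_PATH):
    """Clock period = sum of the unit delays along the longest instruction."""
    return sum(DELAY_NS[unit] for unit in path)

def execution_time_ns(instruction_count, cpi, clock_ns):
    """CPU execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_ns

clock_ns = single_cycle_clock_ns()
time_a = execution_time_ns(6, 1, clock_ns)  # Program A: 6 instructions, CPI 1
time_b = execution_time_ns(6, 1, clock_ns)  # Program B: 6 instructions, CPI 1
print(clock_ns, time_a, time_b)  # 20 120 120
```

Because every instruction takes one full clock period on this machine, the instruction mix of A and B does not matter; only the instruction count does.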
                                                      Program A    Program B
    Execution Time (in ns) on single-cycle machine    120 ns       120 ns

Grading: 5 for each program: 5 for correct, 1 for trying, 2 if method is in the right direction, 3 if method is in the right direction and clock cycle time is correct.

b) (10 pts) Find the execution time (in ns) of the two programs A and B on the multi-cycle machine. Write your final answers into the table provided below.

CPU Execution time = (sum over instruction types of Instruction count of that type x CPI of that type) x Clock Cycle Time

Clock-cycle time = delay of the slowest functional unit, i.e. 4 ns for memory, ALUs or registers.

    CPI for R-type instructions: 4
    CPI for beq: 3
    CPI for lw: 5
    CPI for sw: 4

Program A has 3 R-type and 3 beq: 4 x (3x4 + 3x3) = 84 ns
Program B has 2 R-type, 1 sw, 2 lw and 1 beq: 4 x (2x5 + 1x4 + 2x4 + 1x3) = 100 ns

                                                      Program A    Program B
    Execution Time (in ns) on multi-cycle machine     84 ns        100 ns

Grading: 5 for each program: 5 for correct, 1 for trying, 2 if method is in the right direction, 3 if method is in the right direction and clock cycle time is correct.

4. (20 points) Processor Pipeline

Consider the following instructions executed on the processor of Figure 6.30 in the text.

    sub $2, $1, $1
    lw  $1, 840($3)
    add $3, $1, $2
    beq $5, $3, 20
    sw  $2, 600($3)

5a. (7 points) Fill in the following table:

    Instruction        Rd    Rs    Rt    Immediate operand (in decimal)
    sub $2, $1, $1     $2    $1    $1    -
    lw  $1, 840($3)    -     $3    $1    840
    add $3, $1, $2     $3    $1    $2    -
    beq $5, $3, 20     -     $5    $3    5
    sw  $2, 600($3)    -     $3    $2    600

Grading: 2 points for trying, +1 pt for each correct row in the table.

5b. (8 points) Point out and classify (data or structural or control) the hazards in the above code.
Four data hazards:
- dependency between the first and third instructions because of $2
- dependency between the second and third instructions because of $1
- dependency between the third and fourth instructions because of $3
- dependency between the third and fifth instructions because of $3

One control hazard:
- dependency between the fourth and fifth instructions, because execution of the fifth instruction depends on the result of the fourth instruction (the beq)

Grading: 0.5 pts for trying; +1 pt for each correctly classified hazard; +0.5 pts for each correctly identified dependency including register.

5c. (5 points) Suppose that each pipeline stage takes one unit of time. Also assume that a non-pipelined implementation takes 5 units of time per instruction. For the above code, calculate the speedup gained by the pipelined architecture compared to the non-pipelined implementation if the following assumptions hold:
- Instructions are stalled just sufficiently to avoid hazards;
- The "beq" on line 4 is not taken;
- The register file cannot be read and written in the same cycle.
Note that there is no forwarding unit!

Stage sequence per instruction (from a timing diagram over cycles 1 to 17; the column alignment of the diagram is not preserved):

    sub $2, $1, $1      F D E M W
    lw  $1, 840($3)     F D E M W
    add $3, $1, $2      F D D D E M W
    beq $5, $3, 20      F D D D D E M W
    sw  $2, 600($3)     F D D E M W

No pipeline: 5 instructions x 5 units = 25 time units
With pipeline: 15 time units
Speedup: 25/15 = 1.67

Grading: 1 pt for trying, +2 pts for correct execution time with pipeline, +2 pts for correct speedup value.

5. (25 points) Computer Performance

The instruction mix for a machine F1 with a MIPS-like CPU is provided below. The clock rate of F1 is 800 MHz. F1 can execute a benchmark named TOUGH in 10 milliseconds. TOUGH is composed of 1 million instructions. The frequencies of instructions in TOUGH are shown below. All instructions are 1 word long.
    Instruction Type      Frequency in TOUGH (%)    Cycles on F1    Number of Data Words
    R-type                30                        5               0
    Memory Access         30                        13              1
    Conditional Branch    25                        ?               0
    Others                15                        4               0

1a. (5 points) What is the CPI for F1?

Cycle time for F1 = 1 / 800 MHz = 1.25 x 10^-9 seconds/cycle = 1.25 ns/cycle
Total clock cycles for the TOUGH program = 10 milliseconds / (1.25 x 10^-9 seconds/cycle) = 8 million cycles
So, the CPI for F1 = 8 million cycles / 1 million instructions = 8 cycles/instruction

Grading Policy: 2 points for the correct cycle time; 2 points for the correct number of total clock cycles; 5 points for the entire correct answer.

1b. (8 points) What is the number of cycles that it takes to execute Conditional Branches on F1?

Let the number of cycles to execute the conditional branch instruction be y.
Average CPI = 0.3 x 5 + 0.3 x 13 + 0.25 x y + 0.15 x 4 = 8
6.0 + 0.25 x y = 8
y = 8 cycles

Grading Policy: 1 pt for trying, +5 points for forming the correct equation; +2 points for the entire correct answer.

1c. (8 points) Let's assume that hardware designers have found a way of reducing the number of cycles per Memory Access instruction to 8, and named the new machine F2. Based on the TOUGH evaluation, how much faster is F2 in comparison to F1?

New CPI = 0.3 x 5 + 0.3 x 8 + 0.25 x 8 + 0.15 x 4 = 6.5 cycles/instruction
New execution time for TOUGH = 6.5 cycles/inst. x 10^6 inst. x 1.25 x 10^-9 seconds/cycle = 8.125 milliseconds
Speedup = 10 milliseconds / 8.125 milliseconds = 1.23
So, F2 is 23% faster than F1.

Grading Policy: 3 points for the correct new CPI value; 3 points for the correct new execution time; 8 points for the entire correct answer; else 0.

1d. (4 points) Assume that it takes 2 months for the hardware designers to produce the design for machine F2. If performance of machines improves by a factor of 1.5 every 12 months, how much time does the company have available to produce F2?
Let i be the monthly rate of performance improvement, and n the number of months after which machines in general have improved by the factor 1.23 that F2 offers over F1.

1.5 = (1+i)^12
1+i = 1.03437
1.23 = (1.03437)^n
n = log 1.23 / log 1.03437 = 6.1 months

Excluding the design time, the time left for production is: 6.1 - 2 = 4.1 months

Grading Policy: 1 point for getting the correct (1 + i) value; 2 points for the correct value of n; full 4 points for the entire correct answer; else 0.
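The chain of calculations in question 5 (parts 1a to 1d) can be re-run with a short Python sketch. This is only a check under the figures stated in the problem; the variable names are hypothetical. It carries the unrounded speedup into the last step, so the month count comes out slightly above the hand value, which uses 1.23, though both round to 6.1.

```python
import math

# Figures stated in the problem.
CLOCK_HZ = 800e6            # F1 clock rate
EXEC_TIME_S = 10e-3         # TOUGH run time on F1
INSTRUCTIONS = 1_000_000    # instructions in TOUGH
MIX = {"r_type": 0.30, "memory": 0.30, "branch": 0.25, "other": 0.15}
KNOWN_CYCLES_F1 = {"r_type": 5, "memory": 13, "other": 4}
DESIGN_MONTHS = 2
YEARLY_IMPROVEMENT = 1.5

# 1a: CPI = total cycles / instruction count.
cpi_f1 = EXEC_TIME_S * CLOCK_HZ / INSTRUCTIONS

# 1b: solve the weighted-average CPI equation for the branch cycle count.
known_part = sum(MIX[k] * KNOWN_CYCLES_F1[k] for k in KNOWN_CYCLES_F1)
branch_cycles = (cpi_f1 - known_part) / MIX["branch"]

# 1c: F2 is F1 with memory accesses reduced to 8 cycles.
cycles_f2 = dict(KNOWN_CYCLES_F1, branch=branch_cycles, memory=8)
cpi_f2 = sum(MIX[k] * cycles_f2[k] for k in MIX)
time_f2_ms = cpi_f2 * INSTRUCTIONS / CLOCK_HZ * 1e3
speedup = cpi_f1 / cpi_f2

# 1d: months until general progress matches the F2 speedup, minus design time.
monthly_factor = YEARLY_IMPROVEMENT ** (1 / 12)
months_total = math.log(speedup) / math.log(monthly_factor)
months_for_production = months_total - DESIGN_MONTHS

print(f"CPI F1 = {cpi_f1:.2f}, branch cycles = {branch_cycles:.2f}")
print(f"CPI F2 = {cpi_f2:.2f}, time on F2 = {time_f2_ms:.3f} ms, speedup = {speedup:.2f}")
print(f"1+i = {monthly_factor:.5f}, n = {months_total:.1f}, production = {months_for_production:.1f} months")
```

Using the ratio of CPIs for the speedup is equivalent to the ratio of execution times here, because the clock rate and instruction count are the same on F1 and F2.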