Summer 2009, Prof. Schimmel
ECE 3055 Computer Architecture and OS

Chapter 1 Solutions

1.2.4
2 microseconds from cache ==> 20 microseconds from DRAM.
20 microseconds from DRAM ==> 2 seconds from magnetic disk.
20 microseconds from DRAM ==> 2 ms from flash memory.
Homework 1 Solution

Problem 1.3

Solution 1.3
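Each part of Solution 1.3 applies the CPU performance equation, time = No. instr. × CPI/clock rate. As a minimal sketch, the Python script here recomputes 1.3.1 to 1.3.3, assuming the processor data those answers use (P1: 2 GHz, CPI 1.5; P2: 1.5 GHz, CPI 1.0; P3: 3 GHz, CPI 2.5), a 10 s run, and the 0.7 time and 1.2 CPI factors from 1.3.3. The names `Processor`, `solve_1_3`, `TIME_FACTOR` and `CPI_FACTOR` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    clock_hz: float  # clock rate in Hz
    cpi: float       # average clock cycles per instruction


# Processor data as used in 1.3.1 and 1.3.2.
PROCESSORS = [
    Processor("P1", 2.0e9, 1.5),
    Processor("P2", 1.5e9, 1.0),
    Processor("P3", 3.0e9, 2.5),
]
RUN_TIME_S = 10.0   # execution time used in 1.3.2
TIME_FACTOR = 0.7   # 1.3.3: execution time reduced by 30%
CPI_FACTOR = 1.2    # 1.3.3: CPI increased by 20%


def solve_1_3(procs, run_time_s):
    """Return {name: (instr/sec, cycles, instructions, new clock rate in Hz)}."""
    results = {}
    for p in procs:
        perf = p.clock_hz / p.cpi            # 1.3.1: instructions per second
        cycles = run_time_s * p.clock_hz     # 1.3.2: cycles = time x clock rate
        instructions = cycles / p.cpi        # 1.3.2: instructions = cycles / CPI
        new_time = run_time_s * TIME_FACTOR
        new_cpi = p.cpi * CPI_FACTOR
        # 1.3.3: clock rate = instructions x CPI / time
        new_clock_hz = instructions * new_cpi / new_time
        results[p.name] = (perf, cycles, instructions, new_clock_hz)
    return results


if __name__ == "__main__":
    for name, (perf, cycles, instr, clock) in solve_1_3(PROCESSORS, RUN_TIME_S).items():
        print(f"{name}: {perf / 1e9:.2f}e9 instr/s, {cycles / 1e9:.0f}e9 cycles, "
              f"{instr / 1e9:.2f}e9 instructions, new clock {clock / 1e9:.2f} GHz")
```

Running the script prints one line per processor; P2 shows the largest instructions-per-second figure, in line with 1.3.1. Changing `run_time_s` rescales the cycle and instruction counts, while the 1.3.3 clock rates depend only on each processor's clock rate and the two factors.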
1.3.1
P2 has the highest performance.
performance of P1 (instructions/sec) = 2 × 10^9/1.5 = 1.33 × 10^9
performance of P2 (instructions/sec) = 1.5 × 10^9/1.0 = 1.5 × 10^9
performance of P3 (instructions/sec) = 3 × 10^9/2.5 = 1.2 × 10^9

1.3.2
No. cycles = time × clock rate
cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9
cycles(P2) = 10 × 1.5 × 10^9 = 15 × 10^9
cycles(P3) = 10 × 3 × 10^9 = 30 × 10^9
time = (No. instr. × CPI)/clock rate, so No. instructions = No. cycles/CPI
instructions(P1) = 20 × 10^9/1.5 = 13.33 × 10^9
instructions(P2) = 15 × 10^9/1 = 15 × 10^9
instructions(P3) = 30 × 10^9/2.5 = 12 × 10^9

1.3.3
time_new = time_old × 0.7 = 7 s
CPI_new = CPI_old × 1.2, so CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 3
clock rate = No. instr. × CPI/time, so
clock rate(P1) = 13.33 × 10^9 × 1.8/7 = 3.43 GHz
clock rate(P2) = 15 × 10^9 × 1.2/7 = 2.57 GHz
clock rate(P3) = 12 × 10^9 × 3/7 = 5.14 GHz

1.3.4
IPC = 1/CPI = No. instr./(time × clock rate)
IPC(P1) = 1.42
IPC(P2) = 2
IPC(P3) = 3.33

1.3.5
Time_new/Time_old = 7/10 = 0.7, so clock rate_new = clock rate_old/0.7 = 1.5 GHz/0.7 = 2.14 GHz.

1.3.6
Time_new/Time_old = 9/10 = 0.9, so Instructions_new = Instructions_old × 0.9 = 30 × 10^9 × 0.9 = 27 × 10^9.

Problem 1.5

1.14.3
MIPS = clock rate × 10^-6/CPI
MIPS(P1) = 4 × 10^9 × 10^-6/1.25 = 3200
MIPS(P2) = 3 × 10^9 × 10^-6/0.75 = 4000
MIPS(P1) < MIPS(P2), and performance(P1) < performance(P2) in this case (from 1.14.1).

1.14.4
MFLOPS = No. FP op/(T_fp × 10^6)
a. FP op = 10^6 × 0.4 = 4 × 10^5, clock cycles_fp = CPI × No. FP instr. = 4 × 10^5
   T_fp = 4 × 10^5 × 0.33 × 10^-9 = 1.32 × 10^-4 s, so MFLOPS = 3.03 × 10^3
b. FP op = 3 × 10^6 × 0.4 = 1.2 × 10^6, clock cycles_fp = CPI × No. FP instr. = 0.70 × 1.2 × 10^6 = 0.84 × 10^6
   T_fp = 0.84 × 10^6 × 0.33 × 10^-9 = 2.77 × 10^-4 s, so MFLOPS = 4.33 × 10^3

1.14.5
clock cycles = CPI(FP) × No. instr.(FP) + CPI(L/S) × No. instr.(L/S) + CPI(Branch) × No. instr.(Branch)
a. 5 × 10^5 L/S instr., 4 × 10^5 FP instr. and 10^5 Branch instr.
   clock cycles = 4 × 10^5 + 0.75 × 5 × 10^5 + 1.5 × 10^5 = 9.25 × 10^5
   T_cpu = 9.25 × 10^5 × 0.33 × 10^-9 = 3.05 × 10^-4 s
   MIPS = 10^6/(3.05 × 10^-4 × 10^6) = 3.28 × 10^3
b. 1.2 × 10^6 L/S instr., 1.2 × 10^6 FP instr. and 0.6 × 10^6 Branch instr.
   clock cycles = 0.84 × 10^6 + 1.25 × 1.2 × 10^6 + 1.25 × 0.6 × 10^6 = 3.09 × 10^6
   T_cpu = 3.09 × 10^6 × 0.33 × 10^-9 = 1.02 × 10^-3 s
   MIPS = 3 × 10^6/(1.02 × 10^-3 × 10^6) = 2.94 × 10^3

1.14.6
a. performance = 1/T_cpu = 3.28 × 10^3
b. performance = 1/T_cpu = 9.81 × 10^2
The second program has the higher performance and the higher MFLOPS figure, but the first program has the higher MIPS figure.

Solution 1.5

1.5.1
a. 1G, 0.75G inst/s
b. 1G, 1.5G inst/s

1.5.2
a. P2 is 1.33 times faster than P1
b. P1 is 1.03 times faster than P2

1.5.3
a. P2 is 1.31 times faster than P1
b. P1 is 1.00 times faster than P2

1.5.4
a. 2.05 s
b. 1.93 s

1.5.5
a. 0.71 s
b. 0.86 s

1.5.6
a. 1.30 times faster
b. 1.40 times faster

Solution 1.6

1.6.1
     Compiler A CPI   Compiler B CPI
a.   1.00             1.17
b.   0.80             0.58

Problem 1.15

Solution 1.15

1.15.1
a. T_fp = 35 × 0.8 = 28 s, T_p1 = 28 + 85 + 50 + 30 = 193 s. Reduction: 3.5%
b. T_fp = 50 × 0.8 = 40 s, T_p4 = 40 + 80 + 50 + 30 = 200 s. Reduction: 4.8%

1.15.2
a. T_p1 = 200 × 0.8 = 160 s, T_fp + T_l/s + T_branch = 115 s, T_int = 45 s. Reduction in INT time: 47%
b. T_p4 = 210 × 0.8 = 168 s, T_fp + T_l/s + T_branch = 130 s, T_int = 38 s. Reduction in INT time: 52.5%

1.15.3
a. T_p1 = 200 × 0.8 = 160 s, T_fp + T_int + T_l/s = 170 s. NO: even with zero branch time the total exceeds 160 s.
b. T_p4 = 210 × 0.8 = 168 s, T_fp + T_int + T_l/s = 180 s. NO: even with zero branch time the total exceeds 168 s.

1.15.4
clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)
a. 1 processor: clock cycles = 8192 × 10^6; T_cpu = 4.096 s
b. 8 processors: clock cycles = 1024 × 10^6; T_cpu = 0.512 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPI_improved_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved_fp = (clock cycles/2 − (CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.))/No. FP instr.
(cycle and instruction counts in units of 10^6)
a. 1 processor: CPI_improved_fp = (4096 − 7632)/560 < 0 ==> not possible
b. 8 processors: CPI_improved_fp = (512 − 944)/80 < 0 ==> not possible

1.15.5
Using the clock cycle data from 1.15.4, to halve the number of clock cycles by improving the CPI of L/S instructions:
CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved_l/s = (clock cycles/2 − (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.
a. 1 processor: CPI_improved_l/s = (4096 − 3072)/1280 = 0.8
b. 8 processors: CPI_improved_l/s = (512 − 384)/160 = 0.8

1.15.6
clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_cpu = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPI_int = 0.6 × 1 = 0.6; CPI_fp = 0.6 × 1 = 0.6; CPI_l/s = 0.7 × 4 = 2.8; CPI_branch = 0.7 × 2 = 1.4
a. 1 processor: T_cpu(before improv.) = 4.096 s; T_cpu(after improv.) = 2.739 s
b. 8 processors: T_cpu(before improv.) = 0.512 s; T_cpu(after improv.) = 0.342 s

Solution 1.16

1.16.1
Without reduction in any routine:
a. total time 2 proc = 185 ns
b. total time 16 proc = 34 ns
...
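The cycle-count arithmetic in 1.14.4, 1.14.5, 1.15.4 and 1.15.5 can be rechecked with a short script. This is a minimal Python sketch, assuming only inputs read off those worked answers: the 0.33 ns clock period, the two 1.14.5 instruction mixes (with an FP CPI of 1.0 for mix a, inferred from its 4 × 10^5 FP cycles), and the 1-processor and 8-processor totals from 1.15.4 and 1.15.5 in units of 10^6. The helper names `cycles`, `mips_and_mflops` and `cpi_to_halve` are hypothetical.

```python
# Illustrative helpers; inputs are read off the worked answers in 1.14.4, 1.14.5,
# 1.15.4 and 1.15.5 (names are hypothetical, not from the original solutions).

CLOCK_PERIOD_S = 0.33e-9  # 0.33 ns clock period used in 1.14.4 and 1.14.5

# 1.14.5 instruction mixes: class -> (instruction count, CPI).
# The FP CPI of 1.0 for mix a is inferred from its 4 x 10^5 FP cycles.
MIX_A = {"fp": (4e5, 1.0), "ls": (5e5, 0.75), "branch": (1e5, 1.5)}
MIX_B = {"fp": (1.2e6, 0.70), "ls": (1.2e6, 1.25), "branch": (0.6e6, 1.25)}


def cycles(mix):
    """Clock cycles = sum over instruction classes of CPI x instruction count."""
    return sum(count * cpi for count, cpi in mix.values())


def mips_and_mflops(mix, period_s=CLOCK_PERIOD_S):
    """Return (T_cpu in s, MIPS, MFLOPS); MFLOPS uses FP time only, as in 1.14.4."""
    instructions = sum(count for count, _ in mix.values())
    t_cpu = cycles(mix) * period_s
    fp_count, fp_cpi = mix["fp"]
    t_fp = fp_count * fp_cpi * period_s
    return t_cpu, instructions / (t_cpu * 1e6), fp_count / (t_fp * 1e6)


def cpi_to_halve(total_cycles, other_cycles, class_count):
    """CPI one instruction class needs so total cycles drop to half (1.15.4/1.15.5).

    A negative result means the other classes alone already exceed the target.
    """
    return (total_cycles / 2 - other_cycles) / class_count


if __name__ == "__main__":
    for label, mix in (("a", MIX_A), ("b", MIX_B)):
        t_cpu, mips, mflops = mips_and_mflops(mix)
        print(f"1.14 {label}: cycles = {cycles(mix):.3g}, T_cpu = {t_cpu:.2e} s, "
              f"MIPS = {mips:.3g}, MFLOPS = {mflops:.3g}")
    # 1 processor; cycle and instruction counts in units of 10^6
    print("1.15.4 FP CPI needed:", cpi_to_halve(8192, 7632, 560))    # negative: not possible
    print("1.15.5 L/S CPI needed:", cpi_to_halve(8192, 3072, 1280))
```

The printed MFLOPS values agree with 1.14.4 (3.03 × 10^3 and 4.33 × 10^3). `cpi_to_halve` returns a negative FP CPI for 1.15.4, which is the "not possible" case, and 0.8 for the L/S case of 1.15.5; calling it as `cpi_to_halve(1024, 384, 160)` gives the same 0.8 for 8 processors.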
This note was uploaded on 07/30/2009 for the course ECE 3055 taught by Professor Staff during the Spring '08 term at Georgia Institute of Technology.