HW1 Solutions
Problem 1
1.
Given the parameters of Problem 6 (note: integer = 35% and shift = 5%, correcting a typo in the book
problem), consider a strength-reducing optimization that converts multiplies by a compile-time
constant into a sequence of shifts and adds. For this instruction mix, 50% of the multiplies
can be converted to shift-add sequences with an average length of three instructions.
Assuming a fixed frequency, compute the change in instructions per program, cycles per
instruction, and overall program speedup.
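Converting half of the multiplies (2.5% of all instructions) into three-instruction sequences
adds 3 x 2.5% - 2.5% = 5% more instructions per program. The CPI drops by 12.5%, from 2.15 to
1.975/1.05 = 1.88 (Table 1), and overall speedup is 2.15/1.975 = 1.089, or 8.9%.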
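The arithmetic above can be checked with a short Python sketch. It assumes the Problem 6 baseline mix and cycle costs shown in Table 1, lists integer and shift instructions separately (both single-cycle here), and expresses every count as a fraction of the original instruction count, so summed cycles are proportional to execution time at a fixed clock. The helper name `totals` is illustrative only.

```python
# Sketch: IPP, CPI, and speedup for the Problem 1 strength-reduction scenario.
# Each entry is (fraction of ORIGINAL instruction count, cycles per instruction).

def totals(mix):
    """Return (instructions per original instruction, cycles per original instruction)."""
    ipp = sum(frac for frac, _ in mix.values())
    cycles = sum(frac * cost for frac, cost in mix.values())
    return ipp, cycles

baseline = {
    "store":    (0.15, 1),
    "load":     (0.25, 2),
    "branch":   (0.15, 4),
    "integer":  (0.35, 1),
    "shift":    (0.05, 1),
    "multiply": (0.05, 10),
}

# Half of the multiplies (2.5%) become 3-instruction shift-add sequences,
# adding 3 * 2.5% = 7.5% single-cycle integer/shift instructions.
converted = 0.5 * baseline["multiply"][0]
reduced = dict(baseline)
reduced["multiply"] = (baseline["multiply"][0] - converted, 10)
reduced["int_shift_added"] = (3 * converted, 1)

ipp0, cyc0 = totals(baseline)
ipp1, cyc1 = totals(reduced)
cpi0, cpi1 = cyc0 / ipp0, cyc1 / ipp1
speedup = cyc0 / cyc1  # fixed clock: time is proportional to IPP * CPI = total cycles

print(f"IPP {ipp0:.3f} -> {ipp1:.3f}")
print(f"CPI {cpi0:.3f} -> {cpi1:.3f} ({(cpi0 - cpi1) / cpi0:.1%} lower)")
print(f"speedup {speedup:.3f}")
```

Running it should report IPP rising from 1.000 to 1.050, CPI falling from 2.150 to 1.881, and a speedup of 1.089, matching the figures above.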
2.
Recent processors like the Pentium 4 do not implement single-cycle shifts. Given the scenario
of Problem 7, assume that s = 50% of the additional integer and shift instructions introduced
by strength reduction are shifts, and that shifts now take four cycles to execute.
Recompute the cycles per instruction and overall program speedup. Is strength reduction still
a good optimization?
Speedup is now a slowdown: the CPI rises to 2.2375/1.05 = 2.131 (Table 2), and
2.15/2.2375 = 0.96, a 4% slowdown. Hence, strength reduction is a bad idea in this case.
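A similar sketch reproduces Table 2 under the Problem 2 assumptions: s = 50% of the 7.5% added instructions are shifts, and every shift (original or added) takes four cycles. As in the solution, speedup is measured against the 2.15-cycle baseline of Problem 1, which still uses single-cycle shifts.

```python
# Sketch: Problem 2 with 4-cycle shifts.
# Fractions are relative to the ORIGINAL instruction count; costs are in cycles.
S = 0.5         # fraction of the added instructions that are shifts
ADDED = 0.075   # 3 instructions x 2.5% converted multiplies

mix = {  # type: (fraction, cycles)
    "store":    (0.15, 1),
    "load":     (0.25, 2),
    "branch":   (0.15, 4),
    "integer":  (0.35 + (1 - S) * ADDED, 1),
    "shift":    (0.05 + S * ADDED, 4),
    "multiply": (0.025, 10),
}

ipp = sum(f for f, _ in mix.values())
cycles = sum(f * c for f, c in mix.values())
cpi = cycles / ipp
speedup = 2.15 / cycles  # baseline: 2.15 cycles per original instruction (Problem 1)

print(f"cycles = {cycles:.4f}, CPI = {cpi:.3f}, speedup = {speedup:.3f}")
```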
3.
Given the assumptions of Problem 8, solve for the break-even ratio s (the percentage of
additional instructions that are shifts). That is, find the value of s (if any) for which
program performance is identical to the baseline case without strength reduction (Problem 6).
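$$2.15 = 0.15 + 0.50 + 0.60 + 0.35 + 0.05 \times 4 + 0.25 + (1 - s) \times 0.075 \times 1 + s \times 0.075 \times 4 = 2.125 + 0.225\,s$$
$$0.025 = 0.225\,s \;\Rightarrow\; s = 0.111 = 11.1\%$$
Strength reduction therefore pays off only if fewer than about 11.1% of the added instructions are shifts.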
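Because total cycles are linear in s, the break-even point is a single division. This sketch uses the same per-original-instruction fractions as the equation above, solves for s, and checks that the cycle count equals the 2.15 baseline at that point.

```python
# Sketch: solve 2.15 = fixed + (1 - s)*0.075*1 + s*0.075*4 for s.
BASELINE = 2.15   # cycles per original instruction, no strength reduction (Problem 6)
ADDED = 0.075     # added integer/shift instructions per original instruction

# Unchanged part of the mix: base mix with 4-cycle shifts and the remaining 2.5% multiplies.
fixed = 0.15 + 0.50 + 0.60 + 0.35 + 0.05 * 4 + 0.025 * 10

# Cycles are linear in s: intercept + slope * s
intercept = fixed + ADDED * 1       # all added instructions single-cycle (s = 0)
slope = ADDED * (4 - 1)             # each shifted fraction costs 3 extra cycles
s_breakeven = (BASELINE - intercept) / slope

print(f"break-even s = {s_breakeven:.3f}")
```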
Table 1: CPI computation (Problem 1)
Type             Old Mix  New Mix  Cost (cycles)  Cycles (New Mix × Cost)
store            15%      15%      1              0.15
load             25%      25%      2              0.50
branch           15%      15%      4              0.60
integer & shift  40%      47.5%    1              0.475
multiply         5%       2.5%     10             0.25
Total            100%     105%                    1.975
CPI = 1.975 / 1.05 = 1.88
Table 2: CPI computation (Problem 2: s = 50%, 4-cycle shifts)
Type       Old Mix  New Mix  Cost (cycles)  Cycles (New Mix × Cost)
store      15%      15%      1              0.15
load       25%      25%      2              0.50
branch     15%      15%      4              0.60
integer    35%      38.75%   1              0.3875
shift      5%       8.75%    4              0.35
multiply   5%       2.5%     10             0.25
Total      100%     105%                    2.2375
CPI = 2.2375 / 1.05 = 2.131
4.
Given the assumptions of Problem 8, assume you are designing the shift unit on the Pentium 4
processor. You have concluded that there are two possible implementation options for the shift
unit: a 4-cycle shift latency at a frequency of 2 GHz, or a 2-cycle shift latency at 1.9 GHz.
Assume the rest of the pipeline could run at 2 GHz, so the 2-cycle shifter would set the
entire processor's frequency to 1.9 GHz. Which option will provide better overall performance?
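With strength reduction (s = 50%, Table 2):
4-cycle shifter: Time per program = 1.05 IPP x 2.131 CPI x 1/(2.0 GHz) = 1.12e-9 s
2-cycle shifter: halving the shift latency saves 8.75% x 2 = 0.175 cycles, so CPI = (2.2375 - 0.175)/1.05 = 1.964 and
Time per program = 1.05 IPP x 1.964 CPI x 1/(1.9 GHz) = 1.09e-9 s
Hence, the 2-cycle shifter is the better option when strength reduction is applied.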
Without strength reduction (back to Problem 6), 4-cycle shifts add 0.05 x 3 = 0.15 cycles to the
baseline CPI of 2.15, and 2-cycle shifts add 0.05 x 1 = 0.05:
4-cycle shifter: Time per program = 1.00 IPP x 2.30 CPI x 1/(2.0 GHz) = 1.150e-9 s
2-cycle shifter: Time per program = 1.00 IPP x 2.20 CPI x 1/(1.9 GHz) = 1.158e-9 s
Hence, the 4-cycle shifter is the better option.
Overall, the best choice is still strength reduction with a 2-cycle shifter at 1.9 GHz.
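The four design points can be compared side by side by computing time per original instruction as total cycles divided by clock frequency. This sketch assumes s = 50% as in Problem 2 and uses the Problem 6 baseline mix; the function name `cycles` is illustrative, not from the source.

```python
# Sketch: time per original instruction = cycles per original instruction / frequency.

def cycles(strength_reduction: bool, shift_latency: int) -> float:
    """Cycles per original instruction; with s = 50% the 7.5% added
    instructions split evenly into integer (1 cycle) and shift."""
    extra = 0.0375 if strength_reduction else 0.0
    int_frac = 0.35 + extra
    shift_frac = 0.05 + extra
    mul_frac = 0.025 if strength_reduction else 0.05
    return (0.15 * 1 + 0.25 * 2 + 0.15 * 4
            + int_frac * 1 + shift_frac * shift_latency + mul_frac * 10)

configs = {  # name: (strength reduction?, shift latency, clock in Hz)
    "SR, 4-cycle @ 2.0 GHz":    (True, 4, 2.0e9),
    "SR, 2-cycle @ 1.9 GHz":    (True, 2, 1.9e9),
    "no SR, 4-cycle @ 2.0 GHz": (False, 4, 2.0e9),
    "no SR, 2-cycle @ 1.9 GHz": (False, 2, 1.9e9),
}
times = {name: cycles(sr, lat) / freq for name, (sr, lat, freq) in configs.items()}

for name, t in times.items():
    print(f"{name}: {t * 1e9:.3f} ns")
print("best:", min(times, key=times.get))
```

The ranking it prints agrees with the conclusion above: strength reduction with the 2-cycle shifter at 1.9 GHz is fastest, while without strength reduction the 4-cycle shifter at 2.0 GHz narrowly wins.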
Problem 2
5.
Consider that you would like to add a load-immediate instruction to the TYP instruction set
and pipeline. This instruction extracts a 16-bit immediate value from the instruction word,
sign-extends the immediate value to 32 bits, and stores the result in the destination register
specified in the instruction word. Since the extraction and sign-extension can be accom…
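The extract-and-sign-extend step described in the problem statement can be illustrated in Python. This is a sketch under a stated assumption: the excerpt does not give the TYP encoding, so the code assumes the immediate occupies bits [15:0] of a 32-bit instruction word, and `load_immediate` is a hypothetical name. Python integers are unbounded, so the 32-bit register value is modeled as an unsigned bit pattern, with a helper to reinterpret it as two's complement.

```python
# Sketch: extract a 16-bit immediate and sign-extend it to 32 bits.
# ASSUMPTION: the immediate sits in bits [15:0] of a 32-bit instruction word;
# the actual TYP encoding is not given in this excerpt.

def load_immediate(instr_word: int) -> int:
    """Return the 32-bit register value (unsigned bit pattern) for the assumed encoding."""
    imm16 = instr_word & 0xFFFF         # extract the low 16 bits
    if imm16 & 0x8000:                  # sign bit set: negative immediate
        return imm16 | 0xFFFF0000       # replicate the sign bit into bits [31:16]
    return imm16                        # non-negative: upper 16 bits stay zero

def as_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value

print(hex(load_immediate(0x1234_7FFF)))           # 0x7fff
print(hex(load_immediate(0x1234_8000)))           # 0xffff8000
print(as_signed32(load_immediate(0x0000_FFFF)))   # -1
```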
This note was uploaded on 03/02/2012 for the course ECE 752 taught by Professor Profgurisohi during the Spring '09 term at Wisconsin.