hw2_sol_rev1 - EE108B Winter 2003-2004 Handout #20 EE108b...

EE108B Winter 2003-2004 Handout #20

EE108b – Solution to Problem Set #2 (Total 56 points)
Due Tues Feb 3, 5 PM in Gates 408 – No Late Day

1. (Total 11 points) The following C program is compiled into MIPS objects with no optimization and with –O2 optimization.

    int A[100], B[100];
    main() {
        int i;
        int c = 10;
        for (i=0; i < 100; i++)
            A[i] = B[i] + c;
    }

Unoptimized Code

    0x0:  lui gp, 0
    0x4:  addiu gp, gp, 0
    0x8:  addu gp, gp, t9
    0xc:  addiu sp, sp, -24
    0x10: sw gp, 0(sp)
    0x14: sw fp, 20(sp)
    0x18: sw gp, 16(sp)
    0x1c: move fp, sp
    0x20: li v0, 10
    0x24: sw v0, 12(fp)
    0x28: sw zero, 8(fp)
    0x2c: lw v0, 8(fp)
    0x30: slti v1, v0, 100
    0x34: bne v1, zero, 0x3c
    0x38: b 0x88
    0x3c: lw v0, 8(fp)
    0x40: move v1, v0
    0x44: sll v0, v1, 2
    0x48: lw v1, 0(gp)
    0x4c: addu v0, v0, v1
    0x50: lw v1, 8(fp)
    0x54: move a0, v1
    0x58: sll v1, a0, 2
    0x5c: lw a0, 0(gp)
    0x60: addu v1, v1, a0
    0x64: lw a0, 0(v1)
    0x68: lw v1, 12(fp)
    0x6c: addu a0, a0, v1
    0x70: sw a0, 0(v0)
    0x74: lw v1, 8(fp)
    0x78: addiu v0, v1, 1
    0x7c: move v1, v0
    0x80: sw v1, 8(fp)

Optimized with –O2

    0x0:  lui gp, 0
    0x4:  addiu gp, gp, 0
    0x8:  addu gp, gp, t9
    0xc:  li a2, 10
    0x10: move a1, zero
    0x14: lw a0, 0(gp)
    0x18: lw v1, 0(gp)
    0x1c: lw v0, 0(v1)
    0x20: addiu v1, v1, 4
    0x24: addiu a1, a1, 1
    0x28: addu v0, v0, a2
    0x2c: sw v0, 0(a0)
    0x30: slti v0, a1, 100
    0x34: addiu a0, a0, 4
    0x38: bne v0, zero, 0x1c
    0x3c: jr ra
Unoptimized Code (continued)

    0x84: b 0x2c
    0x88: move sp, s8
    0x8c: lw fp, 20(sp)
    0x90: addiu sp, sp, 24
    0x94: jr ra

a. (5 points) Assign a mark for each point below. Please identify the optimizations used by the compiler to transform the code from the unoptimized version into the optimized one and point out where they are applied.

Solution:

Copy propagation (1 point): The register-to-register moves at 0x40, 0x54 and 0x7c are removed.

Arithmetic identity/Algebraic simplification (1 point): Since (i+1)*4 == (i*4)+4, the shift-and-add pairs that recompute the addresses of A[i] and B[i] on every iteration (0x44 and 0x4c for A[i], 0x58 and 0x60 for B[i]) are replaced by the pointer increments at 0x34 and 0x20 respectively.

Leaf routine optimization (1 point): main is a leaf routine, so there is no need to save and restore fp and gp. There is also no need to store i and c on the stack, since they are only used locally. As a result, no stack space needs to be allocated. Thus instructions 0xc–0x18, 0x24, 0x3c, 0x50, 0x68, 0x74, 0x80 and 0x88–0x90 in the unoptimized code are removed, and 0x28–0x2c are reduced to instruction 0x10 in the optimized version.

Loop invariant code motion (1 point): Since the arrays A and B are in static memory, their base addresses do not change inside the loop. Instructions 0x48 and 0x5c, which load the base addresses of A and B, are moved above the loop (instructions 0x14–0x18 in the optimized code) to reduce the number of dynamic instructions.

Loop inversion (1 point): Since the lower and upper bounds of the for loop are constants, the loop is known to execute at least once, so it can be transformed into a bottom-tested (do-while) loop that has lower loop overhead. Thus, instructions 0x30–0x38 and 0x84 in the unoptimized code are transformed to 0x30 and 0x38 in the optimized version.

Note: The exact terms are not important, as long as the description of the
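The transformations named in the solution can also be read at the source level. The sketch below is not part of the handout. It is a hand-written C approximation, with hypothetical function names (add_const_indexed, add_const_pointer, versions_agree), and it assumes the arrays are passed as parameters rather than declared as globals so the two versions can be compared. The instruction addresses in the comments refer to the optimized listing and are only a rough correspondence.

```c
#include <assert.h>

#define N 100

/* Source-level form of the original loop: indexed and top-tested.
 * Without optimization, the address base + i*4 is recomputed for both
 * arrays on every iteration. */
static void add_const_indexed(int *a, const int *b) {
    int i;
    int c = 10;
    for (i = 0; i < N; i++)
        a[i] = b[i] + c;
}

/* Hypothetical hand-optimized form, loosely mirroring the -O2 listing.
 * The parameters a and b play the role of the base addresses that the
 * optimized code loads once, above the loop (0x14, 0x18). */
static void add_const_pointer(int *a, const int *b) {
    int c = 10;            /* 0xc:  c stays in a register          */
    int i = 0;             /* 0x10: i stays in a register          */
    do {                   /* loop inversion: test moved to bottom */
        *a = *b + c;       /* 0x1c, 0x28, 0x2c: load, add, store   */
        b++;               /* 0x20: pointer += 4 replaces i*4      */
        i++;               /* 0x24                                 */
        a++;               /* 0x34: pointer += 4 replaces i*4      */
    } while (i < N);       /* 0x30, 0x38                           */
}

/* Runs both versions on the same sample input and returns 1 if every
 * output element matches. */
static int versions_agree(void) {
    int b[N], a1[N], a2[N];
    for (int i = 0; i < N; i++)
        b[i] = 3 * i - 7;  /* arbitrary sample data */
    add_const_indexed(a1, b);
    add_const_pointer(a2, b);
    for (int i = 0; i < N; i++) {
        assert(a1[i] == b[i] + 10);
        if (a1[i] != a2[i])
            return 0;
    }
    return 1;
}
```

Calling versions_agree() from a main function is enough to exercise both versions. The do-while form is only safe here because the trip count (100) is a known nonzero constant; with a variable bound, a compiler would normally keep a guard test before entering the loop.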

This note was uploaded on 11/18/2011 for the course EE 108A taught by Professor Dally during the Winter '04 term at Stanford.
