583L10 - EECS 583 Class 10 ILP Optimization and Intro. to...


EECS 583 Class 10
ILP Optimization and Intro. to Code Generation
University of Michigan
October 10, 2011

- 1 - Reading Material + Announcements

  List of references for each project area available on course website

  Places to find papers on compilers
    Conferences: PLDI, CGO, VEE, CC, MICRO, ASPLOS
    Journals: ACM Transactions on Computer Architecture and Code Optimization,
      Software Practice & Experience, IEEE Transactions on Computers,
      ACM Transactions on Programming Languages and Systems

  Today's class
    "Machine Description Driven Compilers for EPIC Processors," B. Rau,
    V. Kathail, and S. Aditya, HP Technical Report, HPL-98-40, 1998.
    (long paper but informative)

  Next class
    "The Importance of Prepass Code Scheduling for Superscalar and
    Superpipelined Processors," P. Chang et al., IEEE Transactions on
    Computers, 1995, pp. 353-370.

- 2 - From Last Time: Class Problem

  Optimize this applying induction variable strength reduction.

  Original code (r13, r12, r6, r10 liveout):

    Preheader:
      r1 = 0
      r2 = 0

    Loop body:
      r5 = r5 + 1
      r11 = r5 * 2
      r10 = r11 + 2
      r12 = load(r10+0)
      r9 = r1 << 1
      r4 = r9 - 10
      r3 = load(r4+4)
      r3 = r3 + 1
      store(r4+0, r3)
      r7 = r3 << 2
      r6 = load(r7+0)
      r13 = r2 - 1
      r1 = r1 + 1
      r2 = r2 + 1

  After strength reduction (r13, r12, r6, r10 liveout):

    Preheader:
      r1 = 0
      r2 = 0
      r111 = r5 * 2
      r109 = r1 << 1
      r113 = r2 - 1

    Loop body:
      r5 = r5 + 1
      r111 = r111 + 2
      r11 = r111
      r10 = r11 + 2
      r12 = load(r10+0)
      r9 = r109
      r4 = r9 - 10
      r3 = load(r4+4)
      r3 = r3 + 1
      store(r4+0, r3)
      r7 = r3 << 2
      r6 = load(r7+0)
      r13 = r113
      r1 = r1 + 1
      r109 = r109 + 2
      r2 = r2 + 1
      r113 = r113 + 1

  Note: after copy propagation, r10 and r4 can be strength reduced as well.
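One way to convince yourself that the strength-reduced loop above computes the same thing as the original is to simulate both. The sketch below is not part of the slides. It assumes a fixed iteration count (the slide shows no exit branch), an arbitrary starting value for r5, and a made-up memory model in which an untouched address holds `addr % 7`. The function names `load`, `run_original`, and `run_reduced` are hypothetical.

```python
def load(mem, addr):
    # Hypothetical memory model: an address never stored to holds addr % 7.
    return mem.get(addr, addr % 7)


def run_original(r5, iterations):
    """Simulate the original loop; returns (liveouts, final memory)."""
    mem = {}
    r1, r2 = 0, 0                       # preheader
    r13 = r12 = r6 = r10 = None
    for _ in range(iterations):         # assumed trip count
        r5 = r5 + 1
        r11 = r5 * 2
        r10 = r11 + 2
        r12 = load(mem, r10 + 0)
        r9 = r1 << 1
        r4 = r9 - 10
        r3 = load(mem, r4 + 4)
        r3 = r3 + 1
        mem[r4 + 0] = r3                # store(r4+0, r3)
        r7 = r3 << 2
        r6 = load(mem, r7 + 0)
        r13 = r2 - 1
        r1 = r1 + 1
        r2 = r2 + 1
    return (r13, r12, r6, r10), mem


def run_reduced(r5, iterations):
    """Simulate the strength-reduced loop; same return shape."""
    mem = {}
    r1, r2 = 0, 0                       # preheader
    r111 = r5 * 2                       # new induction variables
    r109 = r1 << 1
    r113 = r2 - 1
    r13 = r12 = r6 = r10 = None
    for _ in range(iterations):
        r5 = r5 + 1
        r111 = r111 + 2                 # replaces r11 = r5 * 2
        r11 = r111
        r10 = r11 + 2
        r12 = load(mem, r10 + 0)
        r9 = r109                       # replaces r9 = r1 << 1
        r4 = r9 - 10
        r3 = load(mem, r4 + 4)
        r3 = r3 + 1
        mem[r4 + 0] = r3
        r7 = r3 << 2
        r6 = load(mem, r7 + 0)
        r13 = r113                      # replaces r13 = r2 - 1
        r1 = r1 + 1
        r109 = r109 + 2
        r2 = r2 + 1
        r113 = r113 + 1
    return (r13, r12, r6, r10), mem


# Compare liveouts and memory for a few starting values and trip counts.
same = all(
    run_original(start, n) == run_reduced(start, n)
    for start in range(-3, 4)
    for n in range(1, 25)
)
print("equivalent on all tested inputs:", same)
```

Agreement on a handful of inputs is only a sanity check, not a proof. The argument for correctness is that each new register (r111, r109, r113) is initialized in the preheader and bumped by the right stride at the same point where its basic induction variable is incremented.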
- 3 - From Last Time: Class Problem

  Assume: + = 1, * = 3

  Operand arrival times:

    r1  r2  r3  r4  r5  r6
    0   0   0   1   2   0

  Original code:

    r10 = r1 * r2
    r11 = r10 + r3
    r12 = r11 + r4
    r13 = r12 - r5
    r14 = r13 + r6

  Tasks:
    Back substitute
    Re-express in tree-height reduced form
    Account for latency and arrival times

  Back substituted expression:
    r14 = r1*r2+r3+r4-r5+r6

  Re-associate and parenthesize to reduce height:
    r14 = ((r1*r2)+(((r3+r6)+r4)-r5))

  Final assembly code:
    t1 = r1 * r2
    t2 = r3 + r6
    t3 = t2 + r4
    t4 = t3 - r5
    r14 = t1 + t4

- 4 - Optimizing Unrolled Loops

  Original loop:

    loop: r1 = load(r2)
          r3 = load(r4)
          r5 = r1 * r3
          r6 = r6 + r5
          r2 = r2 + 4
          r4 = r4 + 4
          if (r4 < 400) goto loop

  Unroll 3 times:

    loop: r1 = load(r2)                  iter1
          r3 = load(r4)
          r5 = r1 * r3
          r6 = r6 + r5
          r2 = r2 + 4
          r4 = r4 + 4

          r1 = load(r2)                  iter2
          r3 = load(r4)
          r5 = r1 * r3
          r6 = r6 + r5
          r2 = r2 + 4
          r4 = r4 + 4

          r1 = load(r2)                  iter3
          r3 = load(r4)
          r5 = r1 * r3
          r6 = r6 + r5
          r2 = r2 + 4
          r4 = r4 + 4
          if (r4 < 400) goto loop

  Unroll = replicate loop body n-1 times.
  Hope to enable overlap of operation execution from different iterations.
  Not possible as written: every copy of the body reuses the same registers
  (r1 through r6), which ties the iterations together.

- 5 - Register Renaming on Unrolled Loop

  Unrolled loop before renaming:

          r1 = load(r2)                  iter1
          r3 = load(r4)
          r5 = r1 * r3
          r6 = r6 + r5
          r2 = r2 + 4
          r4 = r4 + 4

          r1 = load(r2)                  iter2
          r3 = load(r4)
          r5 = r1 * r3
          r6 = r6 + r5
          r2 = r2 + 4
          r4 = r4 + 4

          r1 = load(r2)                  iter3
          r3 = load(r4)
          r5 = r1 * r3
          r6 = r6 + r5
          r2 = r2 + 4
          r4 = r4 + 4
          if (r4 < 400) goto loop

    loop: r1 = load(r2)
          r3 = load(r4)...
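Going back to the tree-height reduction class problem (slide 3), the payoff of re-association can be checked with a small timing model. The sketch below is not from the slides. It assumes an operation may start as soon as both operands have arrived, that its result arrives `latency` cycles later, that there are no resource limits, and that subtraction has the same latency as addition (the slide only gives + = 1 and * = 3). The names `finish_times` and `evaluate` are hypothetical.

```python
# Latencies from the slide; "-" = 1 is an assumption (same as "+").
LATENCY = {"*": 3, "+": 1, "-": 1}

# Operand arrival times from the slide.
ARRIVAL = {"r1": 0, "r2": 0, "r3": 0, "r4": 1, "r5": 2, "r6": 0}

# (destination, operator, left source, right source)
ORIGINAL = [
    ("r10", "*", "r1", "r2"),
    ("r11", "+", "r10", "r3"),
    ("r12", "+", "r11", "r4"),
    ("r13", "-", "r12", "r5"),
    ("r14", "+", "r13", "r6"),
]

REDUCED = [
    ("t1", "*", "r1", "r2"),
    ("t2", "+", "r3", "r6"),
    ("t3", "+", "t2", "r4"),
    ("t4", "-", "t3", "r5"),
    ("r14", "+", "t1", "t4"),
]


def finish_times(ops, arrival):
    """Return the time at which each register's value becomes available."""
    ready = dict(arrival)
    for dest, op, a, b in ops:
        start = max(ready[a], ready[b])     # wait for the later operand
        ready[dest] = start + LATENCY[op]
    return ready


def evaluate(ops, values):
    """Execute the operations on concrete integers to check the result."""
    regs = dict(values)
    for dest, op, a, b in ops:
        x, y = regs[a], regs[b]
        if op == "*":
            regs[dest] = x * y
        elif op == "+":
            regs[dest] = x + y
        else:
            regs[dest] = x - y
    return regs


original_time = finish_times(ORIGINAL, ARRIVAL)["r14"]
reduced_time = finish_times(REDUCED, ARRIVAL)["r14"]
print("original chain: r14 ready at", original_time)
print("reduced tree:   r14 ready at", reduced_time)

# Arbitrary sample inputs to confirm both forms compute the same value.
sample = {"r1": 2, "r2": 3, "r3": 5, "r4": 7, "r5": 11, "r6": 13}
assert evaluate(ORIGINAL, sample)["r14"] == evaluate(REDUCED, sample)["r14"]
```

Under these assumptions the serial chain produces r14 at time 7, while the re-associated form produces it at time 4, because the three-cycle multiply overlaps with the add/subtract chain and the late-arriving r4 and r5 are consumed last.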