# 583L13 - EECS 583 Class 13 Software Pipelining University...


EECS 583 – Class 13: Software Pipelining. University of Michigan, October 24, 2011.

- 1 - Announcements + Reading Material

- No class on Wednesday (reserved for project proposals)
- Each group needs to sign up for a 15-minute slot this week
  - Signup sheet on my door (4633 CSE)
  - Slots on Tues, Wed, Thurs, and Fri
  - Informal class project proposal discussion
- Homework 2 deadline
  - Today at midnight, or tomorrow at midnight if you have not used your late day
  - Daya will have office hours today 3-5pm if you are stuck
- Today's class reading
  - "Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops", B. Rau, MICRO-27, 1994, pp. 63-74.
- Wed class reading
  - "Code Generation Schema for Modulo Scheduled Loops", B. Rau, M. Schlansker, and P. Tirumalai, MICRO-25, Dec. 1992.
- 2 - Class Problem from Last Time

Superblock:

    1: r1 = r7 + 4
    2: branch p1 Exit1
    3: store (r1, -1)
    4: branch p2 Exit2
    5: r2 = load(r7)
    6: r3 = r2 - 4
    7: branch p3 Exit3
    8: r4 = r3 / r8

Live-out sets shown on the slide: {r4}, {r1}, {r4, r8}, {r2}

Questions:

1. Starting with the graph assuming restricted speculation, what edges can be removed if general speculation support is provided?
2. With more renaming, what dependences could be removed?

[Figure: dependence graph over ops 1 through 8]

Edges not drawn: 2→4, 2→7, 4→7. There is no edge from 3 to 5 if you assume 32-bit load/store instructions, since r1 and r7 differ by 4.

Answer 1: 2→5 and 4→5, since r2 is not live out; 4→8 and 7→8, since r4 is not live out, but 2→8 must remain.

Answer 2: 2→8
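The live-out reasoning behind Answer 1 can be sketched in a few lines of Python. This is only an illustration, not the course's method: the assignment of live-out sets to exits (`LIVE_OUT_AT_EXIT`) is an assumption inferred from the answer, since the slide text does not say which set belongs to which exit, and the names `DEST_REG`, `CONTROL_EDGES`, and `split_edges` are hypothetical. The five control edges are the ones named in the answer.

```python
# ASSUMPTION: live-out set at each side exit, keyed by the branch op number.
# The slide lists the sets {r4}, {r1}, {r2} but this pairing is inferred.
LIVE_OUT_AT_EXIT = {2: {"r4"}, 4: {"r1"}, 7: {"r2"}}

# Destination register of each potentially excepting op below a branch.
DEST_REG = {5: "r2", 8: "r4"}

# Branch -> op control edges named in the slide's answer.
CONTROL_EDGES = [(2, 5), (4, 5), (2, 8), (4, 8), (7, 8)]


def split_edges(edges):
    """Split control edges into (removable, required) under general speculation.

    An edge branch -> op is required only if the op's destination register
    is live out at that branch's exit; otherwise the op may move above it.
    """
    removable, required = [], []
    for branch, op in edges:
        if DEST_REG[op] in LIVE_OUT_AT_EXIT[branch]:
            required.append((branch, op))
        else:
            removable.append((branch, op))
    return removable, required


if __name__ == "__main__":
    removable, required = split_edges(CONTROL_EDGES)
    print("removable:", removable)
    print("required: ", required)
```

Under the assumed live-out assignment, the only edge that stays is 2→8, which agrees with the slide's answer.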

- 3 - Class Problem from Last Time

Original superblock:

    1: r1 = r7 + 4
    2: branch p1 Exit1
    3: store (r1, -1)
    4: branch p2 Exit2
    5: r2 = load(r7)
    6: r3 = r2 - 4
    7: branch p3 Exit3
    8: r4 = r3 / r8

Live-out sets shown on the slide: {r4}, {r1}, {r4, r8}, {r2}

Tasks:

1. Move ops 5, 6, 8 as far up in the SB as possible, assuming sentinel speculation support.
2. Insert the necessary checks and recovery code (assume ld, st, and div can cause exceptions).

Scheduled superblock, with (S) marking speculative ops:

    5(S): r2 = load(r7)
    6(S): r3 = r2 - 4
    1: r1 = r7 + 4
    2: branch p1 Exit1
    8(S): r4 = r3 / r8
    3: store (r1, -1)
    4: branch p2 Exit2
    9: check_ex(r3)
    7: branch p3 Exit3
    10: check_ex(r4)

The slide marks two return points, back1: and back2:, for the recovery code.

Recovery code returning to back1:

    5': r2 = load(r7)
    6': r3 = r2 - 4
    8'(S): r4 = r3 / r8
    12: jump back1

Recovery code returning to back2:

    8'': r4 = r3 / r8
    12: jump back2
- 4 - Review: Overlap Iterations Using Pipelining

[Figure: two timelines of iterations 1, 2, 3, ..., n plotted against time]

With hardware pipelining, while one instruction is in fetch, another is in decode and another is in execute. The same idea applies here: multiple iterations are processed simultaneously, each in a separate stage. One iteration still takes the same time, but the time to complete n iterations is reduced.
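A small Python sketch can make the "n iterations finish sooner" claim concrete. It rests on assumptions the slide does not state: every iteration takes a fixed `iter_len` cycles, and a new iteration starts every `ii` cycles (an initiation interval). The function and parameter names are illustrative only.

```python
def sequential_cycles(n_iters: int, iter_len: int) -> int:
    """Cycles when each iteration finishes before the next one starts."""
    return n_iters * iter_len


def pipelined_cycles(n_iters: int, iter_len: int, ii: int) -> int:
    """Cycles when a new iteration starts every `ii` cycles (assumed interval)."""
    if n_iters == 0:
        return 0
    # The last iteration starts at cycle (n_iters - 1) * ii and still needs
    # the full iter_len cycles to finish.
    return (n_iters - 1) * ii + iter_len


if __name__ == "__main__":
    n, length = 100, 4
    print(sequential_cycles(n, length))       # back-to-back iterations
    print(pipelined_cycles(n, length, ii=1))  # one new iteration per cycle
```

With 100 iterations of 4 cycles each, the back-to-back total is 400 cycles, while starting one iteration per cycle gives 103. Each individual iteration still takes 4 cycles in both cases.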

- 5 - Review: A Software Pipeline

Loop body with 4 ops: A, B, C, D

Schedule (time runs downward; each row is one cycle):

    A            Prologue: fill the pipe
    B A
    C B A
    D C B A      Kernel: steady state
    D C B A
    D C B A
    D C B        Epilogue: drain the pipe
    D C
    D

Steady state: 4 iterations execute simultaneously, with 1 operation from each iteration. When the pipe is full, one iteration starts and one finishes every cycle.
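The prologue, kernel, and epilogue pattern can be reproduced with a short Python sketch. It assumes the simplest case suggested by the slide: each op takes one cycle and a new iteration starts every cycle. The function name `pipeline_rows` is hypothetical.

```python
def pipeline_rows(ops, n_iters):
    """Return, for each cycle, the ops issued in that cycle.

    Assumes iteration i starts at cycle i and runs one op per cycle,
    so at a given cycle it executes ops[cycle - i].
    """
    depth = len(ops)
    rows = []
    for cycle in range(n_iters + depth - 1):
        row = []
        for it in range(n_iters):
            stage = cycle - it
            if 0 <= stage < depth:
                row.append(ops[stage])
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in pipeline_rows("ABCD", 6):
        print(" ".join(row))
```

For a 4-op body and 6 iterations, the first three rows fill the pipe, the next three rows each hold all four ops (the kernel), and the last three rows drain it.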
- 6 - Creating Software Pipelines

- Lots of software pipelining techniques out there

