
Carnegie Mellon

Lecture 8
Software Pipelining

I. Introduction
II. Problem Formulation
III. Algorithm

Reading: Chapter 10.5 – 10.6

M. Lam, CS243: Software Pipelining
I. Example of DoAll Loops

Machine:
    Per clock: 1 read, 1 write, 1 (2-stage) arithmetic op,
    with hardware loop op and auto-incrementing addressing mode.

Source code:
    For i = 1 to n
        D[i] = A[i] * B[i] + c

Code for one iteration:
    1. LD  R5,0(R1++)
    2. LD  R6,0(R2++)
    3. MUL R7,R5,R6
    4.
    5. ADD R8,R7,R4
    6.
    7. ST  0(R3++),R8

No parallelism in basic block.
Unrolling

     1. L: LD
     2.    LD
     3.    LD
     4.    MUL  LD
     5.    MUL  LD
     6.    ADD  LD
     7.    ADD  LD
     8.    ST   MUL  LD
     9.    MUL
    10.    ST   ADD
    11.    ADD
    12.    ST
    13.    ST   BL (L)

Let u be the degree of unrolling:
    Length of u iterations = 7 + 2(u - 1)
    Execution time per source iteration = (7 + 2(u - 1)) / u = 2 + 5/u
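The unrolling formula can be checked numerically. The following is a minimal sketch, assuming only the slide's expression 7 + 2(u - 1); the function names unrolled_length and cycles_per_iteration are made up for illustration.

```python
from fractions import Fraction


def unrolled_length(u):
    """Cycles for u unrolled iterations, per the slide: 7 + 2(u - 1)."""
    if u < 1:
        raise ValueError("degree of unrolling must be at least 1")
    return 7 + 2 * (u - 1)


def cycles_per_iteration(u):
    """Average cycles per source iteration: (7 + 2(u - 1)) / u = 2 + 5/u."""
    return Fraction(unrolled_length(u), u)


# Tabulate a few degrees of unrolling.
for u in (1, 2, 4, 8, 100):
    print(u, unrolled_length(u), float(cycles_per_iteration(u)))
```

The cost per iteration falls toward 2 cycles as u grows but stays above it for every finite u, since the 5/u term never vanishes.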
Software Pipelined Code

     1. LD
     2. LD
     3. MUL  LD
     4. LD
     5. MUL  LD
     6. ADD  LD
     7. MUL  LD
     8. ST   ADD  LD
     9. MUL  LD
    10. ST   ADD  LD
    11. MUL
    12. ST   ADD
    13.
    14. ST   ADD
    15.
    16. ST

Unlike unrolling, software pipelining can give an optimal result.
Locally compacted code may not be globally optimal.
DOALL: can fill arbitrarily long pipelines with infinitely many iterations.
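One way to see why locally compacted code may not be globally optimal is to check resource usage modulo the initiation interval. The sketch below is an illustration, not the lecture's algorithm. Its assumptions: the names UNIT and has_conflict are hypothetical; offsets are 0-based readings of the slides (the 7-cycle single iteration, and the first iteration of the pipelined code, whose ADD and ST appear in cycles 6 and 8); each unit accepts one new operation per clock.

```python
from collections import Counter

# Assumption: which functional unit each operation occupies when it issues.
UNIT = {"LD": "read", "MUL": "arith", "ADD": "arith", "ST": "write"}


def has_conflict(schedule, T):
    """Return True if two ops of one iteration need the same unit in the
    same slot modulo T, i.e. overlapped iterations would collide."""
    used = Counter((UNIT[op], offset % T) for op, offset in schedule)
    return any(count > 1 for count in used.values())


# Tightest schedule for a single iteration (cycles 1..7 on the slide).
compact = [("LD", 0), ("LD", 1), ("MUL", 2), ("ADD", 4), ("ST", 6)]
# Schedule followed by each iteration in the pipelined code (ADD, ST delayed).
stretched = [("LD", 0), ("LD", 1), ("MUL", 2), ("ADD", 5), ("ST", 7)]

print(has_conflict(compact, 2))
print(has_conflict(stretched, 2))
```

Under these assumptions the compact schedule conflicts at T = 2, because MUL and ADD both land in slot 0 of the arithmetic unit, while the stretched schedule does not. Delaying ADD by one cycle lengthens a single iteration but lets a new iteration start every 2 cycles.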
Example of DoAcross Loop

Loop:
    Sum = Sum + A[i];
    B[i] = A[i] * c;

Schedule of one iteration:
    1. LD
    2. MUL
    3. ADD
    4. ST

Software Pipelined Code:
    1. LD
    2. MUL
    3. ADD  LD
    4. ST   MUL
    5. ADD
    6. ST

Doacross loops:
    Recurrences can be parallelized
    Harder to fully utilize hardware with large degrees of parallelism
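The pipelined table on this slide can be reproduced by overlapping copies of the four-operation iteration, each started a fixed number of cycles after the previous one. This is a small sketch under that reading; the helper name overlap, the 0-based offsets, and the choice of a 2-cycle spacing and two iterations are assumptions made to match the six rows shown.

```python
def overlap(schedule, T, n):
    """Return the list of ops issued in each cycle when n copies of
    `schedule` are started T cycles apart.

    schedule: list of (op, offset), offsets 0-based within one iteration.
    """
    length = max(offset for _, offset in schedule) + 1
    rows = [[] for _ in range(length + T * (n - 1))]
    for i in range(n):  # older iterations are placed first in each row
        for op, offset in schedule:
            rows[i * T + offset].append(op)
    return rows


one_iteration = [("LD", 0), ("MUL", 1), ("ADD", 2), ("ST", 3)]

# Print the overlapped schedule with 1-based cycle numbers, as on the slide.
for cycle, ops in enumerate(overlap(one_iteration, T=2, n=2), start=1):
    print(f"{cycle}. {' '.join(ops)}")
```

The sketch only places operations; it does not check that the recurrence on Sum or the machine's resources actually permit the chosen spacing.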
II. Problem Formulation

Goals:
    maximize throughput
    small code size

Find:
    an identical relative schedule S(n) for every iteration
    a constant initiation interval (T)
such that
    the initiation interval is minimized

Complexity:
    NP-complete in general

Figure (relative schedule S, with T = 2):
    S
    0   LD
    1   MUL
    2   ADD  LD
    3   ST   MUL
        ADD
        ST
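Although finding the minimum initiation interval is hard in general, a simple lower bound follows from resource counts alone: no unit can be used more often per iteration than it is available in T cycles. The sketch below computes that standard bound, which this excerpt does not state explicitly. The names resource_bound, unit_of and units_available are hypothetical, and the DoAcross loop is assumed to run on the same machine as the DoAll example. Dependence (recurrence) constraints are ignored here.

```python
from collections import Counter
from math import ceil


def resource_bound(ops, unit_of, units_available):
    """Lower bound on T: for each unit, ceil(uses per iteration / copies)."""
    uses = Counter(unit_of[op] for op in ops)
    return max(ceil(uses[u] / units_available[u]) for u in uses)


# Assumed machine: 1 read, 1 write, 1 arithmetic issue per clock.
unit_of = {"LD": "read", "MUL": "arith", "ADD": "arith", "ST": "write"}
units_available = {"read": 1, "write": 1, "arith": 1}

doall = ["LD", "LD", "MUL", "ADD", "ST"]   # D[i] = A[i] * B[i] + c
doacross = ["LD", "MUL", "ADD", "ST"]      # Sum += A[i]; B[i] = A[i] * c

print(resource_bound(doall, unit_of, units_available))
print(resource_bound(doacross, unit_of, units_available))
```

For both example loops the bound comes out to 2 under these assumptions, which agrees with the T = 2 schedules shown on the slides. A bound like this only rules out smaller intervals; it does not by itself produce a valid schedule.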

This document was uploaded on 03/12/2012.

