583L20 - EECS 583 Class 20 Research Topic 2 Stream Compilation Stream Graph Modulo Scheduling University of Michigan Guest Speaker Today Daya

EECS 583 – Class 20
Research Topic 2: Stream Compilation, Stream Graph Modulo Scheduling
University of Michigan
November 30, 2011
Guest Speaker Today: Daya Khudia
- 1 - Announcements & Reading Material
This class
  » “Orchestrating the Execution of Stream Programs on Multicore Platforms,” M. Kudlur and S. Mahlke, Proc. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Jun. 2008.
Next class – GPU compilation
  » “Program optimization space pruning for a multithreaded GPU,” S. Ryoo, C. Rodrigues, S. Stone, S. Baghsorkhi, S. Ueng, J. Straton, and W. Hwu, Proc. Intl. Sym. on Code Generation and Optimization, Mar. 2008.
- 2 - Stream Graph Modulo Scheduling (SGMS)
Coarse grain software pipelining
  » Equal work distribution
  » Communication/computation overlap
  » Synchronization costs
Target: Cell processor
  » Cores with disjoint address spaces
  » Explicit copy to access remote data
      DMA engine independent of PEs
Filters = operations, cores = function units
[Figure: Cell processor. SPE0, SPE1, ..., SPE7, each with an SPU, a 256 KB local store (LS), and an MFC (DMA), connected through the EIB to the PPE (PowerPC) and DRAM.]
- 3 - Preliminaries
Synchronous Data Flow (SDF) [Lee ’87]
StreamIt [Thies ’02]

    int->int filter FIR(int N, int wgts[N]) {
      work pop 1 push 1 {
        int i, sum = 0;
        for (i = 0; i < N; i++)
          sum += peek(i) * wgts[i];
        push(sum);
        pop();
      }
    }

Push and pop items from

    int wgts[N];
    wgts = adapt(wgts);
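The peek/push/pop behavior of the FIR filter above can be sketched in Python. This is only an illustrative simulation, not StreamIt semantics in full: the function name `run_fir`, the use of a `deque` as the input channel, and the rule that the filter stops firing once fewer than N items remain are all assumptions made for the example.

```python
from collections import deque

def run_fir(weights, stream):
    """Simulate repeated firings of the FIR work function on a finite stream."""
    n = len(weights)
    fifo = deque(stream)  # input channel (hypothetical stand-in for the FIFO)
    out = []              # output channel
    # Assumed firing rule: the filter fires only while peek(n - 1) is valid.
    while len(fifo) >= n:
        total = 0
        for i in range(n):
            total += fifo[i] * weights[i]  # peek(i) * wgts[i]
        out.append(total)                  # push(sum)
        fifo.popleft()                     # pop()
    return out

print(run_fir([1, 2, 3], [1, 0, 0, 2, 5]))  # prints [1, 6, 19]
```

Each firing consumes one item and produces one item, matching the declared rates `pop 1 push 1`, while reading N items through `peek`.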
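- 4 - SGMS Overview
[Figure: execution timeline on a single PE (PE0, time T1) compared with pipelined execution on PE0 to PE3 (time T4), with DMA transfers between PEs and a prologue and epilogue around the steady state. T1/T4 ≈ 4.]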
- 5 - SGMS Phases
Fission + Processor assignment
  » Load balance
Stage assignment
  » Causality
  » DMA overlap
Code generation
- 6 - Processor Assignment: Maximizing Throughputs
• Assigns each filter to a processor
$\sum_{j=1}^{P} a_{ij} = 1$ for all filter i = 1, …, N
$\sum_{i=1}^{N} w(i)\, a_{ij} \le II$ for all PE j = 1, …, P
Minimize II
[Figure: stream graph of filters A to F with workloads (W) 20, 20, 20, 30, 50, 30. On one PE (PE0), T1 = 170. On four processing elements (PE0 to PE3), minimum II = 50, so T2 = 50 and T1/T2 = 3.4.]
Balanced workload! Maximum throughput
W: workload
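The numbers on this slide can be checked with a small Python sketch. The slide states the problem as a minimization of II subject to assignment constraints. The code below swaps in a plain exhaustive search over all filter-to-PE assignments, which is feasible only because the example is tiny. The function name `min_ii` is hypothetical, and the order of the workload list is arbitrary because the mapping of workloads to filter names is not recoverable from the slide text.

```python
from itertools import product

def min_ii(workloads, num_pes):
    """Exhaustively search filter-to-PE assignments for the smallest II."""
    best_ii, best_assign = None, None
    # assign[i] = index of the PE that filter i runs on (exactly one PE each).
    for assign in product(range(num_pes), repeat=len(workloads)):
        load = [0] * num_pes
        for w, pe in zip(workloads, assign):
            load[pe] += w
        ii = max(load)  # II must cover the most heavily loaded PE
        if best_ii is None or ii < best_ii:
            best_ii, best_assign = ii, assign
    return best_ii, best_assign

# Workloads from the slide; which filter carries which value is an assumption.
workloads = [20, 20, 20, 30, 50, 30]
ii, assign = min_ii(workloads, 4)
t1 = sum(workloads)       # single-PE time: all filters run back to back
print(ii, t1, t1 / ii)    # prints 50 170 3.4
```

The search confirms that no assignment beats II = 50, since the heaviest filter alone needs 50, which is why the speedup on four PEs stops at 3.4 rather than 4.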
- 7 - Need More Than Just Processor Assignment
Assign filters to processors
  » Goal: Equal work distribution
Graph partitioning?

This note was uploaded on 12/26/2011 for the course EECS 583 taught by Professor Flinn during the Fall '08 term at University of Michigan.
