cs140-multicoresort


CS 140: Non-numerical Examples with Cilk++

∙ Divide and conquer paradigm for Cilk++
∙ Quicksort
∙ Mergesort

Thanks to Charles E. Leiserson for some of these slides.

Work and Span (Recap)

T_P = execution time on P processors
T_1 = work
T_∞ = span*

* Also called critical-path length or computational depth.

Speedup on P processors
∙ T_1/T_P

Parallelism
∙ T_1/T_∞

Scheduling

∙ Cilk++ allows the programmer to express potential parallelism in an application.
∙ The Cilk++ scheduler maps strands onto processors dynamically at runtime.
∙ Since on-line schedulers are complicated, we'll explore the ideas with an off-line scheduler.

[Figure: four processors (P), caches ($), memory, and I/O]

Greedy Scheduling

IDEA: Do as much as possible on every step.

Definition: A strand is ready if all its predecessors have executed.

Complete step
∙ ≥ P strands ready.
∙ Run any P.

Incomplete step
∙ < P strands ready.
∙ Run all of them.

[Figure: example with P = 3]

Analysis of Greedy

Theorem [G68, B75, BL93]. Any greedy scheduler achieves T_P ≤ T_1/P + T_∞.

Proof.
∙ # complete steps ≤ T_1/P, since each complete step performs P work.
∙ # incomplete steps ≤ T_∞, since each incomplete step reduces the span of the unexecuted dag by 1. ■

Optimality of Greedy

Corollary. Any greedy scheduler achieves within a factor of 2 of optimal.

Proof. Let T_P* be the execution time produced by the optimal scheduler. Since T_P* ≥ max{T_1/P, T_∞} by the Work and Span Laws, we have
T_P ≤ T_1/P + T_∞ ≤ 2 ⋅ max{T_1/P, T_∞} ≤ 2 T_P*. ■

Linear Speedup

Corollary. Any greedy scheduler achieves near-perfect linear speedup whenever P ≪ T_1/T_∞.

Proof. Since P ≪ T_1/T_∞ is equivalent to T_∞ ≪ T_1/P, the Greedy Scheduling Theorem gives us T_P ≤ T_1/P + T_∞ ≈ T_1/P. Thus, the speedup is T_1/T_P ≈ P. ■

Definition. The quantity T_1/(P T_∞) is called the parallel slackness.

Sorting

∙ Sorting is possibly the most frequently executed operation in computing!...
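
The greedy scheduling bound stated earlier can be checked on a small example. What follows is a minimal sketch in Python rather than Cilk++, assuming unit-time strands and an off-line scheduler. The function names (greedy_schedule, span) and the eight-strand example dag are hypothetical and do not come from the slides. Each loop iteration is one step: a complete step when at least P strands are ready, otherwise an incomplete step that runs all of them.

```python
from collections import deque


def greedy_schedule(preds, P):
    """Simulate a greedy off-line scheduler on a dag of unit-time strands.

    preds maps each strand to the list of its predecessors.
    Returns T_P, the number of steps used on P processors.
    """
    # Number of not-yet-executed predecessors of each strand.
    remaining = {v: len(ps) for v, ps in preds.items()}
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for u in ps:
            succs[u].append(v)

    ready = deque(v for v, n in remaining.items() if n == 0)
    steps = 0
    while ready:
        # Complete step: >= P strands ready, run any P of them.
        # Incomplete step: < P strands ready, run all of them.
        batch = [ready.popleft() for _ in range(min(P, len(ready)))]
        for u in batch:
            for v in succs[u]:
                remaining[v] -= 1
                if remaining[v] == 0:
                    ready.append(v)  # becomes runnable from the next step on
        steps += 1
    return steps


def span(preds):
    """Return T_inf: the number of strands on the longest path in the dag."""
    depth = {}

    def longest_path_to(v):
        if v not in depth:
            depth[v] = 1 + max((longest_path_to(u) for u in preds[v]), default=0)
        return depth[v]

    return max(longest_path_to(v) for v in preds)


# Hypothetical fork-join dag: a forks four strands, joined pairwise, then h.
dag = {
    "a": [],
    "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"],
    "f": ["b", "c"], "g": ["d", "e"],
    "h": ["f", "g"],
}

T1 = len(dag)      # work: one unit per strand
Tinf = span(dag)   # span

for P in (1, 2, 3, 4):
    TP = greedy_schedule(dag, P)
    lower = max(T1 / P, Tinf)   # Work and Span Laws
    upper = T1 / P + Tinf       # greedy scheduling bound
    print(f"P={P}: T_P={TP}, lower={lower:.2f}, upper={upper:.2f}")
```

On this dag the measured T_P should fall between max{T_1/P, T_∞} and T_1/P + T_∞ for every P, which is exactly the window that the factor-of-2 corollary relies on.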

This note was uploaded on 12/27/2011 for the course CMPSC 140 taught by Professor Gilbert during the Fall '11 term at UCSB.

