CS 140: Nonnumerical Examples with Cilk++

∙ Divide-and-conquer paradigm for Cilk++
∙ Quicksort
∙ Mergesort

Thanks to Charles E. Leiserson for some of these slides.

Work and Span (Recap)

$T_P$ = execution time on $P$ processors
$T_1$ = work
$T_\infty$ = span*

* Also called critical-path length or computational depth.

∙ Speedup on $P$ processors = $T_1/T_P$
∙ Parallelism = $T_1/T_\infty$

Scheduling

∙ Cilk++ allows the programmer to express potential parallelism in an application.
∙ The Cilk++ scheduler maps strands onto processors dynamically at runtime.
∙ Since online schedulers are complicated, we'll explore the ideas with an offline scheduler.

[Figure: processors (P) with caches, memory, and I/O.]

Greedy Scheduling

IDEA: Do as much as possible on every step.

Definition: A strand is ready if all its predecessors have executed.

Complete step
∙ ≥ $P$ strands ready.
∙ Run any $P$.

Incomplete step
∙ < $P$ strands ready.
∙ Run all of them.

[Figure: example dag, $P = 3$.]

Analysis of Greedy

Theorem [G68, B75, BL93]. Any greedy scheduler achieves $T_P \le T_1/P + T_\infty$.

Proof.
∙ # complete steps $\le T_1/P$, since each complete step performs $P$ work.
∙ # incomplete steps $\le T_\infty$, since each incomplete step reduces the span of the unexecuted dag by 1. ■

Optimality of Greedy

Corollary. Any greedy scheduler achieves within a factor of 2 of optimal.

Proof. Let $T_P^*$ be the execution time produced by the optimal scheduler. Since $T_P^* \ge \max\{T_1/P, T_\infty\}$ by the Work and Span Laws, we have

$T_P \le T_1/P + T_\infty \le 2 \max\{T_1/P, T_\infty\} \le 2 T_P^*$. ■

Linear Speedup

Corollary. Any greedy scheduler achieves near-perfect linear speedup whenever $P \ll T_1/T_\infty$.

Proof. Since $P \ll T_1/T_\infty$ is equivalent to $T_\infty \ll T_1/P$, the Greedy Scheduling Theorem gives us $T_P \le T_1/P + T_\infty \approx T_1/P$. Thus, the speedup is $T_1/T_P \approx P$. ■

Definition. The quantity $T_1/(P T_\infty)$ is called the parallel slackness.

Sorting

∙ Sorting is possibly the most frequently executed operation in computing!...
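
Looking back at the greedy scheduling analysis above, the bound $T_P \le T_1/P + T_\infty$ can be checked on a small example. The sketch below is only an illustration, not Cilk++ code and not the Cilk++ scheduler: it simulates an offline greedy scheduler in Python, assuming every strand takes one unit of time. The names `work_and_span`, `greedy_schedule`, and `EXAMPLE_DAG` are hypothetical, and the example dag is made up for this sketch rather than taken from the slides.

```python
from collections import deque

# Hypothetical example dag: each strand maps to the strands that depend on it.
# Every strand is assumed to take one unit of time.
EXAMPLE_DAG = {
    "a": ["b", "c", "d", "e"],
    "b": ["f"], "c": ["f"],
    "d": ["g"], "e": ["g"],
    "f": ["h"], "g": ["h"],
    "h": [],
}

def in_degrees(dag):
    """Count the predecessors of every strand."""
    indeg = {v: 0 for v in dag}
    for v in dag:
        for w in dag[v]:
            indeg[w] += 1
    return indeg

def work_and_span(dag):
    """Return (T_1, T_inf): the number of strands and the longest path length."""
    indeg = in_degrees(dag)
    depth = {v: 1 for v in dag}
    queue = deque(v for v in dag if indeg[v] == 0)
    while queue:  # process strands in topological order
        v = queue.popleft()
        for w in dag[v]:
            depth[w] = max(depth[w], depth[v] + 1)
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return len(dag), max(depth.values())

def greedy_schedule(dag, p):
    """Return T_P, the number of steps a greedy scheduler takes on p processors."""
    indeg = in_degrees(dag)
    ready = [v for v in dag if indeg[v] == 0]
    steps = 0
    while ready:
        # Complete step: run any p ready strands. Incomplete step: run them all.
        running, ready = ready[:p], ready[p:]
        steps += 1
        for v in running:
            for w in dag[v]:
                indeg[w] -= 1
                if indeg[w] == 0:  # all predecessors of w have executed
                    ready.append(w)
    return steps

work, span = work_and_span(EXAMPLE_DAG)
for p in range(1, 6):
    t_p = greedy_schedule(EXAMPLE_DAG, p)
    lower = max(work / p, span)  # Work and Span Laws
    upper = work / p + span      # Greedy Scheduling Theorem
    print(f"P={p}: T_P={t_p}, lower bound={lower:.2f}, upper bound={upper:.2f}")
```

For each processor count the simulated $T_P$ should fall between the Work and Span lower bound and the greedy upper bound. Because the two bounds differ by at most a factor of 2, this also illustrates the optimality corollary.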
This note was uploaded on 12/27/2011 for the course CMPSC 140 taught by Professor Gilbert during the Fall '11 term at UCSB.
