speedupBasic

# s = 0.05, thus on 64 processors ψ = 64 + (1 - 64)(0.05)


... amount of useful work can be done, except that on P-1 processors time is wasted on the sequential part, and that time must be subtracted from the useful work.

s = 0.05, thus on 64 processors ψ = 64 + (1 - 64)(0.05) = 64 - 3.15 = 60.85

Tuesday, February 14, 12

Another example

You have money to buy a 16K (16,384) core distributed-memory system, but you only want to spend the money if you can get decent performance on your application. Allowing the problem to scale with increasing numbers of processors, what can s be to get a scaled speedup of 15,000 on the machine, i.e. what fraction of the application's parallel execution time can be devoted to inherently serial computation?

15,000 = 16,384 - 16,383s ⇒ s = 1,384 / 16,383 ⇒ s = 0.084 (about 8.4%)

Comparison with Amdahl's Law result

ψ(n,p) ≤ p + (1 - p)s

With s = 0.084 the scaled speedup on 16,384 processors is 15,000. But then Amdahl's law doesn't allow the problem size to scale.
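The two worked examples above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the Amdahl comparison at the end assumes the same numeric value (0.084) is used as the serial fraction of a fixed-size problem, which is not the same quantity as s (a fraction of the parallel execution time).

```python
def scaled_speedup(p: int, s: float) -> float:
    """Scaled speedup from the slides: psi = p + (1 - p) * s."""
    return p + (1 - p) * s


def serial_fraction_for(p: int, target: float) -> float:
    """Solve target = p + (1 - p) * s for s."""
    return (p - target) / (p - 1)


def amdahl_speedup(p: int, f: float) -> float:
    """Amdahl's law for a fixed-size problem with serial fraction f."""
    return 1.0 / (f + (1.0 - f) / p)


# First example: s = 0.05 on 64 processors.
print(round(scaled_speedup(64, 0.05), 2))            # 60.85

# Second example: scaled speedup of 15,000 on 16,384 cores.
s = serial_fraction_for(16_384, 15_000)
print(round(s, 3))                                   # 0.084

# Illustrative assumption: reuse the same value as Amdahl's serial fraction.
print(round(amdahl_speedup(16_384, s), 1))           # about 11.8
```

Under that assumption the fixed-size speedup stays near 12 regardless of the core count, which is the contrast the comparison slide points at: the large speedup is only reachable because the problem grows with the machine.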
Non-scaled performance

σ(1) = σ(p); ϕ(1) = ϕ(p)

[Chart 7: serial work, non-scaled parallel work, and non-scaled speedup for 1 to 4096 processors]

Work is constant, and speedup levels off at ~256 processors.

Scaled performance

σ(1) = σ(p); p⋅ϕ(1) = ϕ(p)

[Chart 8: serial work, scaled parallel work, and scaled speedup for 1 to 4096 processors]

Even though it is hard to see, as the parallel work increases in proportion to the number of processors, the speedup scales in proportion to the number of processors.

Note that the parallel work may (and usually does) increase faster than the problem size.

Scaled speedups, log scales

σ(1) = σ(p); p⋅ϕ(1) = ϕ(p)

[Chart 9: serial work, log2 of scaled parallel work, and log2 of scaled speedup for 1 to 4096 processors]

The same chart as before, except with log scales for parallel work and speedup. Scaled speedup is close to ideal.

Scaled and non-scaled speedups, log2 axis for scaled

σ(1) = σ(p); p⋅ϕ(1) = ϕ(p)

[Chart: serial work, log2 of scaled parallel work, log2 of scaled speedup, non-scaled parallel work, and non-scaled speedup for 1 to 4096 processors]

The effect of un-modeled communication

This is clearly an important effect that is not being modeled.
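The model behind the non-scaled and scaled charts, which leaves out communication entirely, can be sketched as follows. The values of σ and ϕ(1) are hypothetical, chosen only for illustration; the slides do not state the numbers used to draw the charts.

```python
def non_scaled_speedup(sigma: float, phi: float, p: int) -> float:
    """Fixed problem, phi(p) = phi(1): only the parallel work is split p ways."""
    return (sigma + phi) / (sigma + phi / p)


def scaled_speedup_from_work(sigma: float, phi1: float, p: int) -> float:
    """Scaled problem, phi(p) = p * phi(1): each processor keeps phi(1) of work."""
    return (sigma + p * phi1) / (sigma + phi1)


SIGMA = 1.0   # hypothetical serial work, sigma(1) = sigma(p)
PHI1 = 24.0   # hypothetical parallel work on one processor, phi(1)

print(f"{'p':>6} {'non-scaled':>12} {'scaled':>12}")
for k in range(13):                      # p = 1, 2, 4, ..., 4096
    p = 2 ** k
    ns = non_scaled_speedup(SIGMA, PHI1, p)
    sc = scaled_speedup_from_work(SIGMA, PHI1, p)
    print(f"{p:>6} {ns:>12.2f} {sc:>12.2f}")
```

With these inputs the non-scaled column can never exceed (σ + ϕ)/σ = 25, while the scaled column keeps growing roughly in proportion to p, matching the shape the slides describe.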
[Chart: scaled speedup and scaled speedup with communication, for 1 to 131072 processors]

The Karp-Flatt Metric

• Takes into account communication costs
• T(n,p) = σ(n) + ϕ(n)/p + κ(n,p)
• Serial time T(n,1) = σ(n) + ϕ(n)
• The experimentally determined serial fraction e of the parallel computation is e = (σ(n) + κ(n,p))/T(n,1)

T(n,1) is essentially a measure of total work.

• e is the fraction of the one processor execution time that is serial on all p processors
• Communication cost m...
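The definition of e given above can be evaluated directly from the model terms. The sketch below uses made-up values for σ, ϕ and κ. The second function uses the form in which the Karp-Flatt metric is usually quoted, computed from a measured speedup; that form does not appear in this excerpt and is included here as an assumption about how the metric would be applied in practice.

```python
def parallel_time(sigma: float, phi: float, kappa: float, p: int) -> float:
    """T(n,p) = sigma(n) + phi(n)/p + kappa(n,p)."""
    return sigma + phi / p + kappa


def serial_fraction(sigma: float, phi: float, kappa: float) -> float:
    """e = (sigma(n) + kappa(n,p)) / T(n,1), with T(n,1) = sigma(n) + phi(n)."""
    return (sigma + kappa) / (sigma + phi)


def karp_flatt_from_speedup(speedup: float, p: int) -> float:
    """Commonly quoted form: e = (1/psi - 1/p) / (1 - 1/p), for p > 1."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


SIGMA, PHI = 10.0, 90.0                  # hypothetical serial and parallel work
for p, kappa in [(2, 0.5), (4, 1.0), (8, 2.0)]:   # hypothetical kappa(n,p)
    t_p = parallel_time(SIGMA, PHI, kappa, p)
    e = serial_fraction(SIGMA, PHI, kappa)
    print(f"p={p}  T(n,p)={t_p:.2f}  e={e:.3f}")
```

In this toy input e rises as κ(n,p) grows with p, which is the signal the metric is meant to expose. When κ = 0 the two functions give the same value.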