speedupBasic

Comments about matrix multiply
• Self-initialize the matrices in the program
• Gets rid of the I/O bottleneck for timing

Performance analysis
Goals are
1. to be able to understand better why your program has the performance it has, and
2. to see what could be preventing its performance from being better.

The typical speedup curve - fixed problem size
[Figure: speedup versus number of processors]

The typical speedup curve - large, fixed number of processors
[Figure: speedup versus problem size]

Speedup
• Parallel time TP(p) is the time it takes the parallel form of the program to run on p processors.
• Sequential time Ts is more problematic:
  a. It can be TP(1), but this carries the overhead of the extra code needed for parallelization. Even with one thread, OpenMP code will call libraries for threading. This is one way to "cheat" on benchmarking.
  b. It should be the best possible sequential implementation: tuned, with good or best compiler switches, etc.
(A timing sketch follows these slides.)

What is execution time?
• Execution time can be modeled as the sum of:
1. Inherently sequential computation σ(n)
2. Potentially parallel computation ϕ(n)
3. Communication time κ(n,p)

Components of execution time
[Figure: sequential time versus number of processors]
[Figure: parallel time versus number of processors]
[Figure: communication time and other parallel overheads versus number of processors, with κ(p) ∝ ⌈log2 p⌉]
[Figure: all components together with the resulting speedup curve, from speedup = 1 up to the maximum speedup]
At some point the decrease in the execution time of the parallel part is less than the increase in communication costs, leading to the knee in the speedup curve.

Speedup as a function of these components
• Sequential time Ts is
  i. the sequential computation, plus
  ii. the parallel computation:
  Ts = σ(n) + ϕ(n)
• Parallel time TP(p) is
  i. the sequential computation, plus
  ii. the (parallel computation) / (number of processors), plus
  iii. the communication cost:
  TP(p) = σ(n) + ϕ(n)/p + κ(n,p)
• Speedup is the ratio Ts / TP(p) = (σ(n) + ϕ(n)) / (σ(n) + ϕ(n)/p + κ(n,p))

Efficiency
• Efficiency is speedup divided by the number of processors:
  ε(n,p) = (σ(n) + ϕ(n)) / (p·σ(n) + ϕ(n) + p·κ(n,p))
• 0 < ε(n,p) < 1:
  all terms are > 0, so ε(n,p) > 0;
  the numerator is less than the denominator, so ε(n,p) < 1.
• Intuitively, efficiency is how effectively the machines are being used by the parallel computation.
• If the number of processors is doubled, then for the efficiency to stay the same the parallel execution time must be halved (a worked example follows these slides).

Efficiency by amount of work
[Figure: efficiency versus number of processors (1 to 128) for ϕ = 1000, ϕ = 10000, and ϕ = 100000, with σ = 1, κ = 1 when p = 1, and κ growing as log2 p]
• ϕ: amount of computation that can be done in parallel
• κ: communication overhead
• σ: sequential computation
(A small program reproducing this model follows these slides.)

Amdahl's Law
• Developed by Gene Amdahl
• Basic idea: the parallel performance of a program is limited by the sequential portion of the program
• An argument for fewer, faster processors
• Can be used to model performance on various...
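The slides describe measuring TP(p) and Ts but do not show code. The sketch below is a minimal, hypothetical illustration of timing a kernel with omp_get_wtime(); the kernel, problem size, and file name are made up here, not taken from the course. It also self-initializes its data in the program, as the matrix-multiply comment suggests, so no I/O falls inside the timed region. The "sequential" time it prints is just the same loop without OpenMP; as the slides point out, a proper Ts should come from the best tuned sequential implementation built separately.

/* speedup_demo.c (hypothetical sketch, not course code)
 * Build: cc -O2 -fopenmp speedup_demo.c -o speedup_demo
 */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 20000000L   /* made-up problem size */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b) return 1;

    /* Self-initialize the data inside the program: no file I/O
       ends up in the timed region. */
    for (long i = 0; i < N; i++) { a[i] = i * 0.5; b[i] = 0.0; }

    /* "Sequential" time: the same loop without OpenMP.  A real Ts
       should be the best possible sequential implementation. */
    double t0 = omp_get_wtime();
    for (long i = 0; i < N; i++) b[i] = a[i] * a[i] + 1.0;
    double ts = omp_get_wtime() - t0;

    /* Parallel time TP(p) on p = omp_get_max_threads() threads. */
    t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++) b[i] = a[i] * a[i] + 1.0;
    double tp = omp_get_wtime() - t0;

    printf("p = %d  Ts = %.3f s  TP = %.3f s  speedup = %.2f\n",
           omp_get_max_threads(), ts, tp, ts / tp);
    free(a); free(b);
    return 0;
}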
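As a companion to the "Components of execution time" and "Efficiency by amount of work" slides, the small model below (my sketch, not course code) evaluates Ts = σ + ϕ and TP(p) = σ + ϕ/p + κ(p), then prints the predicted speedup and efficiency for the three ϕ values used in the plot. The slide only says that σ = 1, that κ = 1 when p = 1, and that κ grows with log2 p; taking κ(p) = ⌈log2 p⌉ with a proportionality constant of 1 is an assumption made here to get concrete numbers.

/* model.c (hypothetical sketch, not course code)
 * Build: cc -O2 model.c -o model -lm
 * Ts    = sigma + phi
 * TP(p) = sigma + phi/p + kappa(p)
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double sigma  = 1.0;                            /* sequential computation */
    const double phis[] = { 1000.0, 10000.0, 100000.0 };  /* parallel computation   */

    for (int j = 0; j < 3; j++) {
        double phi = phis[j];
        double ts  = sigma + phi;
        printf("phi = %.0f\n", phi);
        for (int p = 1; p <= 128; p *= 2) {
            /* Assumed communication cost: 1 at p = 1, ceil(log2 p) after that. */
            double kappa   = (p == 1) ? 1.0 : ceil(log2((double)p));
            double tp      = sigma + phi / p + kappa;
            double speedup = ts / tp;
            printf("  p = %3d  TP = %10.2f  speedup = %7.2f  efficiency = %.3f\n",
                   p, tp, speedup, speedup / p);
        }
    }
    return 0;
}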

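One way to see the slide's last point about efficiency (the numbers here are made up purely for illustration): since ε(n,p) = Ts / (p · TP(p)), holding ε fixed while doubling p forces TP to halve. For example, if Ts = 100 s and TP(4) = 31.25 s, then ε = 100 / (4 × 31.25) = 0.8. To keep ε = 0.8 on 8 processors, TP(8) must satisfy 100 / (8 × TP(8)) = 0.8, i.e. TP(8) = 15.625 s, exactly half of TP(4).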
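The Amdahl's Law slide is cut off in this preview. For reference, the usual statement of the law (a standard result, not quoted from the missing part of the slide) is that if a fraction f of the work is inherently sequential, then speedup on p processors is at most 1 / (f + (1 - f)/p), which approaches 1/f as p grows; this is the sense in which the sequential portion limits performance and argues for fewer, faster processors. The tiny sketch below just tabulates that bound for a few arbitrary values of f.

/* amdahl.c (hypothetical sketch, not course code)
 * Build: cc -O2 amdahl.c -o amdahl
 * Upper bound on speedup: 1 / (f + (1 - f)/p), f = sequential fraction.
 */
#include <stdio.h>

int main(void)
{
    const double fractions[] = { 0.05, 0.10, 0.25 };  /* arbitrary example values */

    for (int j = 0; j < 3; j++) {
        double f = fractions[j];
        printf("f = %.2f:", f);
        for (int p = 1; p <= 1024; p *= 4)
            printf("  p=%4d -> %6.2f", p, 1.0 / (f + (1.0 - f) / p));
        printf("   limit 1/f = %.1f\n", 1.0 / f);
    }
    return 0;
}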