speedupBasic

Comments about matrix multiply
- Self-initialize the matrices: gets rid of the I/O bottleneck for timing
Tuesday, February 14, 12
Performance analysis
Goals are:
1. to be able to understand better why your program has the performance it has, and
2. to understand what could be preventing its performance from being better.
The typical speedup curve - fixed problem size
(Figure: speedup vs. number of processors)
The typical speedup curve - large, fixed number of processors
(Figure: speedup vs. problem size)
Speedup
Parallel time T_P(p) is the time it takes the parallel form of the program to run on p processors.
Sequential time T_S is more problematic:
a. It can be T_P(1), but this carries the overhead of the extra code needed for parallelization; even with one thread, OpenMP code will call libraries for threading. This is one way to "cheat" on benchmarking.
b. It should be the best possible sequential implementation: tuned, built with good or best compiler switches, etc.
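The two baselines above can be made concrete with a small sketch; the timings below are hypothetical numbers for illustration, not measurements from the slides.

```python
# A minimal sketch of the two baseline choices for T_S. All timings
# here are made-up values for illustration.

def speedup(t_seq, t_par):
    """Speedup = sequential time / parallel time on p processors."""
    return t_seq / t_par

t_best_seq = 10.0  # hypothetical: best tuned sequential implementation
t_p1 = 11.0        # hypothetical: parallel code on 1 thread (threading overhead)
t_p8 = 1.6         # hypothetical: parallel code on 8 threads

print(speedup(t_best_seq, t_p8))  # honest speedup, against the best sequential code
print(speedup(t_p1, t_p8))        # the "cheat": the T_P(1) baseline inflates speedup
```

Because T_P(1) includes parallelization overhead, measuring against it always reports a larger speedup than measuring against the tuned sequential code.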
What is execution time?
Execution time can be modeled as the sum of:
1. inherently sequential computation σ(n)
2. potentially parallel computation φ(n)
3. communication time κ(n,p)
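This model can be sketched as a function. Two assumptions beyond the slide: the parallel term is divided by p (as the later "Speedup as a function of these components" slide states), and κ is taken to grow like log₂ p, matching the efficiency slide; the unit cost is hypothetical.

```python
# Sketch of T_P(n, p) = sigma(n) + phi(n)/p + kappa(n, p). The log2(p)
# form of kappa is an assumption (a reduction-tree-like overhead);
# the slides only require that kappa grows with p.
import math

def t_parallel(sigma, phi, p, kappa_unit=1.0):
    kappa = kappa_unit * math.log2(p) if p > 1 else 0.0
    return sigma + phi / p + kappa

print(t_parallel(sigma=1.0, phi=1000.0, p=1))  # 1001.0: all work runs serially
print(t_parallel(sigma=1.0, phi=1000.0, p=8))  # 129.0: 1 + 1000/8 + log2(8)
```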
Components of execution time: sequential time
(Figure: execution time vs. number of processors)
Components of execution time: parallel time
(Figure: execution time vs. number of processors)
Components of execution time: communication time and other parallel overheads
(Figure: execution time vs. number of processors; κ(P) grows roughly as log₂ P)
Components of execution time: all together
(Figure: execution time vs. number of processors, with the speedup = 1 line and the maximum speedup marked)
At some point the decrease in the execution time of the parallel part is less than the increase in communication costs, leading to the knee in the speedup curve.
Speedup as a function of these components
Sequential time T_S is:
i. the sequential computation, plus
ii. the parallel computation.
Parallel time T_P(p) is:
i. the sequential computation, plus
ii. the (parallel computation) / (number of processors), plus
iii. the communication cost.
That is, speedup = T_S / T_P(p) = (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p)).
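The components above can be sketched as a speedup model that exhibits the knee. The log₂ p form of κ is an assumption borrowed from the efficiency slide, and the σ and φ values are illustrative.

```python
# Sketch: S(p) = (sigma + phi) / (sigma + phi/p + kappa(p)), with
# kappa(p) ~ log2(p) assumed for illustration. The curve rises, peaks,
# and then falls once communication growth outweighs the shrinking
# parallel term: the knee.
import math

def model_speedup(sigma, phi, p, kappa_unit=1.0):
    kappa = kappa_unit * math.log2(p) if p > 1 else 0.0
    return (sigma + phi) / (sigma + phi / p + kappa)

for p in (1, 4, 16, 64, 256, 1024, 4096):
    print(p, round(model_speedup(1.0, 100.0, p), 2))
```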
Efficiency
Intuitively, efficiency is how effectively the machines are being used by the parallel computation: ε(n,p) = speedup / p = (σ(n) + φ(n)) / (p σ(n) + φ(n) + p κ(n,p)).
If the number of processors is doubled, for the efficiency to stay the same the parallel execution time must be halved.
0 < ε(n,p) < 1:
- all terms are > 0, so ε(n,p) > 0
- the numerator is less than the denominator, so ε(n,p) < 1
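A sketch of efficiency as speedup divided by processor count, using the same hypothetical component model with a log₂ p communication term (an assumption for illustration):

```python
# Sketch: efficiency = speedup / p = T_S / (p * T_P(p)), using the
# hypothetical component model with kappa(p) ~ log2(p).
import math

def efficiency(sigma, phi, p, kappa_unit=1.0):
    kappa = kappa_unit * math.log2(p) if p > 1 else 0.0
    t_seq = sigma + phi
    t_par = sigma + phi / p + kappa
    return t_seq / (p * t_par)

# The overhead terms (sigma, kappa) make efficiency fall as p doubles:
print(efficiency(1.0, 1000.0, 8))
print(efficiency(1.0, 1000.0, 16))
```

With σ = κ = 0 the halving of the parallel term would exactly offset the doubling of p; the overheads are why it does not.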
Efficiency by amount of work
(Figure: efficiency from 0 to 1.00 vs. p = 1 to 128, for φ = 1000, φ = 10000, φ = 100000; σ = 1, κ = 1 when p = 1, and κ increases by log₂ P)
σ: sequential computation
φ: amount of computation that can be done in parallel
κ: communication overhead
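The plotted setup can be sketched directly from the stated parameters: σ = 1, κ = 1 at p = 1 and growing by log₂ p, and φ in {1000, 10000, 100000}. Larger problems keep efficiency high out to more processors.

```python
# Sketch of the slide's setup: sigma = 1; kappa = 1 at p = 1 and grows
# by log2(p); phi in {1000, 10000, 100000}.
import math

def eff(sigma, phi, p):
    kappa = 1.0 + (math.log2(p) if p > 1 else 0.0)
    return (sigma + phi) / (p * (sigma + phi / p + kappa))

for phi in (1_000, 10_000, 100_000):
    print(phi, [round(eff(1.0, phi, p), 2) for p in (1, 16, 64, 128)])
```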
Amdahl's Law
Developed by Gene Amdahl.
Basic idea: the parallel performance of a program is limited by the sequential portion of the program. This was an argument for fewer, faster processors.
Can be used to model performance on various sizes of machines, and to derive other useful relations.
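Amdahl's Law can be sketched as follows, writing f for the sequential fraction of the work; the numeric values of f and p are illustrative.

```python
# Sketch of Amdahl's Law: with sequential fraction f, the speedup on
# p processors is S(p) = 1 / (f + (1 - f)/p), bounded above by 1/f.
def amdahl(f, p):
    return 1.0 / (f + (1.0 - f) / p)

print(amdahl(0.05, 16))     # about 9.14x on 16 processors
print(amdahl(0.05, 10**6))  # approaches the 1/f = 20x ceiling
```

Even a 5% sequential portion caps speedup at 20x no matter how many processors are added, which is the "limited by the sequential portion" point above.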