speedupBasic
Comments about matrix multiply
- Self-initialize: gets rid of the I/O bottleneck for timing
Tuesday, February 14, 12
Performance analysis
Goals:
1. To understand better why your program has the performance it has
2. To understand what could be preventing its performance from being better
The typical speedup curve: fixed problem size
[Plot: speedup vs. number of processors]
The typical speedup curve: large, fixed number of processors
[Plot: speedup vs. problem size]
Speedup
- Parallel time T_P(p) is the time it takes the parallel form of the program to run on p processors.
- Sequential time T_S is more problematic:
  a. It can be T_P(1), but this carries the overhead of the extra code needed for parallelization (even with one thread, OpenMP code will call threading libraries), so it is one way to "cheat" on benchmarking.
  b. It should be the best possible sequential implementation: tuned, with good or best compiler switches, etc.
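As an arithmetic sketch (the function and the sample times are mine, not from the slides), speedup from measured wall-clock times is just the ratio T_S / T_P(p):

```python
def speedup(t_seq, t_par):
    """S(p) = T_S / T_P(p) from measured wall-clock times.

    Per point (b) above, t_seq should come from the best available
    sequential implementation, not from T_P(1)."""
    return t_seq / t_par

# Hypothetical measurements: a tuned sequential run vs. an 8-thread run.
t_seq = 10.0   # seconds, best sequential version
t_par = 2.0    # seconds, parallel version on p = 8 processors
print(speedup(t_seq, t_par))   # → 5.0
```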
What is execution time?
Execution time can be modeled as the sum of:
1. Inherently sequential computation σ(n)
2. Potentially parallel computation φ(n)
3. Communication time κ(n,p)
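A minimal sketch of this model (the function name and example numbers are assumptions for illustration):

```python
def exec_time(sigma, phi, kappa, p):
    """Model T(n, p) = sigma(n) + phi(n)/p + kappa(n, p): the
    inherently sequential part is paid in full, the parallel part
    is divided among p processors, and communication cost is added."""
    return sigma + phi / p + kappa

# sigma = 1, phi = 1000, no communication, p = 10 processors:
print(exec_time(1.0, 1000.0, 0.0, 10))   # → 101.0
```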
Components of execution time: sequential time
[Plot: execution time vs. number of processors]
Components of execution time: parallel time
[Plot: execution time vs. number of processors]
Components of execution time: communication time and other parallel overheads
[Plot: execution time vs. number of processors, with κ(p) ∝ p log₂ p]
Components of execution time
[Plot: execution time vs. number of processors, annotated with the speedup = 1 level and the maximum speedup]
At some point the decrease in the execution time of the parallel part is smaller than the increase in communication costs, leading to the knee in the curve.
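The knee can be reproduced numerically from the execution-time model. All parameter values below (σ = 1, φ = 1000, κ(p) = 0.05 · p · log₂ p) are made up for illustration:

```python
import math

def total_time(p, sigma=1.0, phi=1000.0, c=0.05):
    """T(p) = sigma + phi/p + kappa(p), with kappa(p) = c * p * log2(p).
    The parameter values are illustrative, not from the slides."""
    return sigma + phi / p + c * p * math.log2(p)

times = {p: total_time(p) for p in (1, 2, 4, 8, 16, 32, 64, 128, 256)}
best = min(times, key=times.get)
# Past the optimal p, the growing communication term dominates the
# shrinking phi/p term, so total time rises again: this is the knee.
print(best, round(times[best], 2))
```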
Speedup as a function of these components
- Sequential time: T_S = σ(n) + φ(n), i.e. (i) the sequential computation plus (ii) the parallel computation
- Parallel time: T_P(p) = σ(n) + φ(n)/p + κ(n,p), i.e. (i) the sequential computation, plus (ii) the parallel computation divided by the number of processors, plus (iii) the communication cost
- Speedup: S(n,p) = T_S / T_P(p) = (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p))
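This ratio of sequential to modeled parallel time can be sketched in code (function name and sample values are mine):

```python
def model_speedup(sigma, phi, kappa, p):
    """S(n, p) = T_S / T_P(p)
               = (sigma + phi) / (sigma + phi/p + kappa)."""
    return (sigma + phi) / (sigma + phi / p + kappa)

# With no sequential part and no communication, speedup is ideal:
print(model_speedup(0.0, 1000.0, 0.0, 8))        # → 8.0
# Any sequential fraction or communication cost pulls it below p:
print(model_speedup(1.0, 1000.0, 2.0, 8) < 8)    # → True
```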
Efficiency
- Intuitively, efficiency is how effectively the machines are being used by the parallel computation: ε(n,p) = S(n,p) / p = (σ(n) + φ(n)) / (pσ(n) + φ(n) + pκ(n,p))
- If the number of processors is doubled, the parallel execution time must be halved for the efficiency to stay the same.
- 0 < ε(n,p) ≤ 1: all terms are positive, so ε(n,p) > 0; and for p ≥ 1 the numerator is no larger than the denominator, so ε(n,p) ≤ 1.
Efficiency by amount of work
[Plot: efficiency (0 to 1.00) vs. number of processors (1 to 128), one curve per workload φ = 1000, 10000, 100000, with σ = 1, κ = 1 when p = 1, and κ increasing by log₂ p]
- φ: amount of computation that can be done in parallel
- κ: communication overhead
- σ: sequential computation
Tuesday, February 14, 12
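The figure's curves can be approximated with a short script. Taking κ = 1 + log₂ p is one plausible reading of "κ = 1 when p = 1, κ increases by log₂ P", so treat these numbers as illustrative only:

```python
import math

def efficiency(phi, p, sigma=1.0):
    """epsilon(n, p) = (sigma + phi) / (p*sigma + phi + p*kappa(p)),
    with kappa(p) = 1 + log2(p): an assumed reading of the slide's
    "kappa = 1 when p = 1, kappa increases by log2 P"."""
    kappa = 1.0 + math.log2(p)
    return (sigma + phi) / (p * sigma + phi + p * kappa)

# Larger workloads stay efficient at higher processor counts:
for phi in (1_000, 10_000, 100_000):
    print(phi, round(efficiency(phi, 128), 3))
```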
Amdahl’s Law
- Developed by Gene Amdahl
- Basic idea: the parallel performance of a program is limited by the sequential portion of the program
- An argument for fewer, faster processors
- Can be used to model performance on various sizes of machines, and to derive other useful relations
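In the standard formulation (not spelled out on these preview pages), if f is the inherently sequential fraction of the work, the speedup on p processors is bounded as sketched here:

```python
def amdahl_speedup(f, p):
    """Amdahl's Law: S(p) <= 1 / (f + (1 - f)/p), where f is the
    fraction of the program that is inherently sequential."""
    return 1.0 / (f + (1.0 - f) / p)

# Even 5% sequential code caps speedup below 20, no matter how large p:
print(round(amdahl_speedup(0.05, 1024), 2))   # → 19.64
print(round(1 / 0.05, 2))                     # the p → infinity limit: 20.0
```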