Special thanks to Chris Takahashi for preparing many of these slides and to Blaise Barney at Lawrence Livermore National Laboratory for the material borrowed for this lecture.

What is Parallel Computing?
Typical scenario: serial case
    Single problem
    Broken into smaller discrete steps called instructions
    Single central processing element
    One instruction is executed at a time, in sequence

Parallel case
    Single problem or group of problems
    Broken down into independent parts that can be solved concurrently
    Multiple processing elements
    Multiple independent instructions executed in parallel

Why Parallel Computing?
Performance increases are now coming from more processors rather than faster processors.
This trend is accelerating.
So now that we have more processors, what do we do?
    Run more programs.
    Have single programs do more than one thing at once.
Unfortunately, having single programs do more at the same time is very hard to do...

Why Parallel Computing?
Because the world is inherently parallel!
    To engineers and general scientists: many problems they encounter turn out to be parallel.
    To computer scientists: it's cool.

The World is Parallel
What kinds of problems is parallel computing used for?
    Bioscience, Biotechnology, Genetics
    Chemistry, Molecular Sciences
    Mechanical Engineering: from prosthetics to spacecraft
    Medical imaging and diagnosis, Pharmaceutical design
More examples
    Electrical Engineering, Circuit Design, Microelectronics
    Computer Science, Mathematics
    Financial and economic modeling
    Geology, Seismology, Oil exploration
    Physics: nuclear, particle, fusion

How does parallelism help us?
It saves time and money
It enables new technologies and solutions to larger problems
    Real-time ray tracing
    Complex models of large-scale systems
It enables the use of non-local resources
    Folding@home
    distributed.net

What computers are parallel?
Historically
    Mainframes
    High-end servers
    Supercomputers
Today
    Personal computers
        Core 2 Duo/Quad, Core i7, general-purpose GPUs
    Game consoles
        Xbox, PS3

Types of parallelism
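Instruction level
Single Instruction Multiple Data (SIMD)
Multiple Instructions Multiple Data (MIMD)

The Good News
Modern CPUs do some parallelism for you
Given the instructions
    e = a + b
    f = c + d
    g = e * f
executing them one at a time takes three steps:
    1. e = a + b
    2. f = c + d
    3. g = e * f
Because e and f do not depend on each other, the CPU can compute them in the same step, which takes two steps:
    1. e = a + b, f = c + d
    2. g = e * f

The Bad News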
Oftentimes you have to handle parallelism yourself... Consider the following:

    // pow() is used here with one argument, as on the slide; it is assumed to be defined elsewhere
    void apply(vector<int> & v) {
        for (int i = 0; i < v.size(); i++)
            v.at(i) = v.at(i) + pow(v.at(i));
    }

Exercise
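The apply() loop above reads and writes only element i in iteration i, so its iterations are independent of one another. That independence is what allows the loop to be converted into a parallel version. The following is a minimal sketch of one such conversion, not the lecture's own implementation. It assumes OpenMP is available (for example, the -fopenmp flag on GCC or Clang); without it the pragma is ignored and the loop runs serially with the same result. The names pow1, apply_range, apply_parallel, and num_workers are hypothetical, and pow1 (squaring) is only a stand-in for the one-argument pow() on the slide.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the slide's one-argument pow(); squaring is an assumption.
int pow1(int x) { return x * x; }

// Run the original loop body over the half-open index range [begin, end).
void apply_range(std::vector<int>& v, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; i++)
        v.at(i) = v.at(i) + pow1(v.at(i));
}

// Divide the vector into contiguous ranges, one per worker.
void apply_parallel(std::vector<int>& v, long num_workers) {
    if (num_workers < 1) num_workers = 1;
    const std::size_t n = v.size();
    const std::size_t chunk = (n + num_workers - 1) / num_workers;  // ceiling division
    #pragma omp parallel for
    for (long w = 0; w < num_workers; w++) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        apply_range(v, begin, end);  // ranges are disjoint, so no element is shared
    }
}
```

With more workers than elements, the extra workers receive empty ranges and do nothing.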
Consider the following:
    You have 20 processors
    It takes 2 seconds for a processor to add 2 numbers
    It takes 5 seconds for a processor to execute the pow() function
    There are 100 integers in the vector, v, seen below
Assuming you can convert a loop into a parallel version, approximately how long would the following code take to execute?

    void apply(vector<int> & v) {
        for (int i = 0; i < v.size(); i++)
            v.at(i) = v.at(i) + pow(v.at(i));
    }

    1) 7 seconds
    2) 35 seconds
    3) 100 seconds
    4) 700 seconds

Example: Calculating Pi
The value of PI can be calculated in a number of ways. Consider the following method of approximating PI:
1. Inscribe a circle in a square
2. Randomly generate points in the square
3. Determine the number of points in the square that are also in the circle
4. Let r be the number of points in the circle divided by the number of points in the square
5. PI ~ 4 r
Note that the more points generated, the better the approximation

Example: Calculating Pi
Pseudocode

    npoints = 10000
    count = 0
    for j = 1:npoints
        x = random # between 0 and 1
        y = random # between 0 and 1
        if (x, y) inside circle
            count = count + 1
    end
    PI = 4.0*count/npoints

Things to think about...
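The Pi pseudocode above maps onto C++ fairly directly, and since every point is sampled independently of the others, the sampling can also be divided among workers. The sketch below shows one way this might look; it is an illustration under stated assumptions rather than the lecture's implementation. The function names, the num_workers parameter, and the seed value 1234 are all hypothetical, and the parallel loop assumes OpenMP is enabled (for example, -fopenmp on GCC or Clang); otherwise the pragma is ignored and the partial counts are computed serially.

```cpp
#include <random>

// Count how many of npoints random points in the unit square land inside the
// inscribed circle (center (0.5, 0.5), radius 0.5).
long count_inside(long npoints, unsigned seed) {
    std::mt19937 gen(seed);  // each call owns its generator, so nothing is shared
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    long count = 0;
    for (long j = 0; j < npoints; j++) {
        const double dx = unit(gen) - 0.5;
        const double dy = unit(gen) - 0.5;
        if (dx * dx + dy * dy <= 0.25) count++;
    }
    return count;
}

// Give each worker an equal share of the points and add up the partial counts.
// Points left over when npoints is not divisible by num_workers are not sampled.
double estimate_pi(long npoints, int num_workers) {
    if (num_workers < 1) num_workers = 1;
    const long per_worker = npoints / num_workers;
    long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (int w = 0; w < num_workers; w++)
        total += count_inside(per_worker, 1234u + w);  // arbitrary, distinct seed per worker
    return 4.0 * total / (static_cast<double>(per_worker) * num_workers);
}
```

A call such as estimate_pi(400000, 4) samples 100000 points per worker. The sketch assumes npoints is at least num_workers, so that every worker has at least one point.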
There are lots of interesting things left to think about; we are just scratching the surface. For example:
1. What happens if each element doesn't "cost the same" to compute as each of the others?
2. Suppose we have more processors than elements. How small a range is "too small"? Is there even such a thing?
3. What would you do with this loop?

    void apply2(vector<int> & v) {
        for (int i = 1; i < v.size(); i++)
            v.at(i) = v.at(i - 1) + v.at(i);
    }

Conclusion
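Returning to question 3 above: one observation is that each iteration of apply2() reads the value the previous iteration just wrote, so the iterations cannot simply be handed to different processors the way the independent iterations of apply() can. The sketch below, offered as an illustration rather than as the intended answer, shows that the loop computes a running (prefix) sum, the same result std::partial_sum produces. The helper name running_sum is hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// The loop from question 3: iteration i depends on the result of iteration i - 1.
void apply2(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); i++)
        v.at(i) = v.at(i - 1) + v.at(i);
}

// The same computation expressed as a running (prefix) sum.
std::vector<int> running_sum(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}
```

Recognizing the loop as a prefix sum is useful because parallel algorithms for prefix sums exist, although they restructure the computation instead of splitting the loop into ranges.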
