Parallel Prefix Algorithms, or Tricks with Trees
Some slides from Jim Demmel, Kathy Yelick, Alan Edelman, and a cast of thousands

Parallel Vector Operations
Vector add: z = x + y
    Embarrassingly parallel if vectors are aligned
DAXPY: z = a*x + y (a is scalar)
    Broadcast a, followed by independent * and +
DDOT: $s = x^T y = \sum_j x[j] \cdot y[j]$
    Independent * followed by + reduction

Broadcast and reduction
Broadcast of 1 value to p processors in log p time
Reduction of p values to 1 in log p time
Takes advantage of associativity in +, *, min, max, etc.
[Figure: tree diagrams labeled "Broadcast" (of the value a) and "Add-reduction"]

A theoretical secret for turning serial into parallel
Surprising parallel algorithms: if "there is no way to parallelize this algorithm!", it's probably a variation on parallel prefix!

Parallel Prefix Algorithms
Example of a prefix: Sum Prefix
Input x = (x1, x2, ..., xn)
Output y = (y1, y2, ..., yn)
$y_i = \sum_{j=1}^{i} x_j$
Example
    x = ( 1, 2, 3, 4, 5, 6, 7, 8 )
    y = ( 1, 3, 6, 10, 15, 21, 28, 36 )
Prefix functions: outputs depend upon an initial string

What do you think? Can we really parallelize this?
It looks like this kind of code:
    y(0) = 0;
    for i = 1:n
        y(i) = y(i-1) + x(i);
The ith iteration of the loop depends completely on the (i-1)st iteration.
Work = n, span = n, parallelism = 1.
Impossible to parallelize, right?

A clue?
    x = ( 1, 2, 3, 4, 5, 6, 7, 8 )
    y = ( 1, 3, 6, 10, 15, 21, 28, 36 )
Is there any value in adding, say, 4+5+6+7?
If we separately have 1+2+3, what can we do?
Suppose we added 1+2, 3+4, etc. pairwise: what could we do?

    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    3  7 11 15 19 23 27 31
    (Recursively compute prefix sums)
    3 10 21 36 55 78 105 136
    1  3  6 10 15 21 28 36 45 55 66 78 91 105 120 136

Prefix sum in parallel
Algorithm:
    1. Pairwise sum
    2. Recursive prefix
    3. Pairwise sum

What's the total work?
Parallel prefix cost
                      1  2  3  4  5  6  7  8
    Pairwise sums     3  7 11 15
    Recursive prefix  3 10 21 36
    Update odds       1  3  6 10 15 21 28 36

$T_1(n) = n/2 + n/2 + T_1(n/2) = n + T_1(n/2) = 2n - 1$

Parallelism at the cost of more work!...
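
The three steps on the slides (pairwise sums, recursive prefix, update odds) can be sketched in Python as follows. This is a sequential simulation written for illustration, not code from the slides: the function name `prefix_sum` and the returned addition counter are assumptions introduced here. The pairwise-sum comprehension and the update loop each stand for one step whose iterations are independent and could run in parallel.

```python
def prefix_sum(x):
    """Recursive prefix sum; returns (prefix sums, number of additions).

    Hypothetical helper for illustration. It runs sequentially, but the
    iterations inside step 1 and inside step 3 do not depend on each other.
    """
    n = len(x)
    if n <= 1:
        return list(x), 0

    # Step 1: pairwise sums x[0]+x[1], x[2]+x[3], ...
    # (a trailing unpaired element is handled in step 3).
    pairs = [x[i] + x[i + 1] for i in range(0, n - 1, 2)]
    adds = len(pairs)

    # Step 2: recursive prefix on the half-length list of pairwise sums.
    sums, sub_adds = prefix_sum(pairs)
    adds += sub_adds

    # Step 3: fill in the outputs. Every second output is already in
    # `sums`; each remaining one needs a single extra addition.
    y = [x[0]]
    for i in range(1, n):
        if i % 2 == 1:
            y.append(sums[i // 2])
        else:
            y.append(sums[i // 2 - 1] + x[i])
            adds += 1
    return y, adds


if __name__ == "__main__":
    y, adds = prefix_sum([1, 2, 3, 4, 5, 6, 7, 8])
    print(y)     # prefix sums of 1..8
    print(adds)  # additions performed by this sketch
```

The counter tallies only the additions this sketch actually performs, so it stays at or below the slide's bound of 2n - 1, which charges a full n/2 for the update step at every level. It is still more than the n - 1 additions of the serial loop, which is the extra work paid for the parallelism.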
This note was uploaded on 12/27/2011 for the course CMPSC 240A taught by Professor Gilbert during the Fall '09 term at UCSB.
