cs240a-pprefix

cs240a-pprefix - CS 240A: Parallel Prefix Algorithms or...


CS 240A: Parallel Prefix Algorithms or Tricks with Trees
Some slides from Jim Demmel, Kathy Yelick, Alan Edelman, and a cast of thousands

Parallel Vector Operations
  Vector add: z = x + y
    Embarrassingly parallel if vectors are aligned
  DAXPY: z = a*x + y (a is scalar)
    Broadcast a, followed by independent * and +
  DDOT: $s = x^T y = \sum_j x[j] * y[j]$
    Independent * followed by + reduction

Broadcast and reduction
  Broadcast of 1 value to p processors with log p span
  Reduction of p values to 1 with log p span
  Takes advantage of associativity in +, *, min, max, etc.
  [Figure: broadcast of a to all processors; add-reduction of 1 3 1 0 4 -6 3 2 to 8]

A theoretical secret for turning serial into parallel
  Surprising parallel algorithms: if "there is no way to parallelize this algorithm!" it's probably a variation on parallel prefix!

Parallel Prefix Algorithms

Example of a prefix
  Sum Prefix
  Input x = (x1, x2, ..., xn)
  Output y = (y1, y2, ..., yn)
  $y_i = \sum_{j=1}^{i} x_j$
  Example
    x = ( 1, 2, 3, 4, 5, 6, 7, 8 )
    y = ( 1, 3, 6, 10, 15, 21, 28, 36 )
  Prefix functions: outputs depend upon an initial string

What do you think?
  Can we really parallelize this?
  It looks like this kind of code:
    y(0) = 0;
    for i = 1:n
      y(i) = y(i-1) + x(i);
  The ith iteration of the loop depends completely on the (i-1)st iteration.
  Work = n, span = n, parallelism = 1.

A clue?
  x = ( 1, 2, 3, 4, 5, 6, 7, 8 )
  y = ( 1, 3, 6, 10, 15, 21, 28, 36 )
  Is there any value in adding, say, 4+5+6+7?
  If we separately have 1+2+3, what can we do?
  Suppose we added 1+2, 3+4, etc. pairwise -- what could we do?

  Input: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  Pairwise sums: 3 7 11 15 19 23 27 31
  (Recursively compute prefix sums): 3 10 21 36 55 78 105 136
  Result: 1 3 6 10 15 21 28 36 45 55 66 78 91 105 120 136

Prefix sum in parallel
  Algorithm:
    1. Pairwise sum
    2. Recursive prefix
    3. Pairwise sum
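The three steps above (pairwise sum, recursive prefix, pairwise update) can be sketched as follows. This is a minimal serial illustration rather than the slides' own code: the function name `prefix_scan` and the handling of odd-length inputs are assumptions made for the example, and the list comprehension and loop marked below are the parts whose iterations are independent and could run in parallel.

```python
from operator import add


def prefix_scan(x, op=add):
    """Inclusive prefix scan of x under an associative operator op.

    Hypothetical helper for illustration; it runs serially, but steps 1
    and 3 consist of independent operations.
    """
    n = len(x)
    if n <= 1:
        return list(x)

    # Step 1: pairwise sums of neighbours (independent of each other).
    pairs = [op(x[i], x[i + 1]) for i in range(0, n - 1, 2)]

    # Step 2: recursive prefix on the half-length list.
    sub = prefix_scan(pairs, op)

    # Step 3: sub[k] is the prefix ending at 0-based index 2k+1.
    # The remaining positions each need one more op (also independent).
    y = [None] * n
    y[0] = x[0]
    for k, s in enumerate(sub):
        y[2 * k + 1] = s
        if 2 * k + 2 < n:  # assumption: a trailing odd element is allowed
            y[2 * k + 2] = op(s, x[2 * k + 2])
    return y


print(prefix_scan([1, 2, 3, 4, 5, 6, 7, 8]))
# → [1, 3, 6, 10, 15, 21, 28, 36]
```

Because only associativity is used, the same sketch works for other operators such as `max` or `min`, for example `prefix_scan([3, 1, 4, 1, 5], max)`.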
Parallel prefix cost
  What's the total work?
  Input: 1 2 3 4 5 6 7 8
  Pairwise sums: 3 7 11 15
  Recursive prefix: 3 10 21 36
  Update odds: 1 3 6 10 15 21 28 36
  $T_1(n) = n/2 + n/2 + T_1(n/2) = n + T_1(n/2) = 2n - 1$
  Parallelism at the cost of more work!...
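The closed form for the work can be checked by unrolling the recurrence. This is a sketch under two assumptions not stated on the slide: n is a power of two, and the base case is taken as $T_1(1) = 1$.

```latex
\begin{align*}
T_1(n) &= n + T_1(n/2) \\
       &= n + \frac{n}{2} + \frac{n}{4} + \cdots + 2 + T_1(1) \\
       &= (2n - 2) + 1 \\
       &= 2n - 1 .
\end{align*}
```

Compared with work n for the serial loop, the recursive scheme does roughly twice as much work in exchange for exposing parallelism.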

This note was uploaded on 12/27/2011 for the course CMPSC 240A taught by Professor Gilbert during the Fall '09 term at UCSB.
