lec7 - CS575 Parallel Processing, Lecture 7: Dense Matrix Algorithms

Lecture 7: Dense Matrix Algorithms - Linear Equations
Wim Bohm, Colorado State University
CS575 Parallel Processing
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.
CS575 lecture 7, slide 2: Mapping an n x n matrix to p PEs

Striped: allocate rows (or columns) to PEs.

Block striped: consecutive rows go to one PE, e.g.:
  Row: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  PE#: 0 0 0 0 1 1 1 1 2 2 2  2  3  3  3  3

Cyclic striped: rows are interleaved onto PEs:
  Row: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  PE#: 0 1 2 3 0 1 2 3 0 1 2  3  0  1  2  3

Hybrid (block-cyclic):
  Row: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  PE#: 0 0 1 1 2 2 3 3 0 0 1  1  2  2  3  3

Finest granularity: one row (or column) per PE (p = n).
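As an illustration of these stripings, here is a small C sketch (not from the slides) that computes which PE owns a given row under each scheme; it assumes p divides n, and the function names and the block size b for the hybrid case are illustrative:

  /* Row-to-PE mappings for the striped schemes above; assumes p divides n. */
  #include <stdio.h>

  int block_striped(int row, int n, int p)  { return row / (n / p); }  /* consecutive rows per PE   */
  int cyclic_striped(int row, int n, int p) { return row % p; }        /* rows interleaved over PEs */
  int hybrid(int row, int n, int p, int b)  { return (row / b) % p; }  /* blocks of b rows, dealt cyclically */

  int main(void) {
      int n = 16, p = 4, b = 2;   /* matches the 16-row, 4-PE example above */
      printf("Row: block cyclic hybrid\n");
      for (int row = 0; row < n; row++)
          printf("%3d: %5d %6d %6d\n", row,
                 block_striped(row, n, p), cyclic_striped(row, n, p), hybrid(row, n, p, b));
      return 0;
  }

Running it reproduces the three PE# rows shown in the tables above.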
CS575 lecture 7, slide 3: Mapping an n x n matrix to p PEs (cont.)

Checkerboard: map n/sqrt(p) x n/sqrt(p) blocks onto PEs.
  Maps well onto a 2D mesh.
  Finest granularity: one element per PE (p = n*n).
Many matrix algorithms allow a block formulation, e.g. matrix add and matrix multiply.
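A minimal C sketch of the checkerboard idea for matrix add, assuming p is a perfect square and sqrt(p) divides n; the function name and the mesh coordinates (pr, pc) are illustrative, not from the slides. Each PE updates only its own n/sqrt(p) x n/sqrt(p) block, so the add needs no communication:

  /* Checkerboard (2D block) mapping: the PE at mesh position (pr, pc) owns
     rows [pr*nb, (pr+1)*nb) and columns [pc*nb, (pc+1)*nb), nb = n/sqrt(p). */
  #include <math.h>

  void local_block_add(const double *A, const double *B, double *C,
                       int n, int p, int pr, int pc) {
      int q  = (int)sqrt((double)p);   /* PEs per mesh dimension */
      int nb = n / q;                  /* block size per PE      */
      for (int i = pr * nb; i < (pr + 1) * nb; i++)
          for (int j = pc * nb; j < (pc + 1) * nb; j++)
              C[i * n + j] = A[i * n + j] + B[i * n + j];   /* purely local work */
  }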
CS575 lecture 7, slide 4: Matrix Transpose

Sequential algorithm:
  for i = 0 to n-1
    for j = i+1 to n-1
      swap(A, i, j)    (swap A[i][j] with A[j][i])

Striped mapping: (almost) all-to-all personalized communication.
Checkerboard (p = n*n):
  An upper-triangle element travels down to the diagonal, then left.
  A lower-triangle element travels up to the diagonal, then right.
Checkerboard (p < n*n):
  Do the above communication, but with blocks: 2*sqrt(p) * (n*n)/p traffic.
  Transpose the blocks at the destination: O(n*n/p) swaps.
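The sequential swap loop above, written out in C for a row-major n x n matrix stored in a flat array (the storage layout is an assumption of this sketch):

  /* Sequential in-place transpose: swap every upper-triangle element
     A[i][j] (j > i) with its mirror A[j][i]. */
  void transpose(double *A, int n) {
      for (int i = 0; i < n; i++)
          for (int j = i + 1; j < n; j++) {
              double tmp   = A[i * n + j];   /* swap(A, i, j) */
              A[i * n + j] = A[j * n + i];
              A[j * n + i] = tmp;
          }
  }

Under the striped or checkerboard mappings, these same swaps become the communications described above whenever A[i][j] and A[j][i] live on different PEs.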
CS575 lecture 7, slide 5: Recursive Transpose on a Hypercube

View the matrix as a 2 x 2 block matrix.
View the hypercube as four sub-cubes of p/4 processors each.
Exchange the upper-right and lower-left blocks; on a hypercube this exchange goes via one intermediate node.
Recursively transpose the blocks.
The first (log p)/2 transposes require communication.
Example: n = 16, p = 16: the first two transposes involve communication; in each of them, pairs of PEs exchange their blocks via one intermediate node.
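A sequential C sketch of the recursive 2 x 2 block transpose, assuming n is a power of two; the function names are illustrative. In the parallel hypercube version the block exchange below becomes a message between sub-cubes routed via one intermediate node, whereas this sketch simply swaps the blocks in memory:

  /* Swap the size x size blocks whose top-left corners are (r1,c1) and (r2,c2). */
  void swap_blocks(double *A, int n, int r1, int c1, int r2, int c2, int size) {
      for (int i = 0; i < size; i++)
          for (int j = 0; j < size; j++) {
              double tmp                 = A[(r1 + i) * n + (c1 + j)];
              A[(r1 + i) * n + (c1 + j)] = A[(r2 + i) * n + (c2 + j)];
              A[(r2 + i) * n + (c2 + j)] = tmp;
          }
  }

  /* Recursively transpose the size x size block with top-left corner (row, col). */
  void rec_transpose(double *A, int n, int row, int col, int size) {
      if (size == 1) return;              /* a 1 x 1 block is its own transpose */
      int half = size / 2;
      /* exchange the upper-right and lower-left blocks */
      swap_blocks(A, n, row, col + half, row + half, col, half);
      /* recursively transpose each of the four blocks */
      rec_transpose(A, n, row,        col,        half);
      rec_transpose(A, n, row,        col + half, half);
      rec_transpose(A, n, row + half, col,        half);
      rec_transpose(A, n, row + half, col + half, half);
  }

Calling rec_transpose(A, n, 0, 0, n) transposes the whole matrix; each level of the recursion corresponds to one round of block exchanges between sub-cubes.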