 Place related threads (cooperating threads) on the same processor
(maximize locality, data sharing, minimize costs of comm/sync)
 Place unrelated threads on the same processor (one might be bandwidth limited
and another might be compute limited) to use machine more efficiently

CMU 15-418, Spring 2014

Decomposing computation or data?

Often, the reason a problem requires lots of computation (and needs to be parallelized) is
that it involves manipulating a lot of data.
I’ve described the process of parallelizing programs as an act of partitioning computation.
Often, it’s equally valid to think of partitioning data (computations go with the data).
But there are many computations where the correspondence between work to do (“tasks”)
and data is less clear. In these cases it’s natural to think of partitioning computation.

A parallel programming example

A 2D grid-based solver
▪ Solve partial differential equation on (N+2) x (N+2) grid
▪ Iterative solution
 Perform Gauss-Seidel sweeps over grid until convergence
A[i,j] = 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] + A[i,j+1] + A[i+1,j]);

Example from: Culler, Singh, and Gupta

Grid solver algorithm
(generic pseudocode for sequential algorithm is provided below)

Step 1: identify dependencies
(problem decomposition phase)
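The sequential algorithm referenced above might look like the following minimal C sketch. The grid size N, the tolerance TOL, and the average-difference convergence test are illustrative assumptions, not taken from the slides:

```c
// Sketch of the sequential Gauss-Seidel grid solver (assumed structure;
// N, TOL, and the convergence test are illustrative)
#include <math.h>

#define N 6            // interior grid dimension (illustrative)
#define TOL 1e-3f      // convergence threshold (illustrative)

// A is an (N+2) x (N+2) grid; row/column 0 and N+1 hold boundary values
// that are read but never written.
void solve(float A[N + 2][N + 2]) {
    float diff;
    do {
        diff = 0.0f;
        for (int i = 1; i <= N; i++) {
            for (int j = 1; j <= N; j++) {
                float prev = A[i][j];
                // Gauss-Seidel: left and upper neighbors were already
                // updated this sweep; right and lower still hold old values
                A[i][j] = 0.2f * (A[i][j] + A[i][j - 1] + A[i - 1][j] +
                                  A[i][j + 1] + A[i + 1][j]);
                diff += fabsf(A[i][j] - prev);   // accumulate change
            }
        }
    } while (diff / (N * N) > TOL);  // sweep until average change is small
}
```

Note the in-place update: each cell reads a mix of new and old neighbor values, which is exactly what creates the dependencies identified next.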
Each row element depends on element to left.
Each column depends on previous column.

Step 1: identify dependencies
(problem decomposition phase)
There is independent work along the diagonals!
Good: parallelism exists!

Possible strategy:
1. Partition grid cells on a diagonal into tasks
2. Update values in parallel
3. When complete, move to next diagonal
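The three steps above can be sketched in C as follows; the OpenMP pragma and the way anti-diagonals are enumerated are illustrative assumptions:

```c
// Sketch of the diagonal ("wavefront") strategy: cells on one
// anti-diagonal are independent, so each diagonal is updated in
// parallel, with a barrier between diagonals. (OpenMP is illustrative.)
#define N 6   // interior grid dimension (illustrative)

void sweep_diagonals(float A[N + 2][N + 2]) {
    // anti-diagonal d holds the interior cells with i + j == d + 2
    for (int d = 0; d <= 2 * N - 2; d++) {
        int i_lo = (d < N) ? 1 : d - N + 2;
        int i_hi = (d < N) ? d + 1 : N;
        #pragma omp parallel for   // cells on one diagonal are independent
        for (int i = i_lo; i <= i_hi; i++) {
            int j = d + 2 - i;
            A[i][j] = 0.2f * (A[i][j] + A[i][j - 1] + A[i - 1][j] +
                              A[i][j + 1] + A[i + 1][j]);
        }
        // implicit barrier after every diagonal -- frequent
        // synchronization, and early diagonals hold only a few cells
    }
}
```

Because each cell still reads updated left/upper neighbors and old right/lower neighbors, one wavefront sweep computes exactly the same values as one row-major Gauss-Seidel sweep.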
Bad: hard to exploit
Early in computation: not much parallelism
Frequent synchronization (each diagonal)

Key idea: change algorithm to one that is more easily parallelized
▪ Change the order grid cells are updated
▪ New algorithm iterates to (approximately) same solution, but
converges to solution differently
 Note: floating-point values computed are different, but solution still converges
to within error threshold
▪ Domain knowledge: needed knowledge of Gauss-Seidel
iteration to realize this change is okay for application’s needs

Exploit application knowledge
Reorder grid traversal: red-black coloring

Update all red cells in parallel
When done updating red cells,
update all black cells in parallel
(respect dependency on red cells)

Repeat until convergence
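A C sketch of one red-black sweep, under the same illustrative assumptions as before (OpenMP pragma, choice of which parity is "red"):

```c
// Sketch of the red-black ordering: checkerboard-color the grid so that
// cells with (i + j) even ("red") have only (i + j)-odd ("black")
// neighbors, and vice versa. All cells of one color can then be updated
// in parallel; one barrier per color replaces one barrier per diagonal.
#define N 6   // interior grid dimension (illustrative)

void sweep_redblack(float A[N + 2][N + 2]) {
    for (int color = 0; color < 2; color++) {   // 0 = red, 1 = black
        #pragma omp parallel for
        for (int i = 1; i <= N; i++) {
            // first column in row i whose cell has the current color
            int j0 = 1 + ((i + color + 1) % 2);
            for (int j = j0; j <= N; j += 2) {
                A[i][j] = 0.2f * (A[i][j] + A[i][j - 1] + A[i - 1][j] +
                                  A[i][j + 1] + A[i + 1][j]);
            }
        }
        // barrier between colors: black cells read updated red values,
        // respecting the dependency noted above
    }
}
```

Only two synchronizations per sweep, and each phase contains N*N/2 independent updates, which is why this ordering is so much easier to exploit than the wavefront.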
This document was uploaded on 03/19/2014 for the course CMP 15418 at Carnegie Mellon.
Spring '14
Kayvon Fatahalian