Carnegie Mellon Parallel Computing Notes on Lecture 4 (CMU 15-418, Spring 2014)

Thread placement strategies:
- Place related threads (cooperating threads) on the same processor (maximize locality and data sharing; minimize the costs of communication/synchronization)
- Place unrelated threads on the same processor (one might be bandwidth-limited and another compute-limited) to use the machine more efficiently

Decomposing computation or data?

Often, the reason a problem requires lots of computation (and needs to be parallelized) is that it involves manipulating a lot of data. I've described the process of parallelizing programs as an act of partitioning computation. Often, it's equally valid to think of partitioning data (the computations go with the data). But there are many computations where the correspondence between work-to-do ("tasks") and data is less clear. In these cases it's natural to think of partitioning computation.

A parallel programming example: a 2D-grid based solver

▪ Solve a partial differential equation on an (N+2) x (N+2) grid
▪ Iterative solution
- Perform Gauss-Seidel sweeps over the grid until convergence, using the update:
  A[i,j] = 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] + A[i,j+1] + A[i+1,j]);

Example from: Culler, Singh, and Gupta

Grid solver algorithm

The slide provides generic pseudocode for the sequential algorithm (a reconstruction appears in the first sketch below).

Step 1: identify dependencies (problem decomposition phase)

Per the update formula, each cell depends on the already-updated cell to its left (A[i,j-1]) and the already-updated cell above it (A[i-1,j]). There is independent work along the anti-diagonals!

Good: parallelism exists!

Possible strategy (see the second sketch below):
1. Partition the grid cells on each diagonal into tasks
2. Update the values on a diagonal in parallel
3. When complete, move to the next diagonal

Bad: hard to exploit
- Early in the computation there is not much parallelism
- Frequent synchronization (once per diagonal)

Key idea: change the algorithm to one that is more easily parallelized

▪ Change the order in which grid cells are updated
▪ The new algorithm iterates to (approximately) the same solution, but converges to it differently
- Note: the floating-point values computed are different, but the solution still converges to within the error threshold
▪ Domain knowledge: knowledge of Gauss-Seidel iteration was needed to realize this change is okay for the application's needs

Exploit application knowledge

Reorder the grid traversal: red-black coloring (see the final sketch below)
- Update all red cells in parallel
- When done updating red cells, update all black cells in parallel (respecting the dependency on red cells)
- Repeat until convergence
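First sketch: the sequential pseudocode referenced on the "Grid solver algorithm" slide is not included in this text dump, so here is a minimal C reconstruction built from the update formula above and the Culler, Singh, and Gupta example. The names A and N come from the slide; TOLERANCE and the average-change convergence test are assumptions.

```c
#include <math.h>

#define TOLERANCE 0.001f  /* assumed convergence threshold */

/* Sequential Gauss-Seidel sweeps over an (N+2) x (N+2) grid.
   A is stored row-major: A[i*(N+2)+j]; rows/columns 0 and N+1 are
   fixed boundary cells, so only interior cells 1..N are updated. */
void solve(float *A, int N) {
    int done = 0;
    while (!done) {
        float diff = 0.f;  /* total change accumulated this sweep */
        for (int i = 1; i <= N; i++) {
            for (int j = 1; j <= N; j++) {
                float prev = A[i*(N+2)+j];
                /* update formula from the slide */
                A[i*(N+2)+j] = 0.2f * (A[i*(N+2)+j] + A[i*(N+2)+j-1] +
                                       A[(i-1)*(N+2)+j] + A[i*(N+2)+j+1] +
                                       A[(i+1)*(N+2)+j]);
                diff += fabsf(A[i*(N+2)+j] - prev);
            }
        }
        /* converged when the average per-cell change is small */
        if (diff / (N * N) < TOLERANCE)
            done = 1;
    }
}
```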
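Second sketch: the diagonal strategy that the slide describes and then rejects. Cells on the same anti-diagonal (i + j == d) have no dependencies on each other, so the inner loop could run in parallel; the outer loop over diagonals is the serial sequence of synchronization points the slide criticizes. This loop structure is my own illustration, not code from the lecture.

```c
/* One sweep using the (rejected) wavefront ordering: for each
   anti-diagonal d, all cells with i + j == d are independent and
   could be updated in parallel; a barrier is needed per diagonal. */
void sweep_diagonals(float *A, int N) {
    for (int d = 2; d <= 2 * N; d++) {          /* serial over diagonals */
        int lo = (d - N > 1) ? d - N : 1;       /* clamp i to 1..N */
        int hi = (d - 1 < N) ? d - 1 : N;       /* and keep j = d-i in 1..N */
        for (int i = lo; i <= hi; i++) {        /* independent iterations */
            int j = d - i;
            A[i*(N+2)+j] = 0.2f * (A[i*(N+2)+j] + A[i*(N+2)+j-1] +
                                   A[(i-1)*(N+2)+j] + A[i*(N+2)+j+1] +
                                   A[(i+1)*(N+2)+j]);
        }
    }
}
```

Note how the sketch makes the slide's objections concrete: the first and last diagonals contain only one cell (little parallelism early), and every iteration of the outer loop is a synchronization point (2N-1 of them per sweep).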
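Final sketch: the red-black reordering. Coloring cells like a checkerboard (here red means i + j is even, an assumed convention) means every red cell's neighbors are black and vice versa, so each half-sweep is fully parallel. The OpenMP pragma is purely illustrative; the lecture does not prescribe a threading mechanism, and the tolerance parameter is an assumption as above.

```c
#include <math.h>

/* Red-black Gauss-Seidel: update all red cells in parallel, then all
   black cells in parallel (black cells read the just-updated reds). */
void solve_red_black(float *A, int N, float tolerance) {
    int done = 0;
    while (!done) {
        float diff = 0.f;
        for (int color = 0; color < 2; color++) {   /* 0 = red, 1 = black */
            #pragma omp parallel for reduction(+:diff)
            for (int i = 1; i <= N; i++) {
                /* first column of this color in row i (checkerboard) */
                int j0 = 1 + ((i + 1 + color) & 1);
                for (int j = j0; j <= N; j += 2) {
                    float prev = A[i*(N+2)+j];
                    A[i*(N+2)+j] = 0.2f * (A[i*(N+2)+j] + A[i*(N+2)+j-1] +
                                           A[(i-1)*(N+2)+j] + A[i*(N+2)+j+1] +
                                           A[(i+1)*(N+2)+j]);
                    diff += fabsf(A[i*(N+2)+j] - prev);
                }
            }
            /* implicit barrier at the end of the parallel for: all red
               cells are written before any black cell reads them */
        }
        done = (diff / (N * N) < tolerance);        /* repeat until converged */
    }
}
```

As the slide notes, this computes different floating-point values than the row-major sweep, but it still converges to within the error threshold, and it replaces 2N-1 synchronizations per sweep with just two.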