Carnegie Mellon Parallel Computing: Notes on Lecture 4



…until convergence

CMU 15-418, Spring 2014

Assignment
Which is better? Does it matter?

Consider dependencies (data flow):
1. Perform red update in parallel (compute red cells)
2. Wait until all processors are done
3. Communicate updated red cells to other processors
4. Perform black update in parallel (compute black cells)
5. Wait until all processors are done
6. Communicate updated black cells to other processors
7. Repeat

Assignment
(Figure legend: shaded cells = data that must be sent to P2 each iteration.)
A blocked assignment requires less data to be communicated between processors.

Grid solver: data-parallel expression
To simplify pseudocode: just showing the red-cell update.

int n;                // grid size
bool done = false;
float diff = 0.0;
// allocate grid, use block decomposition across processors
float **A = allocate(n+2, n+2, BLOCK_Y, NUM_PROCESSORS);

void solve(float** A) {
  while (!done) {
    diff = 0.0f;      // reset accumulator each sweep
    for_all (red cells (i,j)) {
      float prev = A[i,j];
      A[i,j] = 0.2f * (A[i-1,j] + A[i,j-1] + A[i,j] +
                       A[i+1,j] + A[i,j+1]);
      reduceAdd(diff, abs(A[i,j] - prev));
    }
    if (diff/(n*n) < TOLERANCE)
      done = true;
  }
}

Decomposition: independent tasks are individual grid elements.
Assignment: specified explicitly (blocks of consecutive rows assigned to the same processor).
Orchestration: handled by the system. The end of the for_all block is an implicit wait for all workers before returning to sequential control; reduceAdd is a built-in communication primitive.

Example from: Culler, Singh, and Gupta

Shared address space solver (SPMD execution model)
- Programmer is responsible for synchronization
- Common synchronization primitives:
  - Locks (mutual exclusion): only one thread in the critical region at a time
  - Barriers: wait for all threads to reach this point

Barriers
- Barrier(nthreads)
- Barriers are a conservative way to express
dependencies
- Barriers divide computation into phases
- All computations by all threads before the barrier complete before any computation in any thread after the barrier begins

Shared address space solver (SPMD execution model)

int n;                // grid size
bool done = false;
float diff = 0.0;
LOCK myLock;
BARRIER myBarrier;

// allocate grid
float **A = allocate(n+2, n+2);

// Value of threadId is different for each SPMD instance:
// use the value to compute the region of the grid to work on
void solve(float** A) {
  float myDiff;
  int threadId = getThreadId();
  int myMin = 1 + (threadId * n / NUM_PROCESSORS);
  int myMax...

This document was uploaded on 03/19/2014 for the course CMP 15-418 at Carnegie Mellon.
