Carnegie Mellon Parallel Computing Notes on Lecture 4

Shared address space solver: one barrier


                ... in this row) {
                    float prev = A[i,j];
                    A[i,j] = 0.2f * (A[i-1,j] + A[i,j-1] + A[i,j] + A[i+1,j] + A[i,j+1]);
                    myDiff += abs(A[i,j] - prev);   // accumulate partial sum locally
                }
            }
            // Now only one lock per thread, not one per (i,j) loop iteration!
            lock(myLock);
            diff += myDiff;
            unlock(myLock);
            barrier(myBarrier, NUM_PROCESSORS);
            if (diff/(n*n) < TOLERANCE)   // check convergence, all threads get same answer
                done = true;
            barrier(myBarrier, NUM_PROCESSORS);
        }
    }

CMU 15-418, Spring 2014
Example from: Culler, Singh, and Gupta

Shared address space solver (SPMD execution model)

    int n;                // grid size
    bool done = false;
    float diff = 0.0;
    LOCK myLock;
    BARRIER myBarrier;

    // allocate grid
    float **A = allocate(n+2, n+2);

    void solve(float** A) {
        float myDiff;
        int threadId = getThreadId();
        int myMin = 1 + (threadId * n / NUM_PROCESSORS);
        int myMax = myMin + (n / NUM_PROCESSORS);

        while (!done) {
            myDiff = diff = 0.f;
            barrier(myBarrier, NUM_PROCESSORS);
            for (j = myMin to myMax) {
                for (i = red cells in this row) {
                    float prev = A[i,j];
                    A[i,j] = 0.2f * (A[i-1,j] + A[i,j-1] + A[i,j] + A[i+1,j] + A[i,j+1]);
                    myDiff += abs(A[i,j] - prev);
                }
            }
            lock(myLock);
            diff += myDiff;
            unlock(myLock);
            barrier(myBarrier, NUM_PROCESSORS);
            if (diff/(n*n) < TOLERANCE)   // check convergence, all threads get same answer
                done = true;
            barrier(myBarrier, NUM_PROCESSORS);
        }
    }

Why are there so many barriers?

Shared address space solver: one barrier

Idea: Remove dependencies by using different diff variables in successive loop iterations.
Trade off footprint for reduced synchronization! (common parallel programming technique)

    int n;                // grid size
    bool done = false;
    LOCK myLock;
    BARRIER myBarrier;
    float diff[3];        // global diff, but now 3 copies

    float **A = allocate(n+2, n+2);

    void solve(float** A) {
        float myDiff;     // thread local variable
        int index = 0;    // thread local variable

        diff[0] = 0.0f;
        barrier(myBarrier, NUM_PROCESSORS);   // one-time only: just for init

        while (!done) {
            myDiff = 0.0f;
            //
            // perform computation (accumulate l...
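The one-barrier slide is cut off in this preview, so what follows is only a sketch of how the three-copy idea can be completed under stated assumptions, not the lecture's code. It assumes POSIX threads, with pthread_mutex_t and pthread_barrier_t standing in for LOCK and BARRIER. It also swaps the slide's red-cell Gauss-Seidel update for a double-buffered Jacobi sweep with the same 0.2f weights, because a full red-black sweep would need its own barrier between the red and black half-sweeps. The names solve, worker, Solver, WorkerArg, AT and MAX_THREADS, the row split 1 + t*n/P, and the test problem (boundary fixed at 1.0f, interior starting at 0.0f) are all hypothetical. Each thread still takes the lock only once per iteration. Iteration k accumulates into diff[k % 3] and clears diff[(k+1) % 3] before the single barrier, so a fast thread that starts iteration k+1 never touches the copy that slower threads are still reading for the convergence test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* exposes pthread_barrier_t in strict ISO C modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* (n+2) x (n+2) row-major grid; rows/cols 0 and n+1 hold the fixed boundary. */
#define AT(g, n, i, j) ((g)[(size_t)(i) * (size_t)((n) + 2) + (size_t)(j)])
#define MAX_THREADS 64

typedef struct {
    int n, nthreads, max_iters;
    float tol;
    float *buf[2];      /* Jacobi double buffer: iteration k reads buf[k%2], writes buf[(k+1)%2] */
    float diff[3];      /* global diff, but now 3 copies rotated by iteration */
    pthread_mutex_t lock;
    pthread_barrier_t barrier;
    int iters;          /* iterations executed; written only by thread 0 */
} Solver;

typedef struct { Solver *s; int id; } WorkerArg;

static void *worker(void *p) {
    WorkerArg *a = p;
    Solver *s = a->s;
    int n = s->n;
    int myMin = 1 + a->id * n / s->nthreads;       /* first interior row owned */
    int myMax = 1 + (a->id + 1) * n / s->nthreads; /* one past the last row owned */
    int index = 0;                                 /* diff copy used by this iteration */

    for (int k = 0; k < s->max_iters; k++) {
        const float *cur = s->buf[k % 2];
        float *nxt = s->buf[(k + 1) % 2];
        float myDiff = 0.0f;                       /* thread-local partial sum */
        for (int i = myMin; i < myMax; i++) {
            for (int j = 1; j <= n; j++) {
                float prev = AT(cur, n, i, j);
                float v = 0.2f * (AT(cur, n, i - 1, j) + AT(cur, n, i, j - 1) + prev +
                                  AT(cur, n, i + 1, j) + AT(cur, n, i, j + 1));
                AT(nxt, n, i, j) = v;
                myDiff += (v > prev) ? v - prev : prev - v;
            }
        }
        pthread_mutex_lock(&s->lock);
        s->diff[index] += myDiff;        /* one lock acquisition per thread per iteration */
        s->diff[(index + 1) % 3] = 0.0f; /* clear the copy the next iteration accumulates into */
        pthread_mutex_unlock(&s->lock);

        pthread_barrier_wait(&s->barrier); /* the only barrier inside the loop */

        /* diff[index] is next written in iteration k+2, which no thread can reach until
           every thread has arrived at iteration k+1's barrier, i.e. finished this read. */
        int converged = s->diff[index] / (float)(n * n) < s->tol;
        if (a->id == 0)
            s->iters = k + 1;
        if (converged)
            break;
        index = (index + 1) % 3;
    }
    return NULL;
}

/* Hypothetical test problem: boundary fixed at 1.0f, interior starts at 0.0f.
   Copies the final grid into out ((n+2)*(n+2) floats); returns iterations executed. */
int solve(int n, int nthreads, float tol, int max_iters, float *out) {
    assert(n >= 1 && nthreads >= 1 && nthreads <= MAX_THREADS);
    size_t cells = (size_t)(n + 2) * (size_t)(n + 2);
    Solver s = {.n = n, .nthreads = nthreads, .max_iters = max_iters, .tol = tol};
    for (int b = 0; b < 2; b++) {
        s.buf[b] = malloc(cells * sizeof(float));
        assert(s.buf[b] != NULL);
        for (int i = 0; i < n + 2; i++)
            for (int j = 0; j < n + 2; j++)
                AT(s.buf[b], n, i, j) = (i == 0 || j == 0 || i == n + 1 || j == n + 1) ? 1.0f : 0.0f;
    }
    s.diff[0] = 0.0f; /* set before threads start, so no separate init barrier is needed */
    pthread_mutex_init(&s.lock, NULL);
    pthread_barrier_init(&s.barrier, NULL, (unsigned)nthreads);

    pthread_t tids[MAX_THREADS];
    WorkerArg args[MAX_THREADS];
    for (int t = 0; t < nthreads; t++) {
        args[t] = (WorkerArg){&s, t};
        pthread_create(&tids[t], NULL, worker, &args[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);

    memcpy(out, s.buf[s.iters % 2], cells * sizeof(float)); /* buffer written last */
    pthread_barrier_destroy(&s.barrier);
    pthread_mutex_destroy(&s.lock);
    free(s.buf[0]);
    free(s.buf[1]);
    return s.iters;
}
```

For example, solve(16, 4, 1e-5f, 10000, grid) with float grid[18 * 18] runs four threads over a 16 x 16 interior and returns the iteration at which the mean per-cell change first drops below 1e-5f (build with -pthread on older toolchains). Three copies are enough: diff[k % 3] is not cleared again until iteration k+2, and no thread can get there until every thread has finished its convergence read in iteration k. Because every thread reads the same finished value, all threads leave the loop in the same iteration, so none is left waiting at a barrier.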

This document was uploaded on 03/19/2014 for the course CMP 15-418 at Carnegie Mellon.
