Carnegie Mellon Parallel Computing Notes on Lecture 4

        ...
        int myMax = myMin + (n / NUM_PROCESSORS);
        while (!done) {
            diff = 0.f;
            barrier(myBarrier, NUM_PROCESSORS);
            for (j = myMin to myMax) {              // each thread computes rows it is responsible for
                for (i = red cells in this row) {
                    float prev = A[i,j];
                    A[i,j] = 0.2f * (A[i-1,j] + A[i,j-1] + A[i,j] + A[i+1,j] + A[i,j+1]);
                    lock(myLock);
                    diff += abs(A[i,j] - prev);
                    unlock(myLock);
                }
            }
            barrier(myBarrier, NUM_PROCESSORS);
            if (diff/(n*n) < TOLERANCE)             // check convergence, all threads get same answer
                done = true;
            barrier(myBarrier, NUM_PROCESSORS);
        }
    }

Example from: Culler, Singh, and Gupta
CMU 15-418, Spring 2014

Review: need for mutual exclusion
▪ Each thread executes:
  - Load the value of diff into register r1
  - Add the register r2 to register r1
  - Store the value of register r1 into diff
▪ One possible interleaving (let the starting value of diff = 0, r2 = 1):

    T0                T1
    r1 ← diff                           T0 reads value 0
                      r1 ← diff         T1 reads value 0
    r1 ← r1 + r2                        T0 sets value of its r1 to 1
                      r1 ← r1 + r2      T1 sets value of its r1 to 1
    diff ← r1                           T0 stores 1 to diff
                      diff ← r1         T1 stores 1 to diff

▪ Both threads added 1, yet diff ends at 1 instead of 2 because T1's store overwrites T0's. The set of three instructions must execute atomically.

Mechanisms for atomicity
▪ Lock/unlock a mutex variable around the critical section:
      LOCK(mylock);
      // critical section
      UNLOCK(mylock);
▪ Some languages have first-class support for atomic blocks:
      atomic {
          // critical section
      }
▪ Intrinsics for hardware-supported atomic read-modify-write operations:
      atomicAdd(x, 10);
▪ Access to the critical section is serialized across all threads
  - High contention will cause performance problems (recall Amdahl’s Law)
  - Note that partial accumulation into a private myDiff reduces contention

Shared address space solver (SPMD execution model)

    int n;                  // grid size
    bool done = false;
    float diff = 0.0;
    LOCK myLock;
    BARRIER myBarrier;

    // allocate grid
    float **A = allocate(n+2, n+2);

    void solve(float** A) {
        float myDiff;
        int threadId = getThreadId();
        int myMin = 1 + (threadId * n / NUM_PROCESSORS);
        int myMax = myMin + (n / NUM_PROCESSORS);
        while (!done) {
            myDiff = diff = 0.f;
            barrier(myBarrier, NUM_PROCESSORS);
            for (j = myMin to myMax) {
                for (i = red cells i...
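The lost update from "Review: need for mutual exclusion" and two of the "Mechanisms for atomicity" can be reproduced in C. This is a minimal sketch assuming POSIX threads and C11 <stdatomic.h>; ITERS, MAX_THREADS and all function names are hypothetical. replayLostUpdate steps through the six instructions in the interleaved order sequentially, so its outcome is deterministic, while sumWithLock and sumWithAtomic have several real threads each add 1 to a shared counter ITERS times. The standard C11 atomic_fetch_add plays the role of the slide's atomicAdd intrinsic; it is an equivalent, not the same API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum { ITERS = 100000, MAX_THREADS = 16 };   /* hypothetical workload size */

/* Deterministic replay of the lost-update interleaving: each thread's
   "diff += r2" is three separate steps, executed T0, T1, T0, T1, T0, T1. */
int replayLostUpdate(void) {
    int diff = 0;                 /* shared value, starts at 0 */
    const int r2 = 1;             /* both threads add 1 */
    int r1T0, r1T1;               /* each thread's private register r1 */
    r1T0 = diff;                  /* T0: r1 <- diff   (reads 0) */
    r1T1 = diff;                  /* T1: r1 <- diff   (reads 0) */
    r1T0 = r1T0 + r2;             /* T0: r1 <- r1 + r2 */
    r1T1 = r1T1 + r2;             /* T1: r1 <- r1 + r2 */
    diff = r1T0;                  /* T0: diff <- r1   (stores 1) */
    diff = r1T1;                  /* T1: diff <- r1   (stores 1, overwriting T0) */
    return diff;                  /* 1, not the expected 2 */
}

static long lockedSum;
static pthread_mutex_t sumLock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long atomicSum;     /* static zero-init is a valid atomic state */

/* Mechanism 1: LOCK/UNLOCK around the critical section. */
static void *addWithLock(void *arg) {
    (void)arg;
    for (int k = 0; k < ITERS; k++) {
        pthread_mutex_lock(&sumLock);
        lockedSum += 1;           /* load, add, store now execute as one unit */
        pthread_mutex_unlock(&sumLock);
    }
    return NULL;
}

/* Mechanism 2: hardware-supported atomic read-modify-write. */
static void *addWithAtomic(void *arg) {
    (void)arg;
    for (int k = 0; k < ITERS; k++)
        atomic_fetch_add(&atomicSum, 1);
    return NULL;
}

/* Start nthreads copies of body and wait for all of them. */
static void runThreads(void *(*body)(void *), int nthreads) {
    pthread_t tids[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, body, NULL);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);   /* join also makes the threads' writes visible */
}

long sumWithLock(int nthreads) {
    lockedSum = 0;
    runThreads(addWithLock, nthreads);
    return lockedSum;
}

long sumWithAtomic(int nthreads) {
    atomic_store(&atomicSum, 0);
    runThreads(addWithAtomic, nthreads);
    return atomic_load(&atomicSum);
}
```

With 4 threads, sumWithLock(4) and sumWithAtomic(4) both return 400000, while replayLostUpdate() returns 1. Deleting the lock/unlock pair from addWithLock reintroduces the race, and the final count can then fall short. Both correct versions still serialize every increment on the same variable, so the slide's point about contention applies to each of them.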
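The second solver version, with partial accumulation into a private myDiff, can be turned into a runnable C sketch using POSIX threads. Everything beyond the slide is an assumption: the hypothetical values N = 16, NUM_PROCESSORS = 4 and TOLERANCE = 1e-5, a boundary held at 1.0 with the interior starting at 0.0, the hypothetical helpers sweep, absf and runSolver, and three departures from the pseudocode. It runs the black half-sweep after a barrier (the preview shows only the red cells), it lets only thread 0 reset diff, and it replaces the shared done flag with a per-thread converged flag so that no two threads write one variable concurrently. As on the slide, each thread accumulates privately and takes myLock once per iteration instead of once per cell.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes pthread_barrier_t in strict modes */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grid size, thread count and tolerance. */
enum { N = 16, NUM_PROCESSORS = 4 };
#define TOLERANCE 1e-5f
_Static_assert(N % NUM_PROCESSORS == 0, "rows must divide evenly among threads");

static float A[N + 2][N + 2];   /* A[row][col]; rows/cols 0 and N+1 are boundary */
static float diff;              /* shared: total |change| in the current iteration */
static int iterations;          /* written only by thread 0 */
static pthread_mutex_t myLock = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t myBarrier;

static float absf(float x) { return x < 0.0f ? -x : x; }

/* Update the cells of one color (0 = red, 1 = black) in rows [myMin, myMax),
   summing the absolute changes into a private accumulator. */
static float sweep(int myMin, int myMax, int color) {
    float myDiff = 0.0f;
    for (int j = myMin; j < myMax; j++) {
        for (int i = 1; i <= N; i++) {
            if ((i + j) % 2 != color) continue;
            float prev = A[j][i];
            A[j][i] = 0.2f * (A[j][i - 1] + A[j - 1][i] + A[j][i] +
                              A[j][i + 1] + A[j + 1][i]);
            myDiff += absf(A[j][i] - prev);
        }
    }
    return myDiff;
}

static void *solve(void *arg) {
    int threadId = (int)(intptr_t)arg;
    assert(threadId >= 0 && threadId < NUM_PROCESSORS);
    int myMin = 1 + threadId * (N / NUM_PROCESSORS);
    int myMax = myMin + N / NUM_PROCESSORS;         /* exclusive bound */
    for (;;) {
        /* Safe without a barrier: other threads add to diff only after the next one. */
        if (threadId == 0) { diff = 0.0f; iterations++; }
        float myDiff = sweep(myMin, myMax, 0);       /* red half-sweep */
        pthread_barrier_wait(&myBarrier);            /* red done before black reads it */
        myDiff += sweep(myMin, myMax, 1);            /* black half-sweep */
        pthread_mutex_lock(&myLock);                 /* one locked add per thread */
        diff += myDiff;
        pthread_mutex_unlock(&myLock);
        pthread_barrier_wait(&myBarrier);            /* every contribution is in */
        bool converged = diff / (N * N) < TOLERANCE; /* identical in every thread */
        pthread_barrier_wait(&myBarrier);            /* all reads precede the reset */
        if (converged) return NULL;
    }
}

/* Boundary held at 1.0, interior starts at 0.0; returns the iteration count. */
int runSolver(void) {
    for (int j = 0; j < N + 2; j++)
        for (int i = 0; i < N + 2; i++)
            A[j][i] = (j == 0 || j == N + 1 || i == 0 || i == N + 1) ? 1.0f : 0.0f;
    iterations = 0;
    pthread_t tids[NUM_PROCESSORS];
    pthread_barrier_init(&myBarrier, NULL, NUM_PROCESSORS);
    for (int t = 0; t < NUM_PROCESSORS; t++)
        pthread_create(&tids[t], NULL, solve, (void *)(intptr_t)t);
    for (int t = 0; t < NUM_PROCESSORS; t++)
        pthread_join(tids[t], NULL);
    pthread_barrier_destroy(&myBarrier);
    return iterations;
}
```

Every thread reads diff between the second and third barriers, and nothing writes it again until thread 0 resets it in the next iteration. As a result, all threads compute the same converged value and leave the loop together, so none is left waiting at a barrier. Red-black ordering also makes the grid values after each iteration independent of how rows are split across threads; only the floating-point summation order of diff varies.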

This document was uploaded on 03/19/2014 for the course CMP 15-418 at Carnegie Mellon.
