Carnegie Mellon Parallel Computing (15-418, Spring 2014), Notes on Lecture 4

Computation performance differences under different assignments


End of the shared address space grid solver loop (each thread accumulates a partial sum locally into myDiff before updating the global diff; rotating among three diff accumulators lets the loop get by with a single barrier per iteration):

    // ... compute partial sum locally into myDiff
    lock(myLock);
    diff[index] += myDiff;            // atomically update global diff
    unlock(myLock);
    diff[(index+1) % 3] = 0.0f;
    barrier(myBarrier, NUM_PROCESSORS);
    if (diff[index]/(n*n) < TOLERANCE)
      break;
    index = (index + 1) % 3;
  }
}

More on specifying dependencies

▪ Barriers: simple, but conservative (coarse granularity)
  - All work up to this point (across all threads) must finish before any thread begins the next phase
▪ Specifying specific dependencies can increase performance (by revealing more parallelism)
  - Example: two threads; one produces a result, the other consumes it

    T0:
    // produce x, then let T1 know
    x = 1;
    flag = 1;

    T1:
    // do stuff independent of x here
    while (flag == 0);
    print x;

▪ We just implemented a message queue (of length 1)
  (a runnable C version of this flag-based handshake is sketched at the end of these notes)

Solver implementation in two programming models

▪ Data-parallel programming model
  - Synchronization:
    - Single logical thread of control, but iterations of the forall loop can be parallelized (implicit barrier at the end of the outer forall loop body)
  - Communication:
    - Implicit in loads and stores (like shared address space)
    - Special built-in primitives: e.g., reduce
▪ Shared address space
  - Synchronization:
    - Mutual exclusion required for shared variables
    - Barriers used to express dependencies (between phases of computation)
  - Communication:
    - Implicit in loads/stores to shared variables

Summary

▪ Amdahl's Law
  - Overall speedup is limited by the amount of serial execution in the code
▪ Steps in creating a parallel program
  - Decomposition, assignment, orchestration, mapping
  - We'll talk a lot about making good decisions in each of these phases in coming lectures (in practice, they are very inter-related)
▪ Focus today: identifying dependencies
▪ Focus soon: identifying locality

Message passing model

▪ No shared address space abstraction (i.e., no shared variables)
▪ Each thread has its own address space
▪ Threads communicate and synchronize by sending and receiving messages
  - One possible message passing implementation: a cluster of workstations (recall lecture 3)

[Figure: two workstations, each with a processor, local cache, and memory, connected by a network]

Recall: assignment in a shared address space

▪ Grid data resided in a single array in the shared address space (the array was accessible to all threads)
▪ Assignment partitioned the elements among processors to divide up the computation
  - Performance differences: different assignments may yield different amounts of ...
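A minimal runnable version of the flag-based producer/consumer handshake from the dependencies notes above, written with C11 threads and atomics. The value stored to x and the spin-wait come from the slide; the use of <threads.h>/<stdatomic.h> and the release/acquire ordering are my additions, since a plain non-atomic flag gives the compiler and hardware no obligation to preserve the order of the two writes.

    #include <stdatomic.h>
    #include <stdio.h>
    #include <threads.h>

    static int x = 0;
    static atomic_int flag = 0;

    /* T0: produce x, then let T1 know */
    static int producer(void *arg) {
        (void)arg;
        x = 1;
        atomic_store_explicit(&flag, 1, memory_order_release);
        return 0;
    }

    /* T1: could do work independent of x here, then wait for the flag */
    static int consumer(void *arg) {
        (void)arg;
        while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
            ;   /* spin until the producer signals */
        printf("%d\n", x);
        return 0;
    }

    int main(void) {
        thrd_t t0, t1;
        thrd_create(&t0, producer, NULL);
        thrd_create(&t1, consumer, NULL);
        thrd_join(t0, NULL);
        thrd_join(t1, NULL);
        return 0;
    }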
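The data-parallel bullets above (a forall loop with an implicit barrier plus a built-in reduce) can be approximated with an OpenMP parallel loop and a reduction clause. This is a sketch under my own assumptions rather than the lecture's notation: it performs one Jacobi-style sweep over the interior of an n x n grid, writing into a separate Anew buffer instead of updating in place as the lecture's solver does.

    #include <math.h>

    /* One sweep of the grid solver expressed as a data-parallel loop.
     * reduction(+:diff) stands in for the lecture's reduce primitive, and the
     * implicit barrier at the end of the parallel for corresponds to the
     * implicit barrier at the end of the outer forall loop body. */
    float sweep(const float *A, float *Anew, int n) {
        float diff = 0.0f;
        #pragma omp parallel for reduction(+:diff) collapse(2)
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < n - 1; j++) {
                Anew[i*n + j] = 0.2f * (A[i*n + j]     + A[i*n + (j-1)] +
                                        A[i*n + (j+1)] + A[(i-1)*n + j] +
                                        A[(i+1)*n + j]);
                diff += fabsf(Anew[i*n + j] - A[i*n + j]);
            }
        }   /* implicit barrier: every iteration has finished before diff is read */
        return diff;
    }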
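The Amdahl's Law bullet in the summary above can be stated quantitatively. If a fraction s of the program is serial and the remaining 1 - s parallelizes perfectly over P processors, then

    speedup(P) <= 1 / (s + (1 - s) / P)

For example, with s = 0.05 (5% serial work), eight processors give at most 1 / (0.05 + 0.95/8) ≈ 5.9x, and no number of processors can push the speedup past 1/s = 20x.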
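In the message passing model above, the implicit communication of the shared address space solver becomes explicit sends and receives of boundary ("ghost") rows between neighboring ranks. The sketch below uses MPI as one concrete realization; the lecture describes the abstract model rather than MPI, and the row-block layout, the two extra ghost rows per rank, and all names here are my assumptions.

    #include <mpi.h>

    /* Each rank owns rowsPerRank contiguous rows of the n-wide grid, stored in
     * localRows with one ghost row above (row 0) and one below (row rowsPerRank+1).
     * Before each sweep, a rank swaps its edge rows with its neighbors. */
    void exchange_ghost_rows(float *localRows, int rowsPerRank, int n,
                             int rank, int nprocs) {
        float *firstOwned = &localRows[1 * n];
        float *lastOwned  = &localRows[rowsPerRank * n];
        float *topGhost   = &localRows[0 * n];
        float *botGhost   = &localRows[(rowsPerRank + 1) * n];

        if (rank > 0)              /* exchange with the rank holding the rows above */
            MPI_Sendrecv(firstOwned, n, MPI_FLOAT, rank - 1, 0,
                         topGhost,   n, MPI_FLOAT, rank - 1, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rank < nprocs - 1)     /* exchange with the rank holding the rows below */
            MPI_Sendrecv(lastOwned, n, MPI_FLOAT, rank + 1, 0,
                         botGhost,  n, MPI_FLOAT, rank + 1, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }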