Carnegie Mellon Parallel Computing Notes on Lecture 6

CMU 15-418 Spring 2014: Finally... let's finish up the solver

...THREADS_PER_BLK>>>(N, devInput, devOutput);

Some developers write CUDA code that makes assumptions about the number of cores in the underlying GPU's implementation:
▪ The programmer launches exactly as many thread blocks as will fill the GPU
  (exploiting knowledge of the implementation: that the GPU will in fact run all of the blocks concurrently)
▪ Work assignment to blocks is implemented entirely by the application
  (this circumvents the GPU thread block scheduler, and the intended CUDA thread block semantics)
▪ The programmer's mental model is now that *all* threads are running concurrently on the machine at once.
A sketch of this pattern appears at the end of these notes.

Finally... let's finish up the solver discussion from lecture 4...

Recall the basic 2D grid solver
Solver example: described in terms of the data-parallel and SPMD programming models
On an N x N grid, each cell is updated from its own value and its four neighbors:
  A[i,j] = 0.2 * (A[i,j] + A[i,j-1] + A[i-1,j] + A[i,j+1] + A[i+1,j]);

Review: data-parallel solver implementation
▪ Synchronization:
  - forall loop iterations are independent (can be parallelized)
  - Implicit barrier at the end of the outer forall loop body
▪ Communication:
  - Implicit in loads and stores (like a shared address space)
  - Special built-in primitives: e.g., reduce

int n;               // grid size
bool done = false;
float diff = 0.0;
// allocate grid, use block decomposition across processors
f...
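The declarations above are cut off in this copy. To make the structure concrete, here is a minimal sketch (not the lecture's code) of the same data-parallel solver written in C++ with OpenMP standing in for forall and reduce. The double-buffered (Jacobi-style) update, the TOLERANCE value, and the solve/idx names are assumptions introduced here so that the loop iterations really are independent.

    #include <cmath>
    #include <vector>

    const float TOLERANCE = 1e-3f;   // convergence threshold (assumed value)

    // A is an (n+2) x (n+2) grid stored row-major, with a one-cell boundary.
    void solve(int n, std::vector<float>& A, std::vector<float>& Anew) {
        auto idx = [n](int i, int j) { return i * (n + 2) + j; };
        bool done = false;
        while (!done) {
            float diff = 0.0f;
            // "forall" over interior cells: iterations are independent, so the
            // two loops can be parallelized; diff is accumulated with a reduction.
            #pragma omp parallel for reduction(+:diff) collapse(2)
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= n; j++) {
                    Anew[idx(i,j)] = 0.2f * (A[idx(i,j)] + A[idx(i,j-1)] + A[idx(i-1,j)]
                                                         + A[idx(i,j+1)] + A[idx(i+1,j)]);
                    diff += std::fabs(Anew[idx(i,j)] - A[idx(i,j)]);
                }
            }
            // implicit barrier here: all updates are finished before the convergence check
            A.swap(Anew);
            if (diff / (n * n) < TOLERANCE)
                done = true;
        }
    }

The reduction(+:diff) clause plays the role of the built-in reduce primitive, and the end of the parallel for loop is the implicit barrier noted above.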
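Returning to the earlier point about launching exactly as many thread blocks as will fill the GPU: below is an illustrative sketch (not code from the lecture) of that "persistent blocks" style in CUDA. Each block loops, atomically claiming chunks of the input from a global counter, so work assignment is done by the application instead of the hardware block scheduler. The kernel name processArray, the workCounter variable, and the one-resident-block-per-SM launch size are all assumptions made for this example.

    #include <cuda_runtime.h>

    #define THREADS_PER_BLK 128

    // Each "persistent" block repeatedly claims the next chunk of elements from a
    // global work counter, rather than letting the scheduler map one block per chunk.
    __global__ void processArray(int N, const float* input, float* output, int* workCounter) {
        while (true) {
            __shared__ int chunkStart;
            if (threadIdx.x == 0)
                chunkStart = atomicAdd(workCounter, THREADS_PER_BLK);  // claim a chunk
            __syncthreads();
            if (chunkStart >= N)
                break;                               // no work left: the block retires
            int i = chunkStart + threadIdx.x;
            if (i < N)
                output[i] = 2.0f * input[i];         // placeholder per-element work
            __syncthreads();                         // finish with chunkStart before it is reused
        }
    }

    int main() {
        const int N = 1 << 20;
        float *devInput, *devOutput;
        int *workCounter;
        cudaMalloc(&devInput, N * sizeof(float));
        cudaMalloc(&devOutput, N * sizeof(float));
        cudaMalloc(&workCounter, sizeof(int));
        cudaMemset(devInput, 0, N * sizeof(float));
        cudaMemset(workCounter, 0, sizeof(int));

        // Launch only as many blocks as the chip runs concurrently: the count comes
        // from the device query, not from the problem size N.
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        int numBlocks = prop.multiProcessorCount;    // assumes one resident block per SM

        processArray<<<numBlocks, THREADS_PER_BLK>>>(N, devInput, devOutput, workCounter);
        cudaDeviceSynchronize();

        cudaFree(devInput);
        cudaFree(devOutput);
        cudaFree(workCounter);
        return 0;
    }

This only matches the "all threads are running at once" mental model if numBlocks really does fit on the chip; code that additionally synchronized across blocks (for example, spinning on flags written by other blocks) would break if the blocks were not all resident, which is the sense in which this style circumvents CUDA's intended thread block semantics.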

