Carnegie Mellon Parallel Computing Notes on Lecture 6 (15-418, Spring 2014)

CUDA threads can atomically update shared variables in global memory

(Fragment from the preceding slide, cut off by the preview: "...builtin operations like swizzle and vote")

Implications of CUDA global memory atomics

▪ Last class I pointed out that GPUs schedule CUDA thread blocks in any order.
▪ CUDA threads can atomically update shared variables in global memory
  - Example: build a histogram of the values in an array
▪ Observe how this use of atomics does not impact the implementation's ability to schedule blocks in any order (I'm using atomics for mutual exclusion, and nothing more)

    block 0:   atomicAdd(&bins[A[idx]], 1);
    ...
    block N:   atomicAdd(&bins[A[idx]], 1);

    Global memory: int bins[10]

Implications of CUDA atomics

▪ But what about this?
▪ Consider a single-core GPU with execution resources for one block per core
  - What are the possible outcomes of different schedules?

    block 0:   // do stuff
               atomicAdd(&myFlag, 1);
    ...
    block N:   while (atomicAdd(&myFlag, 0) == 0) {}
               // do stuff

    Global memory: int myFlag   (assume initialized to 0)

"Persistent thread" technique for CUDA programming

    #define THREADS_PER_BLK 128
    #define BLOCKS_PER_CHIP 15 * 12   // specific to a certain GTX 480 GPU

    __device__ ...   (the preview is truncated here)
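
To make the histogram slide concrete, here is a minimal sketch of a full kernel. The slide shows only the atomicAdd line; the kernel name, the bounds check, and the launch below are assumptions for illustration.

    #define THREADS_PER_BLK 128

    // Each thread reads one element of A and atomically increments the
    // bin it selects. atomicAdd serializes concurrent updates to the same
    // bin, so the counts are correct regardless of the order in which the
    // hardware schedules the thread blocks.
    __global__ void histogram_kernel(const int* A, int* bins, int N) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < N)                      // guard the ragged last block
            atomicAdd(&bins[A[idx]], 1);  // assumes A[idx] is in [0, 10)
    }

The host would zero bins first (e.g., with cudaMemset) and launch one thread per element: histogram_kernel<<<(N + THREADS_PER_BLK - 1) / THREADS_PER_BLK, THREADS_PER_BLK>>>(A, bins, N);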
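For the flag slide, the schedule is exactly what matters. Here is a sketch of the two fragments reconstructed as a single kernel; the kernel name and the blockIdx-based role split are assumptions, not from the slide.

    __device__ int myFlag = 0;   // "assume initialized to 0", per the slide

    // Block 0 does its work and sets the flag; every other block spins,
    // reading the flag atomically (atomicAdd of 0 returns the current
    // value) until it sees the flag become nonzero.
    __global__ void flag_example() {
        if (blockIdx.x == 0) {
            // do stuff
            atomicAdd(&myFlag, 1);
        } else {
            while (atomicAdd(&myFlag, 0) == 0) {}
            // do stuff
        }
    }

If block 0 happens to run first, the flag gets set and later blocks proceed. But on the slide's single-core GPU with resources for only one block, if a spinning block is scheduled first it occupies the core forever, block 0 never runs, and the program deadlocks. Atomics give mutual exclusion, but CUDA makes no guarantee that any two blocks execute concurrently, so a dependency from one block to another is unsafe.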
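The preview is truncated in the middle of the persistent-thread slide, right after "__device__". A common shape for this technique is sketched below; the workCounter name, the kernel signature, and the per-item work are all assumptions filling in the cut-off slide, not the lecture's actual code.

    #define THREADS_PER_BLK 128
    #define BLOCKS_PER_CHIP 15 * 12   // specific to a certain GTX 480 GPU

    __device__ int workCounter = 0;   // assumed name; slide is cut off here

    // Persistent-thread sketch: launch exactly as many blocks as the chip
    // can run at once (BLOCKS_PER_CHIP), so every block is resident and
    // none waits on a block the scheduler has not started. Each thread
    // loops, claiming the next work item with atomicAdd, instead of
    // mapping one thread to one item.
    __global__ void persistent_kernel(const float* input, float* output,
                                      int N) {
        while (true) {
            int i = atomicAdd(&workCounter, 1);  // claim the next item
            if (i >= N)
                return;                          // all work claimed: retire
            output[i] = 2.0f * input[i];         // placeholder per-item work
        }
    }

This would be launched as persistent_kernel<<<BLOCKS_PER_CHIP, THREADS_PER_BLK>>>(input, output, N), with workCounter reset (e.g., via cudaMemcpyToSymbol) before each launch. Because no block ever waits on an unscheduled block, the spin-deadlock of the previous slide cannot occur.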