Lecture 13 Notes



… its own queue. If its own queue is empty, it randomly chooses another thread and tries to steal a task from that thread's queue. Because each thread first tries to take a new task from its own queue, there is no single hot spot, and the performance degradation is eliminated.

CUDA supports atomic operations on shared memory as well as on global memory. Since shared memory is local to each core on the GPU, we can implement local distributed queues using atomic operations on shared memory. Figure 6 illustrates this. Instead of passing the shared queue head in global memory (q.head in Figure 2) to atomicAdd, we pass the local queue head defined in shared memory (localQ_head in Figure 6).

    __shared__ int localQ[MAX_SIZE];
    __shared__ int localQ_head;
    __shared__ int globalQ_index;

    // Put vertexID into the local queue
    int index = atomicAdd(&localQ_head, 1);
    localQ[index] = vertexID;

    // The master thread in a thread block obtains the index of the global
    // queue where the local queue contents need to be copied
    if (threadIdx.x == 0)
        globalQ_index = atomicAdd(q.head, localQ_head);
    __syncthreads();

    // Copy the local queue contents to the global queue
    ...
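To show how the fragments above fit together, here is a sketch of a complete kernel using this pattern. The struct GlobalQueue, the placeholder vertexID computation, and the kernel name expandFrontier are assumptions added for illustration; only localQ, localQ_head, globalQ_index, and q.head come from the lecture's fragments.

    // Sketch: each thread block collects newly discovered vertices in a
    // block-local queue in shared memory, then one thread performs a single
    // global atomicAdd to reserve space for the whole block's output.
    #define MAX_SIZE 1024

    struct GlobalQueue {
        int *elems;   // global-memory array backing the queue
        int *head;    // global queue head counter
    };

    __global__ void expandFrontier(GlobalQueue q)
    {
        __shared__ int localQ[MAX_SIZE];
        __shared__ int localQ_head;
        __shared__ int globalQ_index;

        // One thread resets the local queue head for this block.
        if (threadIdx.x == 0)
            localQ_head = 0;
        __syncthreads();

        // Each thread appends its discovered vertex locally; this
        // atomicAdd operates on shared memory, so it stays on-chip.
        int vertexID = blockIdx.x * blockDim.x + threadIdx.x; // placeholder
        int index = atomicAdd(&localQ_head, 1);
        localQ[index] = vertexID;
        __syncthreads();

        // The master thread reserves a range of slots in the global queue
        // with a single global atomic for the entire block.
        if (threadIdx.x == 0)
            globalQ_index = atomicAdd(q.head, localQ_head);
        __syncthreads();

        // All threads cooperatively copy the local queue to the global queue.
        for (int i = threadIdx.x; i < localQ_head; i += blockDim.x)
            q.elems[globalQ_index + i] = localQ[i];
    }

This reduces contention on q.head from one global atomic per vertex to one per thread block, which is exactly why the local distributed queues avoid the hot spot described above.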

This document was uploaded on 03/17/2014 for the course CS 4800 at Northeastern.
