Carnegie Mellon Parallel Computing Notes on Lecture 6


The kernel below implements a 1D three-point convolution in a "persistent blocks" style: rather than launching one block per chunk of input, a fixed number of blocks run until the input is exhausted, with each block claiming its next chunk of work by atomically advancing a global counter. (The lecture slides use a pseudocode atomicInc; in real CUDA the fetch-and-add semantics intended here are provided by atomicAdd, used below. The cudaMalloc sizes are also corrected to byte counts.)

    #define THREADS_PER_BLK 128

    __device__ int workCounter = 0;  // global memory variable: next input index to process

    __global__ void convolve(int N, float* input, float* output) {
        __shared__ int startingIndex;
        __shared__ float support[THREADS_PER_BLK + 2];  // shared across block

        while (1) {
            if (threadIdx.x == 0)
                startingIndex = atomicAdd(&workCounter, THREADS_PER_BLK);  // claim next chunk
            __syncthreads();

            if (startingIndex >= N)
                break;

            int index = startingIndex + threadIdx.x;  // thread-local
            support[threadIdx.x] = input[index];
            if (threadIdx.x < 2)
                support[THREADS_PER_BLK + threadIdx.x] = input[index + THREADS_PER_BLK];
            __syncthreads();

            float result = 0.0f;  // thread-local variable
            for (int i = 0; i < 3; i++)
                result += support[threadIdx.x + i];
            output[index] = result;
            __syncthreads();
        }
    }

    // host code //////////////////////////////////////////////////////
    int N = 1024 * 1024;
    float *devInput, *devOutput;
    cudaMalloc(&devInput, sizeof(float) * (N + 2));  // allocate array in device memory
    cudaMalloc(&devOutput, sizeof(float) * N);       // allocate array in device memory
    // properly initialize contents of devInput here ...
    convolve<<<BLOCKS_PER_CHIP, THREADS_PER_BLK>>>(N, devInput, devOutput);
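The key idea in the kernel, workers dynamically claiming chunks of input via an atomic fetch-and-add on a shared counter, is not CUDA-specific. A minimal CPU sketch of the same work-assignment pattern, using std::atomic and std::thread (the function name convolve_cpu and the chunk/thread parameters are illustrative, not from the lecture):

    #include <atomic>
    #include <cassert>
    #include <thread>
    #include <vector>

    // CPU analogy of the kernel's dynamic work assignment: each worker thread
    // claims the next chunk-sized slice of the input with fetch_add, the CPU
    // counterpart of the kernel's atomicAdd on workCounter.
    std::vector<float> convolve_cpu(const std::vector<float>& input, int N,
                                    int chunk, int nthreads) {
        std::vector<float> output(N, 0.0f);
        std::atomic<int> workCounter{0};  // shared work counter

        auto worker = [&]() {
            while (true) {
                int start = workCounter.fetch_add(chunk);  // claim next chunk
                if (start >= N) break;
                for (int i = start; i < start + chunk && i < N; i++)
                    output[i] = input[i] + input[i + 1] + input[i + 2];  // 3-point sum
            }
        };

        std::vector<std::thread> pool;
        for (int t = 0; t < nthreads; t++) pool.emplace_back(worker);
        for (auto& th : pool) th.join();
        return output;
    }

    int main() {
        const int N = 1024;
        std::vector<float> input(N + 2, 1.0f);  // all ones -> every output is 3
        auto out = convolve_cpu(input, N, 64, 4);
        assert(out[0] == 3.0f && out[N - 1] == 3.0f);
        return 0;
    }

Because each chunk is written by exactly one worker, no locking is needed beyond the counter itself; this mirrors why the kernel only synchronizes within a block, never between blocks.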