Ch07-AdvCompArch-ManycoresAndGPUs-PaulKelly-V03

•  ...ally around the OpenGL/DirectX rendering pipeline, with separate vertex- and pixel-shader stages
•  "Unified" architecture arose from increased sophistication of shader programs
•  TESLA still has some features specific to graphics:
   –  Work distribution, load distribution
   –  Texture cache, pixel interpolation
   –  Z-buffering and alpha-blending (the ROP units, see diagram)

CUDA Execution Model

•  CUDA is a C extension
   –  Serial CPU code
   –  Parallel GPU code (kernels)
•  GPU kernel is a C function
   –  Each thread executes kernel code
   –  A group of threads forms a thread block (1D, 2D or 3D)
   –  Thread blocks are organised into a grid (1D or 2D)
   –  Threads within the same thread block can synchronise execution, and share access to local scratchpad memory

Key idea: hierarchy of parallelism, to handle thousands of threads

Example: Matrix addition

__global__ void matAdd(float A[N][N], float B[N][N], float C[N][N])
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < N && j < N)
        C[i][j] = A[i][j] + B[i][j];
}

int main()
{
    // Kernel setup
    dim3 blockDim(16, 16);
    dim3 gridDim((N + blockDim.x - 1) / blockDim.x,
                 (N + blockDim.y - 1) / blockDim.y);

    // Kernel invocation
    matAdd<<<gridDim, blockDim>>>(A, B, C);
}