lec03-cuda_programming

GPU Programming Lecture 3: The CUDA Programming Model
© Nvidia 2009
Parallel Programming Basics
• Things we need to consider:
  – Control
  – Synchronization
  – Communication
• Parallel programming languages offer different ways of dealing with the above
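As a concrete CUDA illustration of these three concerns, a minimal sketch (the kernel name, array size, and block-reversal task are hypothetical, chosen only to exercise all three):

```cuda
// Sketch: control, synchronization, and communication in one CUDA kernel.
// Assumes one thread block of N threads; N and the kernel name are hypothetical.
#define N 256

__global__ void reverse(float *d_out, const float *d_in) {
    __shared__ float buf[N];       // communication: shared memory visible to the whole block
    int i = threadIdx.x;           // control: each thread chooses its own element
    buf[i] = d_in[i];
    __syncthreads();               // synchronization: wait until every thread has written buf
    d_out[i] = buf[N - 1 - i];     // read a value another thread produced
}
```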
An Example of Physical Reality Behind CUDA
• CPU (host)
• GPU w/ local DRAM (device)
Overview
• CUDA programming model – basic concepts and data types
• CUDA application programming interface – basic
• Simple examples to illustrate basic concepts and functionalities
• Performance features will be covered later
CUDA – C with no shader limitations!
• Integrated host+device application C program
  – Serial or modestly parallel parts in host C code
  – Highly parallel parts in device SPMD kernel C code

Serial Code (host)
Parallel Kernel (device): KernelA<<< nBlk, nTid >>>(args);
Serial Code (host)
Parallel Kernel (device): KernelB<<< nBlk, nTid >>>(args);
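The alternating serial/parallel structure above can be sketched as a single host program. KernelA, KernelB, nBlk, and nTid are the slide's placeholder names; the kernel bodies and buffer sizes here are assumptions:

```cuda
// Sketch of an integrated host+device program: serial host code interleaved
// with parallel kernel launches. Kernel bodies are hypothetical stand-ins.
__global__ void KernelA(float *data) { /* highly parallel phase A */ }
__global__ void KernelB(float *data) { /* highly parallel phase B */ }

int main(void) {
    const int nBlk = 100, nTid = 256;          // grid and block sizes (assumed)
    float *d_data;
    cudaMalloc((void **)&d_data, nBlk * nTid * sizeof(float));

    /* ... serial host code ... */
    KernelA<<<nBlk, nTid>>>(d_data);           // parallel phase on the device
    /* ... serial host code ... */
    KernelB<<<nBlk, nTid>>>(d_data);           // another parallel phase

    cudaDeviceSynchronize();                   // wait for device work to finish
    cudaFree(d_data);
    return 0;
}
```

Kernel launches are asynchronous with respect to the host, which is why the final synchronize call is needed before the host can safely use the results.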
CUDA Devices and Threads
• A compute device
  – Is a coprocessor to the CPU, or host
  – Has its own DRAM (device memory)
  – Runs many threads in parallel
  – Is typically a GPU but can also be another type of parallel processing device
• Data-parallel portions of an application are expressed as device kernels which run on many threads
• Differences between GPU and CPU threads
  – GPU threads are extremely lightweight
    • Very little creation overhead
  – GPU needs 1000s of threads for full efficiency
    • Multi-core CPU needs only a few
G80 CUDA mode – A Device Example
• Processors execute computing threads
• New operating mode/HW interface for computing
[Block diagram: Host → Input Assembler → Thread Execution Manager; arrays of processors, each with a Parallel Data Cache and Texture unit; Load/store paths to Global Memory]
Extended C
• Type qualifiers: global, device, shared, local, constant
• Keywords: threadIdx, blockIdx
• Intrinsics: __syncthreads()
• Runtime API: memory, symbol, execution management
• Function launch

  __device__ float filter[N];

  __global__ void convolve (float *image) {
      __shared__ float region[M];
      ...
      region[threadIdx] = image[i];
      __syncthreads();
      ...
      image[j] = result;
  }

  // Allocate GPU memory
  void *myimage = cudaMalloc(bytes)

  // 100 blocks, 10 threads per block
  convolve<<<100, 10>>> (myimage);
Arrays of Parallel Threads
• A CUDA kernel is executed by an array of threads
  – All threads run the same code (SPMD)
  – Each thread has an ID that it uses to compute ...
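A minimal sketch of a thread using its ID this way, with vector addition as the assumed workload (the kernel name, parameters, and guard are illustrative, not from the slide):

```cuda
// Sketch: each thread computes its global ID and uses it to pick the
// element it owns. vecAdd and n are hypothetical names.
__global__ void vecAdd(float *c, const float *a, const float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread ID
    if (i < n)                                      // guard: the last block may have extra threads
        c[i] = a[i] + b[i];
}
```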
This note was uploaded on 10/31/2011 for the course EE 101 taught by Professor Gibbons during the Spring '09 term at Michigan State University.
