lec05-cuda-threads-part2

GPU Programming
Lecture 5: CUDA Threads, Part 2
© nVidia
Block IDs and Thread IDs
• Each thread uses IDs to decide what data to work on
  – Block ID: 1D or 2D
  – Thread ID: 1D, 2D, or 3D
• Simplifies memory addressing when processing multidimensional data
  – Image processing
  – Solving PDEs on volumes
[Figure: the host launches Kernel 1 on Grid 1 and Kernel 2 on Grid 2 of the device. Grid 1 holds Block (0, 0), Block (1, 0), Block (0, 1), and Block (1, 1). Block (1, 1) is expanded to show its threads: Thread (0,0,0) to Thread (3,0,0), Thread (0,1,0) to Thread (3,1,0), and Thread (0,0,1) to Thread (3,0,1).]
Courtesy: NVIDIA
© nVidia
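As an illustration of how a thread might use its IDs to pick its data, here is a minimal sketch for the image-processing case. The kernel name scalePixels, the host helper scaleImageOnGpu, and the 16 x 16 block shape are assumptions made for this example and do not come from the slides. Each thread combines its 2D block ID and 2D thread ID into a (row, col) pixel coordinate.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread scales exactly one pixel of a
// row-major width x height image.
__global__ void scalePixels(float* img, int width, int height, float factor)
{
    // The block ID selects a tile of the image; the thread ID selects a pixel in it.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    // The grid may overhang the image edge, so threads outside it do nothing.
    if (col < width && row < height) {
        img[row * width + col] *= factor;
    }
}

// Hypothetical host helper: copies the image to the device, launches the
// kernel, and copies the result back. Returns true if every CUDA call succeeded.
bool scaleImageOnGpu(std::vector<float>& image, int width, int height, float factor)
{
    assert(width > 0 && height > 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);

    const std::size_t bytes = image.size() * sizeof(float);
    float* dImg = nullptr;
    if (cudaMalloc(&dImg, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dImg, image.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);                              // 2D block: 256 threads
        dim3 grid((width  + block.x - 1) / block.x,      // 2D grid: enough blocks
                  (height + block.y - 1) / block.y);     // to cover every pixel
        scalePixels<<<grid, block>>>(dImg, width, height, factor);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(image.data(), dImg, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dImg);
    return ok;
}
```

Calling scaleImageOnGpu(image, width, height, 1.5f) on a host vector would scale every pixel in place. The 2D IDs mean the kernel never has to convert a flat index back into a row and a column.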
CUDA Thread Block
• All threads in a block execute the same kernel program (SPMD)
• Programmer declares block:
  – Block size 1 to 512 concurrent threads
  – Block shape 1D, 2D, or 3D
  – Block dimensions in threads
• Threads have thread id numbers within block
  – Thread program uses thread id to select work and address shared data
• Threads in the same block share data and synchronize while doing their share of the work
• Threads in different blocks cannot cooperate
  – Each block can execute in any order relative to other blocks!
[Figure: a CUDA thread block with Thread Id #: 0 1 2 3 … m, all running the same thread program.]
Courtesy: John Nickolls, NVIDIA
© nVidia
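The sharing and synchronization within a block can be sketched as a per-block sum. This is an illustrative example, not code from the lecture: the names blockSum, sumOnGpu, and kBlockSize are hypothetical, and the block size of 256 is an assumption chosen to stay within the 512-thread limit stated above. Threads of one block cooperate through __shared__ memory and __syncthreads(). Because different blocks cannot cooperate, each block writes its own partial result and the host adds them up.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // assumed block size; must be a power of two here

// Hypothetical kernel: each block sums its own slice of `in` and writes
// one partial sum to blockSums[blockIdx.x].
__global__ void blockSum(const float* in, float* blockSums, int n)
{
    __shared__ float partial[kBlockSize];  // visible to all threads of this block only

    int tid = threadIdx.x;                        // thread id within the block
    int i   = blockIdx.x * blockDim.x + tid;      // element this thread loads
    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                              // wait until every load is done

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();                          // all threads reach this barrier
    }
    if (tid == 0) blockSums[blockIdx.x] = partial[0];
}

// Hypothetical host helper: returns true on success and stores the sum in `total`.
bool sumOnGpu(const std::vector<float>& data, float& total)
{
    assert(!data.empty());
    const int n = static_cast<int>(data.size());
    const int blocks = (n + kBlockSize - 1) / kBlockSize;
    std::vector<float> sums(blocks, 0.0f);

    float* dIn = nullptr;
    float* dSums = nullptr;
    bool ok = cudaMalloc(&dIn, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dSums, blocks * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(dIn, data.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        blockSum<<<blocks, kBlockSize>>>(dIn, dSums, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(sums.data(), dSums, blocks * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dSums);

    // Blocks cannot combine their results with each other, so the host does it.
    total = 0.0f;
    for (float s : sums) total += s;
    return ok;
}
```

The barrier sits outside the `if (tid < stride)` test on purpose: every thread of the block must reach __syncthreads(), including the ones that have no addition left to do.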
Transparent Scalability
• Hardware is free to assign blocks to any processor at any time
  – A kernel scales across any number of parallel processors
[Figure: one kernel grid of Block 0 to Block 7 shown running over time on two devices. One device runs two blocks at a time; the other runs four blocks at a time.]
Each block can execute in any order relative to other blocks.
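One way to see this in code is a launch whose grid size depends only on the problem size. The sketch below is an assumption-laden example rather than material from the slides: vecAdd, addOnGpu, and multiprocessorCount are hypothetical names. The processor count is queried only for display and never feeds into the launch, and each block writes a disjoint range of the output, so the result is the same whatever order the hardware picks.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: out[i] = a[i] + b[i], one element per thread.
// No block reads anything another block writes, so block order is irrelevant.
__global__ void vecAdd(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Hypothetical helper: number of multiprocessors on device 0, or -1 on error.
// Shown only for contrast; the launch below never uses this value.
int multiprocessorCount()
{
    int count = 0;
    if (cudaDeviceGetAttribute(&count, cudaDevAttrMultiProcessorCount, 0) != cudaSuccess)
        return -1;
    return count;
}

// Hypothetical host helper: returns true if every CUDA call succeeded.
bool addOnGpu(const std::vector<float>& a, const std::vector<float>& b,
              std::vector<float>& out)
{
    assert(!a.empty() && a.size() == b.size());
    const int n = static_cast<int>(a.size());
    const std::size_t bytes = a.size() * sizeof(float);
    out.assign(a.size(), 0.0f);

    float* dA = nullptr;
    float* dB = nullptr;
    float* dOut = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dOut, bytes) == cudaSuccess &&
              cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;  // depends on n only
        vecAdd<<<blocks, threads>>>(dA, dB, dOut, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return ok;
}
```

The same addOnGpu call should run unchanged on a device with few multiprocessors or many. The hardware decides how many blocks are resident at once, and only the running time is expected to differ.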
This note was uploaded on 10/31/2011 for the course EE 101 taught by Professor Gibbons during the Spring '09 term at Michigan State University.
