cs8803SC_lecture7 - CS8803SC Software and Hardware...

Info iconThis preview shows pages 1–5. Sign up to view the full content.

View Full Document Right Arrow Icon
1 CS8803SC Software and Hardware Cooperative Computing Introduction to G80/CUDA Programming Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Programming model: Block and Thread IDs A kernel is executed as a grid of thread blocks Threads and blocks have IDs So each thread can decide what data to work on Block ID: 1D or 2D Thread ID: 1D, 2D, or 3D Simplifies memory addressing when processing multidimensional data Image processing Solving PDEs on volumes Device Grid 1 Block (0, 0) Block (1, 0) Block (2, 0) Block (0, 1) Block (1, 1) Block (2, 1) Block (1, 1) Thread (0, 1) Thread (1, 1) Thread (2, 1) Thread (3, 1) Thread (4, 1) Thread (0, 2) Thread (1, 2) Thread (2, 2) Thread (3, 2) Thread (4, 2) Thread (0, 0) Thread (1, 0) Thread (2, 0) Thread (3, 0) Thread (4, 0) Courtesy: NDVIA © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, UIUC
Background image of page 1

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
2 CUDA Device Memory Space Overview Each thread can: R/W per-thread registers R/W per-thread local memory R/W per-block shared memory R/W per-grid global memory Read only per-grid constant memory Read only per-grid texture memory (Device) Grid Constant Memory Texture Memory Global Memory Block (0, 0) Shared Memory Local Memory Thread (0, 0) Registers Local Memory Thread (1, 0) Registers Block (1, 0) Shared Memory Local Memory Thread (0, 0) Registers Local Memory Thread (1, 0) Registers Host The host can R/W global , constant , and texture memories © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, UIUC Execution Model Thread identified by threadIdx Thread block Identified by blockIdx Grid of Thread Blocks Multiple levels of parallelism -Thread block -Up to 512 threads per block -Communicate through shared memory -Threads guaranteed to be resident -threadIdx, blockIdx -__syncthreads() -Grid of thread blocks -F <<< nblocks, nthreads >>> (a, b, c)
Background image of page 2
3 16 highly threaded SM’s, >128 FPU’s, 367 GFLOPS, 768 MB DRAM, 86.4 GB/S Mem BW, 4GB/S BW to CPU Load/store Global Memory Thread Execution Manager Input Assembler Host Texture Texture Texture Texture Texture Texture Texture Texture Texture Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Load/store Load/store Load/store Load/store Load/store GeForce 8800 GTX © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, UIUC Load/store Global Memory Thread Execution Manager Input Assembler Host Texture Texture Texture Texture Texture Texture Texture Texture Texture Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Parallel Data Cache Load/store Load/store Load/store Load/store Load/store GeForce 8800 GTX © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, UIUC TPC
Background image of page 3

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
4 GeForce-8 Series HW Overview TPC TPC TPC TPC TPC TPC TEX SM SP SP SP SP SFU SP SP SP SP SFU Instruction Fetch/Dispatch
Background image of page 4
Image of page 5
This is the end of the preview. Sign up to access the rest of the document.

This note was uploaded on 10/06/2010 for the course CS 8803 taught by Professor Staff during the Spring '08 term at Georgia Institute of Technology.

Page1 / 14

cs8803SC_lecture7 - CS8803SC Software and Hardware...

This preview shows document pages 1 - 5. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online