

© NVIDIA Corporation 2012

Threadblock Size and Occupancy

Threadblock size is a multiple of warp size (32)
- Even if you request fewer threads, HW rounds up

Threadblocks can be too small
- Kepler SM can run up to 16 threadblocks concurrently
- SM may reach the block limit before reaching good occupancy
- Example: 1-warp blocks -> 16 warps per Kepler SM (frequently not enough)

Threadblocks can be too big
- Quantization effect:
  - Enough SM resources for more threads, not enough for another large block
  - A threadblock isn’t started until resources are available for all of its threads
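The arithmetic behind both cases can be sketched in a few lines of Python. This is only an illustration, not NVIDIA's occupancy calculator: the warp size (32) and the 16-block limit come from the slide, while the 64-warp limit per SM is an assumed Kepler value that the slide does not state, and register and shared-memory limits are ignored entirely.

```python
WARP_SIZE = 32            # threads per warp (from the slide)
MAX_BLOCKS_PER_SM = 16    # Kepler block limit (from the slide)
MAX_WARPS_PER_SM = 64     # assumed Kepler warp limit (not stated on the slide)


def warps_per_block(threads_per_block: int) -> int:
    """Round the requested block size up to whole warps, as the HW does."""
    return -(-threads_per_block // WARP_SIZE)  # ceiling division


def resident_warps(threads_per_block: int) -> int:
    """Warps resident on one SM, ignoring register and shared-memory limits."""
    wpb = warps_per_block(threads_per_block)
    # A block is only started if all of its warps fit, so blocks are whole units.
    blocks = min(MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // wpb)
    return blocks * wpb


if __name__ == "__main__":
    for size in (20, 32, 128, 768, 1024):
        warps = resident_warps(size)
        share = warps / MAX_WARPS_PER_SM
        print(f"{size:4d} threads/block -> {warps:2d} warps ({share:.0%} of limit)")
```

Under these assumptions, 32-thread blocks stop at 16 warps because the block limit is reached first, while 768-thread blocks stop at 48 warps: 16 warp slots remain free, but a third 24-warp block does not fit.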

Threadblock Sizing

SM resources:
- Registers
- Shared memory

[Figure: number of warps allowed by SM resources, with regions labeled "Too few threads per block" and "Too many threads per block"]
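How registers and shared memory bound the number of resident warps can be sketched as a minimum over per-resource block limits. The function name and every default below are assumptions for illustration: 65536 registers and 48 KB of shared memory per SM are taken as Kepler-class values that the slide does not give, and real hardware allocates both in granules, which this sketch ignores.

```python
WARP_SIZE = 32  # threads per warp


def resident_warps_with_resources(threads_per_block, regs_per_thread,
                                  smem_per_block=0,
                                  regs_per_sm=65536,       # assumed, not from the slide
                                  smem_per_sm=48 * 1024,   # assumed, not from the slide
                                  max_blocks=16,
                                  max_warps=64):           # assumed, not from the slide
    """Estimate resident warps per SM; allocation granularity is ignored."""
    warps = -(-threads_per_block // WARP_SIZE)       # block size rounded up to warps
    regs_per_block = regs_per_thread * warps * WARP_SIZE

    # Each resource allows some whole number of blocks; the tightest one wins.
    limits = [max_blocks, max_warps // warps, regs_per_sm // regs_per_block]
    if smem_per_block > 0:
        limits.append(smem_per_sm // smem_per_block)
    return min(limits) * warps


if __name__ == "__main__":
    # Same register use per thread, two different block sizes.
    print(resident_warps_with_resources(1024, regs_per_thread=40))
    print(resident_warps_with_resources(256, regs_per_thread=40))
```

With these assumed limits, a kernel using 40 registers per thread fits only one 1024-thread block (32 warps), whereas 256-thread blocks fit six at a time (48 warps). The leftover registers in the first case are enough for more threads but not for another whole block.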