g80-control-flow - ECE 498AL Lecture 10: Control Flow David...

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007
ECE 498AL, University of Illinois, Urbana-Champaign

ECE 498AL Lecture 10: Control Flow

Objective
• To understand the implications of control flow on
  – Branch divergence overhead
  – SM execution resource utilization
• To learn better ways to write code with control flow
• To understand compiler/HW predication designed to reduce the impact of control flow
  – There is a cost involved.

Quick terminology review
• Thread: concurrent code and associated state executed on the CUDA device (in parallel with other threads)
  – The unit of parallelism in CUDA
• Warp: a group of threads (32 on G80) executed physically in parallel
• Block: a group of threads that are executed together and form the unit of resource assignment
• Grid: a group of thread blocks that must all complete before the next phase of the program can begin

How thread blocks are partitioned
• Thread blocks are partitioned into warps
  – Thread IDs within a warp are consecutive and increasing
  – Warp 0 starts with thread ID 0
• Partitioning is always the same
  – Thus you can use this knowledge in control flow
  – However, the exact size of warps may change from generation to generation (covered next)
• However, DO NOT rely on any ordering between warps
  – If there are any dependencies between threads in a block, you must use __syncthreads() to get correct results
Control Flow Instructions
• The main performance concern with branching is divergence
  – Threads within a single warp take different paths
  – Different execution paths are serialized in G80
    – The control paths taken by the threads in a warp are traversed one at a time until there are no more.
• A common case: avoid divergence when the branch condition is a function of thread ID
  – Example with divergence:
    • if (threadIdx.x > 2) { }
    • This creates two different control paths for threads in a block
    • Branch granularity < warp size; threads 0, 1, and 2 follow a different path than the rest of the threads in the first warp

This note was uploaded on 10/03/2011 for the course CDA 6938 taught by Professor Zou,c during the Spring '08 term at University of Central Florida.
