Ch07-AdvCompArch-ManycoresAndGPUs-PaulKelly-V03

Warp instruction operates on 32 data items


Each SM executes a pool of "warps", with a separate instruction pointer for each warp. Instructions are issued from each ready-to-run warp in turn (SMT, hyperthreading).

SIMT: Single-Instruction, Multiple-Thread

- Each warp instruction operates on 32 data items, one per SIMD lane (the slide diagram shows lanes 0-31, with thread 12's state occupying one lane).
- Each thread's state is represented by a "lane" of the SIMD register set.
- CUDA programmers write code for each thread as if it runs independently - but actually, instructions are broadcast to 32 threads in parallel.

Predication

- Predicate bits enable/disable each lane. For example, the statement

      if (x == 10)
          c = c + 1;

  compiles to predicated code:

      LDR r5, X
      p1 <- r5 eq 10
      <p1> LDR r1 <- C
      <p1> ADD r1, r1, 1
      <p1> STR r1 -> C

Branching

- We can jump (rather than predicate) if the whole warp's predicate bits are the same and the branch skips enough (7?) instructions.
- Control-flow coherence: every thread goes the same way (a for...

This document was uploaded on 03/18/2014 for the course CO 332 at Imperial College.
