Carnegie Mellon Parallel Computing Notes on Lecture 6

Parallel Computer Architecture and Programming
CMU 15-418/15-618, Spring 2014
Lecture 6: Programming for Performance, Part 1: Work Distribution
Tunes
The Heavy, “Colleen” (Great Vengeance and Furious Fire)
“Colleen? Ha, that wasn’t about a girl. We wrote that one about the dangers of premature program optimization. It burns everyone, and it’s certainly burned me.”
- Kelvin Swaby
Today
- Review mapping of CUDA programming to GPUs
- Solver example in the message passing model
- Begin discussing techniques for optimizing parallel programs
Finishing up a few CUDA concepts...
Review: executing warps on GTX 680

[Figure: SMX core diagram; legend:]
- SIMD functional unit, control shared across 32 units (1 MUL-ADD per clock)
- “Special” SIMD functional unit, control shared across 32 units (operations like sin/cos)
- SIMD load/store unit (handles warp loads/stores, gathers/scatters)

SMX core operation each clock:
- Select up to four runnable warps from up to 64 resident on the core (thread-level parallelism)
- Select up to two runnable instructions per warp (instruction-level parallelism)
- Execute instructions on available groups of SIMD ALUs, special-function ALUs, or LD/ST units
Review: scheduling threads in a thread block

Imagine a thread block with 256 CUDA threads, but only four warps' worth of parallel execution in HW. Why not just have a pool of four “worker” warps? (e.g., run 4x32=128 threads in the block to completion, then run the next 128 threads in the block)

#define THREADS_PER_BLK 256

__global__ void convolve(int N, float* input, float* output) {
    __shared__ float support[THREADS_PER_BLK+2];
    int index = blockIdx.x * blockDim.x + threadIdx.x;

    support[threadIdx.x] = input[index];
    if (threadIdx.x < 2) {
        support[THREADS_PER_BLK + threadIdx.x] = input[index + THREADS_PER_BLK];
    }

    __syncthreads();

    float result = 0.0f;  // thread-local variable
    for (int i = 0; i < 3; i++)
        result += support[threadIdx.x + i];

    output[index] = result;
}

CUDA kernels may create dependencies between threads in a block; the simplest example is __syncthreads(). When such dependencies exist, the system cannot execute the threads of a block in just any order. CUDA semantics: threads in a block ARE running concurrently. If a thread in a block is runnable, it will eventually be run (no deadlock).
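For reference, here is a minimal host-side sketch of how such a kernel might be launched. This is not from the slides: the buffer names, the value of N, and the elided memcpy steps are illustrative assumptions.

// Hypothetical host-side driver; assumes the convolve kernel above is in the same file.
#include <cuda_runtime.h>

#define THREADS_PER_BLK 256

int main() {
    int N = 1024 * 1024;              // number of output elements (assumed)
    float *devInput, *devOutput;

    // Allocate N+2 input elements so the per-block "halo" loads stay in bounds.
    cudaMalloc(&devInput,  sizeof(float) * (N + 2));
    cudaMalloc(&devOutput, sizeof(float) * N);

    // ... copy input data to devInput with cudaMemcpy ...

    // One CUDA thread per output element, grouped into blocks of 256 threads.
    convolve<<<N / THREADS_PER_BLK, THREADS_PER_BLK>>>(N, devInput, devOutput);
    cudaDeviceSynchronize();

    // ... copy results back with cudaMemcpy, then free ...
    cudaFree(devInput);
    cudaFree(devOutput);
    return 0;
}

Note that the launch only specifies grid and block dimensions; which core runs which thread block, and in what order, is left entirely to the hardware scheduler.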
CUDA execution semantics

Thread blocks can be scheduled in any order by the system
- The system assumes no dependencies between blocks
- A lot like ISPC tasks, right?

Threads in a block DO run concurrently
- When a block begins execution, all of its threads are running concurrently (these semantics impose a scheduling constraint on the system)
- A CUDA thread block is itself an SPMD program (like an ISPC gang of program instances)
- Threads in a thread block are concurrent, cooperating “workers”

CUDA implementation:
- A Kepler GPU warp has performance characteristics akin to an ISPC gang of instances (but unlike an ISPC gang, the warp concept is not a CUDA programming-model concept)
- All warps in a thread block are scheduled onto the same core, allowing for high-bandwidth / low-latency communication through shared memory variables
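To make the "cooperating workers" point concrete, here is a small illustrative kernel (not from the slides; the kernel name and parameters are assumptions). Threads within a block cooperate through shared memory and __syncthreads(), while different blocks remain fully independent and can be scheduled in any order.

#define BLOCK_SIZE 256

// Each block computes the sum of its 256-element slice of the input.
// Dependencies (via __syncthreads) exist only among threads in the same block;
// no block ever waits on another block, so the system may run blocks in any order.
__global__ void blockSums(const float* input, float* perBlockSum, int N) {
    __shared__ float partial[BLOCK_SIZE];

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (idx < N) ? input[idx] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory: each step halves the number of active threads.
    for (int stride = BLOCK_SIZE / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    // One result per block, written by a single thread.
    if (threadIdx.x == 0)
        perBlockSum[blockIdx.x] = partial[0];
}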
