12.1-gpus-2 - CS6230 HPC Tools and Applications...

CS6230 HPC Tools and Applications
Heterogeneous Computing with GPUs
Jeffrey S. Vetter
Computational Science and Engineering, College of Computing
Georgia Institute of Technology
http://ft.ornl.gov/~vetter
[email protected]
Many slides in this lecture are courtesy of NVIDIA.

Thinking about Parallelism (2)
  Core: assembler; SIMD, AVX; compiler; libraries, frameworks
  Socket (multicore): threads (Pthreads, OpenMP); distributed memory model like MPI, or GAS languages; libraries, frameworks
  Node: threads (Pthreads, OpenMP); distributed memory model like MPI, or GAS languages; memory-thread affinity becomes much more important; libraries, frameworks
  System: distributed memory model like MPI, GAS languages; libraries, frameworks
  Heterogeneity can exist at all levels: existing GPU, FPGA architectures
NVIDIA's Fermi
  3B transistors
  ECC
  8x the peak double-precision arithmetic performance of NVIDIA's last-generation GPU
  512 CUDA cores featuring the new IEEE 754-2008 floating-point standard
  NVIDIA Parallel DataCache
  NVIDIA GigaThread Engine
  Debuggers, language support

Compiling C for CUDA Applications

C application source:

    void serial_function(...) { ... }
    void other_function(int ...) { ... }
    void saxpy_serial(float ...) {
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }
    void main() {
        float x;
        saxpy_serial(...);
        ...
    }

Build flow:
  Key kernels: modify into parallel CUDA code
  C for CUDA key kernels -> NVCC (Open64) -> CUDA object files
  Rest of C application -> CPU compiler -> CPU object files
  CUDA object files + CPU object files -> Linker -> CPU-GPU executable
C for CUDA: C with a few keywords

Standard C code:

    void saxpy_serial(int n, float a, float *x, float *y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }

    // Invoke serial SAXPY kernel
    saxpy_serial(n, 2.0, x, y);

Parallel C code:

    __global__ void saxpy_parallel(int n, float a, float *x, float *y)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n) y[i] = a*x[i] + y[i];
    }

    // Invoke parallel SAXPY kernel with 256 threads/block
    int nblocks = (n + 255) / 256;
    saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
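The index arithmetic in the parallel version can be checked on a CPU. The following is a minimal sketch in plain C, not CUDA: the helper names blocks_for and saxpy_emulated are hypothetical, and the two nested loops stand in for blockIdx.x and threadIdx.x. It shows why the launch rounds the block count up and why the kernel needs the `if (i < n)` guard.

```c
/* Reference loop, matching the standard C version on the slide. */
void saxpy_serial(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Ceiling division: the smallest block count that covers n elements.
   With threads_per_block = 256 this is the slide's (n + 255) / 256. */
int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

/* CPU emulation of the kernel launch (hypothetical helper, not a CUDA API).
   Each (block, thread) pair computes the same global index as
   blockIdx.x*blockDim.x + threadIdx.x. Returns how many emulated threads
   fell outside [0, n) and were skipped by the guard. */
int saxpy_emulated(int n, float a, const float *x, float *y,
                   int threads_per_block)
{
    int nblocks = blocks_for(n, threads_per_block);
    int skipped = 0;
    for (int block = 0; block < nblocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            int i = block * threads_per_block + thread;
            if (i < n)
                y[i] = a * x[i] + y[i];
            else
                ++skipped;  /* without the guard this would write past y[n-1] */
        }
    }
    return skipped;
}
```

For n = 1000 and 256 threads per block, blocks_for returns 4, so 1024 threads are launched and the last 24 do no work. On a real GPU the iterations run concurrently rather than in loop order.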

8x Higher Linpack

Metric                              CPU Server   GPU-CPU Server   Ratio
Performance (Gflops)                80.1         656.1            8x
Performance / $ (Gflops / $K)       11           60               5x
Performance / watt (Gflops / kW)    146          656              4.5x

CPU 1U Server: 2x Intel Xeon X5550 (Nehalem) 2.66 GHz, 48 GB memory, $7K, 0.55 kW
GPU-CPU 1U Server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kW
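The derived figures follow from the raw numbers on the slide. As a small sketch (the struct and function names are illustrative, not from the lecture), the arithmetic is:

```c
/* Raw figures for one server, taken from the slide. */
struct server {
    double gflops;    /* Linpack performance in Gflops */
    double cost_k;    /* price in thousands of dollars */
    double power_kw;  /* power draw in kW */
};

/* Gflops per $K. */
double perf_per_dollar(struct server s) { return s.gflops / s.cost_k; }

/* Gflops per kW. */
double perf_per_kw(struct server s) { return s.gflops / s.power_kw; }

/* How many times larger the GPU-CPU figure is than the CPU figure. */
double ratio(double gpu, double cpu) { return gpu / cpu; }
```

With the CPU server as {80.1, 7.0, 0.55} and the GPU-CPU server as {656.1, 11.0, 1.0}, this gives about 11.4 versus 59.6 Gflops per $K and about 145.6 versus 656.1 Gflops per kW, which is consistent with the rounded 8x, 5x, and 4.5x labels.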
OPENCL 1.0 TECHNICAL OVERVIEW

OpenCL Platform Model (Section 3.1)
  One Host + one or more Compute Devices
  Each Compute Device is composed of one or more Compute Units
  Each Compute Unit is further divided into one or more Processing Elements
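The hierarchy can be pictured as nested containers. The sketch below uses plain C structs with hypothetical names. It is not the OpenCL API, which exposes platforms and devices through opaque handles such as cl_device_id. It only mirrors the host, device, compute unit, and processing element nesting described above.

```c
#include <stddef.h>

/* A compute unit groups one or more processing elements. */
struct compute_unit {
    int num_processing_elements;
};

/* A compute device is composed of one or more compute units. */
struct compute_device {
    const struct compute_unit *units;
    size_t num_units;
};

/* The host sees one or more compute devices. */
struct host_platform {
    const struct compute_device *devices;
    size_t num_devices;
};

/* Walk the hierarchy and count every processing element the host can reach. */
int total_processing_elements(const struct host_platform *p)
{
    int total = 0;
    for (size_t d = 0; d < p->num_devices; ++d)
        for (size_t u = 0; u < p->devices[d].num_units; ++u)
            total += p->devices[d].units[u].num_processing_elements;
    return total;
}
```

For example, a host with one device of two 8-element compute units and a second device with a single 4-element compute unit reaches 20 processing elements in total.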
OpenCL Execution Model (Section 3.2)
  OpenCL Program:
    Kernels
      Basic unit of executable code, similar to a C function
      Data-parallel or task-parallel
    Host Program
      Collection of compute kernels and internal functions
      Analogous to a dynamic library
  Kernel Execution
    The host program invokes a kernel over an index space called an NDRange
      NDRange = "N-Dimensional Range"
      NDRange can be a 1-, 2-, or 3-dimensional space
    A single kernel instance at a point in the index space is called a work-item
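The relationship between an NDRange and its work-items can be emulated sequentially on a CPU. This is a sketch under stated assumptions: work_item, kernel_fn, run_ndrange, and fill_kernel are hypothetical names and not OpenCL calls. A real device would run the work-items concurrently, with no guaranteed order.

```c
#include <stddef.h>

/* Position of one work-item in the index space (hypothetical type). */
struct work_item {
    size_t global_id[3];
};

/* A "kernel" here is an ordinary C function called once per work-item. */
typedef void (*kernel_fn)(const struct work_item *item, void *args);

/* Invoke the kernel at every point of a 1-, 2-, or 3-dimensional range.
   Unused dimensions are treated as size 1. Returns the work-item count. */
size_t run_ndrange(int work_dim, const size_t global_size[3],
                   kernel_fn kernel, void *args)
{
    size_t nx = global_size[0];
    size_t ny = work_dim > 1 ? global_size[1] : 1;
    size_t nz = work_dim > 2 ? global_size[2] : 1;
    size_t count = 0;
    for (size_t z = 0; z < nz; ++z)
        for (size_t y = 0; y < ny; ++y)
            for (size_t x = 0; x < nx; ++x) {
                struct work_item item = { { x, y, z } };
                kernel(&item, args);
                ++count;
            }
    return count;
}

/* Example kernel: each work-item writes its row-major index into out. */
struct fill_args {
    size_t width;
    int *out;
};

void fill_kernel(const struct work_item *item, void *args)
{
    struct fill_args *a = args;
    size_t idx = item->global_id[1] * a->width + item->global_id[0];
    a->out[idx] = (int)idx;
}
```

Calling run_ndrange with work_dim = 2 and a global size of 4 x 3 executes fill_kernel for 12 work-items, one per point in the index space.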