The CUDA Compiler Driver NVCC
Last modified on: 07-29-2009
nvcc.pdf v2.3, July 2009

Introduction

Overview

CUDA programming model

The CUDA Toolkit targets a class of applications whose control part runs as a process on a general-purpose computer (Linux, Windows), and which use one or more NVIDIA GPUs as coprocessors for accelerating SIMD parallel jobs. Such jobs are self-contained, in the sense that they can be executed and completed by a batch of GPU threads entirely without intervention by the host process, thereby gaining optimal benefit from the parallel graphics hardware.

Dispatching GPU jobs by the host process is supported by the CUDA Toolkit in the form of remote procedure calling. The GPU code is implemented as a collection of functions in a language that is essentially C, but with some annotations for distinguishing them from the host code, plus annotations for distinguishing the different types of data memory that exist on the GPU. Such functions may have parameters, and they can be called using a syntax that is very similar to regular C function calling, but slightly extended for being able to specify the matrix of GPU threads that must execute the called function. During its lifetime, the host process may dispatch many parallel GPU tasks. See Figure 1.

CUDA sources

Hence, source files for CUDA applications consist of a mixture of conventional C++ host code, plus GPU device functions. The CUDA compilation trajectory separates the device functions from the host code, compiles the device functions using proprietary NVIDIA compilers/assemblers, compiles the host code using a general-purpose C/C++ compiler that is available on the host platform, and afterwards embeds the compiled GPU functions as load images in the host object file. In the linking stage, specific CUDA runtime libraries are added for supporting remote SIMD procedure calling and for providing explicit GPU manipulation such as allocation of GPU memory buffers and host-GPU data transfer.

Purpose of nvcc

This compilation trajectory involves several splitting, compilation, preprocessing, and merging steps for each CUDA source file, and several of these steps are subtly different for the different modes of CUDA compilation (such as compilation for device emulation, or the generation of device code repositories). It is the purpose of the CUDA compiler driver nvcc to hide these intricate details of CUDA compilation from developers. Additionally, instead of being a specific CUDA compilation driver, nvcc mimics the behavior of the GNU compiler gcc: it accepts a range of conventional compiler options, such as for defining macros and include/library paths, and for steering the compilation process. All non-CUDA compilation steps are forwarded to a general-purpose C compiler that is supported by nvcc, and on Windows platforms, where this compiler is an instance of the Microsoft Visual Studio...
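
To make the programming model above concrete, here is a minimal sketch (ours, not taken from the manual): a device function carries the __global__ annotation that distinguishes it from host code, and the host calls it with the extended <<<...>>> syntax that specifies the matrix of GPU threads. All names (vecAdd, hA, dA, n) are illustrative.

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    // A GPU device function: the __global__ annotation distinguishes
    // it from host code and marks it as callable from the host.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                      // guard against excess threads
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1024;
        size_t bytes = n * sizeof(float);

        float *hA = (float *)malloc(bytes);
        float *hB = (float *)malloc(bytes);
        float *hC = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

        // Explicit GPU manipulation through the CUDA runtime library:
        // allocation of GPU memory buffers and host-GPU data transfer.
        float *dA, *dB, *dC;
        cudaMalloc((void **)&dA, bytes);
        cudaMalloc((void **)&dB, bytes);
        cudaMalloc((void **)&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        // Extended call syntax: <<<blocks, threads-per-block>>> specifies
        // the matrix of GPU threads that executes the called function.
        vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);

        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
        printf("c[10] = %f\n", hC[10]);   // expect 30.0

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
        return 0;
    }

The kernel launch behaves like the remote procedure call described above: the host dispatches the job and the batch of GPU threads completes it without further host intervention.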
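And since nvcc mimics gcc, building such a source file can look like the following hypothetical session. The file and macro names are ours; -D, -I, -c, and -o have their conventional gcc meanings, and -deviceemu selects the device-emulation mode this 2.3-era toolkit mentions above (later toolkits removed emulation mode).

    # Conventional gcc-style usage: define a macro, add an include
    # path, compile to an object file, then link.
    nvcc -DUSE_GPU -I../include -c vecadd.cu -o vecadd.o
    nvcc vecadd.o -o vecadd

    # Device emulation: one of the compilation modes whose subtly
    # different internal steps nvcc hides from the developer.
    nvcc -deviceemu vecadd.cu -o vecadd_emu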