CUDA_OpenCL_report

CUDA and OpenCL

CUDA Introduction:

CUDA stands for Compute Unified Device Architecture. It is a new hardware and software architecture from NVIDIA for issuing and managing computations on the GPU as a data-parallel computing device, without the need to map them onto a graphics API. The CUDA API is an extension to the C programming language, keeping the learning curve minimal. [1]

The CUDA software stack is composed of several layers, as illustrated in the figure: a hardware driver, an application programming interface (API) and its runtime, and two higher-level mathematical libraries in common use, CUFFT and CUBLAS. The hardware has been designed to support lightweight driver and runtime layers, resulting in high performance.

[Figure: CUDA software stack]

CUDA provides general DRAM memory addressing for greater programming flexibility: both scatter and gather memory operations. From a programming perspective, this translates into the ability to read and write data at any location in DRAM, just as on a CPU. It also features a parallel data cache, an on-chip shared memory with very fast general read and write access, which threads use to share data with each other. Applications can take advantage of it by minimizing over-fetch and round trips to DRAM, thereby becoming less dependent on DRAM memory bandwidth. More details can be found in the Programming Guide [1].
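As a concrete illustration of the gather/scatter DRAM access and the on-chip shared memory described above, the following minimal kernel sketch stages a tile of the input array in shared memory and writes it back reversed within the tile. All names here (`reverseTile`, the 256-element tile size) are illustrative assumptions, not from the report.

```cuda
// Sketch: gather a tile from DRAM into fast on-chip shared memory,
// then scatter it back to a different DRAM location.
__global__ void reverseTile(const float *in, float *out, int n)
{
    __shared__ float tile[256];               // per-block parallel data cache

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];            // gather: read any DRAM location

    __syncthreads();                          // make the tile visible to all threads

    // Write each element to the mirrored position within its tile.
    int j = blockIdx.x * blockDim.x + (blockDim.x - 1 - threadIdx.x);
    if (i < n && j < n)
        out[j] = tile[threadIdx.x];           // scatter: write any DRAM location
}
```

Because the tile lives in shared memory, the second access round-trips on-chip rather than to DRAM, which is exactly the over-fetch reduction the paragraph above describes.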
The advent of high-performance computing on NVIDIA graphics cards such as the GeForce 8800 GTX, together with the CUDA libraries, enables users to run applications at gigaflops speeds. The GPU, with its multithreaded processors, offers immense power to speed up computation. Environments such as metacomputing need information in large datasets to be extracted and processed using parallel graphics processing units (GPUs). Concurrent execution of parallel threads, together with the parallel data cache, enables these threads to share datasets. The GPU is specialized for compute-intensive, highly data-parallel computations. It is especially well suited to problems that can be expressed as data-parallel computations, in which the same program is executed on many data elements in parallel with high arithmetic intensity, i.e. a high ratio of arithmetic operations to memory operations. Because the same program is executed for each data element, there is a lower requirement for sophisticated flow control; and because it is executed on many data elements with high arithmetic intensity, memory access latency can be hidden with calculations instead of large data caches. Harnessing this power through parallel programming techniques has nevertheless remained tricky because of the constraints imposed by graphics APIs. This document tries to ease that task by illustrating how to program the GPU for general-purpose computing through simple, fully explained examples.

Programming Model:
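The data-parallel model described above can be sketched with the classic SAXPY operation (y = a*x + y): one lightweight thread per data element, all running the same program. The kernel name and launch configuration below are illustrative assumptions, not taken from the report.

```cuda
// Data-parallel sketch: the same kernel body runs once per element,
// with thread index computed from the block/thread coordinates.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                         // guard the tail when n % blockDim != 0
        y[i] = a * x[i] + y[i];        // identical arithmetic on every element
}

// Host side: launch enough 256-thread blocks to cover all n elements, e.g.
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```

With two floating-point operations against three memory accesses, SAXPY itself has low arithmetic intensity; the latency-hiding argument above applies most strongly to kernels that do much more arithmetic per element loaded.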
