Chapter 6
Volume Ray Casting on CUDA
The performance of graphics processing units (GPUs) is improving at a rapid rate,
almost doubling every year. This evolution has been possible because the GPU is
specialized for highly parallel, compute-intensive applications, primarily graphics
rendering, and is therefore designed so that more transistors are devoted to
computation rather than to caches and branch prediction units.
Because such compute-intensive applications have high arithmetic intensity (the ratio
of arithmetic operations to memory operations), GPUs can hide memory latency with
computation instead of relying on caches. In addition, since the same instruction is
executed on many data elements in parallel, GPUs have far less need for the
sophisticated flow control hardware, such as branch prediction units, found in CPUs.
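To make the idea concrete, the following sketch (not from the original text; the kernel name and constants are illustrative) shows a CUDA kernel with high arithmetic intensity: each thread performs one global read, many arithmetic operations, and one global write, so the hardware can overlap the memory accesses of some warps with the computation of others.

// Illustrative CUDA kernel with high arithmetic intensity: one global read,
// many arithmetic operations, and one global write per thread. With thousands
// of threads in flight, memory latency is hidden behind other warps' arithmetic.
__global__ void highIntensityKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];                 // one global memory read
        for (int k = 0; k < 64; ++k)     // many arithmetic operations per read
            x = x * 1.0001f + 0.5f;
        out[i] = x;                      // one global memory write
    }
}
// A possible launch: highIntensityKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);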
Although dedicated graphics hardware delivers 3D rendering performance far beyond
what the CPU alone can achieve, graphics programmers have until recently had to give
up programmability in exchange for this speed: they were limited to a fixed set of
graphics operations. Images for films and videos, on the other hand, are rendered not
on GPUs but with off-line rendering systems that use general-purpose CPUs, taking
hours per frame, because general-purpose CPUs give graphics programmers great
flexibility to
create rich effects. The generality and flexibility of CPUs are what the GPU has
been missing until very recently.
In order to reduce the gap, graphics hardware designers have continuously
introduced more programmability through several generations of GPUs. Up until
2000, no programmability was supported in GPUs. However, in 2001, vertex-level
programmability started to appear, and in 2002, pixel-level programmability also
started being provided on GPUs such as NVIDIA’s GeForce FX family and ATI’s
Radeon 9700 series.
This level of programmability gives programmers considerably more flexibility by
allowing them to specify a sequence of instructions to be executed on both the vertex
and fragment processors.
However, harnessing the computational power of GPUs for non-graphics applications,
or for global illumination rendering such as ray tracing, has often required ingenious
effort. One reason is that GPUs could only be programmed through a graphics API
such as OpenGL, which imposes significant overhead on non-graphics applications.
Programmers had to express their algorithms in terms of these ill-suited APIs, which
sometimes demanded heroic efforts to use the GPU efficiently. Another reason is the
GPU's limited memory-write capability: a GPU program could gather data elements
from any part of memory, but could not scatter data to arbitrary locations, which
removed much of the programming flexibility available on the CPU.
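The gather/scatter distinction can be sketched in CUDA terms (a hypothetical illustration, not code from the text): a gather reads through a data-dependent index, while a scatter writes through one. Shader-based GPGPU supported only the former pattern, whereas a CUDA kernel can do either.

// Gather: each thread reads from an arbitrary, data-dependent address.
__global__ void gatherKernel(const float *src, const int *idx, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[idx[i]];   // read location depends on data
}

// Scatter: each thread writes to an arbitrary, data-dependent address,
// a pattern unavailable when GPUs were programmed only through graphics APIs.
__global__ void scatterKernel(const float *src, const int *idx, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[idx[i]] = src[i];   // write location depends on data
}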
