Chapter 6 Volume Ray Casting on CUDA

The performance of graphics processors (GPUs) is improving at a rapid rate, almost doubling every year. This evolution has been possible because the GPU is specialized for highly parallel, compute-intensive applications, primarily graphics rendering, and is therefore designed so that more transistors are devoted to computation than to caching and branch prediction units. Because compute-intensive applications have high arithmetic intensity (the ratio of arithmetic operations to memory operations), GPUs can hide memory latency with computation instead of relying on caches. In addition, since the same instructions are executed on many data elements in parallel, GPUs have less need for the sophisticated flow control units, such as branch predictors, found in CPUs.

Although the 3D rendering performance achieved with dedicated graphics hardware far exceeds what is achievable using the CPU alone, graphics programmers have until now had to give up programmability in exchange for speed. They were limited to a fixed set of graphics operations. Images for films and videos, on the other hand, are rendered not on GPUs but on off-line rendering systems that use general-purpose CPUs and take hours to render a frame, because general-purpose CPUs give graphics programmers a great deal of flexibility to create rich effects. The generality and flexibility of CPUs are what the GPU has been missing until very recently.

To reduce this gap, graphics hardware designers have continuously introduced more programmability through several generations of GPUs. Up until 2000, GPUs supported no programmability. In 2001, vertex-level programmability started to appear, and in 2002, pixel-level programmability also became available on GPUs such as NVIDIA’s GeForce FX family and ATI’s Radeon 9700 series.
This level of programmability gives programmers considerably more configurability by making it possible to specify a sequence of instructions for both the vertex and fragment processors.

However, accessing the computational power of GPUs for non-graphics applications, or for global illumination rendering such as ray tracing, often requires ingenious efforts. One reason is that GPUs could only be programmed through a graphics API such as OpenGL, which imposes significant overhead on non-graphics applications. Programmers had to express their algorithms in terms of these inadequate APIs, which sometimes required heroic efforts to use the GPU efficiently. Another reason is the limited writing capability of the GPU. A GPU program could gather data elements from any part of memory, but could not scatter data to arbitrary locations, which removes much of the programming flexibility available on the CPU....
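
The gather/scatter distinction mentioned above can be illustrated with a minimal CPU-side C++ sketch. This is not GPU code and is not taken from the chapter; the function names gather and scatter and their parameters are hypothetical, chosen only for illustration. In a gather, each output slot is fixed and the read location is data-dependent, which is the pattern a fragment program could express. In a scatter, the write location is data-dependent, which is the pattern that was unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather: output element i always writes to its own slot i,
// but may read from an arbitrary, data-dependent location idx[i].
std::vector<float> gather(const std::vector<float>& src,
                          const std::vector<std::size_t>& idx) {
    std::vector<float> out(idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i) {
        assert(idx[i] < src.size());   // read index must be in range
        out[i] = src[idx[i]];          // arbitrary read, fixed write
    }
    return out;
}

// Scatter: input element i is read from its own slot i,
// but is written to an arbitrary, data-dependent location idx[i].
// Slots that no element writes to keep the initial value 0.
std::vector<float> scatter(const std::vector<float>& src,
                           const std::vector<std::size_t>& idx,
                           std::size_t out_size) {
    assert(src.size() == idx.size());
    std::vector<float> out(out_size, 0.0f);
    for (std::size_t i = 0; i < src.size(); ++i) {
        assert(idx[i] < out_size);     // write index must be in range
        out[idx[i]] = src[i];          // fixed read, arbitrary write
    }
    return out;
}
```

In this sketch the loop body of gather corresponds to what a single fragment could do independently of all others. The scatter loop has no such per-output formulation unless the index mapping can be inverted, which is one way to see why the restriction cost so much flexibility.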
- Spring '09
- Computer Graphics, Central processing unit, CUDA, cell processor