CUFFT_Library_2.3


CUDA CUFFT Library
PG-00000-003_V2.3
June, 2009

Confidential Information

Published by
NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050

Notice

This source code is subject to NVIDIA ownership rights under U.S. and international Copyright laws.

This software and the information contained herein is PROPRIETARY and CONFIDENTIAL to NVIDIA and is being provided under the terms and conditions of a Non-Disclosure Agreement. Any reproduction or disclosure to any third party without the express written consent of NVIDIA is prohibited.

NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE CODE FOR ANY PURPOSE. IT IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY KIND. NVIDIA DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOURCE CODE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOURCE CODE.

U.S. Government End Users. This source code is a "commercial item" as that term is defined at 48 C.F.R. 2.101 (OCT 1995), consisting of "commercial computer software" and "commercial computer software documentation" as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) and is provided to the U.S. Government only as a commercial end item. Consistent with 48 C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (JUNE 1995), all U.S. Government End Users acquire the source code with only those rights set forth herein.

Trademarks

NVIDIA, CUDA, and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries.
Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright 2006-2009 by NVIDIA Corporation. All rights reserved.

Table of Contents

CUFFT Library
  CUFFT Types and Definitions
    Type cufftHandle
    Type cufftResult
    Type cufftReal
    Type cufftDoubleReal
    Type cufftComplex
    Type cufftDoubleComplex
    CUFFT Transform Types
    CUFFT Transform Directions
  CUFFT API Functions
    Function cufftPlan1d()
    Function cufftPlan2d()
    Function cufftPlan3d()
    Function cufftDestroy()
    Function cufftExecC2C()
    Function cufftExecR2C()
    Function cufftExecC2R()
    Function cufftExecZ2Z()
    Function cufftExecD2Z()
    Function cufftExecZ2D()
  Accuracy and Performance
  CUFFT Code Examples
    1D Complex-to-Complex Transforms
    1D Real-to-Complex Transforms
    2D Complex-to-Complex Transforms
    2D Complex-to-Real Transforms
    3D Complex-to-Complex Transforms

CUFFT Library

This document describes CUFFT, the NVIDIA CUDA™ (compute unified device architecture) Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets, and it is one of the most important and widely used numerical algorithms, with applications that include computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom, GPU-based FFT implementation.

FFT libraries typically vary in terms of supported transform sizes and data types.
For example, some libraries only implement radix-2 FFTs, restricting the transform size to a power of two, while other implementations support arbitrary transform sizes. This version of the CUFFT library supports the following features:

- 1D, 2D, and 3D transforms of complex and real-valued data
- Batch execution for doing multiple 1D transforms in parallel
- 2D and 3D transform sizes in the range [2, 16384] in any dimension
- 1D transform sizes up to 8 million elements
- In-place and out-of-place transforms for real and complex data
- Double-precision transforms on compatible hardware (GT200 and later GPUs)

CUFFT Types and Definitions

The next sections describe the CUFFT types and transform directions:

- Type cufftHandle
- Type cufftResult
- Type cufftReal
- Type cufftDoubleReal
- Type cufftComplex
- Type cufftDoubleComplex
- CUFFT Transform Types
- CUFFT Transform Directions

Type cufftHandle

    typedef unsigned int cufftHandle;

is a handle type used to store and access CUFFT plans. For example, the user receives a handle after creating a CUFFT plan and uses this handle to execute the plan.

Type cufftResult

    typedef enum cufftResult_t cufftResult;

is an enumeration of values used exclusively as API function return values. The possible return values are defined as follows:

Return Values
    CUFFT_SUCCESS           Any CUFFT operation is successful.
    CUFFT_INVALID_PLAN      CUFFT is passed an invalid plan handle.
    CUFFT_ALLOC_FAILED      CUFFT failed to allocate GPU memory.
    CUFFT_INVALID_TYPE      The user requests an unsupported type.
    CUFFT_INVALID_VALUE     The user specifies a bad memory pointer.
    CUFFT_INTERNAL_ERROR    Used for all internal driver errors.
    CUFFT_EXEC_FAILED       CUFFT failed to execute an FFT on the GPU.
    CUFFT_SETUP_FAILED      The CUFFT library failed to initialize.
    CUFFT_SHUTDOWN_FAILED   The CUFFT library failed to shut down.
    CUFFT_INVALID_SIZE      The user specifies an unsupported FFT size.
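
Because every API function reports its status through this one enumeration, callers often wrap the check in a small helper. The following is only a sketch of that pattern. So that it compiles without the CUDA toolkit, it uses a hypothetical stand-in enumeration, demo_result, whose names mirror the table above but whose numeric values are not the library's. The helpers demo_result_name() and demo_check() are likewise illustrative names and not part of CUFFT.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for cufftResult. Only the names follow the table
 * above; the numeric values are compiler-assigned, NOT the library's. */
typedef enum demo_result_t {
    DEMO_SUCCESS,
    DEMO_INVALID_PLAN,
    DEMO_ALLOC_FAILED,
    DEMO_INVALID_TYPE,
    DEMO_INVALID_VALUE,
    DEMO_INTERNAL_ERROR,
    DEMO_EXEC_FAILED,
    DEMO_SETUP_FAILED,
    DEMO_SHUTDOWN_FAILED,
    DEMO_INVALID_SIZE
} demo_result;

/* Map a status code to the name used in the table above. */
const char *demo_result_name(demo_result r)
{
    switch (r) {
    case DEMO_SUCCESS:         return "CUFFT_SUCCESS";
    case DEMO_INVALID_PLAN:    return "CUFFT_INVALID_PLAN";
    case DEMO_ALLOC_FAILED:    return "CUFFT_ALLOC_FAILED";
    case DEMO_INVALID_TYPE:    return "CUFFT_INVALID_TYPE";
    case DEMO_INVALID_VALUE:   return "CUFFT_INVALID_VALUE";
    case DEMO_INTERNAL_ERROR:  return "CUFFT_INTERNAL_ERROR";
    case DEMO_EXEC_FAILED:     return "CUFFT_EXEC_FAILED";
    case DEMO_SETUP_FAILED:    return "CUFFT_SETUP_FAILED";
    case DEMO_SHUTDOWN_FAILED: return "CUFFT_SHUTDOWN_FAILED";
    case DEMO_INVALID_SIZE:    return "CUFFT_INVALID_SIZE";
    }
    return "unknown result";
}

/* Return 1 on success; otherwise report the failing step on stderr and
 * return 0 so the caller can clean up. */
int demo_check(demo_result r, const char *what)
{
    assert(what != NULL);
    if (r == DEMO_SUCCESS)
        return 1;
    fprintf(stderr, "%s failed: %s\n", what, demo_result_name(r));
    return 0;
}
```

In real code the same switch would be written over cufftResult after including cufft.h, and demo_check() would wrap calls such as plan creation and execution.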
Type cufftReal

    typedef float cufftReal;

is a single-precision, floating-point real data type.

Type cufftDoubleReal

    typedef double cufftDoubleReal;

is a double-precision, floating-point real data type.

Type cufftComplex

    typedef cuComplex cufftComplex;

is a single-precision, floating-point complex data type that consists of interleaved real and imaginary components.

Type cufftDoubleComplex

    typedef cuDoubleComplex cufftDoubleComplex;

is a double-precision, floating-point complex data type that consists of interleaved real and imaginary components.

CUFFT Transform Types

The CUFFT library supports complex- and real-data transforms. The cufftType data type is an enumeration of the types of transform data supported by CUFFT:

    typedef enum cufftType_t {
        CUFFT_R2C = 0x2a,  // Real to complex (interleaved)
        CUFFT_C2R = 0x2c,  // Complex (interleaved) to real
        CUFFT_C2C = 0x29,  // Complex to complex, interleaved
        CUFFT_D2Z = 0x6a,  // Double to double-complex
        CUFFT_Z2D = 0x6c,  // Double-complex to double
        CUFFT_Z2Z = 0x69   // Double-complex to double-complex
    } cufftType;

For complex FFTs, the input and output arrays must interleave the real and imaginary parts (the cufftComplex type). The transform size in each dimension is the number of cufftComplex elements. The CUFFT_C2C constant can be passed to any plan creation function to configure a single-precision complex-to-complex FFT. Pass the CUFFT_Z2Z constant to configure a double-precision complex-to-complex FFT.

For real-to-complex FFTs, the output array holds only the non-redundant complex coefficients. So for an N-element transform, the output array holds N/2+1 cufftComplex terms. For higher-dimensional real transforms of the form N0 × N1 × ... × Nn, the last dimension is cut in half such that the output data is N0 × N1 × ... × (Nn/2+1) complex elements.
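
The size rule for real-data transforms can be written out as plain arithmetic. The sketch below is illustrative only: the function names r2c_complex_count() and r2c_inplace_real_count() are hypothetical helpers, not part of CUFFT, and they assume integer division in N/2+1, as in C.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex elements in the non-redundant output of a
 * real-to-complex transform of logical size
 * dims[0] x dims[1] x ... x dims[rank-1].
 * Only the last dimension shrinks, to dims[rank-1]/2 + 1. */
size_t r2c_complex_count(const int *dims, int rank)
{
    size_t count = 1;
    int i;
    assert(dims != NULL && rank >= 1);
    for (i = 0; i < rank - 1; ++i)
        count *= (size_t)dims[i];
    return count * (size_t)(dims[rank - 1] / 2 + 1);
}

/* Number of real elements a buffer must hold when the same memory is
 * used for the real data and the complex coefficients (in place):
 * two reals per complex element. */
size_t r2c_inplace_real_count(const int *dims, int rank)
{
    return 2 * r2c_complex_count(dims, rank);
}
```

For a 1D transform with N = 256 this gives 129 complex elements, or 258 real elements for an in-place buffer. For a 256 × 128 transform it gives 256 × 65 = 16640 complex elements.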
Therefore, in order to perform an in-place FFT, the user has to pad the input array in the last dimension to (Nn/2+1) complex elements or 2*(N/2+1) real elements. Note that the real-to-complex transform is implicitly forward. Passing the CUFFT_R2C constant to any plan creation function configures a single-precision real-to-complex FFT. Passing the CUFFT_D2Z constant configures a double-precision real-to-complex FFT.

The requirements for complex-to-real FFTs are similar to those for real-to-complex. In this case, the input array holds only the non-redundant, N/2+1 complex coefficients from a real-to-complex transform. The output is simply N elements of type cufftReal. However, for an in-place transform, the input size must be padded to 2*(N/2+1) real elements. The complex-to-real transform is implicitly inverse. Passing the CUFFT_C2R constant to any plan creation function configures a single-precision complex-to-real FFT. Passing the CUFFT_Z2D constant configures a double-precision complex-to-real FFT.

For 1D complex-to-complex transforms, the stride between signals in a batch is assumed to be the number of cufftComplex elements in the logical transform size. However, for real-data FFTs, the distance between signals in a batch depends on whether the transform is in place or out-of-place. For in-place FFTs, the input stride is assumed to be 2*(N/2+1) cufftReal elements or N/2+1 cufftComplex elements. For out-of-place transforms, the input and output strides match the logical transform size (N) and the non-redundant size (N/2+1), respectively.

CUFFT Transform Directions

The CUFFT library defines forward and inverse Fast Fourier Transforms according to the sign of the complex exponential term:

    #define CUFFT_FORWARD -1
    #define CUFFT_INVERSE  1

For higher-dimensional transforms (2D and 3D), CUFFT performs FFTs in row-major or C order. For example, if the user requests a 3D transform plan for sizes X, Y, and Z, CUFFT transforms along Z, Y, and then X.
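
Row-major (C) order means the last dimension is contiguous in memory. As a small sketch of how a host program might address element (x, y, z) of a 3D data set stored this way, consider the helper below. The name index3d() is a hypothetical helper and not a CUFFT function.

```c
#include <assert.h>
#include <stddef.h>

/* Linear offset of element (x, y, z) in a row-major array whose Y and Z
 * extents are ny and nz: z varies fastest, then y, then x. */
size_t index3d(int x, int y, int z, int ny, int nz)
{
    assert(x >= 0 && y >= 0 && y < ny && z >= 0 && z < nz);
    return ((size_t)x * (size_t)ny + (size_t)y) * (size_t)nz + (size_t)z;
}
```

With NY = 64 and NZ = 128, stepping z by one moves one element, stepping y by one moves 128 elements, and stepping x by one moves 64 × 128 = 8192 elements.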
The user can configure column-major FFTs by simply changing the order of the size parameters to the plan creation API functions.

CUFFT performs unnormalized FFTs; that is, performing a forward FFT on an input data set followed by an inverse FFT on the resulting set yields data that is equal to the input scaled by the number of elements. Scaling either transform by the reciprocal of the size of the data set is left for the user to perform as seen fit.

CUFFT API Functions

The CUFFT API is modeled after FFTW (see http://www.fftw.org), which is one of the most popular and efficient CPU-based FFT libraries. FFTW provides a simple configuration mechanism called a plan that completely specifies the optimal (that is, the minimum floating-point operation, or flop) plan of execution for a particular FFT size and data type. The advantage of this approach is that once the user creates a plan, the library stores whatever state is needed to execute the plan multiple times without recalculation of the configuration. The FFTW model works well for CUFFT because different kinds of FFTs require different thread configurations and GPU resources, and plans are a simple way to store and reuse configurations.

The CUFFT library initializes internal data upon the first invocation of an API function. Therefore, all API functions could return the CUFFT_SETUP_FAILED error code if the library fails to initialize. CUFFT shuts down automatically when all user-created FFT plans are destroyed.
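
The unnormalized convention described earlier in this section can be checked on the host with a tiny reference transform. The sketch below is not CUFFT code: cplx is a stand-in for an interleaved complex element, and dft4() and scale() are hypothetical helpers. It uses a direct 4-point DFT, for which the twiddle factors are exactly 1, ±i, and -1, so the arithmetic on small integer inputs is exact.

```c
#include <assert.h>

#define DEMO_N 4

/* Stand-in for an interleaved complex element (real part, imaginary part). */
typedef struct { float x, y; } cplx;

/* Unnormalized 4-point DFT. sign is -1 for forward and +1 for inverse,
 * following the sign-of-the-exponent convention of CUFFT_FORWARD and
 * CUFFT_INVERSE. */
void dft4(const cplx *in, cplx *out, int sign)
{
    /* exp(sign * 2*pi*i * m / 4) for m = 0..3 is 1, sign*i, -1, -sign*i. */
    cplx w[DEMO_N] = { {1.0f, 0.0f}, {0.0f, 0.0f}, {-1.0f, 0.0f}, {0.0f, 0.0f} };
    int j, k;
    assert(sign == -1 || sign == 1);
    w[1].y = (float)sign;
    w[3].y = (float)(-sign);
    for (k = 0; k < DEMO_N; ++k) {
        cplx acc = {0.0f, 0.0f};
        for (j = 0; j < DEMO_N; ++j) {
            cplx t = w[(j * k) % DEMO_N];
            acc.x += in[j].x * t.x - in[j].y * t.y;
            acc.y += in[j].x * t.y + in[j].y * t.x;
        }
        out[k] = acc;
    }
}

/* Multiply every element by factor, e.g. 1.0f / n after a round trip. */
void scale(cplx *data, int n, float factor)
{
    int i;
    for (i = 0; i < n; ++i) {
        data[i].x *= factor;
        data[i].y *= factor;
    }
}
```

Applying dft4() with sign -1 and then with sign +1 returns the input multiplied by 4. Calling scale() with 1.0f / 4 afterwards recovers the original values. On the GPU the same scaling would typically be done in a small kernel over the transformed data.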
The CUFFT functions are as follows:

- Function cufftPlan1d()
- Function cufftPlan2d()
- Function cufftPlan3d()
- Function cufftDestroy()
- Function cufftExecC2C()
- Function cufftExecR2C()
- Function cufftExecC2R()
- Function cufftExecZ2Z()
- Function cufftExecD2Z()
- Function cufftExecZ2D()

Function cufftPlan1d()

    cufftResult cufftPlan1d( cufftHandle *plan, int nx, cufftType type, int batch );

creates a 1D FFT plan configuration for a specified signal size and data type. The batch input parameter tells CUFFT how many 1D transforms to configure.

Input
    plan    Pointer to a cufftHandle object
    nx      The transform size (e.g., 256 for a 256-point FFT)
    type    The transform data type (e.g., CUFFT_C2C for complex to complex)
    batch   Number of transforms of size nx

Output
    plan    Contains a CUFFT 1D plan handle value

Return Values
    CUFFT_SETUP_FAILED   CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE   The nx parameter is not a supported size.
    CUFFT_INVALID_TYPE   The type parameter is not supported.
    CUFFT_ALLOC_FAILED   Allocation of GPU resources for the plan failed.
    CUFFT_SUCCESS        CUFFT successfully created the FFT plan.

Function cufftPlan2d()

    cufftResult cufftPlan2d( cufftHandle *plan, int nx, int ny, cufftType type );

creates a 2D FFT plan configuration according to specified signal sizes and data type. This function is the same as cufftPlan1d() except that it takes a second size parameter, ny, and does not support batching.

Input
    plan    Pointer to a cufftHandle object
    nx      The transform size in the X dimension (number of rows)
    ny      The transform size in the Y dimension (number of columns)
    type    The transform data type (e.g., CUFFT_C2R for complex to real)

Output
    plan    Contains a CUFFT 2D plan handle value

Return Values
    CUFFT_SETUP_FAILED   CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE   The nx or ny parameter is not a supported size.
    CUFFT_INVALID_TYPE   The type parameter is not supported.
    CUFFT_ALLOC_FAILED   Allocation of GPU resources for the plan failed.
    CUFFT_SUCCESS        CUFFT successfully created the FFT plan.

Function cufftPlan3d()

    cufftResult cufftPlan3d( cufftHandle *plan, int nx, int ny, int nz, cufftType type );

creates a 3D FFT plan configuration according to specified signal sizes and data type. This function is the same as cufftPlan2d() except that it takes a third size parameter, nz.

Input
    plan    Pointer to a cufftHandle object
    nx      The transform size in the X dimension
    ny      The transform size in the Y dimension
    nz      The transform size in the Z dimension
    type    The transform data type (e.g., CUFFT_R2C for real to complex)

Output
    plan    Contains a CUFFT 3D plan handle value

Return Values
    CUFFT_SETUP_FAILED   CUFFT library failed to initialize.
    CUFFT_INVALID_SIZE   Parameter nx, ny, or nz is not a supported size.
    CUFFT_INVALID_TYPE   The type parameter is not supported.
    CUFFT_ALLOC_FAILED   Allocation of GPU resources for the plan failed.
    CUFFT_SUCCESS        CUFFT successfully created the FFT plan.

Function cufftDestroy()

    cufftResult cufftDestroy( cufftHandle plan );

frees all GPU resources associated with a CUFFT plan and destroys the internal plan data structure. This function should be called once a plan is no longer needed to avoid wasting GPU memory.

Input
    plan    The cufftHandle object of the plan to be destroyed.

Return Values
    CUFFT_SETUP_FAILED      CUFFT library failed to initialize.
    CUFFT_SHUTDOWN_FAILED   CUFFT library failed to shut down.
    CUFFT_INVALID_PLAN      The plan parameter is not a valid handle.
    CUFFT_SUCCESS           CUFFT successfully destroyed the FFT plan.

Function cufftExecC2C()

    cufftResult cufftExecC2C( cufftHandle plan, cufftComplex *idata, cufftComplex *odata, int direction );

executes a CUFFT single-precision complex-to-complex transform plan as specified by direction.
CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform.

Input
    plan        The cufftHandle object for the plan to update
    idata       Pointer to the single-precision complex input data (in GPU memory) to transform
    odata       Pointer to the single-precision complex output data (in GPU memory)
    direction   The transform direction: CUFFT_FORWARD or CUFFT_INVERSE

Output
    odata       Contains the complex Fourier coefficients

Return Values
    CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
    CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE   The idata, odata, and/or direction parameter is not valid.
    CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
    CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecR2C()

    cufftResult cufftExecR2C( cufftHandle plan, cufftReal *idata, cufftComplex *odata );

executes a CUFFT single-precision real-to-complex (implicitly forward) transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the non-redundant Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform. (See "CUFFT Transform Types" for details on real data FFTs.)

Input
    plan    The cufftHandle object for the plan to update
    idata   Pointer to the single-precision real input data (in GPU memory) to transform
    odata   Pointer to the single-precision complex output data (in GPU memory)

Output
    odata   Contains the complex Fourier coefficients

Return Values
    CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
    CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE   The idata and/or odata parameter is not valid.
    CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
    CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecC2R()

    cufftResult cufftExecC2R( cufftHandle plan, cufftComplex *idata, cufftReal *odata );

executes a CUFFT single-precision complex-to-real (implicitly inverse) transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. The input array holds only the non-redundant complex Fourier coefficients. This function stores the real output values in the odata array. If idata and odata are the same, this method does an in-place transform. (See "CUFFT Transform Types" for details on real data FFTs.)

Input
    plan    The cufftHandle object for the plan to update
    idata   Pointer to the single-precision complex input data (in GPU memory) to transform
    odata   Pointer to the single-precision real output data (in GPU memory)

Output
    odata   Contains the real-valued output data

Return Values
    CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
    CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE   The idata and/or odata parameter is not valid.
    CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
    CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecZ2Z()

    cufftResult cufftExecZ2Z( cufftHandle plan, cufftDoubleComplex *idata, cufftDoubleComplex *odata, int direction );

executes a CUFFT double-precision complex-to-complex transform plan as specified by direction. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform.

Input
    plan        The cufftHandle object for the plan to update
    idata       Pointer to the double-precision complex input data (in GPU memory) to transform
    odata       Pointer to the double-precision complex output data (in GPU memory)
    direction   The transform direction: CUFFT_FORWARD or CUFFT_INVERSE

Output
    odata       Contains the complex Fourier coefficients

Return Values
    CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
    CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE   The idata, odata, and/or direction parameter is not valid.
    CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
    CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecD2Z()

    cufftResult cufftExecD2Z( cufftHandle plan, cufftDoubleReal *idata, cufftDoubleComplex *odata );

executes a CUFFT double-precision real-to-complex (implicitly forward) transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the non-redundant Fourier coefficients in the odata array. If idata and odata are the same, this method does an in-place transform. (See "CUFFT Transform Types" for details on real data FFTs.)

Input
    plan    The cufftHandle object for the plan to update
    idata   Pointer to the double-precision real input data (in GPU memory) to transform
    odata   Pointer to the double-precision complex output data (in GPU memory)

Output
    odata   Contains the complex Fourier coefficients

Return Values
    CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
    CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE   The idata and/or odata parameter is not valid.
    CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
    CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Function cufftExecZ2D()

    cufftResult cufftExecZ2D( cufftHandle plan, cufftDoubleComplex *idata, cufftDoubleReal *odata );

executes a CUFFT double-precision complex-to-real (implicitly inverse) transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. The input array holds only the non-redundant complex Fourier coefficients. This function stores the real output values in the odata array. If idata and odata are the same, this method does an in-place transform. (See "CUFFT Transform Types" for details on real data FFTs.)

Input
    plan    The cufftHandle object for the plan to update
    idata   Pointer to the double-precision complex input data (in GPU memory) to transform
    odata   Pointer to the double-precision real output data (in GPU memory)

Output
    odata   Contains the real-valued output data

Return Values
    CUFFT_SETUP_FAILED    CUFFT library failed to initialize.
    CUFFT_INVALID_PLAN    The plan parameter is not a valid handle.
    CUFFT_INVALID_VALUE   The idata and/or odata parameter is not valid.
    CUFFT_EXEC_FAILED     CUFFT failed to execute the transform on GPU.
    CUFFT_SUCCESS         CUFFT successfully executed the FFT plan.

Accuracy and Performance

The CUFFT library implements several FFT algorithms, each having different performance and accuracy. The best performance paths correspond to transform sizes that meet two criteria:

1. Fit in CUDA's shared memory
2. Are powers of a single factor (for example, powers of two)

These transforms are also the most accurate due to the numeric stability of the chosen FFT algorithm. For transform sizes that meet the first criterion but not the second, CUFFT uses a more general mixed-radix FFT algorithm that is usually slower and less numerically accurate.
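
An application that can choose its transform size might test the second criterion before creating a plan. The following sketch shows one way to do that. The helper name single_small_prime_base() is hypothetical, and the set of primes {2, 3, 5, 7} is an assumption taken from the examples mentioned in this section, not a statement of what the library checks internally. The first criterion (fitting in shared memory) is not modeled here.

```c
#include <assert.h>

/* If n is a positive power of exactly one of the primes 2, 3, 5, or 7,
 * return that prime; otherwise return 0. */
int single_small_prime_base(int n)
{
    static const int primes[] = { 2, 3, 5, 7 };
    int i;
    if (n < 2)
        return 0;
    for (i = 0; i < 4; ++i) {
        int p = primes[i];
        int m = n;
        while (m % p == 0)   /* strip every factor of p */
            m /= p;
        if (m == 1)          /* nothing left, so n == p^k */
            return p;
    }
    return 0;
}
```

For example, 256 and 243 qualify (bases 2 and 3), while 360 = 2^3 × 3^2 × 5 mixes several factors and does not.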
Therefore, if possible, it is best to use sizes that are powers of two or four, or powers of other small primes (such as three, five, or seven). In addition, the power-of-two FFT algorithm in CUFFT makes maximum use of shared memory by blocking sub-transforms for signals that do not meet the first criterion.

For transform sizes that do not meet either criterion above, CUFFT uses an out-of-place, mixed-radix algorithm that stores all intermediate results in CUDA's global GPU memory. Although this algorithm uses optimized transform modules for many factors, it has generally lower performance because global memory has less bandwidth than shared memory. The one exception is large 1D transforms, where CUFFT uses a distributed algorithm that performs a 1D FFT using a 2D FFT, where the dimensions of the 2D transform are factors of the 1D size. This path attempts to utilize the faster transforms mentioned above even if the signal size is too large to fit in CUDA's shared memory.

Many FFT algorithms for real data exploit the conjugate symmetry property to reduce computation and memory cost by roughly half. However, CUFFT does not implement any specialized algorithms for real data, and so there is no direct performance benefit to using real-to-complex (or complex-to-real) plans instead of complex-to-complex. For this release, the real data API exists primarily for convenience, so that users do not have to build interleaved complex data from a real data source before using the library.

For 1D transforms, the performance for real data will either match or be less than the complex equivalent (due to an extra copy in some cases).
However, there is usually a performance benefit to using real data for 2D and 3D FFTs, since all transforms but the last dimension operate on roughly half the logical signal size.

CUFFT Code Examples

This section provides simple examples of 1D, 2D, and 3D complex and real data transforms that use CUFFT to perform forward and inverse FFTs.

1D Complex-to-Complex Transforms

    #define NX 256
    #define BATCH 10

    cufftHandle plan;
    cufftComplex *data;
    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH);

    /* Create a 1D FFT plan. */
    cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);

    /* Use the CUFFT plan to transform the signal in place. */
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);

    /* Inverse transform the signal in place. */
    cufftExecC2C(plan, data, data, CUFFT_INVERSE);

    /* Note:
       (1) Divide by number of elements in data set to get back original data
       (2) Identical pointers to input and output arrays implies in-place transformation
    */

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data);

1D Real-to-Complex Transforms

    #define NX 256
    #define BATCH 10

    cufftHandle plan;
    cufftComplex *data;
    cudaMalloc((void**)&data, sizeof(cufftComplex)*(NX/2+1)*BATCH);

    /* Create a 1D FFT plan. */
    cufftPlan1d(&plan, NX, CUFFT_R2C, BATCH);

    /* Use the CUFFT plan to transform the signal in place. */
    cufftExecR2C(plan, (cufftReal*)data, data);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data);

2D Complex-to-Complex Transforms

    #define NX 256
    #define NY 128

    cufftHandle plan;
    cufftComplex *idata, *odata;
    cudaMalloc((void**)&idata, sizeof(cufftComplex)*NX*NY);
    cudaMalloc((void**)&odata, sizeof(cufftComplex)*NX*NY);

    /* Create a 2D FFT plan. */
    cufftPlan2d(&plan, NX, NY, CUFFT_C2C);

    /* Use the CUFFT plan to transform the signal out of place. */
    cufftExecC2C(plan, idata, odata, CUFFT_FORWARD);

    /* Note: idata != odata indicates an out-of-place transformation
       to CUFFT at execution time. */

    /* Inverse transform the signal in place. */
    cufftExecC2C(plan, odata, odata, CUFFT_INVERSE);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(idata);
    cudaFree(odata);

2D Complex-to-Real Transforms

    #define NX 256
    #define NY 128

    cufftHandle plan;
    cufftComplex *idata;
    cufftReal *odata;
    cudaMalloc((void**)&idata, sizeof(cufftComplex)*NX*NY);
    cudaMalloc((void**)&odata, sizeof(cufftReal)*NX*NY);

    /* Create a 2D FFT plan. */
    cufftPlan2d(&plan, NX, NY, CUFFT_C2R);

    /* Use the CUFFT plan to transform the signal out of place. */
    cufftExecC2R(plan, idata, odata);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(idata);
    cudaFree(odata);

3D Complex-to-Complex Transforms

    #define NX 64
    #define NY 64
    #define NZ 128

    cufftHandle plan;
    cufftComplex *data1, *data2;
    cudaMalloc((void**)&data1, sizeof(cufftComplex)*NX*NY*NZ);
    cudaMalloc((void**)&data2, sizeof(cufftComplex)*NX*NY*NZ);

    /* Create a 3D FFT plan. */
    cufftPlan3d(&plan, NX, NY, NZ, CUFFT_C2C);

    /* Transform the first signal in place. */
    cufftExecC2C(plan, data1, data1, CUFFT_FORWARD);

    /* Transform the second signal using the same plan. */
    cufftExecC2C(plan, data2, data2, CUFFT_FORWARD);

    /* Destroy the CUFFT plan. */
    cufftDestroy(plan);
    cudaFree(data1);
    cudaFree(data2);