NMath Premium Tuning - CenterSpace

NMath Premium is the new CenterSpace GPU-accelerated math and statistics library for the .NET platform. The supported NVIDIA GPU routines include both a range of dense linear algebra algorithms and 1D & 2D Fast Fourier Transforms (FFTs). NMath Premium is designed to be a near drop-in replacement for NMath; however, there are a few important configuration differences and additional logging capabilities specific to the premium product, which I will discuss in this article.

NMath Premium will be released June 11. For immediate access, sign up here to join the beta program.

Crossover Thresholds

NMath Premium makes it very easy to take advantage of the GPU’s performance benefits by hiding the complexities of data formatting, GPU memory management, algorithms, and diagnostics. Because any GPU computation incurs a memory transfer overhead, NMath Premium automatically routes computations between the CPU and GPU as appropriate for best performance. Additionally, there are configuration options that globally force all computations, regardless of problem size, to either the CPU or the GPU. This can be useful for debugging or performance profiling. Let’s continue with an example.

   NMathConfiguration.ProcessorSharingMethod = ProcessorManagement.ProblemSize;

   FloatComplexForward1DFFT fft = new FloatComplexForward1DFFT( 1024*1024 );
   FloatComplexVector signal = new FloatComplexVector( 1024*1024, new RandGenUniform( -1, 1, seed) );

   fft.FFTInPlace( signal ); // Execute the million point FFT

The first line directs NMath Premium to route GPU-enabled routines automatically between the CPU and GPU depending on problem size. Small problems remain on the CPU and large problems are off-loaded to the GPU. The ProblemSize setting is the default behavior, so this line of code is not strictly required. The last three lines of code, which build the FFT object, populate the random signal vector, and execute the million-point FFT, are standard NMath code. Except for configuration options, the NMath Premium API is unchanged from NMath.

The problem-size cross-over threshold can be tuned separately for every GPU-enabled algorithm. The optimal cross-over threshold depends primarily on the computational precision of the problem (Double or Float) and the installed hardware. Applications frequently need to solve similarly sized problems repeatedly, and the threshold can be adjusted to place the computation where needed. By default, 1D FFTs with a length over 16384 and 2D FFTs with a size larger than 256*256 execute on the GPU.

The cross-over threshold for any GPU-enabled algorithm can be set with the following code.

NMathConfiguration.SetCrossoverThreshold( 
   NMathConfiguration.GraphicsProcessorFunctions.FFT1D, 2000);
   ... 
// Now execute a 1D FFT on a 2100 point signal on the GPU.
fft = new FloatComplexForward1DFFT( 2100 );
signal = new FloatComplexVector( 2100, new RandGenUniform( -1, 1, seed) );
fft.FFTInPlace( signal ); // Execute the 2100 point FFT

With this setting, all (complex) 1D FFTs with a length greater than 2000 will execute on the GPU.
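The routing rule described in this section can be pictured with a small stand-alone sketch. This is not NMath Premium's internal dispatch code, which this article does not show. The Processor enum, the CrossoverSketch class, and the Route method are hypothetical names used only for illustration, and the strict greater-than comparison is an assumption based on the wording "length over 16384" and "greater than 2000".

```csharp
using System;

// Hypothetical illustration only; NMath Premium's real dispatch logic is internal.
public enum Processor { Cpu, Gpu }

public static class CrossoverSketch
{
    // Default 1D FFT threshold quoted in the article.
    public const int DefaultFft1DThreshold = 16384;

    // Assumed rule: problems strictly larger than the threshold go to the GPU.
    public static Processor Route(int problemSize, int threshold)
    {
        return problemSize > threshold ? Processor.Gpu : Processor.Cpu;
    }

    public static void Main()
    {
        // A 2100-point FFT stays on the CPU under the default threshold...
        Console.WriteLine(Route(2100, DefaultFft1DThreshold));        // prints "Cpu"
        // ...but moves to the GPU once the threshold is lowered to 2000.
        Console.WriteLine(Route(2100, 2000));                         // prints "Gpu"
        // The million-point FFT exceeds the default threshold.
        Console.WriteLine(Route(1024 * 1024, DefaultFft1DThreshold)); // prints "Gpu"
    }
}
```

Seen this way, lowering the threshold does not change the API calls at all; it only moves the size at which the same call switches processors.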

Logging and Troubleshooting

Because NMath Premium automatically falls back to CPU execution if there are any problems with the installed NVIDIA GPU (or if there isn’t an NVIDIA GPU installed at all), we often found ourselves wanting to verify that our code was actually executing on the GPU. To verify that our small 2100-point FFT did indeed run on the GPU, we can enable GPU logging, run the example, and then check the log file, NMathConfiguration.log. The log file will reside next to the executable unless the LogLocation property has been set to a different directory. The following line of code will enable GPU logging.

NMathConfiguration.EnableGPULogging = true;

Logging should only be used while debugging and must be turned on before any NMath classes are created. In the current release of NMath Premium, logging cannot be dynamically turned on or off, but this will change in the future to allow specific sections of code to create log entries. Currently, either the entire program is logging or the entire program is not logging.
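The ordering requirement can be sketched by combining the snippets already shown in this article. This is a minimal sketch, not a definitive setup: it assumes the NMath Premium assembly is referenced and that the classes live in the CenterSpace.NMath.Core namespace. The LoggingOrderSketch class name and the seed value are illustrative only.

```csharp
using CenterSpace.NMath.Core; // assumed namespace for the NMath classes used below

public static class LoggingOrderSketch
{
    public static void Main()
    {
        // 1. Configure NMath Premium first, before any NMath object exists.
        NMathConfiguration.EnableGPULogging = true;
        NMathConfiguration.SetCrossoverThreshold(
            NMathConfiguration.GraphicsProcessorFunctions.FFT1D, 2000 );

        // 2. Only now create NMath objects, so that their creation is logged.
        int seed = 0x124; // illustrative seed value, not from the article
        FloatComplexForward1DFFT fft = new FloatComplexForward1DFFT( 2100 );
        FloatComplexVector signal =
            new FloatComplexVector( 2100, new RandGenUniform( -1, 1, seed ) );

        fft.FFTInPlace( signal ); // Execute the 2100 point FFT
    }
}
```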

Running our 2100-point FFT above, we see the following entries near the end of the log file (many lines have been trimmed from the head of the log file for clarity here).

   ...
Instantiating GPUManagerKernel: class CenterSpace.NMath.Kernel.GPUKern....

GPU Kernel: GeForce GT 525M CUDA hardware installed and ready to use by NMath Premium.
GPU Kernel: CUDA Driver Version 5.0 detected.
GPU Kernel: CUDA Runtime Version 5.0 detected.

Instantiating FFTManagerKernelGPU: class CenterSpace.NMath.Kernel.FFTMan....
NMath created GPU executing 1-D, FLOAT REAL, 2100-point FFT object.
Instantiating FFTKernelInstantiator: class CenterSpace.NMath.Kernel.FF.....

The line beginning “NMath created GPU executing” reports that we have successfully created an FFT object that will execute its 2100-point FFTs on the GPU. Every time such a GPU-active FFT object is created, a similar line will be added to this log file. The three lines starting with “GPU Kernel:” report the type of GPU hardware found and that the correct NVIDIA CUDA driver and runtime have been detected. If any hardware or driver configuration problems are detected that prevent NMath Premium from using the GPU, the errors will be reported in this section of the log file. Additional GPU hardware and driver setup information can be found by running a diagnostic program, deviceQuery.exe, bundled with NMath Premium (found in the Assemblies/x64 and Assemblies/x86 directories).
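When the log grows long, a small helper can pull out just the lines that confirm GPU execution. The following is a sketch under stated assumptions rather than a tool shipped with NMath Premium: the GpuLogCheck class and the FindGpuLines method are hypothetical, and the marker text "GPU executing" is taken from the sample log entry above, so other releases may word it differently.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; not part of NMath Premium.
public static class GpuLogCheck
{
    // Marker text copied from the sample log entry in this article.
    private const string GpuMarker = "GPU executing";

    // Returns only the log lines that report a GPU-executing object.
    public static List<string> FindGpuLines(IEnumerable<string> lines)
    {
        return lines.Where(line => line.Contains(GpuMarker)).ToList();
    }

    public static void Main(string[] args)
    {
        // Default file name from the article; pass another path as the first argument.
        string path = args.Length > 0 ? args[0] : "NMathConfiguration.log";
        if (!File.Exists(path))
        {
            Console.WriteLine("Log file not found: " + path);
            return;
        }

        List<string> hits = FindGpuLines(File.ReadLines(path));
        foreach (string line in hits)
        {
            Console.WriteLine(line);
        }
        Console.WriteLine(hits.Count + " GPU-executing object(s) logged.");
    }
}
```

A count of zero after running GPU-sized problems would suggest that NMath Premium fell back to the CPU, in which case the “GPU Kernel:” section of the log is the place to look next.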

Summary

With a few lines of code, .NET developers can now write GPU-accelerated software with NMath Premium. Applications currently using NMath can easily be accelerated by installing NMath Premium, with few if any code changes. Small problems remain on the CPU and large problems are routed to the GPU, and the programmer has control over the cross-over thresholds for all GPU-enabled classes in NMath Premium. A logging capability is provided to help with any GPU hardware or driver issues and to verify that your FFTs are executing on the installed NVIDIA GPU.

For more information on NMath Premium tuning, see the chapter on NMath Premium in the NMath User’s Guide.

Happy Computing,

-Paul Shirkey
