An Introduction to Linear Algebra on the GPU

NMath Premium was designed to give .NET developers an easy-to-follow path to GPU performance without having to wade through the complexities and attendant details of GPU programming. NMath Premium allows developers to build once and run anywhere without concerning themselves with their users’ installed GPU models and versions, or even whether a GPU is present. NMath Premium is designed with fail-safe CPU fallbacks based on problem size, installed GPU hardware, and configuration settings. NMath Premium supports a complete set of dense linear algebra operations that execute on a wide class of NVIDIA GPUs, all using the intuitive, easy-to-use NMath API. As a result, NMath Premium not only offers superior performance with GPU-enabled linear algebra functions but also leverages these GPU-enabled classes internally in a wide range of algorithms.

NMath Premium will be released June 11, 2013. For immediate access, sign up here to join the beta program.

An SVD example

After you install NMath Premium and add its assemblies to your project, the following example computes a large SVD on the GPU. The NMath API has been largely preserved in NMath Premium, so the code will be familiar to current NMath users: its syntax is identical to NMath. Nearly all NMath code will run correctly without edits and can be dropped into an NMath Premium project.

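   // Seed for the random number generator (arbitrary value chosen for this example)
   int seed = 42;

   // Build a dense random float matrix with entries drawn uniformly from [-1, 1]
   var A = new FloatMatrix( 5000, 5000, new RandGenUniform( -1, 1, seed));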

   // Build the SVD server and request the right vectors
   var server = new FloatSVDecompServer();
   server.ComputeLeftVectors = false;
   server.ComputeRightVectors = true;
   server.ComputeFull = false;

   // Do the SVD
   FloatSVDecomp svd = server.GetDecomp( A );

Run in an NMath Premium project, this 5000x5000 SVD can execute on either the GPU or the CPU depending on the installed hardware and configuration settings, so most new users will want to verify immediately that their decomposition did indeed run on the GPU. To this end, NMath Premium provides a logging feature that lets programmers track where their GPU-aware classes routed their computation. The line of code below enables logging. Because of the associated file writes, logging should be used only while debugging and avoided in production code.

   NMathConfiguration.EnableGPULogging = true;

The log will be written to a file named NMathGPULapack.log located next to the built executable. The location and name of this log file can be modified with the NMathConfiguration class, which contains a number of new features and is worth a perusal. Note that in the current release logging must be configured before any computational operations take place; otherwise an exception will be thrown. Logging cannot currently be turned on or off once NMath Premium has loaded its dependent DLLs and started running computational algorithms.

Having run the simple example above, I see the following in my NMathGPULapack log file.

  cula info:  sgesvd (N, S, 5000, 5000, ... , 5000)
  cula info:  issuing to CPU (work query)
  cula info:  CPU library is lapackcpu.dll
  cula info:  work query returned 654872
  cula info:  done
  cula info:  sgesvd (N, S, 5000, 5000, ... , 5000)
  cula info:  issuing to GPU (over threshold)
  cula info:  done

The first five lines record a query to the LAPACK library to determine the total memory requirements for this operation. This is simply a work query and does not run on the GPU. The final three lines record the operation name (sgesvd), size, where it ran and why. In this case the 5000x5000 SVD ran on the GPU because its size was over the cross-over threshold.
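For longer runs it can be tedious to read the log by eye. The sketch below is a small, hypothetical helper (GpuLogSummary is not part of NMath Premium) that pairs each logged routine name with the "issuing to" line that follows it. It assumes only the log layout shown above; other releases may log differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of NMath Premium: summarizes where each
// logged routine was issued, based only on the log layout shown in this post.
public static class GpuLogSummary
{
    // Matches e.g. "cula info:  sgesvd (N, S, 5000, ...)" and captures "sgesvd".
    private static readonly Regex CallLine =
        new Regex(@"^\s*cula info:\s+(\w+)\s*\(");

    // Matches e.g. "cula info:  issuing to GPU (over threshold)" and
    // captures the target ("GPU") and the reason ("over threshold").
    private static readonly Regex IssueLine =
        new Regex(@"^\s*cula info:\s+issuing to (\w+)\s*\((.*)\)\s*$");

    public static List<string> Summarize(IEnumerable<string> lines)
    {
        var summary = new List<string>();
        string routine = null; // most recent routine name seen, if any

        foreach (string line in lines)
        {
            Match issue = IssueLine.Match(line);
            if (issue.Success && routine != null)
            {
                summary.Add(routine + " -> " + issue.Groups[1].Value
                            + " (" + issue.Groups[2].Value + ")");
                routine = null;
                continue;
            }

            Match call = CallLine.Match(line);
            if (call.Success)
            {
                routine = call.Groups[1].Value;
            }
        }
        return summary;
    }
}
```

Calling GpuLogSummary.Summarize( File.ReadLines( "NMathGPULapack.log" ) ) on the log above would yield one entry for the CPU work query and one for the GPU run, which makes it easy to spot any operation that stayed on the CPU.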


The raison d’être of GPU computation is performance, and new users need to be aware of the various factors that affect GPU performance. For developers leveraging a product like NMath Premium, the three most important factors that determine the performance of a particular algorithm are:

  1. Installed GPU hardware
  2. Size of problem
  3. Computational precision, Single or Double

Many other factors influence how well a particular algorithm runs on a GPU, including its computational complexity and how well it can be adapted to a highly parallel GPU architecture, but the three factors above are paramount to our library users. Following the example above, if you happened to run the SVD on an early-model GeForce laptop GPU, the performance may not have been much better than running it on the CPU, although doing so would have freed your CPU for other tasks, an important collateral benefit of GPU computation. Likewise, computational precision can have a major impact on GPU performance. Given the single-precision graphics origins of GPU architecture, it’s important to match the required computational precision with the proper NVIDIA hardware, particularly if double precision is needed. Lastly, problem size is the primary determinant used by NMath Premium to route problems between the CPU and GPU. Because transferring data to the GPU carries overhead, small problems are retained on the CPU and large problems are shifted to the GPU. This routing can be controlled by the developer if fine control is needed; however, most developers will use NMath Premium in its default configuration with great success.
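To make the routing idea concrete, here is a minimal sketch of a size-based decision with a CPU fallback. It illustrates the concept only and is not NMath Premium's implementation: the names ComputeTarget and RoutingSketch, the way problem size is measured, and any threshold value are assumptions made for this example.

```csharp
using System;

// Illustration only: these types are hypothetical and are not NMath Premium APIs.
public enum ComputeTarget { Cpu, Gpu }

public static class RoutingSketch
{
    // Chooses where a problem would run under a simple cross-over rule.
    // problemSize and crossoverThreshold use whatever size measure the
    // caller picks (assumed here to be a single number, such as a dimension).
    public static ComputeTarget Route(long problemSize,
                                      long crossoverThreshold,
                                      bool gpuAvailable)
    {
        // Fail-safe fallback: with no usable GPU, everything stays on the CPU.
        if (!gpuAvailable)
        {
            return ComputeTarget.Cpu;
        }

        // Small problems stay on the CPU because the cost of moving data
        // to the GPU would outweigh the gain; larger ones go to the GPU.
        return problemSize > crossoverThreshold
            ? ComputeTarget.Gpu
            : ComputeTarget.Cpu;
    }
}
```

With an assumed threshold of 1000, Route( 5000, 1000, true ) selects the GPU, while the same call with gpuAvailable set to false falls back to the CPU.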

-Happy Computing,

Paul Shirkey
