Intel Software Partner

For many computations, NMath uses the Intel® Math Kernel Library (MKL), which contains highly optimized, extensively threaded versions of the C and FORTRAN public domain computing packages known as the BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra PACKage). This gives NMath classes performance levels comparable to C, and often results in performance an order of magnitude faster than non-platform-optimized implementations.

NMath Benchmarks

In the example above, tests were performed on square matrices of varying sizes, and with varying numbers of repetitions. Each test was run 10 times and the average time was computed. (The machine used was a 2.8 GHz Intel Core i7-930 quad core, with 8 GB PC3 8500 DDR3 SDRAM, running 64-bit Microsoft Windows 7 Ultimate.)
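
The measurement procedure described here (time an operation on square matrices, repeat it 10 times, report the mean) can be sketched in a few lines. The following is a minimal illustration in Python, not the benchmark code used for the published figures: the function names `naive_matmul`, `average_time`, and `square_matrix` are hypothetical, the matrix sizes are chosen only to keep the run short, and the triple-loop multiply stands in for a "straight", non-optimized implementation.

```python
import time


def naive_matmul(a, b):
    """Multiply two matrices (lists of rows) with plain triple loops."""
    n, m, p = len(a), len(b), len(b[0])
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                result[i][j] += aik * b[k][j]
    return result


def average_time(fn, runs=10):
    """Call fn() `runs` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs


def square_matrix(n, value=1.0):
    """Build an n-by-n matrix filled with a constant (illustrative data only)."""
    return [[value] * n for _ in range(n)]


if __name__ == "__main__":
    # Sizes are assumptions chosen to finish quickly in pure Python.
    for n in (10, 50, 100):
        a, b = square_matrix(n), square_matrix(n, 2.0)
        seconds = average_time(lambda: naive_matmul(a, b))
        print(f"{n}x{n}: {seconds:.6f} s average over 10 runs")
```

Averaging over several runs smooths out one-off effects such as cache warm-up or background activity; timing an optimized library call with the same `average_time` helper would give a like-for-like ratio.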

NMath offers significantly higher performance than a straight C# implementation, especially for larger matrices. For example, multiplying 1000×1000 matrices using the C# matrix code is on average over 43 times slower than NMath running single-threaded, and 30 times slower than NMath running multithreaded. The data also show that invoking MKL from managed .NET code adds negligible overhead relative to straight C or C++.

For more detailed timing data, and complete code samples, see our performance whitepaper.



For even greater acceleration, the Premium Editions of NMath and NMath Stats leverage the power of the NVIDIA CUDA™ architecture for GPU-accelerated mathematics on the .NET platform. NMath Premium automatically detects the presence of a CUDA-enabled GPU at runtime and seamlessly redirects appropriate computations to it. The library can be configured to specify which problems should be solved by the GPU, and which by the CPU. If a GPU is not present at runtime, the computation automatically falls back to the CPU without error.
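
The routing behavior described above (send suitable problems to the GPU, keep the rest on the CPU, and fall back to the CPU when no GPU is present) can be illustrated with a small dispatcher. This is a conceptual sketch in Python only and does not reflect the NMath Premium API: `make_dispatcher`, `gpu_threshold`, and the stub implementations are all hypothetical names, and the size-based rule is just one assumed way such a policy could be configured.

```python
def make_dispatcher(cpu_impl, gpu_impl=None, gpu_threshold=1000):
    """Return a function that routes each call to the GPU or CPU implementation.

    gpu_impl is None when no GPU was detected (hypothetical convention).
    gpu_threshold is an assumed, configurable minimum problem size for GPU use.
    """
    def dispatch(problem_size, *args):
        # Use the GPU only if one is available and the problem is large enough
        # to be worth the transfer cost; otherwise fall back to the CPU.
        if gpu_impl is not None and problem_size >= gpu_threshold:
            return gpu_impl(*args)
        return cpu_impl(*args)

    return dispatch


# Stub implementations that only report where the work would run.
def cpu_multiply(x, y):
    return ("cpu", x * y)


def gpu_multiply(x, y):
    return ("gpu", x * y)


if __name__ == "__main__":
    with_gpu = make_dispatcher(cpu_multiply, gpu_multiply, gpu_threshold=1000)
    without_gpu = make_dispatcher(cpu_multiply)  # no GPU detected

    print(with_gpu(2000, 3, 4))     # large problem, GPU available
    print(with_gpu(10, 3, 4))       # small problem stays on the CPU
    print(without_gpu(2000, 3, 4))  # no GPU: CPU fallback, no error
```

Keeping the decision in one place means calling code does not change whether or not a GPU is installed, which is the property the paragraph above describes.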

GPU acceleration provides a 2-4x speed-up for many NMath functions. With large data sets running on high-performance GPUs, the speed-up can exceed 10x.


GPU: (1) NVIDIA Tesla M2090: 1 Fermi GPU, 512 CUDA cores, 6GB GDDR5 memory
CPU: Intel Xeon X5670, 2.93 GHz, 6-core with Hyper-Threading (12 threads), 12 MB L3 cache, 32 nm manufacturing process (Westmere)

Learn more by downloading our free whitepaper: NMath Premium: GPU-Accelerated Math Libraries for .NET.