
Archive for the ‘MKL’ Category

Absolute value of complex numbers

Tuesday, March 8th, 2011

Max Hadley from Schlumberger in Southampton, UK came to us with an interesting bug report regarding the MaxAbsValue() and MaxAbsIndex() functions as applied to complex vectors in the NMathFunctions class.  Most of the time these methods worked as expected, but they would intermittently fail to correctly identify the maximum element in large vectors with similar elements.

In researching the MKL documentation, we found that this was in fact not a problem from MKL’s perspective. MKL uses the L1-norm, or Manhattan distance from zero, as the metric for the absolute value of a complex number. This simply means that it adds together the absolute values of the real and imaginary components:

|a + bi| = |a| + |b|   (the absolute value of a complex number according to BLAS)

We had expected the absolute value to be computed via the L2-norm, or Euclidean distance from zero, which is sometimes referred to as the magnitude. Interestingly, MKL uses the L1-norm because that is the norm defined by the underlying BLAS standard, and apparently the original designers of BLAS chose that norm for computational efficiency. This means that all BLAS-based linear algebra packages compute the norm of a complex vector in this way – and it’s probably not what most people expect.

This was a tricky bug to find for two reasons. First, substituting one norm for the other rarely produced incorrect results, because the real component generally dominates the magnitude. Second, the absolute value of a single complex number (as opposed to the maximum absolute value over a complex vector) has always been computed using the L2-norm.
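For example, the two norms can rank elements differently. Here is a minimal standalone C# sketch (it uses System.Numerics.Complex rather than NMath types, purely for illustration): 3 + 4i is the larger element under the L1-norm, while 5 + i is larger under the L2-norm.

using System;
using System.Numerics;

class NormDemo
{
    // L1-norm (Manhattan distance): |Re| + |Im|, the metric used by BLAS/MKL.
    static double L1( Complex z ) { return Math.Abs( z.Real ) + Math.Abs( z.Imaginary ); }

    // L2-norm (Euclidean distance): the usual magnitude sqrt( Re^2 + Im^2 ).
    static double L2( Complex z ) { return Complex.Abs( z ); }

    static void Main()
    {
        Complex a = new Complex( 3.0, 4.0 );   // L1 = 7.0,  L2 = 5.0
        Complex b = new Complex( 5.0, 1.0 );   // L1 = 6.0,  L2 ~ 5.10

        Console.WriteLine( "L1 picks " + ( L1( a ) > L1( b ) ? "3+4i" : "5+i" ) );   // 3+4i
        Console.WriteLine( "L2 picks " + ( L2( a ) > L2( b ) ? "3+4i" : "5+i" ) );   // 5+i
    }
}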

Now that we had found the problem, we faced the unenviable task of making our API consistent while still interfacing with MKL and the way it finds the maximum absolute value element in a vector of complex numbers. We started by appending a '1' to all complex versions of the min and max abs methods that use MKL, and which therefore use the L1-norm to compute the absolute value of complex numbers:

public static int MaxAbs1Index( FloatComplexVector v )
public static float MaxAbs1Value( FloatComplexVector v )
public static int MinAbs1Index( FloatComplexVector v )
public static float MinAbs1Value( FloatComplexVector v )
public static int MaxAbs1Index( DoubleComplexVector v )
public static double MaxAbs1Value( DoubleComplexVector v )
public static int MinAbs1Index( DoubleComplexVector v )
public static double MinAbs1Value( DoubleComplexVector v )

And we have subsequently written new methods that compute the maximum and minimum absolute values of a complex vector according to the L2-norm, or Euclidean distance, of its elements.  Users should be aware that these methods do not use MKL:

public static int MaxAbsIndex( FloatComplexVector v )
public static float MaxAbsValue( FloatComplexVector v )
public static int MinAbsIndex( FloatComplexVector v )
public static float MinAbsValue( FloatComplexVector v )
public static int MaxAbsIndex( DoubleComplexVector v )
public static double MaxAbsValue( DoubleComplexVector v )
public static int MinAbsIndex( DoubleComplexVector v )
public static double MinAbsValue( DoubleComplexVector v )
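As a rough usage sketch (the vector contents here are made up, and the DoubleComplexVector array constructor is assumed), the two families can pick different elements for the same vector:

DoubleComplexVector v = new DoubleComplexVector( new DoubleComplex[] {
  new DoubleComplex( 3, 4 ),    // L1-norm = 7.0, L2-norm = 5.0
  new DoubleComplex( 5, 1 )     // L1-norm = 6.0, L2-norm ~ 5.10
} );

int i1 = NMathFunctions.MaxAbs1Index( v );   // 0 -- the MKL/BLAS L1-norm picks 3+4i
int i2 = NMathFunctions.MaxAbsIndex( v );    // 1 -- the Euclidean norm picks 5+i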

We hope the change is intuitive and useful.

Darren

MKL Memory Leak?

Wednesday, January 28th, 2009

We recently heard from an NMath user:

I am seeing a memory accumulation in my application (which uses NMath Core 2.5). From my memory profiler it looks like it could be an allocation in DotNetBlas.Product(), within the MKL dgemm() function.

I understand that MKL is designed such that memory is not released until the application closes. However, as this application runs in new worker threads all the time, I am wondering if each new thread is holding onto its own memory for Product().

I’ve tried setting the system variable MKL_DISABLE_FAST_MM – but this seems to have made no difference – would I expect this to have an immediate effect (after re-starting the application)? Is there any other way within NMath to force MKL to release memory?

It’s true that for performance reasons, memory allocated by the Intel Math Kernel Library (MKL) is not released. This is by design and is a one-time occurrence for MKL routines that require workspace memory buffers. However, this workspace appears to be allocated on a per-thread basis, which can be a problem for applications that spawn large numbers of threads. As the MKL documentation delicately puts it, “the user should be aware that some tools might report this as a memory leak”.

There are two solutions for multithreaded applications to avoid continuous memory accumulation:

  1. Use a thread pool, so the number of new threads is bounded by the size of the pool (see the sketch after this list).
  2. Use the MKL_FreeBuffers() function to free the memory allocated by the MKL memory manager.
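As a rough illustration of the first option, here is a hedged C# sketch that pushes the matrix work through the .NET thread pool instead of spawning a new worker thread per job. The matrix sizes, the use of NMathFunctions.Product(), and the CenterSpace.NMath.Core namespace are illustrative assumptions; adapt to your own code.

using System.Threading;
using CenterSpace.NMath.Core;

class PooledMatrixWork
{
  static void Main()
  {
    DoubleMatrix a = new DoubleMatrix( 500, 500, 1.0, 0.001 );
    DoubleMatrix b = new DoubleMatrix( 500, 500, 2.0, 0.001 );

    using ( CountdownEvent done = new CountdownEvent( 100 ) )
    {
      for ( int i = 0; i < 100; i++ )
      {
        // Work items share a bounded set of pool threads, so MKL's per-thread
        // workspace buffers are allocated only once per pool thread.
        ThreadPool.QueueUserWorkItem( delegate
        {
          DoubleMatrix c = NMathFunctions.Product( a, b );
          done.Signal();
        } );
      }
      done.Wait();
    }
  }
}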

The MKL_FreeBuffers() function is not currently exposed in NMath, but will be added in the next release. In the meantime, you can add this function to Kernel.cpp in NMath Core, and rebuild:

static void MklFreeBuffers() {
  BLAS_PREFIX(MKL_FreeBuffers());
}

Or, if you want some console output to confirm that memory is being released, try this:

static void MklFreeBuffers() {

  MKL_INT64 AllocatedBytes;
  int N_AllocatedBuffers;

  AllocatedBytes = MKL_MemStat( &N_AllocatedBuffers );
  System::Console::WriteLine( "BEFORE: {0} bytes in {1} buffers", AllocatedBytes, N_AllocatedBuffers );

  BLAS_PREFIX(MKL_FreeBuffers());

  AllocatedBytes = MKL_MemStat( &N_AllocatedBuffers );
  System::Console::WriteLine( "AFTER: {0} bytes in {1} buffers", AllocatedBytes, N_AllocatedBuffers );

}

Once you’ve rebuilt NMath Core, you’d use the new method like so:

using CenterSpace.NMath.Kernel;
...
DotNetBlas.MklFreeBuffers();

Note that some care should be taken when calling MklFreeBuffers(), since a drop in performance may occur for any subsequent MKL functions within the same thread, due to reallocation of buffers. Furthermore, given the cost of freeing the buffers themselves, rather than calling MklFreeBuffers() at the end of each thread, it might be more performant to do so after every n threads, or perhaps even based on the total memory usage of your program.
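For example, here is a rough sketch of the "every n threads" idea. The helper class, the OnWorkerFinished() method, and the threshold of 16 are purely hypothetical; each worker thread would call OnWorkerFinished() when its MKL work is done.

using System.Threading;
using CenterSpace.NMath.Kernel;

static class MklBufferPolicy
{
  const int FreeEvery = 16;     // arbitrary threshold, tune for your workload
  static int completed = 0;

  // Call at the end of each worker thread's MKL work; only every
  // FreeEvery-th caller pays the cost of releasing the buffers.
  public static void OnWorkerFinished()
  {
    if ( Interlocked.Increment( ref completed ) % FreeEvery == 0 )
    {
      DotNetBlas.MklFreeBuffers();
    }
  }
}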

Ken