<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>MKL Archives - CenterSpace</title>
	<atom:link href="https://www.centerspace.net/category/mkl/feed" rel="self" type="application/rss+xml" />
	<link>https://www.centerspace.net/category/mkl</link>
	<description>.NET numerical class libraries</description>
	<lastBuildDate>Tue, 01 Mar 2016 21:44:26 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.1.1</generator>
<site xmlns="com-wordpress:feed-additions:1">104092929</site>	<item>
		<title>Precision and Reproducibility in Computing</title>
		<link>https://www.centerspace.net/precision-and-reproducibility-in-computing</link>
					<comments>https://www.centerspace.net/precision-and-reproducibility-in-computing#respond</comments>
		
		<dc:creator><![CDATA[Paul Shirkey]]></dc:creator>
		<pubDate>Mon, 16 Nov 2015 22:32:31 +0000</pubDate>
				<category><![CDATA[MKL]]></category>
		<category><![CDATA[NMath]]></category>
		<category><![CDATA[Object-Oriented Numerics]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[floating point precision]]></category>
		<category><![CDATA[MKL repeatability]]></category>
		<category><![CDATA[MKL reproducibility]]></category>
		<category><![CDATA[NMath repeatability]]></category>
		<category><![CDATA[NMath Reproducibility]]></category>
		<category><![CDATA[repeatability]]></category>
		<category><![CDATA[repeatability in computing]]></category>
		<category><![CDATA[Reproducibility in computing]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=5810</guid>

					<description><![CDATA[<p>Run-to-run reproducibility in computing is often assumed as an obvious truth.  However, software running on modern computer architectures, particularly when coupled with performance-optimized libraries, is often guaranteed to produce reproducible results only up to a certain precision; beyond that, results can and do vary from run to run.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/precision-and-reproducibility-in-computing">Precision and Reproducibility in Computing</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Run-to-run reproducibility in computing is often assumed as an obvious truth.  However, software running on modern computer architectures, particularly when coupled with performance-optimized libraries, is often guaranteed to produce reproducible results only up to a certain precision; beyond that, results can and do vary from run to run.  Reproducibility is bound up with the precision of floating-point types and the resulting rounding, with operation re-ordering, with memory structure and use, and finally with how real numbers are represented internally in a computer&#8217;s registers.</p>
<p>This issue of reproducibility arises for <strong>NMath</strong> users when writing and running unit tests, which is why it&#8217;s important when writing tests to compare floating point numbers only up to their designed precision, at an absolute maximum.  In the IEEE 754 floating point representation, to which virtually all modern computers adhere, the single precision <code>float</code> type uses 32 bits (4 bytes) and offers 24 bits of precision, or about <em>7 decimal digits</em>, while the double precision <code>double</code> type uses 64 bits (8 bytes) and offers 53 bits of precision, or about <em>15 decimal digits</em>.  Few algorithms can achieve significant results to the 15th decimal place, due to rounding, loss of precision in subtraction, and other sources of numerical degradation.  <strong>NMath&#8217;s</strong> numerical results are tested, at a maximum, to the 14th decimal place.</p>
<h4 style="padding-left: 30px;"><em>A Precision Example</em></h4>
<p style="padding-left: 30px;">As an example, what does the following code output?</p>
<pre style="padding-left: 30px;" lang="csharp">      double x = .050000000000000003;
      double y = .050000000000000000;
      if ( x == y )
        Console.WriteLine( "x is y" );
      else
        Console.WriteLine( "x is not y" );
</pre>
<p style="padding-left: 30px;">It prints &#8220;x is y&#8221;, even though the two literals differ: the value assigned to x lies beyond the precision of a <code>double</code> and rounds to exactly the same 64-bit representation as y.</p>
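<p style="padding-left: 30px;">The same effect appears much sooner in single precision.  A minimal companion sketch (ours, not from the original example) showing the 24-bit limit of <code>float</code>:</p>

```csharp
// 2^24 = 16777216 is the last point at which float can represent every
// consecutive integer; adding 1 beyond it is lost to rounding.
float f = 16777216f;          // 2^24
float g = f + 1f;             // 16777217 is not representable as a float
Console.WriteLine( f == g );  // prints "True" - the increment rounds away
```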
<p>Due to these limits on decimal representation and the resulting rounding, the numerical results of some operations can be affected by the associative reordering of operations.  For example, with floating point types <code>a*x + a*z</code> may not equal <code>a*(x + z)</code>.  This can be difficult to demonstrate with modern optimizing compilers, because the code you write and the code that actually runs may be organized very differently: mathematically equivalent, but not necessarily numerically equivalent.</p>
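<p>A concrete instance of this reordering effect, easy to verify in isolation (a sketch of ours, not code from NMath):</p>

```csharp
// The same three addends, grouped two ways, round differently.
double left  = ( 0.1 + 0.2 ) + 0.3;  // 0.1 + 0.2 rounds to 0.30000000000000004
double right = 0.1 + ( 0.2 + 0.3 );  // 0.2 + 0.3 is exactly 0.5 in double
Console.WriteLine( left == right );  // prints "False" - same math, different rounding
```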
<p>So <em>reproducibility</em> is impacted by precision via dynamic operation reordering in the ALU, and additionally by run-time processor dispatching, data-array alignment, and variation in thread count, among other factors.  These issues can create <em>run-to-run</em> differences in the least significant digits.  Two runs, same code, two answers.  <em>This is by design and is not an issue of correctness</em>.  Subtle changes in the memory layout of the program&#8217;s data, differences in the loading of the ALU registers and in operation order, and differences in threading, all caused by unrelated processes running on the same machine, produce these run-to-run differences. </p>
<h3> Managing Reproducibility </h3>
<p>Most importantly, one should test code&#8217;s numerical results only to the precision that can be expected by the algorithm, input data, and finally the limits of floating point arithmetic.  To do this in unit tests, compare floating point numbers carefully only to a fixed number of digits.  The code snippet below compares two double numbers and returns true only if the numbers match to a specified number of digits.  </p>
<pre lang="csharp">
private static bool EqualToNumDigits( double expected, double actual, int numDigits )
{
  // DOUBLE_EPSILON is a small class-level tolerance (for example, 1e-15)
  // below which the relative difference is treated as exact agreement.
  double max = System.Math.Abs( expected ) > System.Math.Abs( actual ) ? System.Math.Abs( expected ) : System.Math.Abs( actual );
  double diff = System.Math.Abs( expected - actual );
  double relDiff = max > 1.0 ? diff / max : diff;
  if ( relDiff <= DOUBLE_EPSILON )
  {
    return true;
  }

  int numDigitsAgree = (int) ( -System.Math.Floor( Math.Log10( relDiff ) ) - 1 );
  return numDigitsAgree >= numDigits;
}
</pre>
<p>This type of comparison should be used throughout unit testing code.  The full code listing, which we use for our internal testing, is provided at the end of this article.</p>
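<p>For example, pairing the comparison above with an assumed <code>DOUBLE_EPSILON</code> of <code>1e-15</code> (the constant value here is illustrative; use whatever your test code defines), a self-contained sketch behaves like this:</p>

```csharp
const double DOUBLE_EPSILON = 1e-15;  // assumed tolerance, not from NMath

// Local copy of EqualToNumDigits, condensed for illustration.
bool EqualToNumDigits( double expected, double actual, int numDigits )
{
  double max = Math.Abs( expected ) > Math.Abs( actual ) ? Math.Abs( expected ) : Math.Abs( actual );
  double diff = Math.Abs( expected - actual );
  double relDiff = max > 1.0 ? diff / max : diff;
  if ( relDiff <= DOUBLE_EPSILON ) return true;
  return (int) ( -Math.Floor( Math.Log10( relDiff ) ) - 1 ) >= numDigits;
}

double expected = 1.0 / 3.0;     // 0.33333333333333331...
double actual   = 0.3333333333;  // agrees to about 10 decimal digits
Console.WriteLine( EqualToNumDigits( expected, actual, 8 ) );   // prints "True"
Console.WriteLine( EqualToNumDigits( expected, actual, 12 ) );  // prints "False"
```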
<p>If it is essential to enforce binary run-to-run reproducibility to the limits of precision, <strong>NMath</strong> provides a flag in its configuration class to ensure this is the case.  However, this flag should be set for unit testing only, because there can be a significant performance cost.  In general, expect a 10% to 20% reduction in performance, with some common operations degrading far more than that.  For example, some matrix multiplications will take twice the time with this flag set.</p>
<p>Note that the number of threads used by Intel&#8217;s MKL library (which <strong>NMath</strong> depends on) must also be fixed before setting the reproducibility flag.</p>
<pre lang="csharp">
int numThreads = 2;  // This must be fixed for reproducibility.
NMathConfiguration.SetMKLNumThreads( numThreads );
NMathConfiguration.Reproducibility = true;
</pre>
<p>Once set, this reproducibility configuration for <strong>NMath</strong> cannot be unset later in the program.  Both the number of threads and the reproducibility flag may also be set in the AppConfig or in environment variables.  See the <a href="https://www.centerspace.net/doc/NMath/user/overview-83549.htm#Xoverview-83549">NMath User Guide</a> for instructions on how to do this. </p>
<p>Paul</p>
<p><strong>References</strong></p>
<p>M. A. Cornea-Hasegan, B. Norin.  <em>IA-64 Floating-Point Operations and the IEEE Standard for Binary Floating-Point Arithmetic</em>. Intel Technology Journal, Q4, 1999.<br />
<a href="http://gec.di.uminho.pt/discip/minf/ac0203/icca03/ia64fpbf1.pdf">http://gec.di.uminho.pt/discip/minf/ac0203/icca03/ia64fpbf1.pdf</a></p>
<p>D. Goldberg, <em>What Every Computer Scientist Should Know About Floating-Point Arithmetic</em>. Computing Surveys. March 1991.<br />
<a href="http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html">http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html</a></p>
<h3> Full <code>double</code> Comparison Code </h3>
<pre lang="csharp">
private static bool EqualToNumDigits( double expected, double actual, int numDigits )
{
  bool xNaN = double.IsNaN( expected );
  bool yNaN = double.IsNaN( actual );
  if ( xNaN && yNaN )
  {
    return true;
  }
  if ( xNaN || yNaN )
  {
    return false;
  }
  if ( numDigits <= 0 )
  {
    throw new ArgumentException( "numDigits is not positive in TestCase::EqualToNumDigits." );
  }

  // DOUBLE_EPSILON is a small class-level tolerance (for example, 1e-15)
  // below which the relative difference is treated as exact agreement.
  double max = System.Math.Abs( expected ) > System.Math.Abs( actual ) ? System.Math.Abs( expected ) : System.Math.Abs( actual );
  double diff = System.Math.Abs( expected - actual );
  double relDiff = max > 1.0 ? diff / max : diff;
  if ( relDiff <= DOUBLE_EPSILON )
  {
    return true;
  }

  int numDigitsAgree = (int) ( -System.Math.Floor( Math.Log10( relDiff ) ) - 1 );
  return numDigitsAgree >= numDigits;
}
</pre>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/precision-and-reproducibility-in-computing">Precision and Reproducibility in Computing</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/precision-and-reproducibility-in-computing/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">5810</post-id>	</item>
		<item>
		<title>Absolute value of complex numbers</title>
		<link>https://www.centerspace.net/absolute-value-of-complex-numbers</link>
					<comments>https://www.centerspace.net/absolute-value-of-complex-numbers#comments</comments>
		
		<dc:creator><![CDATA[CenterSpace]]></dc:creator>
		<pubDate>Tue, 08 Mar 2011 19:51:14 +0000</pubDate>
				<category><![CDATA[MKL]]></category>
		<category><![CDATA[NMath]]></category>
		<category><![CDATA[abs complex number]]></category>
		<category><![CDATA[absolute value of a complex number]]></category>
		<category><![CDATA[BLAS absolute value]]></category>
		<category><![CDATA[MKL absolute value]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=3277</guid>

					<description><![CDATA[<p><img class="excerpt" src="http://latex.codecogs.com/gif.latex?\left &#124;x \right &#124;_{l_1} = \left &#124;x_r \right &#124; + \left &#124;x_c \right &#124;" title="\left &#124;x \right &#124; = \left &#124;x_r \right &#124; + \left &#124;x_c \right &#124;" />  Max from <a href="http://www.slb.com/">Schlumberger</a> Fiber Optics came to us with an interesting bug report regarding the <tt>MaxAbsValue()</tt> and <tt>MaxAbsIndex()</tt> functions as applied to complex vectors in the <tt>NMathFunctions</tt> class.  Most of the time these methods worked as expected, but they would intermittently fail to correctly identify the maximum element in large vectors with similar elements.  The issue turned out to be the unusual definition for the absolute value of a complex number in the BLAS standard. </p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/absolute-value-of-complex-numbers">Absolute value of complex numbers</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Max Hadley from <a href="http://www.slb.com/">Schlumberger</a> in Southampton, UK came to us with an interesting bug report regarding the <tt>MaxAbsValue()</tt> and <tt>MaxAbsIndex()</tt> functions as applied to complex vectors in the <tt>NMathFunctions</tt> class.  Most of the time these methods worked as expected, but they would intermittently fail to correctly identify the maximum element in large vectors with similar elements.</p>
<p>In researching the MKL documentation we found that this was in fact not a problem from MKL&#8217;s <a href="http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/mklxe/mkl_manual_win_mac/bla/functn_iamax.htm">perspective</a>.  MKL uses the L1-norm, or <a href="https://en.wikipedia.org/wiki/Taxicab_geometry">Manhattan distance</a> from 0,  as a metric to compute the absolute value of a complex number.  This simply means that it adds together the absolute values of the real and imaginary components:</p>
<table style="margin-left: 120px; padding: 5px;">
<tr>
<td ><img decoding="async" src="http://latex.codecogs.com/gif.latex?\left |x \right |_{l_1} = \left |x_r \right | + \left |x_c \right |" title="\left |x \right | = \left |x_r \right | + \left |x_c \right |" /></td>
</tr>
</table>
<p style="margin-left: 120px; padding: 5px;"><em>Absolute value of a complex number according to BLAS.</em></p>
<p>We had expected the absolute value to be computed via the L2-norm, or <a href="https://en.wikipedia.org/wiki/Euclidean_distance">Euclidean distance</a> from zero, sometimes referred to as the <a href="http://mathworld.wolfram.com/VectorNorm.html">magnitude</a>.  Interestingly, MKL uses the L1-norm because that is the norm defined by the underlying BLAS standard, and apparently the original designers of BLAS chose it for computational efficiency.  This means that all BLAS-based linear algebra packages compute the absolute values of complex elements in this way &#8211; and it&#8217;s probably not what most people expect.</p>
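<p>A quick sketch with plain doubles (rather than NMath complex types) makes the difference concrete for the number 3 + 4i:</p>

```csharp
double re = 3.0, im = 4.0;                    // the complex number 3 + 4i
double l1 = Math.Abs( re ) + Math.Abs( im );  // BLAS/MKL convention: L1-norm = 7
double l2 = Math.Sqrt( re * re + im * im );   // usual magnitude: L2-norm = 5
Console.WriteLine( "{0} vs {1}", l1, l2 );    // prints "7 vs 5"
```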
<p>This was a tricky bug to find for two reasons.  First, substituting one norm for the other only rarely elicited incorrect behavior, because the real component generally dominates the magnitude.  Second, the absolute value of a single complex number (as opposed to the maximum absolute value over a complex vector) has always been computed using the L2-norm.</p>
<p>Having found the problem, we faced the unenviable task of keeping our API consistent while still interfacing with the way MKL finds the maximum-absolute-value element in a vector of complex numbers.  We started by suffixing with a &#8216;1&#8217; all complex versions of the min and max abs methods that use MKL, and therefore the L1-norm:</p>
<pre lang="csharp">public static int MaxAbs1Index( FloatComplexVector v )
public static float MaxAbs1Value( FloatComplexVector v )
public static int MinAbs1Index( FloatComplexVector v )
public static float MinAbs1Value( FloatComplexVector v )
public static int MaxAbs1Index( DoubleComplexVector v )
public static double MaxAbs1Value( DoubleComplexVector v )
public static int MinAbs1Index( DoubleComplexVector v )
public static double MinAbs1Value( DoubleComplexVector v )</pre>
<p>And we have subsequently written new methods that compute the maximum and minimum absolute values of a complex vector according to the L2-norm, or Euclidean distance, of its elements.  Users should be aware that these methods do not use MKL:</p>
<pre lang="csharp">public static int MaxAbsIndex( FloatComplexVector v )
public static float MaxAbsValue( FloatComplexVector v )
public static int MinAbsIndex( FloatComplexVector v )
public static float MinAbsValue( FloatComplexVector v )
public static int MaxAbsIndex( DoubleComplexVector v )
public static double MaxAbsValue( DoubleComplexVector v )
public static int MinAbsIndex( DoubleComplexVector v )
public static double MinAbsValue( DoubleComplexVector v )</pre>
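<p>The two families can genuinely disagree.  Here is a worked two-element sketch using plain arrays in place of NMath vector types (our illustration, not library code): under the L1-norm 3 + 4i is the larger element, while under the L2-norm 5 + 1i wins, so the <code>Abs1</code> and <code>Abs</code> index methods would return different answers.</p>

```csharp
// Elements: 3 + 4i and 5 + 1i, stored as parallel real/imaginary arrays.
double[] re = { 3.0, 5.0 };
double[] im = { 4.0, 1.0 };
int maxAbs1Index = 0, maxAbsIndex = 0;
for ( int i = 1; i < re.Length; i++ )
{
  // L1-norm: |re| + |im|  -->  7 for element 0, 6 for element 1
  if ( Math.Abs( re[i] ) + Math.Abs( im[i] ) >
       Math.Abs( re[maxAbs1Index] ) + Math.Abs( im[maxAbs1Index] ) )
    maxAbs1Index = i;
  // L2-norm (compare squared magnitudes): 25 for element 0, 26 for element 1
  if ( re[i] * re[i] + im[i] * im[i] >
       re[maxAbsIndex] * re[maxAbsIndex] + im[maxAbsIndex] * im[maxAbsIndex] )
    maxAbsIndex = i;
}
Console.WriteLine( "L1 picks element {0}, L2 picks element {1}",
                   maxAbs1Index, maxAbsIndex );  // prints "L1 picks element 0, L2 picks element 1"
```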
<p>We hope the change is intuitive and useful.</p>
<p>Darren</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/absolute-value-of-complex-numbers">Absolute value of complex numbers</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/absolute-value-of-complex-numbers/feed</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">3277</post-id>	</item>
		<item>
		<title>MKL Memory Leak?</title>
		<link>https://www.centerspace.net/mkl-memory-leak</link>
					<comments>https://www.centerspace.net/mkl-memory-leak#comments</comments>
		
		<dc:creator><![CDATA[Ken Baldwin]]></dc:creator>
		<pubDate>Wed, 28 Jan 2009 17:03:16 +0000</pubDate>
				<category><![CDATA[MKL]]></category>
		<category><![CDATA[memory]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=74</guid>

					<description><![CDATA[<p>We recently heard from an NMath user: I am seeing a memory accumulation in my application (which uses NMath Core 2.5). From my memory profiler it looks like it could be an allocation in DotNetBlas.Product(), within the MKL dgemm() function. I understand that MKL is designed such that memory is not released until the application [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/mkl-memory-leak">MKL Memory Leak?</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>We recently heard from an NMath user:</p>
<blockquote><p>I am seeing a memory accumulation in my application (which uses NMath Core 2.5). From my memory profiler it looks like it could be an allocation in DotNetBlas.Product(), within the MKL dgemm() function.</p>
<p>I understand that MKL is designed such that memory is not released until the application closes. However, as this application runs in new worker threads all the time, I am wondering if each new thread is holding onto its own memory for Product().</p>
<p>I&#8217;ve tried setting the system variable MKL_DISABLE_FAST_MM &#8211; but this seems to have made no difference &#8211; would I expect this to have an immediate effect (after re-starting the application)?   Is there any other way within NMath to force MKL to release memory?</p></blockquote>
<p>It&#8217;s true that for performance reasons, memory allocated by the Intel Math Kernel Library (MKL) is not released. This is by design and is a one-time occurrence for MKL routines that require workspace memory buffers. However, this workspace appears to be allocated on a per-thread basis, which can be a problem for applications that spawn large numbers of threads. As the MKL documentation delicately puts it, &#8220;the user should be aware that some tools might report this as a memory leak&#8221;.</p>
<p>There are two solutions for multithreaded applications to avoid continuous memory accumulation:</p>
<ol>
<li>Use a thread pool, so the number of new threads is bounded by the size of the pool.</li>
<li>Use the <span class="option">MKL_FreeBuffers()</span> function to free the memory allocated by the MKL memory manager.</li>
</ol>
<p>The <span class="option">MKL_FreeBuffers()</span> function is not currently exposed in NMath, but <strong>will be added in the next release</strong>. In the meantime, you can add this function to Kernel.cpp in NMath Core, and rebuild:</p>
<blockquote>
<pre>static void MklFreeBuffers() {
  BLAS_PREFIX(MKL_FreeBuffers());
}</pre>
</blockquote>
<p>Or, if you want some console output to confirm that memory is being released, try this:</p>
<blockquote>
<pre>static void MklFreeBuffers() {

  MKL_INT64 AllocatedBytes;
  int N_AllocatedBuffers;

  AllocatedBytes = MKL_MemStat(&amp;N_AllocatedBuffers);
  System::Console::WriteLine("BEFORE: " + (long)AllocatedBytes + " bytes in " + N_AllocatedBuffers + " buffers");

  BLAS_PREFIX(MKL_FreeBuffers());

  AllocatedBytes = MKL_MemStat(&amp;N_AllocatedBuffers);
  System::Console::WriteLine("AFTER: " + (long)AllocatedBytes + " bytes in " + N_AllocatedBuffers + " buffers");
}</pre>
</blockquote>
<p>Once you&#8217;ve rebuilt NMath Core, you&#8217;d use the new method like so:</p>
<blockquote>
<pre>using CenterSpace.NMath.Kernel;
...
DotNetBlas.MklFreeBuffers();</pre>
</blockquote>
<p>Note that some care should be taken when calling MklFreeBuffers(), since a drop in performance may occur for any subsequent MKL functions within the same thread, due to reallocation of buffers. Furthermore, given the cost of freeing the buffers themselves, rather than calling MklFreeBuffers() at the end of each thread, it might be more performant to do so after every <em>n</em> threads, or perhaps even based on the total memory usage of your program.</p>
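<p>One way to sketch the &#8220;every <em>n</em> work items&#8221; strategy, using the <code>MklFreeBuffers()</code> wrapper added above (the threshold, counter, and method names here are illustrative, not part of NMath):</p>
<blockquote>
<pre>private const int FreeEvery = 16;   // illustrative threshold
private static int completed = 0;   // shared completion counter

static void OnWorkItemFinished() {
  // Interlocked keeps the counter safe when workers finish concurrently;
  // buffers are freed only once every FreeEvery completions.
  if (System.Threading.Interlocked.Increment(ref completed) % FreeEvery == 0) {
    DotNetBlas.MklFreeBuffers();
  }
}</pre>
</blockquote>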
<p>Ken</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/mkl-memory-leak">MKL Memory Leak?</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/mkl-memory-leak/feed</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">74</post-id>	</item>
	</channel>
</rss>
