<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>FFT Archives - CenterSpace</title>
	<atom:link href="https://www.centerspace.net/tag/fft/feed" rel="self" type="application/rss+xml" />
	<link>https://www.centerspace.net/tag/fft</link>
	<description>.NET numerical class libraries</description>
	<lastBuildDate>Tue, 07 Feb 2023 21:48:41 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.1.1</generator>
<site xmlns="com-wordpress:feed-additions:1">104092929</site>	<item>
		<title>FFT Performance Benchmarks in .NET</title>
		<link>https://www.centerspace.net/fft-performance-benchmarks-in-net</link>
					<comments>https://www.centerspace.net/fft-performance-benchmarks-in-net#respond</comments>
		
		<dc:creator><![CDATA[Paul Shirkey]]></dc:creator>
		<pubDate>Wed, 05 Jan 2011 20:07:46 +0000</pubDate>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[NMath]]></category>
		<category><![CDATA[FFT]]></category>
		<category><![CDATA[FFT .NET benchmarks]]></category>
		<category><![CDATA[FFT benchmarks]]></category>
		<category><![CDATA[fft C#]]></category>
		<category><![CDATA[FFT in .NET]]></category>
		<category><![CDATA[Multicore FFT]]></category>
		<category><![CDATA[NMATH FFT and FFTW]]></category>
		<category><![CDATA[Non power of 2 FFT]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=2942</guid>

					<description><![CDATA[<p>We've had a number of inquiries about the CenterSpace FFT benchmarks, so I thought I would code up a few tests and run them on my machine.  I've included our FFT performance numbers and the code that generated those numbers so you can try them on your machine.  (If you don't have NMath, you'll need to download the <a href="https://www.centerspace.net/downloads/trial-versions/">eval version</a>).  I also did a head-to-head comparison with FFTW, one of the fastest desktop FFT implementations.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/fft-performance-benchmarks-in-net">FFT Performance Benchmarks in .NET</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>We&#8217;ve had a number of inquiries about the CenterSpace FFT benchmarks, so I thought I would code up a few tests and run them on my machine.  I&#8217;ve included our FFT performance numbers and the code that generated those numbers so you can try them on your machine.  (If you don&#8217;t have NMath, you&#8217;ll need to download the <a href="/trial-version/">eval version</a>).  I also did a comparison of 1-dimensional real DFTs with FFTW, one of the fastest desktop FFT implementations available.</p>
<h3> Benchmarks </h3>
<p>These benchmarks were run on a 2.80 GHz Intel Core i7 CPU with 4 GB of memory installed. </p>
<pre class="code">
The clock frequency is 0.003 GHz
1024 point, forward, real FFT required 4361.364 ns, Mflops 4069
1000 point, forward, real FFT required 5338.785 ns, Mflops 3235
4096 point, forward, real FFT required 21708.565 ns, Mflops 3924
4095 point, forward, real FFT required 43012.010 ns, Mflops 1980
1024 * 1024 point, forward, real FFT required 15.635 ms, Mflops 2324
</pre>
<p>I&#8217;m estimating the megaflop performance during the FFT using:<br />
<center><br />
<img decoding="async" src="http://latex.codecogs.com/gif.latex?MFlops \approx {2.5 \, n \ \ln (n) \over{ \textit{time in} \ \mu s} }" title="MFlops \approx {2.5 \, n \ \ln (n) \over{ \textit{time in} \ \mu s} }" /><br />
</center></p>
<p>This is the asymptotic number of floating point operations for the radix-2 Cooley-Tukey FFT algorithm. This FFT MFlop estimate is used in a number of FFT benchmark reports and serves as a good basis for comparing algorithm efficiency.</p>
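<p>As a concrete check, the estimate can be computed in a few lines of plain C# (a minimal sketch using <code>Math.Log</code>; no NMath types are needed here):</p>
<pre lang="csharp">
using System;

class MflopsEstimate
{
  // Approximate MFlops for an n-point FFT that took 'microseconds' to run,
  // using the radix-2 Cooley-Tukey operation count 2.5 * n * ln(n).
  public static double Estimate( int n, double microseconds )
  {
    return 2.5 * n * Math.Log( n ) / microseconds;
  }

  static void Main()
  {
    // The 1024-point timing from the benchmark above: 4361.364 ns = 4.361364 us.
    Console.WriteLine( "{0:0}", Estimate( 1024, 4.361364 ) );  // prints 4069
  }
}
</pre>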
<p>As expected we take a performance hit for non-power-of-2 lengths, but due to various optimizations for processing prime-length FFT kernels (3, 5, 7 &#038; 11), the performance hit is minimal in many cases. The 1000-point FFT has prime factors <code>(2)(2)(2)(5)(5)(5)</code>, and the 4095-point FFT has prime factors <code>(3)(3)(5)(7)(13)</code>, so those larger prime factors in the 4095-point FFT cost us some performance.  Typically, users zero-pad their data vectors to a power-of-two length to get optimal performance.</p>
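<p>Zero padding to the next power-of-two length takes only a small helper (a sketch with plain arrays; NMath users would instead copy into a longer <code>DoubleVector</code>):</p>
<pre lang="csharp">
using System;

class ZeroPadding
{
  // Smallest power of two greater than or equal to n.
  public static int NextPow2( int n )
  {
    int p = 1;
    while ( p < n ) p *= 2;
    return p;
  }

  // Copy the signal into a zero-filled array of power-of-two length.
  public static double[] ZeroPad( double[] signal )
  {
    double[] padded = new double[ NextPow2( signal.Length ) ];
    Array.Copy( signal, padded, signal.Length );
    return padded;
  }

  static void Main()
  {
    Console.WriteLine( ZeroPad( new double[4095] ).Length );  // prints 4096
  }
}
</pre>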
<h3> Side by side comparison with FFTW </h3>
<p>FFTW claims to be the &#8220;Fastest Fourier Transform in the West&#8221;, and is a clever, high-performance implementation of the discrete Fourier transform.  This algorithm is shipped with all copies of MATLAB.  FFTW is implemented in C and has a reputation as one of the fastest desktop FFT implementations.  </p>
<p>Both the NMath FFT and FFTW have a pre-computation setup that establishes the best algorithmic approach for the DFT at hand, before computing any FFTs.  This pre-computational phase is not included in the times below.   In the case of the NMath FFT classes, this phase is done in the class constructor; therefore, for best performance, users must avoid constructing NMath FFT classes in tight loops (as shown in the benchmark code below).  Below is a small side-by-side comparison between FFTW and NMath&#8217;s FFT (using the numbers from above).</p>
<pre class="code">
<table><tbody>
<tr>
<th colspan="3"> Comparison of a forward, real, out-of-place FFT. </th>
</tr>
<tr> 
<th> FFT length</th> <th> FFTW </th> <th> NMATH FFT </th>
</tr>
<tr> 
<td> 1024 </td> <td> 4.14 &mu;s</td> <td> 4.36 &mu;s </td> 
</tr>
<tr> 
<td> 1000</td> <td> 5.98 &mu;s </td> <td> 5.33 &mu;s </td> 
</tr>
<tr> 
<td> 4096</td> <td> 20.31 &mu;s </td> <td> 21.71 &mu;s </td> 
</tr>
<tr> 
<td> 4095</td> <td> 49.90 &mu;s </td> <td> 43.01 &mu;s </td> 
</tr>
<tr> 
<td> 1024^2 </td> <td> 17.16 ms </td> <td> 15.63 ms </td> 
</tr>
</tbody>
</table>
</pre>
<p>Clearly NMath is very competitive with, and at times outperforms, FFTW for real FFTs of both power-of-2 length signals and otherwise.  I chose 1D real signals as a test case because this is one of the most frequent use cases of our NMath FFT library. </p>
<p>On a subjective scale, running a 1024-point FFT on a commodity desktop machine at around (an algorithm-normalized) 4 GFlops is amazing.  That means that in a real-time measurement situation, users can compute 1024-point FFTs at around 220 kHz &#8211; all with just a couple of lines of code.</p>
<p>Happy Computing,<br />
<em> Paul </em></p>
<h3> Benchmark Code </h3>
<pre lang="csharp">
 public void BenchMarks()
    {
      Double numberTrials = 10000;
      Double flops;

      Stopwatch timer = new System.Diagnostics.Stopwatch();
      // Stopwatch.Frequency is in ticks per second; dividing by 1e9 gives the tick rate in GHz.
      Console.WriteLine( String.Format( "The clock frequency is {0:0.000} GHz", Stopwatch.Frequency / 1000000000.0 ) );

      // Snip one - power of two
      RandGenUniform rand = new RandGenUniform();
      DoubleForward1DFFT fft = new DoubleForward1DFFT( 1024 );
      DoubleVector realsignal = new DoubleVector( 1024, rand );

      DoubleVector result = new DoubleVector( 1024 * 1024 );

      timer.Reset();
      for( int i = 0; i < numberTrials; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = (2.5 * 1024 * NMathFunctions.Log(1024)) / (((timer.ElapsedTicks / numberTrials) / Stopwatch.Frequency) * 1000000.0 );
      Console.WriteLine( String.Format( "1024 point, forward, real FFT required {0:0.000} ns, Mflops {1:0}", ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000000.0, flops ) );

      // length 1000
      fft = new DoubleForward1DFFT( 1000 );
      realsignal = new DoubleVector( 1000, rand );

      timer.Reset();
      for( int i = 0; i < numberTrials; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = ( 2.5 * 1000 * NMathFunctions.Log( 1000 ) ) / ( ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000.0 );
      Console.WriteLine( String.Format( "1000 point, forward, real FFT required {0:0.000} ns, Mflops {1:0}", ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000000.0, flops ) );

      // length 4096
      fft = new DoubleForward1DFFT( 4096 );
      realsignal = new DoubleVector( 4096, rand );

      timer.Reset();
      for( int i = 0; i < numberTrials; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = ( 2.5 * 4096 * NMathFunctions.Log( 4096 ) ) / ( ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000.0 );
      Console.WriteLine( String.Format( "4096 point, forward, real FFT required {0:0.000} ns, Mflops {1:0}", ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000000.0, flops ) );

      // length 4095
      fft = new DoubleForward1DFFT( 4095 );
      realsignal = new DoubleVector( 4095, rand );

      timer.Reset();
      for( int i = 0; i < numberTrials; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = ( 2.5 * 4095 * NMathFunctions.Log( 4095 ) ) / ( ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000.0 );
      Console.WriteLine( String.Format( "4095 point, forward, real FFT required {0:0.000} ns, Mflops {1:0}", ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000000.0, flops ) );


      // length 1M
      fft = new DoubleForward1DFFT( 1024 * 1024 );
      realsignal = new DoubleVector( 1024 * 1024, rand );

      timer.Reset();
      for( int i = 0; i < 100; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = ( 2.5 * 1024 * 1024 * NMathFunctions.Log( 1024 * 1024 ) ) / ( ( ( timer.ElapsedTicks / 100.0 ) / Stopwatch.Frequency ) * 1000000.0 );
      Console.WriteLine( String.Format( "Million point (1024 * 1024), forward, real point FFT required {0:0.000} ms, Mflops {1:0}", ( ( timer.ElapsedTicks / 100.0 ) / Stopwatch.Frequency ) * 1000.0, flops ) );

    }
</pre>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/fft-performance-benchmarks-in-net">FFT Performance Benchmarks in .NET</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/fft-performance-benchmarks-in-net/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">2942</post-id>	</item>
		<item>
		<title>Convolution, Correlation, and the FFT</title>
		<link>https://www.centerspace.net/convolution-correlation-and-the-fft</link>
					<comments>https://www.centerspace.net/convolution-correlation-and-the-fft#respond</comments>
		
		<dc:creator><![CDATA[Paul Shirkey]]></dc:creator>
		<pubDate>Tue, 03 Nov 2009 00:26:58 +0000</pubDate>
				<category><![CDATA[NMath]]></category>
		<category><![CDATA[Circular Convolution]]></category>
		<category><![CDATA[Convolution]]></category>
		<category><![CDATA[Convolution .NET class]]></category>
		<category><![CDATA[Convolution in C#]]></category>
		<category><![CDATA[Convolution in R]]></category>
		<category><![CDATA[Correlation]]></category>
		<category><![CDATA[Correlation .NET class]]></category>
		<category><![CDATA[Fast Convolution]]></category>
		<category><![CDATA[FFT]]></category>
		<category><![CDATA[Linear Convolution]]></category>
		<category><![CDATA[R convolve]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=347</guid>

					<description><![CDATA[<p>A discussion of the relationships of convolution, correlation and the Fourier transform, including examples porting code from R to NMath.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/convolution-correlation-and-the-fft">Convolution, Correlation, and the FFT</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Most scientists and programmers understand the basic implementation details of their chosen math library.  However, when algorithms are ported from one library to another, problems are hard to avoid.  This seems to be particularly so when dealing with convolutions, correlations and the FFT &#8211; fundamental building blocks in many areas of computation.  Frequently the theoretical concepts are clear, but when the bits hit the silicon, the confusion (at least for me) starts.</p>
<p>To start eliminating some of this confusion, it&#8217;s important to understand two fundamental relationships between these three transforms.</p>
<p><center><br />
<img decoding="async" src="http://latex.codecogs.com/gif.latex?\mathcal{F}\{f*g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}" title="\mathcal{F}\{f*g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}" /><br />
</center><br />
This is known as the Convolution Theorem, where the italic F represents the Fourier transform, and the splat, convolution.   This basic equality, along with the FFT, is used to compute large convolutions efficiently.  The correlation operator has an analogous theorem, and this is where some of the problems start.</p>
<p><center><br />
<img decoding="async" src="http://latex.codecogs.com/gif.latex?\mathcal{F}\{f\star g\}=(\mathcal{F}\{f\})^* \cdot \mathcal{F}\{g\}" title="\mathcal{F}\{f\star g\}=(\mathcal{F}\{f\})^* \cdot \mathcal{F}\{g\}" /><br />
</center></p>
<p>The star is the correlation operator.  Note that the first Fourier transform is conjugated, and that this breaks some basic symmetries in correlation that are found in convolution.</p>
<h2> Convolution in R </h2>
<p>If you happen to be porting code from R, the R language (v. 2.10.0) is distributed with a <code> convolve </code> function which actually, by default, returns the correlation.  This is unfortunate and is a source of confusion for anybody porting a prototype from R to CenterSpace&#8217;s NMath, or to any other math library for that matter.</p>
<p>Briefly, in R</p>
<pre lang="r">
kernel <- c(1, 2, 3, 1, 0, 0)
data <- c(1, 2, 3, 4, 5, 6)
convolve(kernel, data)
</pre>
<p>will result in:</p>
<pre lang="r">
[18 17 22 33 32 25] 
</pre>
<p>Now this is a strange result, on a couple of fronts.  First, this isn't the convolution, and second, it isn't the correlation either!   Supposing that we want to compute the correlation between this kernel and signal, we would expect precisely,</p>
<pre lang="r">
[0 0 1 5 11 18 25 32 32 17 6 ].
</pre>
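<p>That expected vector is easy to verify by direct summation, sliding the kernel across the data and summing products wherever the two overlap (a plain-C# sketch, independent of any library):</p>
<pre lang="csharp">
using System;

class LinearCorrelation
{
  // Full linear correlation of kernel against data.
  public static double[] Correlate( double[] kernel, double[] data )
  {
    double[] r = new double[ kernel.Length + data.Length - 1 ];
    for ( int k = 0; k < r.Length; k++ )
    {
      int shift = k - ( kernel.Length - 1 );  // kernel offset relative to the data
      for ( int i = 0; i < kernel.Length; i++ )
      {
        int j = i + shift;
        if ( j >= 0 && j < data.Length )
          r[k] += kernel[i] * data[j];
      }
    }
    return r;
  }

  static void Main()
  {
    double[] kernel = { 1, 2, 3, 1, 0, 0 };
    double[] data = { 1, 2, 3, 4, 5, 6 };
    // prints 0 0 1 5 11 18 25 32 32 17 6
    Console.WriteLine( String.Join( " ", Correlate( kernel, data ) ) );
  }
}
</pre>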
<p>Conflating correlation with convolution in one function is certain to cause confusion because, among other reasons, convolution is commutative and correlation is not.  In general, for correlation,</p>
<p><center><br />
<img decoding="async" src="http://latex.codecogs.com/gif.latex?f\star g \ne g \star f" title="f\star g \ne g \star f" /><br />
</center></p>
<p>yet for convolution,</p>
<p><center><br />
<img decoding="async" src="http://latex.codecogs.com/gif.latex?f * g = g * f" title="f * g = g * f" /><br />
</center></p>
<p>If f or g satisfies certain symmetry properties, correlation can regain the commutative property.  </p>
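<p>The asymmetry is easy to demonstrate numerically.  Correlation can be written as convolution with the first argument reversed, so a direct-summation convolution suffices (a sketch; the small vectors are arbitrary):</p>
<pre lang="csharp">
using System;
using System.Linq;

class Commutativity
{
  // Full linear convolution by direct summation.
  public static double[] Convolve( double[] f, double[] g )
  {
    double[] r = new double[ f.Length + g.Length - 1 ];
    for ( int i = 0; i < f.Length; i++ )
      for ( int j = 0; j < g.Length; j++ )
        r[ i + j ] += f[i] * g[j];
    return r;
  }

  // Correlation = convolution with the first argument reversed.
  public static double[] Correlate( double[] f, double[] g )
  {
    return Convolve( f.Reverse().ToArray(), g );
  }

  static void Main()
  {
    double[] f = { 1, 2, 3 };
    double[] g = { 4, 5, 6 };
    Console.WriteLine( String.Join( " ", Convolve( f, g ) ) );   // 4 13 28 27 18
    Console.WriteLine( String.Join( " ", Convolve( g, f ) ) );   // 4 13 28 27 18 (same)
    Console.WriteLine( String.Join( " ", Correlate( f, g ) ) );  // 12 23 32 17 6
    Console.WriteLine( String.Join( " ", Correlate( g, f ) ) );  // 6 17 32 23 12 (different)
  }
}
</pre>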
<p>Back to our correlation example above, if we exchange the arguments <code> kernel </code> and <code> data </code> in R's convolve() function we get a different answer.</p>
<pre lang="r">
convolve(data,kernel)
</pre>
<p>gives us,</p>
<pre lang="r">
[18 25 32 33 22 17 ].
</pre>
<p>There is part of a correlation swimming around in that vector, but the last three numbers given by R are not part of a linear correlation.   Many users naturally take those 6 numbers incorrectly as the linear correlation (or worse, the convolution) of the <code> kernel </code> and <code> data </code>.  This brings us to our next topic.</p>
<h2> Fast Convolution </h2>
<p>The fast Fourier transform is used to compute the convolution or correlation for performance reasons.  This FFT based algorithm is often referred to as 'fast convolution', and is given by,</p>
<p><center><br />
<img decoding="async" src="http://latex.codecogs.com/gif.latex?\small f*g = \mathcal{F}^{-1} \{\mathcal{F}\{f\} \cdot \mathcal{F}\{g\}\}." title="\small f*g = \mathcal{F}^{-1} \{\mathcal{F}\{f\} \cdot \mathcal{F}\{g\}\}." /></center></p>
<p>In the discrete case, when the two sequences are the same length, <code> N </code>, the FFT based method requires <code> O(N log N) </code> time, where a direct summation would require <code> O(N*N) </code> time.</p>
<p>This asymptotic runtime performance makes the FFT method the de facto standard for computing convolutions.  That is unfortunate, because when the kernel is much smaller than the data, direct summation is actually faster than using the FFT.  This is not a rare special case; it is very common in signal filtering, wavelet transforms, and image processing applications.  It also brings to light that many libraries (including R) require both inputs to be zero padded to the same length (typically a power of 2) &#8211; immediately eliminating this optimization and always forcing the use of the FFT technique.</p>
<p>Returning to our example above, if we remove the unnecessary padding from the kernel and recompute the correlation, we arrive at,</p>
<p><img decoding="async" src="http://latex.codecogs.com/gif.latex?\small \texttt{[1 2 3 1]} \star \texttt{[1 2 3 4 5 6] = [1 5 11 18 25 32 32 17 6 ]}" title="\small \texttt{[1 2 3 1]} \star \texttt{[1 2 3 4 5 6] = [1 5 11 18 25 32 32 17 6 ]}" /></p>
<p>Now since both the correlation (and the convolution) spread the signal data by <code> kernel.Length() - 1 = 3 </code> elements, most (engineering) users are interested in the correlation exclusively where the kernel fully overlaps the signal data.  This windowing would then give us,</p>
<p><img decoding="async" src="http://latex.codecogs.com/gif.latex?\small \texttt{[18 25 32]}" title="\small \texttt{[18 25 32]}" /></p>
<p>which are the first three numbers provided by R's <code> convolve </code> function.  The latter three numbers are the results of a <em> circular </em>, not a linear, correlation.  This is probably not the result most engineers are looking for unless they are filtering a periodic signal or an image wrapped on a cylinder.  Circular correlation wraps the data end-to-end in a continuous loop when summing, by effectively joining the first and last elements of the data array.</p>
<p>The circular correlation for this running example would look like the following table.  </p>
<pre class="code">
[1 2 3 4 5 6]
[1 2 3 1 - -] = 18
[- 1 2 3 1 -] = 25
[- - 1 2 3 1] = 32
[1 - - 1 2 3] = 33 (circular)
[3 1 - - 1 2] = 22 (circular)
[2 3 1 - - 1] = 17 (circular)
</pre>
<p>The top array is the data, and the arrays below represent the kernel sweeping across the data step by step.</p>
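<p>The table can be reproduced by direct summation with modular indexing, which is all circular correlation is (a sketch using the unpadded kernel; no library calls involved):</p>
<pre lang="csharp">
using System;

class CircularCorrelation
{
  // Circular correlation: the kernel wraps around the end of the data.
  public static double[] Correlate( double[] kernel, double[] data )
  {
    double[] r = new double[ data.Length ];
    for ( int k = 0; k < data.Length; k++ )
      for ( int n = 0; n < kernel.Length; n++ )
        r[k] += kernel[n] * data[ ( n + k ) % data.Length ];
    return r;
  }

  static void Main()
  {
    double[] kernel = { 1, 2, 3, 1 };
    double[] data = { 1, 2, 3, 4, 5, 6 };
    // prints 18 25 32 33 22 17 - one entry per row of the table above
    Console.WriteLine( String.Join( " ", Correlate( kernel, data ) ) );
  }
}
</pre>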
<p>The difference between the circular and linear correlation is restricted to the edges of the correlation where the (unpadded) kernel does not fully overlap the data.   The circular and linear correlations are identical in the areas where the kernel fully overlaps the data - which in many applications is the area of interest.</p>
<h2> Convolution &#038; Correlation Classes in the NMath library </h2>
<p>CenterSpace's convolution and correlation classes rigorously and efficiently compute their respective transformations correctly, regardless of the computational technique used.  This means that zero padding by the application programmer is no longer necessary, and in fact is discouraged &#8211; as is reflexively using the 'fast convolution' technique when direct summation is actually faster.  </p>
<p>When an NMath convolution or correlation class is constructed, it estimates the number of MFlops needed by each competing technique and chooses the fastest computational method.  Zero padding will introduce errors into this MFlops estimation process.</p>
<h3> Classes </h3>
<p>The CenterSpace NMath library offers the following eight classes.</p>
<ul>
<li>{Double | Float}1DConvolution</li>
<li>{DoubleComplex | FloatComplex}1DConvolution</li>
<li>{Double | Float}1DCorrelation</li>
<li>{DoubleComplex | FloatComplex}1DCorrelation</li>
</ul>
<p>The two sets of correlation and convolution classes have completely symmetric interfaces.</p>
<p><em> Code Examples </em><br />
If you are currently porting code from a system that uses the FFT 'fast correlation' technique, I will now outline how you would port that code to NMath.  </p>
<p>Porting our running R-example from above to NMath, and assuming that what you need is <em> linear </em> correlation, the NMath code would look like:</p>
<pre lang="csharp">
DoubleVector kernel = new DoubleVector(1, 2, 3, 1);
DoubleVector data = new DoubleVector(1, 2, 3, 4, 5, 6);
      
Double1DCorrelation corr = new Double1DCorrelation(kernel, data.Length);
DoubleVector correlation = corr.Correlate(data);
DoubleVector corr_full_kernel_overlap = 
  corr.TrimConvolution(correlation, CorrelationBase.Windowing.FullKernelOverlap);
      
DoubleVector corr_centered = 
  corr.TrimConvolution(correlation, CorrelationBase.Windowing.CenterWindow);

// correlation =            [1 5 11 18 25 32 32 17 6]
// corr_centered =              [11 18 25 32 32 17]
// corr_full_kernel_overlap =      [18 25 32]
</pre>
<p>Note that the windowing method, <code> TrimConvolution() </code>, does not copy any data.  It just creates a windowed view (reference) into the underlying convolution vector.  Windowing of native arrays is not supported because a copy would be required.</p>
<p>The CenterSpace NMath libraries currently do not support circular convolution, so if that is required due to the circular symmetry / periodicity of the data, the circular convolution or correlation must be computed using our FFT classes directly.</p>
<pre lang="csharp">
// Compute circular correlation via FFT's.
// Zero-padding is required here.
// Typically pad to the nearest power of 2.
double[] nhkernel = { 1, 2, 3, 1, 0, 0};      
double[] data = { 1, 2, 3, 4, 5, 6 };
      
// Build the FFT classes
// and set up the correct scaling.
DoubleComplexForward1DFFT fft = new DoubleComplexForward1DFFT( nhkernel.Length );
DoubleComplexBackward1DFFT ifft = new DoubleComplexBackward1DFFT( nhkernel.Length );
ifft.SetScaleFactorByLength();

// Build the complex vectors of the real data
DoubleComplexVector kernelz = 
  new DoubleComplexVector( new DoubleVector( nhkernel ), new DoubleVector( nhkernel.Length ) );

DoubleComplexVector dataz = 
  new DoubleComplexVector( new DoubleVector( data ), new DoubleVector( nhkernel.Length ) );

// Compute.  The next five lines 
// implement the fast correlation algorithm.
fft.FFTInPlace( kernelz );
fft.FFTInPlace( dataz );
dataz = NMathFunctions.Conj( dataz );
DoubleComplexVector prodz = kernelz * dataz;
ifft.FFTInPlace( prodz );
DoubleVector r = new DoubleVector( NMathFunctions.Real( prodz ) );
// r = [18 17 22 33 32 25] 
</pre>
<p>
<em>-Paul</em></p>
<p>See our <a href="/topic-fast-fourier-transforms/">FFT landing page </a> for complete documentation and code examples.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/convolution-correlation-and-the-fft">Convolution, Correlation, and the FFT</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/convolution-correlation-and-the-fft/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">347</post-id>	</item>
		<item>
		<title>Convolution in CenterSpace&#8217;s NMATH 4.0</title>
		<link>https://www.centerspace.net/convolution-in-centerspaces-nmath-4-0</link>
					<comments>https://www.centerspace.net/convolution-in-centerspaces-nmath-4-0#respond</comments>
		
		<dc:creator><![CDATA[Paul Shirkey]]></dc:creator>
		<pubDate>Mon, 19 Oct 2009 18:00:47 +0000</pubDate>
				<category><![CDATA[NMath]]></category>
		<category><![CDATA[Convolution]]></category>
		<category><![CDATA[Convolution class]]></category>
		<category><![CDATA[Correlation]]></category>
		<category><![CDATA[Efficient Convolution]]></category>
		<category><![CDATA[Fast Convolution]]></category>
		<category><![CDATA[FFT]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=285</guid>

					<description><![CDATA[<p>Convolution is a fundamental operation in data smoothing and filtering, and is used in many other applications ranging from discrete wavelet transforms to LTI system theory. NMath supports a high performance, forward scaling set of convolution classes that support both complex and real data. These classes will scale in performance in proportion to the number [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/convolution-in-centerspaces-nmath-4-0">Convolution in CenterSpace&#8217;s NMATH 4.0</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Convolution is a fundamental operation in data smoothing and filtering, and is used in many other applications ranging from discrete wavelet transforms to LTI system theory.  NMath supports a high performance, forward scaling set of convolution classes that support both complex and real data.  These classes will scale in performance in proportion to the number of processing cores &#8211; eliminating code rewrites to take advantage of new multi-core hardware upgrades.</p>
<p>The following four convolution classes are available in the NMath 4.0 library.</p>
<ul>
<li>{Double | DoubleComplex}1DConvolution</li>
<li>{Float | FloatComplex}1DConvolution</li>
</ul>
<p>Additionally, a symmetric set of correlation classes is available.</p>
<ul>
<li>{Double | DoubleComplex}1DCorrelation</li>
<li>{Float | FloatComplex}1DCorrelation</li>
</ul>
<h3> Example </h3>
<p>Computing a convolution is as simple as defining the convolution kernel, creating the right class object for the data type, and running the convolution.  </p>
<pre class="code">
// Create some random signal data using the 
// Mersenne Twist random number generator.
RandomNumberGenerator rand = new RandGenMTwist(4230987);
DoubleVector data = new DoubleVector(500, rand);
      
// Create a simple averaging kernel.
DoubleVector kernel =  new DoubleVector("[ .25 .25 .25 .25 ]");

// Create the real number domain convolution class.
Double1DConvolution conv = 
    new Double1DConvolution(kernel, data.Length);

// Compute the convolution.
DoubleVector smoothed_data = conv.Convolve(data); </pre>
<h5> Optimal performance for all convolution problems </h5>
<p>Exploiting the fundamental duality between convolution and the Fourier transform, and the O(n ln n) FFT algorithm, convolutions can be computed in O(n ln n) time between two sequences g and h.<br />
<center><br />
<img decoding="async" title="g \ast h = \mathcal{F}^{-1} \{ \mathcal{F}\{g\}\cdot \mathcal{F}\{h\}\}" src="http://latex.codecogs.com/gif.latex?g \ast h = \mathcal{F}^{-1} \{ \mathcal{F}\{g\}\cdot \mathcal{F}\{h\}\}" alt="" /><br />
</center></p>
<p>For convolutions on very large data sets this is clearly the most time-efficient algorithm, even though two forward and one backward FFT are required.  However, for shorter data sequences this is slower in practice than just directly evaluating the convolution sum &#8211; particularly so on modern multi-core processors with large on-chip caches.  Also, direct summation is often faster when the kernel is much shorter than the data, which is frequently the case in signal processing applications.</p>
<p>The decision machinery for choosing which technique to use for the problem at hand is done automatically when the class is constructed.  This way the user is always getting the best available convolution performance without worrying about which technique to use. </p>
<p><em><br />
-Paul<br />
</em></p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/convolution-in-centerspaces-nmath-4-0">Convolution in CenterSpace&#8217;s NMATH 4.0</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/convolution-in-centerspaces-nmath-4-0/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">285</post-id>	</item>
		<item>
		<title>Modern Fast Fourier Transform</title>
		<link>https://www.centerspace.net/modern-fast-fourier-transform</link>
					<comments>https://www.centerspace.net/modern-fast-fourier-transform#comments</comments>
		
		<dc:creator><![CDATA[Paul Shirkey]]></dc:creator>
		<pubDate>Tue, 29 Sep 2009 05:32:46 +0000</pubDate>
				<category><![CDATA[NMath]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[FFT]]></category>
		<category><![CDATA[FFT performance]]></category>
		<category><![CDATA[High performance FFT]]></category>
		<category><![CDATA[Multicore FFT]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=226</guid>

					<description><![CDATA[<p>All variants of the original Cooley-Tukey O(n log n) fast Fourier transform fundamentally exploit different ways to factor the discrete Fourier summation of length N. For example, the split-radix FFT algorithm divides the Fourier summation of length N into three new Fourier summations: one of length N/2 and two of length N/4. The prime factor [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/modern-fast-fourier-transform">Modern Fast Fourier Transform</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>All variants of the original Cooley-Tukey O(n log n) fast Fourier transform fundamentally exploit different ways to factor the discrete Fourier summation of length N.</p>
<p><center><br />
<a href="http://www.codecogs.com/eqnedit.php?latex=X_k = \sum_{n=0}^{N-1} x_n e^{(-2 \pi i / N) kn} \ \ \ \ \ k = 0, ... ,N-1" target="_blank" rel="noopener"><img decoding="async" title="X_k = \sum_{n=0}^{N-1} x_n e^{(-2 \pi i / N) kn} \ \ \ \ \ k = 0, ... ,N-1" src="http://latex.codecogs.com/gif.latex?X_k = \sum_{n=0}^{N-1} x_n e^{(-2 \pi i / N) kn} \ \ \ \ \ k = 0, ... ,N-1" alt="" /></a></center><br />
For example, the <em>split-radix FFT</em> algorithm divides the Fourier summation of length N into three new Fourier summations: one of length N/2 and two of length N/4.</p>
<p><center><br />
<a href="http://www.codecogs.com/eqnedit.php?latex=X_{k_N} = X_{k_{N/2}} + X_{k_{N/4}} + X_{k_{N/4}}" target="_blank" rel="noopener"><img decoding="async" title="X_{k_N} = X_{k_{N/2}} + X_{k_{N/4}} + X_{k_{N/4}}" src="http://latex.codecogs.com/gif.latex?X_{k_N} = X_{k_{N/2}} + X_{k_{N/4}} + X_{k_{N/4}}" alt="" /></a></center><br />
The <em>prime factor FFT</em>, divides the Fourier summation of length N, into two (if they exist) summations of length N1 and N2, where N1 and N2 must be relatively prime.</p>
<p><center><br />
<a href="http://www.codecogs.com/eqnedit.php?latex=X_{k_N} = X_{k_{N1}} ( X_{k_{N2}} ) \ \ where \ N1 \perp N2" target="_blank" rel="noopener"><img decoding="async" title="X_{k_N} = X_{k_{N1}} ( X_{k_{N2}} ) \ \ where \ N1 \perp N2" src="http://latex.codecogs.com/gif.latex?X_{k_N} = X_{k_{N1}} ( X_{k_{N2}} ) \ \ where \ N1 \perp N2" alt="" /></a></center><br />
These algorithms are typically applied recursively, and in combination with one another (or with still other factorizations) to maximize performance for a particular N.</p>
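<p>To make the recursive factorization concrete, here is an illustrative Python sketch (not NMath code) of the radix-2 Cooley-Tukey recursion, checked against the direct O(N&#178;) evaluation of the summation defined above:</p>

```python
import cmath

def dft_naive(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n e^{-2*pi*i*k*n/N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; length must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    evens = fft_radix2(x[0::2])  # DFT of the even-indexed half, length N/2
    odds = fft_radix2(x[1::2])   # DFT of the odd-indexed half, length N/2
    out = [0j] * N
    for k in range(N // 2):
        # Combine the two half-length DFTs with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / N) * odds[k]
        out[k] = evens[k] + t
        out[k + N // 2] = evens[k] - t
    return out
```

<p>The split-radix and prime factor algorithms follow the same pattern, differing only in how the summation is divided.</p>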
<p>In modern implementations there really isn&#8217;t a single static FFT algorithm, but rather a dynamic collection of FFT algorithms and tools that are cleverly combined for the Fourier transform type at hand. Major algorithmic changes occur in the underlying implementation as the length and forward domain (real or complex) of the problem vary. Sophisticated FFT implementations insulate the end-user programmer from all of this background machinery.</p>
<h5>DFT length is fundamental to performance</h5>
<p>The days of power-of-2-only FFT algorithms are dead. Users of modern FFT libraries should not need to worry about the complexities involved in finding the optimal algorithm for the FFT computation at hand; the library should look at the FFT length, problem domain (real or complex), number of machine cores, and machine architecture, and then find and compute with the best hybridized FFT algorithm available. However, it is still helpful to understand that your realized performance will depend fundamentally on the factorization of the length of your FFT. Most programmers know that the best FFT performance is had when N is a power of 2. If this stringent length requirement cannot be met, then it is best to use a length that can be factored into small primes. CenterSpace&#8217;s FFT algorithms contain optimized kernels for prime factors of 2, 3, 5, 7 and 11. The table below demonstrates the FFT performance sensitivity to FFT length.</p>
<table border="0" cellpadding="4">
<caption>Forward real 1D FFT performance at various lengths.</caption>
<tbody>
<tr align="center">
<td><em> DFT Length </em></td>
<td><em>Factors </em></td>
<td><em>MFLOP approximation </em></td>
</tr>
<tr>
<td>512</td>
<td>2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2</td>
<td>5324.5</td>
</tr>
<tr>
<td>511</td>
<td>7 x 73</td>
<td>1327.8</td>
</tr>
<tr>
<td>510</td>
<td>2 x 3 x 5 x 17</td>
<td>3879.4</td>
</tr>
<tr>
<td>509</td>
<td>509 (prime)</td>
<td>1762.4</td>
</tr>
<tr>
<td>508</td>
<td>2 x 2 x 127</td>
<td>2637.6</td>
</tr>
<tr>
<td>507</td>
<td>3 x 13 x 13</td>
<td>2631.5</td>
</tr>
<tr>
<td>506</td>
<td>2 x 11 x 23</td>
<td>3938.3</td>
</tr>
<tr>
<td>505</td>
<td>5 x 101</td>
<td>1122.6</td>
</tr>
<tr>
<td>504</td>
<td>2 x 2 x 2 x 3 x 3 x 7</td>
<td>5227</td>
</tr>
</tbody>
</table>
<p>Clearly the fastest FFTs are those whose lengths can be factored into small primes (512, 510, 507, 506, 504), and especially into small primes with optimized kernels (512 and 504). The more kernel-optimized primes your FFT length contains, the faster it will run. This is a universal fact that all FFT implementations confront, and it holds true for higher-dimensional FFTs as well. <em>Slight changes in length can have a profound impact on FFT performance</em>.</p>
<p>You can factor your FFT length using an online service to assess how your FFT will perform.</p>
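<p>A few lines of code can do that factorization locally. The following illustrative Python sketch (the kernel set {2, 3, 5, 7, 11} is taken from the text above; the function names are hypothetical) factors a candidate length and flags whether every prime factor has an optimized kernel:</p>

```python
def prime_factors(n):
    """Factor n by trial division; fast enough for typical FFT lengths."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# Prime factor sizes with optimized kernels, per the discussion above.
SMALL_KERNELS = {2, 3, 5, 7, 11}

def looks_fast(n):
    """True if every prime factor of n has an optimized kernel."""
    return all(p in SMALL_KERNELS for p in prime_factors(n))
```

<p>For example, 504 = 2 x 2 x 2 x 3 x 3 x 7 passes this check, while the prime 509 and 511 = 7 x 73 do not, matching the performance spread in the table.</p>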
<h5>Multi-core Scalability</h5>
<p>The ability to factor a particular FFT into a set of independent computations makes it fundamentally suitable for parallelization. All modern desktop and many laptop computers today contain at least two processor cores, and any modern math library should exploit this fact where possible. CenterSpace&#8217;s complex domain FFTs (and related convolutions) are multi-core aware, and automatically expand to fully utilize the available processor cores. Small problems are run on a single core, but once the computational advantage of parallelizing the algorithm overcomes the overhead costs of multi-core parallelization, the computation is spread across all available cores. This automatic parallelization is gained simply by using CenterSpace&#8217;s NMath class libraries. No end-user programming effort is involved.</p>
<table border="0" cellpadding="6">
<caption>Forward complex 1D FFT performance on 1 and 8 cores.</caption>
<tbody>
<tr align="center">
<th><em> FFT Length </em></th>
<th><em> Machine Cores </em></th>
<th><em> Time (seconds) </em></th>
<th><em> MFLOP approximation </em></th>
</tr>
<tr>
<td>2^20</td>
<td>One</td>
<td>56.7</td>
<td>6405.9</td>
</tr>
<tr>
<td>2^20 + 1</td>
<td>One</td>
<td>554.6</td>
<td>655.3</td>
</tr>
<tr>
<td>2^20</td>
<td>Eight</td>
<td>53.3</td>
<td>6813.7</td>
</tr>
<tr>
<td>2^20 + 1</td>
<td>Eight</td>
<td>124.2</td>
<td>2925.3</td>
</tr>
</tbody>
</table>
<p>Power-of-two FFTs are so computationally efficient on modern processors that the gain between one and eight cores is only about 3 seconds on a 2^20-point FFT. However, for the non-power-of-two case we get a 4.5-times speed improvement going from one core to eight. Looked at another way, with multi-core scalability of the FFT, we suffered only a 2X loss in performance going from a 2^20-point FFT to a 2^20+1-point FFT, instead of a 10X loss. In other words, the multi-core scalability of CenterSpace&#8217;s NMath FFT algorithms mitigates the performance loss of using non-power-of-2 lengths, and this simplifies the end-user programmer&#8217;s job.</p>
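<p>For readers reproducing such numbers: FFT benchmarks commonly estimate the operation count of an N-point complex transform as roughly 5&#183;N&#183;log2(N) flops (the convention used by FFTW&#8217;s benchFFT, for example). Whether the tables above use exactly that constant is not stated here, so treat the following Python helper (a hypothetical name, not an NMath API) as an approximation:</p>

```python
import math

def fft_mflops(n, seconds, repetitions=1):
    """Approximate FFT throughput in MFLOPs using the common
    5*N*log2(N) flop-count convention for an N-point complex FFT.
    This is a benchmarking convention, not an exact operation count."""
    flops = 5.0 * n * math.log2(n) * repetitions
    return flops / seconds / 1e6
```

<p>With timings like those in the table, the same formula lets you compare runs at different lengths on a common scale.</p>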
<p><em> -Paul </em></p>
<p>See our <a href="/topic-fast-fourier-transforms/">FFT landing page </a> for complete documentation and code examples.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/modern-fast-fourier-transform">Modern Fast Fourier Transform</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/modern-fast-fourier-transform/feed</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">226</post-id>	</item>
		<item>
		<title>High Performance FFT in NMath 4.0</title>
		<link>https://www.centerspace.net/high-performance-fft-coming-in-the-next-release</link>
					<comments>https://www.centerspace.net/high-performance-fft-coming-in-the-next-release#respond</comments>
		
		<dc:creator><![CDATA[Paul Shirkey]]></dc:creator>
		<pubDate>Wed, 02 Sep 2009 23:41:45 +0000</pubDate>
				<category><![CDATA[NMath]]></category>
		<category><![CDATA[FFT]]></category>
		<category><![CDATA[FFT in .NET]]></category>
		<category><![CDATA[NMath FFT example]]></category>
		<category><![CDATA[Non power of 2 FFT]]></category>
		<category><![CDATA[Real FFT packing]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=160</guid>

					<description><![CDATA[<p>High performance, multi-core aware, FFT class set will be offered in the upcoming NMath 4.0 release.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/high-performance-fft-coming-in-the-next-release">High Performance FFT in NMath 4.0</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
					<content:encoded><![CDATA[<p>The next release of CenterSpace&#8217;s NMath .NET libraries will contain high-performance, multi-core aware, fast Fourier transform classes.  This set of classes will elegantly support all common 1D and 2D FFT computations through a robust, easy-to-use, object-oriented interface.</p>
<p>The following FFT classes will be available.</p>
<p><code></p>
<ul>
<li>DoubleComplexForward1DFFT</li>
<li>DoubleComplexBackward1DFFT</li>
<li>DoubleComplexForward2DFFT</li>
<li>DoubleComplexBackward2DFFT</li>
<li>DoubleForward1DFFT</li>
<li>DoubleSymmetricBackward1DFFT</li>
<li>DoubleForward2DFFT</li>
<li>DoubleGeneral1DFFT (for computing FFTs of data with offset &amp; strided memory layouts)</li>
</ul>
<p></code></p>
<p>All classes efficiently support FFTs of arbitrary length, with a simple interface for both in-place and out-of-place computations.  Additionally, there is a parallel set of classes for single-precision computation.</p>
<h3> Example </h3>
<p>Here is a simple example computing a 1000-point forward 1D FFT.</p>
<p><code><br />
// Create some random signal data.<br />
RandomNumberGenerator rand = new RandGenMTwist(427);<br />
DoubleVector data = new DoubleVector(1000, rand);</p>
<p>// Create the 1D real FFT instance<br />
DoubleForward1DFFT fft1000 =<br />
       new DoubleForward1DFFT(1000);</p>
<p>// Compute the FFT<br />
fft1000.FFTInPlace(data);<br />
</code></p>
<p>The FFT of real (non-complex) data results in an FFT signal with complex-conjugate symmetry.  For memory efficiency this is returned to the user in a packed format (making in-place computation possible).  To facilitate the unpacking of this data, signal reader classes are supplied that support random-access indexing into the packed data.  Continuing with the example above:<br />
<code><br />
// Ask the FFT instance for the correct reader,<br />
// passing in the FFT data.<br />
DoubleSymmetricSignalReader reader<br />
    = fft1000.GetSignalReader(data);</p>
<p>// Now we can access any element from the<br />
// packed complex-conjugate symmetric FFT data set<br />
// using common random-access indexing semantics.<br />
DoubleComplex thirdelement = reader[2];</p>
<p>// Also the entire result can be unpacked<br />
DoubleComplex[] unpackedfft =<br />
 reader.UnpackFullToArray();<br />
</code></p>
<p>The readers are not necessary for the complex versions of the FFT classes, because the FFT of complex data is itself complex, so no data packing (for memory savings) is possible.</p>
<h3> Packing Format Notes </h3>
<p>As mentioned above, the Fourier transform of a real signal results in a complex-conjugate symmetric signal.  This symmetry is used by CenterSpace to pack the Fourier transform into an array of the same size as the signal array.</p>
<p>The following table describes the layout of the packed complex-conjugate symmetric signal, of length N, in one dimension.</p>
<table>
<tr><td><em> For N even </em></td></tr>
<tr><td><img decoding="async" src="http://latex.codecogs.com/gif.latex?[ \ R_0 \ R_1 \ I_1 \ R_2 \ I_2 ... I_{N/2-1} \ R_{N/2} \ ]" title="[ \ R_0 \ R_1 \ I_1 \ R_2 \ I_2 ... I_{N/2-1} \ R_{N/2} \ ]" /></td></tr>
<tr><td><em> For N odd </em></td></tr>
<tr><td><img decoding="async" src="http://latex.codecogs.com/gif.latex?[ \ R_0 \ R_1 \ I_1 \ R_2 \ I_2 ... R_{{(N-1)\over{2}}-1} \ I_{{(N-1)\over{2}}-1} \ R_{(N-1)\over{2}} \ I_{(N-1)\over{2}} ]" title="[ \ R_0 \ R_1 \ I_1 \ R_2 \ I_2 ... R_{{(N-1)\over{2}}-1} \ I_{{(N-1)\over{2}}-1} \ R_{(N-1)\over{2}} \ I_{(N-1)\over{2}} ]" /></td></tr>
</table>
<p>If we were to unroll the array, with alternating real and imaginary values, for the case of N even we would have an array of length 2*N.</p>
<table>
<tr><td><img decoding="async" src="http://latex.codecogs.com/gif.latex?\small [ \ R_0 \ R_1 \ I_1 \ R_2 \ I_2 ... I_{N/2-1} \ R_{N/2} \ R_{N/2-1} \ -I_{N/2-1} ... \ R_2 \ -I_2 \ R_1 \ -I_1 ]" title="\small [ \ R_0 \ R_1 \ I_1 \ R_2 \ I_2 ... I_{N/2-1} \ R_{N/2} \ R_{N/2-1} \ -I_{N/2-1} ... \ R_2 \ -I_2 \ R_1 \ -I_1 ]" /></td></tr>
</table>
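<p>As a concrete illustration of this layout, the following Python sketch (illustrative only, not NMath code; function names are hypothetical) packs the DFT of a real, even-length signal into the format above, then unpacks it back to all N complex bins using the conjugate symmetry X<sub>N-k</sub> = conj(X<sub>k</sub>):</p>

```python
import cmath

def pack_real_fft(x):
    """DFT of a real, even-length signal, packed per the even-N layout:
    [R_0, R_1, I_1, R_2, I_2, ..., I_{N/2-1}, R_{N/2}]."""
    N = len(x)
    assert N % 2 == 0
    X = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
         for k in range(N)]
    packed = [X[0].real]          # I_0 is zero for real input, so omitted
    for k in range(1, N // 2):
        packed += [X[k].real, X[k].imag]
    packed.append(X[N // 2].real)  # I_{N/2} is also zero, so omitted
    return packed

def unpack_full(packed):
    """Rebuild all N complex bins from the packed array using the
    conjugate symmetry X_{N-k} = conj(X_k)."""
    N = len(packed)
    half = [complex(packed[0], 0.0)]
    for k in range(1, N // 2):
        half.append(complex(packed[2 * k - 1], packed[2 * k]))
    half.append(complex(packed[-1], 0.0))
    # Mirror the interior bins with conjugation to recover bins N/2+1..N-1.
    return half + [z.conjugate() for z in reversed(half[1:-1])]
```

<p>Round-tripping a small signal through these two functions and comparing against a direct DFT confirms the layout; this is essentially what the signal reader classes do for you.</p>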
<p>The complexities of the packing in two dimensions increase substantially, and will not be recorded here.  All NMath FFT users are encouraged to use the readers to unwind packed results.  Not only does this reduce coding complexity, but if the underlying packing format ever changes, the readers will still provide the expected functionality.</p>
<p>Finally, when inverting complex-conjugate symmetric signals using the <code> DoubleSymmetricBackward1DFFT </code> class, the input signals are expected to be packed in this format.</p>
<p>
<em> -Paul </em></p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/high-performance-fft-coming-in-the-next-release">High Performance FFT in NMath 4.0</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/high-performance-fft-coming-in-the-next-release/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">160</post-id>	</item>
	</channel>
</rss>
