fft C# Archives - CenterSpace

FFT Performance Benchmarks in .NET

Paul Shirkey — Wed, 05 Jan 2011 20:07:46 +0000

We’ve had a number of inquiries about the CenterSpace FFT benchmarks, so I thought I would code up a few tests and run them on my machine. I’ve included our FFT performance numbers and the code that generated them so you can try them on your machine. (If you don’t have NMath, you’ll need to download the eval version.) I also did a comparison of one-dimensional real DFTs with FFTW, one of the fastest desktop FFT implementations available.

Benchmarks

These benchmarks were run on a 2.80 GHz Intel Core i7 CPU with 4 GB of memory installed.

The clock resolution is 0.003 ns
1024 point, forward, real FFT required 4361.364 ns, Mflops 4069
1000 point, forward, real FFT required 5338.785 ns, Mflops 3235
4096 point, forward, real FFT required 21708.565 ns, Mflops 3924
4095 point, forward, real FFT required 43012.010 ns, Mflops 1980
1024 * 1024 point, forward, real FFT required 15.635 ms, Mflops 2324

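I’m estimating the megaflop performance during the FFT using:

$$\text{Mflops} = \frac{2.5 \, N \log(N)}{t}$$

where N is the FFT length and t is the mean time for one FFT in microseconds. The numerator is the asymptotic number of floating point operations for the radix-2 Cooley-Tukey FFT algorithm applied to real data. This FFT Mflop estimate is used in a number of FFT benchmark reports and serves as a good basis for comparing algorithm efficiency. Note that the radix-2 operation count is conventionally written with a base-2 logarithm, while the benchmark code below evaluates the logarithm with NMathFunctions.Log, the natural logarithm; the Mflops figures reported here are therefore a factor of ln 2 ≈ 0.69 lower than the base-2 convention would give.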
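As a quick check of the arithmetic, the reported Mflops figures can be reproduced from the timings listed above. This is a minimal sketch: the class and method names are illustrative and not part of NMath, and it uses Math.Log (the natural logarithm), which is what reproduces the reported numbers.

```csharp
using System;

public static class FftMflops
{
    // Estimated Mflops for one real FFT of length n that takes `microseconds`.
    // Flop count model: 2.5 * n * ln(n), as in the figures reported above.
    public static double Estimate(int n, double microseconds)
    {
        return 2.5 * n * Math.Log(n) / microseconds;
    }

    public static void Main()
    {
        // Timings are the reported values converted from ns to microseconds.
        Console.WriteLine("{0:0}", Estimate(1024, 4.361364)); // 4069
        Console.WriteLine("{0:0}", Estimate(1000, 5.338785)); // 3235
    }
}
```

Replacing Math.Log(n) with Math.Log(n, 2) would give the base-2 variant of the same estimate.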

As expected, we take a performance hit for non-power-of-2 lengths, but due to various optimizations for processing prime-length FFT kernels (3, 5, 7 & 11), the performance hit is minimal in many cases. The 1000-point FFT has prime factors (2)(2)(2)(5)(5)(5), and the 4095-point FFT has prime factors (3)(3)(5)(7)(13), so those larger prime factors in the 4095-point FFT cost us some performance. Typically, users zero-pad their data vectors to a power-of-two length to get optimal performance.
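The factorizations and the zero-padding step mentioned above can be sketched in a few lines. This is only an illustration using plain arrays; the class and method names are hypothetical and nothing here is NMath API.

```csharp
using System;
using System.Collections.Generic;

public static class FftLength
{
    // Prime factorization by trial division; adequate for typical FFT lengths.
    public static List<int> PrimeFactors(int n)
    {
        var factors = new List<int>();
        for (int p = 2; (long)p * p <= n; p++)
        {
            while (n % p == 0) { factors.Add(p); n /= p; }
        }
        if (n > 1) factors.Add(n); // whatever remains is itself prime
        return factors;
    }

    // Smallest power of two that is >= n, for 1 <= n <= 2^30.
    public static int NextPowerOfTwo(int n)
    {
        if (n < 1 || n > (1 << 30)) throw new ArgumentOutOfRangeException("n");
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    // Copies the signal into a zero-filled array of power-of-two length.
    public static double[] ZeroPad(double[] signal)
    {
        var padded = new double[NextPowerOfTwo(signal.Length)];
        Array.Copy(signal, padded, signal.Length);
        return padded;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", PrimeFactors(1000))); // 2 2 2 5 5 5
        Console.WriteLine(string.Join(" ", PrimeFactors(4095))); // 3 3 5 7 13
        Console.WriteLine(ZeroPad(new double[1000]).Length);     // 1024
    }
}
```

Keep in mind that zero-padding changes the frequency spacing of the output bins, so it trades a small change in the result grid for speed.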

Side by side comparison with FFTW

FFTW claims to be the “Fastest Fourier Transform in the West”, and is a clever, high-performance implementation of the discrete Fourier transform. It is shipped with all copies of MATLAB. FFTW is implemented in C and has a reputation as one of the fastest desktop FFT implementations.

Both the NMath FFT and FFTW have a pre-computation setup that establishes the best algorithmic approach for the DFT at hand before computing any FFTs. This pre-computation phase is not included in the times below. In the case of the NMath FFT classes, this phase is done in the class constructor; therefore, for best performance, users must avoid constructing NMath FFT classes in tight loops (as shown in the benchmark code below). Below is a small side-by-side comparison between FFTW and NMath’s FFT (using the numbers from above).



Comparison of a forward, real, out-of-place FFT.
FFT length	FFTW	NMATH FFT
1024	4.14 μs	4.36 μs
1000	5.98 μs	5.33 μs
4096	20.31 μs	21.71 μs
4095	49.90 μs	43.01 μs
1024^2	17.16 ms	15.63 ms

Clearly NMath is very competitive with, and at times outperforms, FFTW for real FFTs of both power-of-2 and other signal lengths. I chose 1D real signals as a test case because this is one of the most frequent use cases of our NMath FFT library.

On a subjective scale, running a 1024-point FFT on a commodity desktop machine at around 4 GFlops (algorithm normalized) is amazing. That means that in a real-time measurement situation, users can compute 1024-point FFTs at around 220 kHz, all with just a couple of lines of code.

Happy Computing,
Paul

Benchmark Code

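 using System;
 using System.Diagnostics;
 using CenterSpace.NMath.Core;

 public void BenchMarks()
    {
      // Declared as Double so the per-trial averages below use floating-point division.
      Double numberTrials = 10000;
      Double flops;

      Stopwatch timer = new System.Diagnostics.Stopwatch();
      // Note: Stopwatch.Frequency is in ticks per second, so this expression is ticks per nanosecond;
      // the duration of one tick in ns would be 1000000000.0 / Stopwatch.Frequency.
      Console.WriteLine( String.Format("The clock resolution is {0:0.000} ns", Stopwatch.Frequency / 1000000000.0 ) );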

      // Snip one - power of two
      RandGenUniform rand = new RandGenUniform();
      DoubleForward1DFFT fft = new DoubleForward1DFFT( 1024 );
      DoubleVector realsignal = new DoubleVector( 1024, rand );

      DoubleVector result = new DoubleVector( 1024 * 1024 );

      timer.Reset();
      for( int i = 0; i < numberTrials; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
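      // Mflops estimate: 2.5 * N * log(N) flops divided by the mean time per FFT in microseconds.
      // NMathFunctions.Log is the natural logarithm.
      flops = (2.5 * 1024 * NMathFunctions.Log(1024)) / (((timer.ElapsedTicks / numberTrials) / Stopwatch.Frequency) * 1000000.0 );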
      Console.WriteLine( String.Format( "1024 point, forward, real FFT required {0:0.000} ns, Mflops {1:0}", ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000000.0, flops ) );

      // length 1000
      fft = new DoubleForward1DFFT( 1000 );
      realsignal = new DoubleVector( 1000, rand );

      timer.Reset();
      for( int i = 0; i < numberTrials; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = ( 2.5 * 1000 * NMathFunctions.Log( 1000 ) ) / ( ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000.0 );
      Console.WriteLine( String.Format( "1000 point, forward, real FFT required {0:0.000} ns, Mflops {1:0}", ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000000.0, flops ) );

      // length 4096
      fft = new DoubleForward1DFFT( 4096 );
      realsignal = new DoubleVector( 4096, rand );

      timer.Reset();
      for( int i = 0; i < numberTrials; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = ( 2.5 * 4096 * NMathFunctions.Log( 4096 ) ) / ( ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000.0 );
      Console.WriteLine( String.Format( "4096 point, forward, real FFT required {0:0.000} ns, Mflops {1:0}", ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000000.0, flops ) );

      // length 4095
      fft = new DoubleForward1DFFT( 4095 );
      realsignal = new DoubleVector( 4095, rand );

      timer.Reset();
      for( int i = 0; i < numberTrials; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = ( 2.5 * 4095 * NMathFunctions.Log( 4095 ) ) / ( ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000.0 );
      Console.WriteLine( String.Format( "4095 point, forward, real FFT required {0:0.000} ns, Mflops {1:0}", ( ( timer.ElapsedTicks / numberTrials ) / Stopwatch.Frequency ) * 1000000000.0, flops ) );


      // length 1M
      fft = new DoubleForward1DFFT( 1024 * 1024 );
      realsignal = new DoubleVector( 1024 * 1024, rand );

      timer.Reset();
      for( int i = 0; i < 100; i++ )
      {
        timer.Start();
        fft.FFT( realsignal, ref result );
        timer.Stop();
      }
      flops = ( 2.5 * 1024 * 1024 * NMathFunctions.Log( 1024 * 1024 ) ) / ( ( ( timer.ElapsedTicks / 100.0 ) / Stopwatch.Frequency ) * 1000000.0 );
      Console.WriteLine( String.Format( "Million point (1024 * 1024), forward, real point FFT required {0:0.000} ms, Mflops {1:0}", ( ( timer.ElapsedTicks / 100.0 ) / Stopwatch.Frequency ) * 1000.0, flops ) );

    }
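The timing pattern repeated for each FFT length above could be factored into a small helper. The following is a sketch of one way to do that with only System.Diagnostics; the Timing class and its method names are hypothetical. It differs from the benchmark above in two ways: it times the whole loop rather than starting and stopping the timer on every call, and it does one untimed warm-up run first.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Duration of one Stopwatch tick in nanoseconds.
    public static double TickNanoseconds()
    {
        return 1e9 / Stopwatch.Frequency;
    }

    // Mean wall-clock time of `action` over `trials` runs, in nanoseconds.
    public static double MeanNanoseconds(Action action, int trials)
    {
        if (trials < 1) throw new ArgumentOutOfRangeException("trials");
        action(); // warm-up run, so one-time JIT compilation is not timed
        Stopwatch timer = Stopwatch.StartNew();
        for (int i = 0; i < trials; i++) action();
        timer.Stop();
        return timer.ElapsedTicks * TickNanoseconds() / trials;
    }

    public static void Main()
    {
        double sink = 0; // accumulates results so the work is not optimized away
        double ns = MeanNanoseconds(() =>
        {
            for (int i = 1; i <= 1000; i++) sink += Math.Sqrt(i);
        }, 1000);
        Console.WriteLine("Tick: {0:0.000} ns, mean per run: {1:0.000} ns",
                          TickNanoseconds(), ns);
    }
}
```

With such a helper, each benchmark case reduces to one call, for example passing a lambda that invokes fft.FFT( realsignal, ref result ) on an FFT object constructed outside the lambda.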


Power Spectral Density with NMath

Paul Shirkey — Wed, 13 Jan 2010 19:35:53 +0000

Application Note

Computing the Power Spectrum in C#

We’ve had several customers ask about computing the PSD in C# with NMath, so I thought it was time for a post on the subject. The power spectral density provides an estimate of the power present within each slice of spectrum, and is presented as a graph of the signal power versus frequency. It’s a common signal processing calculation across many fields, from acoustics to chemistry, and can provide insight into periodicities contained within a time domain signal.

For stationary, square-summable signals, the PSD is expressed as,
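A standard form of this definition, assuming the common 1/T normalization and taking F(ω) to be the transform of the signal observed over a window of width T, is:

$$S(\omega) = \lim_{T \to \infty} \frac{1}{T} \left| F(\omega) \right|^{2}$$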

where F(w) is the Fourier transform of the time domain signal f(t), and T is the width of the time domain sampled signal. Naturally we can never sample the entire signal, so calculations of the PSD (power spectral density) are all estimates. Techniques for estimating the PSD can be divided into two classes: parametric (model based) and non-parametric (non-model based). We will discuss only non-parametric techniques here. For discrete signals that are not square-summable (i.e. non-stationary signals, for which the Fourier transform does not exist), estimates of the power spectral density can be derived from,
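In the usual notation this is the Wiener-Khinchin relation. The symbols r(k) for the autocorrelation at lag k and M for the correlation width are assumptions introduced here for illustration:

$$S(\omega) = \lim_{M \to \infty} \sum_{k=-M}^{M} r(k) \, e^{-i \omega k}$$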

This is the Fourier transform of the autocorrelation as the correlation width of the sampled signal tends to infinity. For a concise derivation of both of these formulas, read these short lecture notes.

Computing the PSD in C# with NMath

Concentrating on stationary (periodic) signals, the PSD is most efficiently computed by applying smoothing to discrete periodograms.
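A common definition of the discrete periodogram, assuming the 1/n scaling used in the code further down and writing f(t) for the sampled signal, is:

$$P(\omega_k) = \frac{1}{n} \left| \sum_{t=0}^{n-1} f(t) \, e^{-i \omega_k t} \right|^{2}, \qquad \omega_k = \frac{2 \pi k}{n}$$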

Where n is the number of signal samples. Each point in the periodogram represents the relative contribution to the variance in the time domain signal at that frequency. (Visualization provided courtesy of Infragistics.)

An example periodogram of sunspot data. This is smoothed in some fashion to estimate the PSD.

Another estimation technique involves computing multiple windowed periodograms and then combining these together to get a progressively more accurate estimate (Welch’s Method, similarly MTM with Slepian windows). The time domain signal should be detrended before any of these operations.
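The segment-averaging idea can be sketched without any library. The example below is a simplified Welch-style estimate under stated assumptions: a Hann window, 50% overlap, mean removal as the only detrending, and a direct O(n²) DFT in place of a fast transform. The window power correction is omitted, so the absolute scale is not calibrated. All names are illustrative and none of this is NMath API.

```csharp
using System;

public static class WelchSketch
{
    // Periodogram of one segment via a direct O(n^2) DFT, bins 0..n/2.
    static double[] Periodogram(double[] x)
    {
        int n = x.Length;
        var p = new double[n / 2 + 1];
        for (int k = 0; k < p.Length; k++)
        {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++)
            {
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.Cos(angle);
                im += x[t] * Math.Sin(angle);
            }
            p[k] = (re * re + im * im) / n; // squared magnitude scaled by 1/n
        }
        return p;
    }

    // Averages Hann-windowed periodograms of half-overlapping segments.
    public static double[] Estimate(double[] signal, int segmentLength)
    {
        if (segmentLength < 2 || segmentLength > signal.Length)
            throw new ArgumentException("segmentLength");

        // Detrend by removing the mean of the whole signal.
        double mean = 0;
        foreach (double v in signal) mean += v;
        mean /= signal.Length;

        int step = segmentLength / 2; // 50% overlap
        var psd = new double[segmentLength / 2 + 1];
        var segment = new double[segmentLength];
        int count = 0;
        for (int start = 0; start + segmentLength <= signal.Length; start += step)
        {
            for (int t = 0; t < segmentLength; t++)
            {
                double w = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * t / (segmentLength - 1));
                segment[t] = w * (signal[start + t] - mean);
            }
            double[] p = Periodogram(segment);
            for (int k = 0; k < psd.Length; k++) psd[k] += p[k];
            count++;
        }
        for (int k = 0; k < psd.Length; k++) psd[k] /= count;
        return psd;
    }

    public static void Main()
    {
        // A sinusoid with 8 cycles per 64 samples should peak at bin 8.
        var signal = new double[512];
        for (int t = 0; t < signal.Length; t++)
            signal[t] = Math.Sin(2.0 * Math.PI * 8 * t / 64.0);

        double[] psd = Estimate(signal, 64);
        int peak = 0;
        for (int k = 1; k < psd.Length; k++) if (psd[k] > psd[peak]) peak = k;
        Console.WriteLine("Peak bin: {0}", peak); // Peak bin: 8
    }
}
```

Shorter segments give more periodograms to average, and so a smoother estimate, at the cost of coarser frequency resolution.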

The following C# code estimates the PSD by smoothing the periodogram using a Savitzky-Golay zero phase shift filter.

using CenterSpace.NMath.Core;

/* Estimate the Power Spectrum Density in C# / .NET */
public DoubleVector PSDEstimate(DoubleVector signal)
{
  // Detrend the periodic data
  signal = signal - NMathFunctions.Mean(signal); 

  // Compute the periodogram 
  var forwardFFT = new DoubleForward1DFFT( signal.Length); 
  var packedFFT = forwardFFT.FFT( signal ); 
  DoubleSymmetricSignalReader reader = forwardFFT.GetSignalReader( packedFFT ); 
  DoubleComplexVector fft = reader.UnpackSymmetricHalfToVector();

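  // Square of the absolute value, scaled by 1/N, where N is the number of signal samples.
  // (The unpacked half-spectrum vector is shorter than the signal, so its length is not N.)
  DoubleVector pg = NMathFunctions.Pow( NMathFunctions.Abs(fft), 2.0) / signal.Length;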
  
  // Smooth w/ filter of width of 7, & polynomial degree of 5. 
  var sgf = new SavitzkyGolayFilter(3, 3, 5); 
  DoubleVector smoothedPSD = sgf.Filter(pg);

  return smoothedPSD;
}

Alternatively, the periodogram can be smoothed with a Daniell filter using the MovingWindowFilter class.

MovingWindowFilter mwf = 
  new MovingWindowFilter(2, 2, new DoubleVector(.125, .25, .25, .25, .125));
mwf.WindowBoundaryOption = 
  MovingWindowFilter.BoundaryOption.PadWithZeros;

// Here 'data' is the periodogram to be smoothed (pg in PSDEstimate above).
DoubleVector smoothedPSDaniell = mwf.Filter(data);
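To make the effect of these weights concrete, here is a library-free sketch of a centered weighted moving average that treats samples beyond the ends as zero. It illustrates the idea only; it is not NMath's implementation, and the class and method names are hypothetical.

```csharp
using System;

public static class DaniellSketch
{
    // Centered weighted moving average; out-of-range samples count as zero.
    public static double[] Smooth(double[] data, double[] weights)
    {
        if (weights.Length % 2 == 0)
            throw new ArgumentException("weights must have odd length");
        int half = weights.Length / 2;
        var result = new double[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            double sum = 0;
            for (int j = -half; j <= half; j++)
            {
                int idx = i + j;
                if (idx >= 0 && idx < data.Length)
                    sum += weights[j + half] * data[idx];
            }
            result[i] = sum;
        }
        return result;
    }

    public static void Main()
    {
        // Daniell weights: equal interior weights, half weight at each end, summing to 1.
        double[] weights = { 0.125, 0.25, 0.25, 0.25, 0.125 };
        double[] pg = { 0, 0, 0, 8, 0, 0, 0 };
        Console.WriteLine(string.Join(" ", Smooth(pg, weights))); // 0 1 2 2 2 1 0
    }
}
```

Because the weights sum to one, the total power in the interior of the periodogram is preserved; only the first and last two bins are biased low by the zero padding.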

More information about NMath’s FFT class set can be found on our FFT landing page. The class SavitzkyGolayFilter is available in the current release. Users of earlier releases can use the MovingWindowFilter class with Savitzky-Golay coefficients generated via a provided helper method.

Happy Computing,

Paul

References

Haykin, S. “Adaptive Filter Theory”. Prentice-Hall, Inc. 1996.
