Finding Peaks in Data with NMath

Finding peaks in experimental data is a very common computing activity, and because of its intuitive nature there are many established techniques and literally dozens of heuristics built on top of those. CenterSpace Software has jumped into this algorithmic fray with a new peak finding class based on smooth Savitzy-Golay polynomials. If you are not familiar with Savitzy-Golay polynomial smoothing, take a look at our previous blog article. When used for peak finding, we simply report the zero crossing derivatives of the smoothing, locally-fit, Savitzy-Golay polynomials. This is a very fast peak finder because the Savitzy-Golay smoothing algorithm can be slightly altered to directly report the first derivatives, which remarkably, can be done with a convolve operation. Because this peak finder is based on Savitzy-Golay polynomials, it requires that the data be sampled at regular intervals.

An Introductory Example

Suppose we have the following sampled data.

graph of example data
Simple data with a single peak.

The following C# code builds the test data and locates the single peak in this simple data set.

Using CenterSpace.NMath.Core;

DoubleVector d = new DoubleVector(0,-1, 1.5, 2, 3, 4, 4.5, 4, 3, 2.2, 1.0, -3.0, 0);
PeakFinderSavitzkyGolay pfa = new PeakFinderSavitzkyGolay(d, 5, 4);

The peak data reported back by the Savitzky-Golay peak finder can be either the (x,y) location of the peak, or the index lying on or preceding the found peak.

Peak found at x:6.00, y:4.50
Peak found at index: 6

The peak finding PeakFinderSavitzkyGolay class requires three parameters: a data vector, the filter window width, and the degree of the smoothing polynomial.

A More Complex Example

To build a peak finder on top on this class for some domain specific data, we need to understand the basic parameters that control which peaks are reported. For this second example, let’s use the complex signal show below, which contains a mixture of isolated peaks, adjacent peaks, and narrow and broad peaks.

Example peaks
Peaks of various widths and heights

This signal also includes some very subtle peaks near x = 23.5 and x = 29. Applying the PeakFinderSavitzkyGolay class as shown below, all 15 peaks are found (the top of the first peak is off the scale in our image).

PeakFinderSavitzkyGolay pfa = new PeakFinderSavitzkyGolay(signal, 10, 5);
pfa.AbscissaInterval = 0.1;
Console.WriteLine("Number of peaks found: " + pfa.NumberPeaks.ToString());
Console.WriteLine(String.Format("Peak found at x:{0:0.00}, y:{1:0.00}", pfa[4].X, pfa[4].Y));

The position of the fifth peak is reported to the console, using indexing notation, pfa[4].X, pfa[4].Y, on the peak finder object.

Number of peaks found: 15
Peak found at x:9.03, y:0.35

By setting the AbsciassaInterval to the signal sample rate (in this example 0.1 seconds) the class can scale the x-axis according to your units, and supply the (x, y) positions of all found peaks. If you only need the peak location down to the resolution of the sample interval, the peak finder will just report the index that either precedes or lies on the peak abscissa location. This avoids the an extra interpolation step to locate the inter-sample peak abcissa, and increases performance.

Now supposing that we want to eliminate all broad peaks and can increase the peak finders selectivity.

PeakFinderSavitzkyGolay pfa = new PeakFinderSavitzkyGolay(signal, 10, 5);
pfa.AbscissaInterval = 0.1;
pfa.SlopeSelectivity = 0.003;

The property SlopeSelectivity defaults to zero, causing the peak finder to report all found peaks. By increasing the selectivity a hair to 0.003, the peak finder no longer reports the last three peaks. Both the two subtle peaks are eliminated along with the final broad peak near 26. The slope selectivity is simply the slope of the smoothed first derivative of the Savitzy-Golay polynomial at each zero crossing – so as it’s value is increased only the more pronounced peaks (with steeply diving smoothed first derivatives) are reported.

If we want to heavily filter the peaks and only see the peaks of the general trend line, we could increase the filter width dramatically from 10 to 80.

PeakFinderSavitzkyGolay pfa = new PeakFinderSavitzkyGolay(signal, 80, 5);
pfa.AbscissaInterval = 0.1;
pfa.SlopeSelectivity = 0.0;

Using these parameters, we find only the four peaks that capture the low frequency variation of the signal above.

Peak found at x:7.53, y:0.34
Peak found at x:14.27, y:0.26
Peak found at x:20.28, y:0.25
Peak found at x:26.76, y:0.24

Note that the y-values here correspond to the smoothed, fitted polynomial not the actual data at the x value. This demonstrates the ability of the this peak finder to act as a low pass filter, which can be use to sort out the peaks of interest from higher frequency noise. This is why Savitzy-Golay data smoothing is often used for baseline subtraction (for example in pre-processing mass spectrometry data before peak finding).

Summary & Performance

The first example used a smoothing polynomial of degree 4, and the second example a polynomial degree of 5. Experience shows that typically a polynomial degree between 3-7 will be best suited to smooth measurement data and correctly locate peaks. However, there is no hard and fast rule, so feel free to experiment. Having said that, the polynomial degree must always be strictly less than the window width or an exception will be thrown.

polynomial degree < window width

Because, after object construction, this peak finder boils down to a convolution, it's performance is far better that many peak finders. On my 2.8Ghz Intel i7 Quad Core, I can find peaks in 3 million data points in about 80 ms, and in 30 million data that requires about 700ms. That would give us the ability to do peak finding in a real time system running with a sample rate of ~43 MHz. In a real system there would likely be other peak filtering overhead, but we would still be able to process data at a very high rate - suitable for most real-time data sampling applications.

Happy Computing

-Paul Shirkey

Leave a Reply

Your email address will not be published. Required fields are marked *