Blog

Posts Tagged ‘.NET moving average filter’

Excel Trendlines

Thursday, March 25th, 2010

We are sometimes asked how to reproduce the various Excel Trendline types in NMath, including printing out the form of the equation and the R2 value (coefficient of determination). Excel offers these trend types:

  • Linear Trendline
  • Logarithmic Trendline
  • Exponential Trendline
  • Power Trendline
  • Polynomial Trendline
  • Moving Average Trendline

These can all be easily computed using NMath. Let’s see how. First, let’s start with a simple data series:

DoubleVector x = new DoubleVector(11, 15, 18, 23, 26, 31, 39, 44, 54, 64, 74);
DoubleVector y = new DoubleVector(0.00476, 0.0105, 0.0207, 0.0619, 0.337, 0.74, 1.7, 2.45, 3.5, 4.5, 5.09);

These data represent the evolution of an algal bloom in the Adriatic Sea. The x-values are time expressed in days, and the y-values are the size of the surface of the bloom in mm2.

Linear Trendline

The Linear trendline fits a line, y = a*x + b, to the data. This is easily computed using the LinearRegression class in NMath Stats, as shown in the following C# code:

bool addIntercept = true;
LinearRegression lr = new LinearRegression(new DoubleMatrix(x), y, addIntercept);
LinearRegressionAnova lrAnova = new LinearRegressionAnova(lr);
double a = lr.Parameters[0];
double b = lr.Parameters[1];
double r2 = lrAnova.RSquared;
 
Console.WriteLine("y = {0}x + {1}", a, b);
Console.WriteLine("r2 = {0}", r2);

Output

y = -1.63908651994092x + 0.0913403802489977
r2 = 0.967828167696715

Note that to compute the coefficient of determination (R2), we construct a LinearRegressionAnova object from the LinearRegression instance. This class tests overall model significance for regressions computed by LinearRegression.

Logarithmic Trendline

The Logarithmic trendline fits a line to ln(x), y–that is, y = a*ln(x) + b. Again, we can use the LinearRegression class for this:

DoubleVector logX = NMathFunctions.Log(x);
lr = new LinearRegression(new DoubleMatrix(logX), y, addIntercept);
a = lr.Parameters[0];
b = lr.Parameters[1];
r2 = new LinearRegressionAnova(lr).RSquared;
 
Console.WriteLine("y = {0}ln(x) + {1}", a, b);
Console.WriteLine("r2 = {0}", r2);

Output:

y = -8.17805531083818ln(x) + 2.87283105614145
r2= 0.840393353070466

Exponential Trendline

The Exponential trendline fits the function y = a * e ^(b * x). This can also be computed using LinearRegression by fitting a line to x, ln(y). Taking the log of both sides of the function gives:

ln(y) = ln(a * e ^(b * x)) = ln(a) + bx

The following C# code does this:

DoubleVector logY = NMathFunctions.Log(y);
lr = new LinearRegression(new DoubleMatrix(x), logY, addIntercept);
a = Math.Exp(lr.Parameters[0]);
b = lr.Parameters[1];
r2 = new LinearRegressionAnova(lr).RSquared;
 
Console.WriteLine("y = {0}e^{1}x", a, b);
Console.WriteLine("r2 = {0}", r2);

Output:

y = 0.00552689525130073e^0.112876522212724x
r2 = 0.812366433832176

Note that because the intercept of the fitted line is the natural log of the “a” parameter in the exponential function, we need to take the exponential of the found intercept (using Math.Exp) to recover the value of “a”.

Power Trendline

The Power trendline fits the function y = a * x^b. This can be computed using LinearRegression by fitting a line to ln(x), ln(y). Taking the log of both sides of the equation gives:

ln(y) = ln(a * x^b) = ln(a) + b * ln(x)
lr = new LinearRegression(new DoubleMatrix(logX), logY, addIntercept);
lrAnova = new LinearRegressionAnova(lr);
a = Math.Exp(lr.Parameters[0]);
b = lr.Parameters[1];
r2 = new LinearRegressionAnova(lr).RSquared;
 
Console.WriteLine("y = {0}x^{1}", a, b);
Console.WriteLine("r2 = {0}", r2);

Output:

y = 2.46993343563889E-07x^4.11443630332377
r2 = 0.947447653331871

Again, we recover the value of parameter “a” by taking the exponential of the found intercept.

Polynomial Trendline

NMath provides class PolynomialLeastSquares for fitting a polynomial of the specified degree to paired vectors of x- and y-values. For example, this code fits a cubic to our data:

int degree = 3;
PolynomialLeastSquares pls = new PolynomialLeastSquares(degree, x, y);
 
// compute R2
DoubleVector predictions = pls.FittedPolynomial.Evaluate(x);
double regressionSumOfSquares = StatsFunctions.SumOfSquares(predictions - StatsFunctions.Mean(y));
double residualSumOfSquares = pls.LeastSquaresSolution.Residuals.TwoNormSquared();
r2 = regressionSumOfSquares / (regressionSumOfSquares + residualSumOfSquares);
 
Console.WriteLine("y = " + pls.FittedPolynomial);
Console.WriteLine("r2 = {0}", r2);

Output:

y = -4.68278158094222E-05x^3 + 0.00640408381023593x^2 - 
      0.1643720340709x + 1.12703300920657
r2 = 0.998634459376868

Note that PolynomialLeastSquares does not currently provide the R2 value, so in the code above we compute it directly.

Moving Average Trendline

Unlike the trendlines we’ve examined so far, a moving average does not fit a functional form to data. Rather, it filters data in order to smooth out noise. NMath provides class MovingWindowFilter for this purpose.

Class MovingWindowFilter replaces data points with a linear combination of the data points immediately to the left and right of each point, based on a given set of coefficients to use in the linear combination. Static class methods are provided for generating coefficients to implement a moving average filter and a Savitzky-Golay smoothing filter.

For example, the following C# code filters the data using a window of width 3 (the “period” parameter in Excel):

int numberLeft = 1;
int numberRight = 1;
DoubleVector movingAvgCoefficents = MovingWindowFilter.MovingAverageCoefficients(numberLeft, numberRight);
MovingWindowFilter filter = new MovingWindowFilter(numberLeft, numberRight, movingAvgCoefficents);
DoubleVector yfiltered = filter.Filter(y, MovingWindowFilter.BoundaryOption.DoNotFilterBoundaryPoints);
Console.WriteLine("yfiltered = " + yfiltered);

Output:

yfiltered = [ 0.00476 0.0119866666666667 0.0310333333333333
              0.139866666666667 0.379633333333333 0.925666666666667 1.63
              2.55 3.48333333333333 4.36333333333333 5.09 ]

Advanced Curve Fitting

That covers the simple trendlines produced by Excel. For advanced curve fitting, NMath provides class OneVariableFunctionFitter which fits arbitrary one variable functions to a set of points. (You must supply at least as many data points to fit as your function has parameters.) In a future post, we’ll take a look at this class. In the meantime, see this page for an example demonstrating the use of OneVariableFunctionFitter, including how to define your own function.

Ken