Advanced Curve Fitting using Excel and NMath

In recent blog posts, we have discussed how to call CenterSpace’s Libraries from Excel. In his March 2010 blog post, CenterSpace’s Ken Baldwin demonstrated how to replicate Excel’s existing Trendline functions using C# and NMath. In this post, we will demonstrate the advanced curve fitting functions available in the CenterSpace libraries that could be easily be integrated into Excel analysis work.

Curve fitting is one of the most practical applications of mathematics as we are often asked to explain or make predictions based on a collection of data points. For example, if we collect a series of changing data values over time, we look to explain the relationship that time effects the generated values or in mathematical terms y=f(x). Here we are using x to represent time values and f(x) to represent the relationship or function that generates the resulting values or y. So if we can find a function f(x) that represents a good fit of the data we should be able to predict the result at any moment in time.

With this in mind, let us get started with some data points (x, y) that we have collected. I have entered the following values in an Excel spreadsheet.

We can now use Excel’s charting function to create the following XY scatterplot with our data:

At this point, we can add an Excel trendline to get the best fit it can provide. By right clicking on a data value in our chart we will get the option to add a trendline. Choose a 2nd order polynomial and these options.

Name the trendline “2nd Order Polynomial” and check “Display equation on chart” and “Display R-squared value on chart”.

Excel calculates and plots the line while returning the equation and the R2 value. If our R2 equals 1 we would have found a perfect fit. For a second order polynomial Excel returned a value of 0.8944 which means that roughly 10.6 percent (1-.894) are not on this line.

If we continue increasing our polynomial orders up to the maximum of six we can achieve the best R2 value of 0.9792, but look at the curve we have fitted to these points.

A better fit might be an exponential function so let us try Excel’s trendline option using exponentials.

Clearly the results are visually a better fit but the R2 value tells us that over 15% of the data points are not on this line. This pretty much represents the best we can do with Excel’s trendline functions for our data.

Looking to CenterSpace’s NMath and NStat libraries to give us more robust analysis, we can utilize more powerful curve fitting tools quickly and with little effort.

Using the linkage provided by ExcelDNA that we examined in our previous posts, we can create the following C# code for our ExcelDNA text file.

 fitter = new  OneVariableFunctionFitter
( AnalysisFunctions.FourParameterLogistic );
  DoubleVector solution = fitter.Fit(TempVx, TempVy, TempVs);
  return solution.ToArray();
 }

 [ExcelFunction(Description="Four parameterized R2")]
 public static double NOneVarFunctFitFourR2(double[] xValues, double[] yValues, double[] start)
 {
  DoubleVector TempVx = new DoubleVector(xValues);
  DoubleVector TempVy = new DoubleVector(yValues);
  DoubleVector TempVs = new DoubleVector(start);
  OneVariableFunctionFitter
 fitter = new  OneVariableFunctionFitter
( AnalysisFunctions.FourParameterLogistic );
  DoubleVector solution = fitter.Fit(TempVx, TempVy, TempVs);
  GoodnessOfFit gof = new GoodnessOfFit(fitter, TempVx, TempVy, solution);
  return gof.RSquared;
 }
}
]]>

As you can see by the code, I have chosen to use NMath’s OneVariableFunctionFitter with the FourParameterLogistic predefined generalized function.

From CenterSpace’s documentation, we get the above equation and need to solve for the model parameters a, b, c, d .

The OneVariableFunction call will require us to provide a starting guess along with the ranges containing the xValues and yValues. The following screen shows us making the function call from Excel with the necessary parameters.

After computing these values, we can call for the R2 value and generate some points to be plotted.

Note that I choose to hide the rfows from r17 to r107 for display purposes. You will need to have them unhid for the chart to look right. As you can see we returned the best R2 value so far at 0.9923.

In the next screen, we will have added the series to our chart and drawn a line through our calculated points.

This example illustrates the ease that Excel can use the power of NMath curve fitting routines to compute accurate fits to a collection of data.

Mike Magee

Leave a Reply

Your email address will not be published. Required fields are marked *

Top