Blog

Archive for the ‘NMath Tutorial’ Category

Clearing a vector

Wednesday, November 9th, 2011

A customer recently asked us for the best method to zero out a vector. We decided to run some tests to find out. Here are the five methods we tried, with any drawbacks and the timings.

These were performed on a DoubleVector, v,  of length 10,000,000.

1) Create a new vector. This isn’t really clearing out an existing vector but we thought we should include it for completeness.

 DoubleVector v2 = new DoubleVector( v.Length, 0.0 );

The big drawback here is that you’re creating new memory. Time: 419.5ms

2) Probably the first thing to come to mind is to simply iterate through the vector and set everything to zero.

for ( int i = 0; i < v.Length; i++ )
{
  v[i] = 0.0;
}

We have to do some checking in the index operator. No new memory created. Time: 578.5ms

3) In some cases, you could iterate through the underlying array of data inside the DoubleVector.

for ( int i = 0; i < v.DataBlock.Data.Length; i++ )
{
  v.DataBlock.Data[i] = 0.0;
}

This is a little less intuitive. And, very importantly, it will not work with many views into other data structures. For example, a row slice of a matrix. However, it’s easier for the CLR to optimize this loop. Time: 173.5ms

4) We can use the power of Intel’s MKL to multiply the vector by zero.

 v.Scale( 0.0 );

Scale() does this in-place. No new memory is created. In this example, we assume that MKL has already been loaded and is ready to go which is true if another MKL-based NMath call was already made or if NMath was initialized. This method will work on all views of other data structures. Time: 170ms

5) This surprised us a bit but the best method we could find was to clear out the underlying array using Array.Clear() in .NET

 Array.Clear( v.DataBlock.Data, 0, v.DataBlock.Data.Length );

This creates no new memory. However, this will not work with non-contiguous views. However, this method is very fast. Time: 85.8ms

To make this much simpler, we have created a Clear() method and a Clear( Slice) method on vectors and matrices. It will do the right thing in the right circumstance. It will be released in NMath 5.2 in 2012.

And, here is the code we used for this test:

using System;
using CenterSpace.NMath.Core;

namespace Test
{
  class ClearVector
  {
    static int size = 100000000;
    static int runs = 10;
    static int methods = 5;

    static void Main( string[] args )
    {
      System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
      DoubleMatrix times = new DoubleMatrix( runs, methods );
      NMathKernel.Init();

      for ( int run = 0; run < runs; run++ )
      {
        Console.WriteLine( "Run {0}...", run );
        DoubleVector v = null;

        // Create a new one
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Start();
        DoubleVector v2 = new DoubleVector( v.Length, 0.0 );
        sw.Stop();
        times[run, 0] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v2 ) );

        // iterate through vector
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Reset();
        sw.Start();
        for ( int i = 0; i < v.Length; i++ )
        {
          v[i] = 0.0;
        }
        sw.Stop();
        times[run, 1] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v ) );

        // iterate through array
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Reset();
        sw.Start();
        for ( int i = 0; i < v.DataBlock.Data.Length; i++ )
        {
          v.DataBlock.Data[i] = 0.0;
        }
        sw.Stop();
        times[run, 2] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v ) );

        // scale
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Reset();
        sw.Start();
        v.Scale( 0.0 );
        sw.Stop();
        times[run, 3] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v ) );

        // Array Clear
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Reset();
        sw.Start();
        Array.Clear( v.DataBlock.Data, 0, v.DataBlock.Data.Length );
        sw.Stop();
        times[run, 4] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v ) );
        Console.WriteLine( times.Row( run ) );
      }
      Console.WriteLine( "Means: " + NMathFunctions.Mean( times ) );
    }

    private static bool Assert( DoubleVector v )
    {
      if ( v.Length != size )
      {
        return false;
      }
      for ( int i = 0; i < v.Length; ++i )
      {
        if ( v[i] != 0.0 )
        {
          return false;
        }
      }
      return true;
    }
  }
}

- Trevor

Share

Initializing NMath

Wednesday, November 9th, 2011

NMath uses Intel’s Math Kernel Library (MKL) internally. This code contains native, optimized code to wring out the best performance possible.

There is a one-time delay when the appropriate x86 or x64 native code is loaded. This cost can be easily controlled by the developer by using the NMathKernel.Init() method. Please see Initializing NMath for more details.

- Trevor

Share

Fitting Geometric Primitives to Points Using Nonlinear Least Squares

Tuesday, August 9th, 2011

We were recently contacted by a customer looking for help on how to use NMath to fit geometric primitives to clouds of 2D points. The solution is to cast the problem as a minimization. In the space of the parameters which define the geometric object, minimize the residuals with respect to the 2D points. NMath provides .NET classes for solving nonlinear least squares problems such as this, using the Trust-Region or Levenberg-Marquardt methods.

For instance, let’s try fitting a circle to a set of points. A circle is defined by three parameters: a center (a,b), and a radius r. The circle is all points such that (x-a)^2 + (y – b)^2 = r^2. To setup the minimization problem, we first need to define a function which given a set of circle parameters, returns the residuals with respect to a cloud of 2D points. There are several methods for doing this in NMath. In the following C# code, we extend the abstract base class DoubleMultiVariableFunction, and override the Evaluate() method:

public class CircleFitFunction : DoubleMultiVariableFunction
{
 
  public CircleFitFunction( DoubleVector x, DoubleVector y )
    : base( 3, x.Length )
  {
    if( x.Length != y.Length )
      throw new Exception( "Unequal number of x,y values." );
 
    X = x;
    Y = y;
  }
 
  public DoubleVector X { get; internal set; }
  public DoubleVector Y { get; internal set; }
 
  public override void Evaluate( DoubleVector parameters, ref DoubleVector residuals )
  {
    // parameters of circle with center (a,b) and radius r
    double a = parameters[0];
    double b = parameters[1];
    double r = parameters[2];
 
    for( int i = 0; i < X.Length; i++ )
    {
      // distance of point from circle center
      double d = Math.Sqrt( Math.Pow( X[i] - a, 2.0 ) + Math.Pow( Y[i] - b, 2.0 ) );
 
      residuals[i] = d - r;
    }
  }
}

The constructor takes the set of x,y points to fit, which are stored on the class. Note that we call the base constructor with the dimensions of the domain and range of the function:

: base( 3, x.Length )

In this case, the function maps the 3-dimensional space of the circle parameters to the x.Length-dimension space of the residuals.

Next we override the Evaluate() method, which takes an input vector and output vector passed by reference. Our implementation computes the residual for each point with respect to a circle defined by the given parameters. We calculate the distance, d, of each point from the circle center; the residual is then equal to d – r. The nonlinear least squares method will minimize the L2 norm of this function.

To demonstrate the fitting process, let’s first start with a set of points generated from a circle of known parameters. For example, this C# code creates 20 x,y points evenly spaced around a circle with center (0.5, 0.25) and radius 2, plus some added noise:

int n = 20;
DoubleVector x = new DoubleVector( n );
DoubleVector y = new DoubleVector( n );
 
double a = 0.5;
double b = 0.25;
double r = 2;
 
RandGenUniform noise = new RandGenUniform( -.75, .75 );
for( int i = 0; i < n; i++ )
{
  double t = i * 2 * Math.PI / n;
  x[i] = a + r * Math.Cos( t ) + noise.Next();
  y[i] = b + r * Math.Sin( t ) + noise.Next();
}

Now that we have our function defined and some points to fit, performing the minimization takes only a few lines of code. This C# code finds the minimum and prints the solution and final residual:

CircleFitFunction f = new CircleFitFunction( x, y );
TrustRegionMinimizer minimizer = new TrustRegionMinimizer();
DoubleVector start = new DoubleVector( "0.1 0.1 0.1" );
DoubleVector solution = minimizer.Minimize( f, start );
 
Console.WriteLine( "solution = " + solution );
Console.WriteLine( "final residual = " + minimizer.FinalResidual );

Sample output:

solution = [ 0.446122611523468 0.175483486509012 1.93511389538286 ]
final residual = 1.62414259527024

Starting from point (0.1, 0.1, 0.1) in the parameter space of the circle, we minimized the sum of the squared residuals with respect to the (x,y) points. In the run above, the minimum found was center (0.45, 0.18) and radius 1.9, close to the actual parameters which generated our (noisy) data. To visually inspect the fit, we can use NMath with the Microsoft Chart Controls for .NET.

Fitted Circle

Now that we’re sure the minimization is working as desired, let’s try it again with some random points:

int n = 10;
RandGenUniform rng = new RandGenUniform( -1, 2 );
DoubleVector x = new DoubleVector( n, rng );
DoubleVector y = new DoubleVector( n, rng );
 
CircleFitFunction f = new CircleFitFunction( x, y );
TrustRegionMinimizer minimizer = new TrustRegionMinimizer();
DoubleVector start = new DoubleVector( "0.1 0.1 0.1" );
DoubleVector solution = minimizer.Minimize( f, start );

Again, plotting the solution:

Fitted CircleIn some runs, the fit may seem counter-intuitive. For example:

Fitted Circle

But if you think about it, this sort of fit makes perfect sense, given how we defined our fit function. We’re asking the minimizer to minimize the residuals with respect to the circle perimeter, and allowing it vary the circle center and radius however it can to achieve that goal. In an extreme case, if our points fall approximately along a line, the best fit will be a circle with a very large radius, so that the circle perimeter is nearly linear as it passes through the points.

Fitted CircleIn this case, the fitted circle has center (428.0, -757.8) and radius 870.6.  If we wish to avoid solutions such as these, we could define our fit function differently. For example, we might constrain the circle center to be the center of mass of the points, and vary only the radius.

Obviously, a similar technique can be used to fit other geometric primitives, though as the complexity of the shape increases, so does the complexity of the fit function.

Ken

Share

Advanced Curve Fitting using Excel and NMath

Monday, March 14th, 2011

In recent blog posts, we have discussed how to call CenterSpace’s Libraries from Excel. In his March 2010 blog post, CenterSpace’s Ken Baldwin demonstrated how to replicate Excel’s existing Trendline functions using C# and NMath. In this post, we will demonstrate the advanced curve fitting functions available in the CenterSpace libraries that could be easily be integrated into Excel analysis work.

Curve fitting is one of the most practical applications of mathematics as we are often asked to explain or make predictions based on a collection of data points. For example, if we collect a series of changing data values over time, we look to explain the relationship that time effects the generated values or in mathematical terms y=f(x). Here we are using x to represent time values and f(x) to represent the relationship or function that generates the resulting values or y. So if we can find a function f(x) that represents a good fit of the data we should be able to predict the result at any moment in time.

With this in mind, let us get started with some data points (x, y) that we have collected. I have entered the following values in an Excel spreadsheet.

We can now use Excel’s charting function to create the following XY scatterplot with our data:

At this point, we can add an Excel trendline to get the best fit it can provide. By right clicking on a data value in our chart we will get the option to add a trendline. Choose a 2nd order polynomial and these options.

Name the trendline “2nd Order Polynomial” and check “Display equation on chart” and “Display R-squared value on chart”.

Excel calculates and plots the line while returning the equation and the R2 value. If our R2 equals 1 we would have found a perfect fit. For a second order polynomial Excel returned a value of 0.8944 which means that roughly 10.6 percent (1-.894) are not on this line.

If we continue increasing our polynomial orders up to the maximum of six we can achieve the best R2 value of 0.9792, but look at the curve we have fitted to these points.

A better fit might be an exponential function so let us try Excel’s trendline option using exponentials.

Clearly the results are visually a better fit but the R2 value tells us that over 15% of the data points are not on this line. This pretty much represents the best we can do with Excel’s trendline functions for our data.

Looking to CenterSpace’s NMath and NStat libraries to give us more robust analysis, we can utilize more powerful curve fitting tools quickly and with little effort.

Using the linkage provided by ExcelDNA that we examined in our previous posts, we can create the following C# code for our ExcelDNA text file.

<![CDATA[
 
using System;
using ExcelDna.Integration;
using CenterSpace.NMath.Core;
using CenterSpace.NMath.Matrix;
using CenterSpace.NMath.Analysis;
using CenterSpace.NMath.Stats;
 
public class NMathExcelCurveFit
{
 
 [ExcelFunction(Description="Four parameterized Fit")]
 public static double[] NOneVarFunctFitFour(double[] xValues, double[] yValues, double[] start)
 {
  DoubleVector TempVx = new DoubleVector(xValues);
  DoubleVector TempVy = new DoubleVector(yValues);
  DoubleVector TempVs = new DoubleVector(start);
  OneVariableFunctionFitter
<TrustRegionMinimizer> fitter = new  OneVariableFunctionFitter
<TrustRegionMinimizer>( AnalysisFunctions.FourParameterLogistic );
  DoubleVector solution = fitter.Fit(TempVx, TempVy, TempVs);
  return solution.ToArray();
 }
 
 [ExcelFunction(Description="Four parameterized R2")]
 public static double NOneVarFunctFitFourR2(double[] xValues, double[] yValues, double[] start)
 {
  DoubleVector TempVx = new DoubleVector(xValues);
  DoubleVector TempVy = new DoubleVector(yValues);
  DoubleVector TempVs = new DoubleVector(start);
  OneVariableFunctionFitter
<TrustRegionMinimizer> fitter = new  OneVariableFunctionFitter
<TrustRegionMinimizer>( AnalysisFunctions.FourParameterLogistic );
  DoubleVector solution = fitter.Fit(TempVx, TempVy, TempVs);
  GoodnessOfFit gof = new GoodnessOfFit(fitter, TempVx, TempVy, solution);
  return gof.RSquared;
 }
}
]]>

As you can see by the code, I have chosen to use NMath’s OneVariableFunctionFitter with the FourParameterLogistic predefined generalized function.

From CenterSpace’s documentation, we get the above equation and need to solve for the model parameters a, b, c, d .

The OneVariableFunction call will require us to provide a starting guess along with the ranges containing the xValues and yValues. The following screen shows us making the function call from Excel with the necessary parameters.

After computing these values, we can call for the R2 value and generate some points to be plotted.

Note that I choose to hide the rfows from r17 to r107 for display purposes. You will need to have them unhid for the chart to look right. As you can see we returned the best R2 value so far at 0.9923.

In the next screen, we will have added the series to our chart and drawn a line through our calculated points.

This example illustrates the ease that Excel can use the power of NMath curve fitting routines to compute accurate fits to a collection of data.

Mike Magee

Share

Using NMath with Microsoft Chart Controls for .NET

Wednesday, March 9th, 2011

In 2007, Microsoft acquired the Dundas chart components, in order to deliver data visualization directly within Microsoft products. In October 2008, they released the Microsoft Chart Controls for .NET, which includes the Dundas ASP.NET and Windows Forms Chart controls. The Chart controls are available as a separate download for .NET 3.5. Beginning in .NET 4.0, the Chart controls are part of the .NET Framework. To use the Chart controls, add a reference to System.Windows.Forms.DataVisualization.

A Chart object contains one or more ChartAreas, each of which contain one or more data Series. Each Series has an associated chart type, and a DataPoint collection. DataPoints can be manually appended or inserted into the collection, or added automatically when a series is bound to a datasource using either the DataBindY() or DataBindXY() method. Since any IEnumerable can act as a datasource, it’s easy to use an NMath vector or vector view (of a matrix row or column, for example) as a datasource.

For example, suppose we want to create a scatter plot of the first two columns of a 20 x 5 DoubleMatrix (that is, column 0 vs. column 1).

DoubleMatrix data = new DoubleMatrix( 20, 5, new RandGenUniform() );
DoubleVector x = data.Col( 0 );
DoubleVector y = data.Col( 1 );

Begin by creating a new Chart object, and optionally adding a Title.

Chart chart = new Chart()
{
  Size = new Size( 500, 500 ),
};
 
Title title = new Title()
{
  Name = chart.Titles.NextUniqueName(),
  Text = "My Data",
  Font = new Font( "Trebuchet MS", 12F, FontStyle.Bold ),
};
chart.Titles.Add( title );

Next, add a ChartArea.

ChartArea area = new ChartArea()
{
  Name = chart.ChartAreas.NextUniqueName(),
};
area.AxisX.Title = "Col 0";
area.AxisX.TitleFont = new Font( "Trebuchet MS", 10F, FontStyle.Bold );
area.AxisX.MajorGrid.LineColor = Color.LightGray;
area.AxisX.RoundAxisValues();
area.AxisY.Title = "Col 1";
area.AxisY.TitleFont = new Font( "Trebuchet MS", 10F, FontStyle.Bold );
area.AxisY.MajorGrid.LineColor = Color.LightGray;
area.AxisY.RoundAxisValues();
chart.ChartAreas.Add( area );

Finally, add a new data Series, and bind the datasource to the NMath x,y vectors.

Series series = new Series()
{
  Name = "Points",
  ChartType = SeriesChartType.Point,
  MarkerStyle = MarkerStyle.Circle,
  MarkerSize = 8,
};
series.Points.DataBindXY( x, y );
chart.Series.Add( series );

To display the chart, we can use a utility function like this, which shows the given chart in a default form running in a new thread.

public static void Show( Chart chart )
{
  Form form = new Form();
  form.Size = new Size( chart.Size.Width + 20, chart.Size.Height + 40 );
  form.Controls.Add( chart );
  Thread t = new Thread( () =&gt; Application.Run( form ) );
   t.Start();
}

After calling

Show( chart );

the result looks like this:

Microsoft Chart Controls for .NET

Of course, you can easily plot more complicated data as well. For instance, suppose we fit a 4th degree polynomial to this (random) data:

int degree = 4;
PolynomialLeastSquares pls = new PolynomialLeastSquares( degree, x, y );

To add the fitted polynomial to the cart, just add a new data Series interpolating over the range of x values.

Series series2 = new Series()
{
  Name = "Polynomial",
  ChartType = SeriesChartType.Line,
  MarkerStyle = MarkerStyle.None,
};
int numInterpolatedValues = 100;
double xmin = NMathFunctions.MinValue( x );
double xmax = NMathFunctions.MaxValue( x );
double step = ( xmax - xmin ) / ( numInterpolatedValues - 1 );
DoubleVector xi = new DoubleVector( numInterpolatedValues, xmin, step );
series2.Points.DataBindXY( xi, pls.FittedPolynomial.Evaluate( xi ) );
chart.Series.Add( series2 );

Let’s also add a subtitle with the fitted function, and a Legend.

Title subtitle = new Title()
{
  Name = chart.Titles.NextUniqueName(),
  Text = "f(x) = " + pls.FittedPolynomial.ToString( "N2" ),
};
chart.Titles.Add( subtitle );
 
Legend legend = new Legend()
{
  Name = chart.Legends.NextUniqueName(),
  DockedToChartArea = chart.ChartAreas[0].Name,
};
chart.Legends.Add( legend );

Now the chart looks like this:

If you add a Chart control to your application using the visual designer (by dragging a Chart control from the Data category in the Toolbox), this generates code to create a default Chart, with a single ChartArea, Series, and Legend. You can then edit the properties in the Properties tab, and bind the generated Series to an NMath datasource. For example:

public Form1()
{
  InitializeComponent();
 
  DoubleMatrix data = new DoubleMatrix( 20, 5, new RandGenUniform() );
  this.chart1.Series[0].Points.DataBindXY( data.Col(0), data.Col(1) );
}

The Microsoft Chart Controls for .NET provide an easy (and free) solution for visualizing NMath numerical types.

Ken

Share