Blog

Archive for the ‘NMath’ Category

The Importance of Graphing Your Data

Tuesday, December 13th, 2011

In his classic book The Visual Display of Quantitative Information, Edward R. Tufte argued that “graphics can be more precise and revealing than conventional statistical computations”. As an example, he described Anscombe’s Quartet–four datasets that have identical simple statistical properties, yet appear very different when graphed.
Anscombe's Quartet
These data sets–each consisting of 11 x,y points–were constructed by statistician Francis Anscombe in 1973.

As previously described, NMath 5.1 and NMath Stats 3.4 include classes for plotting NMath types using the Microsoft Chart Controls for .NET. (Free adapter code is also available for using NMath with Syncfusion Essential Chart.) Let’s use Anscombe’s data to explore how NMath’s new visualization capabilities can be used to reveal the differences in the data sets.

First, we’ll load the data into a DoubleMatrix.

DoubleMatrix A = new DoubleMatrix( @"11x8 [
  10.0 8.04  10.0 9.14 10.0 7.46  8.0  6.58
  8.0  6.95  8.0  8.14 8.0  6.77  8.0  5.76
  13.0 7.58  13.0 8.74 13.0 12.74 8.0  7.71
  9.0  8.81  9.0  8.77 9.0  7.11  8.0  8.84
  11.0 8.33  11.0 9.26 11.0 7.81  8.0  8.47
  14.0 9.96  14.0 8.10 14.0 8.84  8.0  7.04
  6.0  7.24  6.0  6.13 6.0  6.08  8.0  5.25
  4.0  4.26  4.0  3.10 4.0  5.39  19.0 12.50
  12.0 10.84 12.0 9.13 12.0 8.15  8.0  5.56
  7.0  4.82  7.0  7.26 7.0  6.42  8.0  7.91
  5.0  5.68  5.0  4.74 5.0  5.73  8.0  6.89 ]" );

Now let’s perform some simple descriptive statistics.

 int groups = 4;
 Slice rows = Slice.All;
 Slice xCols = new Slice( 0, groups, 2 );
 Slice yCols = new Slice( 1, groups, 2 );
 double unbiased = (double)A.Rows / ( A.Rows - 1 );
 
 Console.WriteLine( "Mean of x: {0}",
   NMathFunctions.Mean( A[ rows, xCols ] ) );
 Console.WriteLine( "Variance of x: {0}",
   NMathFunctions.Variance( A[rows, xCols] ) * unbiased );
 Console.WriteLine( "Mean of y: {0}",
   NMathFunctions.Round( NMathFunctions.Mean( A[rows, yCols] ), 2 ) );
 Console.WriteLine( "Variance of y: {0}",
   NMathFunctions.Round(
     NMathFunctions.Variance( A[rows, yCols] ) * unbiased, 3 ) );
 
 Console.Write( "Correlation of x-y: " );
 for (int i = 0; i < A.Cols; i += 2 )
 {
   Console.Write( NMathFunctions.Round(
    StatsFunctions.Correlation( A.Col(i), A.Col(i + 1) ), 3 ) + " " );
 }
 Console.WriteLine();

You can see from the output that the statistics are nearly identical for all four data sets:

Mean of x: [ 9 9 9 9 ]
Variance of x: [ 11 11 11 11 ]
Mean of y: [ 7.5 7.5 7.5 7.5 ]
Variance of y: [ 4.127 4.128 4.123 4.123 ]
Correlation of x-y: 0.816 0.816 0.816 0.817

Now let’s fit a linear model to each data set.

 LinearRegression[] lrs = new LinearRegression[groups];
 
 for( int i = 0; i < groups; i ++ )
 {
   Console.WriteLine( "Group {0}", i + 1 );
 
   bool addIntercept = true;
   lrs[i] = new LinearRegression( new DoubleMatrix( A.Col( 2 * i ) ),
     A.Col( 2 * i + 1 ), addIntercept );
   Console.WriteLine( "equation of regression line: Y = {0} + {1}X",
     Math.Round( lrs[i].Parameters[0], 2 ),
     Math.Round( lrs[i].Parameters[1], 3 ) );
 
   LinearRegressionParameter param =
     new LinearRegressionParameter( lrs[i], 1 );
   Console.WriteLine( "standard error of estimate of slope: {0}",
     Math.Round( param.StandardError, 3 ) );
   Console.WriteLine( "t-statistic: {0}",
     Math.Round( param.TStatistic( 0 ), 2 ) );
 
   LinearRegressionAnova anova = new LinearRegressionAnova( lrs[i] );
   Console.WriteLine( "regression sum of squares: {0}",
     Math.Round( anova.RegressionSumOfSquares, 2 ) );
   Console.WriteLine( "residual Sum of squares: {0}",
     Math.Round( anova.ResidualSumOfSquares, 2 ) );
   Console.WriteLine( "r2: {0}", Math.Round( anova.RSquared, 2 ) );
 
   Console.WriteLine();
 }

Again, the output is nearly identical for each data set:

Group 1
equation of regression line: Y = 3 + 0.5X
standard error of estimate of slope: 0.118
t-statistic: 4.24
regression sum of squares: 27.51
residual Sum of squares: 13.76
r2: 0.67

Group 2
equation of regression line: Y = 3 + 0.5X
standard error of estimate of slope: 0.118
t-statistic: 4.24
regression sum of squares: 27.5
residual Sum of squares: 13.78
r2: 0.67

Group 3
equation of regression line: Y = 3 + 0.5X
standard error of estimate of slope: 0.118
t-statistic: 4.24
regression sum of squares: 27.47
residual Sum of squares: 13.76
r2: 0.67

Group 4
equation of regression line: Y = 3 + 0.5X
standard error of estimate of slope: 0.118
t-statistic: 4.24
regression sum of squares: 27.49
residual Sum of squares: 13.74
r2: 0.67

Finally, let’s use the new NMath charting functionality to plot each linear regression fit. Note that we make use of the Compose() method to combine multiple charts into a single composite Chart control.

 List<Chart> charts = new List<Chart>();
 for( int i = 0; i < lrs.Length; i++ )
 {
   charts.Add( NMathStatsChart.ToChart( lrs[i], 0 ) );
 }
 Chart all = NMathStatsChart.Compose( charts, 2, 2,
   NMathChart.AreaLayoutOrder.RowMajor );
 for( int i = 0; i < groups; i++ )
 {
   all.ChartAreas[i].AxisX.Title = "x" + ( i + 1 );
   all.ChartAreas[i].AxisX.Minimum = 2;
   all.ChartAreas[i].AxisX.Maximum = 22;
   all.ChartAreas[i].AxisX.Interval = 4;
 
   all.ChartAreas[i].AxisY.Title = "y" + ( i + 1 );
   all.ChartAreas[i].AxisY.Minimum = 2;
   all.ChartAreas[i].AxisY.Maximum = 14;
   all.ChartAreas[i].AxisY.Interval = 4;
 
   all.Series[2 * i].Color = Color.DarkOrange;
   all.Series[2 * i + 1].Color = Color.SteelBlue;
 }
 NMathStatsChart.Show( all );

Anscombe's QuartetThe charts reveal  dramatic differences between the data sets, despite the identical fitted models. Group 1 shows a linear relationship, while in Group 2 the releationship is clearly non-linear.  Groups 3 and 4 demonstrate how a single outlier can have a large effect on simple statistics.

Ken

References

Anscombe, F. J. (1973). “Graphs in Statistical Analysis”. American Statistician 27 (1): 17–21. JSTOR 2682899.
Tufte, Edward R. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press

Share

Clearing a vector

Wednesday, November 9th, 2011

A customer recently asked us for the best method to zero out a vector. We decided to run some tests to find out. Here are the five methods we tried, with any drawbacks and the timings.

These were performed on a DoubleVector, v,  of length 10,000,000.

1) Create a new vector. This isn’t really clearing out an existing vector but we thought we should include it for completeness.

 DoubleVector v2 = new DoubleVector( v.Length, 0.0 );

The big drawback here is that you’re creating new memory. Time: 419.5ms

2) Probably the first thing to come to mind is to simply iterate through the vector and set everything to zero.

for ( int i = 0; i < v.Length; i++ )
{
  v[i] = 0.0;
}

We have to do some checking in the index operator. No new memory created. Time: 578.5ms

3) In some cases, you could iterate through the underlying array of data inside the DoubleVector.

for ( int i = 0; i < v.DataBlock.Data.Length; i++ )
{
  v.DataBlock.Data[i] = 0.0;
}

This is a little less intuitive. And, very importantly, it will not work with many views into other data structures. For example, a row slice of a matrix. However, it’s easier for the CLR to optimize this loop. Time: 173.5ms

4) We can use the power of Intel’s MKL to multiply the vector by zero.

 v.Scale( 0.0 );

Scale() does this in-place. No new memory is created. In this example, we assume that MKL has already been loaded and is ready to go which is true if another MKL-based NMath call was already made or if NMath was initialized. This method will work on all views of other data structures. Time: 170ms

5) This surprised us a bit but the best method we could find was to clear out the underlying array using Array.Clear() in .NET

 Array.Clear( v.DataBlock.Data, 0, v.DataBlock.Data.Length );

This creates no new memory. However, this will not work with non-contiguous views. However, this method is very fast. Time: 85.8ms

To make this much simpler, we have created a Clear() method and a Clear( Slice) method on vectors and matrices. It will do the right thing in the right circumstance. It will be released in NMath 5.2 in 2012.

And, here is the code we used for this test:

using System;
using CenterSpace.NMath.Core;

namespace Test
{
  class ClearVector
  {
    static int size = 100000000;
    static int runs = 10;
    static int methods = 5;

    static void Main( string[] args )
    {
      System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
      DoubleMatrix times = new DoubleMatrix( runs, methods );
      NMathKernel.Init();

      for ( int run = 0; run < runs; run++ )
      {
        Console.WriteLine( "Run {0}...", run );
        DoubleVector v = null;

        // Create a new one
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Start();
        DoubleVector v2 = new DoubleVector( v.Length, 0.0 );
        sw.Stop();
        times[run, 0] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v2 ) );

        // iterate through vector
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Reset();
        sw.Start();
        for ( int i = 0; i < v.Length; i++ )
        {
          v[i] = 0.0;
        }
        sw.Stop();
        times[run, 1] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v ) );

        // iterate through array
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Reset();
        sw.Start();
        for ( int i = 0; i < v.DataBlock.Data.Length; i++ )
        {
          v.DataBlock.Data[i] = 0.0;
        }
        sw.Stop();
        times[run, 2] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v ) );

        // scale
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Reset();
        sw.Start();
        v.Scale( 0.0 );
        sw.Stop();
        times[run, 3] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v ) );

        // Array Clear
        v = new DoubleVector( size, 1.0, 2.0 );
        sw.Reset();
        sw.Start();
        Array.Clear( v.DataBlock.Data, 0, v.DataBlock.Data.Length );
        sw.Stop();
        times[run, 4] = sw.ElapsedMilliseconds;
        Console.WriteLine( Assert( v ) );
        Console.WriteLine( times.Row( run ) );
      }
      Console.WriteLine( "Means: " + NMathFunctions.Mean( times ) );
    }

    private static bool Assert( DoubleVector v )
    {
      if ( v.Length != size )
      {
        return false;
      }
      for ( int i = 0; i < v.Length; ++i )
      {
        if ( v[i] != 0.0 )
        {
          return false;
        }
      }
      return true;
    }
  }
}

- Trevor

Share

Initializing NMath

Wednesday, November 9th, 2011

NMath uses Intel’s Math Kernel Library (MKL) internally. This code contains native, optimized code to wring out the best performance possible.

There is a one-time delay when the appropriate x86 or x64 native code is loaded. This cost can be easily controlled by the developer by using the NMathKernel.Init() method. Please see Initializing NMath for more details.

- Trevor

Share

CEM Solver at UT Austin

Wednesday, November 2nd, 2011

The folks at the Center for Electromechanics at University of Texas at Austin (UT-CEM) are doing some neat simulation projects with power systems, and we were honored to learn that NMath is at the core of their CEM Solver software. CEM Solver demonstrates substantial performance improvements over SimPowerSystems, reducing simulation time for a typical example from 23 minutes down to 30 seconds, for a 46x improvement!

Dr. Fabian M. Uriarte, Research Associate at UT-CEM, recently had this to say about NMath:

“If performance, documentation, and reliable technical support are important to you, then NMath is what you need. In our application, we parallelize the solution of large, sparse Ax=b systems. Using NMath, we saw significant performance gains over Math.NET. Additionally, NMath has a very large set of classes that suit our needs in other numerical aspects as well. We are extremely pleased with the performance of NMath and highly recommend it to others. This library is a ‘must have.’"

Thanks for the kind words, Fabian!  You can check out the CEM Solver project at http://www.utexas.edu/research/cem/projects/CEM_solver.html, or check out the demonstration video below.

Share

.NET Math with Microsoft Chart Controls, Revisited

Wednesday, October 5th, 2011

Easy Charting

In an earlier post, we described how NMath types can be plotted using the Microsoft Chart Controls for .NET, but programmatically constructing Chart objects with data series bound to NMath objects. The latest release of the NMath libraries makes this process much simpler by providing adapter classes which easily construct basic charts for you from NMath types:

  • NMath 5.1 includes assembly NMathChartMicrosoft.dll containing class NMathChart, which provides convenience methods for plotting NMath types using the Microsoft Chart Controls
  • NMath Stats 3.4 includes assembly NMathStatsChartMicrosoft.dll containing NMathStatsChart which provides convenience methods for plotting NMath Stats types using the Microsoft Chart Controls

The generated charts can then be customized to meet your needs.

Using the Chart Adapters

The Microsoft Chart Controls for .NET are available as a separate download for .NET 3.5. Beginning in .NET 4.0, the Chart controls are part of the .NET Framework. To use the Chart controls, add a reference to System.Windows.Forms.DataVisualization.
To use the adapters, add a reference to NMathChartMicrosoft.dll and/or NMathStatsChartMicrosoft.dll, and using statement:

using CenterSpace.NMath.Charting.Microsoft;

A previous blog article provides a simple app.config solution for deploying your chart dependent applications to target both .NET 3.5 or .NET 4.0. Without this app.config modification, your application will only target .NET 3.5.

The Adapter API

Both NMathChart and NMathStatsChart classes share a common API. Overloads of the ToChart() function are provided for our common math and stats types. ToChart() returns an instance of System.Windows.Forms.DataVisualization.Charting.Chart, which can be customized as desired. For example

Polynomial poly = new Polynomial( new DoubleVector( 4, 2, 5, -2, 3 ) );
Chart chart = NMathChart.ToChart( poly, -1, 1 );
chart.Titles.Add("Hello World");

For prototyping and debugging console applications, the Show() function shows a given chart in a default form.

NMathChart.Show( chart );

Note that when the window is closed, the chart is disposed.

If you do not need to customize the chart, overloads of Show() are also provided for common NMath types.

NMathChart.Show( poly );

This is equivalent to calling:

NMathChart.Show( NMathChart.ToChart( poly ) );

The Save() function saves a chart to a file or stream.

NMathChart.Save( chart, "chart.png", ChartImageFormat.Png );

If you are developing a Windows Forms application using the Designer, add a Microsoft Chart control to your form, then update it with an NMath object using the appropriate Update() function after initialization.

public Form1()
{
  InitializeComponent();
 
  Polynomial poly =
    new Polynomial( new DoubleVector( 4, 2, 5, -2, 3 ) );
  NMathChart.Update( ref this.chart1, poly, -1, 1 );
}

This has the following effect on your existing Chart object:

  • a new, default ChartArea is added if one does not exist, otherwise chart.ChartAreas[0] is used
  • axis titles, and DefaultAxisTitleFont and DefaultMajorGridLineColor, only have an effect if a new ChartArea is added
  • titles are added only if the given Chart does not already contain any titles
  • chart.Series[0] is replaced, or added if necessary

Charting Examples

Here are a few examples of usage in C# (Of course, as with all NMath functionality, these examples will also work, given the necessary syntactic changes, from VB.NET or F#, or any other .NET language.) :

double xmin = -1;
double xmax = 1;
int numInterpolatedValues = 100;
Polynomial poly = new Polynomial( new DoubleVector( 4, 2, 5, -2, 3 ) );
NMathChart.Show( poly, xmin, xmax, numInterpolatedValues );

DoubleVector x = new DoubleVector( 100, 0, 0.1 );
DoubleVector cos = x.Apply( NMathFunctions.CosFunction );
DoubleVector sin = x.Apply( NMathFunctions.SinFunction );
 
DoubleVector[] data = new DoubleVector[] { cos, sin };
NMathChart.Unit unit = new NMathChart.Unit( 0, 0.1, "x" );
Chart chart = NMathChart.ToChart( data, unit );
 
chart.Series[0].Name = "cos(x)";
chart.Series[1].Name = "sin(x)";
NMathChart.Show( chart );

double lambda = 4.0;
PoissonDistribution poisson = new PoissonDistribution( lambda );
 
NMathStatsChart.Show( poisson,
  NMathStatsChart.DistributionFunction.PDF );

DoubleMatrix predictors =
  new DoubleMatrix( 100, 3, new RandGenUniform() ); ;
DoubleVector observations = 2 * predictors.Col(0) +
  -0.75 * predictors.Col(1) + 1.25 * predictors.Col(2) + 0.5;
bool addIntercept = true;
 
LinearRegression lr =
  new LinearRegression( predictors, observations, addIntercept );
NMathStatsChart.Show( lr, 2 );

More Information

For more information, and many more examples, see:

Share