Blog

Archive for the ‘NMath Stats’ Category

The Importance of Graphing Your Data

Tuesday, December 13th, 2011

In his classic book The Visual Display of Quantitative Information, Edward R. Tufte argued that “graphics can be more precise and revealing than conventional statistical computations”. As an example, he described Anscombe’s Quartet–four datasets that have identical simple statistical properties, yet appear very different when graphed.
Anscombe's Quartet
These data sets–each consisting of 11 x,y points–were constructed by statistician Francis Anscombe in 1973.

As previously described, NMath 5.1 and NMath Stats 3.4 include classes for plotting NMath types using the Microsoft Chart Controls for .NET. (Free adapter code is also available for using NMath with Syncfusion Essential Chart.) Let’s use Anscombe’s data to explore how NMath’s new visualization capabilities can be used to reveal the differences in the data sets.

First, we’ll load the data into a DoubleMatrix.

DoubleMatrix A = new DoubleMatrix( @"11x8 [
  10.0 8.04  10.0 9.14 10.0 7.46  8.0  6.58
  8.0  6.95  8.0  8.14 8.0  6.77  8.0  5.76
  13.0 7.58  13.0 8.74 13.0 12.74 8.0  7.71
  9.0  8.81  9.0  8.77 9.0  7.11  8.0  8.84
  11.0 8.33  11.0 9.26 11.0 7.81  8.0  8.47
  14.0 9.96  14.0 8.10 14.0 8.84  8.0  7.04
  6.0  7.24  6.0  6.13 6.0  6.08  8.0  5.25
  4.0  4.26  4.0  3.10 4.0  5.39  19.0 12.50
  12.0 10.84 12.0 9.13 12.0 8.15  8.0  5.56
  7.0  4.82  7.0  7.26 7.0  6.42  8.0  7.91
  5.0  5.68  5.0  4.74 5.0  5.73  8.0  6.89 ]" );

Now let’s perform some simple descriptive statistics.

 int groups = 4;
 Slice rows = Slice.All;
 Slice xCols = new Slice( 0, groups, 2 );
 Slice yCols = new Slice( 1, groups, 2 );
 double unbiased = (double)A.Rows / ( A.Rows - 1 );
 
 Console.WriteLine( "Mean of x: {0}",
   NMathFunctions.Mean( A[ rows, xCols ] ) );
 Console.WriteLine( "Variance of x: {0}",
   NMathFunctions.Variance( A[rows, xCols] ) * unbiased );
 Console.WriteLine( "Mean of y: {0}",
   NMathFunctions.Round( NMathFunctions.Mean( A[rows, yCols] ), 2 ) );
 Console.WriteLine( "Variance of y: {0}",
   NMathFunctions.Round(
     NMathFunctions.Variance( A[rows, yCols] ) * unbiased, 3 ) );
 
 Console.Write( "Correlation of x-y: " );
 for (int i = 0; i < A.Cols; i += 2 )
 {
   Console.Write( NMathFunctions.Round(
    StatsFunctions.Correlation( A.Col(i), A.Col(i + 1) ), 3 ) + " " );
 }
 Console.WriteLine();

You can see from the output that the statistics are nearly identical for all four data sets:

Mean of x: [ 9 9 9 9 ]
Variance of x: [ 11 11 11 11 ]
Mean of y: [ 7.5 7.5 7.5 7.5 ]
Variance of y: [ 4.127 4.128 4.123 4.123 ]
Correlation of x-y: 0.816 0.816 0.816 0.817

Now let’s fit a linear model to each data set.

 LinearRegression[] lrs = new LinearRegression[groups];
 
 for( int i = 0; i < groups; i ++ )
 {
   Console.WriteLine( "Group {0}", i + 1 );
 
   bool addIntercept = true;
   lrs[i] = new LinearRegression( new DoubleMatrix( A.Col( 2 * i ) ),
     A.Col( 2 * i + 1 ), addIntercept );
   Console.WriteLine( "equation of regression line: Y = {0} + {1}X",
     Math.Round( lrs[i].Parameters[0], 2 ),
     Math.Round( lrs[i].Parameters[1], 3 ) );
 
   LinearRegressionParameter param =
     new LinearRegressionParameter( lrs[i], 1 );
   Console.WriteLine( "standard error of estimate of slope: {0}",
     Math.Round( param.StandardError, 3 ) );
   Console.WriteLine( "t-statistic: {0}",
     Math.Round( param.TStatistic( 0 ), 2 ) );
 
   LinearRegressionAnova anova = new LinearRegressionAnova( lrs[i] );
   Console.WriteLine( "regression sum of squares: {0}",
     Math.Round( anova.RegressionSumOfSquares, 2 ) );
   Console.WriteLine( "residual Sum of squares: {0}",
     Math.Round( anova.ResidualSumOfSquares, 2 ) );
   Console.WriteLine( "r2: {0}", Math.Round( anova.RSquared, 2 ) );
 
   Console.WriteLine();
 }

Again, the output is nearly identical for each data set:

Group 1
equation of regression line: Y = 3 + 0.5X
standard error of estimate of slope: 0.118
t-statistic: 4.24
regression sum of squares: 27.51
residual Sum of squares: 13.76
r2: 0.67

Group 2
equation of regression line: Y = 3 + 0.5X
standard error of estimate of slope: 0.118
t-statistic: 4.24
regression sum of squares: 27.5
residual Sum of squares: 13.78
r2: 0.67

Group 3
equation of regression line: Y = 3 + 0.5X
standard error of estimate of slope: 0.118
t-statistic: 4.24
regression sum of squares: 27.47
residual Sum of squares: 13.76
r2: 0.67

Group 4
equation of regression line: Y = 3 + 0.5X
standard error of estimate of slope: 0.118
t-statistic: 4.24
regression sum of squares: 27.49
residual Sum of squares: 13.74
r2: 0.67

Finally, let’s use the new NMath charting functionality to plot each linear regression fit. Note that we make use of the Compose() method to combine multiple charts into a single composite Chart control.

 List<Chart> charts = new List<Chart>();
 for( int i = 0; i < lrs.Length; i++ )
 {
   charts.Add( NMathStatsChart.ToChart( lrs[i], 0 ) );
 }
 Chart all = NMathStatsChart.Compose( charts, 2, 2,
   NMathChart.AreaLayoutOrder.RowMajor );
 for( int i = 0; i < groups; i++ )
 {
   all.ChartAreas[i].AxisX.Title = "x" + ( i + 1 );
   all.ChartAreas[i].AxisX.Minimum = 2;
   all.ChartAreas[i].AxisX.Maximum = 22;
   all.ChartAreas[i].AxisX.Interval = 4;
 
   all.ChartAreas[i].AxisY.Title = "y" + ( i + 1 );
   all.ChartAreas[i].AxisY.Minimum = 2;
   all.ChartAreas[i].AxisY.Maximum = 14;
   all.ChartAreas[i].AxisY.Interval = 4;
 
   all.Series[2 * i].Color = Color.DarkOrange;
   all.Series[2 * i + 1].Color = Color.SteelBlue;
 }
 NMathStatsChart.Show( all );

Anscombe's QuartetThe charts reveal  dramatic differences between the data sets, despite the identical fitted models. Group 1 shows a linear relationship, while in Group 2 the releationship is clearly non-linear.  Groups 3 and 4 demonstrate how a single outlier can have a large effect on simple statistics.

Ken

References

Anscombe, F. J. (1973). “Graphs in Statistical Analysis”. American Statistician 27 (1): 17–21. JSTOR 2682899.
Tufte, Edward R. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press

Share

.NET Math with Microsoft Chart Controls, Revisited

Wednesday, October 5th, 2011

Easy Charting

In an earlier post, we described how NMath types can be plotted using the Microsoft Chart Controls for .NET, but programmatically constructing Chart objects with data series bound to NMath objects. The latest release of the NMath libraries makes this process much simpler by providing adapter classes which easily construct basic charts for you from NMath types:

  • NMath 5.1 includes assembly NMathChartMicrosoft.dll containing class NMathChart, which provides convenience methods for plotting NMath types using the Microsoft Chart Controls
  • NMath Stats 3.4 includes assembly NMathStatsChartMicrosoft.dll containing NMathStatsChart which provides convenience methods for plotting NMath Stats types using the Microsoft Chart Controls

The generated charts can then be customized to meet your needs.

Using the Chart Adapters

The Microsoft Chart Controls for .NET are available as a separate download for .NET 3.5. Beginning in .NET 4.0, the Chart controls are part of the .NET Framework. To use the Chart controls, add a reference to System.Windows.Forms.DataVisualization.
To use the adapters, add a reference to NMathChartMicrosoft.dll and/or NMathStatsChartMicrosoft.dll, and using statement:

using CenterSpace.NMath.Charting.Microsoft;

A previous blog article provides a simple app.config solution for deploying your chart dependent applications to target both .NET 3.5 or .NET 4.0. Without this app.config modification, your application will only target .NET 3.5.

The Adapter API

Both NMathChart and NMathStatsChart classes share a common API. Overloads of the ToChart() function are provided for our common math and stats types. ToChart() returns an instance of System.Windows.Forms.DataVisualization.Charting.Chart, which can be customized as desired. For example

Polynomial poly = new Polynomial( new DoubleVector( 4, 2, 5, -2, 3 ) );
Chart chart = NMathChart.ToChart( poly, -1, 1 );
chart.Titles.Add("Hello World");

For prototyping and debugging console applications, the Show() function shows a given chart in a default form.

NMathChart.Show( chart );

Note that when the window is closed, the chart is disposed.

If you do not need to customize the chart, overloads of Show() are also provided for common NMath types.

NMathChart.Show( poly );

This is equivalent to calling:

NMathChart.Show( NMathChart.ToChart( poly ) );

The Save() function saves a chart to a file or stream.

NMathChart.Save( chart, "chart.png", ChartImageFormat.Png );

If you are developing a Windows Forms application using the Designer, add a Microsoft Chart control to your form, then update it with an NMath object using the appropriate Update() function after initialization.

public Form1()
{
  InitializeComponent();
 
  Polynomial poly =
    new Polynomial( new DoubleVector( 4, 2, 5, -2, 3 ) );
  NMathChart.Update( ref this.chart1, poly, -1, 1 );
}

This has the following effect on your existing Chart object:

  • a new, default ChartArea is added if one does not exist, otherwise chart.ChartAreas[0] is used
  • axis titles, and DefaultAxisTitleFont and DefaultMajorGridLineColor, only have an effect if a new ChartArea is added
  • titles are added only if the given Chart does not already contain any titles
  • chart.Series[0] is replaced, or added if necessary

Charting Examples

Here are a few examples of usage in C# (Of course, as with all NMath functionality, these examples will also work, given the necessary syntactic changes, from VB.NET or F#, or any other .NET language.) :

double xmin = -1;
double xmax = 1;
int numInterpolatedValues = 100;
Polynomial poly = new Polynomial( new DoubleVector( 4, 2, 5, -2, 3 ) );
NMathChart.Show( poly, xmin, xmax, numInterpolatedValues );

DoubleVector x = new DoubleVector( 100, 0, 0.1 );
DoubleVector cos = x.Apply( NMathFunctions.CosFunction );
DoubleVector sin = x.Apply( NMathFunctions.SinFunction );
 
DoubleVector[] data = new DoubleVector[] { cos, sin };
NMathChart.Unit unit = new NMathChart.Unit( 0, 0.1, "x" );
Chart chart = NMathChart.ToChart( data, unit );
 
chart.Series[0].Name = "cos(x)";
chart.Series[1].Name = "sin(x)";
NMathChart.Show( chart );

double lambda = 4.0;
PoissonDistribution poisson = new PoissonDistribution( lambda );
 
NMathStatsChart.Show( poisson,
  NMathStatsChart.DistributionFunction.PDF );

DoubleMatrix predictors =
  new DoubleMatrix( 100, 3, new RandGenUniform() ); ;
DoubleVector observations = 2 * predictors.Col(0) +
  -0.75 * predictors.Col(1) + 1.25 * predictors.Col(2) + 0.5;
bool addIntercept = true;
 
LinearRegression lr =
  new LinearRegression( predictors, observations, addIntercept );
NMathStatsChart.Show( lr, 2 );

More Information

For more information, and many more examples, see:

Share

Using NMath Charts with .NET 4.0

Friday, September 30th, 2011

The NMath Chart DLLs (NMathChartMicrosoft.dll and NMathStatsChartMicrosoft.dll) are built against the .NET 3.5 version of the Microsoft Chart Controls for .NET (System.Windows.Forms.DataVisualization). The .NET runtime will not automatically replace this reference with the version of the Microsoft Chart DLL built into the .NET 4.0 Framework, unless you explicitly tell it to do so by adding the following lines to your app.config:

<configuration> ... <runtime> ... <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1"> ... <dependentAssembly> <assemblyIdentity name="System.Windows.Forms.DataVisualization" publicKeyToken="31bf3856ad364e35"/> <bindingRedirect oldVersion="3.5.0.0-3.5.0.0" newVersion="4.0.0.0"/> </dependentAssembly> ... </assemblyBinding> ... </runtime> ... </configuration>

In this way, your application with charts can be deployed to both .NET 3.5 and .NET 4.0. Please note that all other NMath assemblies (with non-visual functionality) can be used in .NET 3.5 and .NET 4.0 without this step.

See this tutorial for more information.

Share

New Versions of NMath Libraries Released

Tuesday, September 27th, 2011

CenterSpace is proud to announce the immediate availability of new versions of our .NET math libraries, NMath 5.1 and NMath Stats 3.4. This release adds many new features and performance enhancements.

Version 5.1 of NMath adds:

  • Assembly NMathChartMicrosoft.dll containing class NMathChart, which provides static methods for plotting NMath types using the Microsoft Chart Controls for .NET. [More]

A complete changelog is located here.

Version 3.4 of NMath Stats adds:

  • Assembly NMathStatsChartMicrosoft.dll containing class NMathStatsChart, which provides static methods for plotting NMath Stats types using the Microsoft Chart Controls for .NET. [More]

A complete changelog is located here.

The new release is build-compatible with the prior release. Upgrades are provided free of charge to customers with current annual maintenance contracts. Maintenance contracts are available through our webstore.

This release features simplified NMath and NMath Stats installers. Updated license keys will be issued at upgrade time. A license file must now be deployed with your NMath applications. [More]

Share

New Versions of NMath Libraries Released

Monday, February 7th, 2011

CenterSpace is proud to announce the immediate availability of new versions of our .NET math libraries, NMath 5.0 and NMath Stats 3.3. This release adds many new features and performance enhancements.

Version 5.0 of NMath adds:

Version 3.3 of NMath Stats adds a class for performing a Pearson’s chi-square hypothesis test.

The new release is build-compatible with the prior release. Upgrades are provided free of charge to customers with current annual maintenance contracts. Maintenance contracts are available through our webstore.

Share