NMath Stats User's Guide


3.2 Missing Values

Most functions in class StatsFunctions have a paired function that ignores missing values (such as Double.NaN in a DoubleVector, DFNumericColumn, or array of doubles). For example, there are Mean() and NaNMean() functions, Variance() and NaNVariance() functions, and so forth. Unless a function is explicitly designed to handle missing values, it may return NaN or produce unexpected results when values are missing.

Code Example – C#

DoubleVector v =
  new DoubleVector( new double[] { 3.2, 1.0, Double.NaN, 4.5, -1.2 } );

double mean1 = StatsFunctions.Mean( v );
// mean1 = Double.NaN

double mean2 = StatsFunctions.NaNMean( v );
// mean2 = 1.875
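To make the difference concrete, here is a minimal sketch in plain C#, without NMath, of the semantics described above: an ordinary mean lets a single NaN propagate to the result, while a NaN-ignoring mean averages only the remaining values. The class NaNAwareSketch and its methods are hypothetical illustrations, not NMath's implementation.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the Mean()/NaNMean() semantics on a plain double[].
// Illustration only; this is not how NMath implements StatsFunctions.
public static class NaNAwareSketch
{
    // Ordinary mean: any NaN makes the running sum, and hence the mean, NaN.
    // Enumerable.Average throws InvalidOperationException on empty input.
    public static double Mean(double[] data) => data.Average();

    // NaN-ignoring mean: drop NaN entries first, then average what remains.
    // Returning NaN when nothing is left is an assumed convention.
    public static double NaNMean(double[] data)
    {
        double[] valid = data.Where(x => !double.IsNaN(x)).ToArray();
        return valid.Length == 0 ? double.NaN : valid.Average();
    }
}
```

With the data from the example, Mean() returns NaN, while NaNMean() returns (3.2 + 1.0 + 4.5 - 1.2) / 4 = 1.875, matching the values shown in the comments.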

The convenience method NaNCheck() returns true if a data set contains any missing values, and NaNRemove() creates a copy of a data set with the missing values removed. For two-dimensional data sets, such as matrices and data frames, NaNRemoveCols() creates a copy containing only the columns that have no missing values, and NaNRemoveRows() removes any rows that contain missing values. The CleanCols() and CleanRows() methods on class DataFrame have the same effect as NaNRemoveCols() and NaNRemoveRows(), respectively.
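The checking and removal operations can be pictured with the following sketch on plain arrays. HasNaN, RemoveNaN, RemoveNaNRows, and RemoveNaNCols are hypothetical helpers that mirror the behavior described for NaNCheck(), NaNRemove(), NaNRemoveRows(), and NaNRemoveCols(); they are not the NMath signatures, which work on NMath vector, matrix, and data frame types. As with NaNRemove() and NaNRemoveCols(), each removal helper returns a new copy and leaves its input unchanged.

```csharp
using System.Linq;

// Hypothetical plain-array analogues of NaNCheck(), NaNRemove(),
// NaNRemoveRows() and NaNRemoveCols(); not the NMath API.
public static class MissingValueSketch
{
    // True if any element is NaN (cf. NaNCheck()).
    public static bool HasNaN(double[] data) => data.Any(x => double.IsNaN(x));

    // New array with every NaN element dropped (cf. NaNRemove()).
    public static double[] RemoveNaN(double[] data) =>
        data.Where(x => !double.IsNaN(x)).ToArray();

    // New matrix keeping only the rows that contain no NaN (cf. NaNRemoveRows()).
    public static double[,] RemoveNaNRows(double[,] m)
    {
        int rows = m.GetLength(0), cols = m.GetLength(1);
        var keep = Enumerable.Range(0, rows)
            .Where(i => Enumerable.Range(0, cols).All(j => !double.IsNaN(m[i, j])))
            .ToList();
        var result = new double[keep.Count, cols];
        for (int r = 0; r < keep.Count; r++)
            for (int j = 0; j < cols; j++)
                result[r, j] = m[keep[r], j];
        return result;
    }

    // New matrix keeping only the columns that contain no NaN (cf. NaNRemoveCols()).
    public static double[,] RemoveNaNCols(double[,] m)
    {
        int rows = m.GetLength(0), cols = m.GetLength(1);
        var keep = Enumerable.Range(0, cols)
            .Where(j => Enumerable.Range(0, rows).All(i => !double.IsNaN(m[i, j])))
            .ToList();
        var result = new double[rows, keep.Count];
        for (int i = 0; i < rows; i++)
            for (int c = 0; c < keep.Count; c++)
                result[i, c] = m[i, keep[c]];
        return result;
    }
}
```

For a 3 x 2 matrix whose only NaN sits in row 1, column 0, RemoveNaNRows() keeps rows 0 and 2, while RemoveNaNCols() keeps only column 1; the two strategies trade lost observations against lost variables.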

As described in Section 2.1, data frame column types enable you to specify how missing values are represented within a particular column instance, or for all columns of a particular type. For example, this string column stores numeric data and uses NA to indicate a missing value:

Code Example – C#

DFStringColumn col =
  new DFStringColumn( "myCol", "32.1", "NA", "6.0", "34" );

This code identifies the missing value string, then computes the mean, ignoring missing values:

Code Example – C#

col.MissingValue = "NA";
double mean = StatsFunctions.NaNMean( col );

Because the column is not an instance of DFIntColumn or DFNumericColumn, an attempt is made to convert the data to double using System.Convert.ToDouble() (Section 3.1). If StatsFunctions.Mean() were used instead of StatsFunctions.NaNMean(), or if col.MissingValue were set to anything other than "NA" (the default value, for example, is "."), the string "NA" would be passed to the conversion and an exception would be thrown.
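The conversion behavior can be sketched with plain .NET calls. The helper MeanOfStringColumn is hypothetical, not part of NMath: it skips entries equal to the missing-value marker and converts the rest with System.Convert.ToDouble(). Passing CultureInfo.InvariantCulture is an assumption made here so that "32.1" parses the same way on any machine. In this sketch, an entry that is neither numeric nor the marker makes Convert.ToDouble() throw a FormatException.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper (not NMath): mean of numbers stored as strings,
// skipping entries that equal the missing-value marker.
public static class StringColumnSketch
{
    public static double MeanOfStringColumn(string[] column, string missingValue)
    {
        double[] values = column
            .Where(s => s != missingValue)   // ignore the missing-value marker
            // Assumption: invariant culture, so "." is the decimal separator.
            .Select(s => Convert.ToDouble(s, CultureInfo.InvariantCulture))
            .ToArray();                      // a non-numeric entry throws FormatException here
        return values.Length == 0 ? double.NaN : values.Average();
    }
}
```

Called with the column values "32.1", "NA", "6.0", "34" and the marker "NA", the helper averages 32.1, 6.0, and 34 to about 24.03. Called with the marker ".", it tries to convert "NA" and throws, which parallels the failure case described above.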