NMath User's Guide


38.2 Missing Values

Most functions in class StatsFunctions have a paired function that ignores missing values, such as Double.NaN in a DoubleVector, DFNumericColumn, or array of doubles. For example, there are Mean() and NaNMean() functions, Variance() and NaNVariance() functions, and so forth. A function that is not explicitly designed to handle missing values may return NaN or produce unexpected results when values are missing.

Code Example – C#

var v = new DoubleVector( "[ 3.2 1.0 Double.NaN 4.5 -1.2 ]" );

double mean1 = StatsFunctions.Mean( v );
// mean1 = Double.NaN

double mean2 = StatsFunctions.NaNMean( v );
// mean2 = 1.875

Code Example – VB

Dim V As New DoubleVector("[ 3.2 1.0 Double.NaN 4.5 -1.2 ]")

Dim Mean1 As Double = StatsFunctions.Mean(V)
' Mean1 = Double.NaN

Dim Mean2 As Double = StatsFunctions.NaNMean(V)
' Mean2 = 1.875
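The NaN result from Mean() above is consistent with ordinary IEEE 754 arithmetic, in which any sum that includes NaN is itself NaN. The following sketch uses only the .NET base class library, not NMath, to contrast a plain mean with one that skips NaN entries. The helper names PlainMean and MeanIgnoringNaN are hypothetical and are not meant to describe how NMath implements Mean() or NaNMean().

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class NaNMeanSketch
{
    // Plain arithmetic mean: a single NaN makes the whole sum NaN.
    public static double PlainMean(double[] data) => data.Sum() / data.Length;

    // Mean over the non-NaN entries only (hypothetical helper, for illustration).
    public static double MeanIgnoringNaN(double[] data)
    {
        double[] present = data.Where(x => !double.IsNaN(x)).ToArray();
        return present.Length == 0 ? double.NaN : present.Sum() / present.Length;
    }

    public static void Main()
    {
        double[] data = { 3.2, 1.0, double.NaN, 4.5, -1.2 };

        // The NaN entry propagates through the sum.
        Console.WriteLine(double.IsNaN(PlainMean(data)));  // True

        // Four values remain: (3.2 + 1.0 + 4.5 - 1.2) / 4
        Console.WriteLine(
            MeanIgnoringNaN(data).ToString("F3", CultureInfo.InvariantCulture));  // 1.875
    }
}
```

The skipping version divides by the count of values actually present (4), not by the original length (5), which is why the result matches the 1.875 shown for NaNMean().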

The convenience method NaNCheck() returns true if a given data set contains any missing values. NaNRemove() creates a copy of a data set with missing values removed. For two-dimensional data sets, such as matrices and data frames, NaNRemoveCols() creates a copy containing only those columns that have no missing values, and NaNRemoveRows() creates a copy containing only those rows that have no missing values. The CleanCols() and CleanRows() methods on class DataFrame have the same effect.
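A minimal sketch of how these methods might be called on a vector and a matrix follows. It assumes that NaNCheck(), NaNRemove(), NaNRemoveCols(), and NaNRemoveRows() are static methods on StatsFunctions with overloads for DoubleVector and DoubleMatrix, and that the types live in the CenterSpace.NMath.Core namespace (older releases may use CenterSpace.NMath.Stats). Check the API reference for your NMath version before relying on these signatures.

```csharp
using System;
using CenterSpace.NMath.Core;  // Assumption: namespace varies by NMath release.

public static class NaNCleanupSketch
{
    public static void Main()
    {
        // One-dimensional data with a single missing value.
        var v = new DoubleVector( 3.2, 1.0, Double.NaN, 4.5, -1.2 );

        // Assumed overload: NaNCheck( DoubleVector ) returns bool.
        if ( StatsFunctions.NaNCheck( v ) )
        {
            // Assumed overload: NaNRemove( DoubleVector ) returns a cleaned copy;
            // the original vector v is left unchanged.
            DoubleVector cleaned = StatsFunctions.NaNRemove( v );
            Console.WriteLine( cleaned.Length );
        }

        // Two-dimensional data: the missing value is in row 1, column 0.
        var m = new DoubleMatrix( new double[,] {
            { 1.0,        2.0 },
            { Double.NaN, 4.0 },
            { 5.0,        6.0 } } );

        // Keep only the columns without missing values (here, column 1).
        DoubleMatrix noNaNCols = StatsFunctions.NaNRemoveCols( m );

        // Keep only the rows without missing values (here, rows 0 and 2).
        DoubleMatrix noNaNRows = StatsFunctions.NaNRemoveRows( m );

        Console.WriteLine( "{0} x {1}", noNaNCols.Rows, noNaNCols.Cols );
        Console.WriteLine( "{0} x {1}", noNaNRows.Rows, noNaNRows.Cols );
    }
}
```

Which of the two matrix methods to use depends on the layout of the data: removing rows discards whole observations, while removing columns discards whole variables.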

As described in Section 37.1, data frame column types enable you to specify how missing values are represented within a particular column instance, or for all columns of a particular type. For example, this code stores numeric data in a string column, and uses NA to indicate a missing value:

Code Example – C#

var col =
  new DFStringColumn( "myCol", "32.1", "NA", "6.0", "34" );

Code Example – VB

Dim Col As New DFStringColumn("myCol", "32.1", "NA", "6.0", "34")

This code identifies the missing value string, then computes the mean, ignoring missing values:

Code Example – C#

col.MissingValue = "NA";
double mean = StatsFunctions.NaNMean( col );

Code Example – VB

Col.MissingValue = "NA"
Dim Mean As Double = StatsFunctions.NaNMean(Col)

Because the column is not an instance of DFIntColumn or DFNumericColumn, an attempt is made to convert the data to double using System.Convert.ToDouble() (Section 38.1). If StatsFunctions.Mean() were used instead of StatsFunctions.NaNMean(), or if col.MissingValue were set to something other than "NA" (the default value is "."), the string "NA" would no longer be skipped as missing and an exception would be thrown.
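The conversion behavior described above can be illustrated without NMath. The sketch below uses only System.Convert.ToDouble() from the .NET base class library; the helper MeanSkippingMarker is hypothetical and only approximates what a missing-value-aware mean over a string column has to do. It passes CultureInfo.InvariantCulture so that "32.1" parses the same way regardless of regional settings, which is an assumption of this sketch and not something the section states about NMath. The exception type shown is the one thrown by Convert.ToDouble(); NMath itself may surface a different exception.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class MissingStringSketch
{
    // Hypothetical helper: converts each cell to double and averages the results,
    // skipping any cell equal to the missing-value marker.
    public static double MeanSkippingMarker(IEnumerable<string> cells, string missing)
    {
        double sum = 0.0;
        int count = 0;
        foreach (string cell in cells)
        {
            if (cell == missing) continue;

            // Throws FormatException for text that is not a number, such as "NA".
            sum += Convert.ToDouble(cell, CultureInfo.InvariantCulture);
            count++;
        }
        return count == 0 ? double.NaN : sum / count;
    }

    public static void Main()
    {
        string[] cells = { "32.1", "NA", "6.0", "34" };

        // Marker matches the data: "NA" is skipped, (32.1 + 6.0 + 34) / 3.
        double mean = MeanSkippingMarker(cells, "NA");
        Console.WriteLine(mean.ToString("F3", CultureInfo.InvariantCulture));  // 24.033

        // Marker does not match the data: "NA" reaches Convert.ToDouble() and fails.
        try
        {
            MeanSkippingMarker(cells, ".");
        }
        catch (FormatException)
        {
            Console.WriteLine("FormatException: \"NA\" could not be converted");
        }
    }
}
```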

