NMath Stats User's Guide


2.1 Column Types

A DataFrame may contain columns of different types; the only constraint is that all columns must have the same length. DFColumn, which implements the IDFColumn interface, is the abstract base class for data frame columns. NMath Stats provides the following derived classes for column types:

DFBoolColumn represents a column of logical data.

DFDateTimeColumn represents a column of temporal data.

DFGenericColumn represents a column of generic data.

DFIntColumn represents a column of integer data.

DFNumericColumn represents a column of double-precision floating point data.

DFStringColumn represents a column of string data.

Creating Columns

Empty columns are constructed by simply supplying a name for the column. For example:

Code Example – C#

var col = new DFDateTimeColumn( "myCol" );

The name of a column can be used to access the column in a data frame. Once a column instance is constructed, the name cannot be changed.

NOTE—Columns also provide a modifiable Label property for display purposes; see below.

Columns can also be initialized with an array of data at construction time:

Code Example – C#

var bArray =
  new bool[] { true, false, true, true, true, false, false };
var col = new DFBoolColumn( "myCol", bArray );

Constructors that take an array of data use the params keyword, so values may also be passed as parameters:

Code Example – C#

DFStringColumn col =
  new DFStringColumn( "myCol", "Jane", "Joe", "Mary", "Bill" );

Some column types provide additional options for initializing data at construction time. For instance, this code initializes a numeric column with data from a DoubleVector:

Code Example – C#

// 50 elements starting at 0 with a step of 0.1: 0, 0.1, ..., 4.9
var v = new DoubleVector( 50, 0, .1 );
var col = new DFNumericColumn( "myCol", v );

This code initializes a generic column with data from an ICollection:

Code Example – C#

var list = new ArrayList( 3 );
list.Add( 3.14 );
list.Add( "Hello World" );
list.Add( DateTime.Now );
var col = new DFGenericColumn( "myCol", list );

Lastly, you can create a column from another column. For example, this code creates a DFIntColumn from a DFStringColumn:

Code Example – C#

var col1 = new DFStringColumn( "Col1", "1", "2", "3", "4" );
var col2 = new DFIntColumn( "Col2", col1 );

An NMathFormatException is thrown if the data in the source column cannot be converted to the target column type.
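To make "cannot be converted" concrete, here is a plain-C# model of element-by-element string-to-integer conversion. It is not NMath code: the helper name ToIntData is hypothetical, and where NMath throws NMathFormatException, this sketch simply lets int.Parse throw the BCL's FormatException.

Code Example – C#

```csharp
using System;
using System.Globalization;

// Hypothetical helper modeling string-to-int column conversion (not NMath code).
int[] ToIntData(string[] source)
{
    var result = new int[source.Length];
    for (int i = 0; i < source.Length; i++)
    {
        // int.Parse throws FormatException if source[i] is not a valid integer.
        result[i] = int.Parse(source[i], NumberStyles.Integer, CultureInfo.InvariantCulture);
    }
    return result;
}

int[] ok = ToIntData(new[] { "1", "2", "3", "4" });
Console.WriteLine(string.Join(", ", ok)); // 1, 2, 3, 4

bool failed = false;
try
{
    ToIntData(new[] { "1", "two", "3" });
}
catch (FormatException)
{
    failed = true; // "two" cannot be parsed as an integer
}
Console.WriteLine(failed); // True
```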

Adding and Removing Data

Once a column is constructed you can add or remove data from it. The Add() method appends an element to the end of the column:

Code Example – C#

var col = new DFStringColumn( "Name" );
col.Add( "Joe Smith" );
col.Add( "Jane Doe" );
col.Add( "John Davis" );

The Insert() method inserts an element into a column at a given index. For instance, this code inserts a new element at the head of the column (index 0):

Code Example – C#

col.Insert( 0, "Sally Jones" );

The RemoveAt() method removes the element at a given index:

Code Example – C#

col.RemoveAt( 3 );
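The three calls above can be traced with a standard List<string>, assuming that a column's Add(), Insert(), and RemoveAt() follow the same semantics as the List<T> methods of the same names. The list here only models the column; it is not a DFStringColumn.

Code Example – C#

```csharp
using System;
using System.Collections.Generic;

// Assumption: DFStringColumn's Add/Insert/RemoveAt behave like List<T>'s.
var names = new List<string>();
names.Add("Joe Smith");
names.Add("Jane Doe");
names.Add("John Davis");
// [Joe Smith, Jane Doe, John Davis]

names.Insert(0, "Sally Jones");
// [Sally Jones, Joe Smith, Jane Doe, John Davis]

names.RemoveAt(3);
// Index 3 held "John Davis", so that is the element removed.

Console.WriteLine(string.Join(", ", names)); // Sally Jones, Joe Smith, Jane Doe
```

Note that after the Insert() call every existing element shifts down by one, which is why index 3 refers to the last name added rather than the third.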

Accessing Column Data

The data frame column classes provide standard indexing operators for getting and setting element values. Thus, col[i] always returns the ith element of the column:

Code Example – C#

DFStringColumn col =
  new DFStringColumn( "Names", "Jane", "Joe", "Mary", "Bill" );
col[0] = "Janet";

The GetEnumerator() method returns an enumerator for the column data:

Code Example – C#

// IEnumerator is declared in System.Collections
IEnumerator enumerator = col.GetEnumerator();
while ( enumerator.MoveNext() )
{
  object item = enumerator.Current;
  // Do something with item
}

Column Properties

Data frame column types provide the following properties:

ColumnType gets the type of the objects held by the column.

Count gets the number of objects in the column.

IsNumeric returns true if a column is of type DFIntColumn or DFNumericColumn.

Label gets and sets the label in the header of the column.

MissingValue gets and sets the value used to represent missing values in the column (see below).

Name gets the name of the column.

NOTE—The Name of a column can only be set in a constructor. Once a column is constructed, the name cannot be changed. For a modifiable label, see the Label property.

Reordering Column Data

You can use the Permute() method to arbitrarily reorder the elements in a column. This method accepts a permutation array of element indices and reorders the elements such that this[ permutation[i] ] is set to the ith object in the original column.

For example, this code moves the last two elements to the head of the column:

Code Example – C#

DFStringColumn col =
   new DFStringColumn( "myCol", "a", "b", "c", "d", "e" );
col.Permute( 2, 3, 4, 0, 1 );
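The permutation rule, this[ permutation[i] ] = original[i], can be checked on a plain array. The local function Permute below is a hypothetical model of that rule, not NMath's implementation; it shows why the arguments 2, 3, 4, 0, 1 move the last two elements to the head.

Code Example – C#

```csharp
using System;

// Hypothetical model of the documented rule: result[ permutation[i] ] = original[i].
string[] Permute(string[] original, int[] permutation)
{
    if (permutation.Length != original.Length)
        throw new ArgumentException("Permutation length must match column length.");
    var result = new string[original.Length];
    for (int i = 0; i < original.Length; i++)
    {
        // Element i of the original moves to position permutation[i].
        result[permutation[i]] = original[i];
    }
    return result;
}

string[] letters = { "a", "b", "c", "d", "e" };
string[] permuted = Permute(letters, new[] { 2, 3, 4, 0, 1 });
Console.WriteLine(string.Join(" ", permuted)); // d e a b c
```

Reading the rule element by element: "a" goes to index 2, "b" to 3, "c" to 4, "d" to 0, and "e" to 1.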

Missing Values

All column types—except DFBoolColumn, which has only two valid values—support missing values. Most statistical functions in NMath Stats are accompanied by a paired function that ignores missing values (Section 3.2).

NOTE—To represent missing values in boolean data, use a DFIntColumn. For example, use 1 for true, 0 for false, and -1 for missing.

At construction time, the missing value for a column is defined using a static variable in class StatsSettings, as shown in Table 2.

Table 2 – Default missing values for data frame column types

Column Type         StatsSettings Variable    Default Value
DFDateTimeColumn    DateTimeMissingValue      DateTime.MinValue
DFGenericColumn     GenericMissingValue       null
DFIntColumn         IntegerMissingValue       int.MinValue
DFNumericColumn     NumericMissingValue       Double.NaN
DFStringColumn      StringMissingValue        "."

For instance, this code computes the mean of a column of integers, ignoring any missing values:

Code Example – C#

var col = new DFIntColumn( "myCol", 5, 2, -1, 1, 0, 7 );
double mean = StatsFunctions.NaNMean( col );

By default, a missing value in a DFIntColumn is represented by the default setting of StatsSettings.IntegerMissingValue, which is int.MinValue. You can change how missing values are represented for a particular column instance using the MissingValue property:

Code Example – C#

col.MissingValue = -1;
double mean = StatsFunctions.NaNMean( col );

In this example, all values in col equal to -1 are ignored when computing the mean.
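The arithmetic behind the two NaNMean() calls can be reproduced in plain C#. The local function MeanIgnoringMissing is a hypothetical model, not NMath code; returning NaN when every element is missing is a choice made for this sketch only.

Code Example – C#

```csharp
using System;

// Hypothetical model: skip any element equal to the missing-value marker.
double MeanIgnoringMissing(int[] data, int missingValue)
{
    long sum = 0;
    int count = 0;
    foreach (int v in data)
    {
        if (v == missingValue) continue; // treated as missing
        sum += v;
        count++;
    }
    // Sketch-only choice: NaN when no non-missing values remain.
    return count == 0 ? double.NaN : (double)sum / count;
}

int[] values = { 5, 2, -1, 1, 0, 7 };

// Default marker int.MinValue: nothing is missing, so all six values count (14 / 6).
double meanDefault = MeanIgnoringMissing(values, int.MinValue);

// Marker -1: the -1 is skipped, leaving 5, 2, 1, 0, 7 (15 / 5).
double meanCustom = MeanIgnoringMissing(values, -1);

Console.WriteLine(meanDefault);
Console.WriteLine(meanCustom); // 3
```

In other words, the first NaNMean() call averages all six values, while the second averages only the five values that are not -1.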

NOTE—For DFNumericColumn instances you can use the MissingValue property to indicate that missing values are represented by something other than the default value Double.NaN. However, Double.NaN will continue to be treated as missing, in addition to whatever value you set.
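The rule in the note (NaN is always missing, plus whatever marker you set) can be sketched as a single predicate. IsMissing and Mean are hypothetical local functions, and the sample readings with -999.0 as the custom marker are invented for illustration.

Code Example – C#

```csharp
using System;

// Hypothetical model of the note: NaN is always missing, and so is the custom marker.
bool IsMissing(double x, double marker) => double.IsNaN(x) || x == marker;

double Mean(double[] data, double marker)
{
    double sum = 0;
    int count = 0;
    foreach (double v in data)
    {
        if (IsMissing(v, marker)) continue;
        sum += v;
        count++;
    }
    return count == 0 ? double.NaN : sum / count;
}

// Illustrative data: both NaN and -999.0 should be ignored.
double[] readings = { 1.5, double.NaN, -999.0, 2.5, 5.0 };
double m = Mean(readings, -999.0); // (1.5 + 2.5 + 5.0) / 3
Console.WriteLine(m); // 3
```

The explicit double.IsNaN() check matters because NaN never compares equal to anything, including itself, so an equality test against the marker alone could never detect it.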

You can also change the default missing value for all columns of a particular type by setting the appropriate static variable in StatsSettings. Thus, this code sets the default missing value for integer columns to -1 for all subsequently constructed DFIntColumn instances:

Code Example – C#

StatsSettings.IntegerMissingValue = -1;

The Clean() method returns a new column with missing values removed.
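Because Clean() returns a new column rather than modifying the existing one, its effect resembles a LINQ filter. The following is only a model using string data and the default string missing value "." from Table 2; it does not call NMath, and the sample names are invented.

Code Example – C#

```csharp
using System;
using System.Linq;

// Model of Clean(): build a NEW array without missing entries (not NMath code).
const string missing = "."; // default StatsSettings.StringMissingValue per Table 2
string[] original = { "Ann", ".", "Bob", ".", "Cy" };

string[] cleaned = original.Where(s => s != missing).ToArray();

Console.WriteLine(string.Join(", ", cleaned)); // Ann, Bob, Cy
Console.WriteLine(original.Length);            // 5 (the original is unchanged)
```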

Transforming Column Data

NMath Stats provides convenience methods for applying functions to elements of a column. Each of these methods takes a function delegate. The Apply() method returns a new column whose contents are the result of applying the given function to each element of the column. The Transform() method modifies a column object by applying the given function to each of its elements.

Suppose, for example, that you want to cap all numeric values in a DFNumericColumn at 100.0. You could write a simple function like this:

Code Example – C#

private static double Cap( double x )
{
  return x > 100.0 ? 100.0 : x;
}

Then encapsulate the function in a Func<double, double> delegate:

Code Example – C#

Func<double, double> capDelegate =
  new Func<double, double>( Cap );

This code caps all numeric values in column col:

Code Example – C#

col.Transform( capDelegate );
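The difference between Apply() (returns a new column) and Transform() (modifies the column in place) can be modeled on plain arrays. ApplyModel and TransformModel are hypothetical local functions, not NMath methods; the sketch also uses a method-group conversion, which in C# is equivalent to new Func&lt;double, double&gt;( Cap ).

Code Example – C#

```csharp
using System;
using System.Linq;

// Cap function from the example above.
static double Cap(double x) => x > 100.0 ? 100.0 : x;

// Hypothetical Apply-style model: returns a NEW array; the input is untouched.
double[] ApplyModel(double[] source, Func<double, double> f) => source.Select(f).ToArray();

// Hypothetical Transform-style model: overwrites each element in place.
void TransformModel(double[] data, Func<double, double> f)
{
    for (int i = 0; i < data.Length; i++)
        data[i] = f(data[i]);
}

Func<double, double> capDelegate = Cap; // same as new Func<double, double>( Cap )

double[] values = { 42.0, 150.0, 7.0, 1000.0 };

double[] capped = ApplyModel(values, capDelegate);
Console.WriteLine(string.Join(", ", capped)); // 42, 100, 7, 100
Console.WriteLine(values[1]);                  // 150 (unchanged by Apply-style call)

TransformModel(values, capDelegate);
Console.WriteLine(values[1]);                  // 100 (modified in place)
```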

A common use of the Apply() methods is to create a new column whose values are a function of the values in one or two existing columns. For example, suppose data frame df has FirstName and LastName string columns, and you want to create a new column containing customers' full names. You could write a simple function like this:

Code Example – C#

private static string Cat( string first, string last )
{
  return first + " " + last;
}

Then encapsulate the function in a Func<String, String, String> delegate:

Code Example – C#

Func<String, String, String> catDelegate =
  new Func<String, String, String>( Cat );

This code creates a new column containing the concatenated names:

Code Example – C#

DFStringColumn col =
  ( (DFStringColumn)df["FirstName"] ).Apply( "FullName",
    catDelegate, (DFStringColumn)df["LastName"] );
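The two-column form of Apply() pairs element i of one column with element i of the other. As a rough model only (not NMath's implementation), Enumerable.Zip does the same positional pairing on plain arrays; the sample names are invented. Zip stops at the shorter sequence, whereas data frame columns are required to have equal lengths anyway.

Code Example – C#

```csharp
using System;
using System.Linq;

// Cat function from the example above.
static string Cat(string first, string last) => first + " " + last;

// Illustrative data standing in for the FirstName and LastName columns.
string[] firstNames = { "Jane", "Joe" };
string[] lastNames = { "Doe", "Smith" };

Func<string, string, string> catDelegate = Cat;

// Pair elements by position and combine each pair with Cat.
string[] fullNames = firstNames.Zip(lastNames, catDelegate).ToArray();

Console.WriteLine(string.Join("; ", fullNames)); // Jane Doe; Joe Smith
```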

Exporting Column Data

Data from a column can be exported in various ways:

ToArray() exports the contents of a column to a strongly-typed array.

ToDoubleArray() extracts the contents of a column to an array of doubles (numeric columns only).

ToDoubleVector() extracts the contents of a column to a DoubleVector (numeric columns only).

ToIntArray() extracts the contents of a column to an array of integers (integer columns only).

ToString() returns a formatted string representation of a column.

ToStringArray() exports the contents of a column to an array of strings.
