37.1 Column Types
A DataFrame may contain columns of different types—the only constraint is that the columns must be of the same length. DFColumn, which implements the IDFColumn interface, is the abstract base class for data frame columns. NMath provides the following derived classes for column types:
● DFBoolColumn represents a column of logical data.
● DFDateTimeColumn represents a column of temporal data.
● DFGenericColumn represents a column of generic data.
● DFIntColumn represents a column of integer data.
● DFNumericColumn represents a column of double-precision floating point data.
● DFStringColumn represents a column of string data.
Empty columns are constructed by simply supplying a name for the column. For example:
Code Example – C#
var col = new DFDateTimeColumn( "myCol" );
Code Example – VB
Dim Col As New DFDateTimeColumn("myCol")
The name of a column can be used to access the column in a data frame. Once a column instance is constructed, the name cannot be changed.
NOTE—Columns also provide a modifiable Label property for display purposes; see below.
Columns can also be initialized with an array of data at construction time:
Code Example – C#
var bArray =
new bool[] { true, false, true, true, true, false, false };
var col = new DFBoolColumn( "myCol", bArray );
Code Example – VB
Dim BArray() As Boolean = {True, False, True, True, True, False, False}
Dim Col As New DFBoolColumn("myCol", BArray)
Constructors that take an array of data use the params keyword (ParamArray in Visual Basic), so the values may also be passed as a comma-separated argument list:
Code Example – C#
var col =
new DFStringColumn( "myCol", "Jane", "Joe", "Mary", "Bill" );
Code Example – VB
Dim Col As New DFStringColumn("myCol", "Jane", "Joe", "Mary", "Bill")
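The params mechanism is standard C#: the compiler gathers a comma-separated argument list into an array, and an empty argument list yields an empty array. The sketch below illustrates this with a hypothetical SketchColumn<T> class (an assumption for illustration only, not an NMath type), showing that passing an array and passing individual values store the same data.
Code Example – C#
```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a column class; NOT the NMath DFColumn API.
public class SketchColumn<T>
{
    private readonly List<T> data;

    public string Name { get; }

    // 'params' lets callers pass either a T[] or a comma-separated list of T.
    // With no values at all, 'values' is an empty array (an empty column).
    public SketchColumn(string name, params T[] values)
    {
        Name = name;
        data = new List<T>(values);
    }

    public int Count => data.Count;

    public T this[int index] => data[index];
}

public static class ParamsDemo
{
    // Builds the same column both ways and compares the stored elements.
    public static bool SameContents()
    {
        var fromArray = new SketchColumn<string>("myCol", new[] { "Jane", "Joe", "Mary", "Bill" });
        var fromArgs = new SketchColumn<string>("myCol", "Jane", "Joe", "Mary", "Bill");

        if (fromArray.Count != fromArgs.Count) return false;
        for (int i = 0; i < fromArray.Count; i++)
        {
            if (fromArray[i] != fromArgs[i]) return false;
        }
        return true;
    }
}
```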
Some column types provide additional options for initializing data at construction time. For instance, this code initializes a numeric column with data from a DoubleVector:
Code Example – C#
var v = new DoubleVector( 50, 0, .1 );
var col = new DFNumericColumn( "myCol", v );
Code Example – VB
Dim V As New DoubleVector(50, 0, 0.1)
Dim Col As New DFNumericColumn("myCol", V)
This code initializes a generic column with data from an ICollection:
Code Example – C#
var list = new ArrayList( 3 );
list.Add( 3.14 );
list.Add( "Hello World" );
list.Add( DateTime.Now );
var col = new DFGenericColumn( "myCol", list );
Code Example – VB
Dim List As New ArrayList(3)
List.Add(3.14)
List.Add("Hello World")
List.Add(DateTime.Now)
Dim Col As New DFGenericColumn("myCol", List)
Lastly, you can create a column from another column. For example, this code creates a DFIntColumn from a DFStringColumn:
Code Example – C#
var col1 = new DFStringColumn( "Col1", "1", "2", "3", "4" );
var col2 = new DFIntColumn( "Col2", col1 );
Code Example – VB
Dim Col1 As New DFStringColumn("Col1", "1", "2", "3", "4")
Dim Col2 As New DFIntColumn("Col2", Col1)
An NMathFormatException is thrown if the data in the source column cannot be converted to the type of the new column.
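Conceptually, converting a string column to an integer column parses each element in turn and fails on the first value that cannot be parsed. The following is a minimal plain-C# sketch of that idea, not NMath's implementation: the helper name ToInts is hypothetical, invariant-culture integer parsing is an assumption (NMath's exact parsing rules are not described here), and the sketch throws the standard FormatException where NMath throws NMathFormatException.
Code Example – C#
```csharp
using System;
using System.Globalization;

public static class ConversionSketch
{
    // Hypothetical helper, not an NMath method: converts string data to ints,
    // mimicking construction of a DFIntColumn from a DFStringColumn.
    public static int[] ToInts(string[] source)
    {
        var result = new int[source.Length];
        for (int i = 0; i < source.Length; i++)
        {
            // TryParse returns false instead of throwing, so the failing index can be reported.
            if (!int.TryParse(source[i], NumberStyles.Integer, CultureInfo.InvariantCulture, out result[i]))
            {
                throw new FormatException(
                    $"Element {i} (\"{source[i]}\") cannot be converted to an integer.");
            }
        }
        return result;
    }
}
```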
Once a column is constructed, you can add data to it or remove data from it. The Add() method appends an element to the end of the column:
Code Example – C#
var col = new DFStringColumn( "Name" );
col.Add( "Joe Smith" );
col.Add( "Jane Doe" );
col.Add( "John Davis" );
Code Example – VB
Dim Col As New DFStringColumn("Name")
Col.Add("Joe Smith")
Col.Add("Jane Doe")
Col.Add("John Davis")
The Insert() method inserts an element into a column at a given index. For instance, this code inserts a new element at the top of the column:
Code Example – C#
col.Insert( 0, "Sally Jones" );
Code Example – VB
Col.Insert(0, "Sally Jones")
The RemoveAt() method removes the element at a given index:
Code Example – C#
col.RemoveAt( 3 );
Code Example – VB
Col.RemoveAt(3)
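Assuming these column methods follow the same zero-based semantics as Add, Insert, and RemoveAt on the .NET List<T> class (an assumption; the column classes are not documented here as wrapping List<T>), the sequence of calls above leaves the column holding three names. This sketch replays the calls on a List<string> and records the contents after each step.
Code Example – C#
```csharp
using System.Collections.Generic;

public static class EditSequenceSketch
{
    // Replays the Add/Insert/RemoveAt calls from the text on a List<string>.
    public static List<string> Replay()
    {
        var names = new List<string>();
        names.Add("Joe Smith");          // [Joe Smith]
        names.Add("Jane Doe");           // [Joe Smith, Jane Doe]
        names.Add("John Davis");         // [Joe Smith, Jane Doe, John Davis]

        names.Insert(0, "Sally Jones");  // [Sally Jones, Joe Smith, Jane Doe, John Davis]

        names.RemoveAt(3);               // removes "John Davis", now at index 3
        return names;                    // [Sally Jones, Joe Smith, Jane Doe]
    }
}
```
Note that Insert(0, ...) shifts every existing element up by one, which is why RemoveAt(3) removes the last name added rather than the third.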
The data frame column classes provide standard indexing operators for getting and setting element values. Indexing is zero-based, so col[i] gets or sets the element at index i. For example, this code replaces "Jane" with "Janet":
Code Example – C#
var col =
new DFStringColumn( "Names", "Jane", "Joe", "Mary", "Bill" );
col[0] = "Janet";
Code Example – VB
Dim Col As New DFStringColumn("Names", "Jane", "Joe", "Mary", "Bill")
Col(0) = "Janet"
The GetEnumerator() method returns an enumerator for the column data:
Code Example – C#
IEnumerator enumerator = col.GetEnumerator();
while ( enumerator.MoveNext() )
{
// Do something with enumerator.Current
}
Code Example – VB
Dim Enumerator As IEnumerator = Col.GetEnumerator()
While (Enumerator.MoveNext())
' Do something with Enumerator.Current
End While
Data frame column types provide the following properties:
● ColumnType gets the type of the objects held by the column.
● Count gets the number of objects in the column.
● IsNumeric returns true if a column is of type DFIntColumn or DFNumericColumn.
● Label gets and sets the label in the header of the column.
● MissingValue gets and sets the value used to represent missing values in the column (see below).
● Name gets the name of the column.
NOTE—The Name of a column can only be set in a constructor. Once a column is constructed, the name cannot be changed. For a modifiable label, see the Label property.
You can use the Permute() method to arbitrarily reorder the elements in a column. This method accepts a permutation array of element indices and reorders the elements such that this[ permutation[i] ] is set to the ith object in the original column.
For example, this code moves the last two elements to the head of the column:
Code Example – C#
var col =
new DFStringColumn( "myCol", "a", "b", "c", "d", "e" );
col.Permute( 2, 3, 4, 0, 1 );
Code Example – VB
Dim Col As New DFStringColumn("myCol", "a", "b", "c", "d", "e")
Col.Permute(2, 3, 4, 0, 1)
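The documented rule, that element i of the original column moves to position permutation[i], can be checked on a plain array. This is a hypothetical re-implementation for illustration, not NMath's Permute(); the validation of the permutation array is an assumption of the sketch, since the text does not say how invalid input is handled. Applied to a, b, c, d, e with the permutation 2, 3, 4, 0, 1, it produces d, e, a, b, c.
Code Example – C#
```csharp
using System;

public static class PermuteSketch
{
    // Implements the documented rule: result[permutation[i]] = original[i].
    public static T[] Permute<T>(T[] original, params int[] permutation)
    {
        if (permutation.Length != original.Length)
            throw new ArgumentException("Permutation length must match column length.");

        var result = new T[original.Length];
        var seen = new bool[original.Length];
        for (int i = 0; i < original.Length; i++)
        {
            int target = permutation[i];
            // Reject out-of-range or repeated targets (sketch-specific validation).
            if (target < 0 || target >= original.Length || seen[target])
                throw new ArgumentException("Not a valid permutation.");
            seen[target] = true;
            result[target] = original[i];
        }
        return result;
    }
}
```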
All column types—except DFBoolColumn, which has only two valid values—support missing values. Most statistical functions in NMath are accompanied by a paired function that ignores missing values (Section 38.2).
NOTE—To represent missing values in boolean data, use a DFIntColumn. For example, use 1 for true, 0 for false, and -1 for missing.
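The following sketch shows one way to produce the integer codes suggested in the note, starting from nullable booleans where null marks a missing value. The class and method names are hypothetical. Keep in mind that the default missing value for integer columns is int.MinValue, so for the -1 codes to be treated as missing, the resulting DFIntColumn's MissingValue property must also be set to -1, as described in this section.
Code Example – C#
```csharp
public static class BoolEncodingSketch
{
    // Encoding suggested in the note: 1 = true, 0 = false, -1 = missing.
    public const int Missing = -1;

    // Maps nullable booleans (null = missing) to the integer codes.
    public static int[] Encode(bool?[] values)
    {
        var codes = new int[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            codes[i] = values[i].HasValue ? (values[i].Value ? 1 : 0) : Missing;
        }
        return codes;
    }
}
```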
At construction time, the missing value for a column is defined using a static variable in class StatsSettings, as shown in Table 24.
Column Type | StatsSettings Variable | Default Value
DFDateTimeColumn | DateTimeMissingValue | DateTime.MinValue
DFGenericColumn | GenericMissingValue | null
DFIntColumn | IntegerMissingValue | int.MinValue
DFNumericColumn | NumericMissingValue | Double.NaN
DFStringColumn | StringMissingValue | "."
For instance, this code computes the mean of a column of integers, ignoring any missing values:
Code Example – C#
var col = new DFIntColumn( "myCol", 5, 2, -1, 1, 0, 7 );
double mean = StatsFunctions.NaNMean( col );
Code Example – VB
Dim Col As New DFIntColumn("myCol", 5, 2, -1, 1, 0, 7)
Dim Mean As Double = StatsFunctions.NaNMean(Col)
By default, a missing value in a DFIntColumn is represented using the default setting of StatsSettings.IntegerMissingValue, which is int.MinValue. You can change the way a missing value is represented for a particular column instance using the MissingValue property:
Code Example – C#
col.MissingValue = -1;
mean = StatsFunctions.NaNMean( col );
Code Example – VB
Col.MissingValue = -1
Mean = StatsFunctions.NaNMean(Col)
In this example, all values in col equal to -1 are ignored when computing the mean.
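The effect of changing MissingValue can be reproduced with a few lines of plain C#. This sketch is not NMath's implementation; it only mirrors the documented behavior that values equal to the missing value are skipped, and returning NaN when every value is missing is an assumption of the sketch. With the data above, the default int.MinValue excludes nothing (mean 14/6, about 2.33), whereas a missing value of -1 excludes one element (mean 15/5 = 3).
Code Example – C#
```csharp
public static class MissingMeanSketch
{
    // Hypothetical analogue of a missing-value-aware mean for integer data:
    // averages only the elements that differ from the missing-value sentinel.
    public static double MeanIgnoringMissing(int[] data, int missingValue)
    {
        long sum = 0;
        int count = 0;
        foreach (int x in data)
        {
            if (x == missingValue) continue; // skip missing entries
            sum += x;
            count++;
        }
        // Returning NaN when nothing remains is an assumption of this sketch.
        return count == 0 ? double.NaN : (double)sum / count;
    }
}
```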
NOTE—For DFNumericColumn instances you can use the MissingValue property to indicate that missing values are represented by something other than the default value Double.NaN. However, Double.NaN will continue to be treated as missing, in addition to whatever value you set.
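Double.NaN needs separate handling because NaN compares unequal to every value, including itself, so an equality test against a missing value can never detect it. The sketch below mirrors the rule in the note (NaN is always missing, plus any custom value) using hypothetical helper names; it is an illustration, not NMath's code.
Code Example – C#
```csharp
public static class NumericMissingSketch
{
    // NaN is always missing; a custom missing value is treated as missing as well.
    public static bool IsMissing(double x, double customMissing)
    {
        // x == double.NaN is always false, so double.IsNaN is required.
        return double.IsNaN(x) || x == customMissing;
    }

    // Mean over the elements that are not missing under the rule above.
    public static double MeanIgnoringMissing(double[] data, double customMissing)
    {
        double sum = 0.0;
        int count = 0;
        foreach (double x in data)
        {
            if (IsMissing(x, customMissing)) continue;
            sum += x;
            count++;
        }
        return count == 0 ? double.NaN : sum / count;
    }
}
```
For example, with the data 1.0, NaN, -999.0, 3.0 and a custom missing value of -999.0, both NaN and -999.0 are skipped and the mean is 2.0.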
You can also change the default missing value for all columns of a particular type by setting the appropriate static variable in StatsSettings. Thus, this code sets the default missing value for integer columns to -1 for all subsequently constructed DFIntColumn instances:
Code Example – C#
StatsSettings.IntegerMissingValue = -1;
Code Example – VB
StatsSettings.IntegerMissingValue = -1
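The phrase "subsequently constructed" matters: the static setting is read when a column is created, so columns that already exist keep the missing value they started with. The sketch below models that pattern with hypothetical SketchSettings and SketchIntColumn types (assumptions for illustration, not the NMath classes); it reflects the documented behavior rather than NMath's internal code.
Code Example – C#
```csharp
// Hypothetical stand-in for StatsSettings; not the NMath type.
public static class SketchSettings
{
    public static int IntegerMissingValue = int.MinValue;
}

// Hypothetical stand-in for DFIntColumn; not the NMath type.
public class SketchIntColumn
{
    // The auto-property initializer runs during construction, so each instance
    // copies whatever the static default is at that moment.
    public int MissingValue { get; set; } = SketchSettings.IntegerMissingValue;
}
```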
The Clean() method returns a new column with missing values removed.
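The essential property of Clean() is that it returns a new column and leaves the original untouched. This hypothetical sketch (not the NMath signature) expresses the same idea on arrays, with the missing-value test passed in as a predicate; the usage below treats "." as missing, matching the default StringMissingValue in Table 24.
Code Example – C#
```csharp
using System;
using System.Collections.Generic;

public static class CleanSketch
{
    // Hypothetical analogue of Clean(): returns a NEW array without missing
    // entries; the input array is not modified.
    public static T[] Clean<T>(T[] data, Func<T, bool> isMissing)
    {
        var kept = new List<T>(data.Length);
        foreach (T x in data)
        {
            if (!isMissing(x)) kept.Add(x);
        }
        return kept.ToArray();
    }
}
```
For example, CleanSketch.Clean(new[] { "Jane", ".", "Joe" }, s => s == ".") returns an array containing "Jane" and "Joe", while the source array still has three elements.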
NMath provides convenience methods for applying functions to elements of a column. Each of these methods takes a function delegate. The Apply() method returns a new column whose contents are the result of applying the given function to each element of the column. The Transform() method modifies a column object by applying the given function to each of its elements.
Suppose, for example, that you want to cap all numeric values in a DFNumericColumn at 100.0. You could write a simple function like this:
Code Example – C#
private static double Cap( double x )
{
return x > 100.0 ? 100.0 : x;
}
Code Example – VB
Private Shared Function Cap(X As Double) As Double
If X > 100 Then
Return 100
Else
Return X
End If
End Function
Then encapsulate the function in a Func<double, double> delegate:
Code Example – C#
var capDelegate = new Func<double, double>( Cap );
Code Example – VB
Dim CapDelegate As New Func(Of Double, Double)(AddressOf Cap)
This code caps all numeric values in column col:
Code Example – C#
col.Transform( capDelegate );
Code Example – VB
Col.Transform(CapDelegate)
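The practical difference between Apply() and Transform() is whether the original data survives. This sketch models both on plain double[] arrays, reusing the Cap function from the text; the ApplyTransformSketch class and its method signatures are hypothetical and are not the NMath signatures.
Code Example – C#
```csharp
using System;

public static class ApplyTransformSketch
{
    // Same capping rule as the Cap function in the text.
    public static double Cap(double x) => x > 100.0 ? 100.0 : x;

    // Like Apply(): returns a new array; the source is not modified.
    public static double[] Apply(double[] source, Func<double, double> f)
    {
        var result = new double[source.Length];
        for (int i = 0; i < source.Length; i++)
            result[i] = f(source[i]);
        return result;
    }

    // Like Transform(): overwrites each element in place.
    public static void Transform(double[] data, Func<double, double> f)
    {
        for (int i = 0; i < data.Length; i++)
            data[i] = f(data[i]);
    }
}
```
Calling Apply on 50.0, 150.0, 100.0 yields a new array 50.0, 100.0, 100.0 and leaves 150.0 in the source; calling Transform on the same array changes the 150.0 itself.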
A common use of the Apply() methods is to create a new column whose values are a function of the values in one or two existing columns. For example, suppose a data frame named data has FirstName and LastName string columns, and you want to create a new column containing customers' full names. You could write a simple function like this:
Code Example – C#
private static string Cat( string first, string last )
{
return first + " " + last;
}
Code Example – VB
Private Shared Function Cat(First As String, Last As String) As String
Return First & " " & Last
End Function
Then encapsulate the function in a Func<String, String, String> delegate:
Code Example – C#
var catDelegate = new Func<String, String, String>( Cat );
Code Example – VB
Dim CatDelegate As New Func(Of String, String, String)(AddressOf Cat)
This code creates a new column containing the concatenated names:
Code Example – C#
DFStringColumn col =
( (DFStringColumn)data["FirstName"] ).Apply( "FullName",
catDelegate, (DFStringColumn)data["LastName"] );
Code Example – VB
Dim First As DFStringColumn = CType(Data("FirstName"), DFStringColumn)
Dim Last As DFStringColumn = CType(Data("LastName"), DFStringColumn)
Dim Col As DFStringColumn = First.Apply("FullName", CatDelegate, Last)
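The two-column form of Apply() pairs elements by index, calling the delegate once per row. The sketch below shows that element-wise pattern on plain arrays with a hypothetical generic helper (not the NMath signature, which also takes a name for the new column); throwing on a length mismatch is an assumption of the sketch, motivated by the rule that all columns in a data frame share one length.
Code Example – C#
```csharp
using System;

public static class BinaryApplySketch
{
    // Hypothetical element-wise combination of two equal-length columns,
    // analogous to first.Apply("FullName", catDelegate, last).
    public static TResult[] Apply<T1, T2, TResult>(T1[] first, T2[] second, Func<T1, T2, TResult> f)
    {
        // Columns in a data frame share one length, so a mismatch is an error here.
        if (first.Length != second.Length)
            throw new ArgumentException("Columns must have the same length.");

        var result = new TResult[first.Length];
        for (int i = 0; i < first.Length; i++)
            result[i] = f(first[i], second[i]);
        return result;
    }
}
```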
Data from a column can be exported in various ways:
● ToArray() exports the contents of a column to a strongly-typed array.
● ToDoubleArray() extracts the contents of a column to an array of doubles (numeric columns only).
● ToDoubleVector() extracts the contents of a column to a DoubleVector (numeric columns only).
● ToIntArray() extracts the contents of a column to an array of integers (integer columns only).
● ToString() returns a formatted string representation of a column.
● ToStringArray() exports the contents of a column to an array of strings.