NMath - Chapter 37. Data Frames

Statistical functions generally support the NMath types DoubleVector and DoubleMatrix, as well as native arrays of doubles. In many cases, these types are sufficient for storing and manipulating your statistical data. However, they suffer from two limitations: they can only store numeric data, and they have limited support for adding, inserting, removing, and reordering data. Because the underlying data is an array of doubles, data must be copied to new storage every time such an operation is performed.

For these reasons, NMath provides the DataFrame class, which represents a two-dimensional data object consisting of a list of columns of the same length. Columns are themselves lists of different types of data: numeric, string, boolean, generic, and so on.

Methods are provided for appending, inserting, removing, sorting, and permuting rows and columns in a data frame. Because the underlying data is in a list, elements can be added, removed, and reordered without having to copy all of the data to new storage.
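The list-of-columns layout described above can be illustrated with a small conceptual model. This is only a sketch in Python, not NMath code: the function names append_row, insert_column, and remove_row are hypothetical and do not correspond to DataFrame methods. It assumes a frame is a list of (name, values) pairs whose value lists all have the same length.

```python
# Conceptual model only. These helpers are hypothetical and are NOT the NMath API.
# A frame is modeled as a list of (name, values) columns of equal length.

def append_row(columns, row):
    """Append one value to the end of each column, in column order."""
    if len(row) != len(columns):
        raise ValueError("row length must match the number of columns")
    for (_, values), value in zip(columns, row):
        values.append(value)

def insert_column(columns, index, name, values):
    """Insert a new column; the existing columns are left untouched."""
    if columns and len(values) != len(columns[0][1]):
        raise ValueError("new column must have the same length as the others")
    columns.insert(index, (name, list(values)))

def remove_row(columns, index):
    """Delete the element at the given row index from every column."""
    for _, values in columns:
        del values[index]

columns = [("State", ["OR", "WA", "VT"]), ("Weight", [165, 147, 115])]
append_row(columns, ["AK", 230])
insert_column(columns, 1, "Married", [True, True, False, False])
remove_row(columns, 2)

names = [name for name, _ in columns]
# names is now ["State", "Married", "Weight"]
```

In this model, inserting a column only adds one entry to the outer list, and each column can hold its own type of data, which a single array of doubles cannot do.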

A DataFrame can be viewed as a kind of virtual database table. Columns can be accessed by numeric index (0...n-1) or by a string name supplied at construction time. Rows can be accessed by numeric index (0...n-1) or by a key object. Column names and row keys do not need to be unique. For example, this output shows a formatted string representation of data from a sample data frame:

#             State  Weight  Married
John Smith    OR     165     true
Ruth Barnes   WA     147     true
Jane Jones    VT     115     false
Tim Travis    AK     230     false
Betsy Young   MA     130     true
Arthur Smith  CA     152     false
Emma Allen    OK     135     false
Roy Wilkenson WI     182     true

This data frame contains three columns: column 0, named State, contains string data; column 1, named Weight, contains integer data; column 2, named Married, contains boolean data. There are eight rows of data in this data frame, and the subjects' names are used as row keys.
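The access rules for this sample frame can be sketched with a conceptual model. Again, this is illustrative Python rather than NMath code: column_by_name and rows_by_key are hypothetical helpers, not DataFrame members. Because column names and row keys need not be unique, the sketch assumes that a lookup by name or key returns every match rather than a single item.

```python
# Conceptual model only. These names are hypothetical and are NOT the NMath API.
column_names = ["State", "Weight", "Married"]
row_keys = ["John Smith", "Ruth Barnes", "Jane Jones", "Tim Travis",
            "Betsy Young", "Arthur Smith", "Emma Allen", "Roy Wilkenson"]
columns = [
    ["OR", "WA", "VT", "AK", "MA", "CA", "OK", "WI"],            # column 0: strings
    [165, 147, 115, 230, 130, 152, 135, 182],                    # column 1: integers
    [True, True, False, False, True, False, False, True],        # column 2: booleans
]

def column_by_name(name):
    """Return every column with the given name (names need not be unique)."""
    return [col for col_name, col in zip(column_names, columns) if col_name == name]

def rows_by_key(key):
    """Return every row whose key matches (keys need not be unique)."""
    return [
        [col[i] for col in columns]
        for i, row_key in enumerate(row_keys)
        if row_key == key
    ]

weight_by_index = columns[1]                    # access by numeric index
weight_by_name = column_by_name("Weight")[0]    # access by name: the same column
tim = rows_by_key("Tim Travis")                 # [["AK", 230, False]]
```

Returning a list of matches is one design choice for handling duplicates; a key that matches no row simply yields an empty list in this model.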

This chapter describes how to use the DataFrame class.
