Statistical functions generally support the NMath types DoubleVector and DoubleMatrix, as well as native arrays of doubles. In many cases, these types are sufficient for storing and manipulating your statistical data. However, they have two limitations: they can store only numeric data, and they offer limited support for adding, inserting, removing, and reordering data. Because the underlying data is an array of doubles, the data must be copied to new storage every time one of these operations is performed.
For these reasons, NMath provides the DataFrame class, which represents a two-dimensional data object consisting of a list of columns of the same length. Columns are themselves lists of different types of data: numeric, string, boolean, generic, and so on.
Methods are provided for appending, inserting, removing, sorting, and permuting rows and columns in a data frame. Because the underlying data is in a list, elements can be added, removed, and reordered without having to copy all of the data to new storage.
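The list-of-columns idea can be illustrated with a minimal sketch in Python. This is a conceptual model only, not the NMath API: the names columns and weight_storage are hypothetical, and Python lists stand in for the column storage. It shows rows and columns being added, removed, and reordered while each column keeps its original storage object.

```python
# Conceptual model only (not the NMath API): a frame as named columns,
# where each column is an ordinary Python list of equal length.
columns = {
    "State": ["OR", "WA", "VT"],
    "Weight": [165, 147, 115],
}
weight_storage = columns["Weight"]  # remember the original list object

# Append a row: one value is appended to each column, in column order.
for name, value in zip(columns, ["AK", 230]):
    columns[name].append(value)

# Insert a new column of the same length as the existing ones.
columns["Married"] = [True, True, False, False]

# Remove the row at index 1 from every column.
for col in columns.values():
    del col[1]

# Sort rows by Weight: compute a row permutation once, then apply it to
# each column. Slice assignment rewrites the contents of the existing
# list rather than binding the name to a new list.
order = sorted(range(len(columns["Weight"])), key=lambda i: columns["Weight"][i])
for col in columns.values():
    col[:] = [col[i] for i in order]

# The Weight column is still the same list object it was at the start.
same_storage = columns["Weight"] is weight_storage
```

In this sketch, every operation acts on the existing lists; a fixed-size array of doubles would instead have to be reallocated and copied for each append or removal.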
A DataFrame can be viewed as a kind of virtual database table. Columns can be accessed by numeric index (0...n-1) or by a string name supplied at construction time. Rows can be accessed by numeric index (0...n-1) or by a key object. Column names and row keys do not need to be unique. For example, this output shows a formatted string representation of data from a sample data frame:
#              State  Weight  Married
John Smith     OR     165     true
Ruth Barnes    WA     147     true
Jane Jones     VT     115     false
Tim Travis     AK     230     false
Betsy Young    MA     130     true
Arthur Smith   CA     152     false
Emma Allen     OK     135     false
Roy Wilkenson  WI     182     true
This data frame contains three columns: column 0, named State, contains string data; column 1, named Weight, contains integer data; column 2, named Married, contains boolean data. There are eight rows of data in this data frame, and the subjects' names are used as row keys.
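To make the access patterns concrete, the sample data can be modeled with a short Python sketch. It is an illustration of the structure described above, not NMath code: the helper functions column, row, and rows_with_key are hypothetical names introduced here, and the lookup behavior for duplicate names (first match) is an assumption of the sketch.

```python
# Conceptual model only (not the NMath API): the sample data as a list of
# typed columns plus a parallel list of row keys.
column_names = ["State", "Weight", "Married"]
columns = [
    ["OR", "WA", "VT", "AK", "MA", "CA", "OK", "WI"],          # string data
    [165, 147, 115, 230, 130, 152, 135, 182],                  # integer data
    [True, True, False, False, True, False, False, True],      # boolean data
]
row_keys = [
    "John Smith", "Ruth Barnes", "Jane Jones", "Tim Travis",
    "Betsy Young", "Arthur Smith", "Emma Allen", "Roy Wilkenson",
]


def column(ref):
    """Return a column by numeric index or by name (first match, assumed)."""
    index = ref if isinstance(ref, int) else column_names.index(ref)
    return columns[index]


def row(index):
    """Return the row at a numeric index as a tuple of column values."""
    return tuple(col[index] for col in columns)


def rows_with_key(key):
    """Return every row whose key equals `key`; keys need not be unique."""
    return [row(i) for i, k in enumerate(row_keys) if k == key]


weights = column("Weight")        # same column as column(1)
tim = row(3)                      # row for "Tim Travis"
jane = rows_with_key("Jane Jones")
```

Because row keys are not required to be unique, the key lookup in this sketch returns a list of matching rows rather than a single row.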
This chapter describes how to use the DataFrame class.