The statistical functions in NMath Stats support the NMath types DoubleVector and DoubleMatrix, as well as simple arrays of doubles. In many cases, these types are sufficient for storing and manipulating your statistical data. However, they have two limitations: they can store only numeric data, and they offer limited support for adding, inserting, removing, and reordering data. Because the underlying storage is an array of doubles, the data must be copied to new storage each time such an operation is performed.
For these reasons, NMath Stats provides the DataFrame class, which represents a two-dimensional data object consisting of a list of columns of the same length. Each column is itself a list holding data of a particular type: numeric, string, boolean, generic, and so on.
Methods are provided for appending, inserting, removing, sorting, and permuting rows and columns in a data frame. Because the underlying data is in a list, elements can be added, removed, and reordered without having to copy all of the data to new storage.
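The difference between the two storage strategies can be illustrated with a small conceptual sketch. It is written in Python rather than against the NMath API, and it makes no claim about how NMath implements its columns internally: a tuple stands in for fixed-size array storage, a Python list stands in for list-backed storage, and the function name insert_fixed is hypothetical.

```python
def insert_fixed(values: tuple, index: int, item: float) -> tuple:
    """Insert into fixed-size storage by building new storage.

    A tuple cannot grow, so every element is copied into a new tuple
    that is one slot larger than the original.
    """
    return values[:index] + (item,) + values[index:]


# Fixed-size storage: the insert yields a new object; the original is untouched.
weights_fixed = (165.0, 147.0, 115.0)
grown = insert_fixed(weights_fixed, 1, 230.0)

# List-backed storage: the insert modifies the existing object in place,
# so the caller never allocates new storage or copies the data across.
weights_list = [165.0, 147.0, 115.0]
same_list = weights_list          # second reference to the same list
weights_list.insert(1, 230.0)     # visible through both references
```

After this runs, grown is a separate four-element tuple and weights_fixed still has three elements, whereas weights_list and same_list refer to the same four-element list.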
A DataFrame can be viewed as a kind of virtual database table. Columns can be accessed by numeric index (0...n-1) or by a string name supplied at construction time. Rows can be accessed by numeric index (0...n-1) or by a key object. Column names and row keys do not need to be unique. For example, this output shows a formatted string representation of data from a sample data frame:
#              State  Weight  Married
John Smith     OR     165     true
Ruth Barnes    WA     147     true
Jane Jones     VT     115     false
Tim Travis     AK     230     false
Betsy Young    MA     130     true
Arthur Smith   CA     152     false
Emma Allen     OK     135     false
Roy Wilkenson  WI     182     true
This data frame contains three columns: column 0, named State, contains string data; column 1, named Weight, contains integer data; column 2, named Married, contains boolean data. There are eight rows of data in this data frame, and the subjects' names are used as row keys.
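To make the structure concrete, the following Python sketch models the sample data as a list of named columns plus a parallel list of row keys. It is only a conceptual model, not the NMath DataFrame API: the ToyFrame class and all of its method names are hypothetical, and returning the first match when a column name is duplicated is an assumption made for this sketch.

```python
class ToyFrame:
    """Hypothetical stand-in for a data frame: named columns stored as lists."""

    def __init__(self, column_names):
        self.column_names = list(column_names)
        self.columns = [[] for _ in self.column_names]  # one list per column
        self.row_keys = []                              # keys need not be unique

    def add_row(self, key, *values):
        """Append one row: a key plus exactly one value per column."""
        if len(values) != len(self.columns):
            raise ValueError("one value per column is required")
        self.row_keys.append(key)
        for column, value in zip(self.columns, values):
            column.append(value)

    def column(self, ref):
        """Return a column by numeric index or by name.

        Assumption for this sketch: a duplicated name returns the first match.
        """
        index = ref if isinstance(ref, int) else self.column_names.index(ref)
        return self.columns[index]

    def row(self, index):
        """Return the row at a numeric index as a tuple of column values."""
        return tuple(column[index] for column in self.columns)

    def rows_with_key(self, key):
        """Return every row whose key equals `key`, in row order."""
        return [self.row(i) for i, k in enumerate(self.row_keys) if k == key]


frame = ToyFrame(["State", "Weight", "Married"])
frame.add_row("John Smith", "OR", 165, True)
frame.add_row("Ruth Barnes", "WA", 147, True)
frame.add_row("Jane Jones", "VT", 115, False)
frame.add_row("Tim Travis", "AK", 230, False)
frame.add_row("Betsy Young", "MA", 130, True)
frame.add_row("Arthur Smith", "CA", 152, False)
frame.add_row("Emma Allen", "OK", 135, False)
frame.add_row("Roy Wilkenson", "WI", 182, True)

weights_by_name = frame.column("Weight")      # column access by name
weights_by_index = frame.column(1)            # the same column, by index
first_row = frame.row(0)                      # row access by numeric index
tim_rows = frame.rows_with_key("Tim Travis")  # row access by key
```

Because row keys are not required to be unique, the key lookup in this sketch returns a list of matching rows rather than a single row.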
This chapter describes how to use the DataFrame class.