NMath Stats User's Guide

2.10 Factors

The Factor class represents a categorical vector in which every element is drawn from a finite set of factor levels. A Factor therefore has two parts: an array of factor levels, and an integer data array whose elements are indices into those levels.

For example, this string data:

"A", "A", "C", "B", "A", "C", "B"

could be represented as a Factor with the following levels and data:

object[] levels = { "A", "B", "C" };
int[] data = { 0, 0, 2, 1, 0, 2, 1 };
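The encoding above can be sketched in plain C# without NMath: the levels are the sorted, distinct values, and each data element is the index of its value within the levels. This is an illustrative re-implementation under that assumption, not the Factor class itself; `FactorEncoding` and `Encode` are hypothetical names, and ordinal string ordering is an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical helper illustrating the levels/data encoding; not part of NMath.
public static class FactorEncoding
{
    // Levels are the sorted, distinct values (ordinal ordering assumed);
    // each data element is the index of its value within the levels array.
    public static (string[] Levels, int[] Data) Encode(string[] values)
    {
        string[] levels = values.Distinct()
                                .OrderBy(v => v, StringComparer.Ordinal)
                                .ToArray();
        int[] data = values.Select(v => Array.IndexOf(levels, v)).ToArray();
        return (levels, data);
    }
}
```

Calling `FactorEncoding.Encode(new[] { "A", "A", "C", "B", "A", "C", "B" })` yields the levels { "A", "B", "C" } and data { 0, 0, 2, 1, 0, 2, 1 } shown above. The ordering NMath applies to culture-sensitive or mixed-type values may differ from this ordinal sort.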

Factors are usually constructed from a data frame column using the GetFactor() method, but they can also be constructed independently.

Creating Factors

The GetFactor() method on DataFrame accepts a column index or name and returns a Factor with levels for the sorted, unique elements in the given column:

Factor myColFactor = df.GetFactor( "myCol" );

Alternatively, you can supply the factor levels yourself, in which case their order is preserved:

object[] levels = new object[] { "Q1", "Q2", "Q3", "Q4" };
Factor myColFactor = df.GetFactor( "myCol", levels );

An InvalidArgumentException is thrown if the specified column contains a value that is not in the given array of levels.
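A sketch of the user-supplied-levels case, assuming the behavior described above: each value is encoded as the position of its level in the caller's array (so the caller's order is kept), and a value with no matching level is rejected. `FactorWithLevels.Encode` is a hypothetical helper, and `ArgumentException` stands in for NMath's `InvalidArgumentException`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: encode values against caller-supplied levels.
// ArgumentException stands in for NMath's InvalidArgumentException.
public static class FactorWithLevels
{
    public static int[] Encode(string[] values, string[] levels)
    {
        // Map each level to its position; the caller's order is kept as-is.
        var position = new Dictionary<string, int>();
        for (int i = 0; i < levels.Length; i++)
            position[levels[i]] = i;

        // Any value without a matching level aborts the whole encoding.
        return values.Select(v =>
                position.TryGetValue(v, out int code)
                    ? code
                    : throw new ArgumentException($"Value '{v}' is not among the given levels."))
            .ToArray();
    }
}
```

With the levels { "Q1", "Q2", "Q3", "Q4" }, the values { "Q3", "Q1", "Q4", "Q1" } encode to { 2, 0, 3, 0 }, while a value such as "Q5" raises an exception.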

You can also construct a Factor independently of a DataFrame. For example, you can construct a Factor from an array of values:

object[] values = { 1, 1, 3, 2, 1, 3, 2 };
Factor factor = new Factor( values );

The factor levels are the sorted, unique values in the given array; in this case, { 1, 2, 3 }.

Alternatively, you can construct a Factor from an array of factor levels, and a data array consisting of indices into the factor levels:

object[] levels = { 1, 2, 3 };
int[] data = { 0, 0, 2, 1, 0, 2, 1 };
Factor factor = new Factor( levels, data );

An InvalidArgumentException is thrown if the given data array contains an invalid index.
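The validity check amounts to a bounds test on each entry of the data array: an index is valid only if it lies in the range 0 to (number of levels - 1). A minimal sketch follows; `FactorValidation.CheckCodes` is a hypothetical name, and `ArgumentOutOfRangeException` is used in place of NMath's `InvalidArgumentException`.

```csharp
using System;

// Hypothetical sketch: every data entry must be a valid index into levels.
// ArgumentOutOfRangeException stands in for NMath's InvalidArgumentException.
public static class FactorValidation
{
    public static void CheckCodes(object[] levels, int[] data)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] < 0 || data[i] >= levels.Length)
                throw new ArgumentOutOfRangeException(
                    nameof(data),
                    $"data[{i}] = {data[i]} is not an index into {levels.Length} levels.");
        }
    }
}
```

For the levels { 1, 2, 3 }, the data array { 0, 0, 2, 1, 0, 2, 1 } passes, but any entry of 3 or more (or a negative entry) is rejected.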

Properties of Factors

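The Factor class provides properties including Levels (the array of factor levels), Data (the array of indices into Levels), and NumberOfLevels (the number of levels); all three appear in the examples in this section.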

Accessing Factors

A standard indexer is provided for accessing the element at a given index:

int value = (int)factor[2];   // Levels[ Data[2] ] = Levels[2] = 3

The indexer returns Levels[ Data[index] ]; that is, it returns the level of the element at the given position.
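A minimal sketch of that indexer, in a hypothetical `FactorView` class rather than NMath's `Factor`. It assumes the levels are stored as `object[]`, which is why callers cast the result.

```csharp
using System;

// Hypothetical sketch of the indexer described above: element i of a
// factor is the level selected by the i-th code, Levels[ Data[i] ].
public sealed class FactorView
{
    public object[] Levels { get; }
    public int[] Data { get; }

    public FactorView(object[] levels, int[] data)
    {
        Levels = levels;
        Data = data;
    }

    // Look up the code at the given position, then the level it names.
    public object this[int index] => Levels[Data[index]];
}
```

With the levels { 1, 2, 3 } and data { 0, 0, 2, 1, 0, 2, 1 } from the earlier example, `view[2]` returns `Levels[2]`, the boxed integer 3, and `view[3]` returns `Levels[1]`, the boxed integer 2.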

Creating Groupings with Factors

The principal use of factors is with the static GetGroupings() methods on Subset. One overload accepts a single Factor and returns an array of subsets, one per level, each containing the indices of the elements with that level. Another overload accepts two Factor objects and returns a two-dimensional array of subsets (Subset[,]) containing the indices for each combination of levels in the two factors.
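Conceptually, each grouping collects the indices that share a level (or, for two factors, a pair of levels). The sketch below assumes that reading; it works on the factors' integer data arrays, uses `List<int>` as a stand-in for NMath's `Subset`, and the `FactorGroupings` class is hypothetical. The two-factor result is a rectangular `[,]` array indexed by [level of first factor, level of second factor].

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of grouping by factor levels; List<int> stands in
// for NMath's Subset and holds the indices belonging to each group.
public static class FactorGroupings
{
    // One group per level: the indices whose code equals that level.
    public static List<int>[] Group(int[] data, int numberOfLevels)
    {
        var groups = new List<int>[numberOfLevels];
        for (int k = 0; k < numberOfLevels; k++) groups[k] = new List<int>();
        for (int row = 0; row < data.Length; row++) groups[data[row]].Add(row);
        return groups;
    }

    // One group per (level1, level2) combination, in a rectangular array.
    public static List<int>[,] Group(int[] data1, int levels1, int[] data2, int levels2)
    {
        if (data1.Length != data2.Length)
            throw new ArgumentException("Both factors must have the same length.");
        var cells = new List<int>[levels1, levels2];
        for (int i = 0; i < levels1; i++)
            for (int j = 0; j < levels2; j++)
                cells[i, j] = new List<int>();
        for (int row = 0; row < data1.Length; row++)
            cells[data1[row], data2[row]].Add(row);
        return cells;
    }
}
```

A combination that never occurs in the data still gets a cell; it is simply empty.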

For example, suppose we record the weights of human subjects classified by sex and age group. The data for 15 subjects might look like this:

Table 4 - Sample data

          Male        Female
Child     45, 42      30, 35, 60, 40
Adult     182, 170    115, 130
Senior    142, 155    115, 110, 123

In a DataFrame, each observation would be a row, like so:

DataFrame df = new DataFrame();
df.AddColumn( new DFStringColumn( "Sex" ) );
df.AddColumn( new DFStringColumn( "AgeGroup" ) );
df.AddColumn( new DFIntColumn( "Weight" ) );

df.AddRow( "John Smith", "Male", "Child", 45 );
df.AddRow( "Ruth Barnes", "Female", "Senior", 115 );
df.AddRow( "Jane Jones", "Female", "Adult", 115 );
df.AddRow( "Timmy Toddler", "Male", "Child", 42 );
df.AddRow( "Betsy Young", "Female", "Adult", 130 );
df.AddRow( "Arthur Smith", "Male", "Senior", 142 );
df.AddRow( "Lucy Young", "Female", "Child", 30 );
df.AddRow( "Emma Allen", "Female", "Child", 35 );
df.AddRow( "Roy Wilkenson", "Male", "Adult", 182 );
df.AddRow( "Susan Schwarz", "Female", "Senior", 110 );
df.AddRow( "Ming Tao", "Female", "Senior", 123 );
df.AddRow( "Johanna Glynn", "Female", "Child", 60 );
df.AddRow( "Randall Harvey", "Male", "Adult", 170 );
df.AddRow( "Tom Howard", "Male", "Senior", 155 );
df.AddRow( "Jennifer Watson", "Female", "Child", 40 );

In this case, the first argument to AddRow() is the row key, and we use the subjects' names as keys.

It is natural to construct factors from the Sex and AgeGroup columns:

Factor sex = df.GetFactor( "Sex" );
Factor age = df.GetFactor( "AgeGroup" );

We can then use these factors in conjunction with the GetGroupings() methods on Subset to create subsets representing the original rows, columns, and cells in Table 4:

Subset[] sexGroups = Subset.GetGroupings( sex );
Subset[] ageGroups = Subset.GetGroupings( age );
Subset[,] cellGroups = Subset.GetGroupings( sex, age );

These subsets can then be used to operate on the relevant portions of the data frame. For instance, this code prints out row means, column means, and cell means for Table 4:

Console.WriteLine( "\nTABLE ROW MEANS" ); 
for ( int i = 0; i < age.NumberOfLevels; i++ )
{
  double mean = StatsFunctions.Mean(
    df[ df.IndexOfColumn( "Weight" ), ageGroups[i] ] );
  Console.WriteLine( "Mean for {0} = {1}", age.Levels[i], mean );
}

Console.WriteLine( "\nTABLE COLUMN MEANS" ); 
for ( int i = 0; i < sex.NumberOfLevels; i++ )
{
  double mean = StatsFunctions.Mean(
    df[ df.IndexOfColumn( "Weight" ), sexGroups[i] ] );
  Console.WriteLine( "Mean for {0} = {1}", sex.Levels[i], mean );
}

Console.WriteLine( "\nTABLE CELL MEANS" );
for ( int i = 0; i < sex.NumberOfLevels; i++ )
{
  for ( int j = 0; j < age.NumberOfLevels; j++ )
  {
    double mean = StatsFunctions.Mean(
      df[ df.IndexOfColumn( "Weight" ), cellGroups[i,j] ] );
    Console.WriteLine( "Mean for {0} {1} = {2}",
      sex.Levels[i], age.Levels[j], mean );
  }
}

The output is:

TABLE ROW MEANS
Mean for Adult = 149.25
Mean for Child = 42
Mean for Senior = 129

TABLE COLUMN MEANS
Mean for Female = 84.2222222222222
Mean for Male = 122.666666666667

TABLE CELL MEANS
Mean for Female Adult = 122.5
Mean for Female Child = 41.25
Mean for Female Senior = 116
Mean for Male Adult = 176
Mean for Male Child = 43.5
Mean for Male Senior = 148.5
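As an independent check on these numbers, the same 15 observations can be averaged with LINQ alone. This sketch does not use NMath; `WeightMeans` and its convention of passing `null` to mean "any level" are illustrative choices.

```csharp
using System;
using System.Linq;

// Independent cross-check of the means above using LINQ only (no NMath):
// the same 15 observations, filtered by sex and/or age group.
public static class WeightMeans
{
    public static readonly (string Sex, string Age, int Weight)[] Rows =
    {
        ("Male", "Child", 45),     ("Female", "Senior", 115), ("Female", "Adult", 115),
        ("Male", "Child", 42),     ("Female", "Adult", 130),  ("Male", "Senior", 142),
        ("Female", "Child", 30),   ("Female", "Child", 35),   ("Male", "Adult", 182),
        ("Female", "Senior", 110), ("Female", "Senior", 123), ("Female", "Child", 60),
        ("Male", "Adult", 170),    ("Male", "Senior", 155),   ("Female", "Child", 40),
    };

    // A null argument matches every level of that factor, so
    // MeanFor(null, "Adult") is a row mean and MeanFor("Male", null) a column mean.
    public static double MeanFor(string sex, string age) =>
        Rows.Where(r => (sex == null || r.Sex == sex) && (age == null || r.Age == age))
            .Average(r => r.Weight);
}
```

For example, `WeightMeans.MeanFor("Female", "Senior")` averages 115, 110, and 123, giving 116, and `WeightMeans.MeanFor(null, "Adult")` gives 149.25, matching the output above.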

See also the Tabulate() convenience methods on class DataFrame, as described in Section 2.11.


Copyright © 2008 CenterSpace Software, LLC. All rights reserved.