NMath User's Guide

37.10 Factors

The Factor class represents a categorical vector in which all elements are drawn from a finite number of factor levels. Thus, a Factor contains two parts:

an object array of factor levels

an integer array of categorical data, of which each element is an index into the array of levels

For example, this string data:

"A", "A", "C", "B", "A", "C", "B"

could be represented as a Factor with the following levels and categorical data:

Code Example – C#

object[] levels = { "A", "B", "C" };
int[] data = { 0, 0, 2, 1, 0, 2, 1 };

Code Example – VB

Dim Levels As Object() = {"A", "B", "C"}
Dim Data As Integer() = {0, 0, 2, 1, 0, 2, 1}

Factors are usually constructed from a data frame column using the GetFactor() method, but they can also be constructed independently.

Creating Factors

The GetFactor() method on DataFrame accepts a column index or name and returns a Factor with levels for the sorted, unique elements in the given column:

Code Example – C#

Factor myColFactor = df.GetFactor( "myCol" );

Code Example – VB

Dim ColFactor As Factor = DF.GetFactor("myCol")

Alternatively, you can provide the factor levels yourself, in which case the order you supply is preserved:

Code Example – C#

var levels = new object[] { "Q1", "Q2", "Q3", "Q4" };
Factor myColFactor = df.GetFactor( "myCol", levels );

Code Example – VB

Dim Levels As Object() = {"Q1", "Q2", "Q3", "Q4"}
Dim ColFactor As Factor = DF.GetFactor("myCol", Levels)

An InvalidArgumentException is thrown if the specified column contains a value not present in the given array of levels.
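The lookup against caller-supplied levels can be pictured with a small sketch in plain C#. This is an illustration only, not NMath's implementation: the LevelLookup class and its Encode method are hypothetical names, and the standard ArgumentException stands in for NMath's InvalidArgumentException so that the sketch compiles without the library.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of NMath.
public static class LevelLookup
{
    // Maps each value to its position in the caller-supplied levels,
    // keeping the caller's level order unchanged.
    public static int[] Encode(object[] values, object[] levels)
    {
        return values.Select(v =>
        {
            int index = Array.IndexOf(levels, v);
            if (index < 0)
            {
                // Stand-in for the exception described in the text.
                throw new ArgumentException(
                    $"Value '{v}' is not among the supplied levels.");
            }
            return index;
        }).ToArray();
    }
}
```

With levels "Q1", "Q2", "Q3", "Q4", the values "Q3", "Q1", "Q4" encode to the indices 2, 0, 3, while a value such as "Q5" causes the exception.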

You can also construct a Factor independently of a DataFrame. For example, you can construct a Factor from an array of values:

Code Example – C#

var values = new object[] { 1, 1, 3, 2, 1, 3, 2 };
var factor = new Factor( values );

Code Example – VB

Dim Values As Object() = {1, 1, 3, 2, 1, 3, 2}
Dim Factor As New Factor(Values)

Factor levels are constructed from a sorted list of unique values in the passed array.
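To make the "sorted list of unique values" concrete, the following sketch derives levels and index data from a raw array using LINQ. It is an assumption-laden illustration rather than NMath's actual code: FactorEncoding and Encode are hypothetical names, and the sort uses the default .NET comparer, which may differ from the ordering NMath applies.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of NMath.
public static class FactorEncoding
{
    // Returns the sorted, distinct values as levels, plus the index of
    // each original value within those levels.
    public static (object[] Levels, int[] Data) Encode(object[] values)
    {
        // Default comparer: all values must be mutually comparable
        // (for example, all ints or all strings).
        object[] levels = values.Distinct().OrderBy(v => v).ToArray();
        int[] data = values.Select(v => Array.IndexOf(levels, v)).ToArray();
        return (levels, data);
    }
}
```

For the array 1, 1, 3, 2, 1, 3, 2 this yields the levels 1, 2, 3 and the data 0, 0, 2, 1, 0, 2, 1.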

Alternatively, you can construct a Factor from an array of factor levels, and a data array consisting of indices into the factor levels:

Code Example – C#

var levels = new object[] { 1, 2, 3 };
var data = new int[] { 0, 0, 2, 1, 0, 2, 1 };
var factor = new Factor( levels, data );

Code Example – VB

Dim Levels As Object() = {1, 2, 3}
Dim Data As Integer() = {0, 0, 2, 1, 0, 2, 1}
Dim Factor As New Factor(Levels, Data)

An InvalidArgumentException is thrown if the given data array contains an invalid index.
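An index is valid when it falls in the range 0 through the number of levels minus one. A minimal sketch of such a check follows; IndexCheck and Validate are hypothetical names, and ArgumentException is used as a stand-in for NMath's InvalidArgumentException so that the code has no library dependency.

```csharp
using System;

// Hypothetical helper for illustration; not part of NMath.
public static class IndexCheck
{
    // Throws unless every entry of data is a valid index into an
    // array of levelCount levels.
    public static void Validate(int[] data, int levelCount)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] < 0 || data[i] >= levelCount)
            {
                throw new ArgumentException(
                    $"data[{i}] = {data[i]} is outside the range 0..{levelCount - 1}.");
            }
        }
    }
}
```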

Properties of Factors

The Factor class provides the following properties:

Data gets the categorical data for the factor. Each element in the returned integer array is an index into Levels.

Levels gets the levels of the factor as an array of objects.

Length gets the length of the Data in the factor.

Name gets and sets the name of the factor.

NumberOfLevels gets the number of levels in the factor.

Accessing Factors

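A standard indexer is provided for accessing the element at a given index. The indexer returns an object, so cast the result to the type of the factor's levels (string in this example):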

Code Example – C#

string str = (string)factor[2];

Code Example – VB

Dim Str As String = CType(Factor(2), String)

The indexer returns Levels[ Data[index] ]; that is, it returns the level of the element at the given position.
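The two-step lookup can be sketched with a small stand-in class in plain C#. FactorSketch is a hypothetical type written for this illustration; it mirrors only the levels/data pairing described here and is not the NMath Factor class.

```csharp
using System;

// Hypothetical stand-in for illustration; not the NMath Factor class.
public sealed class FactorSketch
{
    private readonly object[] levels;
    private readonly int[] data;

    public FactorSketch(object[] levels, int[] data)
    {
        this.levels = levels;
        this.data = data;
    }

    // Number of categorical elements.
    public int Length => data.Length;

    // The lookup the text describes: Levels[ Data[index] ].
    public object this[int index] => levels[data[index]];
}
```

With the levels "A", "B", "C" and the data 0, 0, 2, 1, 0, 2, 1, element 2 of the sketch is "C", because data[2] is 2 and levels[2] is "C".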

Creating Groupings with Factors

The principal use of factors is in conjunction with the GetGroupings() methods on Subset. One overload accepts a single Factor and returns an array of subsets, one per level, each containing the indices of the elements at that level. Another overload accepts two Factor objects and returns a two-dimensional array of subsets containing the indices for each combination of levels in the two factors.
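Conceptually, a grouping collects the positions that share a level index. The sketch below shows that idea for one and for two factors using plain lists of integers. GroupingSketch and its Group methods are hypothetical names, and the lists stand in for NMath's Subset objects; this is not how the library is implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of NMath.
public static class GroupingSketch
{
    // groups[k] lists the positions whose level index equals k.
    public static List<int>[] Group(int[] data, int numberOfLevels)
    {
        var groups = new List<int>[numberOfLevels];
        for (int k = 0; k < numberOfLevels; k++) groups[k] = new List<int>();
        for (int i = 0; i < data.Length; i++) groups[data[i]].Add(i);
        return groups;
    }

    // cells[a, b] lists the positions at level a of the first factor
    // and level b of the second.
    public static List<int>[,] Group(int[] dataA, int levelsA, int[] dataB, int levelsB)
    {
        if (dataA.Length != dataB.Length)
            throw new ArgumentException("Both factors must have the same length.");

        var cells = new List<int>[levelsA, levelsB];
        for (int a = 0; a < levelsA; a++)
            for (int b = 0; b < levelsB; b++)
                cells[a, b] = new List<int>();

        for (int i = 0; i < dataA.Length; i++) cells[dataA[i], dataB[i]].Add(i);
        return cells;
    }
}
```

For the data 0, 0, 2, 1, 0, 2, 1 with three levels, the single-factor grouping is { 0, 1, 4 } for level 0, { 3, 6 } for level 1, and { 2, 5 } for level 2.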

For example, suppose we weigh human subjects based on sex and age group. The data for 15 subjects might look like this:

Table 26 – Sample data

            Male        Female
Child       45, 42      30, 35, 60, 40
Adult       182, 170    115, 130
Senior      142, 155    115, 110, 123

In a DataFrame, each observation would be a row, like so:

Code Example – C#

var df = new DataFrame();
df.AddColumn( new DFStringColumn( "Sex" ) );
df.AddColumn( new DFStringColumn( "AgeGroup" ) );
df.AddColumn( new DFIntColumn( "Weight" ) );



df.AddRow( "John Smith", "Male", "Child", 45 );
df.AddRow( "Ruth Barnes", "Female", "Senior", 115 );
df.AddRow( "Jane Jones", "Female", "Adult", 115 );
df.AddRow( "Timmy Toddler", "Male", "Child", 42 );
df.AddRow( "Betsy Young", "Female", "Adult", 130 );
df.AddRow( "Arthur Smith", "Male", "Senior", 142 );
df.AddRow( "Lucy Young", "Female", "Child", 30 );
df.AddRow( "Emma Allen", "Female", "Child", 35 );
df.AddRow( "Roy Wilkenson", "Male", "Adult", 182 );
df.AddRow( "Susan Schwarz", "Female", "Senior", 110 );
df.AddRow( "Ming Tao", "Female", "Senior", 123 );
df.AddRow( "Johanna Glynn", "Female", "Child", 60 );
df.AddRow( "Randall Harvey", "Male", "Adult", 170 );
df.AddRow( "Tom Howard", "Male", "Senior", 155 );
df.AddRow( "Jennifer Watson", "Female", "Child", 40 );

Code Example – VB

Dim DF As New DataFrame()
DF.AddColumn(New DFStringColumn("Sex"))
DF.AddColumn(New DFStringColumn("AgeGroup"))
DF.AddColumn(New DFIntColumn("Weight"))



DF.AddRow("John Smith", "Male", "Child", 45)
DF.AddRow("Ruth Barnes", "Female", "Senior", 115)
DF.AddRow("Jane Jones", "Female", "Adult", 115)
DF.AddRow("Timmy Toddler", "Male", "Child", 42)
DF.AddRow("Betsy Young", "Female", "Adult", 130)
DF.AddRow("Arthur Smith", "Male", "Senior", 142)
DF.AddRow("Lucy Young", "Female", "Child", 30)
DF.AddRow("Emma Allen", "Female", "Child", 35)
DF.AddRow("Roy Wilkenson", "Male", "Adult", 182)
DF.AddRow("Susan Schwarz", "Female", "Senior", 110)
DF.AddRow("Ming Tao", "Female", "Senior", 123)
DF.AddRow("Johanna Glynn", "Female", "Child", 60)
DF.AddRow("Randall Harvey", "Male", "Adult", 170)
DF.AddRow("Tom Howard", "Male", "Senior", 155)
DF.AddRow("Jennifer Watson", "Female", "Child", 40)

In this case, we're using the subjects' names as row keys.

It is natural to construct factors from the Sex and AgeGroup columns:

Code Example – C#

Factor sex = df.GetFactor( "Sex" );
Factor age = df.GetFactor( "AgeGroup" );

Code Example – VB

Dim Sex As Factor = DF.GetFactor("Sex")
Dim Age As Factor = DF.GetFactor("AgeGroup")

We can then use these factors in conjunction with the GetGroupings() methods on Subset to create subsets representing the original rows, columns, and cells in Table 26:

Code Example – C#

Subset[] sexGroups = Subset.GetGroupings( sex );
Subset[] ageGroups = Subset.GetGroupings( age );
Subset[,] cellGroups = Subset.GetGroupings( sex, age );

Code Example – VB

Dim SexGroups As Subset() = Subset.GetGroupings(Sex)
Dim AgeGroups As Subset() = Subset.GetGroupings(Age)
Dim CellGroups As Subset(,) = Subset.GetGroupings(Sex, Age)

These subsets can then be used to operate on the relevant portions of the data frame. For instance, this code prints out row means, column means, and cell means for Table 26:

Code Example – C#

Console.WriteLine( "\nTABLE ROW MEANS" ); 
for ( int i = 0; i < age.NumberOfLevels; i++ )
{
  double mean = StatsFunctions.Mean(
    df[ df.IndexOfColumn( "Weight" ), ageGroups[i] ] );
  Console.WriteLine( "Mean for {0} = {1}", age.Levels[i], mean );
}



Console.WriteLine( "\nTABLE COLUMN MEANS" ); 
for ( int i = 0; i < sex.NumberOfLevels; i++ )
{
  double mean = StatsFunctions.Mean(
    df[ df.IndexOfColumn( "Weight" ), sexGroups[i] ] );
  Console.WriteLine( "Mean for {0} = {1}", sex.Levels[i], mean );
}



Console.WriteLine( "\nTABLE CELL MEANS" );
for ( int i = 0; i < sex.NumberOfLevels; i++ )
{
  for ( int j = 0; j < age.NumberOfLevels; j++ )
  {
    double mean = StatsFunctions.Mean(
      df[ df.IndexOfColumn( "Weight" ), cellGroups[i,j] ] );
    Console.WriteLine( "Mean for {0} {1} = {2}",
      sex.Levels[i], age.Levels[j], mean );
  }
}

Code Example – VB

Console.WriteLine(Environment.NewLine & "TABLE ROW MEANS")
For I As Integer = 0 To Age.NumberOfLevels - 1
  Dim Mean As Double =   
    StatsFunctions.Mean(DF(DF.IndexOfColumn("Weight"),       
    AgeGroups(I)))
  Console.WriteLine("Mean for {0} = {1}", Age.Levels(I), Mean)
Next



Console.WriteLine(Environment.NewLine & "TABLE COLUMN MEANS")
For I As Integer = 0 To Sex.NumberOfLevels - 1
  Dim Mean As Double =
    StatsFunctions.Mean(DF(DF.IndexOfColumn("Weight"),
    SexGroups(I)))
  Console.WriteLine("Mean for {0} = {1}", Sex.Levels(I), Mean)
Next



Console.WriteLine(Environment.NewLine & "TABLE CELL MEANS")
For I As Integer = 0 To Sex.NumberOfLevels - 1
  For J As Integer = 0 To Age.NumberOfLevels - 1
    Dim Mean As Double =   
      StatsFunctions.Mean(DF(DF.IndexOfColumn("Weight"), 
      CellGroups(I, J)))
    Console.WriteLine("Mean for {0} {1} = {2}", Sex.Levels(I), 
      Age.Levels(J), Mean)
  Next
Next

The output is:



TABLE ROW MEANS
Mean for Adult = 149.25
Mean for Child = 42
Mean for Senior = 129



TABLE COLUMN MEANS
Mean for Female = 84.2222222222222
Mean for Male = 122.666666666667



TABLE CELL MEANS
Mean for Female Adult = 122.5
Mean for Female Child = 41.25
Mean for Female Senior = 116
Mean for Male Adult = 176
Mean for Male Child = 43.5
Mean for Male Senior = 148.5
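As a cross-check that does not depend on NMath, the same means can be reproduced with plain LINQ over the 15 observations. The MeansCheck class, its Rows field, and its Mean method are hypothetical names introduced for this sketch.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of NMath.
public static class MeansCheck
{
    // The 15 observations from the example, as (sex, age group, weight).
    public static readonly (string Sex, string Age, int Weight)[] Rows =
    {
        ("Male", "Child", 45), ("Female", "Senior", 115), ("Female", "Adult", 115),
        ("Male", "Child", 42), ("Female", "Adult", 130), ("Male", "Senior", 142),
        ("Female", "Child", 30), ("Female", "Child", 35), ("Male", "Adult", 182),
        ("Female", "Senior", 110), ("Female", "Senior", 123), ("Female", "Child", 60),
        ("Male", "Adult", 170), ("Male", "Senior", 155), ("Female", "Child", 40),
    };

    // Mean weight over the rows selected by the given predicate.
    public static double Mean(Func<(string Sex, string Age, int Weight), bool> match)
    {
        return Rows.Where(match).Average(r => r.Weight);
    }
}
```

For instance, MeansCheck.Mean( r => r.Age == "Adult" ) evaluates to 149.25, and MeansCheck.Mean( r => r.Sex == "Female" && r.Age == "Senior" ) evaluates to 116, in agreement with the output above.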

See also the Tabulate() convenience methods on class DataFrame, as described in Section 37.11.

