NMath Stats User's Guide


10.2 Factor Analysis

Factor analysis describes the variability among observed, correlated variables in terms of a potentially lower number of unobserved variables, called factors.

In general, factor analysis consists of two steps:

In the extraction step, factors are extracted from the data.

In NMath Stats, IFactorExtraction is the interface for factor extraction algorithms. Class PCFactorExtraction implements the principal component (PC) algorithm for factor extraction.
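As commonly described, principal component extraction takes the eigenvalues and eigenvectors of the correlation (or covariance) matrix, sorted largest first, and scales each retained eigenvector by the square root of its eigenvalue to give that factor's column of loadings. The following C# sketch illustrates the idea with plain arrays and a small cyclic Jacobi eigensolver. It is not NMath's PCFactorExtraction implementation: the class PcExtractionSketch and its methods are hypothetical, and the number of factors k is simply passed in rather than chosen by a rule.

```csharp
using System;
using System.Linq;

// Hypothetical sketch (not NMath code): principal component factor
// extraction from a symmetric correlation or covariance matrix.
public static class PcExtractionSketch
{
    // Cyclic Jacobi eigenvalue algorithm for a small symmetric matrix.
    // Returns eigenvalues in descending order and the matching unit-length
    // eigenvectors as the columns of a matrix.
    public static (double[] Values, double[,] Vectors) SymmetricEigen(
        double[,] s, int maxSweeps = 100)
    {
        int n = s.GetLength(0);
        var a = (double[,])s.Clone();
        var v = new double[n, n];
        for (int i = 0; i < n; i++) v[i, i] = 1.0;

        for (int sweep = 0; sweep < maxSweeps; sweep++)
        {
            // Stop once the off-diagonal part is negligible.
            double off = 0.0;
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += a[p, q] * a[p, q];
            if (off < 1e-24) break;

            for (int p = 0; p < n; p++)
            {
                for (int q = p + 1; q < n; q++)
                {
                    double apq = a[p, q];
                    if (apq == 0.0) continue;
                    // Plane rotation chosen so that a[p, q] becomes zero.
                    double theta = (a[q, q] - a[p, p]) / (2.0 * apq);
                    double t = (theta >= 0 ? 1.0 : -1.0) /
                               (Math.Abs(theta) + Math.Sqrt(theta * theta + 1.0));
                    double c = 1.0 / Math.Sqrt(t * t + 1.0);
                    double sn = t * c;
                    for (int r = 0; r < n; r++)
                    {
                        if (r == p || r == q) continue;
                        double arp = a[r, p], arq = a[r, q];
                        a[r, p] = a[p, r] = c * arp - sn * arq;
                        a[r, q] = a[q, r] = sn * arp + c * arq;
                    }
                    a[p, p] -= t * apq;
                    a[q, q] += t * apq;
                    a[p, q] = a[q, p] = 0.0;
                    // Accumulate the rotation into the eigenvector matrix.
                    for (int r = 0; r < n; r++)
                    {
                        double vrp = v[r, p], vrq = v[r, q];
                        v[r, p] = c * vrp - sn * vrq;
                        v[r, q] = sn * vrp + c * vrq;
                    }
                }
            }
        }

        // Sort eigenpairs by descending eigenvalue.
        int[] order = Enumerable.Range(0, n).OrderByDescending(i => a[i, i]).ToArray();
        var values = new double[n];
        var vectors = new double[n, n];
        for (int j = 0; j < n; j++)
        {
            values[j] = a[order[j], order[j]];
            for (int i = 0; i < n; i++) vectors[i, j] = v[i, order[j]];
        }
        return (values, vectors);
    }

    // Loadings for the first k factors: eigenvector j scaled by sqrt(eigenvalue j).
    public static double[,] Extract(double[,] corr, int k)
    {
        var (values, vectors) = SymmetricEigen(corr);
        int n = values.Length;
        var loadings = new double[n, k];
        for (int j = 0; j < k; j++)
        {
            double scale = Math.Sqrt(Math.Max(values[j], 0.0));
            for (int i = 0; i < n; i++) loadings[i, j] = vectors[i, j] * scale;
        }
        return loadings;
    }
}
```

For the 2 x 2 correlation matrix with off-diagonal entries of 0.5, the eigenvalues are 1.5 and 0.5, and both variables load about 0.866 (the square root of 0.75) on the first factor. Eigenvector signs are arbitrary, so a whole loading column may come out negated.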

In the rotation step, the factors are rotated in order to maximize the relationship between the variables and the factors.

In NMath Stats, IFactorRotation is the interface for factor rotation algorithms. Class VarimaxRotation computes the varimax rotation of the factors. Factors are rotated to maximize the sum of the variances of the squared loadings. Kaiser normalization is optionally performed. Class NoRotation can be used when no rotation is desired.
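One classic way to compute a varimax rotation is Kaiser's pairwise method. For each pair of factor columns with loadings x and y over the p variables, let u = x² - y² and v = 2xy, and form A = Σu, B = Σv, C = Σ(u² - v²) and D = 2Σuv; rotating the pair by the angle φ with tan 4φ = (D - 2AB/p) / (C - (A² - B²)/p) maximizes the criterion for that pair. Sweeps over all pairs repeat until the criterion stops improving. The C# sketch below implements this method on raw (unnormalized) loadings. It is an illustration under these assumptions, not NMath's VarimaxRotation code, which may iterate differently (this section describes a stopping rule based on singular values); VarimaxSketch and its members are hypothetical.

```csharp
using System;

// Hypothetical sketch (not NMath code): raw varimax rotation by Kaiser's
// pairwise plane-rotation method. Loadings: rows = variables, columns = factors.
public static class VarimaxSketch
{
    // Varimax criterion: sum over factors of the variance of the squared loadings.
    public static double Criterion(double[,] l)
    {
        int p = l.GetLength(0), m = l.GetLength(1);
        double total = 0.0;
        for (int j = 0; j < m; j++)
        {
            double s2 = 0.0, s4 = 0.0;
            for (int i = 0; i < p; i++)
            {
                double sq = l[i, j] * l[i, j];
                s2 += sq;
                s4 += sq * sq;
            }
            total += s4 / p - (s2 / p) * (s2 / p);
        }
        return total;
    }

    public static double[,] Rotate(double[,] loadings, double tolerance = 1e-6, int maxSweeps = 100)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        var l = (double[,])loadings.Clone();
        double previous = Criterion(l);
        for (int sweep = 0; sweep < maxSweeps; sweep++)
        {
            for (int j = 0; j < m - 1; j++)
                for (int k = j + 1; k < m; k++)
                {
                    // Sums that determine the optimal angle for columns j and k.
                    double a = 0, b = 0, c = 0, d = 0;
                    for (int i = 0; i < p; i++)
                    {
                        double x = l[i, j], y = l[i, k];
                        double u = x * x - y * y, v = 2 * x * y;
                        a += u; b += v; c += u * u - v * v; d += 2 * u * v;
                    }
                    // Atan2 picks the quadrant of 4*phi that maximizes the criterion.
                    double phi = 0.25 * Math.Atan2(d - 2 * a * b / p, c - (a * a - b * b) / p);
                    double cos = Math.Cos(phi), sin = Math.Sin(phi);
                    for (int i = 0; i < p; i++)
                    {
                        double x = l[i, j], y = l[i, k];
                        l[i, j] = x * cos + y * sin;
                        l[i, k] = -x * sin + y * cos;
                    }
                }
            double current = Criterion(l);
            // Stop when the relative change in the criterion is below the tolerance.
            if (Math.Abs(current - previous) <= tolerance * Math.Abs(current)) break;
            previous = current;
        }
        return l;
    }
}
```

Because each step is an orthogonal rotation, every variable's communality (the row sum of squared loadings) is unchanged, and no pairwise step can decrease the criterion. Starting from loadings that are a 30 degree rotation of a perfect simple structure, the first sweep rotates them back to loadings of 0 and 1, up to rounding.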

Creating Factor Analyses

NMath Stats provides three classes for performing factor analysis:

FactorAnalysisCorrelation performs a factor analysis on given case data by forming the correlation matrix for the variables.

FactorAnalysisCovariance performs a factor analysis on given case data using the covariance matrix.

DoubleFactorAnalysis performs a factor analysis on a symmetric matrix, assumed to be either a correlation or a covariance matrix. Use it when you don't have access to the original case data.

When case data is used, it should be provided in matrix form, with the values of each variable in a column and each row representing a case.
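To make the layout concrete, here is a plain-C# sketch of the matrices these classes work from, with rows as cases and columns as variables: a covariance matrix, where the biased form divides by n and the unbiased form by n - 1, and the correlation matrix obtained by dividing each covariance by the product of the two standard deviations. We assume that BiasType.Biased, used later in this section, corresponds to the n divisor; CaseDataSketch is hypothetical and not part of NMath. A matrix built this way is the kind of input DoubleFactorAnalysis expects when the case data themselves are not available.

```csharp
using System;

// Hypothetical sketch (not NMath code). Rows = cases, columns = variables.
public static class CaseDataSketch
{
    // Covariance matrix of the columns. biased = true divides by n, otherwise by n - 1.
    public static double[,] Covariance(double[,] data, bool biased)
    {
        int n = data.GetLength(0), p = data.GetLength(1);
        var mean = new double[p];
        for (int j = 0; j < p; j++)
        {
            for (int i = 0; i < n; i++) mean[j] += data[i, j];
            mean[j] /= n;
        }
        double divisor = biased ? n : n - 1;
        var cov = new double[p, p];
        for (int j = 0; j < p; j++)
            for (int k = j; k < p; k++)
            {
                double sum = 0.0;
                for (int i = 0; i < n; i++)
                    sum += (data[i, j] - mean[j]) * (data[i, k] - mean[k]);
                cov[j, k] = cov[k, j] = sum / divisor;
            }
        return cov;
    }

    // Correlation matrix: r_jk = s_jk / sqrt(s_jj * s_kk). The divisor cancels out.
    public static double[,] Correlation(double[,] cov)
    {
        int p = cov.GetLength(0);
        var corr = new double[p, p];
        for (int j = 0; j < p; j++)
            for (int k = 0; k < p; k++)
                corr[j, k] = cov[j, k] / Math.Sqrt(cov[j, j] * cov[k, k]);
        return corr;
    }
}
```

Because the n or n - 1 divisor cancels, the correlation matrix is the same whichever bias convention the covariance used; only a covariance-based analysis is affected by that choice.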

All factor analysis classes are templatized on the extraction and rotation algorithms to use. For example:

Code Example – C# factor analysis

var fa = new FactorAnalysisCorrelation<PCFactorExtraction, 
  VarimaxRotation>( data );

For greater control, construct the extraction and rotation objects explicitly. For example, a PCFactorExtraction instance can be constructed from a delegate for determining the number of factors to extract. The type of this argument is Func<DoubleVector, DoubleMatrix, int>. It takes as arguments the vector of eigenvalues and the matrix of eigenvectors, and returns the number of factors to extract. Class NumberOfFactors contains static methods for creating functors for several common strategies. This code extracts factors whose eigenvalues are greater than 1.2 times the mean of the eigenvalues:

Code Example – C# factor analysis

var factorExtraction = new PCFactorExtraction( 
  NumberOfFactors.EigenvaluesGreaterThanMean( 1.2 ) );
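Because the strategy is just a delegate, you can also write your own. The sketch below mirrors the shape Func<DoubleVector, DoubleMatrix, int>, with plain double[] and double[,] standing in for DoubleVector and DoubleMatrix, and reimplements the rule described above: keep the factors whose eigenvalues exceed a multiple of the mean eigenvalue. FactorCountSketch is hypothetical; it is not how NMath's NumberOfFactors is implemented, only an illustration of the rule.

```csharp
using System;
using System.Linq;

// Hypothetical sketch (not NMath's NumberOfFactors): a functor with the same
// shape as Func<DoubleVector, DoubleMatrix, int>, using plain arrays.
public static class FactorCountSketch
{
    // Counts the eigenvalues greater than factor * (mean of all eigenvalues).
    // The eigenvector matrix is accepted to match the delegate shape but is unused.
    public static Func<double[], double[,], int> EigenvaluesGreaterThanMean(double factor)
    {
        return (eigenvalues, eigenvectors) =>
        {
            double threshold = factor * eigenvalues.Average();
            return eigenvalues.Count(e => e > threshold);
        };
    }
}
```

With eigenvalues 3.0, 1.0, 0.6 and 0.4 the mean is 1.25, so a multiplier of 1.2 gives a threshold of 1.5 and one factor is kept. Since the PCFactorExtraction constructor takes an ordinary Func, a C# lambda with the same parameter types should also be accepted there.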

The following code constructs a VarimaxRotation instance with a specified tolerance. Iteration stops when the relative change in the sum of the singular values is less than this number. We also specify that we do not want Kaiser normalization to be performed.

Code Example – C# factor analysis

var factorRotation = new VarimaxRotation
{
  Tolerance = 1e-6,
  Normalize = false
};
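Kaiser normalization, which the code above turns off, is usually described as rescaling each variable's row of loadings to unit length (dividing by the square root of its communality) before rotating, then multiplying the rows back by the same lengths afterward. This keeps variables with small communalities from counting less in the varimax criterion. The C# sketch below shows the two steps that would surround a rotation; KaiserSketch is a hypothetical helper, not part of NMath.

```csharp
using System;

// Hypothetical sketch (not NMath code) of Kaiser normalization around a rotation.
// Loadings: rows = variables, columns = factors.
public static class KaiserSketch
{
    // Scales each row to unit length in place (divides by the square root of
    // the variable's communality) and returns the row lengths.
    public static double[] Normalize(double[,] loadings)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        var norms = new double[p];
        for (int i = 0; i < p; i++)
        {
            double h2 = 0.0;
            for (int j = 0; j < m; j++) h2 += loadings[i, j] * loadings[i, j];
            norms[i] = Math.Sqrt(h2);
            if (norms[i] > 0.0)
                for (int j = 0; j < m; j++) loadings[i, j] /= norms[i];
        }
        return norms;
    }

    // Restores the original row lengths in place after the rotation.
    public static void Denormalize(double[,] loadings, double[] norms)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        for (int i = 0; i < p; i++)
            for (int j = 0; j < m; j++) loadings[i, j] *= norms[i];
    }
}
```

A rotation step would sit between the two calls: normalize, rotate the unit-length rows, then denormalize with the returned lengths.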

Once you've constructed your extraction and rotation objects, you can construct the factor analysis instance:

Code Example – C# factor analysis

var fa = new FactorAnalysisCovariance<PCFactorExtraction, 
  VarimaxRotation>( data, BiasType.Biased, factorExtraction, 
    factorRotation );

Factor Analysis Results

Once you've constructed a factor analysis instance, you can access the results using the following properties:

NumberOfFactors gets the number of factors extracted.

Factors gets the extracted factors. Each column of the matrix is a factor.

RotatedFactors gets the rotated factors. Each column of the matrix is a factor.

VarianceProportions gets a vector containing the proportion of the total variance explained by each factor.

CumulativeVarianceProportions gets the cumulative variance proportions.

ExtractedCommunalities gets the proportion of each variable's variance that can be explained by the extracted factors jointly.

InitialCommunalities gets the proportion of each variable's variance that can be explained by all of the factors jointly, before extraction.

SumOfSquaredLoadings gets the sum of squared loadings for each extracted factor.

RotatedSumOfSquaredLoadings gets the sum of squared loadings for each rotated extracted factor.

For instance:

Code Example – C# factor analysis

// Extracted communality of each variable (one per column of data)
DoubleVector extractedCommunalities = fa.ExtractedCommunalities;
for ( int i = 0; i < data.Cols; i++ )
{
  Console.WriteLine( "{0}\t{1}", data[i].Name, 
    extractedCommunalities[i] );
}

// Eigenvalues with percentage and cumulative percentage of variance
for ( int i = 0; i < fa.VarianceProportions.Length; i++ )
{
  double varProportion = fa.VarianceProportions[i] * 100.0;
  double cumulativeVarProportion = 
    fa.CumulativeVarianceProportions[i] * 100.0;
  double eigenValue = fa.FactorExtraction.Eigenvalues[i];
  Console.WriteLine( "{0}\t\t{1}\t{2}\t\t{3}", i, eigenValue, 
    varProportion, cumulativeVarProportion );
}

// Rotated sums of squared loadings as fractions of the total variance
double eigenValueSum =
  NMathFunctions.Sum( fa.FactorExtraction.Eigenvalues );
DoubleVector rotatedSSLoadingsVarianceProportions = 
  fa.RotatedSumOfSquaredLoadings / eigenValueSum;
Console.WriteLine( 
  "\nRotated Extraction Sums of Squared Loadings - " );
Console.WriteLine( "factor\tTotal\t% of Variance\tCumulative %" );
Console.WriteLine( 
  "----------------------------------------------------" );
double cumulative = 0;

for ( int i = 0; i < fa.NumberOfFactors; i++ )
{
  double varProportion =
    rotatedSSLoadingsVarianceProportions[i] * 100.0;
  cumulative += rotatedSSLoadingsVarianceProportions[i];
  double cumulativeVarProportion = cumulative * 100.0;
  double sumSquaredLoading = fa.RotatedSumOfSquaredLoadings[i];
  Console.WriteLine( "{0}\t\t{1}\t{2}\t\t{3}", i, 
    sumSquaredLoading, varProportion, cumulativeVarProportion );
}

// Rotated loadings of each variable on the first three factors
// (assumes at least three factors were extracted)
DoubleMatrix rotatedComponentMatrix = fa.RotatedFactors;
for ( int i = 0; i < data.Cols; i++ )
{
  double comp0 = rotatedComponentMatrix.Row( i )[0];
  double comp1 = rotatedComponentMatrix.Row( i )[1];
  double comp2 = rotatedComponentMatrix.Row( i )[2];
  Console.WriteLine( "{0}\t{1}\t{2}\t{3}", data[i].Name,
    comp0, comp1, comp2 );
}
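The quantities printed above follow from the loading matrix and the eigenvalues in the usual way for an orthogonal solution: a variable's communality is the sum of its squared loadings across factors, a factor's sum of squared loadings is the sum down its column, and each eigenvalue divided by the sum of all eigenvalues gives a variance proportion, whose running total is the cumulative proportion. The plain-C# sketch below spells out these relationships. FactorResultsSketch and its methods are hypothetical helpers, not NMath members, and the exact definitions NMath uses should be confirmed in its reference documentation.

```csharp
using System;

// Hypothetical helpers (not NMath members).
// Loadings: rows = variables, columns = factors.
public static class FactorResultsSketch
{
    // Communality of each variable: the sum of its squared loadings.
    public static double[] Communalities(double[,] loadings)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        var h = new double[p];
        for (int i = 0; i < p; i++)
            for (int j = 0; j < m; j++) h[i] += loadings[i, j] * loadings[i, j];
        return h;
    }

    // Sum of squared loadings for each factor (column).
    public static double[] SumOfSquaredLoadings(double[,] loadings)
    {
        int p = loadings.GetLength(0), m = loadings.GetLength(1);
        var ss = new double[m];
        for (int j = 0; j < m; j++)
            for (int i = 0; i < p; i++) ss[j] += loadings[i, j] * loadings[i, j];
        return ss;
    }

    // Each eigenvalue's share of the total, plus the running (cumulative) share.
    public static (double[] Proportions, double[] Cumulative) VarianceProportions(
        double[] eigenvalues)
    {
        double total = 0.0;
        foreach (double e in eigenvalues) total += e;
        var proportions = new double[eigenvalues.Length];
        var cumulative = new double[eigenvalues.Length];
        double running = 0.0;
        for (int i = 0; i < eigenvalues.Length; i++)
        {
            proportions[i] = eigenvalues[i] / total;
            running += proportions[i];
            cumulative[i] = running;
        }
        return (proportions, cumulative);
    }
}
```

For principal component extraction, the unrotated sum of squared loadings of factor j equals eigenvalue j. An orthogonal rotation such as varimax redistributes the sums of squared loadings among the factors but leaves both their total and each variable's communality unchanged.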

Factor Scores

The case data values for the new factor variables are contained in the factor scores matrix. The score for a given factor is a linear combination of all of the measured variables, weighted by the corresponding factor score coefficients, which are derived from the factor loadings.

There are different algorithms for producing the factor scores. The FactorScores() method can be passed an object implementing the IFactorScores interface, specifying the algorithm to use. If no argument is passed, the regression algorithm, implemented in class RegressionFactorScores, is used.

For example, this code prints the factor scores for the first three cases. Note that the data is normalized before the scores are computed.

Code Example – C# factor analysis

// First three rows (cases), all columns (factors)
var rowSlice = new Slice( 0, 3 );
Console.WriteLine(
  fa.FactorScores()[rowSlice, Slice.All].ToTabDelimited() );

Factor scores are a linear combination of the original variable values. The coefficients used for the linear combination are found in the factor score coefficients matrix, which you can obtain from the FactorScoreCoefficients() method on the factor analysis class. As with FactorScores(), you can specify the algorithm by passing an object implementing the IFactorScores interface to this method. By default, the regression algorithm is used.
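For the regression (Thurstone) method, the coefficient matrix is commonly computed as B = R^(-1) L, where R is the correlation matrix of the variables and L the loading matrix; the scores are then Z B, with Z the standardized case data. The C# sketch below solves R B = L by Gaussian elimination with partial pivoting rather than forming the inverse explicitly. It assumes, without confirmation from this guide, that RegressionFactorScores follows the same textbook formula; RegressionScoresSketch is hypothetical.

```csharp
using System;

// Hypothetical sketch (not NMath's RegressionFactorScores): regression-method
// factor score coefficients B = R^-1 * L and scores Z * B.
public static class RegressionScoresSketch
{
    // Solves R * B = L by Gaussian elimination with partial pivoting.
    public static double[,] Coefficients(double[,] r, double[,] loadings)
    {
        int p = r.GetLength(0), m = loadings.GetLength(1);
        var a = (double[,])r.Clone();
        var b = (double[,])loadings.Clone();
        for (int col = 0; col < p; col++)
        {
            // Choose the largest pivot in this column for numerical stability.
            int pivot = col;
            for (int i = col + 1; i < p; i++)
                if (Math.Abs(a[i, col]) > Math.Abs(a[pivot, col])) pivot = i;
            if (Math.Abs(a[pivot, col]) < 1e-14)
                throw new InvalidOperationException("Matrix is singular.");
            SwapRows(a, col, pivot);
            SwapRows(b, col, pivot);
            for (int i = col + 1; i < p; i++)
            {
                double f = a[i, col] / a[col, col];
                for (int k = col; k < p; k++) a[i, k] -= f * a[col, k];
                for (int k = 0; k < m; k++) b[i, k] -= f * b[col, k];
            }
        }
        // Back substitution for each column of the right-hand side.
        var x = new double[p, m];
        for (int k = 0; k < m; k++)
            for (int i = p - 1; i >= 0; i--)
            {
                double sum = b[i, k];
                for (int j = i + 1; j < p; j++) sum -= a[i, j] * x[j, k];
                x[i, k] = sum / a[i, i];
            }
        return x;
    }

    // Scores = Z * B, where Z holds standardized case data (rows = cases).
    public static double[,] Scores(double[,] z, double[,] coefficients)
    {
        int n = z.GetLength(0), p = z.GetLength(1), m = coefficients.GetLength(1);
        var scores = new double[n, m];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < m; k++)
                for (int j = 0; j < p; j++) scores[i, k] += z[i, j] * coefficients[j, k];
        return scores;
    }

    static void SwapRows(double[,] a, int r1, int r2)
    {
        if (r1 == r2) return;
        for (int k = 0; k < a.GetLength(1); k++)
            (a[r1, k], a[r2, k]) = (a[r2, k], a[r1, k]);
    }
}
```

Solving the linear system is generally preferred to computing R^(-1) and multiplying, since it does less work and is numerically better behaved.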

The factor score coefficients can be used to compute scores for novel case data. For instance:

Code Example – C# factor analysis

DoubleMatrix scoreCoefficients = fa.FactorScoreCoefficients();
var newCaseData = new DoubleMatrix(
  "2x10 [0.0 38.9 3.8 196.0 115.4 71.9 177.0 3.972 17.5 27.8  " + 
        "1.0 46.0 2.5 220.0 101.6 73.4 168.6 3.75  19.0 20.0]" );
Console.WriteLine(
  NMathFunctions.Product( newCaseData, scoreCoefficients ) );
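Whether raw new cases can be multiplied by the coefficients directly, as above, depends on the scale the coefficients refer to. If they apply to standardized variables, as in the textbook regression method, each new case would first be centered and scaled with the means and standard deviations of the original case data. The C# sketch below shows that preprocessing step. NewCaseSketch is hypothetical, and both the need for this step and the divisor (n or n - 1) are assumptions to check against the NMath reference and your own results.

```csharp
using System;

// Hypothetical helpers (not NMath members). Rows = cases, columns = variables.
public static class NewCaseSketch
{
    // Column means and standard deviations of the original case data.
    // biased = true divides by n, otherwise by n - 1 (assumed to match the analysis).
    public static (double[] Means, double[] StdDevs) ColumnStats(double[,] data, bool biased)
    {
        int n = data.GetLength(0), p = data.GetLength(1);
        var means = new double[p];
        var sds = new double[p];
        for (int j = 0; j < p; j++)
        {
            for (int i = 0; i < n; i++) means[j] += data[i, j];
            means[j] /= n;
            double ss = 0.0;
            for (int i = 0; i < n; i++)
                ss += (data[i, j] - means[j]) * (data[i, j] - means[j]);
            sds[j] = Math.Sqrt(ss / (biased ? n : n - 1));
        }
        return (means, sds);
    }

    // Centers and scales each new case using the original statistics.
    public static double[,] Standardize(double[,] newCases, double[] means, double[] sds)
    {
        int n = newCases.GetLength(0), p = newCases.GetLength(1);
        var z = new double[n, p];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < p; j++) z[i, j] = (newCases[i, j] - means[j]) / sds[j];
        return z;
    }
}
```

The standardized rows would then take the place of the raw new case data in the product with the score coefficients.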