**10.2 Factor Analysis** (.NET, C#, CSharp, VB, Visual Basic, F#)

Factor analysis describes the
variability among observed, correlated variables in terms of a potentially
smaller number of unobserved variables, called *factors*.

In general, factor analysis consists of two steps:

● In the *extraction* step, factors are extracted from
the data.

In **NMath Stats**,
**IFactorExtraction** is the interface
for factor extraction algorithms.
Class **PCFactorExtraction** implements
the principal component (PC) algorithm for factor extraction.

● In the *rotation* step, the factors are rotated in
order to maximize the relationship between the variables and the factors.

In **NMath Stats**,
**IFactorRotation** is the interface
for factor rotation algorithms.
Class **VarimaxRotation** computes
the varimax rotation of the factors.
Factors are rotated to maximize the sum of the variances of the squared
loadings. Kaiser normalization is optionally performed. Class **NoRotation** can be used when no rotation
is desired.
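The varimax objective described above is commonly written as shown below. This is the textbook raw varimax criterion, offered as a sketch; the source does not state the exact scaling NMath uses internally. The symbols are introduced here for illustration: $\lambda_{ij}$ is the loading of variable $i$ on factor $j$, $p$ is the number of variables, and $m$ is the number of factors.

```latex
V = \sum_{j=1}^{m} \left[ \frac{1}{p} \sum_{i=1}^{p} \lambda_{ij}^{4}
    - \left( \frac{1}{p} \sum_{i=1}^{p} \lambda_{ij}^{2} \right)^{2} \right]
```

Each bracketed term is the variance of the squared loadings in one column of the loading matrix. With Kaiser normalization, each row is first divided by the square root of its communality, $h_i = \sqrt{\sum_{j=1}^{m} \lambda_{ij}^{2}}$, and rescaled by the same value after rotation.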

**Creating Factor Analyses**

**NMath Stats**
provides three classes for performing factor analysis:

● **FactorAnalysisCorrelation** performs a
factor analysis on given case data by forming the correlation matrix
for the variables.

● **FactorAnalysisCovariance** performs a factor
analysis on given case data using the covariance matrix.

● **DoubleFactorAnalysis** performs a factor
analysis on a symmetric matrix of data, assumed to be either a correlation
or covariance matrix, if you don't have access to the original case data.

When case data is used, the data should be provided in matrix form, with the variable values in columns and each row representing a case.

All factor analysis classes are templatized on the extraction and rotation algorithms to use. For example:

Code Example – C# factor analysis

var fa = new FactorAnalysisCorrelation<PCFactorExtraction, VarimaxRotation>( data );

For greater control, construct the extraction and rotation
objects explicitly. For example, a **PCFactorExtraction**
instance can be constructed from a delegate for determining the number
of factors to extract. The type of this argument is Func<DoubleVector,
DoubleMatrix, int>. It takes as arguments the vector of eigenvalues
and the matrix of eigenvectors, and returns the number of factors to
extract. Class **NumberOfFactors**
contains static methods for creating functors for several common strategies.
This code extracts factors whose eigenvalues are greater than 1.2 times the mean of the eigenvalues:

Code Example – C# factor analysis

var factorExtraction = new PCFactorExtraction( NumberOfFactors.EigenvaluesGreaterThanMean( 1.2 ) );
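To make the selection rule concrete, the following standalone sketch applies the same idea to a plain array: it counts the eigenvalues that exceed a multiple of their mean. It does not use NMath. The class `FactorCountSketch`, the method `CountGreaterThanMean`, and the eigenvalues are hypothetical illustrations, and the strict greater-than comparison is an assumption about how such a rule is typically applied.

```csharp
using System;
using System.Linq;

public static class FactorCountSketch
{
    // Hypothetical helper (not an NMath API): counts the eigenvalues that are
    // strictly greater than `factor` times the mean of all eigenvalues.
    public static int CountGreaterThanMean(double[] eigenvalues, double factor)
    {
        if (eigenvalues == null || eigenvalues.Length == 0)
        {
            return 0; // nothing to extract from an empty spectrum
        }

        double threshold = factor * eigenvalues.Average();
        return eigenvalues.Count(ev => ev > threshold);
    }

    public static void Main()
    {
        // Illustrative eigenvalues only (they sum to 5, so the mean is 1).
        double[] eigenvalues = { 2.5, 1.3, 0.6, 0.4, 0.2 };

        // With factor = 1.2 the threshold is about 1.2, so only 2.5 and 1.3 qualify.
        Console.WriteLine(CountGreaterThanMean(eigenvalues, 1.2));
    }
}
```

A delegate passed to **PCFactorExtraction** would apply the same kind of logic to the eigenvalue vector it receives, and could ignore the eigenvector matrix if it is not needed.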

The following code constructs a **VarimaxRotation**
instance with a specified tolerance. Iteration stops when the relative
change in the sum of the singular values is less than this number. We
also specify that we do not want Kaiser normalization to be performed.

Code Example – C# factor analysis

var factorRotation = new VarimaxRotation { Tolerance = 1e-6, Normalize = false };

Once you've constructed your extraction and rotation objects, you can construct the factor analysis instance:

Code Example – C# factor analysis

var fa = new FactorAnalysisCovariance<PCFactorExtraction, VarimaxRotation>( data, BiasType.Biased, factorExtraction, factorRotation );

**Factor Analysis Results**

Once you've constructed a factor analysis instance, you can access the results using the following properties:

● NumberOfFactors gets the number of factors extracted.

● Factors gets the extracted factors. Each column of the matrix is a factor.

● RotatedFactors gets the rotated factors. Each column of the matrix is a factor.

● VarianceProportions gets a vector of the proportion of variance explained by each factor.

● CumulativeVarianceProportions gets the cumulative variance proportions.

● ExtractedCommunalities gets the proportion of each variable's variance that can be explained by the extracted factors jointly.

● InitialCommunalities gets the proportion of each variable's variance that can be explained by the factors jointly.

● SumOfSquaredLoadings gets the sum of squared loadings for each extracted factor.

● RotatedSumOfSquaredLoadings gets the sum of squared loadings for each rotated extracted factor.

For instance:

Code Example – C# factor analysis

    // Extracted communality of each variable.
    DoubleVector extractedCommunalities = fa.ExtractedCommunalities;
    for ( int i = 0; i < data.Cols; i++ )
    {
      Console.WriteLine( "{0}\t{1}", data[i].Name, extractedCommunalities[i] );
    }
    Console.WriteLine();

    // Eigenvalue, % of variance, and cumulative % for each factor.
    for ( int i = 0; i < fa.VarianceProportions.Length; i++ )
    {
      double varProportion = fa.VarianceProportions[i] * 100.0;
      double cumulativeVarProportion = fa.CumulativeVarianceProportions[i] * 100.0;
      double eigenValue = fa.FactorExtraction.Eigenvalues[i];
      Console.WriteLine( "{0}\t\t{1}\t{2}\t\t{3}", i, eigenValue, varProportion, cumulativeVarProportion );
    }
    Console.WriteLine();

    // Rotated sums of squared loadings as a proportion of the total variance.
    double eigenValueSum = NMathFunctions.Sum( fa.FactorExtraction.Eigenvalues );
    DoubleVector rotatedSSLoadingsVarianceProportions = fa.RotatedSumOfSquaredLoadings / eigenValueSum;
    Console.WriteLine( "\nRotated Extraction Sums of Squared Loadings - " );
    Console.WriteLine( "factor\tTotal\t% of Variance\tCumulative %" );
    Console.WriteLine( "----------------------------------------------------" );
    double cumulative = 0;
    for ( int i = 0; i < fa.NumberOfFactors; i++ )
    {
      double varProportion = rotatedSSLoadingsVarianceProportions[i] * 100.0;
      cumulative += rotatedSSLoadingsVarianceProportions[i];
      double cumulativeVarProportion = cumulative * 100.0;
      double sumSquaredLoading = fa.RotatedSumOfSquaredLoadings[i];
      Console.WriteLine( "{0}\t\t{1}\t{2}\t\t{3}", i, sumSquaredLoading, varProportion, cumulativeVarProportion );
    }
    Console.WriteLine();

    // Rotated loadings of each variable on the first three factors
    // (this loop assumes at least three factors were extracted).
    DoubleMatrix rotatedComponentMatrix = fa.RotatedFactors;
    for ( int i = 0; i < data.Cols; i++ )
    {
      double comp0 = rotatedComponentMatrix.Row( i )[0];
      double comp1 = rotatedComponentMatrix.Row( i )[1];
      double comp2 = rotatedComponentMatrix.Row( i )[2];
      Console.WriteLine( "{0}\t{1}\t{2}\t{3}", data[i].Name, comp0, comp1, comp2 );
    }
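The quantities reported by these properties are related through the loading matrix. The relations below are the standard textbook definitions, given as a sketch rather than a statement of NMath's internals. The symbols are introduced here for illustration: $\lambda_{ij}$ is the loading of variable $i$ on factor $j$, $p$ is the number of variables, $m$ is the number of extracted factors, and $e_l$ is the $l$-th eigenvalue.

```latex
h_i^{2} = \sum_{j=1}^{m} \lambda_{ij}^{2}
  \quad \text{(extracted communality of variable } i\text{)}

\mathrm{SSL}_j = \sum_{i=1}^{p} \lambda_{ij}^{2}
  \quad \text{(sum of squared loadings of factor } j\text{)}

\text{proportion}_j = \frac{\mathrm{SSL}_j}{\sum_{l=1}^{p} e_l}
  \quad \text{(share of total variance attributed to factor } j\text{)}
```

Communalities sum across a row of the loading matrix, while sums of squared loadings sum down a column. An orthogonal rotation such as varimax redistributes variance among the factors, so the individual $\mathrm{SSL}_j$ values change, but each $h_i^{2}$ and the total $\sum_j \mathrm{SSL}_j$ stay the same.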

**Factor Scores**

The case data values for new factor variables are contained
in the *factor scores*
matrix. The score for a given factor is a linear combination of all of
the measures, weighted by the corresponding factor loading.

There are different algorithms for producing factor
scores. The FactorScores() method can be
passed an object implementing the **IFactorScores**
interface, specifying the algorithm to be used. If no argument is passed,
the regression algorithm for computing factor scores is used, implemented
in class **RegressionFactorScores**.

For example, this code prints the factor scores for the first three cases. Data is normalized.

Code Example – C# factor analysis

    var rowSlice = new Slice( 0, 3 );
    Console.WriteLine( fa.FactorScores()[rowSlice, Slice.All].ToTabDelimited() );

Factor scores are a linear combination of the original
variable values. The coefficients used for the linear combination are
found in the *factor score coefficients
matrix*. This matrix may be obtained from the FactorScoreCoefficients() method on the factor
analysis class. Like factor scores, the algorithm to use may be specified
by passing an object implementing the **IFactorScores**
interface to this method. By default, the regression algorithm is used.
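For reference, the regression method for factor scores is usually defined as shown below. This is the standard textbook form, offered as a sketch; the source does not spell out the formula used by **RegressionFactorScores**. The symbols are introduced here for illustration: $R$ is the $p \times p$ correlation matrix of the variables, $\Lambda$ is the $p \times m$ loading matrix, $Z$ is the $n \times p$ matrix of standardized case data, $B$ is the factor score coefficients matrix, and $F$ is the factor scores matrix.

```latex
B = R^{-1} \Lambda, \qquad F = Z B
```

Each column of $B$ holds the weights applied to the variables to produce one factor's score, so $F$ has one row per case and one column per factor.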

The factor score coefficients can be used to compute scores for novel case data. For instance:

Code Example – C# factor analysis

    DoubleMatrix scoreCoefficients = fa.FactorScoreCoefficients();
    var newCaseData = new DoubleMatrix(
      "2x10 [0.0 38.9 3.8 196.0 115.4 71.9 177.0 3.972 17.5 27.8 " +
      "1.0 46.0 2.5 220.0 101.6 73.4 168.6 3.75 19.0 20.0]" );
    Console.WriteLine( NMathFunctions.Product( newCaseData, scoreCoefficients ) );
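The product in the example above is an ordinary matrix multiplication: each score is the dot product of one case's variable values with one column of coefficients. The following standalone sketch shows that computation on plain arrays, without NMath. The class `FactorScoreSketch`, the method `Scores`, and the small matrices are hypothetical illustrations, not values from a real analysis.

```csharp
using System;

public static class FactorScoreSketch
{
    // Hypothetical helper (not an NMath API): multiplies an n x p matrix of
    // case data by a p x k matrix of factor score coefficients.
    public static double[,] Scores(double[,] cases, double[,] coefficients)
    {
        int n = cases.GetLength(0);
        int p = cases.GetLength(1);
        int k = coefficients.GetLength(1);
        if (coefficients.GetLength(0) != p)
        {
            throw new ArgumentException("coefficients must have one row per variable");
        }

        var scores = new double[n, k];
        for (int i = 0; i < n; i++)          // each case
        {
            for (int j = 0; j < k; j++)      // each factor
            {
                for (int v = 0; v < p; v++)  // each variable
                {
                    scores[i, j] += cases[i, v] * coefficients[v, j];
                }
            }
        }
        return scores;
    }

    public static void Main()
    {
        // Illustrative values only: 2 cases, 3 variables, 2 factors.
        double[,] cases = { { 1.0, 2.0, 3.0 }, { 0.0, -1.0, 1.0 } };
        double[,] coefficients = { { 0.5, 0.0 }, { 0.25, 1.0 }, { 0.0, -1.0 } };

        double[,] s = Scores(cases, coefficients);
        for (int i = 0; i < s.GetLength(0); i++)
        {
            Console.WriteLine("{0}\t{1}", s[i, 0], s[i, 1]);
        }
    }
}
```

The result has one row per case and one column per factor, which is why the new case data must have the same number of columns, in the same order, as the data used to fit the analysis.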