Imports System
Imports System.IO
Imports CenterSpace.NMath.Core
Namespace CenterSpace.NMath.Examples.VisualBasic
' A .NET example in Visual Basic
Module FactorAnalysisAdvancedExample
Sub Main()
' NMath Stats provides classes for performing a factor analysis on a set of case data.
' Case data should be provided to these classes in matrix form, with the variable values
' in columns and each row representing a case. In this example we look at a
' hypothetical sample of 300 responses on 6 items from a survey of college students'
' favorite subject matter. The items range in value from 1 to 5, which represents a scale
' from Strongly Dislike to Strongly Like. Our 6 items asked students to rate their liking
' of different college subject matter areas, including biology (BIO), geology (GEO),
' chemistry (CHEM), algebra (ALG), calculus (CALC), and statistics (STAT).
' First load the data, which is in comma-delimited form.
Dim FavoriteSubject As DataFrame = DataFrame.Load("advanced_factor_analysis.csv", True, False, ", ", True).CleanRows()
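' Optional sanity check (our own sketch, not part of the NMath example): confirm the
' shape of the loaded data and that every response lies on the 1 to 5 scale described
' above. RawResponses and OutOfRange are illustrative names; only ToDoubleMatrix, Rows,
' Cols and the (row, column) indexer of DoubleMatrix are used. CleanRows() may drop
' incomplete cases, so the printed case count can be below 300.
Dim RawResponses As DoubleMatrix = FavoriteSubject.ToDoubleMatrix()
Dim OutOfRange As Integer = 0
For Row As Integer = 0 To RawResponses.Rows - 1
For Col As Integer = 0 To RawResponses.Cols - 1
Dim Response As Double = RawResponses(Row, Col)
If Response < 1 OrElse Response > 5 Then
OutOfRange += 1
End If
Next
Next
Console.WriteLine("Cases: " & RawResponses.Rows & ", items: " & RawResponses.Cols & ", out-of-range responses: " & OutOfRange)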
' NMath Stats provides three classes for performing factor analysis. All of them
' analyze either the correlation matrix or the covariance matrix of the case data.
' In addition, each of these classes has two type parameters: one specifying the
' algorithm used to extract the factors, and the other specifying a factor rotation
' method. Here we use the class FactorAnalysisCovariance, which analyzes the
' covariance matrix of the case data, with principal components extraction and
' varimax rotation.
' The other two factor analysis classes are FactorAnalysisCorrelation, for analyzing
' the correlation matrix, and DoubleFactorAnalysis, which can be used if you don't
' have access to the original case data, just the correlation or covariance matrix
' (DoubleFactorAnalysis is a base class for FactorAnalysisCorrelation and
' FactorAnalysisCovariance).
' Construct the factor analysis object we use for our analysis. Here we first
' construct instances of the factor extraction and rotation classes and use them
' in the factor analysis object construction. This gives us control of the
' parameters affecting these algorithms.
' Construct a principal components factor extraction object, specifying the
' function object for determining the number of factors to extract. The type of
' this argument is Func(Of DoubleVector, DoubleMatrix, Integer): it takes as
' arguments the vector of eigenvalues and the matrix of eigenvectors and returns
' the number of factors to extract. The class NumberOfFactors contains static
' (Shared) methods for creating functors for several common strategies. Here we
' extract factors whose eigenvalues are greater than 1.2 times the mean of the
' eigenvalues.
Dim FactorExtraction As New PCFactorExtraction(NumberOfFactors.EigenvaluesGreaterThanMean(1.2))
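' Because the argument is just a Func(Of DoubleVector, DoubleMatrix, Integer), you can
' also write the rule yourself. The lambda below is our own sketch of the "greater than
' 1.2 times the mean eigenvalue" strategy, not the code behind NumberOfFactors, whose
' internals may differ. CustomFactorRule and CustomFactorExtraction are hypothetical
' names; the extraction object is built the same way as above but is not used later.
Dim CustomFactorRule As Func(Of DoubleVector, DoubleMatrix, Integer) = Function(EigenvalueVector As DoubleVector, EigenvectorMatrix As DoubleMatrix)
' Threshold is 1.2 times the mean eigenvalue; the eigenvector matrix is not needed here.
Dim MeanThreshold As Double = 1.2 * NMathFunctions.Sum(EigenvalueVector) / EigenvalueVector.Length
Dim AboveThreshold As Integer = 0
For K As Integer = 0 To EigenvalueVector.Length - 1
If EigenvalueVector(K) > MeanThreshold Then AboveThreshold += 1
Next
Return AboveThreshold
End Function
Dim CustomFactorExtraction As New PCFactorExtraction(CustomFactorRule)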
' Next construct an instance of the rotation algorithm we want to use, which is
' the varimax algorithm. Here we specify the convergence criterion by setting the
' tolerance to 1e-6: iteration stops when the relative change in the sum of the
' singular values is less than this number. We also specify that we do NOT want
' Kaiser normalization to be performed.
Dim FactorRotation As New VarimaxRotation
FactorRotation.Tolerance = 0.000001
FactorRotation.Normalize = False
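' To make the stopping rule concrete, the hypothetical helper IsRotationConverged (our
' own, not part of NMath) applies the test described above to two successive values of
' the sum of singular values, using the tolerance just set. NMath's internal check may
' differ in detail, for example in how it guards against a zero previous sum.
' Usage: IsRotationConverged(2.0, 2.0000001) returns True for this tolerance.
Dim IsRotationConverged As Func(Of Double, Double, Boolean) = Function(PreviousSum As Double, CurrentSum As Double) Math.Abs(CurrentSum - PreviousSum) < FactorRotation.Tolerance * Math.Abs(PreviousSum)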
' We now construct our factor analysis object. We provide the case data as a matrix
' (columns correspond to variables and rows to cases), the bias type (here variances
' are computed as biased, i.e. dividing by n rather than n - 1), and our extraction
' and rotation objects.
Dim FA As New FactorAnalysisCovariance(Of PCFactorExtraction, VarimaxRotation)(FavoriteSubject.ToDoubleMatrix(), _
BiasType.Biased, FactorExtraction, FactorRotation)
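' As an illustration of the bias type (our own sketch, not an NMath call), compute the
' covariance of the first two items by hand: the biased estimate divides the sum of
' cross products of deviations by the number of cases n, the unbiased one by n - 1.
' CovData, NumCases, MeanFirst, MeanSecond and CrossProducts are illustrative names.
Dim CovData As DoubleMatrix = FavoriteSubject.ToDoubleMatrix()
Dim NumCases As Integer = CovData.Rows
Dim MeanFirst As Double = 0, MeanSecond As Double = 0
For CaseRow As Integer = 0 To NumCases - 1
MeanFirst += CovData(CaseRow, 0)
MeanSecond += CovData(CaseRow, 1)
Next
MeanFirst /= NumCases
MeanSecond /= NumCases
' Sum of products of deviations from the column means.
Dim CrossProducts As Double = 0
For CaseRow As Integer = 0 To NumCases - 1
CrossProducts += (CovData(CaseRow, 0) - MeanFirst) * (CovData(CaseRow, 1) - MeanSecond)
Next
Console.WriteLine("Biased covariance of " & FavoriteSubject(0).Name & " and " & FavoriteSubject(1).Name & ": " & (CrossProducts / NumCases).ToString("G4"))
Console.WriteLine("Unbiased covariance of " & FavoriteSubject(0).Name & " and " & FavoriteSubject(1).Name & ": " & (CrossProducts / (NumCases - 1)).ToString("G4"))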
Console.WriteLine()
Console.WriteLine("Number of factors extracted: " & FA.NumberOfFactors)
' Looks like we will retain two factors.
' Extracted communalities are estimates of the amount of variance in each variable
' accounted for by the factors.
Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
Console.WriteLine()
Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
Console.WriteLine("-------------------------------------")
For I As Integer = 0 To FavoriteSubject.Cols - 1
Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
Next
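' A communality is the sum of a variable's squared loadings over the retained factors.
' Varimax is an orthogonal rotation, so it leaves these row sums unchanged, which lets
' us recompute the communalities from FA.RotatedFactors as a check. This sketch assumes
' RotatedFactors holds loadings on the same (unrescaled) scale as ExtractedCommunalities;
' if the two tables disagree, compare against the rescaled communalities instead.
Console.WriteLine()
Dim LoadingsForCheck As DoubleMatrix = FA.RotatedFactors
Console.WriteLine("Predictor" & ControlChars.Tab & "Sum of Squared Loadings")
For VarIndex As Integer = 0 To LoadingsForCheck.Rows - 1
Dim RowSumOfSquares As Double = 0
For FactorIndex As Integer = 0 To LoadingsForCheck.Cols - 1
RowSumOfSquares += LoadingsForCheck(VarIndex, FactorIndex) ^ 2
Next
Console.WriteLine(FavoriteSubject(VarIndex).Name & ControlChars.Tab & ControlChars.Tab & RowSumOfSquares.ToString("G3"))
Next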
Console.WriteLine()
' We can get a clearer picture of the communalities by looking at their rescaled
' values. The FactorAnalysisCovariance class provides many rescaled results for
' calculations involving the extracted factors. In the rescaled versions, the
' factors are first rescaled by dividing by the standard deviations of the case
' variables before being used in the calculation.
' The rescaled communalities have values between 0 and 1. Most of the values are
' close to 1, except for STAT. Maybe we should extract another factor?
Dim RescaledCommunalities As DoubleVector = FA.RescaledExtractedCommunalities
Console.WriteLine("Predictor" & ControlChars.Tab & "Rescaled Communality")
Console.WriteLine("-------------------------------------")
For I As Integer = 0 To FavoriteSubject.Cols - 1
Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
Console.WriteLine(RescaledCommunalities(I).ToString("G3"))
Next
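' Dividing a loading by the standard deviation of its variable divides its square by
' the variable's variance. So, if the rescaling works as described above, each rescaled
' communality should equal the raw communality divided by that variable's biased
' variance. This sketch (our own; VarianceData and the Item* names are illustrative)
' recomputes those ratios by hand for comparison with the table above.
Console.WriteLine()
Dim VarianceData As DoubleMatrix = FavoriteSubject.ToDoubleMatrix()
Console.WriteLine("Predictor" & ControlChars.Tab & "Communality / Variance")
For ItemCol As Integer = 0 To VarianceData.Cols - 1
Dim ItemMean As Double = 0
For ItemRow As Integer = 0 To VarianceData.Rows - 1
ItemMean += VarianceData(ItemRow, ItemCol)
Next
ItemMean /= VarianceData.Rows
Dim ItemSumSq As Double = 0
For ItemRow As Integer = 0 To VarianceData.Rows - 1
ItemSumSq += (VarianceData(ItemRow, ItemCol) - ItemMean) ^ 2
Next
' Biased variance (divide by n), matching BiasType.Biased above.
Dim ItemVariance As Double = ItemSumSq / VarianceData.Rows
Console.WriteLine(FavoriteSubject(ItemCol).Name & ControlChars.Tab & ControlChars.Tab & (FA.ExtractedCommunalities(ItemCol) / ItemVariance).ToString("G3"))
Next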
Console.WriteLine()
' Next we look at the variance explained by the initial solution by printing out
' a table of these values.
' The first column is just the (zero-based) extracted factor number.
' The second column, Total, gives the eigenvalue, or the amount of variance in the
' original variables accounted for by each factor. Note that only the first two
' factors are kept, because their eigenvalues are greater than 1.2 times the mean
' of the eigenvalues.
' The Variance column gives the ratio, expressed as a percentage, of the variance
' accounted for by each factor to the total variance in all of the variables.
' The Cumulative column gives the percentage of variance accounted for by the
' first n factors. For example, the cumulative percentage for the second factor
' is the sum of the percentages of variance for the first and second factors.
Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab _
& "Cumulative")
Console.WriteLine("----------------------------------------------------")
For I As Integer = 0 To FA.VarianceProportions.Length - 1
Console.Write(I)
Console.Write(ControlChars.Tab & FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
Next
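' The Variance and Cumulative columns follow directly from the eigenvalues: each
' eigenvalue divided by their sum, and a running total of those ratios. This sketch
' (our own; AllEigenvalues, EigenvalueTotal and RunningProportion are illustrative
' names) recomputes both by hand for comparison with FA.VarianceProportions and
' FA.CumulativeVarianceProportions. The last cumulative value should be 100%.
Console.WriteLine()
Dim AllEigenvalues As DoubleVector = FA.FactorExtraction.Eigenvalues
Dim EigenvalueTotal As Double = NMathFunctions.Sum(AllEigenvalues)
Dim RunningProportion As Double = 0
For EigenIndex As Integer = 0 To AllEigenvalues.Length - 1
Dim Proportion As Double = AllEigenvalues(EigenIndex) / EigenvalueTotal
RunningProportion += Proportion
Console.WriteLine(EigenIndex & ControlChars.Tab & Proportion.ToString("P4") & ControlChars.Tab & RunningProportion.ToString("P4"))
Next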
' Looks like we retain over 75% of the variance with just two factors.
' Next we look at the percentages of variance explained by the extracted rotated
' factors. Comparing this table with the first two rows of the previous one (two
' factors are extracted), we see that the cumulative percentage of variance
' explained by the extracted factors is maintained by the rotated factors, but
' that the variance is now spread more evenly over the factors, though not by
' much. Maybe we could skip rotation, or try a different rotation type.
Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum
Console.WriteLine()
Console.WriteLine("Rotated Extraction Sums of Squared Loadings...")
Console.WriteLine()
Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
Console.WriteLine("----------------------------------------------------")
Dim Cumulative As Double = 0
For I As Integer = 0 To FA.NumberOfFactors - 1
Cumulative = Cumulative + RotatedSSLoadingsVarianceProportions(I)
Console.Write(I)
Console.Write(ControlChars.Tab & FA.RotatedSumOfSquaredLoadings(I).ToString("G4"))
Console.Write(ControlChars.Tab & RotatedSSLoadingsVarianceProportions(I).ToString("P4"))
Console.WriteLine(ControlChars.Tab & Cumulative.ToString("P4"))
Next
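' Each rotated sum of squared loadings is the column sum of the squared entries of the
' rotated factor matrix. Because varimax is orthogonal, the total over the retained
' factors should equal the total before rotation, which for principal components
' extraction is the sum of the retained (leading) eigenvalues. This sketch checks both,
' assuming RotatedFactors holds unrescaled loadings; the names are illustrative.
Console.WriteLine()
Dim RotatedForCheck As DoubleMatrix = FA.RotatedFactors
Dim TotalRotatedSS As Double = 0
For FactorCol As Integer = 0 To RotatedForCheck.Cols - 1
Dim ColumnSS As Double = 0
For LoadingRow As Integer = 0 To RotatedForCheck.Rows - 1
ColumnSS += RotatedForCheck(LoadingRow, FactorCol) ^ 2
Next
TotalRotatedSS += ColumnSS
Console.WriteLine("Factor " & FactorCol & ": column sum of squares = " & ColumnSS.ToString("G4"))
Next
Dim RetainedEigenvalueSum As Double = 0
For Retained As Integer = 0 To FA.NumberOfFactors - 1
RetainedEigenvalueSum += FA.FactorExtraction.Eigenvalues(Retained)
Next
Console.WriteLine("Total after rotation: " & TotalRotatedSS.ToString("G4") & ", retained eigenvalues: " & RetainedEigenvalueSum.ToString("G4"))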
Console.WriteLine()
' The rotated factor matrix helps you determine what the factors represent.
Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
Console.WriteLine("Rotated Factor Matrix")
Console.WriteLine()
Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
Console.WriteLine("" & ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2")
Console.WriteLine("-------------------------------------")
For I As Integer = 0 To FavoriteSubject.Cols - 1
Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
If FavoriteSubject(I).Name.Length >= 8 Then
Console.Write(ControlChars.Tab)
End If
Console.Write(RotatedComponentMatrix(I, 0).ToString("G4") & ControlChars.Tab & ControlChars.Tab)
Console.WriteLine(RotatedComponentMatrix(I, 1).ToString("G4"))
Next
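' A common way to read this matrix is to assign each variable to the factor on which
' it has the largest absolute loading. This sketch (an interpretation aid of our own,
' not an NMath feature; PredictorIndex, BestFactor and CandidateFactor are illustrative
' names) prints that assignment, using the same 1-based factor labels as the table.
Console.WriteLine()
For PredictorIndex As Integer = 0 To RotatedComponentMatrix.Rows - 1
Dim BestFactor As Integer = 0
For CandidateFactor As Integer = 1 To RotatedComponentMatrix.Cols - 1
If Math.Abs(RotatedComponentMatrix(PredictorIndex, CandidateFactor)) > Math.Abs(RotatedComponentMatrix(PredictorIndex, BestFactor)) Then
BestFactor = CandidateFactor
End If
Next
Console.WriteLine(FavoriteSubject(PredictorIndex).Name & " -> factor " & (BestFactor + 1))
Next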
' The first factor is most highly correlated with BIO, GEO, and CHEM. CHEM is a
' better representative, however, because it is less correlated with the other
' factor. The second factor is most highly correlated with ALG, CALC, and STAT.
Console.WriteLine()
Console.WriteLine("Press Enter Key")
Console.Read()
End Sub
End Module
End Namespace