Imports System
Imports System.IO

Imports CenterSpace.NMath.Core

Namespace CenterSpace.NMath.Examples.VisualBasic

  ' A .NET example in Visual Basic
  Module FactorAnalysisAdvancedExample

    Sub Main()

      ' NMath Stats provides classes for performing a factor analysis on a set
      ' of case data. Case data should be provided to these classes in matrix
      ' form: the variable values in columns, with each row representing a case.
      ' In this example we look at a hypothetical sample of 300 responses on 6
      ' items from a survey of college students' favorite subject matter. The
      ' items range in value from 1 to 5, which represent a scale from Strongly
      ' Dislike to Strongly Like. Our 6 items asked students to rate their
      ' liking of different college subject matter areas, including biology
      ' (BIO), geology (GEO), chemistry (CHEM), algebra (ALG), calculus (CALC),
      ' and statistics (STAT).

      ' First load the data, which is in a comma delimited form.
      Dim FavoriteSubject As DataFrame = DataFrame.Load("advanced_factor_analysis.csv", True, False, ", ", True).CleanRows()

      ' NMath Stats provides three classes for performing factor analysis. All
      ' will perform analysis on the correlation matrix or the covariance
      ' matrix of case data. In addition, each of these classes has two class
      ' parameters, one specifying the algorithm used to extract the factors,
      ' and the other specifying a factor rotation method.
      ' Here we use the class FactorAnalysisCovariance, which analyzes the
      ' covariance matrix of the case data, with principal factors extraction
      ' and varimax rotation.
      ' The other two factor analysis classes are FactorAnalysisCorrelation,
      ' for analyzing the correlation matrix, and DoubleFactorAnalysis, which
      ' can be used if you don't have access to the original case data, just
      ' the correlation or covariance matrix (DoubleFactorAnalysis is a base
      ' class for FactorAnalysisCorrelation and FactorAnalysisCovariance).

      ' Construct the factor analysis object we use for our analysis.
      ' Here we first construct instances of the factor extraction and rotation
      ' classes and use them in the factor analysis object construction. This
      ' gives us control of the parameters affecting these algorithms.

      ' Construct a principal components factor extraction object, specifying
      ' the function object for determining the number of factors to extract.
      ' The type of this argument is Func<DoubleVector, DoubleMatrix, int>; it
      ' takes as arguments the vector of eigenvalues and the matrix of
      ' eigenvectors and returns the number of factors to extract. The class
      ' NumberOfFactors contains static methods for creating functors for
      ' several common strategies. Here we extract factors whose eigenvalues
      ' are greater than 1.2 times the mean of the eigenvalues.
      Dim FactorExtraction As New PCFactorExtraction(NumberOfFactors.EigenvaluesGreaterThanMean(1.2))

      ' Next construct an instance of the rotation algorithm we want to use,
      ' which is the varimax algorithm. Here we specify the convergence
      ' criterion by setting the tolerance to 1e-6. Iteration will stop when
      ' the relative change in the sum of the singular values is less than
      ' this number. We also specify that we do NOT want Kaiser normalization
      ' to be performed.
      Dim FactorRotation As New VarimaxRotation
      FactorRotation.Tolerance = 0.000001
      FactorRotation.Normalize = False

      ' We now construct our factor analysis object. We provide the case data
      ' as a matrix (columns correspond to variables and rows correspond to
      ' cases), the bias type (variances will be computed as biased), and our
      ' extraction and rotation objects.
      Dim FA As New FactorAnalysisCovariance(Of PCFactorExtraction, VarimaxRotation)(FavoriteSubject.ToDoubleMatrix(), _
        BiasType.Biased, FactorExtraction, FactorRotation)

      Console.WriteLine()
      Console.WriteLine("Number of factors extracted: " & FA.NumberOfFactors)
      ' Looks like we will retain two factors.

      ' Extracted communalities are estimates of the proportion of variance in
      ' each variable accounted for by the factors.
      Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
      Console.WriteLine("-------------------------------------")
      For I As Integer = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
      Next
      Console.WriteLine()

      ' We can get a little better picture of the communalities by looking at
      ' their rescaled values. The FactorAnalysisCovariance class provides
      ' many rescaled results for calculations involving the extracted
      ' factors. In the rescaled version the factors are first rescaled by
      ' dividing by the standard deviations of the case variables before being
      ' used in the calculation.
      ' The rescaled communalities have values between 0 and 1. Most of the
      ' values are close to 1, except for STAT. Maybe we should extract
      ' another factor?
      Dim RescaledCommunalities As DoubleVector = FA.RescaledExtractedCommunalities
      Console.WriteLine("Predictor" & ControlChars.Tab & "Rescaled Communality")
      Console.WriteLine("-------------------------------------")
      For I = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(RescaledCommunalities(I).ToString("G3"))
      Next
      Console.WriteLine()

      ' Next we look at the variance explained by the initial solution by
      ' printing out a table of these values.
      ' The first column will just be the extracted factor number.
      ' The second column, Total, gives the eigenvalue, or amount of variance
      ' in the original variables accounted for by each factor. Note that only
      ' the first two factors will be kept because their eigenvalues are
      ' greater than 1.2 times the mean of the eigenvalues.
      ' The % of Variance column gives the ratio, expressed as a percentage,
      ' of the variance accounted for by each factor to the total variance in
      ' all of the variables.
      ' The Cumulative % column gives the percentage of variance accounted for
      ' by the first n factors. For example, the cumulative percentage for the
      ' second factor is the sum of the percentage of variance for the first
      ' and second factors.
      Console.WriteLine("factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab _
        & "Cumulative")
      Console.WriteLine("----------------------------------------------------")
      For I = 0 To FA.VarianceProportions.Length - 1
        Console.Write(I)
        Console.Write(ControlChars.Tab & FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
        Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
        Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
      Next
      ' Looks like we retain over 75% of the variance with just two factors.

      ' Next we look at the percentages of variance explained by the extracted
      ' rotated factors. Comparing this table with the first two rows of the
      ' previous one (two factors are extracted), we see that the cumulative
      ' percentage of variation explained by the extracted factors is
      ' maintained by the rotated factors. That variation is now spread more
      ' evenly over the factors, but not by a lot. Maybe we could skip
      ' rotation, or try a different rotation type.
      Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
      Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum
      Console.WriteLine()
      Console.WriteLine("Rotated Extraction Sums of Squared Loadings...")
      Console.WriteLine()
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
      Console.WriteLine("----------------------------------------------------")
      Dim Cumulative As Double = 0
      For I = 0 To FA.NumberOfFactors - 1
        Cumulative = Cumulative + RotatedSSLoadingsVarianceProportions(I)
        Console.Write(I)
        Console.Write(ControlChars.Tab & FA.RotatedSumOfSquaredLoadings(I).ToString("G4"))
        Console.Write(ControlChars.Tab & RotatedSSLoadingsVarianceProportions(I).ToString("P4"))
        Console.WriteLine(ControlChars.Tab & Cumulative.ToString("P4"))
      Next
      Console.WriteLine()

      ' The rotated factor matrix helps you to determine what the factors
      ' represent.
      Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
      Console.WriteLine("Rotated Factor Matrix")
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
      Console.WriteLine("" & ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2")
      Console.WriteLine("-------------------------------------")
      For I = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        If FavoriteSubject(I).Name.Length >= 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.Write(RotatedComponentMatrix(I, 0).ToString("G4") & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(RotatedComponentMatrix(I, 1).ToString("G4"))
      Next

      ' The first factor is most highly correlated with BIO, GEO, and CHEM.
      ' CHEM is a better representative, however, because it is less
      ' correlated with the other factor.
      ' The second factor is most highly correlated with ALG, CALC, and STAT.
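
      ' A minimal sketch, not part of the original example: one way to read
      ' the rotated factor matrix is to report, for each variable, the factor
      ' on which it has the largest absolute rotated loading. This is only a
      ' heuristic, and it assumes the loadings are comparable across factors.
      ' It uses only names that already appear above (RotatedComponentMatrix,
      ' FavoriteSubject, FA.NumberOfFactors); Row, Col, and Dominant are names
      ' introduced for this sketch.
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Dominant factor")
      Console.WriteLine("-------------------------------------")
      For Row As Integer = 0 To FavoriteSubject.Cols - 1
        ' Start with the first factor and keep whichever loading is largest
        ' in absolute value.
        Dim Dominant As Integer = 0
        For Col As Integer = 1 To FA.NumberOfFactors - 1
          If Math.Abs(RotatedComponentMatrix(Row, Col)) > Math.Abs(RotatedComponentMatrix(Row, Dominant)) Then
            Dominant = Col
          End If
        Next
        ' Factors are printed 1-based to match the table header above.
        Console.WriteLine(FavoriteSubject(Row).Name & ControlChars.Tab & ControlChars.Tab & (Dominant + 1).ToString())
      Next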
      Console.WriteLine()
      Console.WriteLine("Press Enter Key")
      Console.Read()

    End Sub

  End Module

End Namespace
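
' The retention rule used in the example above (keep the factors whose
' eigenvalue exceeds 1.2 times the mean eigenvalue) can be sketched without
' NMath. This is a minimal illustration under stated assumptions: the
' eigenvalues below are made up for the sketch and are not output from the
' survey data, the comparison is assumed to be strictly greater-than, and
' CountRetainedFactors and RunSketch are hypothetical helpers, not part of
' the NMath API.
Module EigenvalueThresholdSketch

  ' Counts the eigenvalues strictly greater than Factor times their mean.
  Function CountRetainedFactors(Eigenvalues As Double(), Factor As Double) As Integer
    Dim Total As Double = 0
    For Each Value As Double In Eigenvalues
      Total += Value
    Next
    Dim Threshold As Double = Factor * Total / Eigenvalues.Length

    Dim Count As Integer = 0
    For Each Value As Double In Eigenvalues
      If Value > Threshold Then
        Count += 1
      End If
    Next
    Return Count
  End Function

  ' Named RunSketch rather than Main so it does not clash with the example's
  ' entry point.
  Sub RunSketch()
    Dim Hypothetical As Double() = {2.5, 1.5, 0.8, 0.5, 0.4, 0.3}
    ' The mean is about 1.0, so the threshold is about 1.2 and only the
    ' eigenvalues 2.5 and 1.5 exceed it.
    System.Console.WriteLine("Factors retained: " & CountRetainedFactors(Hypothetical, 1.2).ToString())
  End Sub

End Module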