Imports System
Imports System.IO

Imports CenterSpace.NMath.Core
Imports CenterSpace.NMath.Stats

Namespace CenterSpace.NMath.Stats.Examples.VisualBasic

  ' A .NET example in Visual Basic
  Module FactorAnalysisAdvancedExample

    Sub Main()

      ' NMath Stats provides classes for performing a factor analysis on a set of case data.
      ' Case data should be provided to these classes in matrix form: the variable values
      ' in columns, with each row representing a case. In this example we look at
      ' a hypothetical sample of 300 responses on 6 items from a survey of college students'
      ' favorite subject matter. The items range in value from 1 to 5, representing a scale
      ' from Strongly Dislike to Strongly Like. The 6 items asked students to rate their liking
      ' of different college subject matter areas: biology (BIO), geology (GEO),
      ' chemistry (CHEM), algebra (ALG), calculus (CALC), and statistics (STAT).

      ' First load the data, which is in a comma-delimited form.
      Dim FavoriteSubject As DataFrame = DataFrame.Load("advanced_factor_analysis.csv", True, False, ", ", True).CleanRows()

      ' NMath Stats provides three classes for performing factor analysis.
      ' All will perform analysis on the correlation matrix
      ' or the covariance matrix of case data. In addition, each of these classes has
      ' two type parameters, one specifying the algorithm used to extract the factors,
      ' and the other specifying a factor rotation method. Here we use the class
      ' FactorAnalysisCovariance, which analyzes the covariance matrix of the case data,
      ' with principal components extraction and varimax rotation.
      ' The other two factor analysis classes are FactorAnalysisCorrelation, for analyzing
      ' the correlation matrix, and DoubleFactorAnalysis, which can be used if you don't
      ' have access to the original case data, just the correlation or covariance matrix
      ' (DoubleFactorAnalysis is a base class for FactorAnalysisCorrelation and
      ' FactorAnalysisCovariance).

      ' Construct the factor analysis object we use for our analysis.
      ' Here we first construct instances of the factor extraction and rotation classes
      ' and use them in the factor analysis object construction. This gives
      ' us control of the parameters affecting these algorithms.

      ' Construct a principal components factor extraction object, specifying the
      ' function object for determining the number of factors to extract. The
      ' type of this argument is Func(Of DoubleVector, DoubleMatrix, Integer). It
      ' takes as arguments the vector of eigenvalues and the matrix of eigenvectors
      ' and returns the number of factors to extract. The class NumberOfFactors
      ' contains static methods for creating functors for several common
      ' strategies. Here we extract factors whose eigenvalues are greater
      ' than 1.2 times the mean of the eigenvalues.
      Dim FactorExtraction As New PCFactorExtraction(NumberOfFactors.EigenvaluesGreaterThanMean(1.2))

      ' Next construct an instance of the rotation algorithm we want to use,
      ' which is the varimax algorithm. Here we specify the convergence criterion
      ' by setting the tolerance to 1e-6. Iteration will stop when the relative
      ' change in the sum of the singular values is less than this number.
      ' We also specify that we do NOT want Kaiser normalization to be performed.
      Dim FactorRotation As New VarimaxRotation
      FactorRotation.Tolerance = 0.000001
      FactorRotation.Normalize = False

      ' We now construct our factor analysis object. We provide the case data as a matrix (columns
      ' correspond to variables and rows correspond to cases), the bias type (variances will be
      ' computed as biased), and our extraction and rotation objects.
      Dim FA As New FactorAnalysisCovariance(Of PCFactorExtraction, VarimaxRotation)(FavoriteSubject.ToDoubleMatrix(), _
        BiasType.Biased, FactorExtraction, FactorRotation)

      Console.WriteLine()
      Console.WriteLine("Number of factors extracted: " & FA.NumberOfFactors)
      ' Looks like we will retain two factors.

      ' Extracted communalities are estimates of the proportion of variance in each variable
      ' accounted for by the factors.
      Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
      Console.WriteLine("-------------------------------------")
      For I As Integer = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
      Next
      Console.WriteLine()

      ' We can get a somewhat better picture of the communalities by looking at their
      ' rescaled values. The FactorAnalysisCovariance class provides many 'rescaled'
      ' results for calculations involving the extracted factors. In the rescaled
      ' version the factors are first rescaled by dividing by the standard deviations
      ' of the case variables before being used in the calculation.
      '
      ' The rescaled communalities have values between 0 and 1. Most of the values
      ' are close to 1, except for STAT. Maybe we should extract another factor?
      Dim RescaledCommunalities As DoubleVector = FA.RescaledExtractedCommunalities
      Console.WriteLine("Predictor" & ControlChars.Tab & "Rescaled Communality")
      Console.WriteLine("-------------------------------------")
      For I As Integer = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(RescaledCommunalities(I).ToString("G3"))
      Next
      Console.WriteLine()

      ' Next we look at the variance explained by the initial solution
      ' by printing out a table of these values.
      ' The first column is just the extracted factor number.
      '
      ' The second 'Total' column gives the eigenvalue, or amount of
      ' variance in the original variables accounted for by each factor.
      ' Note that only the first two factors will be kept because their
      ' eigenvalues are greater than 1.2 times the mean of the eigenvalues.
      '
      ' The % of Variance column gives the ratio, expressed as a percentage,
      ' of the variance accounted for by each factor to the total
      ' variance in all of the variables.
      '
      ' The Cumulative % column gives the percentage of variance accounted
      ' for by the first n factors. For example, the cumulative percentage
      ' for the second factor is the sum of the percentage of variance
      ' for the first and second factors.
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab _
        & "Cumulative")
      Console.WriteLine("----------------------------------------------------")
      For I As Integer = 0 To FA.VarianceProportions.Length - 1
        Console.Write(I)
        Console.Write(ControlChars.Tab & FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
        Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
        Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
      Next
      ' Looks like we retain over 75% of the variance with just two factors.

      ' Next we look at the percentages of variance explained by the
      ' extracted rotated factors. Comparing this table with the first
      ' two rows of the previous one (two factors are extracted),
      ' we see that the cumulative percentage of variation explained by the
      ' extracted factors is maintained by the rotated factors.
      ' The variation is now spread more evenly over the factors,
      ' but not by a lot. Maybe we could skip rotation, or try a
      ' different rotation type.
      Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
      Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum
      Console.WriteLine()
      Console.WriteLine("Rotated Extraction Sums of Squared Loadings...")
      Console.WriteLine()
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
      Console.WriteLine("----------------------------------------------------")
      Dim Cumulative As Double = 0
      For I As Integer = 0 To FA.NumberOfFactors - 1
        Cumulative = Cumulative + RotatedSSLoadingsVarianceProportions(I)
        Console.Write(I)
        Console.Write(ControlChars.Tab & FA.RotatedSumOfSquaredLoadings(I).ToString("G4"))
        Console.Write(ControlChars.Tab & RotatedSSLoadingsVarianceProportions(I).ToString("P4"))
        Console.WriteLine(ControlChars.Tab & Cumulative.ToString("P4"))
      Next
      Console.WriteLine()

      ' The rotated factor matrix helps you to determine what the factors represent.
      Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
      Console.WriteLine("Rotated Factor Matrix")
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
      Console.WriteLine("" & ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2")
      Console.WriteLine("-------------------------------------")
      For I As Integer = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        If FavoriteSubject(I).Name.Length >= 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.Write(RotatedComponentMatrix(I, 0).ToString("G4") & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(RotatedComponentMatrix(I, 1).ToString("G4"))
      Next

      ' The first factor is most highly correlated with BIO, GEO, and CHEM.
      ' CHEM is a better representative, however, because it is less correlated
      ' with the other factor.
      '
      ' The second factor is most highly correlated with ALG, CALC, and STAT.
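
      ' A minimal sketch, not part of the original example: the reading of the rotated
      ' factor matrix can be cross-checked by listing, for each variable, the factor on
      ' which it has the largest absolute rotated loading. This assumes that
      ' FA.NumberOfFactors equals the number of columns in the rotated factor matrix.
      ' Row, Col, and Dominant are illustrative names introduced here.
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Dominant Factor")
      Console.WriteLine("-------------------------------------")
      For Row As Integer = 0 To FavoriteSubject.Cols - 1
        ' Start with the first factor and keep whichever column has the larger magnitude.
        Dim Dominant As Integer = 0
        For Col As Integer = 1 To FA.NumberOfFactors - 1
          If Math.Abs(RotatedComponentMatrix(Row, Col)) > Math.Abs(RotatedComponentMatrix(Row, Dominant)) Then
            Dominant = Col
          End If
        Next
        ' Factors are reported 1-based to match the header of the rotated factor matrix table.
        Console.WriteLine(FavoriteSubject(Row).Name & ControlChars.Tab & ControlChars.Tab & (Dominant + 1).ToString())
      Next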

      Console.WriteLine()
      Console.WriteLine("Press Enter Key")
      Console.Read()

    End Sub

  End Module

End Namespace