VB FA Advanced Example


Imports System
Imports System.IO

Imports CenterSpace.NMath.Core

Namespace CenterSpace.NMath.Core.Examples.VisualBasic

  ' A .NET example in Visual Basic
  Module FactorAnalysisAdvancedExample

    Sub Main()

      ' NMath Stats provides classes for performing a factor analysis on a set of case data.
      ' Case data should be provided to these classes in matrix form - the variable values
      ' in columns and each row representing a case. In this example we look at
      ' a hypothetical sample of 300 responses on 6 items from a survey of college students' 
      ' favorite subject matter. The items range in value from 1 to 5, which represent a scale
      ' from Strongly Dislike to Strongly Like. Our 6 items asked students to rate their liking
      ' of different college subject matter areas, including biology (BIO), geology (GEO), 
      ' chemistry (CHEM), algebra (ALG), calculus (CALC), and statistics (STAT). 

      ' First load the data, which is in comma-delimited form.
      Dim FavoriteSubject As DataFrame = DataFrame.Load("advanced_factor_analysis.csv", True, False, ", ", True).CleanRows()

      ' NMath Stats provides three classes for performing factor analysis. All
      ' perform the analysis on the correlation matrix or the covariance matrix of
      ' the case data. In addition, each of these classes has two type parameters,
      ' one specifying the algorithm used to extract the factors and the other
      ' specifying a factor rotation method. Here we use the class
      ' FactorAnalysisCovariance, which analyzes the covariance matrix of the case data,
      ' with principal components extraction and varimax rotation.
      ' The other two factor analysis classes are FactorAnalysisCorrelation, for analyzing
      ' the correlation matrix, and DoubleFactorAnalysis, which can be used if you don't
      ' have access to the original case data, just the correlation or covariance matrix
      ' (DoubleFactorAnalysis is a base class for FactorAnalysisCorrelation and
      ' FactorAnalysisCovariance).

      ' Construct the factor analysis object we use for our analysis. Here we
      ' first construct instances of the factor extraction and rotation classes
      ' and use them in the factor analysis object construction. This gives
      ' us control of the parameters affecting these algorithms.

      ' Construct a principal components factor extraction object, specifying the
      ' function object for determining the number of factors to extract. The
      ' type of this argument is Func(Of DoubleVector, DoubleMatrix, Integer). It
      ' takes as arguments the vector of eigenvalues and the matrix of eigenvectors
      ' and returns the number of factors to extract. The class NumberOfFactors
      ' contains static methods for creating functors for several common
      ' strategies. Here we extract factors whose eigenvalues are greater
      ' than 1.2 times the mean of the eigenvalues.
      Dim FactorExtraction As New PCFactorExtraction(NumberOfFactors.EigenvaluesGreaterThanMean(1.2))
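
      ' For illustration, the same rule can be sketched as a hand-written function
      ' object of the type described above. This is a sketch under assumptions, not
      ' the library's implementation: CountAboveScaledMean is a hypothetical name,
      ' and "greater than" is assumed to mean strictly greater. It is defined only
      ' to show the expected shape and is not used in the analysis below.
      Dim CountAboveScaledMean As Func(Of DoubleVector, DoubleMatrix, Integer) = _
        Function(Values As DoubleVector, Vectors As DoubleMatrix) As Integer
          ' The threshold is 1.2 times the mean of the eigenvalues.
          Dim Threshold As Double = 1.2 * NMathFunctions.Sum(Values) / Values.Length
          Dim Count As Integer = 0
          For K As Integer = 0 To Values.Length - 1
            If Values(K) > Threshold Then
              Count += 1
            End If
          Next
          ' The eigenvector matrix is not needed for this particular rule.
          Return Count
        End Function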

      ' Next construct an instance of the rotation algorithm we want to use,
      ' which is the varimax algorithm. Here we specify the convergence criterion
      ' by setting the tolerance to 1e-6. Iteration will stop when the relative
      ' change in the sum of the singular values is less than this number.
      ' We also specify that we do NOT want Kaiser normalization to be performed.
      Dim FactorRotation As New VarimaxRotation
      FactorRotation.Tolerance = 0.000001
      FactorRotation.Normalize = False

      ' We now construct our factor analysis object. We provide the case data as a matrix (columns
      ' correspond to variables and rows correspond to cases), the bias type - variances will be
      ' computed as biased, and our extraction and rotation objects.
      Dim FA As New FactorAnalysisCovariance(Of PCFactorExtraction, VarimaxRotation)(FavoriteSubject.ToDoubleMatrix(), _
                                                                                     BiasType.Biased, FactorExtraction, FactorRotation)

      Console.WriteLine()
      Console.WriteLine("Number of factors extracted:  " & FA.NumberOfFactors)
      ' Looks like we will retain two factors.

      ' Extracted communalities are estimates of the proportion of variance in each variable
      ' accounted for by the factors. 
      Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
      Console.WriteLine("-------------------------------------")

      For I As Integer = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
      Next

      Console.WriteLine()

      ' We can get a little better picture of the communalities by looking at their
      ' rescaled values. The FactorAnalysisCovariance class provides many 'rescaled'
      ' results for calculations involving the extracted factors. In the rescaled
      ' version the factors are first rescaled by dividing by the standard deviations
      ' of the case variables before being used in the calculation.
      '
      ' The rescaled communalities have values between 0 and 1. Most of the values
      ' are close to 1, except for STAT. Maybe we should extract another factor?
      Dim RescaledCommunalities As DoubleVector = FA.RescaledExtractedCommunalities
      Console.WriteLine("Predictor" & ControlChars.Tab & "Rescaled Communality")
      Console.WriteLine("-------------------------------------")

      For I As Integer = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(RescaledCommunalities(I).ToString("G3"))
      Next
      Console.WriteLine()

      ' Next we look at the variance explained by the initial solution 
      ' by printing out a table of these values. 
      ' The first column will just be the extracted factor number. 
      '
      ' The second 'Total' column gives the eigenvalue, or amount of
      ' variance in the original variables accounted for by each factor.
      ' Note that only the first two factors will be kept because their
      ' eigenvalues are greater than 1.2 times the mean of the eigenvalues.
      '
      ' The % of Variance column gives the ratio, expressed as a percentage, 
      ' of the variance accounted for by each factor to the total 
      ' variance in all of the variables.
      '
      ' The Cumulative % column gives the percentage of variance accounted 
      ' for by the first n factors. For example, the cumulative percentage
      ' for the second factor is the sum of the percentage of variance
      ' for the first and second factors.
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab _
                        & "Cumulative")
      Console.WriteLine("----------------------------------------------------")

      For I As Integer = 0 To FA.VarianceProportions.Length - 1
        Console.Write(I)
        Console.Write(ControlChars.Tab & FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
        Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
        Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
      Next

      ' Looks like we retain over 75% of the variance with just two factors.

      ' Next we look at the percentages of variance explained by the
      ' extracted rotated factors. Comparing this table with the first
      ' two rows of the previous one (two factors are extracted),
      ' we see that the cumulative percentage of variation explained by the
      ' extracted factors is maintained by the rotated factors. The
      ' variation is now spread slightly more evenly over the factors,
      ' but not by much. Maybe we could skip rotation, or try a
      ' different rotation type.
      Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
      Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum
      Console.WriteLine()
      Console.WriteLine("Rotated Extraction Sums of Squared Loadings...")
      Console.WriteLine()
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
      Console.WriteLine("----------------------------------------------------")

      Dim Cumulative As Double = 0
      For I As Integer = 0 To FA.NumberOfFactors - 1
        Cumulative = Cumulative + RotatedSSLoadingsVarianceProportions(I)
        Console.Write(I)
        Console.Write(ControlChars.Tab & FA.RotatedSumOfSquaredLoadings(I).ToString("G4"))
        Console.Write(ControlChars.Tab & RotatedSSLoadingsVarianceProportions(I).ToString("P4"))
        Console.WriteLine(ControlChars.Tab & Cumulative.ToString("P4"))
      Next

      Console.WriteLine()

      ' The rotated factor matrix helps you to determine what the factors represent.
      Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
      Console.WriteLine("Rotated Factor Matrix")
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
      Console.WriteLine("" & ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2")
      Console.WriteLine("-------------------------------------")

      For I As Integer = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        If FavoriteSubject(I).Name.Length >= 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.Write(RotatedComponentMatrix(I, 0).ToString("G4") & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(RotatedComponentMatrix(I, 1).ToString("G4"))
      Next
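
      ' A minimal sketch of reading the rotated factor matrix programmatically:
      ' for each variable, report the factor with the larger absolute loading.
      ' It assumes exactly two factors were extracted, as in the table above.
      ' DominantFactor is a name introduced here for illustration only.
      Console.WriteLine()
      For J As Integer = 0 To FavoriteSubject.Cols - 1
        Dim DominantFactor As Integer = 1
        If Math.Abs(RotatedComponentMatrix(J, 1)) > Math.Abs(RotatedComponentMatrix(J, 0)) Then
          DominantFactor = 2
        End If
        Console.WriteLine(FavoriteSubject(J).Name & " loads most heavily on factor " & DominantFactor)
      Next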

      ' The first factor is most highly correlated with BIO, GEO, and CHEM.
      ' CHEM is a better representative, however, because it is less correlated
      ' with the other factor.
      '
      ' The second factor is most highly correlated with ALG, CALC, and STAT.

      Console.WriteLine()
      Console.WriteLine("Press Enter Key")
      Console.Read()
    End Sub

  End Module

End Namespace