VB FA Advanced Example


 

Imports System
Imports System.IO

Imports CenterSpace.NMath.Core


Namespace CenterSpace.NMath.Examples.VisualBasic

  ' A .NET example in Visual Basic
  Module FactorAnalysisAdvancedExample

    Sub Main()

      ' NMath Stats provides classes for performing a factor analysis on a set of case data.
      ' Case data should be provided to these classes in matrix form, with the variable values
      ' in columns and each row representing a case. In this example we look at
      ' a hypothetical sample of 300 responses on 6 items from a survey of college students'
      ' favorite subject matter. The items range in value from 1 to 5, which represents a scale
      ' from Strongly Dislike to Strongly Like. Our 6 items asked students to rate their liking
      ' of different college subject matter areas, including biology (BIO), geology (GEO),
      ' chemistry (CHEM), algebra (ALG), calculus (CALC), and statistics (STAT).

      ' First load the data, which is in comma-delimited form.
      Dim FavoriteSubject As DataFrame = DataFrame.Load("advanced_factor_analysis.csv", True, False, ", ", True).CleanRows()

      ' NMath Stats provides three classes for
      ' performing factor analysis. All will perform analysis on the correlation matrix
      ' or the covariance matrix of case data. In addition, each of these classes has
      ' two type parameters, one specifying the algorithm used to extract the factors,
      ' and the other specifying a factor rotation method. Here we use the class
      ' FactorAnalysisCovariance, which analyzes the covariance matrix of the case data,
      ' with principal components extraction and varimax rotation.
      ' The other two factor analysis classes are FactorAnalysisCorrelation, for analyzing
      ' the correlation matrix, and DoubleFactorAnalysis, which can be used if you don't
      ' have access to the original case data, just the correlation or covariance matrix
      ' (DoubleFactorAnalysis is a base class for FactorAnalysisCorrelation and
      ' FactorAnalysisCovariance).

      ' Construct the factor analysis object we use for our analysis. Here we
      ' first construct instances of the factor extraction and rotation classes
      ' and use them in the factor analysis object construction. This gives
      ' us control of the parameters affecting these algorithms.

      ' Construct a principal components factor extraction object, specifying the
      ' function object for determining the number of factors to extract. The
      ' type of this argument is Func(Of DoubleVector, DoubleMatrix, Integer). It
      ' takes as arguments the vector of eigenvalues and the matrix of eigenvectors
      ' and returns the number of factors to extract. The class NumberOfFactors
      ' contains static methods for creating functors for several common
      ' strategies. Here we extract factors whose eigenvalues are greater
      ' than 1.2 times the mean of the eigenvalues.
      Dim FactorExtraction As New PCFactorExtraction(NumberOfFactors.EigenvaluesGreaterThanMean(1.2))

      ' Next construct an instance of the rotation algorithm we want to use,
      ' which is the varimax algorithm. Here we specify the convergence criterion
      ' by setting the tolerance to 1e-6. Iteration will stop when the relative
      ' change in the sum of the singular values is less than this number.
      ' We also specify that we do NOT want Kaiser normalization to be performed.
      Dim FactorRotation As New VarimaxRotation
      FactorRotation.Tolerance = 0.000001
      FactorRotation.Normalize = False

      ' We now construct our factor analysis object. We provide the case data as a matrix (columns
      ' correspond to variables and rows correspond to cases), the bias type (variances will be
      ' computed as biased), and our extraction and rotation objects.
      Dim FA As New FactorAnalysisCovariance(Of PCFactorExtraction, VarimaxRotation)(FavoriteSubject.ToDoubleMatrix(), _
                                                                                     BiasType.Biased, FactorExtraction, FactorRotation)

      Console.WriteLine()
      Console.WriteLine("Number of factors extracted:  " & FA.NumberOfFactors)
      ' Looks like we will retain two factors.
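
      ' As a rough cross-check of that count, the "1.2 times the mean" rule can be
      ' recomputed by hand. This is a sketch, not part of the library's workflow: it assumes
      ' FA.FactorExtraction.Eigenvalues is a DoubleVector holding the same eigenvalues the
      ' extraction rule was given. AllEigenvalues, Threshold, and CountAbove are
      ' illustrative names only.
      Dim AllEigenvalues As DoubleVector = FA.FactorExtraction.Eigenvalues
      Dim Threshold As Double = 1.2 * NMathFunctions.Sum(AllEigenvalues) / AllEigenvalues.Length
      Dim CountAbove As Integer = 0
      For K As Integer = 0 To AllEigenvalues.Length - 1
        If AllEigenvalues(K) > Threshold Then
          CountAbove = CountAbove + 1
        End If
      Next
      Console.WriteLine("Eigenvalue threshold (1.2 x mean): " & Threshold.ToString("G4"))
      Console.WriteLine("Eigenvalues above threshold:  " & CountAbove)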

      ' Extracted communalities are estimates of the proportion of variance in each variable
      ' accounted for by the factors.
      Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
      Console.WriteLine("-------------------------------------")

      For I As Integer = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
      Next

      Console.WriteLine()

      ' We can get a slightly better picture of the communalities by looking at their
      ' rescaled values. The FactorAnalysisCovariance class provides many rescaled
      ' results for calculations involving the extracted factors. In the rescaled
      ' version the factors are first rescaled by dividing by the standard deviations
      ' of the case variables before being used in the calculation.
      '
      ' The rescaled communalities have values between 0 and 1. Most of the values
      ' are close to 1, except for STAT. Maybe we should extract another factor?
      Dim RescaledCommunalities As DoubleVector = FA.RescaledExtractedCommunalities
      Console.WriteLine("Predictor" & ControlChars.Tab & "Rescaled Communality")
      Console.WriteLine("-------------------------------------")

      For I = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(RescaledCommunalities(I).ToString("G3"))
      Next
      Console.WriteLine()

      ' Next we look at the variance explained by the initial solution
      ' by printing out a table of these values.
      ' The first column is just the extracted factor number.
      '
      ' The second column, Total, gives the eigenvalue, or amount of
      ' variance in the original variables accounted for by each factor.
      ' Note that only the first two factors will be kept because their
      ' eigenvalues are greater than 1.2 times the mean of the eigenvalues.
      '
      ' The % of Variance column gives the ratio, expressed as a percentage,
      ' of the variance accounted for by each factor to the total
      ' variance in all of the variables.
      '
      ' The Cumulative % column gives the percentage of variance accounted
      ' for by the first n factors. For example, the cumulative percentage
      ' for the second factor is the sum of the percentage of variance
      ' for the first and second factors.
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab _
                        & "Cumulative")
      Console.WriteLine("----------------------------------------------------")

      For I = 0 To FA.VarianceProportions.Length - 1
        Console.Write(I)
        Console.Write(ControlChars.Tab & FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
        Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
        Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
      Next

      ' Looks like we retain over 75% of the variance with just two factors.
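
      ' That figure can be checked directly by summing the variance proportions of the
      ' extracted factors. A minimal sketch, assuming FA.VarianceProportions holds
      ' fractions rather than percentages, as the "P4" formatting in the table suggests.
      ' RetainedVariance is an illustrative name only.
      Dim RetainedVariance As Double = 0
      For K As Integer = 0 To FA.NumberOfFactors - 1
        RetainedVariance = RetainedVariance + FA.VarianceProportions(K)
      Next
      Console.WriteLine()
      Console.WriteLine("Variance retained by extracted factors: " & RetainedVariance.ToString("P2"))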

      ' Next we look at the percentages of variance explained by the
      ' extracted rotated factors. Comparing this table with the first
      ' two rows of the previous one (two factors are extracted),
      ' we see that the cumulative percentage of variation explained by the
      ' extracted factors is maintained by the rotated factors.
      ' The variation is now spread more evenly over the factors,
      ' but not by much. Maybe we could skip rotation, or try a
      ' different rotation type.
      Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
      Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum
      Console.WriteLine()
      Console.WriteLine("Rotated Extraction Sums of Squared Loadings...")
      Console.WriteLine()
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
      Console.WriteLine("----------------------------------------------------")

      Dim Cumulative As Double = 0
      For I = 0 To FA.NumberOfFactors - 1
        Cumulative = Cumulative + RotatedSSLoadingsVarianceProportions(I)
        Console.Write(I)
        Console.Write(ControlChars.Tab & FA.RotatedSumOfSquaredLoadings(I).ToString("G4"))
        Console.Write(ControlChars.Tab & RotatedSSLoadingsVarianceProportions(I).ToString("P4"))
        Console.WriteLine(ControlChars.Tab & Cumulative.ToString("P4"))
      Next

      Console.WriteLine()

      ' The rotated factor matrix helps you determine what the factors represent.
      Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
      Console.WriteLine("Rotated Factor Matrix")
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
      Console.WriteLine("" & ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2")
      Console.WriteLine("-------------------------------------")

      For I = 0 To FavoriteSubject.Cols - 1
        Console.Write(FavoriteSubject(I).Name & ControlChars.Tab & ControlChars.Tab)
        If FavoriteSubject(I).Name.Length >= 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.Write(RotatedComponentMatrix(I, 0).ToString("G4") & ControlChars.Tab & ControlChars.Tab)
        Console.WriteLine(RotatedComponentMatrix(I, 1).ToString("G4"))
      Next

      ' The first factor is most highly correlated with BIO, GEO, and CHEM.
      ' CHEM is a better representative, however, because it is less correlated
      ' with the other factor.
      '
      ' The second factor is most highly correlated with ALG, CALC, and STAT.
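
      ' To make that reading concrete, each subject can be assigned to the factor on which
      ' it has the larger absolute loading. This is only a sketch: it applies when exactly
      ' two factors were extracted, as in the table printed here, and DominantFactor is an
      ' illustrative name.
      If FA.NumberOfFactors = 2 Then
        Console.WriteLine()
        For K As Integer = 0 To FavoriteSubject.Cols - 1
          Dim DominantFactor As Integer = 1
          If System.Math.Abs(RotatedComponentMatrix(K, 1)) > System.Math.Abs(RotatedComponentMatrix(K, 0)) Then
            DominantFactor = 2
          End If
          Console.WriteLine(FavoriteSubject(K).Name & " loads most heavily on factor " & DominantFactor)
        Next
      End If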

      Console.WriteLine()
      Console.WriteLine("Press Enter Key")
      Console.Read()
    End Sub

  End Module

End Namespace