VB FA Correlation Example

Imports System
Imports System.IO

Imports CenterSpace.NMath.Core


Namespace CenterSpace.NMath.Examples.VisualBasic

  ' A .NET example in Visual Basic
  Module FACorrelationExample

    Sub Main()

      ' NMath Stats provides classes for performing a factor analysis on a set of case data.
      ' Case data should be provided to these classes in matrix form, with the variable values
      ' in columns and each row representing a case. In this example we look at some data
      ' for car sales. For each sale, ten variable values are recorded:
      '
      ' type - sedan, mini-van, etc., recorded as integers 0, 1,...
      ' price - the price in thousands,
      ' engine_s - engine size,
      ' horsepow - horsepower,
      ' wheelbas - wheelbase,
      ' width - car width,
      ' length - car length,
      ' curb_wgt - weight,
      ' fuel_cap - fuel capacity,
      ' mpg - fuel miles per gallon.
      '
      ' We would like to predict car sales from this set of predictors. However, many
      ' of the predictors are correlated, and we fear that this might adversely affect
      ' our results. So we use factor analysis to focus on a manageable set of
      ' predictors.

      ' First load the data, which is in a comma-delimited form.
      Dim CarSalesData As DataFrame = DataFrame.Load("car_sales.csv", True, False, ",", True).CleanRows()

      ' NMath Stats provides three classes for
      ' performing factor analysis. All will perform analysis on the correlation matrix
      ' or the covariance matrix of case data. In addition, each of these classes has
      ' two class parameters, one specifying the algorithm used to extract the factors,
      ' and the other specifying a factor rotation method. Here we use the class
      ' FactorAnalysisCorrelation, which analyzes the correlation matrix, with
      ' principal factors extraction and varimax rotation.
      ' The other two factor analysis classes are FactorAnalysisCovariance, for analyzing
      ' the covariance matrix, and DoubleFactorAnalysis, which can be used if you don't
      ' have access to the original case data, just the correlation or covariance matrix
      ' (DoubleFactorAnalysis is a base class for FactorAnalysisCorrelation and
      ' FactorAnalysisCovariance).

      ' Construct the factor analysis object we use for our analysis. The simplest
      ' constructor takes only the case data as an argument. Other constructors take
      ' instances of the class parameter classes Extraction and Rotation,
      ' allowing you to pre-configure the options on these objects. When not specified,
      ' instances of the Extraction and Rotation classes will be created with their
      ' no-argument constructors for use in the analysis.
      Dim FA As New FactorAnalysisCorrelation(Of PCFactorExtraction, VarimaxRotation)(CarSalesData.ToDoubleMatrix())

      Console.WriteLine()

      ' First, look at the extracted communalities.
      ' Extracted communalities are estimates of the proportion of variance in each variable
      ' accounted for by the factors.
      Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
      Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
      Console.WriteLine("-------------------------------------")

      For I As Integer = 0 To CarSalesData.Cols - 1
        Console.Write(CarSalesData(I).Name & ControlChars.Tab)
        If CarSalesData(I).Name.Length < 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
      Next

      ' The communalities are all high (close to 1.0), indicating that the extracted
      ' factors represent the variables well.
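
      ' One way to check the statement above programmatically is to find the
      ' predictor with the smallest extracted communality. This is a minimal
      ' sketch; MinIndex is an illustrative name, not part of NMath.
      Dim MinIndex As Integer = 0
      For C As Integer = 1 To ExtractedCommunalities.Length - 1
        If ExtractedCommunalities(C) < ExtractedCommunalities(MinIndex) Then
          MinIndex = C
        End If
      Next
      Console.WriteLine()
      Console.WriteLine("Lowest communality: " & CarSalesData(MinIndex).Name & " = " _
                        & ExtractedCommunalities(MinIndex).ToString("G3"))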

      Console.WriteLine()

      ' Next we look at the variance explained by the initial solution
      ' by printing out a table of these values.
      ' The first column is just the extracted factor number.
      '
      ' The second column, Total, gives the eigenvalue, or amount of
      ' variance in the original variables accounted for by each factor.
      ' By default, factors with eigenvalues greater than one are
      ' extracted, so here the first three factors are extracted.
      '
      ' The % of Variance column gives the ratio, expressed as a percentage,
      ' of the variance accounted for by each factor to the total
      ' variance in all of the variables.
      '
      ' The Cumulative % column gives the percentage of variance accounted
      ' for by the first n factors. For example, the cumulative percentage
      ' for the second factor is the sum of the percentage of variance
      ' for the first and second factors.
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" _
                        & ControlChars.Tab & "Cumulative")
      Console.WriteLine("------------------------------------------")

      For I = 0 To FA.VarianceProportions.Length - 1
        Console.Write(I & ControlChars.Tab)
        Console.Write(FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
        Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
        Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
      Next

      ' We can see from this table that the first three factors account for nearly
      ' 88% of the total variance.
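
      ' A minimal sketch of the default retention rule described above: count the
      ' eigenvalues greater than one and compare that count with the number of
      ' factors actually extracted. EigenvaluesAboveOne is an illustrative name,
      ' not part of NMath.
      Dim EigenvaluesAboveOne As Integer = 0
      For K As Integer = 0 To FA.FactorExtraction.Eigenvalues.Length - 1
        If FA.FactorExtraction.Eigenvalues(K) > 1.0 Then
          EigenvaluesAboveOne += 1
        End If
      Next
      Console.WriteLine()
      Console.WriteLine("Eigenvalues greater than one: " & EigenvaluesAboveOne)
      Console.WriteLine("Factors extracted: " & FA.NumberOfFactors)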

      ' Next we look at the percentages of variance explained by the
      ' extracted rotated factors. Comparing this table with the first
      ' three rows of the previous one (three factors are extracted),
      ' we see that the cumulative percentage of variation explained by the
      ' extracted factors is maintained by the rotated factors,
      ' but that variation is now spread more evenly over the factors.
      ' This suggests that the rotated factor matrix will be
      ' easier to interpret than the unrotated matrix.
      Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
      Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum
      Console.WriteLine()
      Console.WriteLine("Rotated Extraction Sums of Squared Loadings")
      Console.WriteLine()
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
      Console.WriteLine("------------------------------------------")
      Dim Cumulative As Double = 0

      For I = 0 To FA.NumberOfFactors - 1
        Cumulative += RotatedSSLoadingsVarianceProportions(I)
        Console.Write(I)
        Console.Write(ControlChars.Tab)
        Console.Write(FA.RotatedSumOfSquaredLoadings(I).ToString("G3"))
        Console.Write(ControlChars.Tab)
        Console.Write(RotatedSSLoadingsVarianceProportions(I).ToString("P3"))
        Console.Write(ControlChars.Tab)
        Console.WriteLine(Cumulative.ToString("P3"))
      Next
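
      ' As a quick check of the comparison described above, print the unrotated
      ' cumulative proportion for the same number of factors next to the rotated
      ' total just accumulated. This is a sketch: the two values should agree
      ' closely if rotation preserves the total variance explained, as the text
      ' states. UnrotatedCumulative is an illustrative name.
      Dim UnrotatedCumulative As Double = FA.CumulativeVarianceProportions(FA.NumberOfFactors - 1)
      Console.WriteLine()
      Console.WriteLine("Unrotated cumulative: " & UnrotatedCumulative.ToString("P3"))
      Console.WriteLine("Rotated cumulative:   " & Cumulative.ToString("P3"))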

      Console.WriteLine()

      ' The rotated factor matrix helps you to determine what the factors represent.
      Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
      Console.WriteLine(Environment.NewLine & "Rotated Factor Matrix")
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
      Console.WriteLine(ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2" & ControlChars.Tab & "3")
      Console.WriteLine("-------------------------------------")

      For I = 0 To CarSalesData.Cols - 1
        Console.Write(CarSalesData(I).Name & ControlChars.Tab)
        If CarSalesData(I).Name.Length < 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.Write(RotatedComponentMatrix(I, 0).ToString("F3") & ControlChars.Tab)
        Console.Write(RotatedComponentMatrix(I, 1).ToString("F3") & ControlChars.Tab)
        Console.WriteLine(RotatedComponentMatrix(I, 2).ToString("F3"))
      Next
      Console.WriteLine()
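
      ' A small sketch to help read the table just printed: for each extracted
      ' factor, report the predictor with the largest absolute loading. It assumes
      ' the rotated factor matrix has one column per extracted factor. BestRow and
      ' BestLoading are illustrative names, not part of NMath.
      For F As Integer = 0 To FA.NumberOfFactors - 1
        Dim BestRow As Integer = 0
        Dim BestLoading As Double = Math.Abs(RotatedComponentMatrix(0, F))
        For R As Integer = 1 To CarSalesData.Cols - 1
          If Math.Abs(RotatedComponentMatrix(R, F)) > BestLoading Then
            BestLoading = Math.Abs(RotatedComponentMatrix(R, F))
            BestRow = R
          End If
        Next
        Console.WriteLine("Factor " & (F + 1) & ": " & CarSalesData(BestRow).Name _
                          & " (" & BestLoading.ToString("F3") & ")")
      Next
      Console.WriteLine()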

      ' The first factor is most highly correlated with price (in thousands)
      ' and horsepow (horsepower). Price in thousands is a better representative,
      ' however, because it is less correlated with the other two factors.
      '
      ' The second factor is most highly correlated with length.
      '
      ' The third factor is most highly correlated with vehicle type.
      ' This suggests that you can focus on price, length,
      ' and type in further analyses. To do so, however, would ignore
      ' any input the other variables might contribute to the analysis.
      ' It is therefore preferable to use the three new factors as
      ' our new variables. They are representative of all ten original
      ' variables and are not linearly correlated with one another.
      ' The case data values for the new factor variables are contained in the factor
      ' scores matrix. There are different algorithms for producing the factor
      ' scores. The FactorScores method can be passed an object implementing
      ' the IFactorScores interface, thus specifying the algorithm to be used.
      ' If no argument is passed to the FactorScores method, the regression
      ' algorithm for computing factor scores will be used. This algorithm is
      ' implemented in the class RegressionFactorScores.

      ' Print out the factor scores for the first three cases.
      Console.WriteLine()
      Console.WriteLine("Factor scores for the first three cases (normalized)")
      Console.WriteLine("----------------------------------------------------")
      Dim RowSlice As New Slice(0, 3)
      Console.WriteLine(FA.FactorScores()(RowSlice, Slice.All).ToTabDelimited("G3"))

      ' Factor scores are a linear combination of the ten original variable values.
      ' The coefficients used for the linear combination are found in the
      ' factor score coefficients matrix. This matrix may be obtained from the
      ' FactorScoreCoefficients method on the factor analysis class. As with factor
      ' scores, the algorithm for their computation may be specified by passing
      ' an object implementing the IFactorScores interface to this method. If
      ' no object is passed, score coefficients will be computed using the
      ' regression algorithm implemented in the class RegressionFactorScores.
      '
      ' Suppose we receive two new cases containing values for the ten car sales
      ' predictor variables. We can compute the values, or scores, for our three
      ' new factor variables by multiplying by the factor score coefficients:
      Dim ScoreCoefficients As DoubleMatrix = FA.FactorScoreCoefficients()
      Dim NewCaseData As New DoubleMatrix("2x10 [0.0 38.9 3.8 196.0 115.4 71.9 177.0 3.972 17.5 27.8 1.0 46.0 2.5 220.0 101.6 73.4 168.6 3.75  19.0 20.0]")
      Dim Scores As DoubleMatrix = NMathFunctions.Product(NewCaseData, ScoreCoefficients)
      Console.WriteLine("Scores for new case data")
      Console.WriteLine("------------------------")
      Console.WriteLine(Scores.ToTabDelimited("G3"))
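
      ' To make the linear combination explicit, the score of the first new case
      ' on the first factor can be recomputed by hand as the sum of each variable
      ' value times its coefficient. This sketch only restates the matrix product
      ' above, so it should match Scores(0, 0). ManualScore is an illustrative name.
      Dim ManualScore As Double = 0
      For V As Integer = 0 To NewCaseData.Cols - 1
        ManualScore += NewCaseData(0, V) * ScoreCoefficients(V, 0)
      Next
      Console.WriteLine("Manual score, case 1, factor 1:  " & ManualScore.ToString("G3"))
      Console.WriteLine("Product score, case 1, factor 1: " & Scores(0, 0).ToString("G3"))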

      Console.WriteLine()
      Console.WriteLine("Press Enter Key")
      Console.Read()
    End Sub

  End Module

End Namespace