VB FA Correlation Example


Imports System
Imports System.IO

Imports CenterSpace.NMath.Core
Imports CenterSpace.NMath.Stats

Namespace CenterSpace.NMath.Stats.Examples.VisualBasic

  ' A .NET example in Visual Basic
  Module FACorrelationExample

    Sub Main()

      ' NMath Stats provides classes for performing a factor analysis on a set of case data.
      ' Case data should be provided to these classes in matrix form - the variable values
      ' in columns and each row representing a case. In this example we look at some data
      ' for car sales. For each sale ten variable values are recorded:
      '
      ' type - sedan, mini-van, etc., recorded as integers 0, 1, ...
      ' price - the price in thousands,
      ' engine_s - engine size,
      ' horsepow - horsepower,
      ' wheelbas - wheelbase,
      ' width - car width,
      ' length - car length,
      ' curb_wgt - weight,
      ' fuel_cap - fuel capacity,
      ' mpg - fuel miles per gallon.
      '
      ' We would like to predict car sales from this set of predictors. However, many 
      ' of the predictors are correlated, and we fear that this might adversely affect
      ' our results. So we use factor analysis to focus on a manageable set of 
      ' predictors.

      ' First load the data, which is in a comma delimited form.
      Dim CarSalesData As DataFrame = DataFrame.Load("car_sales.csv", True, False, ",", True).CleanRows()

      ' NMath Stats provides three classes for performing factor analysis. All perform
      ' the analysis on the correlation matrix or the covariance matrix of case data.
      ' In addition, each of these classes has two generic type parameters, one
      ' specifying the algorithm used to extract the factors, and the other specifying
      ' a factor rotation method. Here we use the class FactorAnalysisCorrelation,
      ' which analyzes the correlation matrix, with principal components extraction
      ' (PCFactorExtraction) and varimax rotation (VarimaxRotation).
      ' The other two factor analysis classes are FactorAnalysisCovariance, for analyzing
      ' the covariance matrix, and DoubleFactorAnalysis which can be used if you don't
      ' have access to the original case data, just the correlation or covariance matrix
      ' (DoubleFactorAnalysis is a base class for FactorAnalysisCorrelation and
      ' FactorAnalysisCovariance).

      ' Construct the factor analysis object we use for our analysis. The simplest
      ' constructor takes only the case data as an argument. Other constructors take
      ' instances of the type parameter classes Extraction and Rotation,
      ' allowing you to pre-configure the options on these objects. When not specified,
      ' instances of the Extraction and Rotation classes are created with their
      ' no-argument constructors for use in the analysis.
      Dim FA As New FactorAnalysisCorrelation(Of PCFactorExtraction, VarimaxRotation)(CarSalesData.ToDoubleMatrix())

      Console.WriteLine()

      ' First, look at the extracted communalities. 
      ' Extracted communalities are estimates of the proportion of variance in each variable
      ' accounted for by the factors. 
      Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
      Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
      Console.WriteLine("-------------------------------------")

      For I As Integer = 0 To CarSalesData.Cols - 1
        Console.Write(CarSalesData(I).Name & ControlChars.Tab)
        If CarSalesData(I).Name.Length < 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
      Next

      ' The communalities are all high (close to 1.0) indicating that the extracted 
      ' factors represent the variables well.
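
      ' As a quick check of the statement above, the following sketch (added for
      ' illustration, not part of the original analysis) scans the communalities
      ' for the smallest value and reports which predictor it belongs to. It relies
      ' only on members already used in this example and assumes, as the loop above
      ' does, one communality per data column. LowestIndex and J are illustrative names.
      Dim LowestIndex As Integer = 0
      For J As Integer = 1 To ExtractedCommunalities.Length - 1
        If ExtractedCommunalities(J) < ExtractedCommunalities(LowestIndex) Then
          LowestIndex = J
        End If
      Next
      Console.WriteLine()
      Console.WriteLine("Lowest communality: " & CarSalesData(LowestIndex).Name & " (" _
                        & ExtractedCommunalities(LowestIndex).ToString("G3") & ")")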

      Console.WriteLine()

      ' Next we look at the variance explained by the initial solution 
      ' by printing out a table of these values. 
      ' The first column will just be the extracted factor number. 
      '
      ' The second 'Total' column gives the eigenvalue, or amount of 
      ' variance in the original variables accounted for by each factor.
      ' Since by default factors with eigenvalues greater than one will be
      ' extracted, the first three factors will be extracted.
      '
      ' The % of Variance column gives the ratio, expressed as a percentage, 
      ' of the variance accounted for by each factor to the total 
      ' variance in all of the variables.
      '
      ' The Cumulative % column gives the percentage of variance accounted 
      ' for by the first n factors. For example, the cumulative percentage
      ' for the second factor is the sum of the percentage of variance
      ' for the first and second factors.
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" _
                        & ControlChars.Tab & "Cumulative")
      Console.WriteLine("------------------------------------------")

      For I As Integer = 0 To FA.VarianceProportions.Length - 1
        Console.Write(I & ControlChars.Tab)
        Console.Write(FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
        Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
        Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
      Next

      ' We can see from this table that the first three factors account for nearly
      ' 88% of the total variance.
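
      ' The same figures can be read off programmatically. This is a small sketch
      ' added for illustration: it counts the eigenvalues greater than one (the
      ' default extraction rule described above), prints the number of factors
      ' actually extracted, and prints the cumulative proportion at the last
      ' extracted factor. It assumes Eigenvalues is a DoubleVector, as its indexed
      ' use above suggests. LargeEigenvalueCount and K are illustrative names.
      Dim LargeEigenvalueCount As Integer = 0
      For K As Integer = 0 To FA.FactorExtraction.Eigenvalues.Length - 1
        If FA.FactorExtraction.Eigenvalues(K) > 1.0 Then
          LargeEigenvalueCount += 1
        End If
      Next
      Console.WriteLine()
      Console.WriteLine("Eigenvalues greater than one: " & LargeEigenvalueCount)
      Console.WriteLine("Factors extracted: " & FA.NumberOfFactors)
      Console.WriteLine("Variance explained by extracted factors: " _
                        & FA.CumulativeVarianceProportions(FA.NumberOfFactors - 1).ToString("P2"))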

      ' Next we look at the percentages of variance explained by the
      ' extracted rotated factors. Comparing this table with the first
      ' three rows of the previous one (three factors are extracted)
      ' we see that the cumulative percentage of variation explained by the
      ' extracted factors is maintained by the rotated factors,
      ' but that variation is now spread more evenly over the factors.
      ' This suggests that the rotated factor matrix will be
      ' easier to interpret than the unrotated matrix.
      Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
      Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum
      Console.WriteLine()
      Console.WriteLine("Rotated Extraction Sums of Squared Loadings")
      Console.WriteLine()
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
      Console.WriteLine("------------------------------------------")
      Dim Cumulative As Double = 0

      For I As Integer = 0 To FA.NumberOfFactors - 1
        Cumulative += RotatedSSLoadingsVarianceProportions(I)
        Console.Write(I)
        Console.Write(ControlChars.Tab)
        Console.Write(FA.RotatedSumOfSquaredLoadings(I).ToString("G3"))
        Console.Write(ControlChars.Tab)
        Console.Write(RotatedSSLoadingsVarianceProportions(I).ToString("P3"))
        Console.Write(ControlChars.Tab)
        Console.WriteLine(Cumulative.ToString("P3"))
      Next
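
      ' A short sketch, added for illustration, of the comparison described above:
      ' the cumulative proportion accumulated over the rotated factors is printed
      ' next to the unrotated cumulative proportion at the last extracted factor.
      ' If rotation preserves the total explained variance, the two values should
      ' agree up to rounding. UnrotatedCumulative is an illustrative name.
      Dim UnrotatedCumulative As Double = FA.CumulativeVarianceProportions(FA.NumberOfFactors - 1)
      Console.WriteLine()
      Console.WriteLine("Cumulative variance, unrotated vs. rotated: " _
                        & UnrotatedCumulative.ToString("P3") & " vs. " & Cumulative.ToString("P3"))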

      Console.WriteLine()

      ' The rotated factor matrix helps you to determine what the factors represent.
      Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
      Console.WriteLine(Environment.NewLine & "Rotated Factor Matrix")
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
      Console.WriteLine(ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2" & ControlChars.Tab & "3")
      Console.WriteLine("-------------------------------------")

      For I As Integer = 0 To CarSalesData.Cols - 1
        Console.Write(CarSalesData(I).Name & ControlChars.Tab)
        If CarSalesData(I).Name.Length < 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.Write(RotatedComponentMatrix(I, 0).ToString("F3") & ControlChars.Tab)
        Console.Write(RotatedComponentMatrix(I, 1).ToString("F3") & ControlChars.Tab)
        Console.WriteLine(RotatedComponentMatrix(I, 2).ToString("F3"))
      Next
      Console.WriteLine()

      ' The first factor is most highly correlated with price (in thousands) 
      ' and horsepow (horsepower). Price in thousands is a better representative, 
      ' however, because it is less correlated with the other two factors.
      '
      ' The second factor is most highly correlated with length.
      '
      ' The third factor is most highly correlated with vehicle type.
      ' This suggests that you can focus on price, length, 
      ' and type in further analyses. To do so, however, would ignore
      ' any input the other variables might contribute to the analysis.
      ' It is therefore preferable to use the three new factors as
      ' our new variables. They are representative of all ten original
      ' variables and are not linearly correlated with one another.
      ' The case data values for the new factor variables are contained in the factor
      ' scores matrix. There are different algorithms for producing the factor
      ' scores. The FactorScores method can be passed an object implementing
      ' the IFactorScores interface, thus specifying the algorithm to be used.
      ' If no argument is passed to the FactorScores method, the regression
      ' algorithm for computing factor scores will be used. This algorithm is
      ' implemented in the class RegressionFactorScores.

      ' Print out the factor scores for the first three cases.
      Console.WriteLine()
      Console.WriteLine("Factor scores for the first three cases (normalized)")
      Console.WriteLine("----------------------------------------------------")
      Dim RowSlice As New Slice(0, 3)
      Console.WriteLine(FA.FactorScores()(RowSlice, Slice.All).ToTabDelimited("G3"))
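
      ' The claim above that the new factor variables are not linearly correlated
      ' can be checked by hand. This sketch, added for illustration, computes the
      ' sample correlation between the first two factor score columns using only
      ' element access. It assumes DoubleMatrix exposes a Rows property giving the
      ' number of cases. AllScores, CaseCount, R and the accumulators are illustrative
      ' names. A value near zero is consistent with uncorrelated factors.
      Dim AllScores As DoubleMatrix = FA.FactorScores()
      Dim CaseCount As Integer = AllScores.Rows
      Dim Mean0 As Double = 0, Mean1 As Double = 0
      For R As Integer = 0 To CaseCount - 1
        Mean0 += AllScores(R, 0)
        Mean1 += AllScores(R, 1)
      Next
      Mean0 /= CaseCount
      Mean1 /= CaseCount
      Dim S00 As Double = 0, S11 As Double = 0, S01 As Double = 0
      For R As Integer = 0 To CaseCount - 1
        S00 += (AllScores(R, 0) - Mean0) * (AllScores(R, 0) - Mean0)
        S11 += (AllScores(R, 1) - Mean1) * (AllScores(R, 1) - Mean1)
        S01 += (AllScores(R, 0) - Mean0) * (AllScores(R, 1) - Mean1)
      Next
      Console.WriteLine("Correlation between factor scores 1 and 2: " _
                        & (S01 / Math.Sqrt(S00 * S11)).ToString("G3"))
      Console.WriteLine()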

      ' Factor scores are a linear combination of the ten original variable values. 
      ' The coefficients used for the linear combination are found in the 
      ' factor score coefficients matrix. This matrix may be obtained from the
      ' FactorScoreCoefficients method on the factor analysis class. As with factor
      ' scores, the algorithm for their computation may be specified by passing
      ' an object implementing the IFactorScores interface to this method. If
      ' no object is passed, the score coefficients are computed using the
      ' regression algorithm implemented in the class RegressionFactorScores.
      '
      ' Suppose we receive two new cases containing values for the ten car sales
      ' predictor variables. We can compute the values, or scores, for our three
      ' new factor variables by multiplying by the factor score coefficients:
      Dim ScoreCoefficients As DoubleMatrix = FA.FactorScoreCoefficients()
      Dim NewCaseData As New DoubleMatrix("2x10 [0.0 38.9 3.8 196.0 115.4 71.9 177.0 3.972 17.5 27.8 1.0 46.0 2.5 220.0 101.6 73.4 168.6 3.75  19.0 20.0]")
      Dim Scores As DoubleMatrix = NMathFunctions.Product(NewCaseData, ScoreCoefficients)
      Console.WriteLine("Scores for new case data")
      Console.WriteLine("------------------------")
      Console.WriteLine(Scores.ToTabDelimited("G3"))

      Console.WriteLine()
      Console.WriteLine("Press Enter Key")
      Console.Read()
    End Sub

  End Module

End Namespace