# VB FA Correlation Example


```vb
Imports System
Imports System.IO

Imports CenterSpace.NMath.Core
Imports CenterSpace.NMath.Stats

Namespace CenterSpace.NMath.Stats.Examples.VisualBasic

' A .NET example in Visual Basic
Module FACorrelationExample

Sub Main()

' NMath Stats provides classes for performing a factor analysis on a set of case data.
' Case data should be provided to these classes in matrix form - the variable values
' in columns and each row representing a case. In this example we look at some data
' for car sales. For each sale ten variable values are recorded:
'
' type - sedan, mini-van, etc, recorded as integers 0, 1,...
' price - the price in thousands,
' engine_s - engine size,
' horsepow - horsepower,
' wheelbas - wheelbase,
' width - car width,
' length - car length,
' curb_wgt - weight,
' fuel_cap - fuel capacity,
' mpg - fuel miles per gallon.
'
' We would like to predict car sales from this set of predictors. However, many
' of the predictors are correlated, and we fear that this might adversely affect
' our results. So we use factor analysis to focus on a manageable set of
' predictors.

' First load the data, which is in a comma delimited form.
Dim CarSalesData As DataFrame = DataFrame.Load("car_sales.csv", True, False, ",", True).CleanRows()

' NMath Stats provides three classes for
' performing factor analysis. All will perform analysis on the correlation matrix
' or the covariance matrix of case data. In addition each of these classes has
' two class parameters, one specifying the algorithm used to extract the factors,
' and the other specifying a factor rotation method. Here we use the class
' FactorAnalysisCorrelation, which analyzes the correlation matrix, with
' principal factors extraction and varimax rotation.
' The other two factor analysis classes are FactorAnalysisCovariance, for analyzing
' the covariance matrix, and DoubleFactorAnalysis which can be used if you don't
' have access to the original case data, just the correlation or covariance matrix
' (DoubleFactorAnalysis is a base class for FactorAnalysisCorrelation and
' FactorAnalysisCovariance).

' Construct the factor analysis object we use for our analysis. The simplest
' constructor takes only the case data as an argument. Other constructors take
' instances of the class parameter classes Extraction and Rotation,
' allowing you to pre-configure the options on these objects. When not specified,
' instances of the Extraction and Rotation classes will be created with their no-argument
' constructors for use in the analysis.
Dim FA As New FactorAnalysisCorrelation(Of PCFactorExtraction, VarimaxRotation)(CarSalesData.ToDoubleMatrix())

Console.WriteLine()

' First, look at the extracted communalities.
' Extracted communalities are estimates of the proportion of variance in each variable
' accounted for by the factors.
Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
Console.WriteLine("-------------------------------------")

For I As Integer = 0 To CarSalesData.Cols - 1
Console.Write(CarSalesData(I).Name & ControlChars.Tab)
If CarSalesData(I).Name.Length < 8 Then
Console.Write(ControlChars.Tab)
End If
Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
Next

' The communalities are all high (close to 1.0) indicating that the extracted
' factors represent the variables well.

Console.WriteLine()

' Next we look at the variance explained by the initial solution
' by printing out a table of these values.
' The first column will just be the extracted factor number.
'
' The second 'Total' column gives the eigenvalue, or amount of
' variance in the original variables accounted for by each factor.
' Since by default factors with eigenvalues greater than one will be
' extracted, the first three factors will be extracted.
'
' The % of Variance column gives the ratio, expressed as a percentage,
' of the variance accounted for by each factor to the total
' variance in all of the variables.
'
' The Cumulative % column gives the percentage of variance accounted
' for by the first n factors. For example, the cumulative percentage
' for the second factor is the sum of the percentage of variance
' for the first and second factors.
Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" _
& ControlChars.Tab & "Cumulative")
Console.WriteLine("------------------------------------------")

For I = 0 To FA.VarianceProportions.Length - 1
Console.Write(I & ControlChars.Tab)
Console.Write(FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
Next

' We can see from this table that the first three factors account for nearly
' 88% of the total variance.

' Next we look at the percentages of variance explained by the
' extracted rotated factors. Comparing this table with the first
' three rows of the previous one (three factors are extracted)
' we see that the cumulative percentage of variation explained by the
' extracted factors is maintained by the rotated factors,
' but that variation is now spread more evenly over the factors.
' This suggests that the rotated factor matrix will be
' easier to interpret than the unrotated matrix.
Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
Console.WriteLine()
Console.WriteLine()
Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
Console.WriteLine("------------------------------------------")
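Dim Cumulative As Double = 0

' Assumption: the Total column is each rotated factor's sum of squared
' loadings, computed here from the columns of FA.RotatedFactors, and the
' Variance column is that total divided by the sum of all eigenvalues.
Dim RotatedLoadings As DoubleMatrix = FA.RotatedFactors

For I = 0 To FA.NumberOfFactors - 1
Dim SumOfSquares As Double = 0
For J As Integer = 0 To RotatedLoadings.Rows - 1
SumOfSquares += RotatedLoadings(J, I) * RotatedLoadings(J, I)
Next
Dim Proportion As Double = SumOfSquares / EigenValueSum
Cumulative += Proportion
Console.Write(I)
Console.Write(ControlChars.Tab)
Console.Write(SumOfSquares.ToString("G4") & ControlChars.Tab)
Console.Write(Proportion.ToString("P3") & ControlChars.Tab)
Console.WriteLine(Cumulative.ToString("P3"))
Next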

Console.WriteLine()

' The rotated factor matrix helps you to determine what the factors represent.
Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
Console.WriteLine(Environment.NewLine & "Rotated Factor Matrix")
Console.WriteLine()
Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
Console.WriteLine(ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2" & ControlChars.Tab & "3")
Console.WriteLine("-------------------------------------")

For I = 0 To CarSalesData.Cols - 1
Console.Write(CarSalesData(I).Name & ControlChars.Tab)
If CarSalesData(I).Name.Length < 8 Then
Console.Write(ControlChars.Tab)
End If
Console.Write(RotatedComponentMatrix(I, 0).ToString("F3") & ControlChars.Tab)
Console.Write(RotatedComponentMatrix(I, 1).ToString("F3") & ControlChars.Tab)
Console.WriteLine(RotatedComponentMatrix(I, 2).ToString("F3"))
Next
Console.WriteLine()

' The first factor is most highly correlated with price (in thousands)
' and horsepow (horsepower). Price in thousands is a better representative,
' however, because it is less correlated with the other two factors.
'
' The second factor is most highly correlated with length.
'
' The third factor is most highly correlated with vehicle type.
' This suggests that you can focus on price, length,
' and type in further analyses. To do so, however, would ignore
' any input the other variables might contribute to the analysis.
' It is therefore preferable to use the three new factors as
' our new variables. They are representative of all ten original
' variables and are not linearly correlated with one another.
' The case data values for new factor variables are contained in the factor
' scores matrix. There are different algorithms for producing the factor
' scores. The FactorScores method can be passed an object implementing
' the IFactorScores interface, thus specifying the algorithm to be used.
' If no argument is passed to the FactorScores method, the regression
' algorithm for computing factor scores will be used. The method is
' implemented in the class RegressionFactorScores.

' Print out the factor scores for the first three cases.
Console.WriteLine()
Console.WriteLine("Factor scores for the first three cases (normalized)")
Console.WriteLine("----------------------------------------------------")
Dim RowSlice As New Slice(0, 3)
Console.WriteLine(FA.FactorScores()(RowSlice, Slice.All).ToTabDelimited("G3"))

' Factor scores are a linear combination of the ten original variable values.
' The coefficients used for the linear combination are found in the
' factor score coefficients matrix. This matrix may be obtained from the
' FactorScoreCoefficients method on the factor analysis class. Like factor
' scores, the algorithm for their computation may be specified by passing
' an object implementing the IFactorScores interface to this method. If
' no argument is passed, score coefficients will be computed using the
' regression algorithm implemented in the class RegressionFactorScores.
'
' Suppose we receive two new cases containing values for the ten car sales
' predictor variables. We can compute the values, or scores, for our three
' new factor variables by multiplying by the factor score coefficients:
Dim ScoreCoefficients As DoubleMatrix = FA.FactorScoreCoefficients()
Dim NewCaseData As New DoubleMatrix("2x10 [0.0 38.9 3.8 196.0 115.4 71.9 177.0 3.972 17.5 27.8 1.0 46.0 2.5 220.0 101.6 73.4 168.6 3.75  19.0 20.0]")
Dim Scores As DoubleMatrix = NMathFunctions.Product(NewCaseData, ScoreCoefficients)
Console.WriteLine("Scores for new case data")
Console.WriteLine("------------------------")
Console.WriteLine(Scores.ToTabDelimited("G3"))
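
' The matrix product above is a set of dot products. As a minimal sketch,
' assuming ScoreCoefficients has one row per original variable and one column
' per factor (which the product above requires), the score of the first new
' case on the first factor can be recomputed by hand. FirstScore and K are
' illustrative names introduced here.
Dim FirstScore As Double = 0
For K As Integer = 0 To ScoreCoefficients.Rows - 1
FirstScore += NewCaseData(0, K) * ScoreCoefficients(K, 0)
Next
Console.WriteLine("Hand-computed score (case 1, factor 1): " & FirstScore.ToString("G3"))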

Console.WriteLine()
Console.WriteLine("Press Enter Key")
Console.Read()
End Sub

End Module

End Namespace
```
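
The comments in the example describe how the "% of Variance" and "Cumulative %" columns relate to the eigenvalues. The arithmetic can be sketched without NMath, as below. This is only an illustration under stated assumptions: the module `VarianceTableSketch`, its functions `Proportions` and `RunningSum`, and the eigenvalues are all hypothetical and are not part of NMath or taken from `car_sales.csv`.

```vb
Imports System
Imports Microsoft.VisualBasic

' Hypothetical helper module for illustration; not part of NMath.
Module VarianceTableSketch

    ' Returns each eigenvalue divided by the sum of all eigenvalues,
    ' i.e. the share of total variance attributed to each factor.
    Function Proportions(Eigenvalues As Double()) As Double()
        Dim Total As Double = 0
        For Each Value As Double In Eigenvalues
            Total += Value
        Next
        Dim Result(Eigenvalues.Length - 1) As Double
        For I As Integer = 0 To Eigenvalues.Length - 1
            Result(I) = Eigenvalues(I) / Total
        Next
        Return Result
    End Function

    ' Returns the running sum of the input values, so entry n is the
    ' share of variance accounted for by the first n + 1 factors.
    Function RunningSum(Values As Double()) As Double()
        Dim Result(Values.Length - 1) As Double
        Dim Sum As Double = 0
        For I As Integer = 0 To Values.Length - 1
            Sum += Values(I)
            Result(I) = Sum
        Next
        Return Result
    End Function

    Sub Main()
        ' Made-up eigenvalues, chosen only to keep the arithmetic easy to follow.
        Dim Eigenvalues As Double() = {2.0, 1.0, 0.5, 0.5}
        Dim P As Double() = Proportions(Eigenvalues)
        Dim C As Double() = RunningSum(P)

        Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
        For I As Integer = 0 To Eigenvalues.Length - 1
            Console.Write(I & ControlChars.Tab)
            Console.Write(Eigenvalues(I).ToString("G4") & ControlChars.Tab)
            Console.Write(P(I).ToString("P1") & ControlChars.Tab)
            Console.WriteLine(C(I).ToString("P1"))
        Next
    End Sub

End Module
```

With these illustrative eigenvalues the proportions are 0.5, 0.25, 0.125 and 0.125, and the cumulative values are 0.5, 0.75, 0.875 and 1. Only the first eigenvalue exceeds one, so under the default rule described in the example's comments a single factor would be retained.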