Imports System
Imports System.IO

Imports CenterSpace.NMath.Core

Namespace CenterSpace.NMath.Examples.VisualBasic

  ' A .NET example in Visual Basic
  Module FACorrelationExample

    Sub Main()

      ' NMath Stats provides classes for performing a factor analysis on a set
      ' of case data. Case data should be provided to these classes in matrix
      ' form, with the variable values in columns and each row representing a
      ' case. In this example we look at some data for car sales. For each sale,
      ' ten variable values are recorded:
      '
      '   type     - sedan, mini-van, etc., recorded as integers 0, 1, ...
      '   price    - the price in thousands
      '   engine_s - engine size
      '   horsepow - horsepower
      '   wheelbas - wheelbase
      '   width    - car width
      '   length   - car length
      '   curb_wgt - weight
      '   fuel_cap - fuel capacity
      '   mpg      - fuel economy in miles per gallon
      '
      ' We would like to predict car sales from this set of predictors.
      ' However, many of the predictors are correlated, and we fear that this
      ' might adversely affect our results. So we use factor analysis to focus
      ' on a manageable set of predictors.

      ' First load the data, which is in comma-delimited form.
      Dim CarSalesData As DataFrame = DataFrame.Load("car_sales.csv", True, False, ",", True).CleanRows()

      ' NMath Stats provides three classes for performing factor analysis.
      ' All will perform analysis on the correlation matrix or the covariance
      ' matrix of case data. In addition, each of these classes has two class
      ' parameters, one specifying the algorithm used to extract the factors,
      ' and the other specifying a factor rotation method. Here we use the
      ' class FactorAnalysisCorrelation, which analyzes the correlation matrix,
      ' with principal factors extraction and varimax rotation.
      ' The other two factor analysis classes are FactorAnalysisCovariance, for
      ' analyzing the covariance matrix, and DoubleFactorAnalysis, which can be
      ' used if you don't have access to the original case data, just the
      ' correlation or covariance matrix (DoubleFactorAnalysis is a base class
      ' for FactorAnalysisCorrelation and FactorAnalysisCovariance).
      ' Construct the factor analysis object we use for our analysis. The
      ' simplest constructor takes only the case data as an argument. Other
      ' constructors take instances of the class parameter classes Extraction
      ' and Rotation, allowing you to preconfigure the options on these objects.
      ' When not specified, instances of the Extraction and Rotation classes are
      ' created with their no-argument constructors for use in the analysis.
      Dim FA As New FactorAnalysisCorrelation(Of PCFactorExtraction, VarimaxRotation)(CarSalesData.ToDoubleMatrix())
      Console.WriteLine()

      ' First, look at the extracted communalities. Extracted communalities
      ' are estimates of the proportion of variance in each variable accounted
      ' for by the factors.
      Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities
      Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality")
      Console.WriteLine("-------------------------------------")
      For I As Integer = 0 To CarSalesData.Cols - 1
        Console.Write(CarSalesData(I).Name & ControlChars.Tab)
        ' Short names need a second tab to stay aligned with the value column.
        If CarSalesData(I).Name.Length < 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.WriteLine(ExtractedCommunalities(I).ToString("G3"))
      Next

      ' The communalities are all high (close to 1.0), indicating that the
      ' extracted factors represent the variables well.
      Console.WriteLine()

      ' Next we look at the variance explained by the initial solution by
      ' printing out a table of these values.
      ' The first column is just the extracted factor number.
      ' The second column, Total, gives the eigenvalue, or amount of variance
      ' in the original variables accounted for by each factor. Because, by
      ' default, factors with eigenvalues greater than one are extracted, the
      ' first three factors will be extracted.
      ' The % of Variance column gives the ratio, expressed as a percentage, of
      ' the variance accounted for by each factor to the total variance in all
      ' of the variables.
      ' The Cumulative % column gives the percentage of variance accounted for
      ' by the first n factors.
      ' For example, the cumulative percentage for the second factor is the sum
      ' of the percentage of variance for the first and second factors.
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" _
        & ControlChars.Tab & "Cumulative")
      Console.WriteLine("------------------------------------------")
      For I As Integer = 0 To FA.VarianceProportions.Length - 1
        Console.Write(I & ControlChars.Tab)
        Console.Write(FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab)
        Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab)
        Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4"))
      Next

      ' We can see from this table that the first three factors account for
      ' nearly 88% of the total variance.

      ' Next we look at the percentages of variance explained by the extracted
      ' rotated factors. Comparing this table with the first three rows of the
      ' previous one (three factors are extracted), we see that the cumulative
      ' percentage of variation explained by the extracted factors is
      ' maintained by the rotated factors, but that variation is now spread
      ' more evenly over the factors. This suggests that the rotated factor
      ' matrix will be easier to interpret than the unrotated matrix.
      ' Each rotated sum of squared loadings is divided by the sum of all the
      ' eigenvalues to express it as a proportion of the total variance.
      Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues)
      Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum
      Console.WriteLine()
      Console.WriteLine("Rotated Extraction Sums of Squared Loadings")
      Console.WriteLine()
      Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative")
      Console.WriteLine("------------------------------------------")
      Dim Cumulative As Double = 0
      For I As Integer = 0 To FA.NumberOfFactors - 1
        Cumulative += RotatedSSLoadingsVarianceProportions(I)
        Console.Write(I)
        Console.Write(ControlChars.Tab)
        Console.Write(FA.RotatedSumOfSquaredLoadings(I).ToString("G3"))
        Console.Write(ControlChars.Tab)
        Console.Write(RotatedSSLoadingsVarianceProportions(I).ToString("P3"))
        Console.Write(ControlChars.Tab)
        Console.WriteLine(Cumulative.ToString("P3"))
      Next
      Console.WriteLine()

      ' The rotated factor matrix helps you to determine what the factors
      ' represent.
      Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors
      Console.WriteLine(Environment.NewLine & "Rotated Factor Matrix")
      Console.WriteLine()
      Console.WriteLine("Predictor" & ControlChars.Tab & "Factor")
      Console.WriteLine(ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2" & ControlChars.Tab & "3")
      Console.WriteLine("-------------------------------------")
      For I As Integer = 0 To CarSalesData.Cols - 1
        Console.Write(CarSalesData(I).Name & ControlChars.Tab)
        ' Short names need a second tab to stay aligned with the factor columns.
        If CarSalesData(I).Name.Length < 8 Then
          Console.Write(ControlChars.Tab)
        End If
        Console.Write(RotatedComponentMatrix(I, 0).ToString("F3") & ControlChars.Tab)
        Console.Write(RotatedComponentMatrix(I, 1).ToString("F3") & ControlChars.Tab)
        Console.WriteLine(RotatedComponentMatrix(I, 2).ToString("F3"))
      Next
      Console.WriteLine()

      ' The first factor is most highly correlated with price (in thousands)
      ' and horsepow (horsepower). Price in thousands is a better
      ' representative, however, because it is less correlated with the other
      ' two factors.
      ' The second factor is most highly correlated with length.
      ' The third factor is most highly correlated with vehicle type.
      ' This suggests that you can focus on price, length, and type in further
      ' analyses. To do so, however, would ignore any input the other variables
      ' might contribute to the analysis. It is therefore preferable to use the
      ' three new factors as our new variables. They are representative of all
      ' ten original variables and are not linearly correlated with one another.
      ' The case data values for the new factor variables are contained in the
      ' factor scores matrix. There are different algorithms for producing the
      ' factor scores. The FactorScores method can be passed an object
      ' implementing the IFactorScores interface, thus specifying the algorithm
      ' to be used. If no argument is passed to the FactorScores method, the
      ' regression algorithm for computing factor scores is used. That
      ' algorithm is implemented in the class RegressionFactorScores.

      ' Print out the factor scores for the first three cases.
      Console.WriteLine()
      Console.WriteLine("Factor scores for the first three cases (normalized)")
      Console.WriteLine("----------------------------------------------------")
      Dim RowSlice As New Slice(0, 3)
      Console.WriteLine(FA.FactorScores()(RowSlice, Slice.All).ToTabDelimited("G3"))

      ' Factor scores are a linear combination of the ten original variable
      ' values. The coefficients used for the linear combination are found in
      ' the factor score coefficients matrix. This matrix may be obtained from
      ' the FactorScoreCoefficients method on the factor analysis class. As
      ' with factor scores, the algorithm for their computation may be
      ' specified by passing an object implementing the IFactorScores interface
      ' to this method. If no argument is passed, the score coefficients are
      ' computed using the regression algorithm implemented in the class
      ' RegressionFactorScores.

      ' Suppose we receive two new cases containing values for the ten car
      ' sales predictor variables.
      ' We can compute the values, or scores, for our three new factor
      ' variables by multiplying by the factor score coefficients:
      Dim ScoreCoefficients As DoubleMatrix = FA.FactorScoreCoefficients()
      ' Two rows (cases) by ten columns (predictors), in row-major order.
      Dim NewCaseData As New DoubleMatrix("2x10 [" & _
        "0.0 38.9 3.8 196.0 115.4 71.9 177.0 3.972 17.5 27.8 " & _
        "1.0 46.0 2.5 220.0 101.6 73.4 168.6 3.75 19.0 20.0]")
      Dim Scores As DoubleMatrix = NMathFunctions.Product(NewCaseData, ScoreCoefficients)
      Console.WriteLine("Scores for new case data")
      Console.WriteLine("------------------------")
      Console.WriteLine(Scores.ToTabDelimited("G3"))

      Console.WriteLine()
      Console.WriteLine("Press Enter Key")
      Console.Read()

    End Sub

  End Module

End Namespace