ï»¿Imports System Imports System.IO Imports CenterSpace.NMath.Core Imports CenterSpace.NMath.Stats Namespace CenterSpace.NMath.Stats.Examples.VisualBasic ' A .NET example in Visual Basic Module FACorrelationExample Sub Main() ' NMath Stats provide classes for performing a factor analysis on a set of case data. ' Case data should be provided to these classes in matrix form - the variable values ' in columns and each row representing a case. In this example we look at some data ' for car sales. For each sale ten variable values are recorded: ' ' type - sedan, mini-van, etc, recorded as integers 0, 1,... ' price - the price in thousands, ' engine_s - engine size, ' horsepow - horsepower, ' wheelbas - wheelbase, ' width - car width, ' length - car length, ' curb_wgt - weight, ' fuel_cap - fuel capacity, ' mpg - fuel miles per gallon. ' ' We would like to predict car sales from this set of predictors. However, many ' of the predictors are correlated, and we fear that this might adversely affect ' our results. So we use factor analysis to focus on a manageable set of ' predictors. ' First load the data, which is in a comma delimited form. Dim CarSalesData As DataFrame = DataFrame.Load("car_sales.csv", True, False, ",", True).CleanRows() ' NMath Stats provides three classes for ' performing factor analysis. All will perform analysis on the correlation matrix ' or the covariance matrix of case data. In addition each of these classes has ' two class parameters, one specifying the algorithm used to extract the factors, ' and the other specifying a factor rotation method. Here we use the class ' FactorAnalysisCorrelation, which analyzes the correlation matrix, with ' principal factors extraction and varimax rotation. ' The other two factor analysis classes are FactorAnalysisCovariance, for analyzing ' the covariance matrix, and DoubleFactorAnalysis which can be used if you don't ' have access to the original case data, just the correlation or covariance matrix ' (DoubleFactorAnalysis is a base class for FactorAnalysisCorrelation and ' FactorAnalysisCovariance). ' Construct the factor analysis object we use for our analysis. The simplest ' constructor takes only the case data as an argument. Other constructors take ' instances of the class parameter classes Extraction and Rotation, ' allowing you to pre-configure the options on these objects. When not specified, ' instances Extraction and Rotation classes will be created with their no-argument ' constructors for use in the analysis. Dim FA As New FactorAnalysisCorrelation(Of PCFactorExtraction, VarimaxRotation)(CarSalesData.ToDoubleMatrix()) Console.WriteLine() ' First, look at the extracted communalities. ' Extracted communalities are estimates of the proportion of variance in each variable ' accounted for by the factors. Dim ExtractedCommunalities As DoubleVector = FA.ExtractedCommunalities Console.WriteLine("Predictor" & ControlChars.Tab & "Extracted Communality") Console.WriteLine("-------------------------------------") For I As Integer = 0 To CarSalesData.Cols - 1 Console.Write(CarSalesData(I).Name & ControlChars.Tab) If CarSalesData(I).Name.Length < 8 Then Console.Write(ControlChars.Tab) End If Console.WriteLine(ExtractedCommunalities(I).ToString("G3")) Next ' The communalities are all high (close to 1.0) indicating that the extracted ' factors represent the variables well. Console.WriteLine() ' Next we look at the variance explained by the initial solution ' by printing out a table of these values. ' The first column will just be the extracted factor number. ' ' The second 'Total' column gives the eigenvalue, or amount of ' variance in the original variables accounted for by each factor. ' Since by default factors with eigenvalues greater than one will be ' extracted, the first three factors will be extracted. ' ' The % of Variance column gives the ratio, expressed as a percentage, ' of the variance accounted for by each factor to the total ' variance in all of the variables. ' ' The Cumulative % column gives the percentage of variance accounted ' for by the first n factors. For example, the cumulative percentage ' for the second factor is the sum of the percentage of variance ' for the first and second factors. Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" _ & ControlChars.Tab & "Cumulative") Console.WriteLine("----------------------------------------_-") For I = 0 To FA.VarianceProportions.Length - 1 Console.Write(I & ControlChars.Tab) Console.Write(FA.FactorExtraction.Eigenvalues(I).ToString("G4") & ControlChars.Tab) Console.Write(FA.VarianceProportions(I).ToString("P4") & ControlChars.Tab) Console.WriteLine(FA.CumulativeVarianceProportions(I).ToString("P4")) Next ' We can see from this table that the first three factors account for nearly ' 88% of the total variance. ' Next we look at the the percentages of variance explained by the ' extracted rotated factors. Comparing this table with the first ' three rows of the previous one (three factors are extracted) ' we see that the cumulative percentage of variation explained by the ' extracted factors is maintained by the rotated factors, ' but that variation is now spread more evenly over the factors. ' This suggests that that the rotated factor matrix will be ' easier to interpret than the unrotated matrix. Dim EigenValueSum As Double = NMathFunctions.Sum(FA.FactorExtraction.Eigenvalues) Dim RotatedSSLoadingsVarianceProportions As DoubleVector = FA.RotatedSumOfSquaredLoadings / EigenValueSum Console.WriteLine() Console.WriteLine("Rotated Extraction Sums of Squared Loadings") Console.WriteLine() Console.WriteLine("Factor" & ControlChars.Tab & "Total" & ControlChars.Tab & "Variance" & ControlChars.Tab & "Cumulative") Console.WriteLine("------------------------------------------") Dim Cumulative As Double = 0 For I = 0 To FA.NumberOfFactors - 1 Cumulative += RotatedSSLoadingsVarianceProportions(I) Console.Write(I) Console.Write(ControlChars.Tab) Console.Write(FA.RotatedSumOfSquaredLoadings(I).ToString("G3")) Console.Write(ControlChars.Tab) Console.Write(RotatedSSLoadingsVarianceProportions(I).ToString("P3")) Console.Write(ControlChars.Tab) Console.WriteLine(Cumulative.ToString("P3")) Next Console.WriteLine() ' The rotated factor matrix helps you to determine what the factors represent. Dim RotatedComponentMatrix As DoubleMatrix = FA.RotatedFactors Console.WriteLine(Environment.NewLine & "Rotated Factor Matrix") Console.WriteLine() Console.WriteLine("Predictor" & ControlChars.Tab & "Factor") Console.WriteLine(ControlChars.Tab & ControlChars.Tab & "1" & ControlChars.Tab & "2" & ControlChars.Tab & "3") Console.WriteLine("-------------------------------------") For I = 0 To CarSalesData.Cols - 1 Console.Write(CarSalesData(I).Name & ControlChars.Tab) If CarSalesData(I).Name.Length < 8 Then Console.Write(ControlChars.Tab) End If Console.Write(RotatedComponentMatrix(I, 0).ToString("F3") & ControlChars.Tab) Console.Write(RotatedComponentMatrix(I, 1).ToString("F3") & ControlChars.Tab) Console.WriteLine(RotatedComponentMatrix(I, 2).ToString("F3")) Next Console.WriteLine() ' The first factor is most highly correlated with price (in thousands) ' and horsepow (horsepower). Price in thousands is a better representative, ' however, because it is less correlated with the other two factors. ' ' The second factor is most highly correlated with Length. ' ' The third factor is most highly correlated with vehicle type. ' This suggests that you can focus on price, length, ' and type in further analyses. To do so, however, would ignore ' any input the other variables might contribute to the analysis. ' It is therefore preferable to use the three new factors as ' our new variables. They are representative of all ten original ' variables and are not linearly correlated with one another. ' The case data values for new factor variables are contained in the factor ' scores matrix. There are different algorithms for producing the factors ' scores. The FactorScores function can be passed an object implementing ' the IFactorScores interface, thus specifying the algorithm to be used. ' If no argument is passed to the FactorScores method, the regression ' algorithm for computing factor scores will be used. The method is ' implemented in the class RegressionFactorScores. ' Print out the factor scores for the first three cases. Console.WriteLine() Console.WriteLine("Factor scores for the first three cases (normalized)") Console.WriteLine("----------------------------------------------------") Dim RowSlice As New Slice(0, 3) Console.WriteLine(FA.FactorScores()(RowSlice, Slice.All).ToTabDelimited("G3")) ' Factor scores are a linear combination of the ten original variable values. ' The coefficients used for the linear combination are found in the ' factor score coefficients matrix. This matrix may be obtained from the ' FactorScoreCoefficients method on the factor analysis class. Like factor ' scores, the algorithm for their computation may be specified by passing ' an object implementing the IFactorScores interface to this method. If ' no method is passed, scores coefficients will be computed Imports the ' regression algorithm implemented in the class RegressionFactorScores. ' ' Suppose we receive two new cases containing values for the ten car sales ' predictor variables. We can compute the values, or scores, for our three ' new factor variables by multiplying by the factor score coefficients: Dim ScoreCoefficients As DoubleMatrix = FA.FactorScoreCoefficients() Dim NewCaseData As New DoubleMatrix("2x10 [0.0 38.9 3.8 196.0 115.4 71.9 177.0 3.972 17.5 27.8 1.0 46.0 2.5 220.0 101.6 73.4 168.6 3.75 19.0 20.0]") Dim Scores As DoubleMatrix = NMathFunctions.Product(NewCaseData, ScoreCoefficients) Console.WriteLine("Scores for new case data") Console.WriteLine("------------------------") Console.WriteLine(Scores.ToTabDelimited("G3")) Console.WriteLine() Console.WriteLine("Press Enter Key") Console.Read() End Sub End Module End Namespace← All NMath Stats Code Examples