NMath - 46.1 Principal Component Analysis (.NET, C#, CSharp, VB, Visual Basic, F#)

Principal component analysis (PCA) finds a smaller set of synthetic variables that capture the variance in an original data set. The first principal component accounts for as much of the variability in the data as possible, and each succeeding orthogonal component accounts for as much of the remaining variability as possible. In NMath Stats, classes DoublePCA and FloatPCA perform principal component analyses.

Creating Principal Component Analyses

A DoublePCA or FloatPCA instance is constructed from a matrix or a dataframe containing numeric data. Each column represents a variable, and each row represents an observation:

Code Example – C# principal component analysis (PCA)

var pca = new DoublePCA( data );

Code Example – VB principal component analysis (PCA)

Dim PCA As New DoublePCA(Data)

The data may optionally be zero-centered and scaled to have unit variance:

Code Example – C# principal component analysis (PCA)

bool center = true;
bool scale = true;
var pca = new DoublePCA( data, center, scale );

Code Example – VB principal component analysis (PCA)

Dim Center As Boolean = True
Dim Scale As Boolean = True
Dim PCA As New DoublePCA(Data, Center, Scale)

By default, variables are centered but not scaled.
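To make the effect of the center and scale flags concrete, here is a minimal sketch in plain C# with no NMath dependency. The class and method names (PcaPreprocessingSketch, Standardize) are hypothetical, and the choice of the sample standard deviation (dividing by n - 1) as the scale factor is an assumption of this sketch, not a statement about how DoublePCA is implemented.

```csharp
using System;

// Hypothetical helper, not part of NMath: illustrates column-wise
// centering and scaling of a data matrix before PCA.
public static class PcaPreprocessingSketch
{
    // data: rows are observations, columns are variables.
    // Returns a new array; the input is left unchanged.
    public static double[,] Standardize(double[,] data, bool center, bool scale)
    {
        int n = data.GetLength(0);
        int p = data.GetLength(1);
        var result = new double[n, p];

        for (int j = 0; j < p; j++)
        {
            // Column mean.
            double mean = 0.0;
            for (int i = 0; i < n; i++) mean += data[i, j];
            mean /= n;

            // Sample standard deviation (assumption: n - 1 divisor).
            double sumSq = 0.0;
            for (int i = 0; i < n; i++)
            {
                double d = data[i, j] - mean;
                sumSq += d * d;
            }
            double sd = Math.Sqrt(sumSq / (n - 1));

            double shift = center ? mean : 0.0;
            // Leave constant (or undefined-variance) columns unscaled.
            double divisor = (scale && sd > 0.0) ? sd : 1.0;

            for (int i = 0; i < n; i++)
                result[i, j] = (data[i, j] - shift) / divisor;
        }
        return result;
    }
}
```

Calling Standardize( data, true, false ) in this sketch corresponds to the default behavior described above: centered but not scaled.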

After construction, you can retrieve information about the data set using the provided read-only properties:

● Data gets the data matrix. If centering or scaling was specified at construction time, the returned matrix may not match the original data.

● NumberOfObservations gets the number of observations in the data matrix.

● NumberOfVariables gets the number of variables in the data matrix.

● IsCentered returns true if the data supplied at construction time was shifted to be zero-centered.

● IsScaled returns true if the data supplied at construction time was scaled to have unit variance.

● Means gets the column means of the data matrix. If centering is specified, the column means are subtracted from the column values before analysis takes place.

● Norms gets the column norms (1-norm). If scaling is specified, column values are scaled to have unit variance before analysis by dividing by the column norm.

Principal Component Analysis Results

The Loadings property gets the complete loading matrix. Each column in the loading matrix is a principal component. The columns are ordered so that the first accounts for the most variability in the data and each succeeding column accounts for as much of the remaining variability as possible.

Code Example – C# principal component analysis (PCA)

Console.WriteLine( "Loading Matrix = " + pca.Loadings );

Code Example – VB principal component analysis (PCA)

Console.WriteLine("Loading Matrix = " & PCA.Loadings)

The provided indexer also gets a specified principal component, referenced by zero-based index. For example:

Code Example – C# principal component analysis (PCA)

Console.WriteLine( "First principal component = " + pca[0] );
Console.WriteLine( "Second principal component = " + pca[1] );

Code Example – VB principal component analysis (PCA)

Console.WriteLine("First principal component = " & PCA(0))
Console.WriteLine("Second principal component = " & PCA(1))

The VarianceProportions property gets an ordered vector containing the proportion of the total variance accounted for by each principal component. CumulativeVarianceProportions gets the cumulative variance proportions. Thus:

Code Example – C# principal component analysis (PCA)

Console.WriteLine( "Variance Proportions = " +
                   pca.VarianceProportions );

Console.WriteLine( "Cumulative Variance Proportions = " +
                   pca.CumulativeVarianceProportions );

Code Example – VB principal component analysis (PCA)

Console.WriteLine("Variance Proportions = " &
  PCA.VarianceProportions)

Console.WriteLine("Cumulative Variance Proportions = " &
  PCA.CumulativeVarianceProportions)
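The arithmetic behind these two vectors can be sketched in plain C# without NMath. The names below (VarianceProportionSketch, Proportions, Cumulative) are hypothetical, and the sketch assumes the eigenvalues are already sorted in descending order; it illustrates the definitions, not NMath's internal code.

```csharp
// Hypothetical helper, not part of NMath: derives variance proportions
// and their running totals from a list of eigenvalues.
public static class VarianceProportionSketch
{
    // Each proportion is an eigenvalue divided by the sum of all eigenvalues.
    public static double[] Proportions(double[] eigenvalues)
    {
        double total = 0.0;
        foreach (double ev in eigenvalues) total += ev;

        var proportions = new double[eigenvalues.Length];
        for (int i = 0; i < eigenvalues.Length; i++)
            proportions[i] = eigenvalues[i] / total;
        return proportions;
    }

    // Element i is the sum of proportions 0..i.
    public static double[] Cumulative(double[] proportions)
    {
        var cumulative = new double[proportions.Length];
        double running = 0.0;
        for (int i = 0; i < proportions.Length; i++)
        {
            running += proportions[i];
            cumulative[i] = running;
        }
        return cumulative;
    }
}
```

For example, eigenvalues of 6, 3, and 1 give proportions of roughly 0.6, 0.3, and 0.1, and cumulative proportions of roughly 0.6, 0.9, and 1.0, up to floating-point rounding.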

The Threshold() method calculates the number of principal components required to account for a given proportion of the total variance:

Code Example – C# principal component analysis (PCA)

Console.WriteLine( "PCs that account for 99% of the variance = " +
                   pca.Threshold( .99 ) );

Code Example – VB principal component analysis (PCA)

Console.WriteLine("PCs that account for 99% of the variance = " &
  PCA.Threshold(0.99))
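One way to picture this calculation is as a running sum over the variance proportions that stops once the target is reached. The sketch below is plain C# with hypothetical names (ThresholdSketch, ComponentsForThreshold). Whether NMath treats the boundary as "greater than or equal to" is an assumption here, so results may differ when the cumulative proportion lands exactly on the threshold.

```csharp
// Hypothetical helper, not part of NMath: counts how many leading
// components are needed to reach a target share of the total variance.
public static class ThresholdSketch
{
    // proportions: per-component variance proportions, largest first.
    // threshold: target proportion of total variance, e.g. 0.99.
    public static int ComponentsForThreshold(double[] proportions, double threshold)
    {
        double running = 0.0;
        for (int k = 0; k < proportions.Length; k++)
        {
            running += proportions[k];
            // Assumption: the target counts as met at equality.
            if (running >= threshold) return k + 1;
        }
        // Target never reached (e.g. rounding): all components are needed.
        return proportions.Length;
    }
}
```

With proportions of 0.6, 0.3, and 0.1, a threshold of 0.5 needs one component, 0.85 needs two, and 0.99 needs all three.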

The StandardDeviations property gets the standard deviations of the principal components. Eigenvalues gets the eigenvalues of the covariance/correlation matrix, though the calculation is actually performed using the singular values of the data matrix. The eigenvalues of the covariance/correlation matrix are equal to the squares of the standard deviations of the principal components.
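The link between singular values, eigenvalues, and standard deviations can be written out as follows. The symbols are introduced here for illustration and do not come from the NMath documentation: X is the centered (and optionally scaled) n-by-p data matrix, s_i are its singular values, and the n - 1 divisor is the usual sample-covariance convention, assumed rather than confirmed for NMath.

```latex
\begin{align*}
X &= U S V^{\mathsf{T}}
  && \text{(singular value decomposition of the data matrix)} \\
C &= \frac{1}{n-1}\, X^{\mathsf{T}} X
   = V \,\frac{S^{2}}{n-1}\, V^{\mathsf{T}}
  && \text{(covariance, or correlation if columns were scaled)} \\
\lambda_i &= \frac{s_i^{2}}{n-1},
  \qquad
  \mathrm{sd}_i = \frac{s_i}{\sqrt{n-1}} = \sqrt{\lambda_i}
  && \text{(eigenvalue and standard deviation of component } i\text{)}
\end{align*}
```

Under these assumptions the eigenvalues never require forming C explicitly, which is why working from the singular values of the data matrix yields the same result.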

Lastly, the Scores property gets the score matrix. The scores are the data formed by transforming the original data into the space of the principal components:

Code Example – C# principal component analysis (PCA)

Console.WriteLine( "Scores = " + pca.Scores );

Code Example – VB principal component analysis (PCA)

Console.WriteLine("Scores = " & PCA.Scores)
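In matrix terms, this transformation is conventionally a single product. The notation is an assumption made for illustration: X is the n-by-p data matrix after any centering and scaling, W is the p-by-p loading matrix whose columns are the principal components, and T is the n-by-p score matrix.

```latex
\begin{align*}
T &= X W,
  & t_{ij} &= \sum_{k=1}^{p} x_{ik}\, w_{kj} \\
T_{m} &= X W_{m},
  & W_{m} &= \text{first } m \text{ columns of } W
\end{align*}
```

Row i of T holds the coordinates of observation i along each principal component, and keeping only the first m columns gives a reduced-dimension view of the same observations.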

This code displays the scores for the smallest number of principal components required to account for 99% of the variance:

Code Example – C# principal component analysis (PCA)

Slice rowSlice = Slice.All;
var colSlice = new Slice( 0, pca.Threshold( .99 ) );
Console.WriteLine( pca.Scores[ rowSlice, colSlice ] );

Code Example – VB principal component analysis (PCA)

Dim RowSlice As Slice = Slice.All
Dim ColSlice As New Slice(0, PCA.Threshold(0.99))
Console.WriteLine(PCA.Scores(RowSlice, ColSlice))
