← All NMath Code Examples
Imports System
Imports Microsoft.VisualBasic
Imports CenterSpace.NMath.Core
Imports System.IO
Namespace CenterSpace.NMath.Examples.VisualBasic
A .NET example in Visual Basic showing how to perform a principal component analysis on a data set.
Module PrincipalComponentExample
Sub Main()
Read in data from a file. These data give air pollution and related values
for 41 U.S. cities.
SO2: Sulfur dioxide content of air in micrograms per cubic meter
Temp: Average annual temperature in degrees Fahrenheit
Man: Number of manufacturing enterprises employing 20 or more workers
Pop: Population size in thousands from the 1970 census
Wind: Average annual wind speed in miles per hour
Rain: Average annual precipitation in inches
RainDays: Average number of days with precipitation per year
Source: http://lib.stat.cmu.edu/DASL/Datafiles/AirPollution.html
Dim DF As DataFrame = DataFrame.Load("PrincipalComponentExample.dat", True, True, ControlChars.Tab, True)
Console.WriteLine()
Console.WriteLine(DF)
Console.WriteLine()
Class DoublePCA performs a double-precision principal component
analysis on a given data set. The data may optionally be centered and
scaled before analysis takes place. By default, variables are centered
but not scaled.
Dim PCA As New DoublePCA(DF)
Once your data is analyzed, you can can retrieve information about the data.
If centering was specified, the column means are subtracted from
the column values before analysis takes place. If scaling was specified,
column values are scaled to have unit variance before analysis by dividing
by the column norm.
Console.WriteLine("Number of Observations = " & PCA.NumberOfObservations)
Console.WriteLine("Number of Variables = " & PCA.NumberOfVariables)
Console.WriteLine()
Console.WriteLine("Column Means = " & PCA.Means.ToString("G5"))
Console.WriteLine()
Console.WriteLine("Column Norms = " & PCA.Norms.ToString("G5"))
Console.WriteLine()
Console.WriteLine("Data was centered? = " & PCA.IsCentered)
Console.WriteLine("Data was scaled? = " & PCA.IsScaled)
Console.WriteLine()
The Loadings property gets the loading matrix. Each column is a principal component.
Console.WriteLine("Loadings =")
Console.WriteLine(PCA.Loadings.ToTabDelimited("G9"))
Console.WriteLine()
You can retrieve a particular principal component using the indexer.
Console.WriteLine("First principal component = " & PCA(0).ToString("G5"))
Console.WriteLine()
Console.WriteLine("Second principal component = " & PCA(1).ToString("G5"))
Console.WriteLine()
The first principal component accounts for as much of the variability in the
data as possible, and each succeeding component accounts for as much of the
remaining variability as possible.
Console.WriteLine("Variance Proportions = " & PCA.VarianceProportions.ToString("G5"))
Console.WriteLine()
Console.WriteLine("Cumulative Variance Proportions = " & PCA.CumulativeVarianceProportions.ToString("G9"))
Console.WriteLine()
You can also get the number of principal components required to account for
a given proportion of the total variance. In this case, a plane fit to the
original 7-dimensional space accounts for 99% of the variance.
Console.WriteLine("PCs that account for 99% of the variance = " & PCA.Threshold(0.99))
Console.WriteLine()
The Score matrix is the data formed by transforming the original data into
the space of the principal components.
Console.WriteLine("Scores =")
Console.WriteLine(PCA.Scores.ToTabDelimited("G9"))
Console.WriteLine()
Console.WriteLine("Press Enter Key")
Console.Read()
End Sub
End Module
End Namespace
← All NMath Code Examples