NMath Stats User's Guide


12.7 Partial Least Squares Discriminant Analysis

Partial least squares discriminant analysis (PLS-DA) is a variant of PLS regression that is used when the response variable is categorical. Three classes are provided for performing PLS-DA:

SparsePlsDa performs discriminant analysis (DA) using classical sparse PLS regression (sPLS) when the response variable is categorical. The qualitative response vector Y is recoded as a dummy block matrix in which each response category is coded by an indicator variable. PLS-DA is then run as if Y were a continuous matrix. SparsePlsDa inherits from PLS2.
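To make the recoding step concrete, here is a minimal sketch of how a categorical response can be turned into a dummy (indicator) matrix. The helper `ToIndicatorMatrix` and the sample level names are hypothetical and not part of NMath. The sketch only shows what the dummy block matrix looks like, not how SparsePlsDa builds it internally (for example, NMath may order the columns differently).

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of NMath): recodes a categorical response
// into a dummy (indicator) matrix with one row per observation and one
// column per distinct level. Entry [i, j] is 1.0 when observation i has
// level j and 0.0 otherwise.
static double[,] ToIndicatorMatrix(string[] y, out string[] levels)
{
    // Column order: distinct levels sorted ordinally (any fixed order works).
    levels = y.Distinct().OrderBy(s => s, StringComparer.Ordinal).ToArray();
    var indicator = new double[y.Length, levels.Length];
    for (int i = 0; i < y.Length; i++)
    {
        indicator[i, Array.IndexOf(levels, y[i])] = 1.0;
    }
    return indicator;
}

// Four observations of a three-level factor (illustrative values).
string[] response = { "setosa", "virginica", "setosa", "versicolor" };
double[,] Y = ToIndicatorMatrix(response, out string[] levelNames);

Console.WriteLine(string.Join(" ", levelNames));
for (int i = 0; i < Y.GetLength(0); i++)
{
    var row = Enumerable.Range(0, Y.GetLength(1)).Select(j => Y[i, j]);
    Console.WriteLine(string.Join(" ", row));
}
```

Each row contains exactly one 1.0, so the PLS regression sees one continuous column per category.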

SparsePls performs a sparse PLS calculation with variable selection, applying a LASSO penalty to each pair of loading vectors. SparsePls implements IPLS2Calc.
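A LASSO penalty on a loading vector is commonly applied through soft thresholding, which shrinks every weight toward zero and sets small weights exactly to zero, so the corresponding variables drop out. The following is a minimal sketch of that single step, assuming a keepX-style parameterization in which the threshold is chosen so that only the requested number of variables keep nonzero weights. The helpers `SoftThreshold` and `KeepLargest` are hypothetical and illustrate the technique; they are not SparsePls's internal code, which also includes steps (such as normalization and iteration) not shown here.

```csharp
using System;
using System.Linq;

// Soft-thresholding operator, the standard proximal step for a LASSO (L1)
// penalty: every entry shrinks toward zero by lambda, and entries whose
// magnitude is at most lambda become exactly zero.
static double[] SoftThreshold(double[] v, double lambda) =>
    v.Select(x => Math.Abs(x) <= lambda ? 0.0 : Math.Sign(x) * (Math.Abs(x) - lambda))
     .ToArray();

// Assumed keepX-style parameterization: lambda is set to the (keep + 1)-th
// largest magnitude, so (ignoring ties) only the `keep` largest entries stay nonzero.
static double[] KeepLargest(double[] v, int keep)
{
    if (keep >= v.Length) return (double[])v.Clone();
    double lambda = v.Select(x => Math.Abs(x)).OrderByDescending(a => a).ElementAt(keep);
    return SoftThreshold(v, lambda);
}

// Illustrative loading vector for five predictors; keep two of them.
double[] loading = { 0.9, -0.1, 0.4, -0.7, 0.05 };
double[] sparse = KeepLargest(loading, 2);
Console.WriteLine(string.Join(", ", sparse));
```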

SparsePLSDACrossValidation evaluates a sparse PLS-DA model by cross validation. The data are divided into two subsets: a training subset and a testing subset. A PLS calculation is performed on the training subset, and the resulting model is used to predict the values of the dependent variables in the testing subset. The mean square error between the actual and predicted dependent values is then computed. Usually the data are divided into several pairs of training and testing subsets, and a calculation is done on each pair. In this case the average of the mean square errors over all of the PLS calculations is reported. (The individual mean square errors are available as well.)
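The error bookkeeping described above can be sketched in a few lines: compute one mean square error per train/test split, then average them. The helper `MeanSqrError` and the (actual, predicted) values are hypothetical, and a single response column is used for simplicity. SparsePLSDACrossValidation computes its errors internally, possibly over every column of the response matrix.

```csharp
using System;
using System.Linq;

// Mean square error between actual and predicted values for one testing subset.
static double MeanSqrError(double[] actual, double[] predicted)
{
    if (actual.Length != predicted.Length)
        throw new ArgumentException("actual and predicted must have the same length");
    return actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average();
}

// Hypothetical (actual, predicted) values from two train/test splits.
var splits = new (double[] Actual, double[] Predicted)[]
{
    (new[] { 1.0, 0.0 }, new[] { 0.8, 0.2 }),
    (new[] { 0.0, 1.0 }, new[] { 0.5, 0.5 }),
};

// One MSE per split, then the average that cross validation reports.
double[] splitErrors = splits.Select(s => MeanSqrError(s.Actual, s.Predicted)).ToArray();
double averageMse = splitErrors.Average();

Console.WriteLine("Individual MSEs: " + string.Join(", ", splitErrors));
Console.WriteLine("Average MSE: " + averageMse);
```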

The subsets to use in the cross validation are specified by providing an implementation of the ICrossValidationSubsets interface. Classes that implement this interface generate training and testing subsets from PLS data.
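As an illustration of what a subset generator produces, here is a minimal leave-one-out sketch that yields pairs of row indices. `LeaveOneOut` is a hypothetical stand-alone function. It does not implement the ICrossValidationSubsets interface, whose members are not described here, and it is not NMath's LeaveOneOutSubsets class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical leave-one-out generator (not NMath's LeaveOneOutSubsets or the
// ICrossValidationSubsets interface): for n observations it yields n pairs of
// row indices. Testing subset i holds only row i; training holds all the others.
static IEnumerable<(int[] Training, int[] Testing)> LeaveOneOut(int n)
{
    for (int i = 0; i < n; i++)
    {
        int[] training = Enumerable.Range(0, n).Where(row => row != i).ToArray();
        yield return (training, new[] { i });
    }
}

var subsets = LeaveOneOut(4).ToList();
foreach (var (training, testing) in subsets)
{
    Console.WriteLine($"train [{string.Join(", ", training)}]  test [{string.Join(", ", testing)}]");
}
```

Leave-one-out gives as many PLS calculations as there are observations, which is thorough but can be costly for large data sets.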

For example, if X is the predictor matrix and y holds the corresponding observed factor levels, this code fits a sparse PLS-DA model:

Code Example – C# Partial Least Squares Discriminant Analysis (PLS-DA)

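using System;
using System.Linq;
using CenterSpace.NMath.Stats;

int ncomp = 3;  // number of PLS components
int numXvarsToKeep = (int) Math.Round( X.Cols * 0.66 );  // about two thirds of the predictors
int[] keepX = Enumerable.Repeat( numXvarsToKeep, ncomp ).ToArray();  // same count for each component
var splsda = new SparsePlsDa( X, y, ncomp, keepX );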

The constructor takes the number of components to keep in the model (ncomp) and, for each component, the number of predictor variables to keep (keepX). In this case about two thirds of the predictors are kept for every component.

Because SparsePlsDa is a PLS2, you can use the PLS2Anova class to perform an ANOVA (Section 12.4).

Code Example – C# Partial Least Squares Discriminant Analysis (PLS-DA)

var anova = new PLS2Anova( splsda );
Console.WriteLine( "Rsqr: " + anova.CoefficientOfDetermination );
Console.WriteLine( "RMSE Prediction: " + 
  anova.RootMeanSqrErrorPrediction );
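As a reminder of what these two statistics measure, here is a sketch of the standard textbook definitions of the coefficient of determination (R² = 1 - SSE/SST) and the root mean square error for a single response column. The helper names and data are hypothetical. PLS2Anova may compute its statistics differently, for example across all response columns or from predictions on held-out data.

```csharp
using System;
using System.Linq;

// Textbook R^2: one minus the ratio of the residual sum of squares (SSE)
// to the total sum of squares about the mean (SST).
static double RSquared(double[] actual, double[] predicted)
{
    double mean = actual.Average();
    double sse = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
    double sst = actual.Sum(a => (a - mean) * (a - mean));
    return 1.0 - sse / sst;
}

// Textbook RMSE: square root of the mean squared residual.
static double RootMeanSqrError(double[] actual, double[] predicted) =>
    Math.Sqrt(actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average());

// Hypothetical values for one indicator column of a dummy-coded response.
double[] actualY = { 1, 0, 1, 0 };
double[] predictedY = { 0.9, 0.2, 0.7, 0.1 };
double r2 = RSquared(actualY, predictedY);
double rmse = RootMeanSqrError(actualY, predictedY);
Console.WriteLine($"R^2 = {r2:F4}, RMSE = {rmse:F4}");
```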

You can also perform cross validation using the SparsePLSDACrossValidation class.

Code Example – C# Partial Least Squares Discriminant Analysis (PLS-DA)

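// Leave-one-out: each testing subset holds a single observation,
// and the remaining observations form the training subset.
var subsetGenerator = new LeaveOneOutSubsets();
var crossValidation =
  new SparsePLSDACrossValidation( subsetGenerator );

// Same X, y, ncomp, and keepX as in the first example
crossValidation.DoCrossValidation( X, y, ncomp, keepX );

Console.WriteLine( "Cross validation average MSE: " + 
  crossValidation.AverageMeanSqrError );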

