**12.7 Partial Least Squares Discriminant Analysis** (.NET, C#, CSharp, VB, Visual Basic, F#)

Partial Least Squares Discriminant Analysis (PLS-DA) is a variant of PLS regression used when the response variable is categorical. Three classes are provided for performing PLS-DA:

● **SparsePlsDa** performs Discriminant Analysis
(DA) using a classical sparse PLS regression (sPLS), but where the response
variable is categorical. The response vector *Y*
is qualitative and is recoded as a dummy block matrix in which each
response category is coded via an indicator variable. PLS-DA is then
run as if *Y* were a continuous matrix.
**SparsePlsDa** inherits from **PLS2**.

● **SparsePls** performs a sparse PLS calculation
with variable selection. A LASSO penalty is applied to the pairs
of loading vectors. **SparsePls** implements
**IPLS2Calc**.

● **SparsePLSDACrossValidation** evaluates
a PLS model. Evaluation consists of dividing the data into
two subsets: a training subset and a testing subset. A PLS calculation
is performed on the training subset, and the resulting model is used to
predict the values of the dependent variables in the testing subset. The
mean square error between the actual and predicted dependent values is
then calculated. Usually, the data is divided into several training
and testing subsets, and the calculation is repeated for each of them. In this
case, the average of the mean square errors over all PLS calculations is reported.
(The individual mean square errors are available as well.)

The subsets to use in the cross validation are
specified by providing an implementation of the **ICrossValidationSubsets**
interface. Classes that implement this interface generate training and
testing subsets from PLS data.
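The dummy coding described for **SparsePlsDa** above can be illustrated with a small standalone sketch. It does not use NMath, and `ToIndicatorMatrix` is a hypothetical helper written only for this illustration; the column order (sorted category names) is likewise an assumption, not a statement about how the library orders categories internally.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of NMath): recode class labels as an
// n x K indicator matrix with one column per distinct category.
static double[,] ToIndicatorMatrix( string[] labels, out string[] categories )
{
  // Assumption: columns follow the ordinal sort order of the category names.
  categories = labels.Distinct().OrderBy( c => c, StringComparer.Ordinal ).ToArray();
  var dummy = new double[labels.Length, categories.Length];
  for ( int i = 0; i < labels.Length; i++ )
  {
    int k = Array.IndexOf( categories, labels[i] );
    dummy[i, k] = 1.0;  // exactly one 1 per row; all other entries stay 0
  }
  return dummy;
}

string[] y = { "control", "treated", "control", "placebo" };
double[,] Y = ToIndicatorMatrix( y, out string[] levels );

// Print each label next to its indicator row.
for ( int i = 0; i < y.Length; i++ )
{
  var row = Enumerable.Range( 0, levels.Length ).Select( k => Y[i, k] );
  Console.WriteLine( y[i] + " -> " + string.Join( " ", row ) );
}
```

Each row of the resulting matrix contains a single 1 in the column of the observed category, which is what allows the categorical response to be treated as a continuous block in the regression step.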

For example, if X is the predictor data and y the corresponding observed factor levels, this code calculates the sparse PLS-DA:

Code Example – C# Partial Least Squares Discriminant Analysis (PLS-DA)

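    // Requires: using System; using System.Linq;
    int ncomp = 3;  // number of components in the model
    // Keep about two thirds of the predictor variables for each component.
    int numXvarsToKeep = (int) Math.Round( X.Cols * 0.66 );
    int[] keepX = Enumerable.Repeat( numXvarsToKeep, ncomp ).ToArray();
    var splsda = new SparsePlsDa( X, y, ncomp, keepX );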

The number of components to keep in the model is specified, as well as the number of predictor variables to keep for each of the components (about two thirds, in this case).
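To give some intuition for how a LASSO penalty relates to a "number of variables to keep", the sketch below soft-thresholds a single loading vector so that at most `keep` entries remain non-zero. Soft-thresholding is one common way to apply a LASSO penalty in sparse PLS; whether NMath computes its loadings exactly this way is an assumption, and `SoftThresholdKeep` is a hypothetical helper, not a library function.

```csharp
using System;
using System.Linq;

// Hypothetical helper: shrink a loading vector so that at most `keep`
// entries remain non-zero (assumes keep >= 0).
static double[] SoftThresholdKeep( double[] loadings, int keep )
{
  if ( keep >= loadings.Length ) return (double[]) loadings.Clone();

  // Penalty = the (keep+1)-th largest absolute loading, so only entries
  // strictly larger in magnitude survive the shrinkage.
  double lambda = loadings
    .Select( v => Math.Abs( v ) )
    .OrderByDescending( a => a )
    .ElementAt( keep );

  return loadings.Select( v =>
  {
    double shrunk = Math.Abs( v ) - lambda;
    return shrunk > 0.0 ? Math.Sign( v ) * shrunk : 0.0;
  } ).ToArray();
}

double[] loading = { 0.9, -0.1, 0.4, -0.7, 0.05 };
double[] sparse = SoftThresholdKeep( loading, 2 );

Console.WriteLine( "Non-zero loadings kept: " + sparse.Count( v => v != 0.0 ) );
Console.WriteLine( string.Join( ", ", sparse ) );
```

In this sketch the two loadings with the largest magnitude keep their sign but are shrunk toward zero, and the remaining entries are set to zero, so the corresponding predictor variables drop out of that component.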

Because **SparsePlsDa**
is a **PLS2**, you can use the **PLS2Anova** class to perform an ANOVA (Section 12.4).

Code Example – C# Partial Least Squares Discriminant Analysis (PLS-DA)

    var anova = new PLS2Anova( splsda );
    Console.WriteLine( "Rsqr: " + anova.CoefficientOfDetermination );
    Console.WriteLine( "RMSE Prediction: " + anova.RootMeanSqrErrorPrediction );

You can also do cross validation using class **SparsePLSDACrossValidation**.

Code Example – C# Partial Least Squares Discriminant Analysis (PLS-DA)

    // Leave-one-out: each observation is held out once as the testing subset.
    var subsetGenerator = new LeaveOneOutSubsets();
    var crossValidation = new SparsePLSDACrossValidation( subsetGenerator );
    // X: predictor data; yFactor: observed factor levels.
    crossValidation.DoCrossValidation( X, yFactor, ncomp, keepX );
    Console.WriteLine( "Cross validation average MSE: " + crossValidation.AverageMeanSqrError );
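The bookkeeping behind a leave-one-out cross validation can be sketched without NMath. In the example below, `FitAndPredict` is a hypothetical stand-in that simply predicts the mean of the training responses; it replaces the PLS calculation purely so that the subset handling and the averaging of the mean square errors are easy to follow.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for "fit on the training subset, then predict":
// the prediction is simply the mean of the training responses.
static double FitAndPredict( double[] trainY ) => trainY.Average();

double[] y = { 1.0, 2.0, 3.0, 6.0 };
var subsetMse = new double[y.Length];

for ( int test = 0; test < y.Length; test++ )
{
  // Training subset: every observation except the one held out.
  double[] trainY = y.Where( ( value, i ) => i != test ).ToArray();

  double predicted = FitAndPredict( trainY );
  double error = y[test] - predicted;

  // The testing subset has a single observation, so its MSE is the squared error.
  subsetMse[test] = error * error;
}

double averageMse = subsetMse.Average();
Console.WriteLine( "Individual MSEs: " + string.Join( ", ", subsetMse ) );
Console.WriteLine( "Average MSE: " + averageMse );
```

With leave-one-out subsets there are as many training and testing pairs as observations, so the reported average is taken over one mean square error per observation.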