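Cross validation is a model evaluation method that measures how well a model makes predictions for data it has not already seen. Residuals cannot measure this, because they are computed on the same data that was used to fit the model. To accomplish this, some of the data is removed before the model is constructed. Once the model is constructed, the removed data is used to test the model's performance on "new" data. Commonly used methods include the holdout method, in which the data is split once into a training set and a testing set; k-fold cross validation, in which the data is divided into k subsets and each subset in turn is held out for testing while the model is trained on the remaining k-1 subsets; and leave-one-out cross validation, which is k-fold cross validation with k equal to the number of data points.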
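The following minimal Python sketch illustrates the holdout idea. It is not NMath code. It fits a straight line to part of the data and then compares the residual MSE on the training points with the MSE on the held-out points. The straight-line model and the helper names (fit_line, holdout_split, mse) are illustrative assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean square error of the line (a, b) on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def holdout_split(n, test_fraction, seed=0):
    """Shuffle indices 0..n-1 and hold out a fraction of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(n * test_fraction)))
    return idx[n_test:], idx[:n_test]  # (training indices, testing indices)

if __name__ == "__main__":
    rng = random.Random(42)
    xs = [i / 10 for i in range(40)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]
    train, test = holdout_split(len(xs), 0.25)
    # The held-out points are never seen while fitting.
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    print("training (residual) MSE:", mse(model, [xs[i] for i in train], [ys[i] for i in train]))
    print("held-out MSE:", mse(model, [xs[i] for i in test], [ys[i] for i in test]))
```

The held-out MSE estimates performance on new data. The training MSE only describes how well the line fits the points it was built from.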
NMath Stats provides two classes for performing k-fold cross validation on PLS models. PLS1CrossValidation is used when the response data is univariate, and PLS2CrossValidation is used when the response data is multivariate. To perform a cross validation calculation, you need to specify the data (Section 12.1), a PLS calculation algorithm (Section 12.5), and an algorithm for dividing the data into subsets.
To specify how the subsets for k-fold cross validation are generated from the data, you must provide the cross validation class with an object implementing the ICrossValidationSubsets interface. NMath Stats provides two such classes: LeaveOneOutSubsets, which implements the leave-one-out strategy, and KFoldSubsets, which implements k-fold cross validation with an arbitrary k.
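The Python sketch below shows the idea of a pluggable subset strategy. It is a hypothetical analog and does not reproduce NMath's API. The class names mirror the NMath classes, but the abstract base class CrossValidationSubsets and its splits method are assumptions made for illustration. Each strategy returns a list of (training indices, testing indices) pairs, one pair per fold.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, testing indices)

class CrossValidationSubsets(ABC):
    """Hypothetical analog of a subset-generation strategy."""

    @abstractmethod
    def splits(self, n: int) -> List[Split]:
        ...

class KFoldSubsets(CrossValidationSubsets):
    def __init__(self, k: int):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k

    def splits(self, n: int) -> List[Split]:
        if self.k > n:
            raise ValueError("k cannot exceed the number of samples")
        # Contiguous folds whose sizes differ by at most one;
        # the first n % k folds receive the extra sample.
        base, extra = divmod(n, self.k)
        result = []
        start = 0
        for fold in range(self.k):
            size = base + (1 if fold < extra else 0)
            test = list(range(start, start + size))
            train = list(range(0, start)) + list(range(start + size, n))
            result.append((train, test))
            start += size
        return result

class LeaveOneOutSubsets(CrossValidationSubsets):
    def splits(self, n: int) -> List[Split]:
        # Leave-one-out is k-fold with k equal to the number of samples.
        return KFoldSubsets(n).splits(n)

if __name__ == "__main__":
    for train, test in KFoldSubsets(3).splits(7):
        print("train:", train, "test:", test)
```

This sketch uses contiguous, unshuffled folds for simplicity. How the NMath classes actually assign samples to folds is not covered here.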
The average mean square error for the cross validation calculation is available as a property on the cross validation object. Also available is an array of PLS1CrossValidationResult or PLS2CrossValidationResult objects, one per cross validation iteration. Each result object contains the training and testing data used in that iteration, along with the associated mean square error.
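The following Python sketch shows the overall shape of such a calculation. It runs one fit per fold, keeps a per-fold result record, and averages the fold errors. The model here is a trivial stand-in that predicts the training mean; it is not a PLS algorithm. The names FoldResult, k_fold_cross_validate, and fit_mean are hypothetical. The sketch also assumes that the average is the plain mean of the per-fold MSEs, which may differ from how NMath computes its property.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]

@dataclass
class FoldResult:
    """Hypothetical per-fold record: indices used and the fold's test MSE."""
    train_indices: List[int]
    test_indices: List[int]
    mean_square_error: float

def k_fold_cross_validate(
    xs: Sequence[float],
    ys: Sequence[float],
    k: int,
    fit: Callable[[List[float], List[float]], Predictor],
):
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    results = []
    for fold in range(k):
        # Interleaved folds: sample i belongs to fold i % k.
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        # Fit only on the training subset, then score on the held-out subset.
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        err = sum((ys[i] - predict(xs[i])) ** 2 for i in test) / len(test)
        results.append(FoldResult(train, test, err))
    average = sum(r.mean_square_error for r in results) / k
    return average, results

def fit_mean(xs: List[float], ys: List[float]) -> Predictor:
    """Stand-in model (not PLS): always predicts the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

if __name__ == "__main__":
    avg, folds = k_fold_cross_validate([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0], 2, fit_mean)
    for r in folds:
        print(r)
    print("average MSE:", avg)
```

Because the fitting step is passed in as a callable, the same loop works with any model. This mirrors how the text above separates the PLS algorithm from the subset strategy.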