**Chapter 48. Partial Least Squares**

Partial Least Squares (PLS) is a technique that generalizes and combines features from principal component analysis (Section 46.1) and multiple linear regression (Chapter 42). It is particularly useful when you need to predict a set of response (dependent) variables from a large set of predictor (independent) variables.

As in multiple linear regression, the goal of PLS regression is to construct a linear model

*Y = XB + E*,

where *Y* is an *n* cases by *m*
variables response matrix, *X* is an
*n* cases by *p*
variables predictor matrix, *B* is a
*p* by *m*
matrix of regression coefficients, and *E*
is a noise term for the model with the same dimensions as *Y*.

As in principal components regression, PLS regression
produces factor scores as linear combinations of the original predictor
variables, so that there is no correlation between the factor score variables used in
the predictive regression model. For example, suppose that we have a
matrix of response variables *Y*, and
a large number of predictor variables *X*
(in matrix form), some of which may be highly correlated. A regression
using factor extraction for this data computes the score matrix *T=XW* for an appropriate matrix of weights
*W*, and then considers the linear regression
model *Y=TQ+E*, where *Q*
is a matrix of regression coefficients, called loadings, for *T*, and *E*
is an error term. Once the loadings *Q*
are computed, the above regression model is equivalent to *Y=XB+E*, with *B=WQ*,
which can be used as a predictive model.
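The factor-extraction scheme above can be made concrete with a small sketch. The library itself is a .NET component; the NumPy code below only illustrates the algebra, with the weight matrix *W* taken from the principal directions of *X* (the principal components regression choice) purely as an example. All names here are ours, not the library's API.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m, c = 100, 6, 2, 3          # cases, predictors, responses, factors

# Correlated predictors and a linear response with noise.
X = rng.standard_normal((n, p))
X[:, 3:] = X[:, :3] + 0.01 * rng.standard_normal((n, 3))  # near-duplicates
Y = X @ rng.standard_normal((p, m)) + 0.1 * rng.standard_normal((n, m))

# Weight matrix W: here the top c principal directions of X.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
W = Vt[:c].T                       # p-by-c weights

# Factor scores T = X W; OLS of Y on T gives the loadings Q.
T = X @ W                          # n-by-c score matrix
Q, *_ = np.linalg.lstsq(T, Y, rcond=None)   # c-by-m loadings

# Equivalent predictive model Y ≈ X B with B = W Q.
B = W @ Q                          # p-by-m coefficient matrix
assert np.allclose(X @ B, T @ Q)   # T Q and X B are the same model
```

The final assertion checks the identity stated in the text: once *Q* is computed, the factor model *Y = TQ + E* and the predictive model *Y = XB + E* with *B = WQ* coincide.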

PLS regression differs from principal components regression
in the methods used for extracting factor scores. While principal components
regression computes the weight matrix *W*
reflecting the covariance structure between predictor variables, PLS
regression produces the weight matrix *W*
reflecting the covariance structure between the predictor and response
variables.
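This distinction can be shown directly. For the first factor, the principal components weight is the leading right singular vector of the centered *X* (predictor covariance only), whereas the PLS weight is the leading left singular vector of *X*ᵀ*Y* (predictor–response covariance), which by construction attains the largest covariance with the response. A small sketch (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
Y = X[:, [1]] + 0.05 * rng.standard_normal((50, 1))   # Y driven by column 1
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

# PCR weight: leading principal direction of X (ignores Y).
w_pcr = np.linalg.svd(Xc, full_matrices=False)[2][0]

# PLS weight: leading left singular vector of X'Y
# (reflects predictor-response covariance).
w_pls = np.linalg.svd(Xc.T @ Yc, full_matrices=False)[0][:, 0]

# Absolute covariance between the factor score X w and the response.
cov = lambda w: np.abs((Xc @ w) @ Yc).sum()
assert cov(w_pls) >= cov(w_pcr) - 1e-9   # PLS weight maximizes covariance
```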

To establish a model with *c*
factors, or components, PLS regression produces a *p*
by *c* weight matrix *W*
for *X* such that *T=XW*.
These weights are computed so that each of them maximizes the covariance
between responses and the corresponding factor scores. Ordinary least
squares regression of *Y* on *T* is then performed to produce *Q*, the loadings for *Y*
(or weights for *Y*) such that *Y=TQ+E*. Once *Q*
is computed, we have *Y=XB+E*, where
*B=WQ*.
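The whole construction can be sketched end to end. One standard way to compute the weights is the NIPALS algorithm with deflation; we use it here as an illustration (the library's own implementation may differ). Note that with deflation the scores satisfy *T = XR* for the adjusted weight matrix *R = W(PᵀW)⁻¹*, where *P* holds the *X*-loadings; *R* then plays the role of *W* in *T = XW* and *B = WQ*.

```python
import numpy as np

def pls_fit(X, Y, c):
    """Minimal NIPALS-style PLS sketch: returns weights, scores,
    Y-loadings, and the coefficient matrix B (names are ours)."""
    n, p = X.shape
    W = np.zeros((p, c)); P = np.zeros((p, c)); T = np.zeros((n, c))
    Xk = X.copy()
    for k in range(c):
        # Weight maximizing covariance between scores and responses:
        # dominant left singular vector of Xk' Y.
        w = np.linalg.svd(Xk.T @ Y, full_matrices=False)[0][:, 0]
        t = Xk @ w                        # factor scores for this component
        pk = Xk.T @ t / (t @ t)           # X-loadings
        Xk = Xk - np.outer(t, pk)         # deflate X before the next factor
        W[:, k], P[:, k], T[:, k] = w, pk, t
    # Because of deflation, T = X R with R = W (P'W)^{-1}.
    R = W @ np.linalg.inv(P.T @ W)
    Q, *_ = np.linalg.lstsq(T, Y, rcond=None)   # loadings for Y
    B = R @ Q                                   # predictive model Y ≈ X B
    return W, T, Q, B

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 5))
Y = X @ rng.standard_normal((5, 2)) + 0.05 * rng.standard_normal((80, 2))
W, T, Q, B = pls_fit(X - X.mean(axis=0), Y - Y.mean(axis=0), c=3)

# Factor scores from different components are uncorrelated, as the
# text requires: T'T is diagonal.
G = T.T @ T
assert np.allclose(G, np.diag(np.diag(G)))
```

The final check confirms the key property claimed earlier in the chapter: the extracted factor score variables are mutually uncorrelated, so the regression of *Y* on *T* is well conditioned even when the original predictors are highly correlated.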