Principal Component Regression: Part 1 – The Magic of the SVD.
February 8th, 2010
Introduction
This is the first part of a multi-part series on Principal Component Regression, or PCR for short. We will eventually end up with a computational algorithm for PCR and code it up in C# using the NMath libraries. PCR is a method for constructing a linear regression model when we have a large number of predictor variables that are highly correlated. Of course, we don’t know exactly which variables are correlated; otherwise we’d just throw them out and perform a normal linear regression.
To understand what is going on in the PCR algorithm, we need to know a little about the Singular Value Decomposition (SVD). In particular, the SVD’s relationship to the eigenvalue decomposition goes a long way toward explaining the algorithm.
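For reference, the relationship between the SVD and the eigenvalue decomposition can be written out in a few lines. The notation here is assumed for illustration and may differ from what the rest of the series uses: $A$ is a real $m \times n$ matrix, $U$ and $V$ have orthonormal columns, and $\Sigma$ is diagonal with the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ on its diagonal.

```latex
A = U \Sigma V^{T}
\quad\Longrightarrow\quad
A^{T} A = V \Sigma^{T} U^{T} U \Sigma V^{T} = V \left( \Sigma^{T} \Sigma \right) V^{T}
```

Since $\Sigma^{T}\Sigma$ is diagonal with entries $\sigma_i^2$, the right-hand side is an eigenvalue decomposition of $A^{T}A$: the columns of $V$ are its eigenvectors and the squared singular values are its eigenvalues.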
The Singular Value Decomposition
The SVD is one of the most revealing matrix decompositions in linear algebra. It is a bit expensive to compute, but the bounty of information it yields is awe-inspiring. Understanding a little about the SVD will illuminate the Principal Component Regression (PCR) algorithm. The SVD may seem like a deep and mysterious thing; at least I thought it was until I read the chapters covering it in the book “Numerical Linear Algebra” by Lloyd N. Trefethen and David Bau, III, which I summarize below.
We begin with a geometric statement about linear transformations that is easy to state and not too difficult to prove.
A Geometric Fact
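Let $S$ be the unit sphere in $\mathbb{R}^n$, and let $A \in \mathbb{R}^{m \times n}$ be any matrix mapping $\mathbb{R}^n$ into $\mathbb{R}^m$, and suppose, for the moment, that $A$ has full rank. Then the image, $AS$, of $S$ under $A$ is a hyperellipse in $\mathbb{R}^m$ (see the book for the proof).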
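The geometric fact is easy to check numerically in two dimensions. The sketch below is in plain Python rather than C# with NMath, and the function names and the example matrix are made up for illustration. It pushes points of the unit circle through a 2×2 matrix, records the longest and shortest distances of the image from the origin, and compares them with the square roots of the eigenvalues of the matrix transposed times itself.

```python
import math


def singular_values_2x2(a, b, c, d):
    """Singular values of [[a, b], [c, d]], largest first.

    Computed as the square roots of the eigenvalues of the symmetric
    matrix A^T A = [[p, q], [q, r]], using the closed form for 2x2.
    """
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))


def image_radii(a, b, c, d, samples=100000):
    """Longest and shortest distance from the origin over the image
    of the unit circle under [[a, b], [c, d]], found by sampling."""
    longest, shortest = 0.0, float("inf")
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x, y = math.cos(t), math.sin(t)          # point on the unit circle
        length = math.hypot(a * x + b * y, c * x + d * y)  # |A x|
        longest = max(longest, length)
        shortest = min(shortest, length)
    return longest, shortest


if __name__ == "__main__":
    # Arbitrary full-rank example matrix [[3, 1], [1, 2]].
    print(singular_values_2x2(3.0, 1.0, 1.0, 2.0))
    print(image_radii(3.0, 1.0, 1.0, 2.0))
```

For this example matrix the two pairs of numbers agree to several decimal places, at roughly 3.618 and 1.382: the image of the circle is an ellipse whose longest and shortest radii match the values obtained from the eigenvalue computation.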
Given this fact, we make the following definitions (refer to Figure 1):
Define the singular values ,




