<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CenterSpace Blog &#187; Theory</title>
	<atom:link href="http://www.centerspace.net/blog/category/theory/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.centerspace.net/blog</link>
	<description>The CenterSpace blog about our NMath mathematical and statistical libraries in .NET/C#, and object-oriented numerics in general.</description>
	<lastBuildDate>Tue, 13 Dec 2011 17:35:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>Principal Components Regression: Part 2 &#8211; The Problem With Linear Regression</title>
		<link>http://www.centerspace.net/blog/statistics/priniciple-components-regression-in-csharp/</link>
		<comments>http://www.centerspace.net/blog/statistics/priniciple-components-regression-in-csharp/#comments</comments>
		<pubDate>Thu, 04 Mar 2010 17:17:07 +0000</pubDate>
		<dc:creator>Steve Sneller</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[PCR]]></category>
		<category><![CDATA[PCR c#]]></category>
		<category><![CDATA[PCR estimator]]></category>
		<category><![CDATA[principal component analysis C#]]></category>
		<category><![CDATA[principal component regression]]></category>

		<guid isPermaLink="false">http://centerspace.net/blog/?p=1816</guid>
		<description><![CDATA[Multiple Linear Regression (MLR) is a powerful approach to modeling the relationship between two or more <em>explanatory</em> variables and a <em>response</em> variable by fitting a linear equation to observed data. This is the second part in a three-part series on PCR.]]></description>
			<content:encoded><![CDATA[<p><small>This is the second part in a three-part series on PCR. The first article on the topic can be found <a href="http://www.centerspace.net/blog/statistics/theoretical-motivation-behind-pcr/">here</a>.</small></p>
<h3><strong>The Linear Regression Model</strong></h3>
<p>Multiple Linear Regression (MLR) is a common approach to modeling the relationship between two or more <em>explanatory</em> variables and a <em>response</em> variable by fitting a linear equation to observed data. First, let’s set up some notation. I will be rather brief, assuming the audience is somewhat familiar with MLR.</p>
<p>In multiple linear regression it is assumed that a <em>response variable</em>, <img title="Y" src="http://latex.codecogs.com/gif.latex?Y" alt="" />, depends on k <em>explanatory variables</em>, <img title="X_1,...,X_k" src="http://latex.codecogs.com/gif.latex?X_1,...,X_k" alt="" />, by way of a linear relationship:</p>
<p><img title="Y=b_1X_1+b_2X_2+\cdots+b_kX_k" src="http://latex.codecogs.com/gif.latex?Y=b_1X_1+b_2X_2+\cdots+b_kX_k" alt="" /></p>
<p>The idea is to make several observations of the response and explanatory variables and then to choose the linear coefficients <img title="b_1,...,b_k" src="http://latex.codecogs.com/gif.latex?b_1,...,b_k" alt="" /> that best fit the observed data.</p>
<p>Thus, a multiple linear regression model is:</p>
<p><img title="y_{i} = c + x_{i1}b_{1} + \cdots + x_{ik}b_{k} + f_{i}" src="http://latex.codecogs.com/gif.latex?y_{i} = c + x_{i1}b_{1} + \cdots + x_{ik}b_{k} + f_{i}" alt="" /><br />
<img title="i = 1,\cdots,n,\textrm{ where:}" src="http://latex.codecogs.com/gif.latex?i = 1,\cdots,n,\textrm{ where:}" alt="" /><br />
<img title="y_{i}\textrm{ is the }i\textrm{th value of the response variable}" src="http://latex.codecogs.com/gif.latex?y_{i}\textrm{ is the }i\textrm{th value of the response variable}" alt="" /><br />
<img title="x_{ij}\textrm{ is the }i\textrm{th value of the }j\textrm{th explanatory variable}" src="http://latex.codecogs.com/gif.latex?x_{ij}\textrm{ is the }i\textrm{th value of the }j\textrm{th explanatory variable}" alt="" /><br />
<img title="n \textrm{ is the sample size}" src="http://latex.codecogs.com/gif.latex?n \textrm{ is the sample size}" alt="" /><br />
<img title="k \textrm{ is the number of }x \textrm{-variables}" src="http://latex.codecogs.com/gif.latex?k \textrm{ is the number of }x \textrm{-variables}" alt="" /><br />
<img title="c \textrm{ is the intercept of the regression model}" src="http://latex.codecogs.com/gif.latex?c \textrm{ is the intercept of the regression model}" alt="" /><br />
<img title="b_{j} \textrm{ is the regression coefficient for the }j\textrm{th explanatory variable}" src="http://latex.codecogs.com/gif.latex?b_{j} \textrm{ is the regression coefficient for the }j\textrm{th explanatory variable}" alt="" /><br />
<img title="f_{i}\textrm{ is the random noise term, assumed independent, with zero mean and common variance }\sigma ^{2}" src="http://latex.codecogs.com/gif.latex?f_{i}\textrm{ is the random noise term, assumed independent, with zero mean and common variance }\sigma ^{2}" alt="" /><br />
<img title="c, b_{1},\cdots,b_{k}\textrm{ and }\sigma^{2}\textrm{ are unknown parameters, to be estimated from the data.}" src="http://latex.codecogs.com/gif.latex?c, b_{1},\cdots,b_{k}\textrm{ and }\sigma^{2}\textrm{ are unknown parameters, to be estimated from the data.}" alt="" /></p>
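<p>In matrix notation, with the intercept absorbed into <img title="b" src="http://latex.codecogs.com/gif.latex?b" alt="" /> by including a column of ones in <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" />, we have</p>
<p><img title="\textrm{(1)  }y = Xb + f" src="http://latex.codecogs.com/gif.latex?\textrm{(1)  }y = Xb + f" alt="" /></p>
<p>where</p>
<p><img title="X = (x_{ij})\textrm{, } y = (y_{i}) \textrm{, and } f = (f_{i})" src="http://latex.codecogs.com/gif.latex?X = (x_{ij})\textrm{, } y = (y_{i}) \textrm{, and } f = (f_{i})" alt="" />.</p>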
<p>The estimate <img title="\widehat{b}" src="http://latex.codecogs.com/gif.latex?\widehat{b}" alt="" /> of the coefficient vector <img title="b" src="http://latex.codecogs.com/gif.latex?b" alt="" /> that “best” fits the data is given by the solution of the so-called “normal equations”:</p>
<p><img title="\textrm{(2)  }\widehat{b}=(X'X)^{-1}X'y" src="http://latex.codecogs.com/gif.latex?\textrm{(2)  }\widehat{b}=(X'X)^{-1}X'y" alt="" /></p>
<p>This is known as the least squares solution to the problem because it minimizes the sum of the squares of the errors.</p>
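<p>For readers who want the intermediate step, here is a brief sketch of where equation (2) comes from, assuming <img title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> is invertible. The name SSE for the sum of squared errors is introduced here only for illustration. Writing the sum of squared errors as</p>
<p><img title="\textrm{SSE}(b)=\|y-Xb\|^{2}=(y-Xb)'(y-Xb)" src="http://latex.codecogs.com/gif.latex?\textrm{SSE}(b)=\|y-Xb\|^{2}=(y-Xb)'(y-Xb)" alt="" /></p>
<p>and setting its gradient with respect to <img title="b" src="http://latex.codecogs.com/gif.latex?b" alt="" /> to zero gives</p>
<p><img title="-2X'(y-Xb)=0 \quad\Rightarrow\quad X'Xb=X'y" src="http://latex.codecogs.com/gif.latex?-2X'(y-Xb)=0 \quad\Rightarrow\quad X'Xb=X'y" alt="" /></p>
<p>Multiplying both sides by the inverse of <img title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> then yields equation (2).</p>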
<p>Now, consider the following example in which<br />
<img title="X=\begin{bmatrix} 1 &amp; 1.9\\ 1 &amp; 2.1\\ 1 &amp; 2\\ 1&amp; 2\\ 1 &amp; 1.8 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?X=\begin{bmatrix} 1 &amp; 1.9\\ 1 &amp; 2.1\\ 1 &amp; 2\\ 1&amp; 2\\ 1 &amp; 1.8 \end{bmatrix}" alt="" /></p>
<p>and</p>
<p><img title="y=\begin{bmatrix} 6.0521\\ 7.0280\\ 7.1230\\ 4.4441\\ 5.0813 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?y=\begin{bmatrix} 6.0521\\ 7.0280\\ 7.1230\\ 4.4441\\ 5.0813 \end{bmatrix}" alt="" /></p>
<p>Solving this simple linear regression model using the normal equations yields</p>
<p><img title="\widehat{b}=\begin{bmatrix} -4.2489\\ 5.2013 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?\widehat{b}=\begin{bmatrix} -4.2489\\ 5.2013 \end{bmatrix}" alt="" /></p>
<p>which is quite far off from the actual solution</p>
<p><img title="b=\begin{bmatrix} 2\\ 2 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?b=\begin{bmatrix} 2\\ 2 \end{bmatrix}" alt="" /></p>
<p>The reason is that the matrix <img title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> is ill-conditioned. Since the second column of <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> is approximately twice the first, the columns of <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> are nearly linearly dependent and the matrix <img title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> is almost singular.</p>
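<p>To make the ill-conditioning concrete, the numbers for this small example can be worked out by hand (a sketch, with values rounded as shown). Here</p>
<p><img title="X'X=\begin{bmatrix} 5 &amp; 9.8\\ 9.8 &amp; 19.26 \end{bmatrix},\quad X'y=\begin{bmatrix} 29.7285\\ 58.53833 \end{bmatrix},\quad \det(X'X)=5(19.26)-9.8^{2}=0.26" src="http://latex.codecogs.com/gif.latex?X'X=\begin{bmatrix} 5 &amp; 9.8\\ 9.8 &amp; 19.26 \end{bmatrix},\quad X'y=\begin{bmatrix} 29.7285\\ 58.53833 \end{bmatrix},\quad \det(X'X)=5(19.26)-9.8^{2}=0.26" alt="" /></p>
<p>so the determinant is tiny compared with the entries of the matrix. The 2 x 2 inverse formula then gives</p>
<p><img title="\frac{1}{0.26}\begin{bmatrix} 19.26 &amp; -9.8\\ -9.8 &amp; 5 \end{bmatrix}\begin{bmatrix} 29.7285\\ 58.53833 \end{bmatrix}\approx\begin{bmatrix} -4.249\\ 5.201 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?\frac{1}{0.26}\begin{bmatrix} 19.26 &amp; -9.8\\ -9.8 &amp; 5 \end{bmatrix}\begin{bmatrix} 29.7285\\ 58.53833 \end{bmatrix}\approx\begin{bmatrix} -4.249\\ 5.201 \end{bmatrix}" alt="" /></p>
<p>Each component is a difference of two nearly equal numbers divided by 0.26, so the noise in <img title="y" src="http://latex.codecogs.com/gif.latex?y" alt="" /> is greatly amplified. The eigenvalues tell the same story:</p>
<p><img title="\lambda_{max}\approx 24.249,\quad \lambda_{min}\approx 0.0107,\quad \lambda_{max}/\lambda_{min}\approx 2.3\times 10^{3}" src="http://latex.codecogs.com/gif.latex?\lambda_{max}\approx 24.249,\quad \lambda_{min}\approx 0.0107,\quad \lambda_{max}/\lambda_{min}\approx 2.3\times 10^{3}" alt="" /></p>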
<p>One solution to this problem would be to change the model. Since the second column is approximately twice the first, these two explanatory variables encode basically the same information, so we could remove one of them from the model. However, it is usually not as easy to identify the source of the bad conditioning as it is in this example.</p>
<p>Another method for removing information from a model that is responsible for imprecision in the least squares solution is offered by the technique of <em>principal component regression</em> (PCR). Henceforth we shall assume that the data in the matrix <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> is <em>centered</em>. By this we mean that the mean of each explanatory variable has been subtracted from the corresponding column of X so that the explanatory variables all have mean zero. In particular, this implies that the matrix <img title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> is proportional to the covariance matrix for the explanatory variables.</p>
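<p>As a quick sketch of why centering gives this proportionality: with n observations and centered columns, the (j,l) entry of <img title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> is a sum of products of deviations from the mean, which is (n-1) times the sample covariance of the jth and lth explanatory variables. The symbol C for the sample covariance matrix is notation introduced here for illustration:</p>
<p><img title="(X'X)_{jl}=\sum_{i=1}^{n}x_{ij}x_{il}=(n-1)C_{jl}\quad\Rightarrow\quad X'X=(n-1)C" src="http://latex.codecogs.com/gif.latex?(X'X)_{jl}=\sum_{i=1}^{n}x_{ij}x_{il}=(n-1)C_{jl}\quad\Rightarrow\quad X'X=(n-1)C" alt="" /></p>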
<h3>Removing the Source of Imprecision</h3>
<p>Let <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> be an m x n matrix, and recall from Part 1 of this series that we can write <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> as</p>
<p><img title="X^TX=V \Lambda V^T" src="http://latex.codecogs.com/gif.latex?X^TX=V \Lambda V^T" alt="" /></p>
<p>where <img title="\Lambda" src="http://latex.codecogs.com/gif.latex?\Lambda" alt="" /> is a diagonal matrix containing the eigenvalues (in descending order down the diagonal) of <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" />, and <img title="V" src="http://latex.codecogs.com/gif.latex?V" alt="" /> is orthogonal. The condition number <img title="\kappa (X^TX)" src="http://latex.codecogs.com/gif.latex?\kappa (X^TX)" alt="" /> for <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> is just the absolute value of the ratio of the largest to the smallest eigenvalue:</p>
<p><img title="\kappa(X^TX)=\left | \frac{\lambda_{max}}{\lambda_{min}} \right |" src="http://latex.codecogs.com/gif.latex?\kappa(X^TX)=\left | \frac{\lambda_{max}}{\lambda_{min}} \right |" alt="" /></p>
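<p>It may help to connect this with Part 1, where the eigenvalues of <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> were shown to be the squares of the singular values of <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" />. A short sketch of the consequence, writing <img title="\sigma_{max}" src="http://latex.codecogs.com/gif.latex?\sigma_{max}" alt="" /> and <img title="\sigma_{min}" src="http://latex.codecogs.com/gif.latex?\sigma_{min}" alt="" /> for the largest and smallest singular values of <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> and assuming the smallest is nonzero:</p>
<p><img title="\kappa(X^TX)=\frac{\lambda_{max}}{\lambda_{min}}=\left(\frac{\sigma_{max}}{\sigma_{min}}\right)^{2}=\kappa(X)^{2}" src="http://latex.codecogs.com/gif.latex?\kappa(X^TX)=\frac{\lambda_{max}}{\lambda_{min}}=\left(\frac{\sigma_{max}}{\sigma_{min}}\right)^{2}=\kappa(X)^{2}" alt="" /></p>
<p>In other words, forming <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> squares the condition number of <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" />, which is one reason the normal equations can lose accuracy.</p>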
<p>Thus we can see that if the smallest eigenvalue is much smaller than the largest eigenvalue, we get a very large condition number which implies a poorly conditioned matrix. The idea then is to remove these small eigenvalues from <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> thus giving us an approximation to <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> that is better conditioned. To this end, suppose that we wish to retain the r (r less than or equal to n) largest eigenvalues of <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> in our approximation, and thus write</p>
<p><img title="X^TX=(V_1,V_2)\begin{pmatrix} \Lambda_1 &amp; 0 \\ 0 &amp; \Lambda_2 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}" src="http://latex.codecogs.com/gif.latex?X^TX=(V_1,V_2)\begin{pmatrix} \Lambda_1 &amp; 0 \\ 0 &amp; \Lambda_2 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}" alt="" />,</p>
<p>where</p>
<p><img title="\Lambda_1" src="http://latex.codecogs.com/gif.latex?\Lambda_1" alt="" /> is an r x r diagonal matrix consisting of the r largest eigenvalues of <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" />, <img title="\Lambda_2" src="http://latex.codecogs.com/gif.latex?\Lambda_2" alt="" /> is a (n-r) x (n-r) diagonal matrix consisting of the remaining n – r eigenvalues of <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" />, and the n x n matrix <img title="V=(V_1,V_2)" src="http://latex.codecogs.com/gif.latex?V=(V_1,V_2)" alt="" /> is orthogonal with <img title="V_1=(v_1,...v_r)" src="http://latex.codecogs.com/gif.latex?V_1=(v_1,...v_r)" alt="" /> consisting of the first  r columns of  <img title="V" src="http://latex.codecogs.com/gif.latex?V" alt="" />, and <img title="V_2=(v_{r+1},...v_n)" src="http://latex.codecogs.com/gif.latex?V_2=(v_{r+1},...v_n)" alt="" /> consisting of the remaining n – r columns of  <img title="V" src="http://latex.codecogs.com/gif.latex?V" alt="" />. Using this formulation we can write an approximation <img title="\widehat{X^TX}" src="http://latex.codecogs.com/gif.latex?\widehat{X^TX}" alt="" /> to <img title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> using the r largest eigenvalues as</p>
<p><img title="\widehat{X^TX}=V_1 \Lambda_1 V_1^T" src="http://latex.codecogs.com/gif.latex?\widehat{X^TX}=V_1 \Lambda_1 V_1^T" alt="" />.</p>
<p>The approximation has rank r, so when r is less than n it is not invertible. If we replace the inverse in equation (2) with the pseudoinverse of this approximation and do some simplification, we end up with the <em>principal components estimator</em></p>
<p><img title="\textrm{(3) }\widehat{b}^{(r)}=V_1 \Lambda_1^{-1} V_1^T X^T y" src="http://latex.codecogs.com/gif.latex?\textrm{(3) }\widehat{b}^{(r)}=V_1 \Lambda_1^{-1} V_1^T X^T y" alt="" />.</p>
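<p>The simplification can be sketched as follows, under the assumption that the inverse in equation (2) is replaced by the pseudoinverse of the rank r approximation. Because the columns of <img title="V_1" src="http://latex.codecogs.com/gif.latex?V_1" alt="" /> are orthonormal, <img title="V_1^T V_1 = I" src="http://latex.codecogs.com/gif.latex?V_1^T V_1 = I" alt="" />, and so</p>
<p><img title="(V_1 \Lambda_1 V_1^T)^{\dagger}=V_1 \Lambda_1^{-1} V_1^T" src="http://latex.codecogs.com/gif.latex?(V_1 \Lambda_1 V_1^T)^{\dagger}=V_1 \Lambda_1^{-1} V_1^T" alt="" /></p>
<p>Using the reduced SVD <img title="X = U \Sigma V^T" src="http://latex.codecogs.com/gif.latex?X = U \Sigma V^T" alt="" /> from Part 1, with <img title="U_1" src="http://latex.codecogs.com/gif.latex?U_1" alt="" /> and <img title="\Sigma_1" src="http://latex.codecogs.com/gif.latex?\Sigma_1" alt="" /> denoting the first r columns of <img title="U" src="http://latex.codecogs.com/gif.latex?U" alt="" /> and the leading r x r block of <img title="\Sigma" src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" /> (notation introduced here for illustration), the right-hand side of equation (3) can also be written in terms of singular values and vectors:</p>
<p><img title="V_1 \Lambda_1^{-1} V_1^T X^T y=V_1 \Sigma_1^{-1} U_1^T y=\sum_{j=1}^{r}\frac{u_j^T y}{\sigma_j}v_j" src="http://latex.codecogs.com/gif.latex?V_1 \Lambda_1^{-1} V_1^T X^T y=V_1 \Sigma_1^{-1} U_1^T y=\sum_{j=1}^{r}\frac{u_j^T y}{\sigma_j}v_j" alt="" /></p>
<p>Each retained component contributes a term divided by its singular value, so dropping the smallest singular values removes the terms that amplify noise the most.</p>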
<p>While we could use equation 3 directly, it is usually not the best way to perform principal components regression. The next article in this series will illustrate an algorithm for PCR and implement it using the NMath libraries.</p>
<p>-Steve</p>
]]></content:encoded>
			<wfw:commentRss>http://www.centerspace.net/blog/statistics/priniciple-components-regression-in-csharp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Principal Component Regression: Part 1 &#8211; The Magic of the SVD</title>
		<link>http://www.centerspace.net/blog/statistics/theoretical-motivation-behind-pcr/</link>
		<comments>http://www.centerspace.net/blog/statistics/theoretical-motivation-behind-pcr/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 17:44:45 +0000</pubDate>
		<dc:creator>Steve Sneller</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[PCR]]></category>
		<category><![CDATA[PCR c#]]></category>
		<category><![CDATA[principal component regression]]></category>
		<category><![CDATA[singular value decomposition]]></category>
		<category><![CDATA[SVD]]></category>
		<category><![CDATA[svd c#]]></category>

		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=1307</guid>
		<description><![CDATA[<img src="http://www.centerspace.net/blog/wp-content/uploads/2010/02/StevePCA_width400.jpg" alt="SVD of a 2x2 matrix" title="SVD of a 2x2 matrix" class="excerpt" />
This is the first part of a multi-part series on Principal Component Regression, or PCR for short. We will eventually end up with a computational algorithm for PCR and code it up in C# using the NMath libraries. PCR is a method for constructing a linear regression model when we have a large number of predictor variables that are highly correlated. Of course, we don't know exactly which variables are correlated; otherwise we'd just throw them out and perform a normal linear regression.]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>This is the first part of a multi-part series on Principal Component Regression, or PCR for short. We will eventually end up with a computational algorithm for PCR and code it up in C# using the NMath libraries. PCR is a method for constructing a linear regression model when we have a large number of predictor variables that are highly correlated. Of course, we don&#8217;t know exactly which variables are correlated; otherwise we&#8217;d just throw them out and perform a normal linear regression.</p>
<p>In order to understand what is going on in the PCR algorithm, we need to know a little bit about the SVD (Singular Value Decomposition). In particular, its relationship to the eigenvalue decomposition will go a long way toward explaining the PCR algorithm.</p>
<h2>The Singular Value Decomposition</h2>
<p>The SVD (Singular Value Decomposition) is one of the most revealing matrix decompositions in linear algebra. It is a bit expensive to compute, but the bounty of information it yields is awe-inspiring. Understanding a little about the SVD will illuminate the Principal Components Regression (PCR) algorithm. The SVD may seem like a deep and mysterious thing; at least I thought it was until I read the chapters covering it in the book <a href="http://www.ec-securehost.com/SIAM/ot50.html">&#8220;Numerical Linear Algebra&#8221;</a> by Lloyd N. Trefethen and David Bau, III, which I summarize below.<br />
<span id="more-1307"></span><br />
We begin with a geometric statement about linear transformations that is easy to state and not too difficult to prove.</p>
<h2>A Geometric Fact</h2>
<p>Let <img title="S" src="http://latex.codecogs.com/gif.latex?S" alt="" /> be the unit sphere in <img title="\mathbb{R}^{n}" src="http://latex.codecogs.com/gif.latex?\mathbb{R}^{n}" alt="" />, let <img title="X \in \mathbb{R}^{m \times n}" src="http://latex.codecogs.com/gif.latex?X \in \mathbb{R}^{m \times n}" alt="" /> (with m at least n) be any matrix mapping <img title="\mathbb{R}^{n}" src="http://latex.codecogs.com/gif.latex?\mathbb{R}^{n}" alt="" /> into <img title="\mathbb{R}^{m}" src="http://latex.codecogs.com/gif.latex?\mathbb{R}^{m}" alt="" />, and suppose, for the moment, that <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> has full rank. Then the image, <img title="XS" src="http://latex.codecogs.com/gif.latex?XS" alt="" />, of <img title="S" src="http://latex.codecogs.com/gif.latex?S" alt="" /> under <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> is a hyperellipse in <img title="\mathbb{R}^{m}" src="http://latex.codecogs.com/gif.latex?\mathbb{R}^{m}" alt="" /> (see the book for the proof).</p>
<div id="attachment_1360" class="wp-caption aligncenter" style="width: 410px"><a href="http://www.centerspace.net/blog/wp-content/uploads/2010/02/StevePCA_width400.jpg"><img class="size-full wp-image-1360" title="SVD of a 2x2 matrix" src="http://www.centerspace.net/blog/wp-content/uploads/2010/02/StevePCA_width400.jpg" alt="SVD of a 2x2 matrix" width="400" height="184" /></a><p class="wp-caption-text">Figure 1.  SVD of a 2x2 matrix</p></div>
<p>Given this fact we make the following definitions (refer to Figure 1):</p>
<p>Define the singular values,</p>
<p><img title="\sigma_{1},\cdots,\sigma_{n}" src="http://latex.codecogs.com/gif.latex?\sigma_{1},\cdots,\sigma_{n}" alt="" />,</p>
<p>of <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> to be the lengths of the <img title="n" src="http://latex.codecogs.com/gif.latex?n" alt="" /> principal semiaxes of the hyperellipse <img title="XS" src="http://latex.codecogs.com/gif.latex?XS" alt="" />. It is conventional to assume the singular values are numbered in descending order:</p>
<p><img title="\inline \sigma_{1}\geq \sigma_{2}\geq\cdots\geq \sigma_{n}" src="http://latex.codecogs.com/gif.latex?\inline \sigma_{1}\geq \sigma_{2}\geq\cdots\geq \sigma_{n}" alt="" /></p>
<p>Define the left singular vectors</p>
<p><img title="u_{1},\cdots,u_{n}" src="http://latex.codecogs.com/gif.latex?u_{1},\cdots,u_{n}" alt="" /></p>
<p>to be unit vectors in the direction of the principal semiaxes of <img title="XS" src="http://latex.codecogs.com/gif.latex?XS" alt="" /> and define the right singular vectors,</p>
<p><img title="v_{1},\cdots,v_{n}" src="http://latex.codecogs.com/gif.latex?v_{1},\cdots,v_{n}" alt="" />,</p>
<p>to be the pre-images of the principal semiaxes of <img title="XS" src="http://latex.codecogs.com/gif.latex?XS" alt="" /> so that</p>
<p><img title="Xv_{i} = \sigma_{i}u_{i}" src="http://latex.codecogs.com/gif.latex?Xv_{i} = \sigma_{i}u_{i}" alt="" />.</p>
<p>In matrix form we have</p>
<p><img title="XV = U \Sigma" src="http://latex.codecogs.com/gif.latex?XV = U \Sigma" alt="" />,</p>
<p>where <img src="http://latex.codecogs.com/gif.latex?V" alt="" /> is the <img src="http://latex.codecogs.com/gif.latex?n\textrm{ x }n" alt="" /> orthogonal matrix whose columns are the right singular vectors of <img src="http://latex.codecogs.com/gif.latex?X" alt="" />, <img src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" /> is an <img src="http://latex.codecogs.com/gif.latex?n\textrm{ x }n" alt="" /> diagonal matrix with positive entries equal to the singular values, and <img src="http://latex.codecogs.com/gif.latex?U" alt="" /> is an <img src="http://latex.codecogs.com/gif.latex?m\textrm{ x }n" alt="" /> matrix whose orthonormal columns are the left singular vectors.<br />
Since the columns of <img src="http://latex.codecogs.com/gif.latex?V" alt="" /> are orthonormal by construction, <img src="http://latex.codecogs.com/gif.latex?V" alt="" /> is a <em>unitary</em> matrix, that is, its transpose is equal to its inverse; thus we can write</p>
<p><img title="\textrm{(2) }X = U \Sigma V^{T}" src="http://latex.codecogs.com/gif.latex?\textrm{(2) }X = U \Sigma V^{T}" alt="" /></p>
<p>And there you have it, the SVD in all its majesty! Actually, the above decomposition is what is known as the <em>reduced</em> SVD. Note that the columns of <img src="http://latex.codecogs.com/gif.latex?U" alt="" /> are <img src="http://latex.codecogs.com/gif.latex?n" alt="" /> orthonormal vectors in <img src="http://latex.codecogs.com/gif.latex?m" alt="" />-dimensional space. <img src="http://latex.codecogs.com/gif.latex?U" alt="" /> can be extended to a unitary matrix by adjoining an additional <img src="http://latex.codecogs.com/gif.latex?m-n" alt="" /> orthonormal columns. If in addition we append <img src="http://latex.codecogs.com/gif.latex?m-n" alt="" /> rows of zeros to the bottom of the matrix <img src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" />, it will effectively multiply the appended columns in <img src="http://latex.codecogs.com/gif.latex?U" alt="" /> by zero, thus preserving equation (2). When <img src="http://latex.codecogs.com/gif.latex?U" alt="" /> and <img src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" /> are modified in this way, equation (2) is called the <em>full</em> SVD.</p>
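<p>As a compact summary of the two forms, here is a sketch in block notation. The symbol <img src="http://latex.codecogs.com/gif.latex?U_{\perp}" alt="" /> for the <img src="http://latex.codecogs.com/gif.latex?m\textrm{ x }(m-n)" alt="" /> block of appended orthonormal columns is notation introduced here for illustration:</p>
<p><img title="X=\begin{pmatrix} U &amp; U_{\perp} \end{pmatrix}\begin{pmatrix} \Sigma \\ 0 \end{pmatrix}V^{T}=U \Sigma V^{T}+U_{\perp}\,0\,V^{T}=U \Sigma V^{T}" src="http://latex.codecogs.com/gif.latex?X=\begin{pmatrix} U &amp; U_{\perp} \end{pmatrix}\begin{pmatrix} \Sigma \\ 0 \end{pmatrix}V^{T}=U \Sigma V^{T}+U_{\perp}\,0\,V^{T}=U \Sigma V^{T}" alt="" /></p>
<p>In the reduced SVD the three factors have sizes m x n, n x n and n x n; in the full SVD they have sizes m x m, m x n and n x n.</p>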
<h2>The Relationship Between Singular Values and Eigenvalues</h2>
<p>There is an important relationship between the singular values of <img title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> and the eigenvalues of <img title="X^{T}X" src="http://latex.codecogs.com/gif.latex?X^{T}X" alt="" />. Recall that a nonzero vector <img title="v" src="http://latex.codecogs.com/gif.latex?v" alt="" /> is an eigenvector with corresponding eigenvalue <img title="\lambda" src="http://latex.codecogs.com/gif.latex?\lambda" alt="" /> for a square matrix <img title="A" src="http://latex.codecogs.com/gif.latex?A" alt="" /> if and only if <img title="Av=\lambda v" src="http://latex.codecogs.com/gif.latex?Av=\lambda v" alt="" />. Now, suppose we have the full SVD for <img src="http://latex.codecogs.com/gif.latex?X" alt="" /> as in equation (2). Then</p>
<p><img title="X^{T}X=(U\Sigma V^{T})^{T}(U \Sigma V^{T})" src="http://latex.codecogs.com/gif.latex?X^{T}X=(U\Sigma V^{T})^{T}(U \Sigma V^{T})" alt="" /></p>
<p><img title="= V \Sigma ^{T}U^{T}U \Sigma V^{T}" src="http://latex.codecogs.com/gif.latex?= V \Sigma ^{T}U^{T}U \Sigma V^{T}" alt="" /></p>
<p><img title="= V \Sigma^{T} \Sigma V^{T}" src="http://latex.codecogs.com/gif.latex?= V \Sigma^{T} \Sigma V^{T}" alt="" /></p>
<p>or,</p>
<p><img title="(X^{T}X)V = V \Lambda" src="http://latex.codecogs.com/gif.latex?(X^{T}X)V = V \Lambda" alt="" /></p>
<p>where we have used the fact that <img src="http://latex.codecogs.com/gif.latex?U" alt="" /> and <img src="http://latex.codecogs.com/gif.latex?V" alt="" /> are unitary and set</p>
<p><img src="http://latex.codecogs.com/gif.latex?\Lambda = \Sigma^{T} \Sigma" alt="" />.</p>
<p>Note that <img src="http://latex.codecogs.com/gif.latex?\Lambda" alt="" /> is a diagonal matrix with the singular values squared along the diagonal. From this it follows that the columns of <img src="http://latex.codecogs.com/gif.latex?V" alt="" /> are eigenvectors for <img title="X^{T}X" src="http://latex.codecogs.com/gif.latex?X^{T}X" alt="" /> and the main diagonal of <img src="http://latex.codecogs.com/gif.latex?\Lambda" alt="" /> contains the corresponding eigenvalues. Thus the nonzero singular values of <img src="http://latex.codecogs.com/gif.latex?X" alt="" /> are the square roots of the nonzero eigenvalues of <img title="X^{T}X" src="http://latex.codecogs.com/gif.latex?X^{T}X" alt="" />.</p>
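<p>A small example, chosen here purely for illustration, makes the relationship easy to check by hand. Take the 3 x 2 matrix</p>
<p><img title="X=\begin{pmatrix} 2 &amp; 0\\ 0 &amp; 1\\ 0 &amp; 0 \end{pmatrix},\quad X^{T}X=\begin{pmatrix} 4 &amp; 0\\ 0 &amp; 1 \end{pmatrix}" src="http://latex.codecogs.com/gif.latex?X=\begin{pmatrix} 2 &amp; 0\\ 0 &amp; 1\\ 0 &amp; 0 \end{pmatrix},\quad X^{T}X=\begin{pmatrix} 4 &amp; 0\\ 0 &amp; 1 \end{pmatrix}" alt="" /></p>
<p>The unit circle is mapped to an ellipse with semiaxes of length 2 and 1, so the singular values of <img src="http://latex.codecogs.com/gif.latex?X" alt="" /> are 2 and 1, while the eigenvalues of <img title="X^{T}X" src="http://latex.codecogs.com/gif.latex?X^{T}X" alt="" /> are 4 and 1, the squares of the singular values.</p>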
<p>We need one more very cool fact about the SVD before we get to the algorithm: low-rank approximation.</p>
<h2>Low-Rank Approximation</h2>
<p>Suppose now that <img src="http://latex.codecogs.com/gif.latex?X" alt="" /> has rank <img src="http://latex.codecogs.com/gif.latex?r" alt="" /> and write <img src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" /> in equation (2) as the sum of <img src="http://latex.codecogs.com/gif.latex?r" alt="" /> rank one matrices (each <img src="http://latex.codecogs.com/gif.latex?r\textrm{ x }r" alt="" /> rank one matrix will be all zeros except for <img src="http://latex.codecogs.com/gif.latex?\sigma_{j}" alt="" /> as the <img src="http://latex.codecogs.com/gif.latex?j" alt="" />th diagonal element). We can then, using equation (2), write <img src="http://latex.codecogs.com/gif.latex?X" alt="" /> as the sum of rank one matrices,</p>
<p><img title="\textrm{(3)  }X=\sum_{j=1}^{r} \sigma_{j}u_{j}v_{j}^{T}" src="http://latex.codecogs.com/gif.latex?\textrm{(3)  }X=\sum_{j=1}^{r} \sigma_{j}u_{j}v_{j}^{T}" alt="" /></p>
<p>Equation (3) gives us a way to approximate any rank <img src="http://latex.codecogs.com/gif.latex?r" alt="" /> matrix <img src="http://latex.codecogs.com/gif.latex?X" alt="" /> by a matrix of lower rank <img src="http://latex.codecogs.com/gif.latex?k &lt; r" alt="" />. Indeed, given <img src="http://latex.codecogs.com/gif.latex?k &lt; r" alt="" />, form the <img src="http://latex.codecogs.com/gif.latex?k\textrm{th}" alt="" /> partial sum</p>
<p><img title="X_{k}=\sum_{j=1}^{k} \sigma_{j}u_{j}v_{j}^{T}" src="http://latex.codecogs.com/gif.latex?X_{k}=\sum_{j=1}^{k} \sigma_{j}u_{j}v_{j}^{T}" alt="" /></p>
<p>Then <img src="http://latex.codecogs.com/gif.latex?X_{k}" alt="" /> is a rank <img src="http://latex.codecogs.com/gif.latex?k" alt="" /> approximation for <img src="http://latex.codecogs.com/gif.latex?X" alt="" />. How good is this approximation? It turns out to be the best rank <img src="http://latex.codecogs.com/gif.latex?k" alt="" /> approximation you can get, in both the 2-norm and the Frobenius norm.</p>
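<p>The size of the error can be stated precisely. The following standard result (it appears in the Trefethen and Bau book cited above) is given here as a sketch without proof, for <img src="http://latex.codecogs.com/gif.latex?k &lt; r" alt="" />:</p>
<p><img title="\|X-X_{k}\|_{2}=\sigma_{k+1},\quad \|X-X_{k}\|_{F}=\sqrt{\sigma_{k+1}^{2}+\cdots+\sigma_{r}^{2}}" src="http://latex.codecogs.com/gif.latex?\|X-X_{k}\|_{2}=\sigma_{k+1},\quad \|X-X_{k}\|_{F}=\sqrt{\sigma_{k+1}^{2}+\cdots+\sigma_{r}^{2}}" alt="" /></p>
<p>So the error of the rank <img src="http://latex.codecogs.com/gif.latex?k" alt="" /> approximation is governed entirely by the discarded singular values.</p>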
<h2>Computing the Low-Rank Approximations Using NMath</h2>
<p>The NMath library provides two classes for computing the SVD of a matrix (actually eight, since there are SVD classes for each of the datatypes <code>Double</code>, <code>Float</code>, <code>DoubleComplex</code> and <code>FloatComplex</code>). There is a basic decomposition class for computing the standard, reduced SVD, and a decomposition server class for when more control is desired. Here is a simple C# routine that constructs the low-rank approximations for a matrix <img src="http://latex.codecogs.com/gif.latex?X" alt="" /> and prints out the Frobenius norms of the differences between <img src="http://latex.codecogs.com/gif.latex?X" alt="" /> and each of its low-rank approximations.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">static</span> <span style="color: #0600FF;">void</span> LowerRankApproximations<span style="color: #000000;">&#40;</span> DoubleMatrix X <span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
  <span style="color: #008080; font-style: italic;">// Construct the reduced SVD for X. We will consider</span>
  <span style="color: #008080; font-style: italic;">// all singular values less than 1e-15 to be zero.</span>
  DoubleSVDecomp decomp <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> DoubleSVDecomp<span style="color: #000000;">&#40;</span> X <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
  decomp.<span style="color: #0000FF;">Truncate</span><span style="color: #000000;">&#40;</span> 1e<span style="color: #008000;">-</span>15 <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
  <span style="color: #FF0000;">int</span> r <span style="color: #008000;">=</span> decomp.<span style="color: #0000FF;">Rank</span><span style="color: #008000;">;</span>
  Console.<span style="color: #0000FF;">WriteLine</span><span style="color: #000000;">&#40;</span> <span style="color: #666666;">&quot;The {0}x{1} matrix X has rank {2}&quot;</span>, X.<span style="color: #0000FF;">Rows</span>, X.<span style="color: #0000FF;">Cols</span>, r <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
  <span style="color: #008080; font-style: italic;">// Construct the best lower rank approximations to X and</span>
  <span style="color: #008080; font-style: italic;">// look at the Frobenius norm of their differences.</span>
  DoubleMatrix LowerRankApprox <span style="color: #008000;">=</span>
    <span style="color: #008000;">new</span> DoubleMatrix<span style="color: #000000;">&#40;</span> X.<span style="color: #0000FF;">Rows</span>, X.<span style="color: #0000FF;">Cols</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
  <span style="color: #FF0000;">double</span> differenceNorm<span style="color: #008000;">;</span>
  <span style="color: #0600FF;">for</span> <span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">int</span> k <span style="color: #008000;">=</span> <span style="color: #FF0000;">0</span><span style="color: #008000;">;</span> k <span style="color: #008000;">&lt;</span> r<span style="color: #008000;">;</span> k<span style="color: #008000;">++</span> <span style="color: #000000;">&#41;</span>
  <span style="color: #000000;">&#123;</span>
    LowerRankApprox <span style="color: #008000;">+=</span> decomp.<span style="color: #0000FF;">SingularValues</span><span style="color: #000000;">&#91;</span>k<span style="color: #000000;">&#93;</span> <span style="color: #008000;">*</span>
      NMathFunctions.<span style="color: #0000FF;">OuterProduct</span><span style="color: #000000;">&#40;</span> decomp.<span style="color: #0000FF;">LeftVectors</span>.<span style="color: #0000FF;">Col</span><span style="color: #000000;">&#40;</span> k <span style="color: #000000;">&#41;</span>, decomp.<span style="color: #0000FF;">RightVectors</span>.<span style="color: #0000FF;">Col</span><span style="color: #000000;">&#40;</span> k <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    differenceNorm <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> X <span style="color: #008000;">-</span> LowerRankApprox <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">FrobeniusNorm</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    Console.<span style="color: #0000FF;">WriteLine</span><span style="color: #000000;">&#40;</span> <span style="color: #666666;">&quot;Rank {0} approximation difference norm = {1:F4}&quot;</span>, k<span style="color: #008000;">+</span><span style="color: #FF0000;">1</span>, differenceNorm <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
  <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>Here&#8217;s the output for a matrix with 10 rows and 20 columns. Note that the rank can be at most 10.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">The 10x20 matrix X has rank <span style="color: #FF0000;">10</span>
Rank <span style="color: #FF0000;">1</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">3.7954</span>
Rank <span style="color: #FF0000;">2</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">3.3226</span>
Rank <span style="color: #FF0000;">3</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">2.9135</span>
Rank <span style="color: #FF0000;">4</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">2.4584</span>
Rank <span style="color: #FF0000;">5</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">2.0038</span>
Rank <span style="color: #FF0000;">6</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">1.5689</span>
Rank <span style="color: #FF0000;">7</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">1.1829</span>
Rank <span style="color: #FF0000;">8</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">0.8107</span>
Rank <span style="color: #FF0000;">9</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">0.3676</span>
Rank <span style="color: #FF0000;">10</span> approximation difference norm <span style="color: #008000;">=</span> <span style="color: #FF0000;">0.0000</span></pre></div></div>
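
<p>These numbers are consistent with the Frobenius norm of the difference being the square root of the sum of the squares of the discarded singular values. As a rough check, assuming the printed values are accurate to four decimal places, successive differences of squares recover individual singular values. Writing <img src="http://latex.codecogs.com/gif.latex?d_{k}" alt="" /> for the printed norm at rank <img src="http://latex.codecogs.com/gif.latex?k" alt="" /> (notation introduced here for illustration):</p>
<p><img title="\sigma_{k+1}^{2}=d_{k}^{2}-d_{k+1}^{2},\quad \sigma_{10}\approx 0.3676,\quad \sigma_{9}\approx\sqrt{0.8107^{2}-0.3676^{2}}\approx 0.7226" src="http://latex.codecogs.com/gif.latex?\sigma_{k+1}^{2}=d_{k}^{2}-d_{k+1}^{2},\quad \sigma_{10}\approx 0.3676,\quad \sigma_{9}\approx\sqrt{0.8107^{2}-0.3676^{2}}\approx 0.7226" alt="" /></p>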

<p>-Steve</p>
]]></content:encoded>
			<wfw:commentRss>http://www.centerspace.net/blog/statistics/theoretical-motivation-behind-pcr/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

