<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>principal component regression Archives - CenterSpace</title>
	<atom:link href="https://www.centerspace.net/tag/principal-component-regression/feed" rel="self" type="application/rss+xml" />
	<link>https://www.centerspace.net/tag/principal-component-regression</link>
	<description>.NET numerical class libraries</description>
	<lastBuildDate>Sun, 03 May 2020 15:30:07 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.1.1</generator>
<site xmlns="com-wordpress:feed-additions:1">104092929</site>	<item>
		<title>Principal Components Regression: Part 2 &#8211; The Problem With Linear Regression</title>
		<link>https://www.centerspace.net/priniciple-components-regression-in-csharp</link>
					<comments>https://www.centerspace.net/priniciple-components-regression-in-csharp#comments</comments>
		
		<dc:creator><![CDATA[Steve Sneller]]></dc:creator>
		<pubDate>Thu, 04 Mar 2010 17:17:07 +0000</pubDate>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[PCR]]></category>
		<category><![CDATA[PCR c#]]></category>
		<category><![CDATA[PCR estimator]]></category>
		<category><![CDATA[principal component analysis C#]]></category>
		<category><![CDATA[principal component regression]]></category>
		<guid isPermaLink="false">http://centerspace.net/blog/?p=1816</guid>

					<description><![CDATA[<p>Multiple Linear Regression (MLR) is a powerful approach to modeling the relationship between one or more <em>explanatory</em> variables and a <em>response</em> variable by fitting a linear equation to observed data.  This is the second part in a three-part series on PCR. </p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/priniciple-components-regression-in-csharp">Principal Components Regression: Part 2 &#8211; The Problem With Linear Regression</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><small> This is the second part in a three-part series on PCR; the first article in the series can be found <a href="/theoretical-motivation-behind-pcr/">here</a>.</small></p>
<h3><strong>The Linear Regression Model</strong></h3>
<p>Multiple Linear Regression (MLR) is a common approach to modeling the relationship between one or more <em>explanatory</em> variables and a <em>response</em> variable by fitting a linear equation to observed data. First, let’s set up some notation. I will be rather brief, assuming the audience is somewhat familiar with MLR.</p>
<p>In multiple linear regression it is assumed that a <em>response variable</em>, <img decoding="async" title="Y" src="http://latex.codecogs.com/gif.latex?Y" alt="" /> depends on k <em>explanatory variables</em>, <img decoding="async" title="X_1,...X_k" src="http://latex.codecogs.com/gif.latex?X_1,...,X_k" alt="" />, by way of a linear relationship:</p>
<p><img decoding="async" title="Y=b_1X_1+b_2X_2...+b_kX_k" src="http://latex.codecogs.com/gif.latex?Y=b_1X_1+b_2X_2...+b_kX_k" alt="" /></p>
<p>The idea is to perform several observations of the response and explanatory variables and then to choose the linear coefficients <img decoding="async" title="b_1,...b_k" src="http://latex.codecogs.com/gif.latex?b_1,...b_k" alt="" /> which best fit the observed data.</p>
<p>Thus, a multiple linear regression model is:</p>
<p><img decoding="async" title="y_{i} = c + x_{i1}b_{1} + \cdots + x_{ik}b_{k} + f" src="http://latex.codecogs.com/gif.latex?y_{i} = c + x_{i1}b_{1} + \cdots + x_{ik}b_{k} + f_{i}" alt="" /><br />
<img decoding="async" title="i = 1,\cdots,n,\textrm{ where:}" src="http://latex.codecogs.com/gif.latex?i = 1,\cdots,n,\textrm{ where:}" alt="" /><br />
<img decoding="async" title="y_{i}\textrm{ is the }i\textrm{th value of the response variable}" src="http://latex.codecogs.com/gif.latex?y_{i}\textrm{ is the }i\textrm{th value of the response variable}" alt="" /><br />
<img decoding="async" title="x_{ij}\textrm{ is the }i\textrm{th value of the }j\textrm{th explanatory variable}" src="http://latex.codecogs.com/gif.latex?x_{ij}\textrm{ is the }i\textrm{th value of the }j\textrm{th explanatory variable}" alt="" /><br />
<img decoding="async" title="n \textrm{ is the sample size}" src="http://latex.codecogs.com/gif.latex?n \textrm{ is the sample size}" alt="" /><br />
<img decoding="async" title="k \textrm{ is the number of }x \textrm{-variables}" src="http://latex.codecogs.com/gif.latex?k \textrm{ is the number of }x \textrm{-variables}" alt="" /><br />
<img decoding="async" title="c \textrm{ is the intercpet of the regression model}" src="http://latex.codecogs.com/gif.latex?c \textrm{ is the intercpet of the regression model}" alt="" /><br />
<img decoding="async" title="b_{j} \textrm{ is the regression coefficient for the }j\textrm{th explanatory variable}" src="http://latex.codecogs.com/gif.latex?b_{j} \textrm{ is the regression coefficient for the }j\textrm{th explanatory variable}" alt="" /><br />
<img decoding="async" title="f\textrm{ is the random noise term, assumed independent, with zero mean and common variance }\sigma ^{2}" src="http://latex.codecogs.com/gif.latex?f\textrm{ is the random noise term, assumed independent, with zero mean and common variance }\sigma ^{2}" alt="" /><br />
<img decoding="async" title="c, b_{1},\cdots,b_{k}\textrm{ and }\sigma^{2}\textrm{ are unknown parameters, to be estimated from the data.}" src="http://latex.codecogs.com/gif.latex?c, b_{1},\cdots,b_{k}\textrm{ and }\sigma^{2}\textrm{ are unknown parameters, to be estimated from the data.}" alt="" /></p>
<p>In matrix notation we have</p>
<p><img decoding="async" title="Xb = y" src="http://latex.codecogs.com/gif.latex?\textrm{(1)  }Xb = y + f" alt="" /></p>
<p>where</p>
<p><img decoding="async" title="X = (x_{ij})\textrm{, } y = (y_{i}) \textrm{, and } f = f_{i}" src="http://latex.codecogs.com/gif.latex?X = (x_{ij})\textrm{, } y = (y_{i}) \textrm{, and } f = f_{i}" alt="" />.</p>
<p>The solution for the coefficient vector <img decoding="async" title="b" src="http://latex.codecogs.com/gif.latex?b" alt="" /> which “best” fits the data is given by the so-called “normal equations”</p>
<p><img decoding="async" title="\textrm{(2)  }\beta=(X'X)^{-1}X'y" src="http://latex.codecogs.com/gif.latex?\textrm{(2)  }\beta=(X'X)^{-1}X'y" alt="" /></p>
<p>This is known as the least squares solution to the problem because it minimizes the sum of the squares of the errors.</p>
<p>Now, consider the following example in which<br />
<img decoding="async" title="X=\begin{bmatrix} 1 &amp; 1.9\ 1 &amp; 2.1\ 1 &amp; 2\\ 1&amp; 2\\ 1 &amp; 1.8 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?X=\begin{bmatrix} 1 &amp; 1.9\\ 1 &amp; 2.1\\ 1 &amp; 2\\ 1&amp; 2\\ 1 &amp; 1.8 \end{bmatrix}" alt="" /></p>
<p>and</p>
<p><img decoding="async" title="y=\begin{bmatrix} 6.0521\\ 7.0280\\ 7.1230\\ 4.4441\\ 5.0813 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?y=\begin{bmatrix} 6.0521\\ 7.0280\\ 7.1230\\ 4.4441\\ 5.0813 \end{bmatrix}" alt="" /></p>
<p>Solving this simple linear regression model using the normal equations yields</p>
<p><img decoding="async" title="\widehat{b}=\begin{bmatrix} -4.2489\\ 5.2013 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?\widehat{b}=\begin{bmatrix} -4.2489\\ 5.2013 \end{bmatrix}" alt="" /></p>
<p>which is quite far off from the actual solution</p>
<p><img decoding="async" title="b=\begin{bmatrix} 2\\ 2 \end{bmatrix}" src="http://latex.codecogs.com/gif.latex?b=\begin{bmatrix} 2\\ 2 \end{bmatrix}" alt="" /></p>
<p>The reason behind this is the fact that the matrix <img decoding="async" title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> is ill-conditioned. Since the second column of <img decoding="async" title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> is approximately twice the first, the matrix <img decoding="async" title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> is almost singular.</p>
<p>One solution to this problem would be to change the model. Since the second column is approximately twice the first, these two explanatory variables encode essentially the same information, so we could remove one of them from the model. However, it is usually not as easy to identify the source of the bad conditioning as it is in this example.</p>
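<p>The failure mode is easy to reproduce. The sketch below (illustrative NumPy code with synthetic data, not the article&#8217;s exact numbers) builds a design matrix whose second column is almost exactly twice the first: the least squares fit to the data remains good, yet the estimated coefficients land far from the true values.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(1.0, 3.0, 50)
x2 = 2.0 * x1 + rng.normal(0.0, 1e-4, 50)   # second column ~ twice the first
X = np.column_stack([x1, x2])
y = X @ np.array([2.0, 2.0]) + rng.normal(0.0, 0.1, 50)  # true b = (2, 2)

# Least squares via the normal equations
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

print("cond(X'X):", np.linalg.cond(X.T @ X))        # huge: X'X is nearly singular
print("b_hat:    ", b_hat)                          # typically far from (2, 2)
print("residual: ", np.linalg.norm(X @ b_hat - y))  # yet the fit itself is good
```

<p>Dropping one of the redundant columns, or the principal components approach developed below, tames this instability.</p>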
<p>Another method for removing information from a model that is responsible for imprecision in the least squares solution is offered by the technique of <em>principal component regression </em>(PCR). Henceforth we shall assume that the data in the matrix <img decoding="async" title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> is <em>centered</em>. By this we mean that the mean of each explanatory variable has been subtracted from each column of X so that the explanatory variables all have mean zero. In particular this implies that the matrix <img decoding="async" title="X'X" src="http://latex.codecogs.com/gif.latex?X'X" alt="" /> is proportional to the covariance matrix for the explanatory variables.</p>
<h3>Removing the Source of Imprecision</h3>
<p>Let <img decoding="async" title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> be an mxn matrix, and recall from the part 1 of this series that we can write <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> as</p>
<p><img decoding="async" title="X^TX=V \Lambda V^T" src="http://latex.codecogs.com/gif.latex?X^TX=V \Lambda V^T" alt="" /></p>
<p>where <img decoding="async" title="\Lambda" src="http://latex.codecogs.com/gif.latex?\Lambda" alt="" /> is a diagonal matrix containing the eigenvalues (in ascending order down the diagonal) of <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" />, and <img decoding="async" title="V" src="http://latex.codecogs.com/gif.latex?V" alt="" /> is orthogonal. The condition number <img decoding="async" title="\kappa (X^TX)" src="http://latex.codecogs.com/gif.latex?\kappa (X^TX)" alt="" /> for <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> is just the absolute value of the ratio of the largest and smallest eigenvalues:</p>
<p><img decoding="async" title="\kappa(X^TX)=\left | \frac{\lambda_{max}}{\lambda_{min}} \right |" src="http://latex.codecogs.com/gif.latex?\kappa(X^TX)=\left | \frac{\lambda_{max}}{\lambda_{min}} \right |" alt="" /></p>
<p>Thus we can see that if the smallest eigenvalue is much smaller than the largest eigenvalue, we get a very large condition number which implies a poorly conditioned matrix. The idea then is to remove these small eigenvalues from <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> thus giving us an approximation to <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> that is better conditioned. To this end, suppose that we wish to retain the r (r less than or equal to n) largest eigenvalues of <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> in our approximation, and thus write</p>
<p><img decoding="async" title="X^TX=(V_1,V_2)\begin{pmatrix} \Lambda_1 &amp; 0 \\ 0 &amp; \Lambda_2 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}" src="http://latex.codecogs.com/gif.latex?X^TX=(V_1,V_2)\begin{pmatrix} \Lambda_1 &amp; 0 \\ 0 &amp; \Lambda_2 \end{pmatrix} \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}" alt="" />,</p>
<p>where</p>
<p><img decoding="async" title="\Lambda_1" src="http://latex.codecogs.com/gif.latex?\Lambda_1" alt="" /> is an r x r diagonal matrix consisting of the r largest eigenvalues of <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" />, <img decoding="async" title="\Lambda_2" src="http://latex.codecogs.com/gif.latex?\Lambda_2" alt="" /> is a (n-r) x (n-r) diagonal matrix consisting of the remaining n – r eigenvalues of <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" />, and the n x n matrix <img decoding="async" title="V=(V_1,V_2)" src="http://latex.codecogs.com/gif.latex?V=(V_1,V_2)" alt="" /> is orthogonal with <img decoding="async" title="V_1=(v_1,...v_r)" src="http://latex.codecogs.com/gif.latex?V_1=(v_1,...v_r)" alt="" /> consisting of the first  r columns of  <img decoding="async" title="V" src="http://latex.codecogs.com/gif.latex?V" alt="" />, and <img decoding="async" title="V_2=(v_{r+1},...v_n)" src="http://latex.codecogs.com/gif.latex?V_2=(v_{r+1},...v_n)" alt="" /> consisting of the remaining n – r columns of  <img decoding="async" title="V" src="http://latex.codecogs.com/gif.latex?V" alt="" />. Using this formulation we can write an approximation <img decoding="async" title="\widehat{X^TX}" src="http://latex.codecogs.com/gif.latex?\widehat{X^TX}" alt="" /> to <img decoding="async" title="X^TX" src="http://latex.codecogs.com/gif.latex?X^TX" alt="" /> using the r largest eigenvalues as</p>
<p><img decoding="async" title="\widehat{X^TX}=V_1 \Lambda_1 V_1^T" src="http://latex.codecogs.com/gif.latex?\widehat{X^TX}=V_1 \Lambda_1 V_1^T" alt="" />.</p>
<p>If we substitute this approximation into the normal equations (2) and do some simplification, we end up with the <em>principal components estimator</em></p>
<p><img decoding="async" title="\textrm{(3) }\widehat{\beta^{(r)}}=V_1 \Lambda_1^{-1} V_1^T X^T y" src="http://latex.codecogs.com/gif.latex?\textrm{(3) }\widehat{\beta^{(r)}}=V_1 \Lambda_1^{-1} V_1^T X^T y" alt="" />.</p>
<p>While we could use equation (3) directly, it is usually not the best way to perform principal components regression. The next article in this series will illustrate an algorithm for PCR and implement it using the NMath libraries.</p>
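<p>To make the estimator concrete, here is a minimal NumPy sketch (for illustration only; the next article implements PCR properly with the NMath classes). It eigendecomposes the centered cross-product matrix, keeps the r largest eigenvalues, and inverts only on that subspace.</p>

```python
import numpy as np

def pcr_estimator(X, y, r):
    """Principal components estimator: keep the r largest
    eigenvalues of X'X and invert only on that subspace."""
    lam, V = np.linalg.eigh(X.T @ X)     # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:r]      # indices of the r largest eigenvalues
    V1, lam1 = V[:, idx], lam[idx]
    return V1 @ np.diag(1.0 / lam1) @ V1.T @ X.T @ y

# Near-collinear design with true coefficients (2, 2)
rng = np.random.default_rng(1)
x1 = rng.uniform(1.0, 3.0, 50)
X = np.column_stack([x1, 2.0 * x1 + rng.normal(0.0, 1e-4, 50)])
X = X - X.mean(axis=0)                   # center, as assumed above
y = X @ np.array([2.0, 2.0]) + rng.normal(0.0, 0.1, 50)

b_full = pcr_estimator(X, y, r=2)        # keeps everything: the least squares solution
b_pcr = pcr_estimator(X, y, r=1)         # drops the tiny eigenvalue: stable estimate
print("r = 2:", b_full)
print("r = 1:", b_pcr)
```

<p>With r = 2 the estimator reproduces the unstable least squares coefficients; with r = 1 it projects onto the dominant eigenvector, and the predictions stay accurate even though the coefficients are confined to that subspace.</p>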
<p>-Steve</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/priniciple-components-regression-in-csharp">Principal Components Regression: Part 2 &#8211; The Problem With Linear Regression</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/priniciple-components-regression-in-csharp/feed</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1816</post-id>	</item>
		<item>
		<title>Principal Component Regression: Part 1 &#8211; The Magic of the SVD</title>
		<link>https://www.centerspace.net/theoretical-motivation-behind-pcr</link>
					<comments>https://www.centerspace.net/theoretical-motivation-behind-pcr#comments</comments>
		
		<dc:creator><![CDATA[Steve Sneller]]></dc:creator>
		<pubDate>Mon, 08 Feb 2010 17:44:45 +0000</pubDate>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[PCR]]></category>
		<category><![CDATA[PCR c#]]></category>
		<category><![CDATA[principal component regression]]></category>
		<category><![CDATA[singular value decomposition]]></category>
		<category><![CDATA[SVD]]></category>
		<category><![CDATA[svd c#]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=1307</guid>

					<description><![CDATA[<p><img src="https://www.centerspace.net/blog/wp-content/uploads/2010/02/StevePCA_width400.jpg" alt="SVD of a 2x2 matrix" title="SVD of a 2x2 matrix" class="excerpt" /><br />
This is the first part of a multi-part series on Principal Component Regression, or PCR for short. We will eventually end up with a computational algorithm for PCR and code it up in C# using the NMath libraries. PCR is a method for constructing a linear regression model when we have a large number of predictor variables which are highly correlated. Of course, we don't know exactly which variables are correlated; otherwise we'd just throw them out and perform a normal linear regression.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/theoretical-motivation-behind-pcr">Principal Component Regression: Part 1 &#8211; The Magic of the SVD</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2>Introduction</h2>
<p>This is the first part of a multi-part series on Principal Component Regression, or PCR for short. We will eventually end up with a computational algorithm for PCR and code it up in C# using the NMath libraries. PCR is a method for constructing a linear regression model when we have a large number of predictor variables which are highly correlated. Of course, we don&#8217;t know exactly which variables are correlated; otherwise we&#8217;d just throw them out and perform a normal linear regression.</p>
<p>To understand what is going on in the PCR algorithm, we need to know a little about the SVD (Singular Value Decomposition). Understanding the SVD and its relationship to the eigenvalue decomposition will go a long way toward understanding the PCR algorithm.</p>
<h2>The Singular Value Decomposition</h2>
<p>The SVD (Singular Value Decomposition) is one of the most revealing matrix decompositions in linear algebra. It is a bit expensive to compute, but the bounty of information it yields is awe-inspiring. Understanding a little about the SVD will illuminate the Principal Components Regression (PCR) algorithm. The SVD may seem like a deep and mysterious thing; at least I thought it was until I read the chapters covering it in the book <a href="https://www.iri.upc.edu/people/thomas/Collection/details/5350.html">&#8220;Numerical Linear Algebra&#8221;</a> by Lloyd N. Trefethen and David Bau, III, which I summarize below.<br />
<span id="more-1307"></span><br />
We begin with an easy to state, and not too difficult to prove geometric statement about linear transformations.</p>
<h2>A Geometric Fact</h2>
<p>Let <img decoding="async" title="S" src="http://latex.codecogs.com/gif.latex?S" alt="" /> be the unit sphere in <img decoding="async" title="\mathbb{R}^{n}" src="http://latex.codecogs.com/gif.latex?\mathbb{R}^{n}" alt="" />, and let  <img decoding="async" title="X \in \mathbb{R}^{mxn}" src="http://latex.codecogs.com/gif.latex?X \in \mathbb{R}^{mxn}" alt="" /> be any matrix mapping <img decoding="async" title="\mathbb{R}^{n}" src="http://latex.codecogs.com/gif.latex?\mathbb{R}^{n}" alt="" /> into <img decoding="async" title="\mathbb{R}^{n}" src="http://latex.codecogs.com/gif.latex?\mathbb{R}^{m}" alt="" /> and suppose, for the moment, that <img decoding="async" title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> has full rank. Then the image, <img decoding="async" title="XS" src="http://latex.codecogs.com/gif.latex?XS" alt="" /> of <img decoding="async" title="S" src="http://latex.codecogs.com/gif.latex?S" alt="" /> under <img decoding="async" title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> is a hyperellipse in <img decoding="async" title="\mathbb{R}^{n}" src="http://latex.codecogs.com/gif.latex?\mathbb{R}^{m}" alt="" /> (see the book for the proof).</p>
<figure id="attachment_1360" aria-describedby="caption-attachment-1360" style="width: 400px" class="wp-caption aligncenter"><a href="https://www.centerspace.net/blog/wp-content/uploads/2010/02/StevePCA_width400.jpg"><img decoding="async" loading="lazy" class="size-full wp-image-1360" title="SVD of a 2x2 matrix" src="https://www.centerspace.net/blog/wp-content/uploads/2010/02/StevePCA_width400.jpg" alt="SVD of a 2x2 matrix" width="400" height="184" srcset="https://www.centerspace.net/wp-content/uploads/2010/02/StevePCA_width400.jpg 400w, https://www.centerspace.net/wp-content/uploads/2010/02/StevePCA_width400-300x138.jpg 300w" sizes="(max-width: 400px) 100vw, 400px" /></a><figcaption id="caption-attachment-1360" class="wp-caption-text">Figure 1.  SVD of a 2x2 matrix</figcaption></figure>
<p>Given this fact we make the following definitions (refer to Figure 1.):</p>
<p>Define the singular values ,</p>
<p><img decoding="async" title="\sigma _{1}\cdots\sigma_{n}" src="http://latex.codecogs.com/gif.latex?\sigma _{1}\cdots\sigma_{n}" alt="" /></p>
<p>of <img decoding="async" title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> to be the lengths of the <img decoding="async" title="n" src="http://latex.codecogs.com/gif.latex?n" alt="" /> principal semiaxes of the hyperellipse <img decoding="async" title="XS" src="http://latex.codecogs.com/gif.latex?XS" alt="" />. It is conventional to assume the singular values are numbered in descending order</p>
<p><img decoding="async" title="\inline \sigma {1}\geq \sigma _{2}\geq\cdots\geq \sigma_{n}" src="http://latex.codecogs.com/gif.latex?\inline \sigma {1}\geq \sigma _{2}\geq\cdots\geq \sigma_{n}" alt="" /></p>
<p>Define the left singular vectors</p>
<p><img decoding="async" title="u_{1},\cdots,u_{m}" src="http://latex.codecogs.com/gif.latex?u_{1},\cdots,u_{n}" alt="" /></p>
<p>to be unit vectors in the direction of the principal semiaxes of <img decoding="async" title="XS" src="http://latex.codecogs.com/gif.latex?XS" alt="" /> and define the right singular vectors,</p>
<p><img decoding="async" title="v_{1}\cdots v_{n}" src="http://latex.codecogs.com/gif.latex?v_{1}\cdots v_{n}" alt="" />,</p>
<p>to be the pre-images of the principal semiaxes of <img decoding="async" title="XS" src="http://latex.codecogs.com/gif.latex?XS" alt="" /> so that</p>
<p><img decoding="async" title="Xv_{i} = \sigma_{i}u_{i}" src="http://latex.codecogs.com/gif.latex?Xv_{i} = \sigma_{i}u_{i}" alt="" />.</p>
<p>In matrix form we have</p>
<p><img decoding="async" title="XV = U \Sigma" src="http://latex.codecogs.com/gif.latex?XV = U \Sigma" alt="" />,</p>
<p>where <img decoding="async" src="http://latex.codecogs.com/gif.latex?V" alt="" /> is the <img decoding="async" src="http://latex.codecogs.com/gif.latex?n\textrm{ x }n" alt="" /> orthonormal matrix whose columns are the right singular vectors of <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" />, <img decoding="async" src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" /> is an <img decoding="async" src="http://latex.codecogs.com/gif.latex?n\textrm{ x }n" alt="" /> diagonal matrix with positive entries equal to the singular values, and <img decoding="async" src="http://latex.codecogs.com/gif.latex?U" alt="" /> is an <img decoding="async" src="http://latex.codecogs.com/gif.latex?m\textrm{ x }n" alt="" /> matrix whose orthonormal columns are the left singular vectors.<br />
Since the columns of <img decoding="async" src="http://latex.codecogs.com/gif.latex?V" alt="" /> are orthonormal by construction, <img decoding="async" src="http://latex.codecogs.com/gif.latex?V" alt="" /> is a <em>unitary</em> matrix, that is, its transpose is equal to its inverse, thus we can write</p>
<p><img decoding="async" title="\textrm{(2) }X = U \Sigma V^{T}" src="http://latex.codecogs.com/gif.latex?\textrm{(2) }X = U \Sigma V^{T}" alt="" /></p>
<p>And there you have it, the SVD in all its majesty! Actually the above decomposition is what is known as the <em>reduced</em> SVD. Note that the columns of <img decoding="async" src="http://latex.codecogs.com/gif.latex?U" alt="" /> are <img decoding="async" src="http://latex.codecogs.com/gif.latex?n" alt="" /> orthonormal vectors in <img decoding="async" src="http://latex.codecogs.com/gif.latex?m" alt="" /> dimensional space. <img decoding="async" src="http://latex.codecogs.com/gif.latex?U" alt="" /> can be extended to a unitary matrix by adjoining an additional <img decoding="async" src="http://latex.codecogs.com/gif.latex?m-n" alt="" /> orthonormal columns. If in addition we append <img decoding="async" src="http://latex.codecogs.com/gif.latex?m-n" alt="" /> rows of zeros to the bottom of the matrix <img decoding="async" src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" />, it will effectively multiply the appended columns in <img decoding="async" src="http://latex.codecogs.com/gif.latex?U" alt="" /> by zero, thus preserving equation (2). When <img decoding="async" src="http://latex.codecogs.com/gif.latex?U" alt="" /> and <img decoding="async" src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" /> are modified in this way, equation (2) is called the <em>full</em> SVD.</p>
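<p>Both forms are easy to inspect numerically. Here is a short NumPy sketch (Python, purely for illustration; the NMath decomposition classes used later play the analogous role in .NET) showing the reduced and full SVD reproducing the same matrix:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 3))   # m = 7 rows, n = 3 columns, m > n

# Reduced SVD: U is m x n with orthonormal columns, Sigma is n x n
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Full SVD: U extended to an m x m unitary matrix; pad Sigma with m - n zero rows
Uf, sf, Vtf = np.linalg.svd(X, full_matrices=True)
Sf = np.zeros((7, 3))
np.fill_diagonal(Sf, sf)

print(U.shape, Uf.shape)   # reduced U is 7 x 3, full U is 7 x 7
```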
<h2>The Relationship Between Singular Values and Eigenvalues</h2>
<p>There is an important relationship between the singular values of <img decoding="async" title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> and the eigenvalues of <img decoding="async" title="X^{T}X" src="http://latex.codecogs.com/gif.latex?X^{T}X" alt="" />. Recall that a vector <img decoding="async" title="v" src="http://latex.codecogs.com/gif.latex?v" alt="" /> is an eigenvector with corresponding eigenvalue <img decoding="async" title="\lambda" src="http://latex.codecogs.com/gif.latex?\lambda" alt="" /> for a matrix <img decoding="async" title="X" src="http://latex.codecogs.com/gif.latex?X" alt="" /> if and only if <img decoding="async" title="Xv=\lambda v" src="http://latex.codecogs.com/gif.latex?Xv=\lambda v" alt="" />. Now, suppose we have the full SVD for <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" /> as in equation (2). Then</p>
<p><img decoding="async" title="X^{T}X=(U\Sigma V^{T})^{T}(U \Sigma V^{T})" src="http://latex.codecogs.com/gif.latex?X^{T}X=(U\Sigma V^{T})^{T}(U \Sigma V^{T})" alt="" /></p>
<p><img decoding="async" title="= V \Sigma ^{T}U^{T}U \Sigma V^{T}" src="http://latex.codecogs.com/gif.latex?= V \Sigma ^{T}U^{T}U \Sigma V^{T}" alt="" /></p>
<p><img decoding="async" title="= V \Sigma^{T} \Sigma V^{T}" src="http://latex.codecogs.com/gif.latex?= V \Sigma^{T} \Sigma V^{T}" alt="" /></p>
<p>or,</p>
<p><img decoding="async" title="(X^{T}X)V = V \Lambda" src="http://latex.codecogs.com/gif.latex?(X^{T}X)V = V \Lambda" alt="" /></p>
<p>where we have used the fact that <img decoding="async" src="http://latex.codecogs.com/gif.latex?U" alt="" /> and <img decoding="async" src="http://latex.codecogs.com/gif.latex?V" alt="" /> are unitary and set</p>
<p><img decoding="async" src="http://latex.codecogs.com/gif.latex?\Lambda = \Sigma^{T} \Sigma" alt="" />.</p>
<p>Note that <img decoding="async" src="http://latex.codecogs.com/gif.latex?\Lambda" alt="" /> is a diagonal matrix with the singular values squared along the diagonal. From this it follows that the columns of <img decoding="async" src="http://latex.codecogs.com/gif.latex?V" alt="" /> are eigenvectors for <img decoding="async" title="X^{T}X" src="http://latex.codecogs.com/gif.latex?X^{T}X" alt="" /> and the main diagonal of <img decoding="async" src="http://latex.codecogs.com/gif.latex?\Lambda" alt="" /> contains the corresponding eigenvalues. Thus the nonzero singular values of <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" /> are the square roots of the nonzero eigenvalues of <img decoding="async" title="X^{T}X" src="http://latex.codecogs.com/gif.latex?X^{T}X" alt="" />.</p>
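<p>This relationship can be checked numerically in a couple of lines. The following NumPy sketch (illustrative only) compares the squared singular values of a random matrix with the eigenvalues of its cross-product matrix:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

s = np.linalg.svd(X, compute_uv=False)      # singular values, descending
lam = np.linalg.eigvalsh(X.T @ X)[::-1]     # eigenvalues of X'X, sorted descending

print(s**2)   # matches lam to rounding error
```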
<p>We need one more very cool fact about the SVD before we get to the algorithm: low-rank approximation.</p>
<h2>Low-Rank Approximation</h2>
<p>Suppose now that <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" /> has rank <img decoding="async" src="http://latex.codecogs.com/gif.latex?r" alt="" /> and write <img decoding="async" src="http://latex.codecogs.com/gif.latex?\Sigma" alt="" /> in equation (2) as the sum of <img decoding="async" src="http://latex.codecogs.com/gif.latex?r" alt="" /> rank one matrices (each <img decoding="async" src="http://latex.codecogs.com/gif.latex?n\textrm{ x }n" alt="" /> rank one matrix will be all zeros except for <img decoding="async" src="http://latex.codecogs.com/gif.latex?\sigma_{j}" alt="" /> as the <img decoding="async" src="http://latex.codecogs.com/gif.latex?j" alt="" />th diagonal element). We can then, using equation (2), write <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" /> as the sum of rank one matrices,</p>
<p><img decoding="async" title="\textrm{(3)  }X=\sum_{j=1}^{r} \sigma_{j}u_{j}v_{j}^{T}" src="http://latex.codecogs.com/gif.latex?\textrm{(3)  }X=\sum_{j=1}^{r} \sigma_{j}u_{j}v_{j}^{T}" alt="" /></p>
<p>Equation (3) gives us a way to approximate any rank <img decoding="async" src="http://latex.codecogs.com/gif.latex?r" alt="" /> matrix <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" /> by a lower rank <img decoding="async" src="http://latex.codecogs.com/gif.latex?k &lt; r" alt="" /> matrix. Indeed, given <img decoding="async" src="http://latex.codecogs.com/gif.latex?k &lt; r" alt="" />, form the <img decoding="async" src="http://latex.codecogs.com/gif.latex?k\textrm{th}" alt="" /> partial sum</p>
<p><img decoding="async" title="X_{k}=\sum_{j=1}^{k} \sigma_{j}u_{j}v_{j}^{T}" src="http://latex.codecogs.com/gif.latex?X_{k}=\sum_{j=1}^{k} \sigma_{j}u_{j}v_{j}^{T}" alt="" /></p>
<p>Then <img decoding="async" src="http://latex.codecogs.com/gif.latex?X_{k}" alt="" /> is a rank <img decoding="async" src="http://latex.codecogs.com/gif.latex?k" alt="" /> approximation for <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" />. How good is this approximation? It turns out to be the best rank <img decoding="async" src="http://latex.codecogs.com/gif.latex?k" alt="" /> approximation you can get.</p>
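<p>This optimality result is the Eckart&#8211;Young theorem: in the Frobenius norm, the error of the rank k partial sum equals the square root of the sum of squares of the discarded singular values. A quick NumPy sketch (mirroring in spirit the NMath routine below) confirms it:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 20))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

for k in range(1, len(s) + 1):
    # Rank k partial sum of equation (3)
    Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    err = np.linalg.norm(X - Xk, "fro")
    print(k, err, np.sqrt(np.sum(s[k:] ** 2)))   # the two error columns agree
```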
<h2>Computing the Low-Rank Approximations Using NMath</h2>
<p>The NMath library provides two classes for computing the SVD for a matrix (actually eight, since there are SVD classes for each of the datatypes <code>Double</code>, <code>Float</code>, <code>DoubleComplex</code> and <code>FloatComplex</code>). There is a basic decomposition class for computing the standard, reduced SVD, and a decomposition server class when more control is desired. Here is a simple C# routine that constructs the low-rank approximations for a matrix <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" /> and prints out the Frobenius norms of the differences between <img decoding="async" src="http://latex.codecogs.com/gif.latex?X" alt="" /> and each of its low-rank approximations.</p>
<pre lang="csharp">static void LowerRankApproximations( DoubleMatrix X )
{
  // Construct the reduced SVD for X. We will consider
  // all singular values less than 1e-15 to be zero.
  DoubleSVDecomp decomp = new DoubleSVDecomp( X );
  decomp.Truncate( 1e-15 );
  int r = decomp.Rank;
  Console.WriteLine( "The {0}x{1} matrix X has rank {2}", X.Rows, X.Cols, r );

  // Construct the best lower rank approximations to X and
  // look at the frobenius norm of their differences.
  DoubleMatrix LowerRankApprox =
    new DoubleMatrix( X.Rows, X.Cols );
  double differenceNorm;
  for ( int k = 0; k &lt; r; k++ )
  {
    LowerRankApprox += decomp.SingularValues[k] *
      NMathFunctions.OuterProduct( decomp.LeftVectors.Col( k ), decomp.RightVectors.Col( k ) );
    differenceNorm = ( X - LowerRankApprox ).FrobeniusNorm();
    Console.WriteLine( "Rank {0} approximation difference
      norm = {1:F4}", k+1, differenceNorm );
  }
}</pre>
<p>Here&#8217;s the output for a matrix with 10 rows and 20 columns. Note that the rank can be at most 10.</p>
<pre lang="csharp">The 10x20 matrix X has rank 10
Rank 1 approximation difference norm = 3.7954
Rank 2 approximation difference norm = 3.3226
Rank 3 approximation difference norm = 2.9135
Rank 4 approximation difference norm = 2.4584
Rank 5 approximation difference norm = 2.0038
Rank 6 approximation difference norm = 1.5689
Rank 7 approximation difference norm = 1.1829
Rank 8 approximation difference norm = 0.8107
Rank 9 approximation difference norm = 0.3676
Rank 10 approximation difference norm = 0.0000</pre>
<p>-Steve</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/theoretical-motivation-behind-pcr">Principal Component Regression: Part 1 &#8211; The Magic of the SVD</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/theoretical-motivation-behind-pcr/feed</wfw:commentRss>
			<slash:comments>8</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1307</post-id>	</item>
	</channel>
</rss>
