Statistics

Principal Components Regression: Part 3 – The NIPALS Algorithm

Principal Components Regression: Recap of Part 2 Recall that the least squares solution to the multiple linear problem is given by (1) And that problems occurred finding when the matrix (2) was close to being singular. The Principal Components Regression approach to addressing the problem is to replace in equation (1) with a better conditioned approximation. This approximatio...
Read More

Principal Components Regression: Part 2 – The Problem With Linear Regression

This is the second part in a three part series on PCR, the first article on the topic can be found here. The Linear Regression Model Multiple Linear Regression (MLR) is a common approach to modeling the relationship between one or two or more explanatory variables and a response variable by fitting a linear equation to observed data. First let’s set up some notation. I will be rather brief, ass...
Read More

Principal Component Regression: Part 1 – The Magic of the SVD

Introduction This is the first part of a multi-part series on Principal Component Regression, or PCR for short. We will eventually end up with a computational algorithm for PCR and code it up using C# using the NMath libraries. PCR is a method for constructing a linear regression model in the case that we have a large number of predictor variables which are highly correlated. Of course, we don't ...
Read More

Savitzky-Golay Smoothing in C#

Savitzky-Golay smoothing effectively removes local signal noise while preserving the shape of the signal. Commonly, it's used as a preprocessing step with experimental data, especially spectrometry data because of it's effectiveness at removing random variation while minimally degrading the signal's information content. Savitzky-Golay boils down to a fast (multi-core scaling) correlation operati...
Read More

Variance inflation factors

A customer contacted us about computing "variance inflation factors". Wikipedia defines this as: In statistics, the variance inflation factor (VIF) is a method of detecting the severity of multicollinearity. More precisely, the VIF is an index which measures how much the variance of a coefficient (square of the standard deviation) is increased because of collinearity. [Ref] Here's an implemen...
Read More
Top