NMath Stats User's Guide


7.2 Creating Logistic Regressions

A LogisticRegression object is constructed from two inputs: a matrix whose rows contain the predictor variable values, and an IList<bool> holding the observed value for each row.

Code Example – C# logistic regression

DoubleMatrix A = ...
bool[] obs = ...
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>( 
  A, obs );

A MismatchedSizeException is raised if the number of rows in the matrix A is not equal to the length of obs.

If you want the model to have an intercept parameter, you can specify that as well:

Code Example – C# logistic regression

bool addIntercept = true;
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>( 
  A, obs, addIntercept );

If addIntercept is true, a column of ones is prepended onto the data in the regression matrix A, thus adding an intercept to the model. If it is false, the data in the regression matrix is used as given.
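To make the effect of the intercept flag concrete, the following sketch prepends a column of ones to a plain two-dimensional array. It does not use NMath: PrependOnes is a hypothetical helper written for illustration, the sample values are made up, and the code is written as C# top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical helper, not an NMath API: returns a copy of a with a
// leading column of ones.
static double[,] PrependOnes( double[,] a )
{
  int rows = a.GetLength( 0 ), cols = a.GetLength( 1 );
  var result = new double[rows, cols + 1];
  for ( int r = 0; r < rows; r++ )
  {
    result[r, 0] = 1.0;              // intercept column
    for ( int c = 0; c < cols; c++ )
    {
      result[r, c + 1] = a[r, c];    // original data, shifted right by one
    }
  }
  return result;
}

// Made-up predictor values: two rows, two predictors.
double[,] predictors = { { 2.0, 3.0 }, { 4.0, 5.0 } };
double[,] withIntercept = PrependOnes( predictors );
Console.WriteLine(
  $"{withIntercept[0, 0]} {withIntercept[0, 1]} {withIntercept[0, 2]}" );  // 1 2 3
Console.WriteLine(
  $"{withIntercept[1, 0]} {withIntercept[1, 1]} {withIntercept[1, 2]}" );  // 1 4 5
```

Because every row has the value 1 in the new first column, the parameter fitted for that column acts as the model's constant term.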

You can also provide a regression calculator instance to use. For example, if you want regression to fail consistently when the regression matrix is rank deficient, you can construct a NewtonRaphsonParameterCalc object with the FailIfNotFullRank property set to true (see Section 7.1), then construct a LogisticRegression object with the resulting parameter calculation object:

Code Example – C# logistic regression

var parameterCalc = new NewtonRaphsonParameterCalc() {   
  FailIfNotFullRank = true };
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>(
  A, obs, addIntercept, parameterCalc );

Additional LogisticRegression constructors provide flexibility in how the observation values are specified. For example, you can provide a vector of floating-point observation values, which is converted to dichotomous values using a supplied Predicate<double> function. This code uses a lambda expression to specify the predicate:

Code Example – C# logistic regression

DoubleVector v = ...
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>(
  A, v, x => x >= 110.0, addIntercept);
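As a rough picture of what the predicate does, the sketch below applies the same threshold test to a plain array of doubles using only the .NET base class library. The sample values are made up, and the conversion shown is an illustration of dichotomizing with a Predicate<double>, not a description of NMath's internal implementation. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;

// Made-up observation values; the 110.0 threshold matches the lambda above.
double[] raw = { 95.0, 110.0, 123.5, 109.9 };

// The predicate maps each floating-point observation to true or false.
Predicate<double> isPositive = x => x >= 110.0;

// Apply the predicate element by element to get dichotomous observations.
bool[] obs = Array.ConvertAll( raw, x => isPositive( x ) );

Console.WriteLine( string.Join( ", ", obs ) );  // False, True, True, False
```

Note that the comparison is inclusive: a value of exactly 110.0 is mapped to true.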

Similarly, you can provide the observation values as one of the columns of the regression matrix:

Code Example – C# logistic regression

int observationColIndex = 0;
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>(
  A, observationColIndex, x => x != 0, addIntercept);
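One way to picture this form of input is to split the matrix by hand. The sketch below does so on a plain two-dimensional array. It assumes that the observation column is excluded from the predictors, which the text above does not state explicitly. SplitObservations is a hypothetical helper, not an NMath API, the sample values are made up, and the code is written as C# top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical helper, not an NMath API: pulls column obsCol out of a as
// boolean observations and returns the remaining columns as predictors.
static (double[,] Predictors, bool[] Observations) SplitObservations(
  double[,] a, int obsCol, Predicate<double> isPositive )
{
  int rows = a.GetLength( 0 ), cols = a.GetLength( 1 );
  var predictors = new double[rows, cols - 1];
  var observations = new bool[rows];
  for ( int r = 0; r < rows; r++ )
  {
    observations[r] = isPositive( a[r, obsCol] );
    for ( int c = 0, p = 0; c < cols; c++ )
    {
      if ( c == obsCol ) continue;   // skip the observation column
      predictors[r, p++] = a[r, c];
    }
  }
  return ( predictors, observations );
}

// Made-up data: column 0 holds 0/1 outcomes, columns 1 and 2 are predictors.
double[,] data = { { 1.0, 20.0, 3.5 }, { 0.0, 31.0, 2.0 }, { 1.0, 27.0, 4.1 } };
var ( x, y ) = SplitObservations( data, 0, v => v != 0 );
Console.WriteLine( string.Join( ", ", y ) );  // True, False, True
Console.WriteLine( x.GetLength( 1 ) );        // 2
```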

Design Variables

LogisticRegression provides the static convenience method DesignVariables() for producing design, or dummy, variables using reference cell coding. If the categorical variable has k levels, k - 1 design variables are created. Reference cell coding sets all the design variable values to 0 for the reference group, and sets a single design variable equal to 1 for each of the other groups.

For example, suppose we have a DataFrame df with a column of race values, which has three levels.

Code Example – C# logistic regression

int raceColIndex = df.IndexOfColumn( "Race" );
DataFrame raceDesignVars = 
  LogisticRegression<NewtonRaphsonParameterCalc>.DesignVariables( 
    df[raceColIndex] );

Since the race variable has three levels there will be two design variables. By default they will be named Race_0 and Race_1.
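The coding itself can be sketched in plain C# without NMath. In the sketch below, ReferenceCellCode is a hypothetical helper, the level labels "A", "B", and "C" are made up, and the first level encountered is treated as the reference group. That last choice is an assumption of this sketch; DesignVariables() may select the reference group differently. The code is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not an NMath API. The first level encountered is
// the reference group (an assumption of this sketch) and is coded as all
// zeros; every other level sets exactly one design variable to 1.
static (List<string> Levels, int[,] Design) ReferenceCellCode(
  IList<string> values )
{
  var levels = new List<string>();
  foreach ( string v in values )
  {
    if ( !levels.Contains( v ) ) levels.Add( v );
  }

  // k levels produce k - 1 design variables.
  var design = new int[values.Count, Math.Max( levels.Count - 1, 0 )];
  for ( int r = 0; r < values.Count; r++ )
  {
    int level = levels.IndexOf( values[r] );
    if ( level > 0 ) design[r, level - 1] = 1;
  }
  return ( levels, design );
}

// Made-up categorical column with three levels.
string[] race = { "A", "B", "A", "C" };
var ( raceLevels, raceDesign ) = ReferenceCellCode( race );
for ( int i = 0; i < race.Length; i++ )
{
  Console.WriteLine( $"{race[i]}: {raceDesign[i, 0]} {raceDesign[i, 1]}" );
}
// A: 0 0
// B: 1 0
// A: 0 0
// C: 0 1
```

Three levels yield two design variable columns, matching the k - 1 rule described above.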

We then replace the original race column with the two design variable columns, and convert the data frame to a matrix of floating point values.

Code Example – C# logistic regression

df.RemoveColumn( raceColIndex );
for ( int c = 0; c < raceDesignVars.Cols; c++ )
{
  df.InsertColumn( raceColIndex + c, raceDesignVars[c] );
}
DoubleMatrix matrixDat = df.ToDoubleMatrix();

 

