**7.2 Creating Logistic Regressions**

A **LogisticRegression** object is constructed from a matrix whose rows contain the predictor variable values (one row per observation) and an **IList<bool>** containing the observed outcomes.

Code Example – C# logistic regression

```csharp
DoubleMatrix A = ...
bool[] obs = ...
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>( A, obs );
```

A **MismatchedSizeException** is thrown if the number of rows in matrix A does not equal the length of obs.

If you want the model to have an intercept parameter, you can specify that as well:

Code Example – C# logistic regression

```csharp
bool addIntercept = true;
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>( A, obs, addIntercept );
```

If addIntercept is true, a column of ones is prepended to the data in the regression matrix A, adding an intercept term to the model. If false, the data in the regression matrix is used as given.
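To make "prepending a column of ones" concrete, here is a minimal sketch that uses plain C# arrays rather than NMath types. The names InterceptSketch and PrependOnesColumn are hypothetical. The code is not taken from the library and only illustrates the resulting matrix layout.

```csharp
using System;

// Hypothetical helper (not part of NMath): shows what prepending a
// column of ones does to a regression matrix stored as double[,].
public static class InterceptSketch
{
    public static double[,] PrependOnesColumn(double[,] a)
    {
        int rows = a.GetLength(0);
        int cols = a.GetLength(1);
        var result = new double[rows, cols + 1];
        for (int i = 0; i < rows; i++)
        {
            result[i, 0] = 1.0;             // intercept column
            for (int j = 0; j < cols; j++)
            {
                result[i, j + 1] = a[i, j]; // original predictors shifted one column right
            }
        }
        return result;
    }
}
```

The number of rows is unchanged, so the observations still line up with the rows of the matrix. The matrix simply gains one extra column.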

You can also supply the parameter calculation object to use. For example, if you want the regression to fail whenever the regression matrix is rank deficient, construct a **NewtonRaphsonParameterCalc** object with its FailIfNotFullRank property set to true (see Section 7.1). Then pass that object to the **LogisticRegression** constructor:

Code Example – C# logistic regression

```csharp
var parameterCalc = new NewtonRaphsonParameterCalc() { FailIfNotFullRank = true };
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>( A, obs, addIntercept, parameterCalc );
```

Additional **LogisticRegression** constructors provide flexibility in how the observation values are specified. For example, you can provide a vector of floating point observation values, which are converted to dichotomous values using a supplied **Predicate<double>** function. This code uses a lambda expression to specify the predicate:

Code Example – C# logistic regression

```csharp
DoubleVector v = ...
// The predicate maps each observation value to true or false
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>( A, v, x => x >= 110.0, addIntercept );
```
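The conversion the predicate performs can be sketched in plain C#. This sketch assumes that each value for which the predicate returns true becomes a true observation. ObservationSketch and ToDichotomous are hypothetical names, not part of NMath.

```csharp
using System;

// Hypothetical helper (not part of NMath): converts floating point
// observations to dichotomous values with a Predicate<double>.
public static class ObservationSketch
{
    public static bool[] ToDichotomous(double[] values, Predicate<double> isPositive)
    {
        // Array.ConvertAll expects a Converter<double, bool>, so wrap the predicate.
        return Array.ConvertAll(values, x => isPositive(x));
    }
}
```

For example, `ObservationSketch.ToDichotomous(new[] { 100.0, 110.0, 125.5 }, x => x >= 110.0)` yields `false, true, true`.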

Similarly, you can provide the observation values as one of the columns of the regression matrix, again using a predicate to convert them to dichotomous values:

Code Example – C# logistic regression

```csharp
int observationColIndex = 0;
var lr = new LogisticRegression<NewtonRaphsonParameterCalc>( A, observationColIndex, x => x != 0, addIntercept );
```
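Reading observations from a column can be sketched the same way, assuming the same predicate semantics. ColumnObservationSketch and ObservationsFromColumn are hypothetical names. The sketch only extracts and converts the column. It does not model how **LogisticRegression** treats the remaining predictor columns.

```csharp
using System;

// Hypothetical helper (not part of NMath): reads one column of a
// double[,] and maps each entry to a Boolean observation.
public static class ColumnObservationSketch
{
    public static bool[] ObservationsFromColumn(double[,] a, int colIndex, Predicate<double> isPositive)
    {
        if (colIndex < 0 || colIndex >= a.GetLength(1))
        {
            throw new ArgumentOutOfRangeException(nameof(colIndex));
        }
        int rows = a.GetLength(0);
        var obs = new bool[rows];
        for (int i = 0; i < rows; i++)
        {
            obs[i] = isPositive(a[i, colIndex]); // one observation per row
        }
        return obs;
    }
}
```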

**LogisticRegression** provides the static convenience method DesignVariables() for producing design, or dummy, variables using *reference cell coding*. If the categorical variable has *k* levels, *k - 1* design variables are created. Reference cell coding sets all design variable values to 0 for the reference group. For each of the other groups, it sets a single design variable equal to 1.
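The coding scheme can be sketched in plain C# for any categorical variable. This is a hypothetical helper and not the DesignVariables() implementation. It assumes the distinct levels are sorted and the smallest level is the reference group, which may differ from how the library chooses the reference level.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of NMath): reference cell coding.
// ASSUMPTION: the smallest level (after sorting) is the reference group.
public static class ReferenceCellCoding
{
    // Returns an n x (k - 1) matrix of design variables for n observations
    // of a categorical variable with k distinct levels.
    public static int[,] DesignVariables<T>(IList<T> values)
    {
        List<T> levels = values.Distinct().OrderBy(v => v).ToList();
        int k = levels.Count;
        var design = new int[values.Count, Math.Max(k - 1, 0)];
        for (int i = 0; i < values.Count; i++)
        {
            int level = levels.IndexOf(values[i]);
            // Reference group (level 0): every design variable stays 0.
            // Any other level sets exactly one design variable to 1.
            if (level > 0)
            {
                design[i, level - 1] = 1;
            }
        }
        return design;
    }
}
```

For a three-level variable this yields two columns, one for each non-reference level. For the input `{ 1, 2, 3, 1 }`, the rows are `[0, 0]`, `[1, 0]`, `[0, 1]`, `[0, 0]`.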

For example, suppose a **DataFrame** df contains a column of race values with three levels.

Code Example – C# logistic regression

```csharp
int raceColIndex = df.IndexOfColumn( "Race" );
DataFrame raceDesignVars = LogisticRegression<NewtonRaphsonParameterCalc>.DesignVariables( df[raceColIndex] );
```

Since the race variable has three levels, there are two design variables. By default they are named Race_0 and Race_1.

We then replace the original race column with the two design variable columns, and convert the data frame to a matrix of floating point values.

Code Example – C# logistic regression

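```csharp
// Remove the original categorical column...
df.RemoveColumn( raceColIndex );
// ...and insert the design variable columns in its place
for ( int c = 0; c < raceDesignVars.Cols; c++ )
{
  df.InsertColumn( raceColIndex + c, raceDesignVars[c] );
}
DoubleMatrix matrixDat = df.ToDoubleMatrix();
```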