LogisticRegressionParameterCalcDesignVariables(IDFColumn) Method
Convenience method for generating design (dummy) variables, which
replace independent variables in a logistic model that take on discrete,
nominally scaled values. The encoding method used is "reference cell coding",
where the group with the SMALLEST code serves as the reference group. The
method is described in the Remarks below.
Namespace: CenterSpace.NMath.Core
Assembly: NMath (in NMath.dll) Version: 7.4
Syntax
public static DataFrame DesignVariables(
IDFColumn catagoricalDataCol
)
Public Shared Function DesignVariables (
catagoricalDataCol As IDFColumn
) As DataFrame
public:
static DataFrame^ DesignVariables(
IDFColumn^ catagoricalDataCol
)
static member DesignVariables :
catagoricalDataCol : IDFColumn -> DataFrame
Parameters
- catagoricalDataCol IDFColumn
- A column containing the nominally scaled
variable values.
Return Value
DataFrame
A DataFrame containing the design variables encoded using "reference
cell encoding". The group with the SMALLEST code serves as the reference
group. If the input column name is X, and the variable X has k possible values,
the output DataFrame will contain k - 1 columns with names:
X_0, X_1,...,X_(k-2)
Remarks
If a nominally scaled variable has k possible values, then k - 1 design variables
will be created, each with a value of zero or one. The design variable values
are encoded by setting all design variables to zero for the reference group, and then
setting a single design variable equal to one for each of the other groups.
The design variables replace the nominally scaled variable in the model.
Suppose that the jth independent variable, xj, has k levels. Denote the design
variables by Dju and the coefficients for these design variables by
Bju, u = 1, 2,...,k-1. Then the logit for the model with p variables and the
jth variable being discrete would be
g(x) = B0 + B1*x1 +...+ (Bj1*Dj1 + Bj2*Dj2 +...+ Bj(k-1)*Dj(k-1)) +...+ Bp*xp
For example, suppose that Race is an independent variable in a model with
three possible values: white, black and other. Suppose further that these values
have been encoded in the data as white = 1, black = 2, and other = 3. The input
to the DesignVariables function would be a data frame column with name = Race
and the numerical values for each subject.
This function would then generate a data frame containing 3 - 1 = 2 columns for
the two design variables with names Race_0 and Race_1.
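To make the role of the two design variables concrete, the block below works through the logit for the Race example. It assumes, purely to keep the expression short, that Race is the only independent variable in the model. It also reads the sample data as mapping Dj1 to column Race_0 (black) and Dj2 to column Race_1 (other), which is an inference from the example rather than a documented guarantee.

```latex
g(x) = B_0 + B_{j1} D_{j1} + B_{j2} D_{j2}
=
\begin{cases}
B_0          & \text{white (reference): } D_{j1} = 0,\ D_{j2} = 0 \\
B_0 + B_{j1} & \text{black: } D_{j1} = 1,\ D_{j2} = 0 \\
B_0 + B_{j2} & \text{other: } D_{j1} = 0,\ D_{j2} = 1
\end{cases}
```

Under this reading, each coefficient Bju measures the shift in the logit for its group relative to the reference group.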
Sample input/output -
Input Column:
Race
----
1
1
2
1
1
3
3
2
1
Output DataFrame:
Race_0 Race_1
------ ------
0 0
0 0
1 0
0 0
0 0
0 1
0 1
1 0
0 0
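The encoding rule in the Remarks can be sketched as a small standalone C# program. This is an illustration of reference cell coding only, not the NMath implementation: it does not call NMath, the names ReferenceCellCoding and Encode are hypothetical, and it assumes that the non-reference groups are assigned to columns in ascending order of their codes, as the sample above suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Standalone sketch of reference cell coding. It does NOT use NMath;
// ReferenceCellCoding and Encode are hypothetical names for illustration.
public static class ReferenceCellCoding
{
    // Returns one (name, values) pair per design variable for a column of
    // integer group codes. Columns are named name_0, name_1, ...
    public static List<(string Name, int[] Values)> Encode(string name, int[] codes)
    {
        // Distinct codes in ascending order; the smallest is the reference group.
        int[] levels = codes.Distinct().OrderBy(c => c).ToArray();

        var columns = new List<(string Name, int[] Values)>();

        // Start at 1 to skip the reference group, which is zero in every column.
        // Assumption: remaining groups map to columns in ascending code order.
        for (int u = 1; u < levels.Length; u++)
        {
            int level = levels[u];
            int[] values = codes.Select(c => c == level ? 1 : 0).ToArray();
            columns.Add((name + "_" + (u - 1), values));
        }
        return columns;
    }

    public static void Main()
    {
        // Same Race codes as the sample input: white = 1, black = 2, other = 3.
        int[] race = { 1, 1, 2, 1, 1, 3, 3, 2, 1 };

        foreach (var column in Encode("Race", race))
        {
            Console.WriteLine(column.Name + ": " + string.Join(" ", column.Values));
        }
    }
}
```

For the sample Race column this prints `Race_0: 0 0 1 0 0 0 0 1 0` and `Race_1: 0 0 0 0 0 1 1 0 0`, which are the two columns of the output DataFrame shown above, read top to bottom.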