LogisticRegression<ParameterCalc>.DesignVariables(IDFColumn, IComparable) Method
Convenience method for generating design (dummy) variables, which
replace independent variables in a logistic model that take on discrete,
nominally scaled values. The encoding method used is "reference cell coding",
in which the group with the specified code serves as the reference group. The
method is described in the Remarks section below.
Namespace: CenterSpace.NMath.Core
Assembly: NMath (in NMath.dll) Version: 7.4
Syntax
C#
public static DataFrame DesignVariables(
    IDFColumn catagoricalDataCol,
    IComparable referenceCode
)
VB
Public Shared Function DesignVariables (
    catagoricalDataCol As IDFColumn,
    referenceCode As IComparable
) As DataFrame
C++
public:
static DataFrame^ DesignVariables(
    IDFColumn^ catagoricalDataCol,
    IComparable^ referenceCode
)
F#
static member DesignVariables :
    catagoricalDataCol : IDFColumn *
    referenceCode : IComparable -> DataFrame
Parameters
- catagoricalDataCol IDFColumn
- A column containing the nominally scaled
variable values.
- referenceCode IComparable
- The group with this level will be the reference
group. Must have the same type as the elements of catagoricalDataCol.
Return Value
DataFrame
A DataFrame containing the design variables encoded using "reference
cell coding". The group with the specified code serves as the reference
group. If the input column name is X, and the variable X has k possible values,
the output DataFrame will contain k - 1 columns with names:
X_0, X_1,...,X_(k-2)
Exceptions
| Exception | Condition |
|---|---|
| InvalidArgumentException | Thrown if the referenceCode parameter does not have the same type as the elements of the categorical column. |
Remarks
If a nominally scaled variable has k possible values, then k - 1 design variables
will be created, each with a value of zero or one. The design variable values
are encoded by setting all design variables to zero for the reference group, and then
setting a single design variable equal to one for each of the other groups.
The design variables replace the nominally scaled variable in the model.
Suppose that the jth independent variable, xj, has k levels. Denote the design
variables by Dju and the coefficients for these design variables by
Bju, u = 1, 2,...,k-1. Then the logit for a model with p variables, the
jth of which is discrete, would be
g(x) = B0 + B1*x1 + ... + (Bj1*Dj1 + Bj2*Dj2 + ... + Bj(k-1)*Dj(k-1)) + ... + Bp*xp
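As a rough illustration of the logit above, the following Python sketch evaluates g(x) for a model with one continuous variable and one three-level nominal variable represented by two design variables. It is not NMath code: the function name `logit` and every coefficient value are hypothetical and chosen only for illustration.

```python
def logit(intercept, coefs, xs, design_coefs, design_vars):
    """Evaluate g(x) = B0 + sum(B_i * x_i) + sum(B_ju * D_ju).

    coefs / xs: coefficients and values of the ordinary variables.
    design_coefs / design_vars: coefficients B_ju and 0/1 values D_ju of
    the k - 1 design variables that replace the nominal variable.
    """
    if len(coefs) != len(xs) or len(design_coefs) != len(design_vars):
        raise ValueError("coefficient and value lists must have equal length")
    g = intercept
    g += sum(b * x for b, x in zip(coefs, xs))
    g += sum(b * d for b, d in zip(design_coefs, design_vars))
    return g


# Hypothetical coefficients: B0 = -1.0, B1 = 0.5, Bj1 = 0.8, Bj2 = -0.3.
# Reference group: both design variables are zero, so only B0 + B1*x1 remains.
g_reference = logit(-1.0, [0.5], [2.0], [0.8, -0.3], [0, 0])
# Subject in the first non-reference group: Dj1 = 1, Dj2 = 0.
g_group1 = logit(-1.0, [0.5], [2.0], [0.8, -0.3], [1, 0])
```

With everything else held fixed, `g_group1 - g_reference` equals Bj1, so each design-variable coefficient can be read as the shift in log-odds of its group relative to the reference group.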
For example, suppose that Race is an independent variable in a model and has
three possible values: white, black, and other. Suppose further that these values
have been encoded in the data as white = 1, black = 2, and other = 3 and that
other = 3 is specified as the reference group. The input
to the DesignVariables function would be a data frame column with name = Race
and the numerical values for each subject.
This function would then generate a data frame containing 3 - 1 = 2 columns for
the two design variables with names Race_0 and Race_1.
Sample input/output -
Input Column:
Race
----
1
1
2
1
1
3
3
2
1
Output DataFrame:
Race_0 Race_1
------ ------
1 0
1 0
0 1
1 0
1 0
0 0
0 0
0 1
1 0
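The reference cell coding described in the Remarks can be sketched independently of NMath. The Python function below is a minimal illustration, not the library's implementation: the name `design_variables`, the dictionary return type, and the rule that non-reference levels are numbered in sorted order are all assumptions of this sketch, and `TypeError` merely stands in for the type check documented under Exceptions. It encodes the Race input column from the sample with 3 as the reference code.

```python
def design_variables(name, values, reference_code):
    """Reference cell coding: return {column_name: [0 or 1, ...]}."""
    # Stand-in for the documented type check on referenceCode.
    if any(type(v) is not type(reference_code) for v in values):
        raise TypeError("reference_code must have the same type as the column elements")
    levels = sorted(set(values))
    # Choice made by this sketch: the reference level must occur in the data.
    if reference_code not in levels:
        raise ValueError("reference_code does not occur in the column")
    # Assumption: the k - 1 non-reference levels are numbered in sorted order.
    others = [lvl for lvl in levels if lvl != reference_code]
    return {
        f"{name}_{i}": [1 if v == lvl else 0 for v in values]
        for i, lvl in enumerate(others)
    }


race = [1, 1, 2, 1, 1, 3, 3, 2, 1]  # white = 1, black = 2, other = 3
encoded = design_variables("Race", race, reference_code=3)

# Print one row per subject: Race_0 then Race_1.
for row in zip(encoded["Race_0"], encoded["Race_1"]):
    print(*row)
```

Every subject in the reference group (code 3) gets zeros in both columns, and every other subject gets a single one in the column for its group.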
See Also