Imports System
Imports System.Collections.Generic
Imports System.IO

Imports CenterSpace.NMath.Core

Namespace CenterSpace.NMath.Examples.VisualBasic

  ' A .NET example in Visual Basic showing how to perform logistic regression.
  Module LogisticRegressionExample

    Sub Main()

      Console.WriteLine("Coronary Heart Disease Example -----------------")
      Console.WriteLine(Environment.NewLine)
      CoronaryHeartDiseaseAge()
      Console.WriteLine(Environment.NewLine)

      Console.WriteLine("Low Birth Weight Example -----------------------")
      Console.WriteLine(Environment.NewLine)
      LowBirthWeight()
      Console.WriteLine(Environment.NewLine)

      Console.WriteLine("Crime Example -----------------------------------")
      Console.WriteLine(Environment.NewLine)
      Crime()

      Console.WriteLine(Environment.NewLine)
      Console.WriteLine("Press Enter Key")
      Console.Read()

    End Sub

    ' Example relating the presence of coronary heart disease to age.
    ' The data consist of each subject's age and whether or not the subject
    ' displays evidence of coronary heart disease (1 for present, 0 for not
    ' present).
    Private Sub CoronaryHeartDiseaseAge()

      ' The data for this example are stored in a matrix. The first column
      ' contains the independent, or predictor, variable values. The second
      ' column contains the observed outcome values (0 or 1), where 1 indicates
      ' the presence of coronary heart disease, and 0 denotes its absence.
      Dim ChdDataAll As New DoubleMatrix(New StreamReader(New FileStream("chdage.mat", FileMode.Open)))
      Dim ChdData As DoubleMatrix = ChdDataAll(Slice.All, New Slice(1, 2))
      If (ChdData Is Nothing) Then
        Console.WriteLine("Could not load data for coronary heart disease example. Exiting.")
        Return
      End If

      ' A logistic regression can be constructed from data in the following
      ' format: a matrix whose rows contain the predictor variable values, and
      ' a vector of booleans for the observed values.
      Dim Obs(ChdData.Rows - 1) As Boolean
      Dim I As Integer
      For I = 0 To ChdData.Rows - 1
        Obs(I) = ChdData(I, 1) <> 0
      Next

      Dim RegMat As DoubleMatrix = ChdData(Slice.All, New Slice(0, 1))

      ' The logistic regression class takes a type parameter indicating the
      ' parameter calculation algorithm to use. Here we use a Newton-Raphson
      ' calculator class, which is essentially iteratively reweighted least
      ' squares. Since we want our model to have an intercept parameter, we
      ' set the last argument to True.
      Dim LogReg As New LogisticRegression(Of NewtonRaphsonParameterCalc)(RegMat, Obs, True)

      ' First we check that parameter calculation was successful. If not, we
      ' print out some diagnostic information and exit.
      If (Not LogReg.IsGood) Then
        Console.WriteLine("Logistic regression parameter calculation failed:")
        Console.WriteLine(LogReg.ParameterCalculationErrorMessage)
        Dim ParameterCalc = LogReg.ParameterCalculator
        Console.WriteLine("Maximum iterations: " & ParameterCalc.MaxIterations)
        Console.WriteLine("Number of iterations: " & ParameterCalc.Iterations)
        Console.WriteLine("Newton Raphson converged: " & ParameterCalc.Converged)
        Return
      End If

      ' Parameter calculation was successful. The fit analysis class is still
      ' under construction and will contain more statistics. For now we look
      ' at the G-statistic.
      Dim FitAnalysis As New LogisticRegressionFitAnalysis(Of NewtonRaphsonParameterCalc)(LogReg)
      Console.WriteLine("Log likelihood: " & FitAnalysis.LogLikelihood.ToString("G3"))
      Console.WriteLine("G-statistic: " & FitAnalysis.GStatistic.ToString("G3"))
      Console.WriteLine("G-statistic P-value: " & FitAnalysis.GStatisticPValue.ToString("G3"))
      Console.WriteLine()

      ' Print out the parameter values and related statistics.
      Dim ParameterEstimates() As LogisticRegressionParameter(Of NewtonRaphsonParameterCalc) = LogReg.ParameterEstimates
      Console.WriteLine("Intercept Parameter:")
      Console.WriteLine(ParameterEstimates(0).ToString())
      Console.WriteLine()
      Console.WriteLine("Age Coefficient:")
      Console.WriteLine(ParameterEstimates(1).ToString())
      Console.WriteLine()

      ' Predict the probability of the presence of coronary heart disease for
      ' some ages.
      Dim Ages As New DoubleMatrix("5x1 [29.0 37.0 48.0 64.0 78.0]")
      Dim Probabilities As DoubleVector = LogReg.PredictedProbabilities(Ages)
      For I = 0 To Ages.Rows - 1
        Console.WriteLine("The probability of the presence of coronary heart disease at age {0} is {1}", Ages(I, 0), Probabilities(I).ToString("G3"))
      Next

    End Sub

    ' Example applying logistic regression to a study of low birth weights.
    ' The goal of this study was to identify risk factors associated with
    ' giving birth to a low birth weight baby. There are four variables under
    ' consideration: Age, Weight of subject, Race, and Number of physician
    ' visits during pregnancy.
    Private Sub LowBirthWeight()

      Dim Data As DataFrame = DataFrame.Load("lowbwt.dat", True, False, " ", True)

      ' Logistic regression provides a convenience method for producing design,
      ' or dummy, variables using "reference cell coding". If the categorical
      ' variable has k levels, there will be k - 1 design variables created.
      ' Reference cell coding involves setting all the design variable values
      ' to 0 for the reference group, and then setting a single design variable
      ' equal to 1 for each of the other groups.
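      '
      ' As an illustration only: for a three-level Race variable, reference
      ' cell coding produces a table like the one below. The level names and
      ' which non-reference level maps to which design variable are
      ' assumptions made for this sketch. Only the choice of white as the
      ' reference group is implied by the prediction made at the end of this
      ' example, where both design variables are set to 0 for a white woman.
      '
      '   Race     Race_0   Race_1
      '   White      0        0      (reference group)
      '   Black      1        0      (assumed mapping)
      '   Other      0        1      (assumed mapping)
      '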

      ' We first create a data frame containing the design variables and their
      ' values constructed from the Race column of the data. Since the Race
      ' variable has 3 levels, there will be two design variables. By default
      ' they will be named Race_0 and Race_1.
      Dim RaceColIndex As Integer = Data.IndexOfColumn("Race")
      Dim RaceDesignVars As DataFrame = LogisticRegression(Of NewtonRaphsonParameterCalc).DesignVariables(Data(RaceColIndex))

      ' Next we remove the Race column from our input data and replace it with
      ' the two design variable columns.
      Data.RemoveColumn(RaceColIndex)
      Dim C As Integer
      For C = 0 To RaceDesignVars.Cols - 1
        Data.InsertColumn(RaceColIndex + C, RaceDesignVars(C))
      Next

      ' Now convert the data frame's data to a matrix of floating point values.
      Dim MatrixDat As DoubleMatrix = Data.ToDoubleMatrix()

      ' The first column of the data is the patient ID and the second column
      ' contains the observed condition of low birth weight. A 1 in the
      ' observation column indicates low birth weight and a 0 indicates normal
      ' birth weight. We want to exclude the first column of patient IDs from
      ' the regression data.
      Dim A As DoubleMatrix = MatrixDat(Range.All, New Range(1, Position.End))

      ' We now construct the logistic regression. This constructor allows you
      ' to leave the column of observed values in the data matrix. However, you
      ' must supply the constructor with the index of the observation column
      ' and a predicate function object for converting the numerical values to
      ' Boolean: True if the condition is present and False if it is not.
      '
      ' So in constructing the object we pass in:
      '   1. the matrix containing the independent, or predictor, variable
      '      values and the observed values;
      '   2. a 0, indicating that the matrix column at index 0 contains the
      '      observed values;
      '   3. a lambda expression indicating that nonzero values in the
      '      observation column indicate the presence of low birth weight;
      '   4. True, indicating that we include an intercept parameter.
      Dim ObservationPredicate = Function(x)
                                   Return x <> 0
                                 End Function

      ' Use the matrix A, which excludes the patient ID column, so that column
      ' 0 holds the observed values as described above.
      Dim LR As New LogisticRegression(Of NewtonRaphsonParameterCalc)(A, 0, ObservationPredicate, True)

      ' Check to see if parameter calculation succeeded. If not, print out
      ' diagnostics and exit.
      Console.WriteLine("LR good? " & LR.IsGood)
      If (Not LR.IsGood) Then
        Console.WriteLine("Logistic regression parameter calculation failed:")
        Console.WriteLine(LR.ParameterCalculationErrorMessage)
        Dim ParameterCalc = LR.ParameterCalculator
        Console.WriteLine("Maximum iterations: " & ParameterCalc.MaxIterations)
        Console.WriteLine("Number of iterations: " & ParameterCalc.Iterations)
        Console.WriteLine("Newton Raphson converged: " & ParameterCalc.Converged)
        Return
      End If

      ' Parameter calculation succeeded. Print out the model parameter
      ' estimates and related information.
      Dim parameterEstimates = LR.ParameterEstimates
      For I = 0 To parameterEstimates.Length - 1
        Dim estimate = parameterEstimates(I)
        If (I = 0) Then
          Console.WriteLine("Constant term = {0}, SE = {1}", Math.Round(estimate.Value, 3), estimate.StandardError.ToString("G3"))
        Else
          ' Predictor I corresponds to data frame column I + 1, because the
          ' data frame still contains the patient ID column at index 0 and
          ' the observation column at index 1.
          Console.WriteLine("Coefficient for {0} = {1}, SE = {2}", Data(I + 1).Name, Math.Round(estimate.Value, 3), estimate.StandardError.ToString("G3"))
        End If
      Next
      Console.WriteLine()

      ' We can look at the parameter covariance matrix.
      Console.WriteLine("Parameter covariance matrix:")
      Console.WriteLine(NMathFunctions.Round(LR.ParameterCovarianceMatrix, 3).ToTabDelimited())
      Console.WriteLine()

      ' Finally, print out some fit information.
      Dim FitAnalysis = New LogisticRegressionFitAnalysis(Of NewtonRaphsonParameterCalc)(LR)
      Console.WriteLine("Log likelihood = " & FitAnalysis.LogLikelihood.ToString("G3"))
      Console.WriteLine("G-statistic = " & FitAnalysis.GStatistic.ToString("G3"))
      Dim PValue = FitAnalysis.GStatisticPValue
      Console.WriteLine("Pr[X^2({0}) > {1}] = {2}", LR.NumberOfPredictors, FitAnalysis.GStatistic, PValue)

      ' Predict the probability of low birth weight for a 29 year old white
      ' woman weighing 159 pounds and with 5 physician visits during pregnancy.
      Dim Subject As New DoubleVector(29.0, 159.0, 0.0, 0.0, 5.0)
      Dim Prob As Double = LR.PredictedProbability(Subject)
      Console.WriteLine("Estimated probability of a white woman age {0}, weighing {1} lbs, {2} Dr. visits is {3}", Subject(0), Subject(1), Subject(4), Prob.ToString("G5"))

    End Sub

    Private Sub Crime()

      Dim CrimeData = DataFrame.Load("crime.dat", True, False, " ", True)
      Dim ColumnNames() As String = {"CrimeRat", "MaleTeen", "South", "Educ", "Police59"}
      Dim Columns(ColumnNames.Length - 1) As Integer
      Dim I As Integer
      For I = 0 To ColumnNames.Length - 1
        Columns(I) = CrimeData.IndexOfColumn(ColumnNames(I))
      Next

      Dim S As New Subset(Columns)
      Dim Data = CrimeData.GetColumns(S)
      Dim MatrixData = Data.ToDoubleMatrix()

      Dim ObservationPredicate = Function(x)
                                   Return x >= 110.0
                                 End Function

      Dim LR As New LogisticRegression(Of NewtonRaphsonParameterCalc)(MatrixData, 0, ObservationPredicate, True)
      Console.WriteLine("lr is good: " & LR.IsGood)

      Dim ParamEst() As LogisticRegressionParameter(Of NewtonRaphsonParameterCalc) = LR.ParameterEstimates
      For I = 0 To ParamEst.Length - 1
        Console.WriteLine(ParamEst(I).ToString())
      Next

      Dim Fit As New LogisticRegressionFitAnalysis(Of NewtonRaphsonParameterCalc)(LR)
      Dim Pearson = Fit.PearsonStatistic()
      Console.WriteLine("Pearson Statistic -")
      Console.WriteLine(Environment.NewLine & "Pearson: " & Pearson.ToString())
      Console.WriteLine()

      ' Calculate the Hosmer Lemeshow statistic using 10 groups.
      Console.WriteLine("Hosmer Lemeshow Statistic -")
      Dim hosmerLemeshowStat = Fit.HLStatistic(10)
      Console.WriteLine(hosmerLemeshowStat)

    End Sub

  End Module

End Namespace
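
' ---------------------------------------------------------------------------
' Supplementary sketch, not part of the NMath API.
'
' The examples above print parameter estimates and predicted probabilities.
' Assuming the standard logistic model, the two are related by
'
'   p = 1 / (1 + e^(-z)),   where z = intercept + sum of coefficient(i) * x(i)
'
' The helper below is a minimal, hypothetical illustration of that formula
' using plain arrays. The module and function names are invented for this
' sketch, and it makes no claim about how NMath computes its predictions
' internally.
'
' Usage (hypothetical values): with an intercept of 0 and a single predictor
' equal to 0, z is 0 and the function returns 0.5:
'
'   Dim P As Double = LogisticFunctionSketch.LogisticProbability(0.0, New Double() {1.0}, New Double() {0.0})
' ---------------------------------------------------------------------------
Namespace CenterSpace.NMath.Examples.VisualBasic

  Module LogisticFunctionSketch

    ' Returns the logistic-model probability for one observation, given an
    ' intercept, one coefficient per predictor, and the predictor values.
    Function LogisticProbability(ByVal Intercept As Double, ByVal Coefficients() As Double, ByVal Predictors() As Double) As Double

      If Coefficients.Length <> Predictors.Length Then
        Throw New System.ArgumentException("Coefficients and Predictors must have the same length.")
      End If

      ' Accumulate the linear predictor z.
      Dim Z As Double = Intercept
      For I As Integer = 0 To Coefficients.Length - 1
        Z += Coefficients(I) * Predictors(I)
      Next

      ' Map the linear predictor to a probability in (0, 1).
      Return 1.0 / (1.0 + System.Math.Exp(-Z))

    End Function

  End Module

End Namespace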