VB Logistic Regression Example


Imports System
Imports System.Collections.Generic
Imports System.IO

Imports CenterSpace.NMath.Core

Namespace CenterSpace.NMath.Examples.VisualBasic

  ' A .NET example in Visual Basic showing how to perform logistic regression.
  Module LogisticRegressionExample

    Sub Main()

      Console.WriteLine("Coronary Heart Disease Example -----------------")
      Console.WriteLine(Environment.NewLine)
      CoronaryHeartDiseaseAge()
      Console.WriteLine(Environment.NewLine)

      Console.WriteLine("Low Birth Weight Example -----------------------")
      Console.WriteLine(Environment.NewLine)
      LowBirthWeight()
      Console.WriteLine(Environment.NewLine)

      Console.WriteLine("Crime Example -----------------------------------")
      Console.WriteLine(Environment.NewLine)
      Crime()
      Console.WriteLine(Environment.NewLine)

      Console.WriteLine("Press Enter Key")
      Console.Read()

    End Sub


    ' Example relating the presence of coronary heart disease to age. The data consist of each
    ' subject's age and whether or not the subject displays evidence of coronary heart disease
    ' (1 for present, 0 for not present).
    Private Sub CoronaryHeartDiseaseAge()

      ' The data for this example are stored in a matrix. The first column contains the independent,
      ' or predictor, variable values. The second column contains the observed outcome values (0 or 1),
      ' where 1 indicates the presence of coronary heart disease, and 0 denotes its absence.
      Dim ChdDataAll As New DoubleMatrix(New StreamReader(New FileStream("chdage.mat", FileMode.Open)))
      Dim ChdData As DoubleMatrix = ChdDataAll(Slice.All, New Slice(1, 2))

      If (ChdData Is Nothing) Then
        Console.WriteLine("Could not load data for coronary heart disease example. Exiting.")
        Return
      End If

      ' A logistic regression can be constructed from data in the following format: a matrix whose
      ' rows contain the predictor variable values, and a vector of Booleans for the observed values.
      Dim Obs(ChdData.Rows - 1) As Boolean

      Dim I As Integer
      For I = 0 To ChdData.Rows - 1
        Obs(I) = ChdData(I, 1) <> 0
      Next

      Dim RegMat As DoubleMatrix = ChdData(Slice.All, New Slice(0, 1))

      ' The logistic regression class takes a type parameter indicating the parameter calculation
      ' algorithm to use. Here we use a Newton-Raphson calculator class, which is essentially
      ' iteratively reweighted least squares. Since we want our model to have an intercept
      ' parameter, we set the last argument to True.
      Dim LogReg As New LogisticRegression(Of NewtonRaphsonParameterCalc)(RegMat, Obs, True)

      ' First we check that parameter calculation is successful. If not, we
      ' print out some diagnostic information and exit.
      If (Not LogReg.IsGood) Then
        Console.WriteLine("Logistic regression parameter calculation failed:")
        Console.WriteLine(LogReg.ParameterCalculationErrorMessage)
        Dim ParameterCalc = LogReg.ParameterCalculator
        Console.WriteLine("Maximum iterations: " & ParameterCalc.MaxIterations)
        Console.WriteLine("Number of iterations: " & ParameterCalc.Iterations)
        Console.WriteLine("Newton Raphson converged: " & ParameterCalc.Converged)
        Return
      End If

      ' Parameter calculation is successful. The fit analysis class is still
      ' under construction and will contain more statistics. For now we look
      ' at the G-statistic.
      Dim FitAnalysis As New LogisticRegressionFitAnalysis(Of NewtonRaphsonParameterCalc)(LogReg)
      Console.WriteLine("Log likelihood: " & FitAnalysis.LogLikelihood.ToString("G3"))
      Console.WriteLine("G-statistic: " & FitAnalysis.GStatistic.ToString("G3"))
      Console.WriteLine("G-statistic P-value: " & FitAnalysis.GStatisticPValue.ToString("G3"))
      Console.WriteLine()

      ' Print out the parameter values and related statistics:
      Dim ParameterEstimates() As LogisticRegressionParameter(Of NewtonRaphsonParameterCalc) = LogReg.ParameterEstimates
      Console.WriteLine("Intercept Parameter:")
      Console.WriteLine(ParameterEstimates(0).ToString())
      Console.WriteLine()
      Console.WriteLine("Age Coefficient:")
      Console.WriteLine(ParameterEstimates(1).ToString())
      Console.WriteLine()

      ' Predict the probability of the presence of coronary heart disease for some ages.
      Dim Ages As New DoubleMatrix("5x1 [29.0 37.0 48.0 64.0 78.0]")
      Dim Probabilities As DoubleVector = LogReg.PredictedProbabilities(Ages)
      For I = 0 To Ages.Rows - 1
        Console.WriteLine("The probability of the presence of coronary heart disease at age {0} is {1}",
          Ages(I, 0), Probabilities(I).ToString("G3"))
      Next
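
      ' A minimal sketch, not part of the original example: recompute the same
      ' probabilities by hand from the fitted parameters. It assumes the standard
      ' logistic model p = 1 / (1 + exp(-(b0 + b1 * age))), with the intercept at
      ' index 0 and the age coefficient at index 1, as printed above. The names
      ' Intercept, AgeCoef, and ManualProb are introduced here for illustration.
      Dim Intercept As Double = ParameterEstimates(0).Value
      Dim AgeCoef As Double = ParameterEstimates(1).Value
      For I = 0 To Ages.Rows - 1
        Dim ManualProb As Double = 1.0 / (1.0 + Math.Exp(-(Intercept + AgeCoef * Ages(I, 0))))
        Console.WriteLine("Manual estimate at age {0}: {1}", Ages(I, 0), ManualProb.ToString("G3"))
      Next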

    End Sub

    ' Example applying logistic regression to a study of low birth weights. The goal of the study was
    ' to identify risk factors associated with giving birth to a low birth weight baby. There are four
    ' variables under consideration: age, weight of the subject, race, and number of physician visits
    ' during pregnancy.
    Private Sub LowBirthWeight()

      Dim Data As DataFrame = DataFrame.Load("lowbwt.dat", True, False, " ", True)

      ' Logistic regression provides a convenience method for producing design, or dummy, variables
      ' using "reference cell coding". If the categorical variable has k levels, k - 1 design
      ' variables are created. Reference cell coding sets all the design variable values to 0 for
      ' the reference group, and then sets a single design variable equal to 1 for each of the
      ' other groups.
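      '
      ' Illustration (a sketch, not NMath output): for a hypothetical three-level
      ' variable with levels A, B, and C, taking A as the reference group, the
      ' two design variables D_0 and D_1 would be coded as
      '
      '   Level   D_0   D_1
      '   A        0     0
      '   B        1     0
      '   C        0     1
      '
      ' Which level NMath treats as the reference group is an assumption here.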

      ' We first create a data frame containing the design variables and their values
      ' constructed from the Race column of the data. Since the race variable has
      ' 3 levels there will be two design variables. By default they will be named
      ' Race_0 and Race_1.
      Dim RaceColIndex As Integer = Data.IndexOfColumn("Race")
      Dim RaceDesignVars As DataFrame = LogisticRegression(Of NewtonRaphsonParameterCalc).DesignVariables(Data(RaceColIndex))

      ' Next we remove the Race column from our input data and replace it with
      ' the two design variable columns.
      Data.RemoveColumn(RaceColIndex)
      Dim C As Integer
      For C = 0 To RaceDesignVars.Cols - 1
        Data.InsertColumn(RaceColIndex + C, RaceDesignVars(C))
      Next

      ' Now convert the data frame's data to a matrix of floating point values.
      Dim MatrixDat As DoubleMatrix = Data.ToDoubleMatrix()

      ' The first column of the data is patient ID and the second column of the data contains the
      ' observed condition of low birth weight. A 1 in the observation column indicates low birth weight
      ' and a 0 indicates normal birth weight. We want to exclude the first column of patient IDs from the
      ' regression data.
      Dim A As DoubleMatrix = MatrixDat(Range.All, New Range(1, Position.End))

      ' We now construct the logistic regression. This constructor lets you
      ' leave the column of observed values in the data matrix. However, you
      ' must supply the index of the observation column and a predicate
      ' function for converting the numerical values to Boolean: true if the
      ' condition is present and false if it is not. So in constructing the
      ' object we pass in the matrix containing both the independent, or
      ' predictor, variable values and the observed values. Next we pass in 0,
      ' indicating that the matrix column at index 0 contains the observed
      ' values. Next we pass in a lambda expression indicating that nonzero
      ' values in the observation column denote the presence of low birth
      ' weight. Finally, the last argument, True, includes an intercept
      ' parameter in the model.
      Dim ObservationPredicate = Function(x)
                                   Return x <> 0
                                 End Function
      Dim LR As New LogisticRegression(Of NewtonRaphsonParameterCalc)(MatrixDat, 0, ObservationPredicate, True)

      ' Check to see if parameter calculation succeeded. If not, print out diagnostics
      ' and exit.
      Console.WriteLine("LR good? " & LR.IsGood)
      If (Not LR.IsGood) Then
        Console.WriteLine("Logistic regression parameter calculation failed:")
        Console.WriteLine(LR.ParameterCalculationErrorMessage)
        Dim ParameterCalc = LR.ParameterCalculator
        Console.WriteLine("Maximum iterations: " & ParameterCalc.MaxIterations)
        Console.WriteLine("Number of iterations: " & ParameterCalc.Iterations)
        Console.WriteLine("Newton Raphson converged: " & ParameterCalc.Converged)
        Return
      End If

      ' Parameter calculation succeeded. Print out the model parameter estimates
      ' and related information.
      Dim parameterEstimates = LR.ParameterEstimates
      For I = 0 To parameterEstimates.Length - 1
        Dim estimate = parameterEstimates(I)
        If (I = 0) Then
          Console.WriteLine("Constant term = {0}, SE = {1}", Math.Round(estimate.Value, 3),
            estimate.StandardError.ToString("G3"))
        Else
          Console.WriteLine("Coefficient for {0} = {1}, SE = {2}", Data(I).Name, Math.Round(estimate.Value, 3),
            estimate.StandardError.ToString("G3"))
        End If
      Next
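
      ' A minimal sketch, not part of the original example: exponentiating a
      ' logistic regression coefficient gives the odds ratio associated with a
      ' one-unit increase in that predictor, holding the others fixed. The loop
      ' skips the intercept at index 0 and reuses the column-name lookup above.
      ' The loop variable J is introduced here for illustration.
      For J As Integer = 1 To parameterEstimates.Length - 1
        Console.WriteLine("Odds ratio for {0} = {1}", Data(J).Name,
          Math.Exp(parameterEstimates(J).Value).ToString("G3"))
      Next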

      Console.WriteLine()

      ' We can look at the parameter covariance matrix.
      Console.WriteLine("Parameter covariance matrix:")
      Console.WriteLine(NMathFunctions.Round(LR.ParameterCovarianceMatrix, 3).ToTabDelimited())
      Console.WriteLine()

      ' Finally, print out some fit information.
      Dim FitAnalysis = New LogisticRegressionFitAnalysis(Of NewtonRaphsonParameterCalc)(LR)
      Console.WriteLine("Log likelihood = " & FitAnalysis.LogLikelihood.ToString("G3"))
      Console.WriteLine("G-statistic = " & FitAnalysis.GStatistic.ToString("G3"))
      Dim PValue = FitAnalysis.GStatisticPValue
      Console.WriteLine("Pr[X^2({0}) > {1}] = {2}", LR.NumberOfPredictors, FitAnalysis.GStatistic,
        FitAnalysis.GStatisticPValue)

      ' Predict the probability of low birth weight for a 29-year-old white woman weighing
      ' 159 pounds with 5 physician visits during pregnancy.
      Dim Subject As New DoubleVector(29.0, 159.0, 0.0, 0.0, 5.0)
      Dim Prob As Double = LR.PredictedProbability(Subject)
      Console.WriteLine("Estimated probability of a white woman age {0}, weighing {1} lbs, {2} Dr. visits is {3}",
        Subject(0), Subject(1), Subject(4), Prob.ToString("G5"))
    End Sub

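    ' Example fitting a logistic regression to crime data. A crime rate
    ' (CrimeRat) of 110 or more is treated as the positive outcome, and
    ' MaleTeen, South, Educ, and Police59 are the predictors.
    Private Sub Crime()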

      Dim CrimeData = DataFrame.Load("crime.dat", True, False, " ", True)

      Dim ColumnNames() As String = {"CrimeRat", "MaleTeen", "South", "Educ", "Police59"}
      Dim Columns(ColumnNames.Length - 1) As Integer
      Dim I As Integer
      For I = 0 To ColumnNames.Length - 1
        Columns(I) = CrimeData.IndexOfColumn(ColumnNames(I))
      Next

      Dim S As New Subset(Columns)
      Dim Data = CrimeData.GetColumns(S)
      Dim MatrixData = Data.ToDoubleMatrix()
      Dim ObservationPredicate = Function(x)
                                   Return x >= 110.0
                                 End Function
      Dim LR As New LogisticRegression(Of NewtonRaphsonParameterCalc)(MatrixData, 0, ObservationPredicate, True)
      Console.WriteLine("lr is good: " & LR.IsGood)
      Dim ParamEst() As LogisticRegressionParameter(Of NewtonRaphsonParameterCalc) = LR.ParameterEstimates
      For I = 0 To ParamEst.Length - 1
        Console.WriteLine(ParamEst(I).ToString())
      Next

      Dim Fit As New LogisticRegressionFitAnalysis(Of NewtonRaphsonParameterCalc)(LR)
      Dim Pearson = Fit.PearsonStatistic()
      Console.WriteLine("Pearson Statistic -")
      Console.WriteLine(Environment.NewLine & "Pearson: " & Pearson.ToString())
      Console.WriteLine()

      ' Calculate the Hosmer-Lemeshow statistic using 10 groups.
      Console.WriteLine("Hosmer Lemeshow Statistic -")
      Dim hosmerLemeshowStat = Fit.HLStatistic(10)
      Console.WriteLine(hosmerLemeshowStat)
    End Sub

  End Module

End Namespace

