VB NMF Consensus Matrix Example



Imports System
Imports System.Text

Imports CenterSpace.NMath.Core
Imports CenterSpace.NMath.Stats
Imports System.IO

Namespace CenterSpace.NMath.Stats.Examples.VisualBasic

  ' A .NET example in Visual Basic showing how to compute a consensus matrix that averages several NMF clusterings.
  ' A Nonnegative Matrix Factorization (NMF) is an approximate factorization
  ' of a nonnegative matrix v into a product of two nonnegative matrices w and h:
  ' v ~ wh
  ' This factorization can be used to group, or cluster, the columns of v
  ' (the columns of v are usually referred to as "samples"). NMF uses an
  ' iterative algorithm with random starting values for w and h. Because of this,
  ' and because the factorization is not unique, clustering the columns of v
  ' with NMF several times may produce several different clusterings.
  ' The NMF consensus matrix is a way to average these possibly different
  ' clusterings. It is computed as follows:
  ' Cluster the columns of v using NMF n times. Each NMF clustering yields
  ' a "connectivity matrix": a symmetric matrix whose i, jth entry is 1 if
  ' columns i and j of v were clustered together, and 0 if they were not.
  ' The "consensus matrix" is also a symmetric matrix; its i, jth entry is the
  ' average of the i, jth entries of the n connectivity matrices.
  ' Each i, jth entry of the consensus matrix therefore lies between 0
  ' (columns i and j were not clustered together on any of the n runs) and 1 (columns
  ' i and j were clustered together on all n runs). Thus the i, jth entry of a
  ' consensus matrix may be considered, in some sense, a "probability" that columns
  ' i and j belong to the same cluster.
  ' A consensus matrix C may also be used to perform a hierarchical clustering of the
  ' columns of v by using the distance function:
  ' distance between columns i and j = 1.0 - C[i,j]
  ' This is demonstrated in the example below.

  Module NMFConsensusMatrixExample

    Sub Main()

      ' Read in some data...
      Dim Data As DataFrame = DataFrame.Load("nmf_data.dat", True, True, ControlChars.Tab, True)

      ' Extract the data as a DoubleMatrix.
      Dim V As DoubleMatrix = Data.ToDoubleMatrix()

      ' Set the order of the NMF (the number of columns in w, where
      ' v ~ wh).
      Dim K As Integer = 3

      ' Set the number of runs or connectivity matrices to use to form the 
      ' consensus matrix.
      Dim NumberOfRuns As Integer = 70

      ' Construct a consensus matrix using the "divergence" update
      ' algorithm.
      Dim ConsensusMatrix As New NMFConsensusMatrix(Of NMFDivergenceUpdate)(V, Data.ColumnHeaders, K, NumberOfRuns)


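      ' Print out the number of runs in which the NMF algorithm actually converged to an answer, and the
      ' resulting consensus matrix.
      Console.WriteLine("{0} runs out of {1} converged.", ConsensusMatrix.NumberOfConvergedRuns, NumberOfRuns)
      Console.WriteLine("Consensus Matrix:")
      For Row As Integer = 0 To (ConsensusMatrix.Order - 1)
        For Col As Integer = 0 To (ConsensusMatrix.Order - 1)
          Console.Write("{0:F3} ", ConsensusMatrix(Row, Col))
        Next
        Console.WriteLine()
      Next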

      ' Let's look at the first column and for each successive column print out the 
      ' "probability" that they are clustered together (we'll use the column
      ' names from the data frame instead of column numbers).
      Dim Label As String = ConsensusMatrix.Labels(0)

      Dim J As Integer

      For J = 1 To (ConsensusMatrix.Order - 1)
        Console.WriteLine("The ""probability"" that {0} is clustered with {1} is {2}",
          Label, ConsensusMatrix.Labels(J), ConsensusMatrix(0, J))
      Next

      ' Perform a hierarchical cluster analysis using the consensus matrix
      ' to define the distance function, as described in the comments at the
      ' top of this example.

      ' The ClusterAnalysis class clusters the rows of a matrix. Since we
      ' are essentially clustering column numbers, we provide a matrix
      ' with one column and n rows, where n is the number of columns of v (and the
      ' order of the consensus matrix). The column contains the numbers 0
      ' to n - 1 (in effect, we are just clustering the numbers 0, ..., n - 1).
      Dim ItemNumbers As New DoubleMatrix(ConsensusMatrix.Order, 1, 0, 1)

      ' The distance function object holds the consensus matrix C and returns the distance
      ' between i and j as 1.0 - C[i,j]
      Dim DistanceFunctionObject As New ConsensusMatrixDistance(ConsensusMatrix)
      Dim ClusterAnalysisDist As New Distance.Function(AddressOf DistanceFunctionObject.CaDistance)
      Dim CA As New ClusterAnalysis(ItemNumbers, ClusterAnalysisDist)

      ' Form three clusters using the cluster analysis cut tree function and print them out.
      Dim ClusterS As ClusterSet = CA.CutTree(3)
      Dim ClusterNumber As Integer
      Dim I As Integer
      For ClusterNumber = 0 To (ClusterS.NumberOfClusters - 1)
        Dim Members() As Integer = ClusterS.Cluster(ClusterNumber)
        Console.Write("Cluster number {0} contains: ", ClusterNumber)
        For I = 0 To (Members.Length - 1)
          Console.Write("{0} ", ConsensusMatrix.Labels(Members(I)))
        Next
        Console.WriteLine()
      Next

      Console.WriteLine("Press Enter Key")
      Console.Read()

    End Sub

  End Module

  Class ConsensusMatrixDistance

    Private ConsensusMatrix As ConnectivityMatrix

    Public Sub New(ByVal Conn As ConnectivityMatrix)
      ConsensusMatrix = Conn
    End Sub

    Public Function CaDistance(ByVal Data1 As DoubleVector, ByVal Data2 As DoubleVector) As Double

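      ' Each row of ItemNumbers holds a single column index stored as a Double,
      ' so convert it explicitly before indexing the consensus matrix.
      Dim I As Integer = CInt(Data1(0))
      Dim J As Integer = CInt(Data2(0))
      Return 1.0 - ConsensusMatrix(I, J)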

    End Function

  End Class

End Namespace
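
The consensus computation described in the comments at the top of the example can also be sketched without NMath, which makes the averaging step easy to inspect. This is a minimal, self-contained sketch, assuming each run's clustering is already available as an array of cluster labels (one label per column of v). The module name ConsensusSketch and the functions LabelsFromH, Connectivity, Consensus, and Distance are hypothetical illustrations, not NMath APIs. LabelsFromH assumes the common convention of assigning column j to the row of h with the largest entry in that column; NMath's own clustering rule may differ.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helper module (not part of NMath) illustrating how a
' consensus matrix is formed from several clusterings.
Module ConsensusSketch

  ' Assumption: column j of h is assigned to the row holding the largest
  ' entry in that column (a common NMF clustering convention).
  Function LabelsFromH(ByVal h As Double(,)) As Integer()
    Dim k As Integer = h.GetLength(0)
    Dim n As Integer = h.GetLength(1)
    Dim labels(n - 1) As Integer
    For j As Integer = 0 To n - 1
      Dim best As Integer = 0
      For i As Integer = 1 To k - 1
        If h(i, j) > h(best, j) Then best = i
      Next
      labels(j) = best
    Next
    Return labels
  End Function

  ' Connectivity matrix: entry (i, j) is 1 if columns i and j share a label, else 0.
  Function Connectivity(ByVal labels As Integer()) As Double(,)
    Dim n As Integer = labels.Length
    Dim m(n - 1, n - 1) As Double
    For i As Integer = 0 To n - 1
      For j As Integer = 0 To n - 1
        m(i, j) = If(labels(i) = labels(j), 1.0, 0.0)
      Next
    Next
    Return m
  End Function

  ' Consensus matrix: element-wise average of the connectivity matrices of all runs.
  Function Consensus(ByVal runs As List(Of Integer())) As Double(,)
    Dim n As Integer = runs(0).Length
    Dim c(n - 1, n - 1) As Double   ' starts as all zeros
    For Each labels As Integer() In runs
      Dim m As Double(,) = Connectivity(labels)
      For i As Integer = 0 To n - 1
        For j As Integer = 0 To n - 1
          c(i, j) += m(i, j)
        Next
      Next
    Next
    ' Divide the summed connectivity matrices by the number of runs.
    For i As Integer = 0 To n - 1
      For j As Integer = 0 To n - 1
        c(i, j) /= runs.Count
      Next
    Next
    Return c
  End Function

  ' Distance used for hierarchical clustering: 1.0 - C[i,j].
  Function Distance(ByVal c As Double(,), ByVal i As Integer, ByVal j As Integer) As Double
    Return 1.0 - c(i, j)
  End Function

  Sub Main()
    ' Three made-up runs that cluster four columns (illustrative data only).
    Dim runs As New List(Of Integer()) From {
      New Integer() {0, 0, 1, 1},
      New Integer() {1, 1, 0, 0},
      New Integer() {0, 0, 0, 1}
    }
    Dim c As Double(,) = Consensus(runs)
    Console.WriteLine("C(0,1) = {0}", c(0, 1))
    Console.WriteLine("C(0,2) = {0}", c(0, 2))
    Console.WriteLine("distance(0,2) = {0}", Distance(c, 0, 2))
  End Sub

End Module
```

With these three illustrative runs, columns 0 and 1 are together in every run, so C(0,1) is 1; columns 0 and 2 are together only in the third run, so C(0,2) is 1/3 and their distance is 2/3. The first two runs are the same partition with the labels swapped, and they produce identical connectivity matrices. This is why the connectivity matrices, rather than the raw labels, are what get averaged.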
