VB NMF Consensus Matrix Example


Imports System
Imports System.Text

Imports CenterSpace.NMath.Core

Imports System.IO

Namespace CenterSpace.NMath.Examples.VisualBasic

  ' A .NET example in Visual Basic showing how to compute a consensus matrix by averaging different NMF clusterings.
  '
  ' A Nonnegative Matrix Factorization (NMF) is an approximate factorization
  ' of a nonnegative matrix v into a product of two nonnegative matrices w and h:
  ' v ~ wh
  ' This factorization can be used to group, or cluster, the columns of v
  ' (the columns of v are usually referred to as "samples"). NMF uses an
  ' iterative algorithm with random starting values for w and h. This, coupled
  ' with the fact that the factorization is not unique, means that if you cluster
  ' the columns of v using NMF several times, you may get several
  ' different clusterings. The NMF consensus matrix is a way to average
  ' the possibly different clusterings, and is computed using the following process:
  '
  ' Cluster the columns of v using NMF n times. Each NMF clustering will yield
  ' a "connectivity matrix". The connectivity matrix is a symmetric matrix
  ' whose (i, j)th entry is 1 if columns i and j of v were clustered together,
  ' and 0 if they were not. The "consensus matrix" is also a symmetric matrix,
  ' whose (i, j)th entry is the average of the (i, j)th entries of
  ' the n connectivity matrices.
  '
  ' Each (i, j)th entry of the consensus matrix therefore has a value between 0
  ' (columns i and j were not clustered together on any of the n runs) and 1 (columns
  ' i and j were clustered together on all n runs). Thus the (i, j)th entry of a
  ' consensus matrix may be considered, in some sense, a "probability" that columns
  ' i and j belong to the same cluster.
  ' A consensus matrix C may also be used to perform a hierarchical clustering of the
  ' columns of v by using as the distance function:
  '
  ' distance between columns i and j = 1.0 - C[i,j]
  '
  ' This is demonstrated in the example below.

  Module NMFConsensusMatrixExample

    Sub Main()

      ' Read in some data...
      Dim Data As DataFrame = DataFrame.Load("nmf_data.dat", True, True, ControlChars.Tab, True)

      ' Extract the data as a DoubleMatrix.
      Dim V As DoubleMatrix = Data.ToDoubleMatrix()

      ' Set the order of the NMF (this is the number of columns in w, where
      ' v ~ wh).
      Dim K As Integer = 3

      ' Set the number of runs, that is, the number of connectivity matrices
      ' used to form the consensus matrix.
      Dim NumberOfRuns As Integer = 70

      ' Construct a consensus matrix using the "divergence" update
      ' algorithm.
      Dim ConsensusMatrix As New NMFConsensusMatrix(Of NMFDivergenceUpdate)(V, Data.ColumnHeaders, K, NumberOfRuns)

      Console.WriteLine()

      ' Print out the number of runs in which the NMF algorithm actually converged to an answer, and the
      ' resulting consensus matrix.
      Console.WriteLine("{0} runs out of {1} converged.", ConsensusMatrix.NumberOfConvergedRuns, NumberOfRuns)
      Console.WriteLine()
      Console.WriteLine("Consensus Matrix:")
      Console.WriteLine(ConsensusMatrix.ToTabDelimited("G3"))

      ' Let's look at the first column and, for each successive column, print out the
      ' "probability" that the two are clustered together (we'll use the column
      ' names from the data frame instead of column numbers).
      Dim Label As String = ConsensusMatrix.Labels(0)
      Console.WriteLine()

      Dim J As Integer

      For J = 1 To (ConsensusMatrix.Order - 1)
        Console.WriteLine("The ""probability"" that {0} is clustered with {1} is {2}",
          Label, ConsensusMatrix.Labels(J), ConsensusMatrix(0, J))
      Next

      ' Perform a hierarchical cluster analysis using the consensus matrix
      ' to define the distance function, as described in the comment at the
      ' top of this file.

      ' The cluster analysis class clusters the rows of a matrix. Since we
      ' are essentially clustering a set of column numbers, we'll provide a matrix
      ' with one column and n rows, where n is the number of columns of v (and the
      ' order of the consensus matrix). The column will contain the numbers 0
      ' to n - 1 (basically, we're just clustering the numbers 0,...,n - 1).
      Dim ItemNumbers As New DoubleMatrix(ConsensusMatrix.Order, 1, 0, 1)

      ' The distance function object holds the consensus matrix C and returns the distance
      ' between i and j as 1.0 - C[i,j].
      Dim DistanceFunctionObject As New ConsensusMatrixDistance(ConsensusMatrix)
      Dim ClusterAnalysisDist As New Distance.Function(AddressOf DistanceFunctionObject.CaDistance)
      Dim CA As New ClusterAnalysis(ItemNumbers, ClusterAnalysisDist)

      ' Form three clusters using the cluster analysis cut tree function and print them out.
      Dim ClusterS As ClusterSet = CA.CutTree(3)
      Console.WriteLine()
      Dim ClusterNumber As Integer
      Dim I As Integer
      For ClusterNumber = 0 To (ClusterS.NumberOfClusters - 1)
        Dim Members() As Integer = ClusterS.Cluster(ClusterNumber)
        Console.Write("Cluster number {0} contains: ", ClusterNumber)
        For I = 0 To (Members.Length - 1)
          Console.Write("{0} ", ConsensusMatrix.Labels(Members(I)))
        Next
        Console.WriteLine()
      Next

      Console.WriteLine()
      Console.WriteLine("Press Enter Key")
      Console.Read()

    End Sub

  End Module

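  ' Wraps a consensus matrix C so that it can serve as a cluster analysis
  ' distance function: the distance between items i and j is 1.0 - C[i,j].
  Class ConsensusMatrixDistance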

    Private ConsensusMatrix As ConnectivityMatrix

    Public Sub New(ByVal Conn As ConnectivityMatrix)
      ConsensusMatrix = Conn
    End Sub

    Public Function CaDistance(ByVal Data1 As DoubleVector, ByVal Data2 As DoubleVector) As Double

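      ' Each data vector holds a single item number (a column index of v).
      ' Convert explicitly so the code also compiles with Option Strict On.
      Dim I As Integer = CInt(Data1(0))
      Dim J As Integer = CInt(Data2(0))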
      Return 1.0 - ConsensusMatrix(I, J)

    End Function

  End Class
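
  ' A minimal sketch of the averaging described in the comment at the top of
  ' this file, written with plain arrays to make the arithmetic concrete.
  ' It is NOT part of the NMath API: the module name, the function names, and
  ' the choice to pass each clustering as an array of cluster labels are all
  ' assumptions made for illustration. NMFConsensusMatrix does this work itself.
  Module ConsensusAveragingSketch

    ' Returns the connectivity matrix for one clustering: entry (i, j) is 1.0
    ' if items i and j carry the same cluster label, and 0.0 otherwise.
    Function SketchConnectivity(ByVal Labels() As Integer) As Double(,)
      Dim N As Integer = Labels.Length
      Dim Result(N - 1, N - 1) As Double
      For I As Integer = 0 To N - 1
        For J As Integer = 0 To N - 1
          If Labels(I) = Labels(J) Then Result(I, J) = 1.0
        Next
      Next
      Return Result
    End Function

    ' Averages the connectivity matrices of several clusterings of the same
    ' items. Runs(r)(i) is the (hypothetical) cluster label of item i on run r.
    ' Assumes at least one run and equal-length label arrays.
    Function SketchConsensus(ByVal Runs()() As Integer) As Double(,)
      Dim N As Integer = Runs(0).Length
      Dim Result(N - 1, N - 1) As Double
      For Each Labels As Integer() In Runs
        Dim Conn As Double(,) = SketchConnectivity(Labels)
        For I As Integer = 0 To N - 1
          For J As Integer = 0 To N - 1
            Result(I, J) += Conn(I, J) / Runs.Length
          Next
        Next
      Next
      Return Result
    End Function

  End Module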

End Namespace
