# VB NMF Consensus Matrix Example

← All NMath Core Code Examples

```ï»¿Imports System
Imports System.Text

Imports CenterSpace.NMath.Core
Imports System.IO

Namespace CenterSpace.NMath.Core.Examples.VisualBasic

' A .NET example in Visual Basic showing how to compute a consensus matrix averaging different NMF clusterings.
'
' A Nonnegative Matrix Factorization (NMF) is an approximate factorization
' of a positive matrix v into a product of two matrices w and h:
' v ~ wh
' This factorization can by used to group, or cluster, the columns of v
' (the columns of v are usually refered to as "samples"). NMF uses an
' iterative algorithm with random starting values for w and h. This, coupled
' with the fact that the factorization is not unique, means that if you cluster
' the columns of v using an NMF cluster several different times, you may get several
' different clusterings. The NMF consensus matrix is a way to average
' the possibly different clusterings, and is computed using the following process:
'
' Cluster the columns of v using NMF n times. Each NMF clustering will yield
' a "connectivity matrix". The connectivity matrix is a symmetric matrix
' whose i, jth entry is 1 if columns i and j of v were clustered together,
' and 0 if they were not. The "consensus matrix" is also a symmetric matrix
' whose i, jth entry is formed by taking the average of the i, jth entries of
' the n connectivity matrices.
'
' It is clear that each i, jth entry of the consensus matrix has a value between 0
' (columns i and j were not clustered together on any of the n runs) and 1 (columns
' i and j were clustered together on all n runs). Thus the i, jth entry of a
' consensus matrix may be considered, in some sense, a "probability" that columns
' i and j belong to the same cluster.
' A consensus matrix C may also used to perform a hierarchical clustering of the
' columns of v by using as the distance function:
'
' distance between columns i and j = 1.0 - C[i,j]
'
' This is demonstrated in the example below.

Module NMFConsensusMatrixExample

Sub Main()

Dim Data As DataFrame = DataFrame.Load("nmf_data.dat", True, True, ControlChars.Tab, True)

' Extract the data as a DoubleMatrix.
Dim V As DoubleMatrix = Data.ToDoubleMatrix()

' Set the order of the NMF (this is the number of columns in w, where
' v ~ wh
Dim K As Integer = 3

' Set the number of runs or connectivity matrices to use to form the
' consensus matrix.
Dim NumberOfRuns As Integer = 70

' Construct a consensus matrix using the "divergence" update
' algorithm.
Dim ConsensusMatrix As New NMFConsensusMatrix(Of NMFDivergenceUpdate)(V, Data.ColumnHeaders, K, NumberOfRuns)

Console.WriteLine()

' Print out the number of runs in which the NMF algorithm actually converged to an answer, and the
' resulting consensus matrix.
Console.WriteLine("{0} runs out of {1} converged.", ConsensusMatrix.NumberOfConvergedRuns, NumberOfRuns)
Console.WriteLine()
Console.WriteLine("Consensus Matrix:")
Console.WriteLine(ConsensusMatrix.ToTabDelimited("G3"))

' Let's look at the first column and for each successive column print out the
' "probability" that they are clustered together (we'll use the column
' names from the data frame instead of column numbers).
Dim Label As String = ConsensusMatrix.Labels(0)
Console.WriteLine()

Dim J As Integer

For J = 1 To (ConsensusMatrix.Order - 1)
Console.WriteLine("The ""probability"" that {0} is clustered with {1} is {2}",
Label, ConsensusMatrix.Labels(J), ConsensusMatrix(0, J))
Next

' Perform a hierarchical cluster analysis using the consensus matrix
' to define the distance function as described in the class description
' above.

' The cluster analysis class wants to cluster the rows of a matrix. Since we
' are essentially clustering a bunch of column numbers, we'll provide a matrix
' with one column and n rows where n is the number of columns of v (and the
' order of of the consensus matrix). The column will contain the numbers 0
' to n - 1 (basically, we're just clustering the numbers 0,...,n - 1).
Dim ItemNumbers As New DoubleMatrix(ConsensusMatrix.Order, 1, 0, 1)

' The distance function object holds the consensus matrix C and returns the distance
' between i and j as 1.0 - C[i,j]
Dim DistanceFunctionObject As New ConsensusMatrixDistance(ConsensusMatrix)
Dim CA As New ClusterAnalysis(ItemNumbers, ClusterAnalysisDist)

' Form three clusters using the cluster analysis cut tree function and print them out.
Dim ClusterS As ClusterSet = CA.CutTree(3)
Console.WriteLine()
Dim ClusterNumber As Integer
Dim I As Integer
For ClusterNumber = 0 To (ClusterS.NumberOfClusters - 1)
Dim Members() As Integer = ClusterS.Cluster(ClusterNumber)
Console.Write("Cluster number {0} contains: ", ClusterNumber)
For I = 0 To (Members.Length - 1)
Console.Write("{0} ", ConsensusMatrix.Labels(Members(I)))
Next
Console.WriteLine()
Next

Console.WriteLine()
Console.WriteLine("Press Enter Key")

End Sub

End Module

Class ConsensusMatrixDistance

Private ConsensusMatrix As ConnectivityMatrix

Public Sub New(ByVal Conn As ConnectivityMatrix)
ConsensusMatrix = Conn
End Sub

Public Function CaDistance(ByVal Data1 As DoubleVector, ByVal Data2 As DoubleVector) As Double

Dim I As Integer = Data1(0)
Dim J As Integer = Data2(0)
Return 1.0 - ConsensusMatrix(I, J)

End Function

End Class

End Namespace

```
← All NMath Stats Code Examples
Top