Imports System
Imports System.Text
Imports CenterSpace.NMath.Core
Imports System.IO
Namespace CenterSpace.NMath.Examples.VisualBasic
' A .NET example in Visual Basic showing how to compute a consensus matrix
' averaging different NMF clusterings.
'
' A Nonnegative Matrix Factorization (NMF) is an approximate factorization
' of a nonnegative matrix v into a product of two matrices w and h:
'
'   v ~ wh
'
' This factorization can be used to group, or cluster, the columns of v
' (the columns of v are usually referred to as "samples"). NMF uses an
' iterative algorithm with random starting values for w and h. This, coupled
' with the fact that the factorization is not unique, means that if you cluster
' the columns of v using NMF several times, you may get several different
' clusterings. The NMF consensus matrix is a way to average the possibly
' different clusterings, and is computed using the following process:
'
' Cluster the columns of v using NMF n times. Each NMF clustering yields
' a "connectivity matrix". The connectivity matrix is a symmetric matrix
' whose (i, j)th entry is 1 if columns i and j of v were clustered together,
' and 0 if they were not. The "consensus matrix" is also a symmetric matrix,
' whose (i, j)th entry is the average of the (i, j)th entries of the
' n connectivity matrices.
'
' Each (i, j)th entry of the consensus matrix therefore has a value between 0
' (columns i and j were not clustered together on any of the n runs) and 1
' (columns i and j were clustered together on all n runs). Thus the (i, j)th
' entry of a consensus matrix may be considered, in some sense, a "probability"
' that columns i and j belong to the same cluster.
'
' A consensus matrix C may also be used to perform a hierarchical clustering
' of the columns of v by using as the distance function:
'
'   distance between columns i and j = 1.0 - C[i,j]
'
' This is demonstrated in the example below.
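
' The averaging process described in the header comment can be sketched in
' plain Visual Basic, without NMath. This is an illustrative sketch only and
' is not how NMFConsensusMatrix is implemented. The module and function names
' are hypothetical, and each run is assumed to be summarized as an array that
' gives the cluster number assigned to each column of v.
Module ConsensusSketch

  ' Builds the connectivity matrix for one run: entry (i, j) is 1 if
  ' columns i and j received the same cluster number, and 0 otherwise.
  Function ConnectivityFromLabels(ByVal Labels() As Integer) As Double(,)
    Dim N As Integer = Labels.Length
    Dim Conn(N - 1, N - 1) As Double
    For I As Integer = 0 To N - 1
      For J As Integer = 0 To N - 1
        If Labels(I) = Labels(J) Then
          Conn(I, J) = 1.0
        End If
      Next
    Next
    Return Conn
  End Function

  ' Averages the connectivity matrices of all runs, entry by entry.
  ' Assumes at least one run and the same number of columns in every run.
  Function ConsensusFromRuns(ByVal Runs()() As Integer) As Double(,)
    Dim N As Integer = Runs(0).Length
    Dim Consensus(N - 1, N - 1) As Double
    ' Sum the connectivity matrices of all runs.
    For Each RunLabels As Integer() In Runs
      Dim Conn As Double(,) = ConnectivityFromLabels(RunLabels)
      For I As Integer = 0 To N - 1
        For J As Integer = 0 To N - 1
          Consensus(I, J) += Conn(I, J)
        Next
      Next
    Next
    ' Divide by the number of runs to obtain the average.
    For I As Integer = 0 To N - 1
      For J As Integer = 0 To N - 1
        Consensus(I, J) /= Runs.Length
      Next
    Next
    Return Consensus
  End Function

End Module
' For example, with three runs that label three columns as (0, 0, 1),
' (0, 1, 1) and (0, 0, 1), columns 0 and 1 share a cluster in two of the
' three runs, so the consensus entry (0, 1) is 2/3 and the corresponding
' distance 1.0 - C[0,1] is 1/3. Columns 0 and 2 never share a cluster,
' so the entry (0, 2) is 0 and their distance is 1.
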
Module NMFConsensusMatrixExample
Sub Main()
' Read in some data...
Dim Data As DataFrame = DataFrame.Load("nmf_data.dat", True, True, ControlChars.Tab, True)
' Extract the data as a DoubleMatrix.
Dim V As DoubleMatrix = Data.ToDoubleMatrix()
' Set the order of the NMF (this is the number of columns in w, where
' v ~ wh).
Dim K As Integer = 3
' Set the number of runs, that is, the number of connectivity matrices
' used to form the consensus matrix.
Dim NumberOfRuns As Integer = 70
' Construct a consensus matrix using the "divergence" update
' algorithm.
Dim ConsensusMatrix As New NMFConsensusMatrix(Of NMFDivergenceUpdate)(V, Data.ColumnHeaders, K, NumberOfRuns)
Console.WriteLine()
' Print out the number of runs in which the NMF algorithm actually
' converged to an answer, and the resulting consensus matrix.
Console.WriteLine("{0} runs out of {1} converged.", ConsensusMatrix.NumberOfConvergedRuns, NumberOfRuns)
Console.WriteLine()
Console.WriteLine("Consensus Matrix:")
Console.WriteLine(ConsensusMatrix.ToTabDelimited("G3"))
' Let's look at the first column and, for each successive column, print out
' the "probability" that they are clustered together (we'll use the column
' names from the data frame instead of column numbers).
Dim Label As String = ConsensusMatrix.Labels(0)
Console.WriteLine()
Dim J As Integer
For J = 1 To (ConsensusMatrix.Order - 1)
Console.WriteLine("The ""probability"" that {0} is clustered with {1} is {2}",
Label, ConsensusMatrix.Labels(J), ConsensusMatrix(0, J))
Next
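' As a further illustration, the loop below finds the column most often
' clustered with the first column. It is a sketch that relies only on the
' members already used above (Order, Labels, and the (i, j) indexer) and
' assumes the consensus matrix has at least two columns. The variable names
' BestPartner, BestValue and Col are introduced here for illustration.
Dim BestPartner As Integer = 1
Dim BestValue As Double = ConsensusMatrix(0, 1)
For Col As Integer = 2 To (ConsensusMatrix.Order - 1)
  If ConsensusMatrix(0, Col) > BestValue Then
    BestValue = ConsensusMatrix(0, Col)
    BestPartner = Col
  End If
Next
Console.WriteLine("{0} is most often clustered with {1} ({2})",
Label, ConsensusMatrix.Labels(BestPartner), BestValue)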
' Perform a hierarchical cluster analysis using the consensus matrix
' to define the distance function, as described in the header comment
' above.
' The cluster analysis class clusters the rows of a matrix. Since we
' are essentially clustering a set of column numbers, we'll provide a matrix
' with one column and n rows, where n is the number of columns of v (and the
' order of the consensus matrix). The column will contain the numbers 0
' to n - 1 (basically, we're just clustering the numbers 0,...,n - 1).
Dim ItemNumbers As New DoubleMatrix(ConsensusMatrix.Order, 1, 0, 1)
' The distance function object holds the consensus matrix C and returns
' the distance between i and j as 1.0 - C[i,j].
Dim DistanceFunctionObject As New ConsensusMatrixDistance(ConsensusMatrix)
Dim ClusterAnalysisDist As New Distance.Function(AddressOf DistanceFunctionObject.CaDistance)
Dim CA As New ClusterAnalysis(ItemNumbers, ClusterAnalysisDist)
' Form three clusters using the cluster analysis cut tree function and print them out.
Dim ClusterS As ClusterSet = CA.CutTree(3)
Console.WriteLine()
Dim ClusterNumber As Integer
Dim I As Integer
For ClusterNumber = 0 To (ClusterS.NumberOfClusters - 1)
Dim Members() As Integer = ClusterS.Cluster(ClusterNumber)
Console.Write("Cluster number {0} contains: ", ClusterNumber)
For I = 0 To (Members.Length - 1)
Console.Write("{0} ", ConsensusMatrix.Labels(Members(I)))
Next
Console.WriteLine()
Next
Console.WriteLine()
Console.WriteLine("Press Enter Key")
Console.Read()
End Sub
End Module
Class ConsensusMatrixDistance
Private ConsensusMatrix As ConnectivityMatrix
Public Sub New(ByVal Conn As ConnectivityMatrix)
ConsensusMatrix = Conn
End Sub
Public Function CaDistance(ByVal Data1 As DoubleVector, ByVal Data2 As DoubleVector) As Double
' Each vector holds a single item number; convert it explicitly to an index.
Dim I As Integer = CInt(Data1(0))
Dim J As Integer = CInt(Data2(0))
Return 1.0 - ConsensusMatrix(I, J)
End Function
End Class
End Namespace