NMFClustering<Alg> Class

Class NMFClustering performs a Non-negative Matrix Factorization (NMF) of a given matrix.
Inheritance Hierarchy
System.Object
  CenterSpace.NMath.Core.NMFClustering<Alg>

Namespace:  CenterSpace.NMath.Core
Assembly:  NMath (in NMath.dll) Version: 7.4
Syntax
public class NMFClustering<Alg>
    where Alg : INMFUpdateAlgorithm, new()

Type Parameters

Alg
The NMF update algorithm to use. Must implement the interface INMFUpdateAlgorithm and have a public parameterless constructor.

The NMFClustering<Alg> type exposes the following members.

Constructors
  Name  Description
Public method NMFClustering<Alg>()
Constructs an NMFClustering instance with default values for the maximum iterations, stopping adjacency, and convergence check period.
Public method NMFClustering<Alg>(Alg)
Constructs an NMFClustering instance with the given iteration algorithm and default values for the maximum iterations, stopping adjacency, and convergence check period.
Public method NMFClustering<Alg>(Int32, Int32)
Constructs an NMFClustering instance with the given parameters and the default convergence check period.
Public method NMFClustering<Alg>(Int32, Int32, Int32)
Constructs an NMFClustering instance with the given parameters.
Public method NMFClustering<Alg>(Int32, Int32, Int32, Alg)
Constructs an NMFClustering instance with the given parameters.
Properties
  Name  Description
Public property ClusterSet
Gets a ClusterSet object that identifies the cluster into which each object was grouped. The indexing operator on the returned ClusterSet returns the number of the cluster into which the ith object was grouped.
Public property Connectivity
Gets the adjacency matrix. The adjacency matrix is a symmetric matrix whose (i, j)th value is 1 if columns i and j of the factored matrix are in the same cluster, and 0 if they are not.
Public property Converged
Returns true if the factorization algorithm converged. Returns false if the maximum number of iterations was exceeded before convergence was achieved.
Public property ConvergenceCheckPeriod
Gets and sets the convergence check period. Convergence is checked every ConvergenceCheckPeriod iterations.
Public property Cost
Gets the value of the cost function for the factorization. The cost function is the function that is minimized by the NMF update algorithm in use.
Public property H
Gets the matrix H in the NMF factorization of V, V ~ WH.
Public property Iterations
Gets the total number of iterations performed in the most recent calculation.
Public property MaxFactorizationIterations
Gets and sets the maximum number of iterations to perform.
Public property StoppingAdjacency
Gets and sets the stopping adjacency. The stopping adjacency is the number of consecutively unchanged adjacency matrices that must be observed before convergence.
Public property Updater
Gets and sets the NMF iteration algorithm to use.
Public property W
Gets the matrix W in the NMF factorization of V, V ~ WH.
Methods
  Name  Description
Public method Factor(DataFrame, Int32)
Performs the NMF V ~ WH using random initial values for W and H.
Public method Factor(DoubleMatrix, Int32)
Performs the NMF V ~ WH using random initial values for W and H.
Public method Factor(DataFrame, Int32, DoubleMatrix, DoubleMatrix)
Performs the NMF V ~ WH using the specified initial values for W and H.
Public method Factor(DoubleMatrix, Int32, DoubleMatrix, DoubleMatrix)
Performs the NMF V ~ WH using the specified initial values for W and H.
Fields
  Name  Description
Public field Static member DEFAULT_CONV_CHECK_PEROID
The default number of iterations between checks for convergence.
Public field Static member DEFAULT_MAX_FACTORIZATION_ITERATIONS
The default maximum number of iterations to perform.
Public field Static member DEFAULT_STOPPING_ADJACENCY
The default value for the stopping adjacency. The stopping adjacency is the number of consecutively unchanged adjacency matrices that must be observed before convergence.
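
The members listed above might be combined roughly as in the following sketch, which configures an instance through its settable properties, factors a matrix, and reads back the results. This is an illustration only and is not taken from the NMath documentation. The helper name ClusterColumns is hypothetical. The Int32 argument of Factor is assumed to be the number of clusters, the ClusterSet indexer is assumed to take a zero-based column index, and DoubleMatrix is assumed to expose a Cols property. The update algorithm is left as a type parameter so that no particular INMFUpdateAlgorithm implementation is assumed.

```csharp
using System;
using CenterSpace.NMath.Core;

public static class NMFClusteringSketch
{
    // Hypothetical helper: clusters the columns of v into k groups using
    // whichever update algorithm the caller supplies as the type argument.
    public static void ClusterColumns<Alg>(DoubleMatrix v, int k)
        where Alg : INMFUpdateAlgorithm, new()
    {
        var nmf = new NMFClustering<Alg>();

        // Settings borrowed from the worked example in the Remarks section.
        nmf.MaxFactorizationIterations = 2000;
        nmf.StoppingAdjacency = 40;
        nmf.ConvergenceCheckPeriod = 10;

        // Assumption: the Int32 argument is the number of clusters.
        nmf.Factor(v, k);

        Console.WriteLine("Converged:  {0}", nmf.Converged);
        Console.WriteLine("Iterations: {0}", nmf.Iterations);
        Console.WriteLine("Cost:       {0}", nmf.Cost);

        // Assumption: the indexer takes a zero-based column index and
        // v.Cols is the number of columns of v.
        var clusters = nmf.ClusterSet;
        for (int j = 0; j < v.Cols; j++)
        {
            Console.WriteLine("Column {0} -> cluster {1}", j, clusters[j]);
        }
    }
}
```

Checking Converged before using the clustering is advisable, since a run that stops at the iteration limit may not have stabilized.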
Remarks
The stopping criterion for the iterative process is the stabilization of the clustering of the columns of the matrix V. The clustering works as follows. At each stage of the factorization, we have an approximate factorization of V into the product of two matrices W and H:

  V ~ WH

Thus each column v_j of V is expressed as a linear combination of the columns w_i of W, with coefficients given by the corresponding column h_j of H:

  v_j ~ sum_i h_ij * w_i

Each column v_j of V is placed into the cluster corresponding to the column w_i of W that has the largest coefficient h_ij. That is, column v_j of V is placed in cluster i if the entry h_ij is the largest entry in column h_j of H. The result of the clustering is returned as an adjacency matrix whose (i, j)th value is 1 if columns i and j of V are in the same cluster, and 0 if they are not.
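
The assignment rule described above can be sketched in plain C# without any NMath types. This is only an illustration of the rule, not the library's implementation: the class and method names are hypothetical, H is represented as a double[,] with one row per cluster, the values in Main are made up, and ties are broken in favor of the lowest row index, which the source does not specify.

```csharp
using System;

public static class ClusterAssignmentSketch
{
    // For each column j of H, returns the row index i of its largest entry.
    // Ties go to the lowest row index (an assumption of this sketch).
    public static int[] AssignClusters(double[,] h)
    {
        int k = h.GetLength(0);   // number of clusters (rows of H)
        int n = h.GetLength(1);   // number of columns of V
        var cluster = new int[n];
        for (int j = 0; j < n; j++)
        {
            int best = 0;
            for (int i = 1; i < k; i++)
            {
                if (h[i, j] > h[best, j]) best = i;
            }
            cluster[j] = best;
        }
        return cluster;
    }

    // Builds the symmetric 0/1 adjacency matrix: entry (i, j) is 1 when
    // columns i and j were assigned to the same cluster.
    public static int[,] Adjacency(int[] cluster)
    {
        int n = cluster.Length;
        var a = new int[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                a[i, j] = cluster[i] == cluster[j] ? 1 : 0;
        return a;
    }

    public static void Main()
    {
        // H for 2 clusters and 4 columns of V (made-up values).
        double[,] h =
        {
            { 0.9, 0.2, 0.7, 0.1 },
            { 0.1, 0.8, 0.3, 0.6 }
        };
        int[] cluster = AssignClusters(h);
        Console.WriteLine(string.Join(" ", cluster));  // prints "0 1 0 1"

        int[,] a = Adjacency(cluster);
        Console.WriteLine(a[0, 2]);  // prints 1: columns 0 and 2 share a cluster
        Console.WriteLine(a[0, 1]);  // prints 0: columns 0 and 1 do not
    }
}
```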

The iteration stops when the adjacency matrix is unchanged for a certain number of iterations, that is, when it has stabilized. Three parameters control the iteration: the maximum number of iterations to perform; the stopping adjacency, which is the number of consecutive times the adjacency matrix must remain unchanged before it is considered stabilized; and the convergence check period, which is the number of iterations between adjacency matrix checks. The last parameter is necessary because computing the adjacency matrix can be a somewhat expensive operation that one may not wish to perform at every iteration, but only at every nth iteration. For example, running an NMFClustering instance with maximum iterations = 2000, stopping adjacency = 40, and convergence check period = 10 will create an adjacency matrix every 10 iterations and check it against the previous one. If they are the same, a count is incremented. The iteration stops when 40 consecutive unchanged adjacency matrices have been observed or when 2000 iterations have been performed, whichever comes first.

See the paper "Metagenes and Molecular Pattern Discovery Using Matrix Factorization" by Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, and Jill P. Mesirov.
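
The stopping rule can be sketched as a loop in plain C#. This is a hedged illustration of the rule as described in these remarks, not NMath's implementation: the names StoppingRuleSketch, Run, and SameAdjacency are hypothetical, the update step and the clustering are supplied by the caller as delegates, and the first adjacency matrix observed is assumed to start the count at zero, a detail the source does not state.

```csharp
using System;

public static class StoppingRuleSketch
{
    // True when two cluster assignments induce the same adjacency matrix,
    // that is, the same pairs of columns share a cluster in both.
    static bool SameAdjacency(int[] a, int[] b)
    {
        for (int i = 0; i < a.Length; i++)
            for (int j = i + 1; j < a.Length; j++)
                if ((a[i] == a[j]) != (b[i] == b[j])) return false;
        return true;
    }

    // Calls step() up to maxIterations times. Every checkPeriod iterations
    // the clustering is compared with the one from the previous check, and
    // the loop ends after stoppingAdjacency consecutive unchanged results.
    // Returns the number of iterations performed.
    public static int Run(Action step, Func<int[]> currentClusters,
        int maxIterations, int stoppingAdjacency, int checkPeriod,
        out bool converged)
    {
        int[] previous = null;
        int unchanged = 0;
        converged = false;

        for (int iter = 1; iter <= maxIterations; iter++)
        {
            step();
            if (iter % checkPeriod != 0) continue;

            int[] current = currentClusters();
            if (previous != null && SameAdjacency(current, previous))
                unchanged++;
            else
                unchanged = 0;   // any change resets the streak
            previous = current;

            if (unchanged >= stoppingAdjacency)
            {
                converged = true;
                return iter;
            }
        }
        return maxIterations;
    }

    public static void Main()
    {
        // Simulated process: the clustering keeps changing until
        // iteration 25 and is stable afterwards.
        int t = 0;
        Action step = () => t++;
        Func<int[]> clusters = () =>
            t < 25 ? (t % 20 == 0 ? new[] { 0, 0, 1 } : new[] { 0, 1, 1 })
                   : new[] { 0, 1, 0 };

        bool converged;
        int iterations = Run(step, clusters, 2000, 40, 10, out converged);
        Console.WriteLine("{0} {1}", iterations, converged);  // prints "430 True"
    }
}
```

In this simulation the stable clustering is first seen at iteration 30, so the 40th consecutive unchanged check falls at iteration 430, well before the 2000-iteration limit. With a check period of 10 and a stopping adjacency of 40, at least 400 stable iterations are needed before convergence can be declared.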
