NMFClustering<Alg> Class
Class NMFClustering performs a Non-negative Matrix Factorization (NMF) of
a given matrix.
Inheritance Hierarchy
Namespace: CenterSpace.NMath.Core
Assembly: NMath (in NMath.dll) Version: 7.4
Syntax
C#:
public class NMFClustering<Alg>
where Alg : new(), INMFUpdateAlgorithm
VB:
Public Class NMFClustering(Of Alg As {New, INMFUpdateAlgorithm})
C++:
generic<typename Alg>
where Alg : gcnew(), INMFUpdateAlgorithm
public ref class NMFClustering
F#:
type NMFClustering<'Alg when 'Alg : new() and INMFUpdateAlgorithm> = class end
Type Parameters
- Alg: The NMF update algorithm to use. Must implement the interface
INMFUpdateAlgorithm.
The NMFClustering<Alg> type exposes the following members.
Constructors
| Name | Description |
|---|---|
| NMFClustering<Alg>() | Constructs an NMFClustering instance with default values for maximum iterations, stopping adjacency, and convergence check period. |
| NMFClustering<Alg>(Alg) | Constructs an NMFClustering instance with the given iteration algorithm and default values for maximum iterations, stopping adjacency, and convergence check period. |
| NMFClustering<Alg>(Int32, Int32) | Constructs an NMFClustering instance with the given parameters and the default convergence check period. |
| NMFClustering<Alg>(Int32, Int32, Int32) | Constructs an NMFClustering instance with the given parameters. |
| NMFClustering<Alg>(Int32, Int32, Int32, Alg) | Constructs an NMFClustering instance with the given parameters. |
Properties
| Name | Description |
|---|---|
| ClusterSet | Gets a ClusterSet object that identifies the cluster into which each object was grouped. The indexing operator on the returned ClusterSet returns the number of the cluster into which the ith object was grouped. |
| Connectivity | Gets the adjacency matrix: a symmetric matrix whose (i, j)th entry is 1 if columns i and j of the factored matrix are in the same cluster, and 0 if they are not. |
| Converged | Returns true if the factorization algorithm converged. Returns false if the maximum number of iterations was exceeded before convergence was achieved. |
| ConvergenceCheckPeriod | Gets or sets the convergence check period. Convergence is checked every ConvergenceCheckPeriod iterations. |
| Cost | Gets the value of the cost function for the factorization. The cost function is the function minimized by the NMF update algorithm in use. |
| H | Gets the matrix H in the NMF factorization of V, V ~ WH. |
| Iterations | Gets the total number of iterations performed in the most recent calculation. |
| MaxFactorizationIterations | Gets or sets the maximum number of iterations to perform. |
| StoppingAdjacency | Gets or sets the stopping adjacency: the number of consecutive unchanged adjacency (connectivity) matrices that must be observed before the factorization is considered converged. |
| Updater | Gets or sets the NMF iteration algorithm to use. |
| W | Gets the matrix W in the NMF factorization of V, V ~ WH. |
Methods
Fields
Remarks
The stopping criterion for the iterative process is the
stabilization of the clustering of the columns of the matrix V. The clustering
works like this:
At each stage of the factorization, we have an approximate factorization of
V into the product of two matrices W and H,
V ~ WH
Thus each column v_j of V is expressed as a linear combination of the columns
w_i of W, with coefficients given in the corresponding column h_j of H:
v_j = sum_i (h_ij * w_i)
Each column v_j of V is placed into the cluster corresponding to the column w_i of W
that has the largest coefficient h_ij. That is, column v_j of V is placed in cluster
i if the entry h_ij is the largest entry in column h_j of H.
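The assignment rule above amounts to taking, for each column of H, the index of the row holding its largest entry. The following is a minimal, library-independent sketch of that rule on a plain 2-D array. The class name `NmfClusterSketch`, the method `AssignClusters`, the sample matrix, and the tie-breaking choice (lowest index wins) are all assumptions made for illustration; none of them is part of the NMath API.

```csharp
using System;

public static class NmfClusterSketch
{
    // For each column j of h, returns the row index i with the largest entry h[i, j].
    // Illustrative helper only; not an NMath method.
    public static int[] AssignClusters(double[,] h)
    {
        int k = h.GetLength(0);   // rows of H = number of clusters
        int n = h.GetLength(1);   // columns of H = columns of V being clustered
        var clusters = new int[n];
        for (int j = 0; j < n; j++)
        {
            int best = 0;
            for (int i = 1; i < k; i++)
            {
                // Strict comparison: on a tie the lowest index is kept (an assumption).
                if (h[i, j] > h[best, j]) best = i;
            }
            clusters[j] = best;
        }
        return clusters;
    }

    public static void Main()
    {
        // Hypothetical 2 x 4 coefficient matrix H (2 clusters, 4 columns of V).
        double[,] h =
        {
            { 0.9, 0.2, 0.7, 0.1 },
            { 0.1, 0.8, 0.3, 0.6 },
        };
        Console.WriteLine(string.Join(", ", AssignClusters(h)));   // prints "0, 1, 0, 1"
    }
}
```

In practice the ClusterSet property listed above exposes the library's own result of this grouping; the sketch only shows the arithmetic behind it.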
The result of the clustering is returned as an adjacency matrix whose (i, j)th
entry is 1 if columns i and j of V are in the same cluster, and 0 if they are
not.
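As an illustration of that definition, an adjacency matrix can be derived from per-column cluster labels by comparing every pair of labels. The sketch below is self-contained and does not use NMath; `AdjacencySketch`, `BuildAdjacency`, and the sample labels are hypothetical names and data chosen for the example.

```csharp
using System;

public static class AdjacencySketch
{
    // Entry (i, j) is 1 when columns i and j carry the same cluster label, else 0.
    // Illustrative helper only; not an NMath method.
    public static int[,] BuildAdjacency(int[] clusters)
    {
        int n = clusters.Length;
        var a = new int[n, n];
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
            {
                a[i, j] = clusters[i] == clusters[j] ? 1 : 0;
            }
        }
        return a;
    }

    public static void Main()
    {
        // Hypothetical labels for four columns of V.
        int[] clusters = { 0, 1, 0, 1 };
        int[,] a = BuildAdjacency(clusters);
        for (int i = 0; i < a.GetLength(0); i++)
        {
            var row = new int[a.GetLength(1)];
            for (int j = 0; j < row.Length; j++) row[j] = a[i, j];
            Console.WriteLine(string.Join(" ", row));
        }
        // Prints:
        // 1 0 1 0
        // 0 1 0 1
        // 1 0 1 0
        // 0 1 0 1
    }
}
```

The matrix is symmetric with ones on the diagonal, since every column shares a cluster with itself.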
The iteration stops when the adjacency matrix has stabilized, that is, when it
remains unchanged for a certain number of consecutive checks. Three
parameters control the iteration: the maximum number of iterations to
perform; the stopping adjacency, which is the number of consecutive times
the adjacency matrix must remain unchanged before it is considered stabilized;
and the convergence check period, which is how often the adjacency matrix is
generated and checked. The last parameter is necessary because computing the
adjacency matrix can be a somewhat expensive operation that one may not wish
to perform at every iteration, but only at every nth iteration.
For example, running an NMFClustering instance with maximum iterations = 2000,
stopping adjacency = 40, and convergence check period = 10 will create
an adjacency matrix every 10 iterations and check it against the previous
one. If they are the same, a count is incremented. The iteration stops
when 40 consecutive unchanged adjacency matrices have been observed or when
2000 iterations have been performed, whichever comes first.
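The stopping rule in this example can be sketched as a loop that counts consecutive unchanged checks. This is only a reading of the description above, not NMath's implementation: `StoppingRuleSketch`, `Run`, and the `adjacencyAt` callback (which stands in for "compute the adjacency matrix from the current factors") are hypothetical, and the sketch assumes that the count resets to zero whenever the matrix changes and that the first check has nothing to compare against.

```csharp
using System;

public static class StoppingRuleSketch
{
    // True when both matrices have the same shape and identical entries.
    static bool SameMatrix(int[,] a, int[,] b)
    {
        if (a.GetLength(0) != b.GetLength(0) || a.GetLength(1) != b.GetLength(1)) return false;
        for (int i = 0; i < a.GetLength(0); i++)
            for (int j = 0; j < a.GetLength(1); j++)
                if (a[i, j] != b[i, j]) return false;
        return true;
    }

    // adjacencyAt(iter) is a hypothetical stand-in for building the adjacency
    // matrix from the factors available after iteration iter.
    public static (int Iterations, bool Converged) Run(
        int maxIterations, int stoppingAdjacency, int checkPeriod,
        Func<int, int[,]> adjacencyAt)
    {
        int[,] previous = null;
        int unchanged = 0;
        for (int iter = 1; iter <= maxIterations; iter++)
        {
            // One NMF update step would run here.
            if (iter % checkPeriod != 0) continue;   // only check every checkPeriod iterations

            int[,] current = adjacencyAt(iter);
            bool same = previous != null && SameMatrix(previous, current);
            unchanged = same ? unchanged + 1 : 0;    // assumed: any change resets the streak
            previous = current;
            if (unchanged >= stoppingAdjacency) return (iter, true);
        }
        return (maxIterations, false);
    }

    public static void Main()
    {
        // Hypothetical clustering of two columns: they alternate between joined and
        // separate at each check before iteration 100, then stay separate.
        Func<int, int[,]> adjacencyAt = iter =>
        {
            int link = iter < 100 ? (iter / 10) % 2 : 0;
            return new int[,] { { 1, link }, { link, 1 } };
        };
        var result = Run(2000, 40, 10, adjacencyAt);
        Console.WriteLine($"{result.Iterations} {result.Converged}");   // prints "500 True"
    }
}
```

With these settings the streak starts at the check on iteration 110 and reaches 40 at iteration 500, well before the 2000-iteration limit. A larger check period lowers the cost of the checks but delays the point at which stabilization can be detected.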
See the paper "Metagenes and Molecular Pattern Discovery Using Matrix Factorization"
by Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, and Jill P. Mesirov.
See Also