Blog

Posts Tagged ‘NMF’

Cluster Analysis, Part V: Monte Carlo NMF

Monday, January 11th, 2010

In this continuing series, we explore the NMath Stats functions for performing cluster analysis. (For previous posts, see Part 1 – PCA , Part 2 – K-Means, Part 3 – Hierarchical, and Part 4 – NMF.) The sample data set we’re using classifies 89 single malt scotch whiskies on a five-point scale (0-4) for 12 flavor characteristics. To visualize the data set and clusterings, we make use of the free Microsoft Chart Controls for .NET, which provide a basic set of charts.

In this post, the last in the series, we’ll look at how NMath provides a Monte Carlo method for performing multiple non-negative matrix factorization (NMF) clusterings using different random starting conditions, and combining the results.

NMF uses an iterative algorithm with random starting values for W and H. This, coupled with the fact that the factorization is not unique, means that if you cluster the columns of V multiple times, you may get different final clusterings. The consensus matrix is a way to average multiple clusterings, to produce a probability estimate that any pair of columns will be clustered together.
To compute the consensus matrix, the columns of V are clustered using NMF n times. Each clustering yields a connectivity matrix. Recall that the connectivity matrix is a symmetric matrix whose i, jth entry is 1 if columns i and j of V are clustered together, and 0 if they are not. The consensus matrix is also a symmetric matrix, whose i, jth entry is formed by taking the average of the i, jth entries of the n connectivity matrices.
Thus, each i, jth entry of the consensus matrix is a value between 0, when columns i and j are not clustered together on any of the runs, and 1, when columns i and j were clustered together on all runs. The i, jth entry of a consensus matrix may be considered, in some sense, a “probability” that columns i and j belong to the same cluster.

NMF uses an iterative algorithm with random starting values for W and H. (See Part IV for more information on NMF.) This, coupled with the fact that the factorization is not unique, means that if you cluster the columns of V multiple times, you may get different final clusterings. The consensus matrix is a way to average multiple clusterings, to produce a probability estimate that any pair of columns will be clustered together.
(more…)

Share

Cluster Analysis, Part IV: Non-negative Matrix Factorization (NMF)

Wednesday, January 6th, 2010

In this continuing series, we explore the NMath Stats functions for performing cluster analysis. (For previous posts, see Part 1 – PCA , Part 2 – K-Means, and Part 3 – Hierarchical.) The sample data set we’re using classifies 89 single malt scotch whiskies on a five-point scale (0-4) for 12 flavor characteristics. To visualize the data set and clusterings, we make use of the free Microsoft Chart Controls for .NET, which provide a basic set of charts.

In this post, we’ll cluster the scotches using non-negative matrix factorization (NMF). NMF approximately factors a matrix V into two matrices, W and H:

wh

If V in an n x m matrix, then NMF can be used to approximately factor V into an n x r matrix W and an r x m matrix H. Usually r is chosen to be much smaller than either m or n, for dimension reduction. Thus, each column of V is approximated by a linear combination of the columns of W, with the coefficients being the corresponding column H. This extracts underlying features of the data as basis vectors in W, which can then be used for identification, clustering, and compression.
(more…)

Share

Non-negative Matrix Factorization in NMath, Part 1

Friday, January 9th, 2009

A couple of years ago, we were asked by a customer to provide an implementation of an algorithm called Non-negative Matrix Factorization (NMF). We did a basic implementation, which we later included in our NMath Stats library. I kind of forgot about it until we recently heard from a prospective NMath customer who wanted to use NMF for grouping, or clustering. Talking with this customer rekindled my interest in NMF and we decided to provide additional functionality built on the existing NMF code to facilitate using the NMF for clustering.

This entry will proceed in three parts. The first will give a brief introduction to NMF and its uses, the second will briefly cover how to compute the factorization, and the third will cover how NMF can be used for clustering.

The Non-negative Matrix Factorization

Given a non-negative m-row by n-column matrix A, a non-negative matrix factorization of A is a non-negative n-row by k-column matrix W and a non-negative k-row by m-column matrix H whose product approximates the matrix A.

A ~ WH

The non-negativity of the elements of W and H are crucial, and are what make this problem a bit different. The entries of A usually represent some quantity for which negative numbers make no sense. For instance, the numbers, aij, in A might be counts of the ith term in the jth document, or the ith pixel value in the jth image.

So, why is this useful? Of course, it depends on the particular application, but the basic idea is dimension reduction. In general, NMF is used only when the matrix A is large. In the image pixel value example, where each column of the matrix A contains the pixel values of a particular image, the number of rows will be quite large, as may be the number of columns. When we do an NMF of A and make k much smaller than the number of rows or columns in A, the factorization yields a representation of each column of A as a linear combination of the k columns of W, with the coefficients coming from H.

For example, suppose I have 300 facial images (pictures of people’s faces). Each image is encoded as 50,000 pixel values. I arrange these into a 50,000 x 300 matrix A. 50,000 is a fairly large number, and if I am looking at each column of A as a vector, it’s a vector with 50,000 coordinates. Let’s do a NMF on A with k = 7. Now, each image (column in A) can be approximated by a linear combination of these 7 basis images. If the approximation is good, these 7 basis images, which are the columns of W, must represent a good chunk of the information in the original 300 images, and we have reduced the dimension of the space we are working in from 50,000 down to 7. Indeed, in this particular application it was found that the columns of W represented facial characteristics

Share