<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>NMF Archives - CenterSpace</title>
	<atom:link href="https://www.centerspace.net/tag/nmf/feed" rel="self" type="application/rss+xml" />
	<link>https://www.centerspace.net/tag/nmf</link>
	<description>.NET numerical class libraries</description>
	<lastBuildDate>Tue, 07 Feb 2023 21:39:29 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.1.1</generator>
<site xmlns="com-wordpress:feed-additions:1">104092929</site>	<item>
		<title>Cluster Analysis, Part V: Monte Carlo NMF</title>
		<link>https://www.centerspace.net/clustering-analysis-part-v-monte-carlo-nmf</link>
					<comments>https://www.centerspace.net/clustering-analysis-part-v-monte-carlo-nmf#respond</comments>
		
		<dc:creator><![CDATA[Ken Baldwin]]></dc:creator>
		<pubDate>Mon, 11 Jan 2010 03:59:51 +0000</pubDate>
				<category><![CDATA[NMath Stats Tutorial]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[clustering .NET]]></category>
		<category><![CDATA[clustering C#]]></category>
		<category><![CDATA[NMF]]></category>
		<category><![CDATA[NMF .NET]]></category>
		<category><![CDATA[NMF C#]]></category>
		<category><![CDATA[nonnegative matrix factorization]]></category>
		<category><![CDATA[nonnegative matrix factorization .NET]]></category>
		<category><![CDATA[nonnegative matrix factorization C#]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=1031</guid>

					<description><![CDATA[<p>In this continuing series, we explore the NMath Stats functions for performing cluster analysis. (For previous posts, see Part 1 &#8211; PCA , Part 2 &#8211; K-Means, Part 3 &#8211; Hierarchical, and Part 4 &#8211; NMF.) The sample data set we&#8217;re using classifies 89 single malt scotch whiskies on a five-point scale (0-4) for 12 [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/clustering-analysis-part-v-monte-carlo-nmf">Cluster Analysis, Part V: Monte Carlo NMF</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>In this continuing series, we explore the NMath Stats functions for performing cluster analysis. (For previous posts, see <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part 1 &#8211; PCA </a>, <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part 2 &#8211; K-Means</a>, <a href=" https://www.centerspace.net/drawing-dendrograms/">Part 3 &#8211; Hierarchical</a>, and <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part 4 &#8211; NMF</a>.) The sample data set we&#8217;re using classifies 89 single malt scotch whiskies on a five-point scale (0-4) for 12 flavor characteristics. To visualize the data set and clusterings, we make use of the free <a href="https://www.nuget.org/packages/Microsoft.Chart.Controls/">Microsoft Chart Controls for .NET</a>, which provide a basic set of charts.</p>
<p>In this post, the last in the series, we&#8217;ll look at how NMath provides a Monte Carlo method for performing multiple non-negative matrix factorization (NMF) clusterings using different random starting conditions, and combining the results.</p>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">NMF uses an iterative algorithm with random starting values for W and H. This, coupled with the fact that the factorization is not unique, means that if you cluster the columns of V multiple times, you may get different final clusterings. The consensus matrix is a way to average multiple clusterings, to produce a probability estimate that any pair of columns will be clustered together.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">To compute the consensus matrix, the columns of V are clustered using NMF n times. Each clustering yields a connectivity matrix. Recall that the connectivity matrix is a symmetric matrix whose i, jth entry is 1 if columns i and j of V are clustered together, and 0 if they are not. The consensus matrix is also a symmetric matrix, whose i, jth entry is formed by taking the average of the i, jth entries of the n connectivity matrices.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Thus, each i, jth entry of the consensus matrix is a value between 0, when columns i and j are not clustered together on any of the runs, and 1, when columns i and j were clustered together on all runs. The i, jth entry of a consensus matrix may be considered, in some sense, a &#8220;probability&#8221; that columns i and j belong to the same cluster.</div>
<p>NMF uses an iterative algorithm with random starting values for <em>W</em> and <em>H</em>. (See <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part IV</a> for more information on NMF.) This, coupled with the fact that the factorization is not unique, means that if you cluster the columns of <em>V</em> multiple times, you may get different final clusterings. The <em>consensus matrix</em> is a way to average multiple clusterings, to produce a probability estimate that any pair of columns will be clustered together.<br />
<span id="more-1031"></span><br />
To compute the consensus matrix, the columns of V are clustered using NMF <em>n</em> times. Each clustering yields a connectivity matrix. Recall that the connectivity matrix is a symmetric matrix whose <em>i</em>, <em>j</em>th entry is 1 if columns <em>i</em> and <em>j</em> of <em>V</em> are clustered together, and 0 if they are not. The consensus matrix is also a symmetric matrix, whose <em>i</em>, <em>j</em>th entry is formed by taking the average of the <em>i</em>, <em>j</em>th entries of the <em>n</em> connectivity matrices. The <em>i</em>, <em>j</em>th entry of a consensus matrix may be considered a &#8220;probability&#8221; that columns <em>i</em> and <em>j</em> belong to the same cluster.</p>
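<p>To make this averaging concrete, here is a small plain-Python sketch (illustrative only, not NMath code; the cluster labels below are made up, and NMath Stats performs these steps internally):</p>
<pre lang="python"># Sketch: average the connectivity matrices of several clustering runs
# into a consensus matrix. The run labels are hypothetical.

def connectivity(labels):
    """1 at (i, j) if items i and j share a cluster label, else 0."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]

def consensus(runs):
    """Element-wise average of the connectivity matrices of all runs."""
    n = len(runs[0])
    mats = [connectivity(labels) for labels in runs]
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]

# Three runs over five items; items 0 and 1 cluster together every time,
# items 1 and 2 only once.
runs = [[0, 0, 1, 1, 2],
        [0, 0, 0, 1, 2],
        [1, 1, 0, 0, 2]]
C = consensus(runs)
print(C[0][1])            # 1.0
print(round(C[1][2], 4))  # 0.3333
</pre>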
<p>NMath Stats provides class <a href="/doc/NMathSuite/ref/html/T_CenterSpace_NMath_Core_NMFConsensusMatrix_1.htm">NMFConsensusMatrix</a> for computing a consensus matrix. NMFConsensusMatrix is parameterized on the NMF update algorithm to use. Additional constructor parameters specify the matrix to factor, the order <em>k</em> of the NMF factorization (the number of columns in <em>W</em>), and the number of clustering runs. The consensus matrix is computed at construction time, so be aware that this may be an expensive operation.</p>
<p>For example, the following C# code creates a consensus matrix for 100 runs, clustering the scotch data (loaded into a dataframe in <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part I</a>) into four clusters:</p>
<pre class="code">int k = 4;
int numberOfRuns = 100;
NMFConsensusMatrix&lt;NMFDivergenceUpdate&gt; consensusMatrix =
  new NMFConsensusMatrix&lt;NMFDivergenceUpdate&gt;(
    data.ToDoubleMatrix().Transpose(),
    k,
    numberOfRuns);

Console.WriteLine("{0} runs out of {1} converged.",
  consensusMatrix.NumberOfConvergedRuns, numberOfRuns);</pre>
<p>The output is:</p>
<pre>100 runs out of 100 converged.</pre>
<p>NMFConsensusMatrix provides a standard indexer for getting the element value at a specified row and column in the consensus matrix. For instance, one of the goals of Young et al. was to identify single malts that are particularly good representatives of each cluster. This information could be used, for example, to purchase a representative sampling of scotches. As described in <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part IV</a>, they reported that these whiskies were the closest to each flavor profile:</p>
<ul>
<li>Glendronach and Macallan</li>
<li>Tomatin and Speyburn</li>
<li>AnCnoc and Miltonduff</li>
<li>Ardbeg and Clynelish</li>
</ul>
<p>The consensus matrix reveals, however, that the pairings are not equally strong:</p>
<pre lang="csharp">Console.WriteLine("Probability that Glendronach is clustered with Macallan = {0}",
  consensusMatrix[data.IndexOfKey("Glendronach"), data.IndexOfKey("Macallan")]);
Console.WriteLine("Probability that Tomatin is clustered with Speyburn = {0}",
  consensusMatrix[data.IndexOfKey("Tomatin"), data.IndexOfKey("Speyburn")]);
Console.WriteLine("Probability that AnCnoc is clustered with Miltonduff = {0}",
  consensusMatrix[data.IndexOfKey("AnCnoc"), data.IndexOfKey("Miltonduff")]);
Console.WriteLine("Probability that Ardbeg is clustered with Clynelish = {0}",
  consensusMatrix[data.IndexOfKey("Ardbeg"), data.IndexOfKey("Clynelish")]);</pre>
<p>The output is:</p>
<pre>Probability that Glendronach is clustered with Macallan = 1
Probability that Tomatin is clustered with Speyburn = 0.4
Probability that AnCnoc is clustered with Miltonduff = 0.86
Probability that Ardbeg is clustered with Clynelish = 1</pre>
<p>Thus, although Glendronach and Macallan are clustered together in all 100 runs, Tomatin and Speyburn are only clustered together 40% of the time.</p>
<p>A consensus matrix, <em>C</em>, can itself be used to cluster objects, by performing a hierarchical cluster analysis using the distance function:</p>
<p style="text-align: center;"><img decoding="async" loading="lazy" class="alignnone size-full wp-image-1051" title="nmf_distance_function" src="https://www.centerspace.net/blog/wp-content/uploads/2010/01/nmf_distance_function.gif" alt="nmf_distance_function" width="155" height="24" srcset="https://www.centerspace.net/wp-content/uploads/2010/01/nmf_distance_function.gif 155w, https://www.centerspace.net/wp-content/uploads/2010/01/nmf_distance_function-150x24.gif 150w" sizes="(max-width: 155px) 100vw, 155px" /></p>
<p>For example, this C# code creates a hierarchical cluster analysis using this distance function, then cuts the tree at the level of four clusters, printing out the cluster members:</p>
<pre lang="csharp">DoubleMatrix colNumbers = new DoubleMatrix(consensusMatrix.Order, 1, 0, 1);
string[] names = data.StringRowKeys;

Distance.Function distance =
  delegate(DoubleVector data1, DoubleVector data2)
  {
    int i = (int)data1[0];
    int j = (int)data2[0];
    return 1.0 - consensusMatrix[i, j];
  };

ClusterAnalysis ca = new ClusterAnalysis(colNumbers, distance, Linkage.WardFunction);

int k = 4;
ClusterSet cs = ca.CutTree(k);
for (int clusterNumber = 0; clusterNumber &lt; cs.NumberOfClusters; clusterNumber++)
{
  int[] members = cs.Cluster(clusterNumber);
  Console.Write("Objects in cluster {0}: ", clusterNumber);
  for (int i = 0; i &lt; members.Length; i++)
  {
    Console.Write("{0} ", names[members[i]]);
  }
  Console.WriteLine("\n");
}</pre>
<p>The output is:</p>
<pre>Objects in cluster 0:
Aberfeldy Auchroisk Balmenach Dailuaine Glendronach
Glendullan Glenfarclas Glenrothes Glenturret Macallan
Mortlach RoyalLochnagar Tomore 

Objects in cluster 1:
Aberlour ArranIsleOf Belvenie BenNevis Benriach Benromach
Bladnoch BlairAthol Bowmore Craigallechie Dalmore
Dalwhinnie Deanston GlenElgin GlenGarioch GlenKeith
GlenOrd Glenkinchie Glenlivet Glenlossie Inchgower
Knochando Linkwood OldFettercairn RoyalBrackla
Speyburn Teaninich Tomatin Tomintoul Tullibardine 

Objects in cluster 2:
AnCnoc Ardmore Auchentoshan Aultmore Benrinnes
Bunnahabhain Cardhu Craigganmore Dufftown Edradour
GlenGrant GlenMoray GlenSpey Glenallachie Glenfiddich
Glengoyne Glenmorangie Loch Lomond Longmorn
Mannochmore Miltonduff Scapa Speyside Strathisla
Strathmill Tamdhu Tamnavulin Tobermory 

Objects in cluster 3:
Ardbeg Balblair Bruichladdich Caol Ila Clynelish
GlenDeveronMacduff GlenScotia Highland Park
Isle of Jura Lagavulin Laphroig Oban OldPulteney
Springbank Talisker</pre>
<p>Once again using the cluster assignments to color the objects in the plane of the first two principal components, we can see the grouping represented by the consensus matrix (k=4).</p>
<p><img decoding="async" loading="lazy" class="alignnone size-full wp-image-1049" title="nmf2" src="https://www.centerspace.net/blog/wp-content/uploads/2010/01/nmf2.png" alt="nmf2" width="448" height="358" srcset="https://www.centerspace.net/wp-content/uploads/2010/01/nmf2.png 448w, https://www.centerspace.net/wp-content/uploads/2010/01/nmf2-300x239.png 300w" sizes="(max-width: 448px) 100vw, 448px" /></p>
<p>Well, this concludes our tour through the NMath clustering functionality. Techniques such as principal component analysis, <em>k</em>-means clustering, hierarchical cluster analysis, and non-negative matrix factorization can all be applied to data such as these to explore various clusterings. Choosing among these approaches is ultimately a matter of domain knowledge and performance requirements. Is it appropriate to cluster based on distance in the original space, or should dimension reduction be applied? If dimension reduction is used, are negative component parameters meaningful? Are there sufficient computational resources available to construct a complete hierarchical cluster tree, or should a <em>k</em>-means approach be used? If a hierarchical cluster tree is computed, what distance and linkage functions should be used? NMath provides a powerful, flexible set of clustering tools for data mining and data analysis.</p>
<p>Ken</p>
<p><strong>References</strong></p>
<p>Young, S. S., Fogel, P., and Hawkins, D. M. (unpublished manuscript). “Clustering Scotch Whiskies using Non-Negative Matrix Factorization”. Retrieved December 15, 2009 from <a href="http://www.niss.org/sites/default/files/ScotchWhisky.pdf">http://www.niss.org/sites/default/files/ScotchWhisky.pdf</a>.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/clustering-analysis-part-v-monte-carlo-nmf">Cluster Analysis, Part V: Monte Carlo NMF</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/clustering-analysis-part-v-monte-carlo-nmf/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1031</post-id>	</item>
		<item>
		<title>Cluster Analysis, Part IV: Non-negative Matrix Factorization (NMF)</title>
		<link>https://www.centerspace.net/clustering-analysis-part-iv-non-negative-matrix-factorization</link>
					<comments>https://www.centerspace.net/clustering-analysis-part-iv-non-negative-matrix-factorization#comments</comments>
		
		<dc:creator><![CDATA[Ken Baldwin]]></dc:creator>
		<pubDate>Wed, 06 Jan 2010 16:48:14 +0000</pubDate>
				<category><![CDATA[NMath Stats Tutorial]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[clustering .NET]]></category>
		<category><![CDATA[clustering C#]]></category>
		<category><![CDATA[NMF]]></category>
		<category><![CDATA[NMF .NET]]></category>
		<category><![CDATA[NMF C#]]></category>
		<category><![CDATA[nonnegative matrix factorization]]></category>
		<category><![CDATA[nonnegative matrix factorization .NET]]></category>
		<category><![CDATA[nonnegative matrix factorization C#]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=808</guid>

					<description><![CDATA[<p>In this continuing series, we explore the NMath Stats functions for performing cluster analysis. (For previous posts, see Part 1 &#8211; PCA , Part 2 &#8211; K-Means, and Part 3 &#8211; Hierarchical.) The sample data set we&#8217;re using classifies 89 single malt scotch whiskies on a five-point scale (0-4) for 12 flavor characteristics. To visualize [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/clustering-analysis-part-iv-non-negative-matrix-factorization">Cluster Analysis, Part IV: Non-negative Matrix Factorization (NMF)</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>In this continuing series, we explore the NMath Stats functions for performing cluster analysis. (For previous posts, see <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part 1 &#8211; PCA </a>, <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part 2 &#8211; K-Means</a>, and <a href=" https://www.centerspace.net/drawing-dendrograms/">Part 3 &#8211; Hierarchical</a>.) The sample data set we&#8217;re using classifies 89 single malt scotch whiskies on a five-point scale (0-4) for 12 flavor characteristics. To visualize the data set and clusterings, we make use of the free Microsoft Chart Controls for .NET, which provide a basic set of charts.</p>
<p>In this post, we&#8217;ll cluster the scotches using non-negative matrix factorization (NMF). NMF approximately factors a matrix <em>V</em> into two matrices, <em>W</em> and <em>H</em>:</p>
<p style="text-align: center;"><img decoding="async" loading="lazy" class="alignnone size-full wp-image-962" title="wh" src="https://www.centerspace.net/blog/wp-content/uploads/2010/01/wh.gif" alt="wh" width="60" height="20" /></p>
<p>If <em>V</em> is an <em>n</em> x <em>m</em> matrix, then NMF can be used to approximately factor <em>V</em> into an <em>n</em> x <em>r</em> matrix <em>W</em> and an <em>r</em> x <em>m</em> matrix <em>H</em>. Usually <em>r</em> is chosen to be much smaller than either <em>m</em> or <em>n</em>, for dimension reduction. 
Thus, each column of <em>V</em> is approximated by a linear combination of the columns of <em>W</em>, with the coefficients given by the corresponding column of <em>H</em>. This extracts underlying features of the data as basis vectors in <em>W</em>, which can then be used for identification, clustering, and compression.<br />
<span id="more-808"></span><br />
Earlier in this series, we used principal component analysis (PCA) as a means of dimension reduction for the purposes of visualizing the scotch data. NMF differs from PCA in two important respects:</p>
<ol>
<li>NMF enforces the constraint that the factors <em>W</em> and <em>H</em> must be non-negative; that is, all elements must be equal to or greater than zero. By not allowing negative entries in <em>W</em> and <em>H</em>, NMF enables a non-subtractive combination of the parts to form a whole, and in some contexts, more meaningful basis vectors. In the scotch data, for example, what would it mean for a scotch to have a negative value for a flavor characteristic?</li>
<li>NMF does not require the basis vectors to be orthogonal. If we are using NMF to extract meaningful underlying components of the data, there is no <em>a priori</em> reason to require the components to be orthogonal.</li>
</ol>
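<p>Before reproducing the published analysis, it may help to see the factorization in miniature. The following plain-Python sketch (toy numbers, not NMath code) shows how each column of <em>V</em> is rebuilt as a non-negative combination of the columns of <em>W</em>, weighted by the corresponding column of <em>H</em>:</p>
<pre lang="python"># Toy NMF reconstruction: each column of V is approximated by W's
# columns combined with the matching column of H (all entries >= 0).
W = [[1.0, 0.0],
     [0.5, 1.0],
     [0.0, 2.0]]   # n x r basis matrix (n=3, r=2)
H = [[2.0, 1.0],
     [1.0, 3.0]]   # r x m coefficient matrix (m=2)

def approx_column(W, H, j):
    """Combine W's columns using the weights in column j of H."""
    n, r = len(W), len(W[0])
    return [sum(W[i][k] * H[k][j] for k in range(r)) for i in range(n)]

print(approx_column(W, H, 0))  # [2.0, 2.0, 2.0]
print(approx_column(W, H, 1))  # [1.0, 3.5, 6.0]
</pre>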
<p>Let&#8217;s begin by reproducing the NMF analysis of the scotch data presented in Young <em>et al.</em> The authors performed NMF with <em>r</em>=4, to identify four major flavor factors in scotch whiskies, and then asked whether there are single malts that appear to be relatively pure embodiments of these four flavor profiles.</p>
<p>NMath Stats provides class <a href="https://www.centerspace.net/doc/NMath/ref/html/T_CenterSpace_NMath_Core_NMFClustering_1.htm">NMFClustering</a> for performing data clustering using iterative non-negative matrix factorization (NMF), where each iteration step produces a new <em>W</em> and <em>H</em>. At each iteration, each column of <em>V</em> is placed into the cluster corresponding to the column of <em>W</em> which has the largest coefficient in <em>H</em>. That is, column <em>j</em> of <em>V</em> is placed in cluster <em>i</em> if the entry <em>h<sub>ij</sub></em> is the largest entry in column <em>j</em> of <em>H</em>. Results are returned as an adjacency matrix whose <em>i</em>, <em>j</em>th value is 1 if columns <em>i</em> and <em>j</em> of <em>V</em> are in the same cluster, and 0 if they are not. Iteration stops when the clustering of the columns of <em>V</em> stabilizes.</p>
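<p>The assignment rule just described is easy to state in code. As a plain-Python sketch (hypothetical <em>H</em> values, not the NMath implementation):</p>
<pre lang="python"># Sketch: assign column j of V to the cluster whose row of H holds the
# largest coefficient in column j, then build the adjacency matrix.
H = [[0.9, 0.1, 0.0, 0.4],
     [0.1, 0.8, 0.1, 0.5],
     [0.0, 0.1, 0.9, 0.1]]   # r x m coefficients (toy values)

m = len(H[0])
labels = [max(range(len(H)), key=lambda i: H[i][j]) for j in range(m)]
print(labels)  # [0, 1, 2, 1]

adjacency = [[1 if labels[a] == labels[b] else 0 for b in range(m)]
             for a in range(m)]
print(adjacency[1][3])  # 1: columns 1 and 3 fall in the same cluster
</pre>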
<p>NMFClustering is parameterized on the NMF update algorithm to use. For instance:</p>
<pre lang="csharp">NMFClustering&lt;NMFDivergenceUpdate&gt; nmf =
  new NMFClustering&lt;NMFDivergenceUpdate&gt;();</pre>
<p>This specifies the <em>divergence update</em> algorithm, which minimizes a divergence functional related to the Poisson likelihood of generating <em>V</em> from <em>W</em> and <em>H</em>. (For more information, see Brunet, Jean-Philippe, <em>et al.</em>, 2004.)</p>
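<p>For readers curious what a divergence update does at each step, here is a rough plain-Python sketch of the standard Lee&#8211;Seung multiplicative rules that minimize this divergence (a toy illustration of the general technique, not NMath&#8217;s implementation):</p>
<pre lang="python"># One multiplicative divergence-update step: scale each entry of H and
# W by a ratio that decreases D(V || WH) while preserving non-negativity.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def divergence_step(V, W, H):
    n, r, m = len(V), len(W[0]), len(V[0])
    WH = matmul(W, H)
    for a in range(r):          # update H
        col_sum = sum(W[i][a] for i in range(n))
        for j in range(m):
            num = sum(W[i][a] * V[i][j] / WH[i][j] for i in range(n))
            H[a][j] *= num / col_sum
    WH = matmul(W, H)
    for a in range(r):          # update W
        row_sum = sum(H[a][j] for j in range(m))
        for i in range(n):
            num = sum(H[a][j] * V[i][j] / WH[i][j] for j in range(m))
            W[i][a] *= num / row_sum
    return W, H

V = [[1.0, 2.0], [2.0, 4.0]]    # rank-1, so r=1 can fit it exactly
W, H = [[0.5], [0.5]], [[1.0, 1.0]]
for _ in range(25):
    W, H = divergence_step(V, W, H)
WH = matmul(W, H)
print([[round(x, 6) for x in row] for row in WH])  # [[1.0, 2.0], [2.0, 4.0]]
</pre>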
<p>The Factor() method performs the actual iterative factorization. The following C# code clusters the scotch data (loaded into a dataframe in <a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">Part I</a>) into four clusters:</p>
<pre lang="csharp">int k = 4;

// specify starting conditions (optional)
int seed = 1973;
RandGenUniform rnd = new RandGenUniform(seed);
DoubleMatrix starting_W = new DoubleMatrix(data.Cols, k, rnd);
DoubleMatrix starting_H = new DoubleMatrix(k, data.Rows, rnd);

nmf.Factor(data.ToDoubleMatrix().Transpose(),
           k,
           starting_W,
           starting_H);
Console.WriteLine("Factorization converged in {0} iterations.\n",
                   nmf.Iterations);</pre>
<p>There are a couple things to note in this code:</p>
<ul>
<li>By default, NMFClustering uses random starting values for <em>W</em> and <em>H</em>. This, coupled with the fact that the factorization is not unique, means that if you cluster the columns of <em>V</em> multiple times, you may get different final clusterings. In order to reproduce the results in Young <em>et al.</em>, the code above specifies a particular random seed for the initial conditions.</li>
<li>The scotch data needs to be transposed before clustering, since NMFClustering requires each object to be clustered to be a column in the input matrix.</li>
</ul>
<p>The output is:</p>
<pre>Factorization converged in 530 iterations.</pre>
<p>We can examine the four flavor factors (columns of <em>W</em>) to see what linear combination of the original flavor characteristics each represents. The following code sorts each factor, normalized so that its largest value is 1.0, similar to the data shown in Table 1 of Young <em>et al</em>.:</p>
<pre lang="csharp">ReproduceTable1(nmf.W, data.ColumnHeaders);

private static void ReproduceTable1(DoubleMatrix W,
  object[] rowKeys)
{
  // normalize
  for (int i = 0; i &lt; W.Cols; i++)
  {
    W[Slice.All, i] /= NMathFunctions.MaxValue(W.Col(i));
  }

  // Create data frame to hold W
  string[] factorNames = GetFactorNames(W.Cols);
  DataFrame df_W = new DataFrame(W, factorNames);
  df_W.SetRowKeys(rowKeys);

  // Print out sorted columns
  for (int i = 0; i &lt; df_W.Cols; i++)
  {
    df_W.SortRows(new int[] { i },
                         new SortingType[] { SortingType.Descending });
    Console.WriteLine(df_W[Slice.All, new Slice(i, 1)]);
    Console.WriteLine();
  }
  Console.WriteLine();
}</pre>
<p>The output is:</p>
<pre>#	Factor 0
Fruity	1.0000
Floral	0.8681
Sweetness	0.8292
Malty	0.6568
Nutty	0.5855
Body	0.4295
Smoky	0.2805
Honey	0.2395
Spicy	0.0000
Winey	0.0000
Tobacco	0.0000
Medicinal	0.0000 

#	Factor 1
Winey	1.0000
Body	0.6951
Nutty	0.5078
Sweetness	0.4257
Honey	0.3517
Malty	0.3301
Fruity	0.2949
Smoky	0.2631
Spicy	0.0000
Floral	0.0000
Tobacco	0.0000
Medicinal	0.0000 

#	Factor 2
Spicy	1.0000
Honey	0.4885
Sweetness	0.4697
Floral	0.4301
Smoky	0.3508
Malty	0.3492
Body	0.3160
Fruity	0.0036
Nutty	0.0000
Winey	0.0000
Tobacco	0.0000
Medicinal	0.0000 

#	Factor 3
Medicinal	1.0000
Smoky	0.8816
Body	0.7873
Spicy	0.3936
Sweetness	0.3375
Malty	0.3069
Nutty	0.2983
Fruity	0.2441
Tobacco	0.2128
Floral	0.0000
Winey	0.0000
Honey	0.0000</pre>
<p>Thus:</p>
<ul>
<li>Factor 0 contains Fruity, Floral, and Sweetness flavors.</li>
<li>Factor 1 emphasizes the Winey flavor.</li>
<li>Factor 2 contains Spicy and Honey flavors.</li>
<li>Factor 3 contains Medicinal and Smoky flavors.</li>
</ul>
<p>The objects are placed into clusters corresponding to the column of <em>W</em> which has the largest coefficient in <em>H</em>. The following C# code prints out the contents of each cluster, ordered by largest coefficient, after normalizing each row of <em>H</em> to sum to 1.0:</p>
<pre lang="csharp">ReproduceTable2(nmf.H, data.RowKeys, nmf.ClusterSet);

private static void ReproduceTable2(DoubleMatrix H, object[] rowKeys, ClusterSet cs)
{
  // normalize
  for (int i = 0; i &lt; H.Rows; i++)
  {
    H[i, Slice.All] /= NMathFunctions.Sum(H.Row(i));
  }

  // Create data frame to hold H
  string[] factorNames = GetFactorNames(H.Rows);
  DataFrame df_H = new DataFrame(H.Transpose(), factorNames);
  df_H.SetRowKeys(rowKeys);

  // Print information on each cluster
  for (int clusterNumber = 0; clusterNumber &lt; cs.NumberOfClusters; clusterNumber++)
  {
    int[] members = cs.Cluster(clusterNumber);
    int factor = NMathFunctions.MaxIndex(H.Col(members[0]));
    Console.WriteLine("Cluster {0} ordered by {1}: ", clusterNumber, factorNames[factor]);

    DataFrame cluster = df_H[new Subset(members), Slice.All];
    cluster.SortRows(new int[] { factor }, new SortingType[] { SortingType.Descending });

    Console.WriteLine(cluster);
    Console.WriteLine();
  }
}</pre>
<p>The output is:</p>
<pre>Cluster 0 ordered by Factor 1:
#	        Factor 0	Factor 1	Factor 2	Factor 3
Glendronach	0.0000	0.0567	0.0075	0.0000
Macallan	0.0085	0.0469	0.0083	0.0000
Balmenach	0.0068	0.0395	0.0123	0.0000
Dailuaine	0.0070	0.0317	0.0164	0.0000
Mortlach	0.0060	0.0316	0.0240	0.0000
Tomore	        0.0000	0.0308	0.0000	0.0000
RoyalLochnagar	0.0104	0.0287	0.0164	0.0000
Glenrothes	0.0054	0.0280	0.0081	0.0000
Glenfarclas	0.0127	0.0279	0.0164	0.0000
Auchroisk	0.0103	0.0267	0.0099	0.0000
Aberfeldy	0.0125	0.0238	0.0117	0.0000
Strathisla	0.0162	0.0229	0.0151	0.0000
Glendullan	0.0140	0.0228	0.0102	0.0000
BlairAthol	0.0111	0.0211	0.0166	0.0000
Dalmore	        0.0088	0.0208	0.0114	0.0204
Ardmore	        0.0104	0.0182	0.0118	0.0000 

Cluster 1 ordered by Factor 2:
#	        Factor 0	Factor 1	Factor 2	Factor 3
Tomatin	        0.0000	0.0170	0.0306	0.0000
Aberlour	0.0136	0.0260	0.0282	0.0000
Belvenie	0.0087	0.0123	0.0262	0.0000
GlenGarioch	0.0079	0.0086	0.0252	0.0000
Speyburn	0.0115	0.0000	0.0244	0.0000
BenNevis	0.0202	0.0000	0.0242	0.0000
Bowmore	        0.0049	0.0109	0.0225	0.0186
Inchgower	0.0104	0.0000	0.0218	0.0118
Craigallechie	0.0131	0.0098	0.0216	0.0136
Tomintoul	0.0085	0.0083	0.0214	0.0000
Benriach	0.0150	0.0000	0.0214	0.0000
Glenlivet	0.0125	0.0176	0.0205	0.0000
Glenturret	0.0080	0.0228	0.0203	0.0000
Benromach	0.0132	0.0140	0.0198	0.0000
Glenkinchie	0.0112	0.0000	0.0190	0.0000
OldFettercairn	0.0068	0.0137	0.0182	0.0160
Knochando	0.0131	0.0133	0.0179	0.0000
GlenOrd	        0.0118	0.0128	0.0175	0.0000
Glenlossie	0.0143	0.0000	0.0167	0.0000
GlenDeveronMacduff	0.0000	0.0156	0.0158	0.0216
GlenKeith	0.0108	0.0146	0.0145	0.0000
ArranIsleOf	0.0073	0.0086	0.0127	0.0125
GlenSpey	0.0086	0.0091	0.0119	0.0000 

Cluster 2 ordered by Factor 0:
#	        Factor 0	Factor 1	Factor 2	Factor 3
AnCnoc	        0.0294	0.0000	0.0000	0.0000
Miltonduff	0.0242	0.0000	0.0000	0.0000
Aultmore	0.0242	0.0000	0.0000	0.0000
Longmorn	0.0214	0.0141	0.0089	0.0000
Cardhu	        0.0204	0.0000	0.0094	0.0000
Auchentoshan	0.0203	0.0000	0.0065	0.0000
Strathmill	0.0203	0.0000	0.0125	0.0000
Edradour	0.0195	0.0172	0.0092	0.0000
Tobermory	0.0190	0.0000	0.0000	0.0000
Glenfiddich	0.0190	0.0000	0.0000	0.0000
Tamnavulin	0.0189	0.0000	0.0148	0.0000
Dufftown	0.0189	0.0000	0.0000	0.0147
Craigganmore	0.0184	0.0000	0.0030	0.0254
Speyside	0.0182	0.0138	0.0000	0.0000
Glenallachie	0.0178	0.0000	0.0108	0.0000
Dalwhinnie	0.0174	0.0000	0.0172	0.0000
GlenMoray	0.0174	0.0079	0.0157	0.0000
Tamdhu	        0.0172	0.0124	0.0000	0.0000
Glengoyne	0.0170	0.0090	0.0065	0.0000
Benrinnes	0.0158	0.0196	0.0161	0.0000
GlenElgin	0.0155	0.0107	0.0133	0.0000
Bunnahabhain	0.0148	0.0075	0.0078	0.0110
Glenmorangie	0.0143	0.0000	0.0123	0.0166
Scapa	        0.0140	0.0128	0.0089	0.0127
Bladnoch	0.0137	0.0063	0.0088	0.0000
Linkwood	0.0129	0.0165	0.0092	0.0000
Mannochmore	0.0124	0.0126	0.0081	0.0000
GlenGrant	0.0122	0.0121	0.0000	0.0000
Deanston	0.0119	0.0151	0.0122	0.0000
Loch Lomond	0.0105	0.0000	0.0094	0.0130
Tullibardine	0.0099	0.0093	0.0098	0.0138 

Cluster 3 ordered by Factor 3:
#	        Factor 0	Factor 1	Factor 2	Factor 3
Ardbeg	        0.0000	0.0000	0.0000	0.0906
Clynelish	0.0001	0.0000	0.0000	0.0855
Lagavulin	0.0000	0.0138	0.0000	0.0740
Laphroig	0.0000	0.0082	0.0000	0.0731
Talisker	0.0030	0.0000	0.0129	0.0706
Caol Ila	0.0048	0.0000	0.0019	0.0694
Oban	        0.0067	0.0000	0.0008	0.0564
OldPulteney	0.0114	0.0073	0.0000	0.0429
Isle of Jura	0.0079	0.0000	0.0059	0.0352
Balblair	0.0125	0.0000	0.0074	0.0297
Springbank	0.0000	0.0142	0.0189	0.0282
RoyalBrackla	0.0122	0.0078	0.0135	0.0276
GlenScotia	0.0096	0.0144	0.0000	0.0275
Bruichladdich	0.0100	0.0098	0.0140	0.0249
Teaninich	0.0081	0.0000	0.0111	0.0216
Highland Park	0.0050	0.0145	0.0146	0.0211</pre>
<p>These data are very similar to those shown in Table 2 in Young <em>et al</em>. According to their analysis, the most representative malts in each cluster are:</p>
<ul>
<li>Glendronach and Macallan</li>
<li>Tomatin and Speyburn</li>
<li>AnCnoc and Miltonduff</li>
<li>Ardbeg and Clynelish</li>
</ul>
<p>As you can see, these scotches are at, or very near, the top of each ordered cluster in the output above.</p>
<p>Finally, it is interesting to view the clusters found by NMF in the same plane of the first two principal components that we have looked at previously.</p>
<p><img decoding="async" loading="lazy" class="alignnone size-full wp-image-977" title="nmf1" src="https://www.centerspace.net/blog/wp-content/uploads/2010/01/nmf1.png" alt="nmf1" width="355" height="356" srcset="https://www.centerspace.net/wp-content/uploads/2010/01/nmf1.png 355w, https://www.centerspace.net/wp-content/uploads/2010/01/nmf1-150x150.png 150w, https://www.centerspace.net/wp-content/uploads/2010/01/nmf1-299x300.png 299w" sizes="(max-width: 355px) 100vw, 355px" /></p>
<p>If you compare this plot to that produced by <em><a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">k</a></em><a href="/clustering-analysis-part-iv-non-negative-matrix-factorization/">-means clustering</a> or <a href=" https://www.centerspace.net/drawing-dendrograms/">hierarchical cluster analysis</a>, you can see how different the results are. We are no longer clustering based on &#8220;similarity&#8221; in the original 12-dimensional flavor space (of which this is a view). Instead, we&#8217;ve used a reduced set of synthetic dimensions which capture underlying features in the data.</p>
<p>In order to produce results similar to those of Young <em>et al</em>., we explicitly specified a random seed to the NMF process. With different seeds, somewhat different final clusterings can occur. In the final post in this series, we&#8217;ll look at how NMath provides a Monte Carlo method for performing multiple NMF clusterings using different random starting conditions, and combining the results.</p>
<p>Ken</p>
<h3>References</h3>
<p>Brunet, Jean-Philippe et al. (2004). &#8220;Metagenes and Molecular Pattern Discovery Using Matrix Factorization&#8221;, <em>Proceedings of the National Academy of Sciences</em> 101, no. 12 (March 23, 2004): 4164-4169.</p>
<p>Young, S.S., Fogel, P., Hawkins, D. M. (unpublished manuscript). “Clustering Scotch Whiskies using Non-Negative Matrix Factorization”. Retrieved December 15, 2009 from <a href="http://www.niss.org/sites/default/files/ScotchWhisky.pdf">http://www.niss.org/sites/default/files/ScotchWhisky.pdf</a>.</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/clustering-analysis-part-iv-non-negative-matrix-factorization">Cluster Analysis, Part IV: Non-negative Matrix Factorization (NMF)</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/clustering-analysis-part-iv-non-negative-matrix-factorization/feed</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">808</post-id>	</item>
		<item>
		<title>Non-negative Matrix Factorization in NMath, Part 1</title>
		<link>https://www.centerspace.net/non-negative-matrix-factorization-in-nmath-part-1</link>
					<comments>https://www.centerspace.net/non-negative-matrix-factorization-in-nmath-part-1#respond</comments>
		
		<dc:creator><![CDATA[Steve Sneller]]></dc:creator>
		<pubDate>Fri, 09 Jan 2009 22:48:55 +0000</pubDate>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[C# NMF]]></category>
		<category><![CDATA[NMF]]></category>
		<category><![CDATA[NMF clustering]]></category>
		<category><![CDATA[Non-negative matrix factorization]]></category>
		<guid isPermaLink="false">http://www.centerspace.net/blog/?p=61</guid>

					<description><![CDATA[<p>A couple of years ago, we were asked by a customer to provide an implementation of an algorithm called Non-negative Matrix Factorization (NMF). We did a basic implementation, which we later included in our NMath Stats library. I kind of forgot about it until we recently heard from a prospective NMath customer who wanted to [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://www.centerspace.net/non-negative-matrix-factorization-in-nmath-part-1">Non-negative Matrix Factorization in NMath, Part 1</a> appeared first on <a rel="nofollow" href="https://www.centerspace.net">CenterSpace</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>A couple of years ago, we were asked by a customer to provide an implementation of an algorithm called Non-negative Matrix Factorization (NMF). We did a basic implementation, which we later included in our NMath Stats library. I kind of forgot about it until we recently heard from a prospective NMath customer who wanted to use NMF for grouping, or clustering. Talking with this customer rekindled my interest in NMF, and we decided to provide additional functionality built on the existing NMF code to facilitate using NMF for clustering.</p>
<p>This entry will proceed in three parts. The first will give a brief introduction to NMF and its uses, the second will briefly cover how to compute the factorization, and the third will cover how NMF can be used for clustering.</p>
<p><strong>The Non-negative Matrix Factorization</strong></p>
<p>Given a non-negative <em>m</em>-row by <em>n</em>-column matrix A, a non-negative matrix factorization of A is a non-negative <em>m</em>-row by <em>k</em>-column matrix W and a non-negative <em>k</em>-row by <em>n</em>-column matrix H whose product approximates the matrix A.</p>
<p>A ~ WH</p>
<p>The non-negativity of the elements of W and H is crucial, and is what makes this problem a bit different. The entries of A usually represent some quantity for which negative numbers make no sense. For instance, the numbers, <em>a</em><sub><em>ij</em></sub>, in A might be counts of the <em>i</em><sup>th</sup> term in the <em>j</em><sup>th</sup> document, or the <em>i</em><sup>th</sup> pixel value in the <em>j</em><sup>th</sup> image.</p>
<p>So, why is this useful? Of course, it depends on the particular application, but the basic idea is <em>dimension reduction</em><span style="font-style: normal;">. In general, NMF is used only when the matrix A is large. In the image pixel value example, where each column of the matrix A contains the pixel values of a particular image, the number of rows will be quite large, as may be the number of columns. When we do an NMF of A and make <em>k</em> much smaller than the number of rows or columns in A, the factorization yields a representation of each column of A as a linear combination of the <em>k</em> columns of W, with the coefficients coming from H.</span></p>
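<p>The factorization described above can be sketched generically. The following is a minimal NumPy illustration using the multiplicative updates of Lee and Seung, not the NMath API; the matrix sizes and iteration count are arbitrary choices for the example.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-negative data matrix A: m rows, n columns.
m, n, k = 20, 12, 3
A = rng.random((m, n))

# Random non-negative starting factors: W is m x k, H is k x n.
W = rng.random((m, k))
H = rng.random((k, n))

# Lee-Seung multiplicative updates: each step keeps W and H
# non-negative while reducing the Frobenius error ||A - WH||.
eps = 1e-9  # guards against division by zero
for _ in range(500):
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)

# Relative error of the rank-k approximation A ~ WH.
err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(W.shape, H.shape)
```

<p>Because every update multiplies W and H by a non-negative ratio, both factors stay non-negative throughout, which is the key property distinguishing NMF from an unconstrained low-rank factorization such as the truncated SVD.</p>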
<p><span style="font-style: normal;">For example, suppose I have 300 facial images (pictures of people&#8217;s faces). Each image is encoded as 50,000 pixel values. I arrange these into a 50,000 x 300 matrix A. 50,000 is a fairly large number, and if I am looking at each column of A as a vector, it&#8217;s a vector with 50,000 coordinates. Let&#8217;s do an NMF on A with <em>k</em> = 7. Now, each image (column in A) can be approximated by a linear combination of these 7 </span><em>basis images</em><span style="font-style: normal;">. If the approximation is good, these 7 basis images, which are the columns of W, must represent a good chunk of the information in the original 300 images, and we have reduced the dimension of the space we are working in from 50,000 down to 7. Indeed, in this particular application it was found that the columns of W represented facial characteristics.</span></p>
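<p>Concretely, &#8220;a linear combination of these 7 basis images&#8221; means that column <em>j</em> of the approximation is the columns of W weighted by column <em>j</em> of H. Here is a small NumPy sketch with made-up sizes (much smaller than the 50,000 x 300 of the example, and not the NMath API):</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend W (basis images) and H (coefficients) came out of an NMF
# of a pixels-by-images matrix; toy sizes for illustration only.
pixels, images, k = 100, 30, 7
W = rng.random((pixels, k))   # columns of W are the k basis images
H = rng.random((k, images))   # column j of H mixes them for image j

# Reconstruct image j as an explicit linear combination of basis images.
j = 5
col = sum(H[i, j] * W[:, i] for i in range(k))

# This is exactly column j of the matrix product W @ H.
assert np.allclose(col, (W @ H)[:, j])
```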
]]></content:encoded>
					
					<wfw:commentRss>https://www.centerspace.net/non-negative-matrix-factorization-in-nmath-part-1/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">61</post-id>	</item>
	</channel>
</rss>
