Posts Tagged ‘clustering C#’

Clustering Analysis, Part I: Principal Component Analysis (PCA)

Tuesday, December 15th, 2009
Cluster analysis is the assignment of a set of objects into one or more clusters based on object similarity. NMath Stats includes a variety of techniques for performing cluster analysis, which we will explore in a series of posts.
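To make "object similarity" a little more concrete, here is a minimal sketch that treats each object as a vector of numeric scores and uses Euclidean distance as the dissimilarity measure. It is written in plain Python rather than with NMath Stats, the three score vectors are invented for illustration, and Euclidean distance is only one of many possible choices.

```python
import math

# Hypothetical objects, each described by four numeric scores.
# These values are invented for illustration only.
object_a = [3, 2, 1, 0]
object_b = [4, 1, 4, 4]
object_c = [3, 3, 1, 0]


def euclidean_distance(x, y):
    """Return the Euclidean distance between two equal-length score vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# A smaller distance means the two objects are more similar.
d_ab = euclidean_distance(object_a, object_b)
d_ac = euclidean_distance(object_a, object_c)

print(f"distance(a, b) = {d_ab:.3f}")
print(f"distance(a, c) = {d_ac:.3f}")
```

Under this measure, object_a is closer to object_c than to object_b, so a distance-based clustering would tend to group a and c together before either is grouped with b.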

The Data Set

The data set we’ll use was created by David Wishart (2002), who scored 89 single malt Scotch whiskies on a five-point scale (0-4) for each of 12 flavor characteristics: Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, and Floral. Wishart provides clusterings of the whiskies into 4, 6, and 10 clusters. Young et al. (unpublished manuscript) demonstrate a further clustering into 4 clusters using non-negative matrix factorization (NMF). Both the Young et al. paper and the original data set are available here. (more…)