Posts Tagged ‘dendrograms C#’

Drawing Dendrograms

Wednesday, February 3rd, 2010

A dendrogram is a tree diagram often used to visualize the results of hierarchical clustering. A customer recently contacted us asking for help drawing dendrograms from the output of the hierarchical clustering algorithm in NMath Stats. (For more information in hierarchical clustering in NMath Stats, see this post.)

UserZoom is an international software company specializing in web customer experience and usability testing. UserZoom software includes a tool to run online card sorting exercises with real users. Card Sorting is a technique that information architects and user experience practitioners use to explore how “real people” group items and content. UserZoom developers were using NMath hierarchical clustering to analyze card sorting results, and wanted to generate dendrograms to help researchers understand participant’s grouping criteria.

In NMath Stats, class ClusterAnalysis performs hierarchical cluster analyses. For example, the following C# code clusters 6 random vectors of length 3:

int seed = 123;
DoubleMatrix data =
  new DoubleMatrix(6, 3, new RandGenUniform(1, 10, seed));
ClusterAnalysis ca = new ClusterAnalysis(data);

The random seed is specified so we can get repeatable results.

After construction, the Linkages property gets an (n-1) x 3 matrix containing the complete hierarchical linkage tree. At each level in the tree, columns 1 and 2 contain the indices of the clusters linked to form the next cluster. Column 3 contains the distances between the clusters.


The output is:

3	4	2.11126489430827
1	5	2.50743075649221
2	6	2.59997959766694
7	8	3.61586096606758
0	9	4.37499686772813

Each object is initially assigned to its own singleton cluster, numbered 0 to 5. The analysis then proceeds iteratively, at each stage joining the two most similar clusters into a new cluster, continuing until there is one overall cluster. The first new cluster formed is assigned index 6, the second is assigned index 7, and so forth. When these indices appear later in the tree, the clusters are being combined again into a still larger cluster.