The k-means clustering method assigns data points into k groups such that the sum of squares from points to the computed cluster centers is minimized. In NMath Stats, class KMeansClustering performs k-means clustering.
The algorithm used is that of Hartigan and Wong (A K-means clustering algorithm. Applied Statistics 28, 100-108. 1979):
A KMeansClustering instance is constructed from a matrix or a dataframe containing numeric data. Each row in the data set represents an object to be clustered.
KMeansClustering km = new KMeansClustering( data );
After construction, you can retrieve information about the KMeansClustering data using the provided properties:
Iteration stops when either clustering stabilizes, or the maximum number of iterations is reached. You can specify the maximum number of iterations in several ways:
KMeansClustering km = new KMeansClustering( data, 100 );
The Cluster() method clusters the data into the specified number of clusters. The method accepts either k, the number of clusters, or a matrix of initial cluster centers:
For example, this code clusters eight random vectors of length three into two clusters, using random starting cluster centers:
DoubleMatrix data = new DoubleMatrix( 8, 3, new RandGenUniform() ); KMeansClustering cl = new KMeansClustering( data ); ClusterSet clusters = cl.Cluster( 2 );
This code specifies the two starting centers:
DoubleMatrix centers = new DoubleMatrix("2x3 [ 0 0 0 1 1 1 ]"); ClusterSet clusters = cl.Cluster( centers );
The Cluster() method returns a ClusterSet object, which represents a collection of objects assigned to a finite number of clusters. Properties on the KMeansClustering instance give additional information about the clustering just performed:
For instance, this code clusters 30 random vectors of length three into three clusters, and prints out the results:
DoubleMatrix data = new DoubleMatrix(30, 3, new RandGenUniform()); KMeansClustering km = new KMeansClustering(data); km.Cluster(3); Console.WriteLine( "k = {0}", km.K ); Console.WriteLine( "Initial cluster centers:" ); Console.WriteLine( km.InitialCenters.ToTabDelimited() ); Console.WriteLine( "{0} iterations", km.Iterations ); Console.WriteLine("Stopped because max iterations of {0} met? {1}", km.MaxIterations, km.MaxIterationsMet); Console.WriteLine( "Final cluster centers:" ); Console.WriteLine( km.FinalCenters.ToTabDelimited() ); Console.WriteLine( "Clustering assignments:" ); Console.WriteLine( km.Clusters ); for (int i = 0; i < km.K; i++) { Console.WriteLine( "Cluster {0} has {1} items", i, km.Sizes[i] ); }TOC | Previous | Next | Index