**46.4****
****K-Means Clustering** (.NET, C#, CSharp, VB, Visual Basic, F#)

The *k*-means clustering
method assigns data points into *k*
groups such that the sum of squares from points to the computed cluster
centers is minimized. In **NMath Stats**,
class **KMeansClustering**
performs *k*-means clustering.

The algorithm used is that of Hartigan and Wong (*A K-means clustering algorithm*. Applied
Statistics 28, 100–108. 1979):

**1.**** **For
each point, move it to another cluster if that would lower the sum of
squares from points to the computed cluster centers.

**2.**** **If a
point is moved, immediately update the cluster centers of the two affected
clusters.

**3.**** **Repeat
until no points are moved, or the specified maximum number of iterations
is reached.

**Creating KMeansClustering
Objects**

A **KMeansClustering**
instance is constructed from a matrix or a dataframe containing numeric
data. Each row in the data set represents an object to be clustered.

Code Example – C# k-means clustering

var km = new KMeansClustering( data );

Code Example – VB k-means clustering

Dim KM As New KMeansClustering(Data)

After construction, you can retrieve information about
the **KMeansClustering**
data using the provided properties:

● N gets the total number of objects being clustered.

● Data gets and set the data matrix

Iteration stops when either clustering stabilizes, or the maximum number of iterations is reached. You can specify the maximum number of iterations in several ways:

● The static
DefaultMaxIterations property gets and sets
the default maximum number of iterations for instances of **KMeansClustering**. (Initially set to 1000.)

● You can
specify a non-default maximum in the **KMeansClustering**
constructor. For instance:

var km = new KMeansClustering( data, 100 );

● The MaxIterations property gets and sets the maximum
number of iterations on an existing **KMeansClustering**
instance.

The Cluster() method clusters
the data into the specified number of clusters. The method accepts either
*k*, the number of clusters, or a matrix
of initial cluster centers:

● If *k* is given, a set of distinct rows in the
data matrix are chosen as the initial centers using the algorithm specified
by a KMeanClustering.Start enumerated value.
By default, rows are chosen at random.

● If a matrix
of initial cluster centers is given, *k*
is inferred from the number of rows.

For example, this code clusters eight random vectors of length three into two clusters, using random starting cluster centers:

Code Example – C# k-means clustering

var data = new DoubleMatrix( 8, 3, new RandGenUniform() );

var cl = new KMeansClustering( data );

ClusterSet clusters = cl.Cluster( 2 );

Code Example – VB k-means clustering

Dim Data As New DoubleMatrix(8, 3, New RandGenUniform())

Dim CL As New KMeansClustering(Data)

Dim Clusters As ClusterSet = CL.Cluster(2)

This code specifies the two starting centers:

Code Example – C# k-means clustering

var centers = new DoubleMatrix("2x3 [ 0 0 0 1 1 1 ]");

ClusterSet clusters = cl.Cluster( centers );

Code Example – VB k-means clustering

Dim Centers As New DoubleMatrix("2x3 [ 0 0 0 1 1 1 ]")

Dim Clusters As ClusterSet = CL.Cluster(Centers)

The Cluster() method returns
a **ClusterSet**
object, which represents a collection of objects assigned to a finite
number of clusters. Properties on the **KMeansClustering**
instance give additional information about the clustering just performed:

● K gets the number of clusters.

● InitialCenters gets the matrix of initial cluster centers.

● FinalCenters gets the matrix of final cluster centers.

● Clusters gets the cluster assignments.

● WithinSumOfSquares gets the within-cluster sum of squares for each cluster.

● Sizes gets the number of points in each cluster.

● Iterations gets the number of iterations performed.

● MaxIterationsMet returns true if the clustering stopped because the maximum number of iterations was reached; otherwise, false.

For instance, this code clusters 30 random vectors of length three into three clusters, and prints out the results:

Code Example – C# k-means clustering

var data = new DoubleMatrix(30, 3, new RandGenUniform());

var km = new KMeansClustering(data);

km.Cluster(3);

Console.WriteLine( "k = {0}", km.K );

Console.WriteLine( "Initial cluster centers:" );

Console.WriteLine( km.InitialCenters.ToTabDelimited() );

Console.WriteLine( "{0} iterations", km.Iterations );

Console.WriteLine("Stopped because max iterations of {0} met? {1}",

km.MaxIterations, km.MaxIterationsMet);

Console.WriteLine( "Final cluster centers:" );

Console.WriteLine( km.FinalCenters.ToTabDelimited() );

Console.WriteLine( "Clustering assignments:" );

Console.WriteLine( km.Clusters );

for (int i = 0; i < km.K; i++) {

Console.WriteLine( "Cluster {0} has {1} items", i, km.Sizes[i] );

}

Code Example – VB k-means clustering

Dim Data As New DoubleMatrix(30, 3, New RandGenUniform())

Dim KM As New KMeansClustering(Data)

KM.Cluster(3)

Console.WriteLine("k = {0}", KM.K)

Console.WriteLine("Initial cluster centers:")

Console.WriteLine(KM.InitialCenters.ToTabDelimited())

Console.WriteLine("{0} iterations", KM.Iterations)

Console.WriteLine("Stopped because max iterations of {0} met? {1}",

KM.MaxIterations, KM.MaxIterationsMet)

Console.WriteLine("Final cluster centers:")

Console.WriteLine(KM.FinalCenters.ToTabDelimited())

Console.WriteLine("Clustering assignments:")

Console.WriteLine(KM.Clusters)

For I As Integer = 0 To KM.K - 1

Console.WriteLine("Cluster {0} has {1} items", I, KM.Sizes(I))

Next