What is clustering?
- The organization of unlabeled data into similarity groups called clusters
- A cluster is a collection of data items that are “similar” to one another and “dissimilar” to data items in other clusters

K-Means clustering
- K-means (MacQueen, 1967) is a partitional clustering algorithm
- Let the set of data points D be {x_1, x_2, …, x_n},
where x_i = (x_i1, x_i2, …, x_ir) is a vector in X ⊆ R^r, and r is the number of dimensions
- The k-means algorithm partitions the given data into k clusters:
- Each cluster has a cluster center, called a centroid
- k is specified by the user
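
To make the definitions above concrete, here is a minimal sketch in plain Python. The notes so far only state that the data is split into k clusters, each with a centroid, so the details below are assumptions: the usual alternating assign/update iteration, squared Euclidean distance, and initial centroids drawn at random from the data. The function name `kmeans` and its parameters `max_iter` and `seed` are hypothetical, chosen for illustration.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Partition `points` (a list of equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    # Assumption: the initial centroids are k data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Six points in R^2 (n = 6, r = 2) forming two well-separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

Here `labels[i]` is the index of the cluster that `data[i]` was assigned to, and `centroids[j]` is the center of cluster j. The value of k is passed in by the caller, matching the point that k is specified by the user.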