K-Means Clustering
Cheat Sheet
Everything you need on one page. Perfect for revision, interviews, and quick reference.
Time complexity: $O(n \cdot K \cdot d \cdot t)$ where $n$ = samples, $K$ = clusters, $d$ = dimensions, $t$ = iterations.
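The cost structure is visible in a bare-bones implementation of Lloyd's algorithm (a minimal NumPy sketch; the function name and defaults are illustrative):

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    """Lloyd's algorithm: each of the t iterations computes n*K distances in d dims."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)].astype(float)  # init from data points
    for _ in range(iters):                                     # t iterations
        # Assignment step: O(n*K*d) squared distances, nearest centroid wins.
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        # Update step: each centroid moves to the mean of its assigned points.
        for k in range(K):
            if np.any(labels == k):                            # guard against empty clusters
                C[k] = X[labels == k].mean(0)
    return labels, C
```

The assignment step dominates, giving the $O(n \cdot K \cdot d \cdot t)$ total.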
K-Means++ initialization is $O(\log K)$-competitive: in expectation, its objective $J$ is within a factor of $O(\log K)$ of the optimal value.
Silhouette score ranges from $-1$ (wrong cluster) to $+1$ (well-matched). Values near $0$ indicate overlapping clusters.
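The per-point definition can be computed directly (a small NumPy sketch; `silhouette` is an illustrative name, not a library call):

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette: s(i) = (b_i - a_i) / max(a_i, b_i)."""
    D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))  # pairwise distances
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                                    # exclude the point itself
        a = D[i][same].mean() if same.any() else 0.0       # mean intra-cluster distance
        b = min(D[i][labels == k].mean()                   # mean distance to nearest other cluster
                for k in set(labels.tolist()) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()
```

Tight, well-separated clusters give $b \gg a$, pushing the score toward $+1$.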
Higher Calinski-Harabasz (CH) = better; lower Davies-Bouldin (DB) = better. $B_K$ = between-cluster dispersion, $W_K$ = within-cluster dispersion, $S_k$ = average distance of points in cluster $k$ to its centroid.
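For instance, the Calinski-Harabasz index $CH = \frac{B_K/(K-1)}{W_K/(n-K)}$ is a few lines of NumPy (a sketch assuming labels are coded $0..K-1$; the function name is illustrative):

```python
import numpy as np

def calinski_harabasz(X, labels):
    """CH = (B_K / (K-1)) / (W_K / (n-K)); higher means tighter, better-separated clusters."""
    n, K = len(X), len(set(labels.tolist()))
    mean = X.mean(0)
    B = sum((labels == k).sum() * ((X[labels == k].mean(0) - mean) ** 2).sum()
            for k in range(K))                       # between-cluster dispersion B_K
    W = sum(((X[labels == k] - X[labels == k].mean(0)) ** 2).sum()
            for k in range(K))                       # within-cluster dispersion W_K
    return (B / (K - 1)) / (W / (n - K))
```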
K-Medoids uses actual data points as centers, making it more robust to outliers than standard K-Means.
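A minimal alternating sketch in NumPy (the simple Voronoi-iteration variant, not full PAM; names are illustrative):

```python
import numpy as np

def kmedoids(X, K, iters=10, seed=0):
    """Alternate assignment and medoid update; centers are always actual data points."""
    rng = np.random.default_rng(seed)
    D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))   # pairwise distances
    medoids = rng.choice(len(X), K, replace=False)
    for _ in range(iters):
        labels = D[:, medoids].argmin(1)                    # assign to nearest medoid
        for k in range(K):
            idx = np.where(labels == k)[0]
            if len(idx):                                    # new medoid: member point with
                medoids[k] = idx[D[np.ix_(idx, idx)].sum(0).argmin()]  # least total distance
    return labels, medoids
```

Because a medoid must be a member point, a single extreme outlier cannot drag the center far from the cluster the way a mean can.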
| Feature | K-Means | DBSCAN | Hierarchical | GMM |
|---|---|---|---|---|
| Cluster Shape | Spherical | Arbitrary | Arbitrary | Elliptical |
| Requires $K$ | Yes | No | Optional | Yes |
| Handles Outliers | Poor | Good (noise label) | Poor | Moderate |
| Assignment | Hard | Hard + noise | Hard | Soft (probabilistic) |
| Scalability | $O(nKd)$ per iter | $O(n \log n)$ | $O(n^2 \log n)$ | $O(nK d^2)$ per iter |
Q: What objective function does K-Means minimize?
A: The Within-Cluster Sum of Squares (WCSS): $J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \|x_i - \mu_k\|^2$. It measures the total squared distance from each point to its assigned centroid.
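The sum translates line-for-line into NumPy (a sketch; `wcss` is an illustrative name):

```python
import numpy as np

def wcss(X, labels, centroids):
    """J: total squared distance from each point to its assigned centroid."""
    return sum(((X[labels == k] - c) ** 2).sum() for k, c in enumerate(centroids))
```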
Q: Is K-Means guaranteed to converge?
A: Yes: $J$ decreases monotonically with each assignment and update step, so the algorithm always converges. However, it converges to a local minimum, not necessarily the global optimum; in practice, run several random initializations (e.g. n_init=10 in scikit-learn) and keep the run with the lowest $J$.
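The restart mitigation can be sketched as a wrapper over a bare-bones Lloyd's loop (NumPy; `best_of_n` mirrors the idea behind scikit-learn's `n_init`, names illustrative):

```python
import numpy as np

def kmeans(X, K, iters=20, rng=None):
    """One Lloyd's run from a random init; returns labels, centroids, and final J."""
    rng = np.random.default_rng() if rng is None else rng
    C = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if np.any(labels == k):
                C[k] = X[labels == k].mean(0)
    return labels, C, ((X - C[labels]) ** 2).sum()

def best_of_n(X, K, n_init=10, seed=0):
    """Run n_init random restarts and keep the solution with the lowest J."""
    rng = np.random.default_rng(seed)
    return min((kmeans(X, K, rng=rng) for _ in range(n_init)), key=lambda run: run[2])
```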
Q: Why must features be scaled before K-Means?
A: K-Means uses Euclidean distance, so features on larger scales dominate the distance calculation and bias cluster assignments. Standardization ensures all features contribute equally.
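A quick demonstration of the effect (illustrative values: feature 0 spans about 1 unit, feature 1 about 9000):

```python
import numpy as np

# Raw Euclidean distance is dominated almost entirely by feature 1.
X = np.array([[0.1, 9000.0],
              [0.9, 9100.0],
              [0.2,  100.0],
              [0.8,   50.0]])

def dist(A, i, j):
    """Euclidean distance between rows i and j of A."""
    return float(np.sqrt(((A[i] - A[j]) ** 2).sum()))

Z = (X - X.mean(0)) / X.std(0)   # standardize: zero mean, unit variance per feature
```

Before scaling, row 0's nearest neighbour is row 1 (feature 1 agrees); after scaling, it flips to row 2, because both features now contribute on equal footing.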
Q: How does K-Means++ initialization work?
A: It spreads initial centroids apart by choosing each subsequent centroid with probability proportional to $D(x)^2$, the squared distance from $x$ to the nearest centroid already chosen. This reduces the chance of poor local minima and speeds up convergence.
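The sampling rule fits in a few lines (a NumPy sketch; the function name is illustrative):

```python
import numpy as np

def kmeanspp_init(X, K, rng):
    """Sample each new centroid with probability proportional to D(x)^2."""
    centroids = [X[rng.integers(len(X))]]                 # first centroid: uniform at random
    for _ in range(K - 1):
        # D(x)^2: squared distance to the nearest centroid chosen so far.
        d2 = ((X[:, None] - np.array(centroids)[None]) ** 2).sum(-1).min(1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)
```

With two well-separated blobs, the second centroid lands in the other blob almost surely, since its points carry nearly all the $D(x)^2$ mass.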
Q: How does K-Means differ from a Gaussian Mixture Model?
A: K-Means performs hard assignment (each point belongs to exactly one cluster), while GMM provides soft, probabilistic assignment. GMM can also model elliptical clusters; K-Means assumes spherical ones.
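The contrast shows up in EM's responsibilities, which are per-point probabilities rather than a single label. A minimal 1-D, two-component EM sketch (all names and the crude min/max initialization are illustrative):

```python
import numpy as np

def gmm_em_1d(x, iters=50):
    """Two-component 1-D GMM fitted with EM; returns soft assignments (responsibilities)."""
    mu = np.array([x.min(), x.max()])                     # crude but deterministic init
    var = np.full(2, x.var()) + 1e-9
    pi = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: r[i, k] proportional to pi_k * N(x_i | mu_k, var_k)
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(1, keepdims=True)
        # M-step: re-estimate weights, means, variances from the responsibilities
        nk = r.sum(0)
        pi, mu = nk / len(x), (r * x[:, None]).sum(0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(0) / nk + 1e-9
    return r, mu
```

Taking `r.argmax(1)` recovers a K-Means-style hard labeling, discarding the uncertainty information.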
Q: When should you prefer DBSCAN over K-Means?
A: When clusters have arbitrary shapes, you don't know $K$ in advance, or the data contains noise/outliers. DBSCAN detects the number of clusters automatically and labels outliers as noise.
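A minimal DBSCAN sketch makes the noise labeling concrete (NumPy; border points join the first cluster that reaches them, a simplification of the full algorithm):

```python
import numpy as np
from collections import deque

def dbscan(X, eps, min_pts):
    """Core points (>= min_pts neighbours within eps) grow clusters breadth-first;
    unreachable points keep the noise label -1."""
    D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    neighbors = [np.where(D[i] <= eps)[0] for i in range(len(X))]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                                  # already clustered, or not core
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:                                  # expand the cluster
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:      # only core points keep expanding
                    queue.extend(neighbors[j])
        cluster += 1
    return labels
```

Note that $K$ never appears: the number of clusters and the noise set both fall out of `eps` and `min_pts`.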
Q: Can K-Means handle categorical data?
A: Not directly. Use K-Modes (mode instead of mean, with a matching-based dissimilarity) for categorical data, or K-Prototypes for mixed data. One-hot encoding is an alternative but can distort Euclidean distances.
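A toy K-Modes sketch (NumPy; assumes integer-coded categories and naively seeds the modes from the first $K$ rows, which are assumed distinct):

```python
import numpy as np

def kmodes(X, K, iters=10):
    """K-Modes: Hamming dissimilarity plus per-attribute modes instead of means."""
    modes = X[:K].copy()                              # naive init: first K rows
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Dissimilarity = number of attributes that do not match the mode.
        labels = (X[:, None, :] != modes[None, :, :]).sum(-1).argmin(1)
        for k in range(K):
            members = X[labels == k]
            if len(members):                          # new mode: most frequent value per column
                modes[k] = [np.bincount(col).argmax() for col in members.T]
    return labels
```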
Q: What happens if you set $K = n$?
A: Each data point becomes its own cluster and inertia drops to 0. This is trivially optimal for $J$ but provides no useful grouping; the elbow method helps balance low $J$ against a meaningful $K$.
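The trend is easy to verify: inertia falls toward 0 as $K$ approaches $n$ (a bare-bones Lloyd's loop in NumPy; the function name is illustrative):

```python
import numpy as np

def kmeans_inertia(X, K, iters=20, seed=0):
    """Run Lloyd's algorithm and return the final J (inertia)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if np.any(labels == k):
                C[k] = X[labels == k].mean(0)
    return ((X - C[labels]) ** 2).sum()
```

Plotting this over a range of $K$ and looking for the bend in the curve is exactly the elbow method.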