K-Medoids Clustering
Watch the K-medoids algorithm iteratively assign data points to clusters and update the medoids. Based on my SC2300 HW2 implementation, with support for L1, L2, and Cosine distance metrics.
Ringed points are medoids (cluster representatives)
Clustering Stats
Objective
Cluster Sizes
Configuration
The Algorithm
K-medoids is a variant of K-means that uses actual data points (medoids) as cluster centers instead of computed centroids. This makes it more robust to outliers and applicable to any distance metric.
```
# K-Medoids Algorithm
repeat until convergence:
    1. Assign each point to its nearest medoid
    2. Update each medoid to the cluster member
       minimizing total within-cluster distance
```
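The loop above can be sketched in Python. This is an illustrative version, not the HW2 code; the function name `k_medoids` and its interface are assumptions, and it takes any distance function as a parameter:

```python
import numpy as np

def k_medoids(X, k, dist, max_iter=100, rng=None):
    """Illustrative K-medoids loop: assign points, then re-select medoids."""
    rng = np.random.default_rng(rng)
    medoids = rng.choice(len(X), size=k, replace=False)  # random initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # 1. Assign each point to its nearest medoid
        D = np.array([[dist(x, X[m]) for m in medoids] for x in X])
        labels = D.argmin(axis=1)
        # 2. Update each medoid to the cluster member minimizing
        #    total within-cluster distance
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            costs = [sum(dist(X[i], X[j]) for j in members) for i in members]
            new_medoids[c] = members[int(np.argmin(costs))]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoid set unchanged
        medoids = new_medoids
    return medoids, labels
```

Because each medoid is always assigned to its own cluster (distance zero), no cluster can become empty during the update step.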
Key Concepts
Medoids vs Centroids
Unlike K-means, which computes mean centroids, K-medoids selects actual data points as representatives. This lets the algorithm work with any distance metric, not just Euclidean.
K-Medoids++ Init
Similar to K-means++, each new medoid is selected with probability proportional to its squared distance from the nearest existing medoid. This spreads the initial centers and typically leads to faster convergence.
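A minimal sketch of this seeding scheme, under the assumption that it mirrors K-means++ exactly (the function name `kmedoidspp_init` is made up for illustration):

```python
import numpy as np

def kmedoidspp_init(X, k, dist, rng=None):
    """K-medoids++-style seeding: each new medoid is drawn with probability
    proportional to its squared distance from the nearest chosen medoid."""
    rng = np.random.default_rng(rng)
    medoids = [int(rng.integers(len(X)))]  # first medoid: uniform at random
    while len(medoids) < k:
        # squared distance from each point to its nearest existing medoid
        d2 = np.array([min(dist(x, X[m]) for m in medoids) ** 2 for x in X])
        probs = d2 / d2.sum()              # far-away points are more likely
        medoids.append(int(rng.choice(len(X), p=probs)))
    return np.array(medoids)
```

Already-chosen medoids have squared distance zero to themselves, so they receive zero probability and cannot be drawn twice.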
Distance Metrics
L2 (Euclidean) measures straight-line distance. L1 (Manhattan) sums absolute differences. Cosine measures angular similarity, useful for text data where direction matters more than magnitude.
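The three metrics can be written in a few lines each (a sketch; the function names are assumptions, and cosine is expressed as a distance, i.e. one minus cosine similarity):

```python
import numpy as np

def l2(a, b):
    # Euclidean: straight-line distance
    return np.linalg.norm(a - b)

def l1(a, b):
    # Manhattan: sum of absolute coordinate differences
    return np.abs(a - b).sum()

def cosine(a, b):
    # Cosine distance = 1 - cosine similarity; depends only on direction,
    # not magnitude, so it suits sparse text vectors
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```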
Multi-Trial Selection
Since K-medoids can converge to local optima, the HW2 implementation runs multiple independent trials and selects the result with the lowest objective value (best-of-T strategy).
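The best-of-T strategy is straightforward to express generically. Here `run_once` is a hypothetical single-trial interface assumed to return `(medoids, labels, objective)`; the HW2 code may structure this differently:

```python
def best_of_trials(X, k, run_once, n_trials=10, seed=0):
    """Run n_trials independent K-medoids trials and keep the result
    with the lowest objective value (best-of-T)."""
    best = None
    for t in range(n_trials):
        result = run_once(X, k, rng=seed + t)   # independent random start
        if best is None or result[2] < best[2]:
            best = result                        # keep the lowest objective
    return best
```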
Implementation Details
| Parameter | Value |
|---|---|
| Distance Metrics | L1, L2, Cosine |
| Initialization | Random, K-Medoids++ |
| Datasets (HW2) | Wine, Digits, 20 Newsgroups |
| Evaluation | NMI, Elbow Method |
| Course | SC2300 Introduction to ML |