K-Medoids Clustering
Watch the K-medoids algorithm iteratively assign data points to clusters and update the medoids. Based on my SC2300 HW2 implementation, with support for L1, L2, and Cosine distance metrics.
Ringed points are medoids (cluster representatives)
Clustering Stats
Objective
Cluster Sizes
Configuration
The Algorithm
K-medoids is a variant of K-means that uses actual data points (medoids) as cluster centers instead of computed centroids. This makes it more robust to outliers and applicable to any distance metric.
```
# K-Medoids Algorithm
repeat until convergence:
    1. Assign each point to its nearest medoid
    2. Update each medoid to the cluster member
       minimizing total within-cluster distance
```
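The loop above can be sketched in Python. This is an illustrative version, not the HW2 code; the function name `k_medoids` and its interface are assumptions, and it takes any distance function as a parameter:

```python
import numpy as np

def k_medoids(X, k, dist, max_iter=100, rng=None):
    """Illustrative K-medoids loop: assign points, then re-select medoids."""
    rng = np.random.default_rng(rng)
    medoids = rng.choice(len(X), size=k, replace=False)  # random initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # 1. Assign each point to its nearest medoid
        D = np.array([[dist(x, X[m]) for m in medoids] for x in X])
        labels = D.argmin(axis=1)
        # 2. Update each medoid to the cluster member minimizing
        #    total within-cluster distance
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            costs = [sum(dist(X[i], X[j]) for j in members) for i in members]
            new_medoids[c] = members[int(np.argmin(costs))]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoid set unchanged
        medoids = new_medoids
    return medoids, labels
```

Because each medoid is always assigned to its own cluster (distance zero), no cluster can become empty during the update step.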
Key Concepts
Medoids vs Centroids
Unlike K-means, which computes mean centroids, K-medoids selects actual data points as representatives. This lets the algorithm work with any distance metric, not just Euclidean.
K-Medoids++ Init
Similar to K-means++, each new medoid is selected with probability proportional to its squared distance from the nearest existing medoid. This spreads the initial centers and typically leads to faster convergence.
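A minimal sketch of this seeding scheme, under the assumption that it mirrors K-means++ exactly (the function name `kmedoidspp_init` is made up for illustration):

```python
import numpy as np

def kmedoidspp_init(X, k, dist, rng=None):
    """K-medoids++-style seeding: each new medoid is drawn with probability
    proportional to its squared distance from the nearest chosen medoid."""
    rng = np.random.default_rng(rng)
    medoids = [int(rng.integers(len(X)))]  # first medoid: uniform at random
    while len(medoids) < k:
        # squared distance from each point to its nearest existing medoid
        d2 = np.array([min(dist(x, X[m]) for m in medoids) ** 2 for x in X])
        probs = d2 / d2.sum()              # far-away points are more likely
        medoids.append(int(rng.choice(len(X), p=probs)))
    return np.array(medoids)
```

Already-chosen medoids have squared distance zero to themselves, so they receive zero probability and cannot be drawn twice.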
Distance Metrics
L2 (Euclidean) measures straight-line distance. L1 (Manhattan) sums absolute differences. Cosine measures angular similarity, useful for text data where direction matters more than magnitude.
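The three metrics can be written in a few lines each (a sketch; the function names are assumptions, and cosine is expressed as a distance, i.e. one minus cosine similarity):

```python
import numpy as np

def l2(a, b):
    # Euclidean: straight-line distance
    return np.linalg.norm(a - b)

def l1(a, b):
    # Manhattan: sum of absolute coordinate differences
    return np.abs(a - b).sum()

def cosine(a, b):
    # Cosine distance = 1 - cosine similarity; depends only on direction,
    # not magnitude, so it suits sparse text vectors
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```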
Multi-Trial Selection
Since K-medoids can converge to local optima, the HW2 implementation runs multiple independent trials and selects the result with the lowest objective value (best-of-T strategy).
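The best-of-T strategy is straightforward to express generically. Here `run_once` is a hypothetical single-trial interface assumed to return `(medoids, labels, objective)`; the HW2 code may structure this differently:

```python
def best_of_trials(X, k, run_once, n_trials=10, seed=0):
    """Run n_trials independent K-medoids trials and keep the result
    with the lowest objective value (best-of-T)."""
    best = None
    for t in range(n_trials):
        result = run_once(X, k, rng=seed + t)   # independent random start
        if best is None or result[2] < best[2]:
            best = result                        # keep the lowest objective
    return best
```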
Implementation Details
| Parameter | Value |
|---|---|
| Distance Metrics | L1, L2, Cosine |
| Initialization | Random, K-Medoids++ |
| Datasets (HW2) | Wine, Digits, 20 Newsgroups |
| Evaluation | NMI, Elbow Method |
| Course | SC2300 Introduction to ML |