# Clustering
Slides: [[lec6_clustering.pdf|clustering]]
## 1 $K$-means Clustering
![[Pasted image 20230418131737.png|400]]
Input: Points $x_1, \ldots, x_n \in \mathbb{R}^d$; integer $k$
Output: "Centers", or representatives, $\mu_1, \ldots, \mu_k \in \mathbb{R}^d$
Goal: Minimize average squared distance between points and their nearest representatives:
$\begin{align*}
F(\mu_1, \ldots, \mu_k) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} \left\|x_i - \mu_j\right\|^2
\end{align*}$
### 1.1 Lloyd's $k$-means algorithm
Minimizing this cost is an NP-hard optimization problem (the loss landscape is unfavorable, with lots of local minima). The standard heuristic is Lloyd's algorithm, often just called "the $k$-means algorithm".
![[Pasted image 20230418131901.png|400]]
Each iteration is guaranteed to reduce the cost ⇒ the algorithm converges to a local optimum.
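A minimal NumPy sketch of Lloyd's iterations (the function name, the Practice-1 initialization, and the convergence check are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iters=100, seed=0):
    """One run of Lloyd's algorithm. X: (n, d) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # Practice 1 init
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (keep the old center if a cluster goes empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # cost can no longer decrease
            break
        centers = new_centers
    return centers, labels
```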
### 1.2 Initialization matters!
- Practice 1: Choose $k$ data points at random as the initial centers.
- Practice 2: Start with extra centers, then prune later
- **Practice 3: $k$-means++** (see the sketch after this list)
![[Pasted image 20230418132826.png|400]]
- Time complexity of the seeding: $O(knd)$
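A sketch of $k$-means++ seeding, assuming the standard scheme (first center uniform at random; each subsequent center sampled with probability proportional to its squared distance from the centers chosen so far). Keeping a running minimum of distances is what gives the $O(knd)$ total:

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """k-means++ seeding; returns a (k, d) array of initial centers."""
    rng = np.random.default_rng(seed)
    n = len(X)
    centers = [X[rng.integers(n)]]                 # first center: uniform at random
    d2 = ((X - centers[0]) ** 2).sum(axis=1)       # squared dist to nearest center
    for _ in range(k - 1):
        probs = d2 / d2.sum()                      # sample ∝ squared distance
        c = X[rng.choice(n, p=probs)]
        centers.append(c)
        # Running minimum: one O(nd) pass per new center ⇒ O(knd) overall.
        d2 = np.minimum(d2, ((X - c) ** 2).sum(axis=1))
    return np.array(centers)
```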
### 1.3 Clustering: use cases
1. **Vector quantization**: Find a finite set of representatives that provides good coverage of a complex, possibly infinite, high-dimensional space.
2. Finding **meaningful structure** in data: e.g. salient grouping.
#example Represent images using $k$-means codewords
![[3_clustering_projection 2023-04-20 12.56.41.excalidraw.svg|700]]
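A rough sketch of the codeword idea with scikit-learn's `KMeans`; the patch arrays below are random stand-ins for real image patches, and $k = 50$ is an arbitrary codebook size:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patches = rng.random((5000, 64))          # stand-in: flattened 8x8 patches

# Vector quantization: learn a codebook of k representative patches.
k = 50
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(patches)

# Represent one image by the histogram of codewords its patches map to.
image_patches = rng.random((200, 64))     # stand-in: patches from one image
codes = codebook.predict(image_patches)   # nearest codeword per patch
bag_of_codewords = np.bincount(codes, minlength=k) / len(codes)
```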
### 1.4 Hierarchical clustering
Avoids the problem of choosing an appropriate $k$ up front:
***The linkage algorithm***
![[3_clustering_projection 2023-04-20 13.21.37.excalidraw|700]]
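SciPy ships this bottom-up merge as `scipy.cluster.hierarchy.linkage`; a small usage sketch on toy data (cutting the tree at 4 clusters is an arbitrary choice):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.random((30, 2))                    # toy data

# Start with n singleton clusters, repeatedly merge the two closest ones;
# Z records the full merge tree (a dendrogram).
Z = linkage(X, method="single")            # 'complete', 'average', 'ward' also work

# No k was chosen up front: cut the tree wherever you like, e.g. 4 clusters.
labels = fcluster(Z, t=4, criterion="maxclust")
```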
### 1.5 Linkage methods
$\begin{align*}
\begin{array}{c|c}
\text{Single linkage}& \operatorname{dist}\left(C, C^{\prime}\right)=\min _{x \in C, x^{\prime} \in C^{\prime}}\left\|x-x^{\prime}\right\|\\
\text{Complete linkage} &\operatorname{dist}\left(C, C^{\prime}\right)=\max _{x \in C, x^{\prime} \in C^{\prime}}\left\|x-x^{\prime}\right\|
\end{array}
\end{align*}$
Variants:
1. Avg. pair-wise distance: $\operatorname{dist}\left(C, C^{\prime}\right)=\frac{1}{|C| \cdot\left|C^{\prime}\right|} \sum_{x \in C} \sum_{x^{\prime} \in C^{\prime}}\left\|x-x^{\prime}\right\|$
2. Distance btw. cluster centers: $\operatorname{dist}\left(C, C^{\prime}\right)=\left\|\operatorname{mean}(C)-\operatorname{mean}\left(C^{\prime}\right)\right\|$
3. ***Ward's method***: The increase in k-means cost occasioned by merging the two clusters
$\begin{align*}
\operatorname{dist}\left(C, C^{\prime}\right)=\frac{|C| \cdot\left|C^{\prime}\right|}{|C|+\left|C^{\prime}\right|}\left\|\operatorname{mean}(C)-\operatorname{mean}\left(C^{\prime}\right)\right\|^2
\end{align*}$
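A quick numeric check that Ward's formula equals the actual increase in $k$-means cost from merging two clusters (the data here is made up):

```python
import numpy as np

def kmeans_cost(C):
    """Sum of squared distances from points to their cluster mean."""
    return ((C - C.mean(axis=0)) ** 2).sum()

rng = np.random.default_rng(0)
C, Cp = rng.random((8, 3)), rng.random((5, 3))

# Ward distance: predicted increase in k-means cost if C and C' are merged.
ward = (len(C) * len(Cp) / (len(C) + len(Cp))) * \
    np.sum((C.mean(axis=0) - Cp.mean(axis=0)) ** 2)

# Actual increase: cost(C merged with C') - cost(C) - cost(C').
actual = kmeans_cost(np.vstack([C, Cp])) - kmeans_cost(C) - kmeans_cost(Cp)
assert np.isclose(ward, actual)
```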