# Clustering

Slides: [[lec6_clustering.pdf|clustering]]

## 1 $K$-means Clustering

![[Pasted image 20230418131737.png|400]]

Input: Points $x_1, \ldots, x_n \in \mathbb{R}^d$; integer $k$
Output: "Centers", or representatives, $\mu_1, \ldots, \mu_k \in \mathbb{R}^d$
Goal: Minimize the average squared distance between points and their nearest representatives:

$\begin{align*} F(\mu_1, \ldots, \mu_k) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \leq j \leq k} \left\|x_i - \mu_j\right\|^2 \end{align*}$

### 1.1 Lloyd's $k$-means algorithm

Minimizing $F$ exactly is an NP-hard optimization problem, with an unfavorable loss landscape containing many local minima. The standard heuristic is Lloyd's "$k$-means algorithm": alternate between assigning each point to its nearest center and recomputing each center as the mean of its assigned points (see the sketch in §1.6).

![[Pasted image 20230418131901.png|400]]

Each iteration is guaranteed to reduce the cost ⇒ the algorithm converges to a local optimum.

### 1.2 Initialization matters!

- Practice 1: Choose $k$ data points at random as the initial centers.
- Practice 2: Start with extra centers, then prune later.
- **Practice 3: $k$-means++** (see the sketch in §1.6)

![[Pasted image 20230418132826.png|400]]

- Time complexity: $O(knd)$ per iteration.

### 1.3 Clustering: use cases

1. **Vector quantization**: Find a finite set of representatives that provides good coverage of a complex, possibly infinite, high-dimensional space.
2. Finding **meaningful structure** in data: e.g. salient grouping.

#example Represent images using $k$-means codewords

![[3_clustering_projection 2023-04-20 12.56.41.excalidraw.svg|700]]

### 1.4 Hierarchical clustering

Avoids the problem of choosing an appropriate $k$: ***the linkage algorithm*** starts with each point in its own cluster and repeatedly merges the two closest clusters.

![[3_clustering_projection 2023-04-20 13.21.37.excalidraw|700]]

### 1.5 Linkage methods

$\begin{align*} \begin{array}{c|c} \text{Single linkage} & \operatorname{dist}\left(C, C^{\prime}\right)=\min _{x \in C, x^{\prime} \in C^{\prime}}\left\|x-x^{\prime}\right\| \\ \text{Complete linkage} & \operatorname{dist}\left(C, C^{\prime}\right)=\max _{x \in C, x^{\prime} \in C^{\prime}}\left\|x-x^{\prime}\right\| \end{array} \end{align*}$

Variants:

1. Average pairwise distance: $\operatorname{dist}\left(C, C^{\prime}\right)=\frac{1}{|C| \cdot\left|C^{\prime}\right|} \sum_{x \in C} \sum_{x^{\prime} \in C^{\prime}}\left\|x-x^{\prime}\right\|$
2. Distance between cluster centers: $\operatorname{dist}\left(C, C^{\prime}\right)=\left\|\operatorname{mean}(C)-\operatorname{mean}\left(C^{\prime}\right)\right\|$
3. ***Ward's method***: the increase in $k$-means cost occasioned by merging the two clusters: $\begin{align*} \operatorname{dist}\left(C, C^{\prime}\right)=\frac{|C| \cdot\left|C^{\prime}\right|}{|C|+\left|C^{\prime}\right|}\left\|\operatorname{mean}(C)-\operatorname{mean}\left(C^{\prime}\right)\right\|^2 \end{align*}$
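### 1.6 Code sketches

To make Lloyd's algorithm (§1.1) concrete, here is a minimal NumPy sketch. It is not the lecture's reference implementation; the function name `lloyd_kmeans`, the iteration cap, and the use of Practice 1 initialization are my own choices.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iters=100, rng=None):
    """Lloyd's k-means heuristic: alternate assignment and mean updates.

    X: (n, d) data matrix; k: number of centers.
    Returns centers (k, d) and labels (n,).
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Practice 1 initialization: k distinct data points chosen at random.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (an empty cluster keeps its old center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # cost can no longer decrease: a local optimum is reached
        centers = new_centers
    # Final assignment against the converged centers.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)
```

Both steps can only reduce $F$, which is why each iteration is guaranteed to decrease the cost.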
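The slides describe $k$-means++ (§1.2) only via an image. In the standard scheme (Arthur & Vassilvitskii, 2007), each new center is sampled with probability proportional to the squared distance to the nearest center chosen so far; the sketch below assumes that scheme.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding: spread the initial centers apart probabilistically."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min(
            ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2),
            axis=1,
        )
        # Sample the next center with probability proportional to d^2,
        # so far-away regions are likely to receive a center.
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)
```

The returned centers can replace the random (Practice 1) initialization inside `lloyd_kmeans` above.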
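The linkage dissimilarities in §1.5 are all available in SciPy's agglomerative clustering. A usage sketch on made-up two-blob data (the dataset and the cut into two flat clusters are illustrative only):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# method= maps onto the dissimilarities in §1.5:
#   'single'   -> min pairwise distance
#   'complete' -> max pairwise distance
#   'average'  -> average pairwise distance
#   'centroid' -> distance between cluster centers
#   'ward'     -> Ward's increase-in-k-means-cost criterion
Z = linkage(X, method='ward')

# Cut the dendrogram into two flat clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```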
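Ward's formula can also be verified numerically: taking the $k$-means cost of a cluster to be the sum of squared distances to its mean, the cost increase from merging $C$ and $C'$ equals the §1.5 formula exactly. A quick sanity check on random data:

```python
import numpy as np

def sq_cost(C):
    """Sum of squared distances of the points in C to their mean."""
    return ((C - C.mean(axis=0)) ** 2).sum()

rng = np.random.default_rng(1)
C, Cp = rng.normal(0, 1, (8, 3)), rng.normal(2, 1, (5, 3))

# Increase in cost occasioned by merging the two clusters...
increase = sq_cost(np.vstack([C, Cp])) - sq_cost(C) - sq_cost(Cp)
# ...equals Ward's dissimilarity between them.
ward = (len(C) * len(Cp)) / (len(C) + len(Cp)) * \
       np.linalg.norm(C.mean(axis=0) - Cp.mean(axis=0)) ** 2
assert np.isclose(increase, ward)
```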