# Use Cases of SVD

## 1 Latent Semantic Indexing (LSI) for Topic Modeling (Blei 2012)

![[Pasted image 20230503113334.png|700]]

Given a large corpus of $n$ documents:
- Fix a *vocabulary*, say of $V$ words.
- *Bag-of-words* representation for documents: each document becomes a vector of length $V$, with one coordinate per word.
- The corpus is an $n \times V$ matrix, one row per document.

![[Pasted image 20230503113504.png|200]]

*Goal*: Represent the matrix $M$ concisely.
- Use SVD to get a rank-$k$ approximation to $M$ (for small $k$).
- Approximate each document as a linear combination of topics.
- $\Psi$ is the topic matrix: $\text{doc1}\approx\sum\limits_{j=1}^k \theta_{1j}\psi_j=\theta_1\Psi$

![[Pasted image 20230503114914.png|600]]

Think of this as a topic model with $k$ topics.
- $\Psi_j$ is a vector of length $V$ describing topic $j$: coefficient $\Psi_{jw}$ is large if word $w$ appears often in that topic.
- Each document is a combination of topics: $\theta_{ij}=$ weight of topic $j$ in doc $i$.
- Document $i$ is originally represented by the $i$-th row of $M$, a vector in $\mathbb{R}^V$.
- Can instead use $\theta_i \in \mathbb{R}^k$, a more concise "semantic" representation.

## 2 Non-negative Matrix Factorization

*Goal*: Again, approximate a matrix $M\in \mathbb{R}^{p\times q}$ by:
$$\hat{M}=\underset{p\times k}{\underbrace{ U }}\;\underset{k\times q}{\underbrace{ W }}$$
- Such that $U,W$ have non-negative entries
- And $\sum_{i,j}(M_{ij}-\hat{M}_{ij})^2$ is minimized
- This problem is *intractable* in general

## 3 Collaborative Filtering (Koren et al. 2009)

*Recommender systems*: matching customers with products.
- Given: data on prior purchases/interests of users
- Recommend: further products of interest
- Prototypical example: Netflix.

***Collaborative filtering*** (a successful approach)
- Model dependencies between different products, and between different users.
- Can give reasonable recommendations to a relatively new user.
- Two strategies for collaborative filtering:
	- 1) *Neighborhood methods*
	- 2) *Latent factor methods*: embed users and movies in a low-dimensional space

### Latent factor methods

Generate movie and user embeddings using SVD. User ratings are assembled in a large matrix $M$:

![[Pasted image 20230503125239.png|200]]

- Not rated $=0$; otherwise scores are 1–5. For $n$ users and $p$ movies, this has size $n \times p$.
- Most of the entries are unavailable, and we'd like to predict these.
- Idea: Get the best low-rank approximation of $M$, and use it to fill in the missing entries.

*Method*: Use the best rank-$k$ approximation $M\approx UW^\top$.

![[Pasted image 20230503125521.png|500]]

- User $i$'s rating of movie $j$ is approximated as $M_{ij} \approx u_i \cdot w_j$
- $U$ is the user embedding matrix, $W^\top$ is the movie embedding matrix.

### Weighted SVD

Given a $p\times q$ matrix $M$, a target rank $k$, and weights $\set{w_{ij}: \;1\leq i \leq p,\;1\leq j\leq q}$:
- Find a rank-$k$ approximation $\hat{M}$ such that $\sum_{ij}w_{ij}(M_{ij}-\hat{M}_{ij})^2$ is minimized.
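As a minimal sketch of the latent factor method, the best rank-$k$ approximation $M\approx UW^\top$ can be computed with a plain (unweighted) truncated SVD via numpy. The toy ratings matrix, the choice $k=2$, and the variable names below are illustrative assumptions, not part of any particular dataset:

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = movies; 0 marks "not rated".
M = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

k = 2  # target rank

# Full SVD, then keep only the top-k singular triples.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

# User/movie embeddings: split each singular value between the two factors.
user_emb = U[:, :k] * np.sqrt(s[:k])        # n x k, one row per user
movie_emb = Vt[:k, :].T * np.sqrt(s[:k])    # p x k, one row per movie

# Predicted rating of user i for movie j is the dot product u_i . w_j,
# which reproduces the rank-k approximation entrywise.
pred = user_emb @ movie_emb.T
assert np.allclose(pred, M_hat)
```

Note that this unweighted SVD treats the 0 entries as actual ratings of zero; the weighted SVD formulation above fixes this by setting $w_{ij}=1$ on observed entries and $w_{ij}=0$ on missing ones, so that unobserved ratings do not contribute to the objective.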