# Use Cases of SVD
## 1 Latent Semantic Indexing (LSI) for Topic Modeling (Blei 2012)
![[Pasted image 20230503113334.png|700]]
Given a large corpus of $n$ documents:
- Fix a *vocabulary*, say of $V$ words.
- *Bag-of-words* representation for documents: each document becomes a vector of length $V$, with one coordinate per word.
- The corpus is an $n \times V$ matrix, one row per document.
![[Pasted image 20230503113504.png|200]]
*Goal*: Represent the matrix $M$ concisely. Use SVD to get a rank-$k$ approximation to $M$ (for small $k$):
- Approximate each document as a linear combination of topics
- $\Psi$ is the $k\times V$ topic matrix, with rows $\psi_1,\dots,\psi_k$. $\text{doc}_1\approx\sum\limits_{j=1}^k \theta_{1j}\psi_j=\theta_1\Psi$
![[Pasted image 20230503114914.png|600]]
Think of this as a topic model with $k$ topics.
- $\psi_j$ is a vector of length $V$ describing topic $j$: coefficient $\psi_{jw}$ is large if word $w$ appears often in that topic.
- Each document is a combination of topics: $\theta_{i j}=$ weight of topic $j$ in doc $i$.
- Document $i$ originally represented by $i$-th row of $M$, a vector in $\mathbb{R}^V$.
- Can instead use $\theta_i \in \mathbb{R}^k$, a more concise "semantic" representation.
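A minimal numpy sketch of this pipeline, assuming a tiny illustrative count matrix and $k=2$ (the corpus, vocabulary size, and $k$ are made up for the example):
```python
import numpy as np

# Toy corpus: n = 4 documents over a V = 5 word vocabulary (bag-of-words counts).
M = np.array([[2, 1, 0, 0, 1],
              [3, 0, 1, 0, 0],
              [0, 0, 2, 3, 1],
              [0, 1, 1, 2, 0]], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2                        # number of topics
Psi   = Vt[:k]               # k x V topic matrix: row j is topic psi_j
Theta = U[:, :k] * s[:k]     # n x k topic weights: row i is theta_i
M_hat = Theta @ Psi          # best rank-k approximation of M

print(Theta[0])              # concise "semantic" representation of doc 1
```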
## 2 Non-negative Matrix Factorization (NMF)
*Goal*: Again, approximate a matrix $M\in \mathbb{R}_{}^{p\times q}$ by: $\hat{M}=\underset{p\times k}{\underbrace{ U }}\underset{k\times q}{\underbrace{ W }}$
- Such that $U,W$ have non-negative entries
- And the squared error $\sum_{i,j}(M_{ij}-\hat{M}_{ij})^2$ is minimized
- Unlike SVD, this problem is *intractable* (NP-hard in general); in practice it is attacked with heuristics such as Lee–Seung multiplicative updates, sketched below
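A minimal sketch of the Lee–Seung multiplicative updates for the squared-error objective (the random data, rank, and iteration count are illustrative assumptions):
```python
import numpy as np

rng = np.random.default_rng(0)
p, q, k = 20, 30, 5
M = rng.random((p, q))        # non-negative data matrix (made up here)
U = rng.random((p, k))        # non-negative random initializations
W = rng.random((k, q))
eps = 1e-10                   # guard against division by zero

for _ in range(500):
    # Multiplicative updates keep U, W non-negative by construction.
    W *= (U.T @ M) / (U.T @ U @ W + eps)
    U *= (M @ W.T) / (U @ W @ W.T + eps)

print(np.linalg.norm(M - U @ W))   # Frobenius reconstruction error
```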
## 3 Collaborative Filtering (Koren et al. 2009)
*Recommender systems*: matching customers with products.
- Given: data on prior purchases/interests of users
- Recommend: further products of interest
- Prototypical example: Netflix.
***Collaborative filtering*** (a successful approach)
- Model dependencies between different products, and between different users.
- Can give reasonable recommendations to a relatively new user.
- Two strategies for collaborative filtering:
- 1) *Neighborhood methods*
- 2) *Latent factor methods*: Embed users and movies in low-dim space
### Latent factor methods
Generate movie and user embeddings using SVD. User ratings are assembled into a large matrix $M$:
![[Pasted image 20230503125239.png|200]]
- Not rated $=0$, otherwise scores 1-5. For $n$ users and $p$ movies, this has size $n \times p$.
- Most of the entries are unavailable, and we'd like to predict these.
- Idea: Get best low-rank approximation of $M$, and use to fill in the missing entries.
*Method*: Use the best rank-$k$ approximation $M\approx UW^\top$
![[Pasted image 20230503125521.png|500]]
- User $i$'s rating of movie $j$ is approximated as $M_{ij} \approx u_i \cdot w_j$, where $u_i$ is the $i$-th row of $U$ and $w_j$ the $j$-th row of $W$.
- $U$ is the user embedding matrix and $W$ is the movie embedding matrix (see the sketch below).
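A minimal numpy sketch under the naive "unrated $=0$" convention above (the tiny ratings matrix and $k=2$ are illustrative; splitting the singular values evenly between the two embedding matrices is one common choice):
```python
import numpy as np

# Toy ratings: n = 4 users x p = 4 movies; 0 means "not rated".
M = np.array([[5, 4, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

U_full, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2
U = U_full[:, :k] * np.sqrt(s[:k])   # n x k user embeddings (rows u_i)
W = Vt[:k].T * np.sqrt(s[:k])        # p x k movie embeddings (rows w_j)

M_hat = U @ W.T                      # rank-k approximation; fills in the zeros
print(M_hat[1, 1])                   # predicted rating of user 2 for movie 2 (was unrated)
```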
### Weighted SVD
Given a $p\times q$ matrix $M$, a target rank $k$, and weights $\{w_{ij} :\; 1\leq i \leq p,\; 1\leq j\leq q\}$:
- Find a rank-$k$ approximation $\hat{M}$ such that $\sum_{ij}w_{ij}(M_{ij}-\hat{M}_{ij})^2$ is minimized.
- Setting $w_{ij}=0$ on missing entries lets the fit ignore them, whereas plain SVD treats them as true zeros.
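Unlike plain SVD, the weighted problem has no closed-form solution; a common heuristic is alternating least squares. A minimal sketch (the function name, ridge term, and iteration count are assumptions for illustration):
```python
import numpy as np

def weighted_low_rank(M, Wgt, k, iters=50, seed=0):
    """Alternating weighted least-squares solves for U (p x k) and V (q x k)
    to locally minimize sum_ij Wgt_ij * (M_ij - (U @ V.T)_ij)**2."""
    rng = np.random.default_rng(seed)
    p, q = M.shape
    U = rng.standard_normal((p, k))
    V = rng.standard_normal((q, k))
    ridge = 1e-8 * np.eye(k)              # small regularizer for numerical stability
    for _ in range(iters):
        for i in range(p):                # row i: weighted least squares for U[i]
            D = np.diag(Wgt[i])
            U[i] = np.linalg.solve(V.T @ D @ V + ridge, V.T @ D @ M[i])
        for j in range(q):                # column j: weighted least squares for V[j]
            D = np.diag(Wgt[:, j])
            V[j] = np.linalg.solve(U.T @ D @ U + ridge, U.T @ D @ M[:, j])
    return U, V

# With Wgt = 1 on observed entries and 0 on missing ones, this fits only the
# observed ratings, and U @ V.T predicts the rest.
```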