# High-dimensional Distributions [[lec14_multi_gausssian.pdf]]

## 1 Multi-dimensional Gaussian

A ***multi-dimensional Gaussian*** $N(\mu, \Sigma)$ in $\mathbb{R}^d$ generates points $X=\left(X_1, X_2, \ldots, X_d\right)\in \mathbb{R}^{d}$, with density
$\begin{align*} p(x)=\frac{1}{(2 \pi)^{d / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right) \end{align*}$
- *mean*: $\mu \in \mathbb{R}^d$, the vector of coordinate-wise means $\begin{align*} \mu_1=\mathbb{E} X_1,\quad \mu_2=\mathbb{E} X_2,\quad \cdots, \quad\mu_d=\mathbb{E} X_d \end{align*}$
- *covariance matrix*: $\Sigma\in \mathbb{S}_{+} \subset \mathbb{R}^{d \times d}$ (symmetric positive semidefinite), containing all pairwise covariances $\begin{align*} \text{cov}(X_i,X_j)&= \Sigma_{ij}=\Sigma_{ji}= \mathbb{E}[X_iX_j]-\mu_i \mu_j, \quad i\neq j\\ \text{var}(X_i)&= \Sigma_{ii} \end{align*}$

### 1.1 Special Gaussian to general Gaussian

***Spherical Gaussian***: the $X_i$ are *independent* and all have the *same variance* $\sigma^2$, so
$\begin{align*} \Sigma=\sigma^2 I_d=\operatorname{diag}\left(\sigma^2, \sigma^2, \ldots, \sigma^2\right) \quad \text{(diagonal elements } \sigma^2 \text{, rest zero)} \end{align*}$
Each $X_i$ is an independent univariate Gaussian $N\left(\mu_i, \sigma^2\right)$:
$\begin{align*} p(x)=\frac{1}{(2 \pi)^{d / 2} \sigma^d} \exp \left(-\frac{\|x-\mu\|^2}{2 \sigma^2}\right)=\prod_{i=1}^d\left(\frac{1}{\sigma \sqrt{2 \pi}} e^{-\left(x_i-\mu_i\right)^2 / 2 \sigma^2}\right) \end{align*}$
![[Pasted image 20230523144039.png|300]]

***Diagonal Gaussian***: the $X_i$ are independent, with variances $\sigma_i^2$.
Thus
$\begin{align*} \Sigma=\operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_d^2\right) \quad \text{(off-diagonal elements zero)} \end{align*}$
Each $X_i$ is an independent univariate Gaussian $N\left(\mu_i, \sigma_i^2\right)$:
$\begin{align*} p(x)=\frac{1}{(2 \pi)^{d / 2} \sigma_1 \cdots \sigma_d} \exp \left(-\sum_{i=1}^d \frac{\left(x_i-\mu_i\right)^2}{2 \sigma_i^2}\right) \end{align*}$
![[Pasted image 20230523144236.png|300]]

***General Gaussian***
$\begin{align*} p(x)=\frac{1}{(2 \pi)^{d / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right) \end{align*}$
![[8_multi_gaussian 2023-05-23 14.43.10.excalidraw.svg|400]]
- Contours of equal density are ellipsoids, the level sets of the *Mahalanobis distance*: $\{x: \;(x-\mu)^\top \Sigma^{-1}(x-\mu)=c\}$

### 1.2 Properties of Gaussian

**Closure under linear transformation.**

*Lemma*. If $X \sim N(\mu, \Sigma)$ in $\mathbb{R}^d$ and $A$ is a $k \times d$ matrix, then $A X$ is a Gaussian in $\mathbb{R}^k$.
- The mean and covariance of $A X$ are $\begin{align*} \mathbb{E}[AX]&= A\mathbb{E}[X]=A \mu\\ \text{cov}(AX)&= \mathbb{E}\left[(A X)(A X)^{\top}\right]-(A \mu)(A \mu)^{\top} \\ &= \mathbb{E}\left[A X X^{\top} A^{\top}\right]-A \mu \mu^{\top} A^{\top}\\ &= A\Sigma A^\top\\ \therefore AX&\sim \boxed{N(A\mu, A\Sigma A^\top)} \end{align*}$

**Characteristic function**. The *characteristic function* of a random vector $Z \in \mathbb{R}^d$ is the function $\phi: \mathbb{R}^d \rightarrow \mathbb{C}$ (this always exists)
$\begin{align*} \phi(t)=\mathbb{E} e^{i t \cdot Z}=\mathbb{E} [\cos (t \cdot Z)]+i \mathbb{E}[\sin (t \cdot Z)] \end{align*}$
- *Fourier inversion*: two random vectors have the same distribution iff they have the same characteristic function.
- Thus two distributions on $\mathbb{R}^d$ are equal iff all their one-dimensional projections are equal.

*Lemma*. A quick calculation shows that $N(\mu, \Sigma)$ has characteristic function
$\begin{align*} \phi(t)=\exp \left(i t \cdot \mu-\frac{1}{2} t^T \Sigma t\right). \end{align*}$
*Proof*. For fixed $t \in \mathbb{R}^d$, the projection $t \cdot X$ is a univariate Gaussian $N(t \cdot \mu,\; t^T \Sigma t)$, and a direct integral shows that $Y \sim N(m, s^2)$ has $\mathbb{E} e^{iY}=e^{i m-s^2/2}$. Hence $\phi(t)=\mathbb{E} e^{i(t \cdot X)}=\exp \left(i t \cdot \mu-\frac{1}{2} t^T \Sigma t\right)$. $\blacksquare$
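The closure lemma above can be checked by simulation. The following is a minimal NumPy sketch (the mean, covariance, and linear map below are arbitrary illustrative choices): the empirical moments of $AX$ should match $A\mu$ and $A\Sigma A^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary mean, PSD covariance, and a 2x3 linear map A.
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])

# Draw many samples of X ~ N(mu, Sigma); each row of Y is A x.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T

# Empirical mean and covariance of AX vs. the lemma's formulas.
print(np.abs(Y.mean(axis=0) - A @ mu).max())        # small sampling error
print(np.abs(np.cov(Y.T) - A @ Sigma @ A.T).max())  # small sampling error
```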
*Corollary*. Any linear transformation of a Gaussian is Gaussian: $\phi_{AX}(t)=\mathbb{E}e^{i t \cdot AX}=\phi_X(A^\top t)=\exp \left(i t \cdot A\mu-\frac{1}{2} t^T (A\Sigma A^\top) t\right)$, which is the characteristic function of $N(A\mu, A\Sigma A^\top)$.

## 2 Motivation

### 2.1 Modeling network of sensors using conditional multi-dim Gaussian

Large structures (bridges, skyscrapers) contain arrays of embedded sensors to warn of excessive strain or small cracks. Say there are $d$ sensors, each with a real-valued reading.
1. Model the distribution of sensor readings under normal conditions (to tell when something goes wrong).
2. Accommodate a moderate number of sensor failures.

*Goal*: exploit correlations between sensors.
- The *mean* and *covariance* are estimated from $N$ readings $x^{(1)}, \ldots, x^{(N)}$: $\begin{align*} \mu=\underset{\text{empirical mean}}{\underbrace{ \frac{1}{N}\sum\limits_{k=1}^Nx^{(k)}}},\quad \Sigma_{ij}=\underset{\text{empirical covariance}}{\underbrace{\frac{1}{N-1}\sum_{k=1}^N(x^{(k)}_i-\mu_i)(x^{(k)}_j-\mu_j)}} \end{align*}$
- The marginal distribution of sensor $X_i$ is $N\left(\mu_i, \Sigma_{i i}\right)$
- The joint distribution of a set of sensors $S\subseteq[d]$ is $N(\mu_S,\Sigma_{SS})$

![[8_multi_gaussian 2023-05-23 16.07.50.excalidraw.svg|200]]

**Distribution under conditioning**: suppose sensors $S$ fail. What is the conditional distribution of $X_S$ given readings $x_T$ of the remaining sensors $T=[d]\setminus S$?

*Lemma*. The conditional distribution $X_S \mid X_T=x_T$ is Gaussian, with
- *Mean*: $\boxed{\mu_S+\Sigma_{S T} \Sigma_{T T}^{-1}\left(x_T-\mu_T\right)}$
- *Covariance*: $\boxed{\Sigma_{S S}-\underset{\text{PSD}}{\underbrace{ \Sigma_{S T} \Sigma_{T T}^{-1} \Sigma_{T S}}}}$. Note this does not depend on $x_T$.

![[8_multi_gaussian 2023-05-23 16.33.17.excalidraw.svg|300]]

### 2.2 Modeling county pollution levels using Gaussian process

*Goal*: generate a 2D pollution map. At any given time, we have readings $y \in \mathbb{R}$ at a few locations $x \in[0,1]^2$. Use these to create a full pollution map for the county.

![[Pasted image 20230523171008.png|500]]

**Given readings $\left(x_1, y_1\right), \ldots,\left(x_n, y_n\right)$, infer $f:[0,1]^2 \rightarrow \mathbb{R}$**.
- Each location adds one dimension to the multi-dimensional Gaussian.
- We can therefore have an infinite-dimensional Gaussian if we want to estimate readings for every location on the map.

## 3 Gaussian Processes (infinite-dim Gaussian)

Let $\mathcal{X}$ be an input space, e.g. $[0,1]^2$. A *Gaussian in infinite dimension* is a distribution over functions $f: \mathcal{X} \rightarrow \mathbb{R}$.

A ***Gaussian process (GP)*** on $\mathcal{X}$ is a collection of random variables indexed by $\mathcal{X}$ such that any finite subset of them has a Gaussian distribution.
- Pick any $x_1, \ldots, x_s \in \mathcal{X}$. If $f$ is sampled from a GP then $\left(f\left(x_1\right), \ldots, f\left(x_s\right)\right)$ has a Gaussian distribution.
- A Gaussian process is specified by
    - *Mean function* $m: \mathcal{X} \rightarrow \mathbb{R}$
    - *Covariance function* $k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$
- If $f \sim GP(m, k)$ then for any finite subset $S \subset \mathcal{X}$, we have $f_S \sim N\left(m_S, k_{S S}\right)$, where $f_S$ is shorthand for $(f(x): x \in S)$: $\begin{align*} \begin{bmatrix}f(x_1)\\\vdots\\f(x_s)\end{bmatrix}\sim N\left(\begin{bmatrix}m(x_1)\\\vdots\\m(x_s)\end{bmatrix},\begin{bmatrix}k(x_1,x_1)&k(x_1,x_2)&\cdots&k(x_1,x_s)\\\vdots&\vdots&\ddots&\vdots\\k(x_s,x_1)&\cdots&\cdots &k(x_s,x_s)\end{bmatrix}\right) \end{align*}$

### 3.1 Examples

**Estimating a distribution over functions on the line**.
$\begin{array}{c|c|c} \mathcal{X}=\mathbb{R} & m(x) \equiv 0 & k\left(x, x^{\prime}\right)=\exp \left(-\left(x-x^{\prime}\right)^2\right) \end{array}$
- Pick a few points in $\mathcal{X}$, e.g.
$\{1,2,4\}$; the GP restricted to these points is then:
$\begin{align*} \left[\begin{array}{l} f(1) \\ f(2) \\ f(4) \end{array}\right] \sim N\left(\left[\begin{array}{l} 0 \\ 0 \\ 0 \end{array}\right],\left[\begin{array}{ccc} 1 & e^{-1} & e^{-9} \\ e^{-1} & 1 & e^{-4} \\ e^{-9} & e^{-4} & 1 \end{array}\right]\right) \end{align*}$

![[Pasted image 20230523172525.png|250]]

**Sampling a function from a GP**.
$\begin{array}{c|c|c} \mathcal{X}=\mathbb{R} & m(x) \equiv 0 & k\left(x, x^{\prime}\right)=\exp \left(-\left(x-x^{\prime}\right)^2\right) \end{array}$
For $t=1,2,3, \ldots$:
- Pick some $x_t \in \mathcal{X}$
- The joint distribution $\left(f\left(x_1\right), \ldots, f\left(x_t\right)\right)$ is Gaussian, and thus so is the conditional distribution $f\left(x_t\right) \mid f\left(x_1\right), \ldots, f\left(x_{t-1}\right)$
- Sample $f\left(x_t\right)$ from this conditional distribution

*Conditioning formula* for $S \subset \mathcal{X}$ (recall the [[#2.1 Modeling network of sensors using conditional multi-dim Gaussian|conditional Gaussian]]):
$\begin{align*} f(x) \mid f_S \sim N\left(m(x)+k(x, S) k(S, S)^{-1} (f_S - m_S),\;\; k(x, x)-k(x, S) k(S, S)^{-1} k(S, x)\right) . \end{align*}$

![[8_multi_gaussian 2023-05-23 17.48.04.excalidraw.svg|700]]
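The conditioning formula is easy to implement directly. Below is a minimal NumPy sketch for the zero-mean GP with kernel $k(x,x')=e^{-(x-x')^2}$ from the examples above; the observed values `f_S` are hypothetical, chosen only for illustration:

```python
import numpy as np

def k(x, xp):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2)."""
    return np.exp(-np.subtract.outer(np.atleast_1d(x), np.atleast_1d(xp)) ** 2)

def gp_posterior(x, S, f_S):
    """Mean and variance of f(x) | f_S for a zero-mean GP with kernel k."""
    K_inv = np.linalg.inv(k(S, S))
    k_xS = k(x, S)                               # 1 x |S| cross-covariances
    mean = (k_xS @ K_inv @ f_S)[0]
    var = 1.0 - (k_xS @ K_inv @ k_xS.T)[0, 0]    # k(x, x) = 1 for this kernel
    return mean, var

S = np.array([1.0, 2.0, 4.0])     # inputs from the {1, 2, 4} example above
f_S = np.array([0.5, -0.3, 1.0])  # hypothetical observed function values

K_SS = k(S, S)                    # matches the 3x3 matrix above: K_SS[0, 1] = e^{-1}, etc.
m, v = gp_posterior(2.0, S, f_S)  # query at an already-observed input
# the posterior collapses onto the observation: m ≈ f_S[1], v ≈ 0
```

Sequentially sampling a function, as in the loop above, amounts to calling `gp_posterior` at each new $x_t$ with all previously sampled values and drawing $f(x_t)$ from the resulting $N(\text{mean}, \text{var})$.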