# High-dimensional Distributions
[[lec14_multi_gausssian.pdf]]
## 1 Multi-dimensional Gaussian
A ***multi-dimensional Gaussian*** $N(\mu, \Sigma)$ in $\mathbb{R}^d$ generates points $X=\left(X_1, X_2, \ldots, X_d\right)\in \mathbb{R}^{d}$ with density
$\begin{align*}
p(x)=\frac{1}{(2 \pi)^{d / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right)
\end{align*}$
- *mean*: $\mu \in \mathbb{R}^d$, a vector of coordinate-wise means
$\begin{align*}
\mu_1=\mathbb{E} X_1,\quad \mu_2=\mathbb{E} X_2,\quad \cdots, \quad\mu_d=\mathbb{E} X_d
\end{align*}$
- *covariance matrix*: $\Sigma\in \mathbb{S}_{+} \subset \mathbb{R}^{d \times d}$ (symmetric positive semidefinite), contains all pairwise covariances
$\begin{align*}
\text{cov}(X_i,X_j)&= \Sigma_{ij}=\Sigma_{ji}, \;i\neq j\\
&= \mathbb{E}[X_iX_j]-\mu_i \mu_j\\
\text{var}(X_i)&= \Sigma_{ii}=\mathbb{E}[X_i^2]-\mu_i^2
\end{align*}$
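As a sanity check (not from the lecture), the density formula above can be evaluated directly and compared against `scipy.stats.multivariate_normal`; the concrete $\mu$ and $\Sigma$ below are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary 2-d example: mean and a valid (symmetric PSD) covariance matrix.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

x = np.array([0.5, -1.0])

# Density from the explicit formula.
d = len(mu)
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)          # (x-mu)^T Sigma^{-1} (x-mu)
p = np.exp(-0.5 * quad) / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

# Should agree with scipy's implementation.
print(p, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```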
### 1.1 Special Gaussian to general Gaussian
***Spherical Gaussian***: The $X_i$ are *independent* and all have the *same variance* $\sigma^2$.
$
\Sigma=\sigma^2 I_d=\operatorname{diag}\left(\sigma^2, \sigma^2, \ldots, \sigma^2\right) \quad \text { (diagonal elements } \sigma^2 \text {, rest zero) }
$
Each $X_i$ is an independent univariate Gaussian $N\left(\mu_i, \sigma^2\right)$ :
$\begin{align*}
p(x)=\frac{1}{(2 \pi)^{d / 2} \sigma^d} \exp \left(-\frac{\|x-\mu\|^2}{2 \sigma^2}\right)=\prod_{i=1}^d\left(\frac{1}{\sigma \sqrt{2 \pi}} e^{-\left(x_i-\mu_i\right)^2 / 2 \sigma^2}\right)
\end{align*}$
![[Pasted image 20230523144039.png|300]]
***Diagonal Gaussian***: The $X_i$ are independent, with variances $\sigma_i^2$. Thus
$\begin{align*}
\Sigma=\operatorname{diag}\left(\sigma_1^2, \ldots, \sigma_d^2\right) \quad \text { (off-diagonal elements zero) }
\end{align*}$
Each $X_i$ is an independent univariate Gaussian $N\left(\mu_i, \sigma_i^2\right)$ :
$
p(x)=\frac{1}{(2 \pi)^{d / 2} \sigma_1 \cdots \sigma_d} \exp \left(-\sum_{i=1}^d \frac{\left(x_i-\mu_i\right)^2}{2 \sigma_i^2}\right)
$
![[Pasted image 20230523144236.png|300]]
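Since a diagonal Gaussian has independent coordinates, it can be sampled one coordinate at a time; the spherical case is recovered by setting every $\sigma_i$ to the same $\sigma$. A minimal numpy sketch with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.0, 1.0, -1.0])          # coordinate-wise means (arbitrary)
sigma = np.array([1.0, 0.5, 2.0])        # coordinate-wise std devs sigma_i

# Independence lets us sample each coordinate separately:
# X_i ~ N(mu_i, sigma_i^2), stacked into one vector per row.
n = 100_000
X = mu + sigma * rng.standard_normal((n, 3))

print(X.mean(axis=0))                    # ~ mu
print(np.cov(X, rowvar=False).round(2))  # ~ diag(sigma^2), off-diagonals ~ 0
```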
***General Gaussian***
$\begin{align*}
p(x)=\frac{1}{(2 \pi)^{d / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right)
\end{align*}$
![[8_multi_gaussian 2023-05-23 14.43.10.excalidraw.svg|400]]
- Contours of equal density are *Mahalanobis distance* balls, i.e. the ellipsoids
$\{x: \;(x-\mu)^\top \Sigma^{-1}(x-\mu)=c\}$ (see the sketch below).
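To illustrate (a small sketch, not from the lecture): if $\Sigma = LL^\top$ is a Cholesky factorization, then points $\mu + Lu$ with $\|u\|=1$ all lie on the contour with $c=1$, so their Mahalanobis distances agree. The $\mu$, $\Sigma$ here are arbitrary.

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
L = np.linalg.cholesky(Sigma)            # Sigma = L L^T

def mahalanobis_sq(x):
    """Squared Mahalanobis distance (x-mu)^T Sigma^{-1} (x-mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

# Map unit vectors u through L: the images mu + L u all lie on the
# ellipse {x : (x-mu)^T Sigma^{-1} (x-mu) = 1}.
for theta in (0.0, 1.0, 2.5):
    u = np.array([np.cos(theta), np.sin(theta)])
    print(mahalanobis_sq(mu + L @ u))    # always 1.0 up to rounding
```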
### 1.2 Properties of Gaussian
**Closure under linear transformation.**
*Lemma*. If $X \sim N(\mu, \Sigma)$ in $\mathbb{R}^d$ and $A$ is a $k \times d$ matrix, then $A X$ is a Gaussian in $\mathbb{R}^k$.
- The mean and covariance of $A X$ are
$\begin{align*}
\mathbb{E}[AX]&= A\mathbb{E}[X]=A \mu\\
\text{cov}(AX)&= \mathbb{E}\left[(A X)(A X)^{\top}\right]-(A \mu)(A \mu)^{\top} \\
&= \mathbb{E}\left[A X X^{\top} A^{\top}\right]-A \mu \mu^{\top} A^{\top}\\
&= A\left(\mathbb{E}[X X^{\top}]-\mu \mu^{\top}\right)A^{\top}\\
&= A\Sigma A^\top\\
\therefore AX&\sim \boxed{N(A\mu, A\Sigma A^\top)}
\end{align*}$
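A quick Monte Carlo check of the lemma: sample $X \sim N(\mu, \Sigma)$, apply a matrix $A$, and compare the empirical mean and covariance of $AX$ with $A\mu$ and $A\Sigma A^\top$. All concrete numbers below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 1.0]])          # maps R^3 -> R^2

X = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are samples
Y = X @ A.T                                           # apply A to each sample

print(Y.mean(axis=0), A @ mu)                         # empirical mean vs A mu
print(np.cov(Y, rowvar=False).round(3))               # empirical covariance...
print((A @ Sigma @ A.T).round(3))                     # ...vs A Sigma A^T
```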
**Characteristic function**. The *characteristic function* of a random vector $Z \in \mathbb{R}^d$ is the function $\phi: \mathbb{R}^d \rightarrow \mathbb{C}$ (it always exists):
$\begin{align*}
\phi(t)=\mathbb{E} e^{i t \cdot Z}=\mathbb{E} [\cos (t \cdot Z)]+i \mathbb{E}[\sin (t \cdot Z)]
\end{align*}$
- *Fourier inversion*: two random vectors have the same distribution iff they have the same characteristic function.
- Thus two distributions on $\mathbb{R}^d$ are equal iff all their one-dimensional projections are equal (Cramér–Wold: $\phi(t)$ depends only on the distribution of the scalar $t \cdot Z$).
*Lemma*. A quick calculation shows that $N(\mu, \Sigma)$ has characteristic function
$
\phi(t)=\exp \left(i t \cdot \mu-\frac{1}{2} t^T \Sigma t\right) .
$
*Proof*. Any linear transformation of a Gaussian is Gaussian, so the scalar $t \cdot X \sim N(t \cdot \mu, \; t^\top \Sigma t)$. The characteristic function of a univariate $N(m, \sigma^2)$ evaluated at $1$ is $\exp(im - \sigma^2/2)$, which gives the formula above.
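The closed form can also be checked numerically (a sketch with arbitrary parameters): estimate $\mathbb{E}e^{it\cdot Z}$ from samples and compare with $\exp(it\cdot\mu-\frac12 t^\top\Sigma t)$.

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([0.5, -0.5])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
t = np.array([0.7, -0.3])

Z = rng.multivariate_normal(mu, Sigma, size=500_000)

# Monte Carlo estimate of E[exp(i t . Z)].
phi_mc = np.exp(1j * Z @ t).mean()

# Closed form for the Gaussian characteristic function.
phi_exact = np.exp(1j * (t @ mu) - 0.5 * t @ Sigma @ t)

print(phi_mc, phi_exact)   # should agree to a few decimal places
```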
## 2 Motivation
### 2.1 Modeling network of sensors using conditional multi-dim Gaussian
Large structures (bridges, skyscrapers) contain arrays of embedded sensors to warn of excessive strain or small cracks. Say there are $d$ sensors, each with a real-valued reading.
1. Model distribution of sensor readings under normal conditions. (to tell when something goes wrong)
2. Accommodate a moderate number of sensor failures.
*Goal*: Exploit correlations between sensors
- The *mean* and *covariance* are estimated from readings $x^{(1)}, \ldots, x^{(N)}$:
$\begin{align*}
\mu=\underset{\text{Empirical mean}}{\underbrace{ \frac{1}{N}\sum\limits_{k=1}^Nx^{(k)}}},\quad \Sigma_{ij}=\underset{\text{Empirical covariance}}{\underbrace{\frac{1}{N-1}\sum_{k=1}^N(x^{(k)}_i-\mu_i)(x^{(k)}_j-\mu_j)}}
\end{align*}$
- The distribution of sensor $X_i$ is $N\left(\mu_i, \Sigma_{i i}\right)$
- The joint distribution of a set of sensors $S\subseteq[d]$ is $N(\mu_S,\Sigma_{SS})$
![[8_multi_gaussian 2023-05-23 16.07.50.excalidraw.svg|200]]
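A minimal sketch of the estimation step, using simulated sensor readings in place of real data (the ground-truth $\mu$, $\Sigma$ here are made up):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate N readings from d sensors with some made-up correlations.
d, N = 4, 10_000
true_mu = np.array([20.0, 21.0, 19.5, 20.5])
true_Sigma = 0.5 * np.eye(d) + 0.3      # constant positive correlation

X = rng.multivariate_normal(true_mu, true_Sigma, size=N)   # N x d readings

mu_hat = X.mean(axis=0)                 # empirical mean
Sigma_hat = np.cov(X, rowvar=False)     # empirical covariance, 1/(N-1) normalization

print(mu_hat.round(2))
print(Sigma_hat.round(2))
```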
**Distribution under conditioning**: Suppose the sensors in $S$ fail. What is the conditional distribution of $X_S$ given readings $x_T$ from the remaining sensors $T=[d]\setminus S$?
*Lemma*. The conditional distribution $X_S \mid X_T=x_T$ is Gaussian, with
- *Mean*: $\boxed{\mu_S+\Sigma_{S T} \Sigma_{T T}^{-1}\left(x_T-\mu_T\right)}$
- *Covariance*: $\boxed{\Sigma_{S S}-\underset{\text{PSD}}{\underbrace{ \Sigma_{S T} \Sigma_{T T}^{-1} \Sigma_{T S}}}}$. Note this doesn't depend on $x_T$.
![[8_multi_gaussian 2023-05-23 16.33.17.excalidraw.svg|300]]
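A minimal numpy implementation of this conditioning lemma; the variable names `S`, `T` mirror the lemma, while the concrete $\mu$, $\Sigma$, $x_T$ are arbitrary.

```python
import numpy as np

def condition_gaussian(mu, Sigma, S, T, x_T):
    """Mean and covariance of X_S | X_T = x_T for X ~ N(mu, Sigma)."""
    K = Sigma[np.ix_(S, T)] @ np.linalg.inv(Sigma[np.ix_(T, T)])
    cond_mean = mu[S] + K @ (x_T - mu[T])
    cond_cov = Sigma[np.ix_(S, S)] - K @ Sigma[np.ix_(T, S)]
    return cond_mean, cond_cov

# Arbitrary 3-sensor example: sensor 0 failed, sensors 1 and 2 read x_T.
mu = np.array([0.0, 0.0, 0.0])
Sigma = np.array([[1.0, 0.8, 0.5],
                  [0.8, 1.0, 0.6],
                  [0.5, 0.6, 1.0]])
S, T = [0], [1, 2]
x_T = np.array([1.2, -0.3])

m, C = condition_gaussian(mu, Sigma, S, T, x_T)
print(m, C)   # conditional mean, and covariance (independent of x_T)
```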
### 2.2 Modeling county pollution levels using Gaussian process
*Goal*: generate a 2D pollution map. At any given time, we have readings $y \in \mathbb{R}$ at a few locations $x \in[0,1]^2$; use these to create a full pollution map for the county.
![[Pasted image 20230523171008.png|500]]
**Given readings $\left(x_1, y_1\right), \ldots,\left(x_n, y_n\right)$, infer $f:[0,1]^2 \rightarrow \mathbb{R}$**.
- Each location adds one dimension to the multivariate Gaussian
- We would therefore need an infinite-dimensional Gaussian to estimate readings at every location on the map.
## 3 Gaussian Processes (infinite-dim Gaussian)
Let $\mathcal{X}$ be an input space, e.g. $[0,1]^2$. A *Gaussian in infinite dimension* is a distribution over all functions $f: \mathcal{X} \rightarrow \mathbb{R}$.
A ***Gaussian process (GP)*** on $\mathcal{X}$ is a collection of random variables indexed by $\mathcal{X}$ such that any finite subset of them has a (jointly) Gaussian distribution.
- Pick any $x_1, \ldots, x_s \in \mathcal{X}$. If $f$ is sampled from a GP then $\left(f\left(x_1\right), \ldots, f\left(x_s\right)\right)$ has a Gaussian distribution.
- A Gaussian process is specified by
- *Mean function* $m: \mathcal{X} \rightarrow \mathbb{R}$
- *Covariance function* $k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$
- If $f \sim G P(m, k)$ then for any finite subset $S \subset \mathcal{X}$, we have $f_S \sim N\left(m_S, k_{S S}\right)$. Here $f_S$ is a shorthand for $(f(x): x \in S)$.
$\begin{align*}
\begin{bmatrix}f(x_1)\\\vdots\\f(x_s)\end{bmatrix}\sim N\left(\begin{bmatrix}m(x_1)\\\vdots\\m(x_s)\end{bmatrix},\begin{bmatrix}k(x_1,x_1)&k(x_1,x_2)&\cdots&k(x_1,x_s)\\\vdots&\vdots&\ddots&\vdots\\k(x_s,x_1)&k(x_s,x_2)&\cdots &k(x_s,x_s)\end{bmatrix}\right)
\end{align*}$
### 3.1 Examples
**Estimating a distribution over functions on the line**.
$\begin{array}{c|c|c}
\mathcal{X}=\mathbb{R}
& m(x) \equiv 0
& k\left(x, x^{\prime}\right)=\exp \left(-\left(x-x^{\prime}\right)^2\right)
\end{array}$
- Pick a few points in $\mathcal{X}$, e.g. $\{1,2,4\}$; the GP marginal is then:
$\begin{align*}
\left[\begin{array}{l}
f(1) \\
f(2) \\
f(4)
\end{array}\right] \sim N\left(\left[\begin{array}{l}
0 \\
0 \\
0
\end{array}\right],\left[\begin{array}{ccc}
1 & e^{-1} & e^{-9} \\
e^{-1} & 1 & e^{-4} \\
e^{-9} & e^{-4} & 1
\end{array}\right]\right)
\end{align*}$
![[Pasted image 20230523172525.png|250]]
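This marginal can be reproduced directly: build the kernel matrix for the points $\{1, 2, 4\}$ and draw one joint sample of $(f(1), f(2), f(4))$. A minimal sketch:

```python
import numpy as np

def k(x, xp):
    """Squared-exponential covariance k(x, x') = exp(-(x - x')^2)."""
    return np.exp(-(x - xp) ** 2)

pts = np.array([1.0, 2.0, 4.0])
K = k(pts[:, None], pts[None, :])   # 3x3 kernel matrix via broadcasting

print(K)  # [[1, e^-1, e^-9], [e^-1, 1, e^-4], [e^-9, e^-4, 1]]

rng = np.random.default_rng(4)
sample = rng.multivariate_normal(np.zeros(3), K)  # one draw of (f(1), f(2), f(4))
print(sample)
```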
**Sampling a function from a GP**.
$\begin{array}{c|c|c}
\mathcal{X}=\mathbb{R}
& m(x) \equiv 0
& k\left(x, x^{\prime}\right)=\exp \left(-\left(x-x^{\prime}\right)^2\right)
\end{array}$
For $t=1,2,3, \ldots$ :
- Pick some $x_t \in \mathcal{X}$
- The joint distribution $\left(f\left(x_1\right), \ldots, f\left(x_t\right)\right)$ is Gaussian. Thus so is the conditional distribution $f\left(x_t\right) \mid f\left(x_1\right), \ldots, f\left(x_{t-1}\right)$
- Sample $f\left(x_t\right)$ from this conditional distribution
*Conditioning formula* for $S \subset \mathcal{X}$ (recall [[#2.1 Modeling network of sensors using conditional multi-dim Gaussian|conditional Gaussian]])
$\begin{align*}
f(x) \mid f_S \sim N\left(m(x)+k(x, S) k(S, S)^{-1} f_S,\;\; k(x, x)-k(x, S) k(S, S)^{-1} k(S, x)\right) .
\end{align*}$
![[8_multi_gaussian 2023-05-23 17.48.04.excalidraw.svg|700]]
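A minimal sketch of this sequential sampling procedure on a 1-d grid, using the same zero mean and squared-exponential kernel as above; the jitter term is a standard numerical safeguard, not part of the formula.

```python
import numpy as np

rng = np.random.default_rng(5)

def k(x, xp):
    """Kernel matrix k(x_i, x'_j) = exp(-(x_i - x'_j)^2) for 1-d inputs."""
    return np.exp(-(np.asarray(x)[:, None] - np.asarray(xp)[None, :]) ** 2)

xs, fs = [], []                     # points picked so far and their sampled values
for x_t in np.linspace(0.0, 5.0, 50):
    if not xs:
        mean, var = 0.0, 1.0        # unconditional: f(x_t) ~ N(0, k(x_t, x_t)) = N(0, 1)
    else:
        K_SS = k(xs, xs) + 1e-9 * np.eye(len(xs))    # jitter for stability
        k_xS = k([x_t], xs)[0]
        alpha = np.linalg.solve(K_SS, k_xS)          # k(S,S)^{-1} k(S,x)
        mean = alpha @ np.array(fs)                  # m(x) + k(x,S) k(S,S)^{-1} f_S, m = 0
        var = 1.0 - alpha @ k_xS                     # k(x,x) - k(x,S) k(S,S)^{-1} k(S,x)
    fs.append(mean + np.sqrt(max(var, 0.0)) * rng.standard_normal())
    xs.append(x_t)

print(np.round(fs[:5], 3))          # first few sampled function values
```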