>[!info] MAP Estimator
>Principle in [[Bayesian Statistics - Inference and Posterior Approximation]] that yields **[[Statistic and Estimator|estimators]] of an unknown quantity** $\theta$ by **computing the mode (point of maximum density) of the $\textcolor{red}{\text{posterior}}$** $\textcolor{red}{p(\theta | \mathcal{D})} = \frac{p(\mathcal{D}|\theta)p(\theta)}{p(\mathcal{D})} \propto p(\mathcal{D}|\theta)p(\theta)$ given by [[Bayes Theorem]], using empirical data $\mathcal{D}$.
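- In contrast to [[Maximum Likelihood Estimator|Maximum Likelihood estimators]], MAP also incorporates prior knowledge in the form of the prior $p(\theta)$.
- ML equals MAP if the prior is uniform, since $p(\theta)$ is then constant and the posterior is proportional to the likelihood.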
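
The relation between ML and MAP can be checked on a small example. The sketch below assumes a Bernoulli likelihood with a Beta$(a, b)$ prior, a model chosen only for illustration and not used elsewhere in this note; the function names and the counts are hypothetical. Under that assumption the posterior is Beta$(k+a,\, n-k+b)$ and its mode has a closed form. With the uniform prior Beta$(1, 1)$ the MAP estimate coincides with the ML estimate, while an informative prior pulls the estimate towards the prior belief.

```python
def ml_bernoulli(k: int, n: int) -> float:
    """ML estimate of a Bernoulli parameter from k successes in n trials."""
    return k / n


def map_bernoulli(k: int, n: int, a: float, b: float) -> float:
    """Mode of the Beta(k + a, n - k + b) posterior.

    Assumes a Beta(a, b) prior; valid when both posterior parameters exceed 1.
    """
    return (k + a - 1) / (n + a + b - 2)


k, n = 7, 10  # made-up data: 7 successes in 10 trials

theta_ml = ml_bernoulli(k, n)
theta_map_uniform = map_bernoulli(k, n, a=1.0, b=1.0)   # Beta(1, 1) is the uniform prior
theta_map_informed = map_bernoulli(k, n, a=5.0, b=5.0)  # prior belief centered at 0.5
```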
---
#### General Procedure
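- **Gaussians:** Applying the log-trick (the logarithm is monotonic, so maximizing the log yields the same maximizer) to the right-hand side of the equation above, with a Gaussian likelihood of noise variance $\sigma^2$ and a zero-mean Gaussian prior with precision matrix $\mathbf{W}$, yields $\begin{align}J_{MAP} &= \log\left(\prod^{N}_{i=1}p(y_i|\mathbf{x}_i,\theta)\right)+\log(p(\theta))\\ &= -\frac{1}{2\sigma^2}(\mathbf{y}- \Phi \theta)^T(\mathbf{y}- \Phi \theta)-\frac{1}{2}\theta^T\mathbf{W}\theta + \text{const}.\end{align}$
- The first term is the negative prediction loss (squared error scaled by the noise variance).
- The second term is the regularization contributed by the prior, e.g. additional constraints on $\theta$.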
- Setting the gradient with respect to $\theta$ to zero, as in [[Linear Least Squares and Ridge Regression]], yields $\theta_{MAP} = \arg \max_\theta J_{MAP} = (\Phi^T\Phi+\sigma^2\mathbf{W})^{-1}\Phi^T\mathbf{y}.$
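
The closed-form solution above can be sketched numerically with NumPy. This is a minimal illustration under stated assumptions: the data are synthetic, the noise variance `sigma2` is taken as known, the prior is zero-mean Gaussian with precision `W`, and the helper name `map_linear_gaussian` is hypothetical. The linear system is solved directly instead of forming the inverse explicitly, which gives the same $\theta_{MAP}$ when the system matrix is invertible.

```python
import numpy as np


def map_linear_gaussian(Phi, y, sigma2, W):
    """Closed-form MAP weights: solve (Phi^T Phi + sigma^2 W) theta = Phi^T y."""
    A = Phi.T @ Phi + sigma2 * W
    return np.linalg.solve(A, Phi.T @ y)


rng = np.random.default_rng(0)
N, D = 50, 3
Phi = rng.normal(size=(N, D))            # synthetic feature matrix
theta_true = np.array([1.0, -2.0, 0.5])  # made-up ground truth
sigma2 = 0.25                            # assumed known noise variance
y = Phi @ theta_true + rng.normal(scale=np.sqrt(sigma2), size=N)

W = 2.0 * np.eye(D)                      # prior precision: p(theta) = N(0, W^{-1})
theta_map = map_linear_gaussian(Phi, y, sigma2, W)

# With W = 0 (flat prior) the estimate reduces to ordinary least squares, i.e. ML.
theta_ml = map_linear_gaussian(Phi, y, sigma2, np.zeros((D, D)))
```

With `W` proportional to the identity this is ridge regression, so `theta_map` is shrunk towards zero relative to `theta_ml`.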