See also: [[Cramer-Rao Lower Bound]], [[Density of a Probability Distribution]]
# Fisher Information
Motivation: Daniel Shy's paper on the [[Cramer-Rao Lower Bound]] for [[Compton Imaging]]; studying it required additional background.
## What is Fisher Information?
Fisher Information is a way to measure the amount of information that an observable random variable $X$ carries about an unknown parameter $\theta$ of a distribution that models $X$. Formally, it is the variance of the [[Score or Informant|score / informant]] (in the multiparameter case, the covariance matrix of the score) [^1].
$$
I(\theta) \equiv \text{Var}_\theta[z(X,\theta)] = -\mathbb{E}_\theta[z'(X,\theta)]
$$
where $z(X,\theta) = \frac{\partial}{\partial \theta} \log f(X|\theta)$ is the score, and $\text{Var}_\theta$ and $\mathbb{E}_\theta$ are the variance and expectation with respect to $X \sim f(x|\theta)$.
[^1]: https://encyclopediaofmath.org/wiki/Information_matrix
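As a quick sanity check (not from the sources above), the sketch below verifies the two expressions for $I(\theta)$ numerically for a Bernoulli($\theta$) model, where the analytic answer is $1/(\theta(1-\theta))$; the model choice and variable names are my assumptions for illustration.

```python
# Minimal numerical check that Var[z] = -E[z'] = I(theta) for a
# Bernoulli(theta) model (an assumed example, not from the paper).
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
x = rng.binomial(1, theta, size=1_000_000)  # draws of X ~ f(x|theta)

# Score z(X, theta) = d/dtheta log f(X|theta) for the Bernoulli pmf
z = x / theta - (1 - x) / (1 - theta)
# Its derivative z'(X, theta) with respect to theta
z_prime = -x / theta**2 - (1 - x) / (1 - theta) ** 2

print("Var[z]        :", z.var())          # ~4.76 for theta = 0.3
print("-E[z']        :", -z_prime.mean())  # matches Var[z]
print("1/(th(1-th))  :", 1 / (theta * (1 - theta)))  # analytic I(theta)
```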
Fisher Information first appeared in the asymptotic theory of [[MLEM for Compton Imaging|maximum likelihood estimation]], where its role was emphasized by Ronald Fisher (following initial results by [[Francis Ysidro Edgeworth]]).
## Fisher Information Matrix
A matrix whose entries are the covariances between the partial derivatives of the log-likelihood of a parametric model [^2]. In applications it is "a measure of the information the observed [$X$] has about the unknown [$\theta$]" [^3].
[^2]: https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture15.pdf
[^3]: [[@shyCramerRaoBound2022]]
$$
I(\theta)_{ij} = \text{Cov}_\theta \left[\frac{\partial}{\partial \theta_i} \log f(X|\theta), \frac{\partial}{\partial \theta_j} \log f(X|\theta) \right] = -\mathbb{E}_\theta \left[\frac{\partial^2}{\partial \theta_i \partial \theta_j} \log f(X|\theta) \right]
$$
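To make the matrix form concrete, here is a minimal Monte Carlo sketch, assuming a Gaussian model with $\theta = (\mu, \sigma)$ (my choice of example, not from the sources): the matrix is estimated as the covariance of the score components and compared against the known analytic result $\text{diag}(1/\sigma^2,\, 2/\sigma^2)$.

```python
# Sketch: Fisher Information Matrix as the covariance of the score vector
# for X ~ N(mu, sigma^2) with theta = (mu, sigma) (an assumed example).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# Components of the score: partial derivatives of log f(X|theta)
d_mu = (x - mu) / sigma**2
d_sigma = (x - mu) ** 2 / sigma**3 - 1 / sigma
score = np.stack([d_mu, d_sigma])  # shape (2, N): rows are variables

# I(theta)_ij = Cov of the score components (Monte Carlo estimate)
print(np.cov(score))
# Analytic FIM for this parameterization: diag(1/sigma^2, 2/sigma^2)
print(np.diag([1 / sigma**2, 2 / sigma**2]))
```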