See also: [[Cramer-Rao Lower Bound]], [[Density of a Probability Distribution]]

# Fisher Information

Motivation: Daniel Shy's paper on the [[Cramer-Rao Lower Bound]] for [[Compton Imaging]]; it came up during study and needed additional background.

## What is Fisher Information?

Fisher Information measures the amount of information that an observable random variable $X$ carries about an unknown parameter $\theta$ of a distribution that models $X$. Formally, it is the variance of the [[Score or Informant|score / informant]] $z(X,\theta) = \frac{\partial}{\partial \theta} \log f(X|\theta)$; in the multiparameter case it is the covariance matrix of the score [^1].

$I(\theta) \equiv \text{Var}_\theta[z(X,\theta)] = -\mathbb{E}_\theta[z'(X,\theta)]$

where $\text{Var}_\theta$ and $\mathbb{E}_\theta$ are the variance and the expectation with respect to $X \sim f(x|\theta)$.

[^1]: https://encyclopediaofmath.org/wiki/Information_matrix

Fisher Information was first shown to be useful in the asymptotic theory of [[MLEM for Compton Imaging|maximum likelihood estimation]], emphasized by Ronald Fisher (building on earlier work by [[Francis Ysidro Edgeworth]]).

## Fisher Information Matrix

A matrix of covariances between the partial derivatives of the log-likelihood of a parametric model [^2]. In applications it is "a measure of the information the observed [$X$] has about the unknown [$\theta$]" [^3].

[^2]: https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture15.pdf
[^3]: [[@shyCramerRaoBound2022]]

$I(\theta)_{ij} = \text{Cov}_\theta \left[\frac{\partial}{\partial \theta_i} \log f(X|\theta), \frac{\partial}{\partial \theta_j} \log f(X|\theta) \right] = -\mathbb{E}_\theta \left[\frac{\partial^2}{\partial \theta_i \partial \theta_j} \log f(X|\theta) \right]$
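
As a sanity check on the scalar definition above (a standard textbook example, not from Shy's paper), take $X \sim \text{Bernoulli}(\theta)$ with $f(x|\theta) = \theta^x (1-\theta)^{1-x}$. The score and its derivative are

$z(x,\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta}, \qquad z'(x,\theta) = -\frac{x}{\theta^2} - \frac{1-x}{(1-\theta)^2}$

and, using $\mathbb{E}_\theta[X] = \theta$,

$I(\theta) = -\mathbb{E}_\theta[z'(X,\theta)] = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}$

Note that $I(\theta) \to \infty$ as $\theta \to 0$ or $\theta \to 1$: each observation carries more information about a near-deterministic coin.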
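The matrix definition can likewise be checked numerically. Below is a minimal Python sketch (my own, not from the cited sources) that Monte Carlo estimates $I(\theta)$ for a Gaussian with $\theta = (\mu, \sigma)$ by averaging outer products of the score, and compares the result against the known closed form $\text{diag}(1/\sigma^2, 2/\sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model: X ~ Normal(mu, sigma^2), parameters theta = (mu, sigma).
mu, sigma = 1.0, 2.0

def score(x, mu, sigma):
    """Gradient of log f(x | mu, sigma) with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma**2
    d_sigma = (x - mu) ** 2 / sigma**3 - 1.0 / sigma
    return np.stack([d_mu, d_sigma])  # shape (2, n)

# Monte Carlo estimate of I(theta) = Cov_theta[score] = E[score score^T]
# (the score has mean zero at the true theta, so no centering is needed).
x = rng.normal(mu, sigma, size=1_000_000)
s = score(x, mu, sigma)
I_hat = (s @ s.T) / x.size  # average outer product, shape (2, 2)

# Closed form for the Gaussian: diag(1/sigma^2, 2/sigma^2).
I_exact = np.diag([1 / sigma**2, 2 / sigma**2])
print(np.round(I_hat, 4))  # approximately [[0.25, 0], [0, 0.5]]
print(I_exact)
```

The agreement of the empirical outer-product average with the closed form illustrates the identity above: the covariance of the score equals the negative expected Hessian of the log-likelihood.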