## KL Divergence

### Definition

For distributions p and q on X (with p absolutely continuous w.r.t. q):

```math
D_{\mathrm{KL}}(p\,\|\,q) = \int_X p(x)\,\log\frac{p(x)}{q(x)}\,dx
```

### Properties

- Non-negativity (Gibbs' inequality): \(D_{\mathrm{KL}}(p\|q) \ge 0\), with equality iff \(p=q\) a.e.
- Asymmetry: in general \(D_{\mathrm{KL}}(p\|q) \neq D_{\mathrm{KL}}(q\|p)\), so KL divergence is not a metric
- Information projection: minimizing KL under constraints arises throughout variational optimization

### Roles in this repository

- [[variational_free_energy]]: complexity term \(D_{\mathrm{KL}}\big(q(s)\,\|\,p(s)\big)\)
- [[expected_free_energy]]: risk term \(D_{\mathrm{KL}}\big(q(o|\pi)\,\|\,p(o)\big)\); epistemic value as expected KL
- [[information_theory]]: connects to entropy and mutual information

### Implementation notes

- In discrete sums, normalize in log space with log-sum-exp to avoid overflow and underflow
- Guard zero probabilities with a small epsilon before taking logs (both points are sketched below)

### See also

- [[information_theory]] · [[variational_inference]] · [[variational_free_energy]] · [[expected_free_energy]]
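
### Implementation sketch

A minimal sketch of the two implementation notes above, assuming NumPy is available; the function names `kl_divergence` and `kl_from_logits` are illustrative, not part of this repository:

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Discrete D_KL(p || q) in nats, with epsilon-guarded logs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    # Clip before the log so zero entries cannot produce -inf or NaN;
    # terms with p(x) = 0 contribute 0 by the convention 0 log 0 = 0.
    log_ratio = np.log(np.clip(p, eps, None)) - np.log(np.clip(q, eps, None))
    return float(np.sum(np.where(p > 0, p * log_ratio, 0.0)))


def kl_from_logits(logits_p, logits_q):
    """Discrete D_KL(p || q) from unnormalized log-probabilities.

    Normalizing in log space via log-sum-exp avoids the overflow and
    underflow that exponentiating large logits directly would cause.
    """
    logits_p = np.asarray(logits_p, dtype=float)
    logits_q = np.asarray(logits_q, dtype=float)

    def log_softmax(z):
        z = z - z.max()                      # shift so the largest term is 0
        return z - np.log(np.exp(z).sum())  # subtract log-sum-exp to normalize

    log_p = log_softmax(logits_p)
    log_q = log_softmax(logits_q)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))
```

For example, `kl_divergence([0.5, 0.5], [0.9, 0.1])` is about 0.511 nats while `kl_divergence([0.9, 0.1], [0.5, 0.5])` is about 0.368 nats, illustrating the asymmetry property above.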