# Information Gain ## Information Structure ```mermaid graph TB IG[Information Gain] --> |Measures| KL[KL Divergence] IG --> |Quantifies| UR[Uncertainty Reduction] IG --> |Drives| EP[Epistemic Value] KL --> |Between| PB[Prior/Posterior] KL --> |Computes| DV[Divergence Value] UR --> |Through| OB[Observation] UR --> |Updates| BE[Beliefs] EP --> |Guides| AS[Action Selection] EP --> |Balances| EX[Exploration] classDef concept fill:#f9f,stroke:#333,stroke-width:2px classDef process fill:#bbf,stroke:#333,stroke-width:2px classDef outcome fill:#bfb,stroke:#333,stroke-width:2px class IG concept class KL,UR,EP process class PB,OB,AS,EX outcome ``` ## Belief Update Flow ```mermaid graph LR subgraph Prior Knowledge P[Prior P(s)] --> |Initial| H1[H(s)] H1 --> |Uncertainty| U1[Prior Uncertainty] end subgraph Observation O[Observation o] --> |Evidence| L[Likelihood P(o|s)] L --> |Bayes| Po[Posterior P(s|o)] end subgraph Information Po --> |Updated| H2[H(s|o)] H2 --> |Reduction| U2[Posterior Uncertainty] U1 --> |Difference| IG[Information Gain] U2 --> |Difference| IG end classDef state fill:#f9f,stroke:#333,stroke-width:2px classDef process fill:#bbf,stroke:#333,stroke-width:2px classDef measure fill:#bfb,stroke:#333,stroke-width:2px class P,O,Po state class L,H1,H2 process class U1,U2,IG measure ``` ## Computation Dynamics ```mermaid stateDiagram-v2 [*] --> InitialState state "Information Processing" as IP { InitialState --> BeliefState BeliefState --> ObservationState ObservationState --> UpdatedState state BeliefState { PriorBelief --> EntropyComputation EntropyComputation --> InitialUncertainty } state ObservationState { Observation --> LikelihoodEvaluation LikelihoodEvaluation --> PosteriorComputation } state UpdatedState { PosteriorBelief --> GainComputation GainComputation --> InformationValue } } IP --> [*]: InformationProcessed ``` ## Value Computation ```mermaid graph TD subgraph Distributions P[Prior] --> |KL| D[Divergence] Q[Posterior] --> |KL| D end subgraph Components D --> |Expected| EV[Expected Value] D --> |Actual| AV[Actual Value] end subgraph Integration EV --> |Weight| W[Weighted Sum] AV --> |Weight| W W --> |Total| IG[Information Gain] end classDef dist fill:#f9f,stroke:#333,stroke-width:2px classDef comp fill:#bbf,stroke:#333,stroke-width:2px classDef value fill:#bfb,stroke:#333,stroke-width:2px class P,Q dist class D,EV,AV comp class W,IG value ``` ## Analysis Methods ```mermaid graph TD subgraph Input P[Prior] --> A[Analysis] O[Observation] --> A Q[Posterior] --> A end subgraph Computation A --> |Compute| IG[Information Gain] A --> |Measure| EN[Entropy Change] A --> |Evaluate| KL[KL Divergence] end subgraph Metrics IG --> |Track| EF[Effectiveness] EN --> |Monitor| PR[Progress] KL --> |Assess| CV[Convergence] end subgraph Output EF --> |Report| R[Results] PR --> |Visualize| V[Visualization] CV --> |Update| S[Strategy] end classDef input fill:#f9f,stroke:#333,stroke-width:2px classDef process fill:#bbf,stroke:#333,stroke-width:2px classDef output fill:#bfb,stroke:#333,stroke-width:2px class P,O,Q input class IG,EN,KL process class R,V,S output ``` ## Mathematical Formulation Information gain is defined as the KL divergence between posterior and prior distributions: $IG(s;o) = D_{KL}[P(s|o)\|P(s)] = \mathbb{E}_{P(s|o)}[\ln P(s|o) - \ln P(s)]$ Links to: - [[kl_divergence]] - Divergence measure - [[bayesian_inference]] - Posterior computation - [[entropy]] - Uncertainty measure ## Implementation ```python def compute_information_gain( prior: np.ndarray, # Prior distribution P(s) posterior: np.ndarray, # Posterior distribution P(s|o) method: str = 'kl' # Computation method ) -> float: """Compute information gain between distributions. Args: prior: Prior probability distribution posterior: Posterior probability distribution method: Method to use ('kl' or 'entropy') Returns: Information gain value """ if method == 'kl': return np.sum(posterior * (np.log(posterior + 1e-10) - np.log(prior + 1e-10))) elif method == 'entropy': prior_entropy = -np.sum(prior * np.log(prior + 1e-10)) post_entropy = -np.sum(posterior * np.log(posterior + 1e-10)) return prior_entropy - post_entropy else: raise ValueError(f"Unknown method: {method}") ``` Links to: - [[numerical_methods]] - Implementation details - [[probability_distributions]] - Distribution handling - [[numerical_stability]] - Stability considerations ## Applications ### Active Inference - Drives exploration in [[epistemic_value]] - Guides [[action_selection]] - Measures [[belief_updating]] effectiveness - Links to: - [[exploration_exploitation]] - Balance - [[policy_selection]] - Action choice - [[efe_components]] - Value components ### Information Theory - Quantifies [[mutual_information]] - Measures [[entropy]] reduction - Evaluates [[channel_capacity]] - Links to: - [[information_theory_axioms]] - Foundations - [[information_geometry]] - Geometric view - [[information_flow]] - Dynamic aspects ## Properties 1. **Non-negativity** - Always ≥ 0 by [[jensen_inequality]] - = 0 iff distributions identical - Links to [[information_bounds]] 2. **Asymmetry** - Not symmetric in arguments - Order matters (posterior vs prior) - Links to [[divergence_measures]] 3. **Additivity** - Chain rule decomposition - Sequential information gains - Links to [[chain_rule_probability]] ## Related Concepts - [[uncertainty_resolution]] - Resolution process - [[active_learning]] - Learning strategy - [[optimal_experiment_design]] - Design theory - [[information_theory]] - Theoretical basis - [[bayesian_inference]] - Statistical framework ## References - [[cover_thomas_2006]] - Information Theory - [[mackay_2003]] - Information Theory - [[friston_2017]] - Active Inference - [[lindley_1956]] - Information Measures