# Probabilistic Models

## Overview

Probabilistic models are mathematical frameworks that describe systems and phenomena using probability distributions to represent uncertainty and variability. These models form the foundation of [[bayesian_inference|Bayesian inference]], [[statistical_learning|statistical learning]], and [[uncertainty_quantification|uncertainty quantification]].

## Core Components

### 1. Random Variables

- [[random_variables|Random variables]] represent uncertain quantities
- Can be discrete or continuous
- Characterized by probability distributions

### 2. Dependencies

- [[conditional_probability|Conditional probabilities]] between variables
- [[probabilistic_relationships|Probabilistic relationships]]
- [[causal_structure|Causal structures]]

### 3. Parameter Space

- Model parameters defining distributions
- [[parameter_estimation|Parameter estimation]] methods
- [[hyperparameters|Hyperparameters]] for hierarchical models

## Types of Models

### 1. [[directed_graphical_models|Directed Graphical Models]]

- Bayesian networks
- Hidden Markov models
- State space models

### 2. [[undirected_graphical_models|Undirected Graphical Models]]

- Markov random fields
- Conditional random fields
- Boltzmann machines

### 3. [[hierarchical_models|Hierarchical Models]]

- Multilevel modeling
- Nested structures
- Parameter sharing

## Mathematical Framework

### 1. Probabilistic Foundation

**Joint Distribution Factorization:**

For directed acyclic graphs:

```math
p(\mathbf{x}) = \prod_{i=1}^n p(x_i | \text{pa}(x_i))
```

For example, the chain $x_1 \to x_2 \to x_3$ factorizes as $p(\mathbf{x}) = p(x_1)\, p(x_2 | x_1)\, p(x_3 | x_2)$.

For undirected graphs (Markov random fields):

```math
p(\mathbf{x}) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C)
```

where:
- $\text{pa}(x_i)$ are the parents of variable $x_i$
- $\psi_C(\mathbf{x}_C)$ are clique potentials
- $Z$ is the partition function

### 2. Parametric Models

**Likelihood Function** (for $n$ i.i.d. observations):

```math
L(\boldsymbol{\theta}; \mathbf{x}) = p(\mathbf{x} | \boldsymbol{\theta}) = \prod_{i=1}^n p(x_i | \boldsymbol{\theta})
```

**Log-Likelihood:**

```math
\ell(\boldsymbol{\theta}) = \log L(\boldsymbol{\theta}; \mathbf{x}) = \sum_{i=1}^n \log p(x_i | \boldsymbol{\theta})
```

**Prior Distribution** (here factorized over independent components):

```math
p(\boldsymbol{\theta}) = \prod_{j} p(\theta_j)
```

**Posterior Distribution:**

```math
p(\boldsymbol{\theta} | \mathbf{x}) = \frac{p(\mathbf{x} | \boldsymbol{\theta}) p(\boldsymbol{\theta})}{p(\mathbf{x})}
```

### 3. Exponential Family Models

Many probabilistic models belong to exponential families:

```math
p(x | \boldsymbol{\theta}) = h(x) \exp\left(\boldsymbol{\theta}^T \mathbf{t}(x) - A(\boldsymbol{\theta})\right)
```

where:
- $\mathbf{t}(x)$ are sufficient statistics
- $A(\boldsymbol{\theta})$ is the log-partition function
- $h(x)$ is the base measure

**Properties:**
- Natural parameters: $\boldsymbol{\theta}$
- Mean parameters: $\boldsymbol{\mu} = \nabla A(\boldsymbol{\theta})$
- Covariance of sufficient statistics: $\text{Cov}[\mathbf{t}(X)] = \nabla^2 A(\boldsymbol{\theta})$

For example, the Bernoulli distribution $p(x | \mu) = \mu^x (1-\mu)^{1-x}$ is a one-parameter exponential family with $\mathbf{t}(x) = x$, natural parameter $\theta = \log\frac{\mu}{1-\mu}$, and $A(\theta) = \log(1 + e^{\theta})$, so that $\nabla A(\theta) = \frac{e^{\theta}}{1 + e^{\theta}} = \mu$ recovers the mean.

Related: [[exponential_families]], [[natural_parameters]], [[sufficient_statistics]]

### 4. Hierarchical Structure

**Multi-level Models:**

```math
\begin{aligned}
\text{Level 1: } & y_i | \boldsymbol{\theta}_i \sim p(y_i | \boldsymbol{\theta}_i) \\
\text{Level 2: } & \boldsymbol{\theta}_i | \boldsymbol{\phi} \sim p(\boldsymbol{\theta}_i | \boldsymbol{\phi}) \\
\text{Level 3: } & \boldsymbol{\phi} \sim p(\boldsymbol{\phi})
\end{aligned}
```

**Advantages:**
- Parameter sharing across groups
- Regularization through hierarchical priors
- Uncertainty propagation across levels

Related: [[hierarchical_models]], [[multilevel_modeling]], [[mixed_effects_models]]
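To make the formulas in this section concrete, here is a minimal sketch of a hierarchical Beta-Bernoulli model in Python/NumPy. The model choice and all names are illustrative assumptions, not a canonical implementation: the fixed hyperparameters stand in for $\boldsymbol{\phi}$, the per-group success probabilities for $\boldsymbol{\theta}_i$, and conjugacy gives the posterior of the likelihood/prior formulas above in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 3: for simplicity, fix the hyperparameters phi = (a, b) of the shared
# Beta prior instead of sampling them from a hyperprior p(phi).
a, b = 2.0, 2.0
n_groups, n_obs = 5, 50

# Level 2: per-group success probabilities theta_i ~ Beta(a, b).
theta = rng.beta(a, b, size=n_groups)

# Level 1: observations y_ij | theta_i ~ Bernoulli(theta_i).
y = rng.random((n_groups, n_obs)) < theta[:, None]

# Conjugacy: with a Bernoulli likelihood and a Beta prior, the posterior
# p(theta_i | y_i) proportional to L(theta_i; y_i) p(theta_i) is again Beta,
# namely Beta(a + #successes, b + #failures) -- Bayes' rule in closed form.
successes = y.sum(axis=1)
post_a = a + successes
post_b = b + (n_obs - successes)

# Posterior means shrink the raw group frequencies toward the shared prior.
post_mean = post_a / (post_a + post_b)
print("true theta      :", np.round(theta, 3))
print("posterior means :", np.round(post_mean, 3))
```

Conjugate pairs like Beta-Bernoulli are the exception rather than the rule; for most models the posterior has no closed form and must be approximated, e.g. with the [[monte_carlo_methods|Monte Carlo]] or [[variational_inference|variational]] methods listed under Implementation below.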
## Applications

### 1. Scientific Modeling

- Physical systems
- Biological processes
- Chemical reactions

### 2. Machine Learning

- [[bayesian_neural_networks|Bayesian neural networks]]
- [[probabilistic_programming|Probabilistic programming]]
- [[gaussian_processes|Gaussian processes]]

### 3. Decision Making

- [[active_inference|Active inference]]
- [[reinforcement_learning|Reinforcement learning]]
- [[optimal_control|Optimal control]]

## Implementation

### 1. Software Frameworks

- [[probabilistic_programming_languages|Probabilistic programming languages]]
- [[statistical_computing|Statistical computing]] packages
- [[inference_engines|Inference engines]]

### 2. Computational Methods

- [[monte_carlo_methods|Monte Carlo methods]]
- [[variational_inference|Variational inference]]
- [[message_passing|Message passing algorithms]]

## Best Practices

### 1. Model Selection

- [[model_complexity|Model complexity]] considerations
- [[cross_validation|Cross-validation]]
- [[information_criteria|Information criteria]]

### 2. Validation

- [[posterior_predictive_checks|Posterior predictive checks]] (see the sketch below)
- [[sensitivity_analysis|Sensitivity analysis]]
- [[robustness_testing|Robustness testing]]
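As an illustration of the first validation item, here is a minimal posterior predictive check for the Beta-Bernoulli sketch from the Mathematical Framework section (again Python/NumPy; the data, prior, and test statistic are illustrative assumptions): draw replicated datasets from the posterior predictive distribution and compare a test statistic on them against the observed data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed data for one group: 50 Bernoulli trials (illustrative).
y_obs = rng.random(50) < 0.3
n = y_obs.size

# Beta posterior from the conjugate update shown earlier, with prior Beta(2, 2).
post_a = 2.0 + y_obs.sum()
post_b = 2.0 + (n - y_obs.sum())

# Posterior predictive replication: for each draw, sample theta from the
# posterior, then a replicated dataset y_rep of the same size as y_obs.
n_reps = 2000
theta_draws = rng.beta(post_a, post_b, size=n_reps)
y_rep = rng.random((n_reps, n)) < theta_draws[:, None]

# Test statistic: number of 0/1 switches in the sequence. This probes the
# independence assumption, which the counts used in fitting cannot detect.
def n_switches(seq):
    seq = np.asarray(seq, dtype=int)
    return int(np.abs(np.diff(seq)).sum())

t_obs = n_switches(y_obs)
t_rep = np.array([n_switches(r) for r in y_rep])

# Posterior predictive p-value: values near 0 or 1 signal model misfit.
ppp = (t_rep >= t_obs).mean()
print(f"observed switches = {t_obs}, posterior predictive p-value = {ppp:.3f}")
```

If the observed statistic falls in the bulk of the replicated distribution, the model is adequate in this respect; an extreme p-value would motivate the sensitivity and robustness checks listed above.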