Let $(X, f(x \mid \theta))$ be our [[Bayesian]] statistical model with prior $\pi(\theta)$, and let $X^*$ be a future observation that we would like to predict. Assume $X^*$ is conditionally independent of $X$ given $\theta$ (e.g., we don't have a time series).
Bayesians derive a distribution of possible values for $X^*$. This distribution accounts for variability both in (1) the data generating process and (2) the parameter $\theta$. Note that the posterior predictive distribution is not, in general, the same distribution as the posterior (one is over the data, the other over the parameter). For example, the posterior predictive distribution for the number of future coin tosses that come up heads is a distribution over counts (a beta-binomial under a beta prior), whereas the posterior for the probability of heads $p$ may be a beta distribution.
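To make the coin-tossing example concrete (assuming a Beta$(\alpha, \beta)$ prior, $s$ observed heads in $n$ tosses, and $m$ future tosses), the posterior is Beta$(\alpha + s, \beta + n - s)$ and the posterior predictive for the number of future heads $x^*$ works out to a beta-binomial:
$
f(x^* \, | \, \mathbf{x}) = \binom{m}{x^*} \frac{B(x^* + \alpha + s, \; m - x^* + \beta + n - s)}{B(\alpha + s, \; \beta + n - s)}, \qquad x^* \in \{0, \dots, m\},
$
where $B(\cdot,\cdot)$ is the beta function. This is not a binomial distribution, precisely because it averages the binomial likelihood over the remaining uncertainty in $p$.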
The posterior predictive distribution $f(x^* \, | \, \mathbf{x})$ is obtained by integrating the likelihood of the new observation against the posterior, giving the distribution of a future observation $X^*$ given the observed data:
$
f(x^* \, | \, \mathbf{x}) = \int f(x^* \, | \, \theta) \, \pi(\theta \, | \, \mathbf{x}) \, d\theta.
$
Typically this is a high-dimensional integral that is difficult, if not impossible, to compute analytically. In practice, [[Markov Chain Monte Carlo]] is used to estimate the posterior predictive distribution numerically:
- For a large number of repetitions, first draw a value of $\theta$ from the posterior distribution,
- then use that draw to generate a prediction $x^*$ from the distribution that describes the data generating process (e.g., a binomial for $n$ Bernoulli trials).
The resulting values of $x^*$ follow the posterior predictive distribution; a minimal sampling sketch is given below.
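A minimal sketch of this two-step procedure in Python/NumPy, using illustrative data and a conjugate Beta prior so the posterior can be sampled directly; in a non-conjugate model the $\theta$ draws would instead come from an MCMC sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: heads out of n coin tosses (illustrative values).
n_obs, heads_obs = 20, 13

# Beta(alpha, beta) prior on the probability of heads.
alpha, beta = 1.0, 1.0

# Conjugacy: the posterior is Beta(alpha + heads, beta + tails).
post_alpha = alpha + heads_obs
post_beta = beta + (n_obs - heads_obs)

n_future = 10       # number of future tosses to predict
n_draws = 100_000   # number of posterior predictive samples

# Step 1: draw theta from the posterior (here directly; in general via MCMC).
theta_draws = rng.beta(post_alpha, post_beta, size=n_draws)

# Step 2: for each theta draw, generate a future observation x* from the likelihood.
x_star = rng.binomial(n_future, theta_draws)

# x_star now follows the posterior predictive (here, beta-binomial) distribution.
print(np.bincount(x_star, minlength=n_future + 1) / n_draws)
```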