The [[sampling distribution]] of the least squares estimator $\hat \beta$ is the probability distribution of $\hat \beta$ treated as a random variable across repeated random samples of size $n$. With the sampling distribution, we can carry out hypothesis tests and construct confidence intervals for the coefficients.
Under the standard assumption that the errors are independent and normally distributed, each $Y_i$ is normal; $\hat \beta = (X^TX)^{-1}X^T \vec Y$ is then a [[linear combination of normal random variables]], and so also has a [[normal distribution]]:
$\hat \beta \sim N(\vec \beta, \ \sigma^2(X^T X)^{-1})$
where the matrix $\sigma^2 (X^TX)^{-1}$ holds, on the diagonal, the [[variance]] of each parameter coefficient in the model and, on the off-diagonals, the [[covariance]] of each pair of parameters. This is called the **variance-covariance matrix**.
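As a sanity check, we can approximate the sampling distribution by simulation: refit the model on many samples of $\vec Y$ drawn from a fixed design, and compare the empirical covariance of $\hat \beta$ to $\sigma^2 (X^TX)^{-1}$. This is a minimal sketch; the design, $\vec \beta$, and $\sigma$ below are arbitrary illustrative choices, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed design: intercept plus two predictors, constant across samples
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
beta = np.array([1.0, 2.0, -0.5])  # assumed true coefficients
sigma = 2.0                        # assumed error standard deviation

# Draw many samples of Y and refit to approximate the sampling distribution
fits = []
for _ in range(5000):
    y = X @ beta + rng.normal(0, sigma, n)
    fits.append(np.linalg.solve(X.T @ X, X.T @ y))  # (X^T X)^{-1} X^T y
fits = np.array(fits)

# Empirical covariance of beta-hat should approximate sigma^2 (X^T X)^{-1}
print(np.cov(fits, rowvar=False))
print(sigma**2 * np.linalg.inv(X.T @ X))
```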
To prove this, we calculate the expectation and variance of $\hat \beta$.
# expectation of $\hat \beta$
We can show that the expectation of $\hat \beta$ is equal to $\vec \beta$. Thus $\hat \beta$ is an unbiased [[estimator]] of $\vec \beta$.
By definition,
$E(\hat \beta) = E \Big [ (X^TX)^{-1} X^T \vec Y \Big ]$
Note that $(X^TX)^{-1}X^T$ is a constant, as we are assuming that the $x_i$ are fixed, so it can be pulled out of the expectation:

$E(\hat \beta) = (X^TX)^{-1} X^T E(\vec Y) = (X^TX)^{-1} X^T X \vec \beta = \vec \beta$

since $E(\vec Y) = X \vec \beta$ under the linear model.
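The same simulation idea illustrates unbiasedness: averaging $\hat \beta$ over many replications should recover $\vec \beta$. A minimal sketch, again with an arbitrary hypothetical design and coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, sigma = 100, np.array([1.0, 2.0]), 1.5  # assumed true values
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])  # fixed design

# Average the estimator over many replications: E(beta_hat) should equal beta
est = np.mean(
    [np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0, sigma, n)))
     for _ in range(10000)],
    axis=0,
)
print(est)  # should land very close to [1.0, 2.0]
```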