Variance measures the spread of the data around the mean. Each deviation from the mean is squared so that every distance is positive.
$V(X) = E[(X - \mu_X)^2]$
> [!NOTE]
> You might wonder why we square the deviation rather than take its absolute value. The absolute value function has a corner at zero, which makes it nasty to work with analytically (it isn't differentiable there); the squared deviation is simply easier. As a bonus (potentially), squaring the difference magnifies the impact of extreme outliers.
The computational formula for $V(X)$ is
$V(X) = E(X^2) - (E(X))^2$
The first term, $E(X^2)$, is called the "second moment" of $X$ (see [[moments]]).
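As a sanity check, here is a minimal numerical sketch (assuming NumPy; the sample size and distribution parameters are arbitrary) showing that the definitional and computational formulas agree:
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1_000_000)  # V(X) should be ~ 2^2 = 4

mean = x.mean()
definitional = np.mean((x - mean) ** 2)      # E[(X - mu_X)^2]
computational = np.mean(x ** 2) - mean ** 2  # E(X^2) - (E(X))^2

print(definitional, computational)  # both ~ 4.0
```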
## Transformations
When factoring a constant scalar out of a variance, you must square it. Shifting by an additive constant does not change the variance.
$V(aX + c) = a^2V(X)$
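A quick sketch of both rules (again assuming NumPy; the values of `a` and `c` and the choice of distribution are arbitrary):
```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)  # V(X) = 1 for exponential(1)
a, c = 3.0, 7.0

print(np.var(a * x + c))   # ~ 9.0: the shift c drops out, the scale a is squared
print(a ** 2 * np.var(x))  # ~ 9.0
```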
When calculating the variance of a linear combination of random variables, you must square the scalars and add the [[covariance]] term $2abCov(X,Y)$.
$V(aX + bY + c) = a^2V(X) + b^2V(Y) + 2abCov(X,Y)$
When $X$ and $Y$ are [[independent]], the covariance term is $0$ and the above simplifies to
$V(aX + bY) = a^2V(X) + b^2V(Y)$
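For example, with independent $X \sim N(0, 2^2)$ and $Y \sim N(0, 3^2)$ (illustrative parameters), a sketch of the independent case:
```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, size=1_000_000)  # V(X) = 4
y = rng.normal(0.0, 3.0, size=1_000_000)  # V(Y) = 9, independent of X
a, b = 2.0, -1.0

print(np.var(a * x + b * y))                    # ~ 2^2*4 + (-1)^2*9 = 25
print(a ** 2 * np.var(x) + b ** 2 * np.var(y))  # ~ 25
```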
The variance of a sum of random variables is the sum of their variances plus double their covariance. Note this is the special case of the above where $a = b = 1$ and $c = 0$.
$V(X + Y) = V(X) + V(Y) + 2Cov(X,Y)$
Importantly, the variance of the difference between $X$ and $Y$ is still the sum of the contributing variances (less twice their covariance when $X$ and $Y$ are not independent)!
$$\begin{align}
V(X - Y) &= V(X) + (-1)^2V(Y) + 2(-1)Cov(X,Y) \\
&= V(X) + V(Y) - 2Cov(X,Y)
\end{align}$$
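A sketch checking both the sum and the difference rules on a correlated pair (sharing the common component $Z$ below is just one arbitrary way to induce covariance):
```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=1_000_000)
x = z + rng.normal(size=1_000_000)  # V(X) = 2
y = z + rng.normal(size=1_000_000)  # V(Y) = 2, Cov(X, Y) = V(Z) = 1

cov = np.cov(x, y)[0, 1]
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov)  # both ~ 6
print(np.var(x - y), np.var(x) + np.var(y) - 2 * cov)  # both ~ 2
```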