Variance measures the spread of the data from the mean. The deviation is squared so that each distance counts as positive.

$V(X) = E[(X - \mu_X)^2]$

> [!NOTE]
> You might wonder why we square the deviation rather than take the absolute value. The absolute value has a corner at zero, which makes it awkward to differentiate and optimize; the squared deviation is smooth everywhere and simply easier to work with. As a bonus (potentially), squaring the difference magnifies the impact of extreme outliers.

The computational formula for $V(X)$ is

$V(X) = E(X^2) - (E(X))^2$

The first term $E(X^2)$ is called the "second moment" of $X$ (see [[moments]]).

## Transformations

When factoring a constant scalar out of a variance, you must square it. Shifting by an additive constant does not change the variance.

$V(aX + c) = a^2V(X)$

When calculating the variance of a linear combination of random variables, you must square each scalar and add the scaled [[covariance]] term $2ab\,Cov(X,Y)$.

$V(aX + bY + c) = a^2V(X) + b^2V(Y) + 2ab\,Cov(X,Y)$

When $X$ and $Y$ are [[independent]], the covariance term is $0$ and the above simplifies to

$V(aX + bY) = a^2V(X) + b^2V(Y)$

The variance of a sum of random variables is the sum of their variances plus twice their covariance. Note this is the special case where $a = b = 1$ and $c = 0$.

$V(X + Y) = V(X) + V(Y) + 2Cov(X,Y)$

Importantly, the variance of the difference between $X$ and $Y$ is still the *sum* of the contributing variances (less twice their covariance when the variables are not independent)!

$$\begin{align}
V(X - Y) &= V(X) + (-1)^2V(Y) + 2(-1)Cov(X,Y) \\
&= V(X) + V(Y) - 2Cov(X,Y)
\end{align}$$
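As a quick sanity check on the computational formula, here is a minimal NumPy sketch (the exponential distribution, seed, and sample size are arbitrary choices for illustration) comparing the defining form $E[(X - \mu_X)^2]$ against $E(X^2) - (E(X))^2$ on a simulated sample:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Sample from an arbitrary distribution (exponential, chosen for illustration).
x = rng.exponential(scale=2.0, size=1_000_000)

# Definition: V(X) = E[(X - mu_X)^2]
var_definition = np.mean((x - x.mean()) ** 2)

# Computational formula: V(X) = E(X^2) - (E(X))^2
var_computational = np.mean(x**2) - np.mean(x) ** 2

print(var_definition, var_computational)  # both approach scale^2 = 4.0
```

The two estimates agree exactly on any fixed sample, since the computational formula is an algebraic rearrangement of the definition.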
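The transformation rules can be verified numerically as well. This is a sketch under assumed toy parameters ($a = 3$, $b = -2$, $c = 7$, and a correlated pair built from normals, all invented for the example): it compares $V(aX + bY + c)$ computed directly with the right-hand side $a^2V(X) + b^2V(Y) + 2ab\,Cov(X,Y)$:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 1_000_000

# Two correlated variables: Y depends on X, so Cov(X, Y) != 0.
x = rng.normal(loc=0.0, scale=1.0, size=n)
y = 0.5 * x + rng.normal(loc=0.0, scale=1.0, size=n)

a, b, c = 3.0, -2.0, 7.0

# Left side: variance of the linear combination, computed directly.
lhs = np.var(a * x + b * y + c)

# Right side: a^2 V(X) + b^2 V(Y) + 2ab Cov(X, Y).
cov_xy = np.cov(x, y, ddof=0)[0, 1]
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov_xy

print(lhs, rhs)  # agree up to sampling noise; the shift c drops out
```

Setting $b = -2$ rather than a positive value also exercises the difference case: the $2ab\,Cov(X,Y)$ term comes out negative, matching the $V(X - Y)$ derivation above.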