Let $\theta_1, \dots, \theta_J$ be a set of parameters and $c_1, \dots, c_J$ be known constants. Then $\gamma = c^T \theta = \sum_{j=1}^J c_j \theta_j$ is a contrast if $\sum_{j=1}^J c_j = 0$.

Consider an experiment with three groups in which we want to determine whether group 2 has a higher mean than the other two groups. To formulate this as a hypothesis test, we state the null hypothesis as: the mean of group 2 equals the average of the means of groups 1 and 3.

$$\begin{align} H_0: \mu_2 = \frac12 (\mu_1 + \mu_3) && H_1: \mu_2 > \frac12 (\mu_1 + \mu_3) \end{align}$$

Moving everything to one side, the null hypothesis becomes $-\frac12 \mu_1 + \mu_2 - \frac12 \mu_3 = 0$. Using the sample mean $\bar Y_j$ to estimate each group mean $\mu_j$, the estimator of the unknown contrast is

$$\hat \gamma = -\frac12 \bar Y_1 + \bar Y_2 - \frac12 \bar Y_3$$

This is a valid contrast because $\sum c_j = -\frac12 + 1 - \frac12 = 0$.

Because $\hat \gamma$ is a [[linear combination of normal random variables]], it follows the [[normal distribution]]

$$\hat \gamma \sim N(\gamma, \sigma^2_{\hat \gamma})$$

and under the null hypothesis its mean is $\gamma = 0$. To test hypotheses about a contrast, the relevant test statistic $t$ has a [[t-distribution]] with $n - J$ [[degrees of freedom]]:

$$t = \frac{\hat \gamma - \gamma}{\hat \sigma_{\hat \gamma}} \sim t_{n-J}$$

To estimate $\sigma_{\hat \gamma}$, note that $Var(\bar Y_j) = \sigma^2/n_j$ and that the group means are independent. Thus, for this example,

$$\begin{align} \sigma^2_{\hat \gamma} = Var(\hat \gamma) &= Var\left(-\frac12 \bar Y_1 + \bar Y_2 - \frac12 \bar Y_3\right) \\ &= \frac{1}{4} Var(\bar Y_1) + Var(\bar Y_2) + \frac{1}{4} Var(\bar Y_3) \\ &= \frac{1}{4} \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} + \frac{1}{4} \frac{\sigma^2}{n_3} \\ &= \sigma^2 \left(\frac{1}{4 n_1} + \frac{1}{n_2} + \frac{1}{4 n_3}\right) \end{align}$$

where $\sigma^2$ is estimated by $\hat \sigma^2 = \frac{\text{RSS}}{n-J}$.

The above example can be coded up in [[R]] to further illustrate the use of contrasts.

```R
cvec = c(-0.5, 1, -0.5)                     #contrast coefficients specified above
b = coef(lmod)                              #ANOVA regression model coefficients
n = length(resid(lmod))                     #total number of espressos brewed
n_method = with(esp, c(length(foamIndx[method == "Bar Machine"]),
                       length(foamIndx[method == "Hyper-Espresso Method"]),
                       length(foamIndx[method == "I-Espresso System"]))) #number of espressos brewed by each method
J = length(unique(esp$method)); J           #number of groups
ybar = c(b[1], b[1] + b[2], b[1] + b[3])    #sample mean of foam index for each method
rss = sum(resid(lmod)^2)                    #residual sum of squares
sighat = sqrt(rss/(n - J))                  #estimate of sigma
gammahat = sum(cvec*ybar)                   #estimate of the contrast: -1/2*ybar1 + ybar2 - 1/2*ybar3
cat("The estimate of the contrast is", as.numeric(gammahat))
se = sighat*sqrt(1/(4*n_method[1]) + 1/n_method[2] + 1/(4*n_method[3])) #standard error of gamma hat
tstat = gammahat/se                         #test statistic
pval = 1 - pt(tstat, df = n - J)            #p-value for the upper-tailed test
cat(". The test statistic is", as.numeric(tstat), ". The p value for the test is", pval, ".")
```

Luckily R has a function to compute contrasts: `glht()` from the `multcomp` package.

```R
#install.packages("multcomp")
library(multcomp)
contrast = glht(lmod, linfct = mcp(method = c(-0.5, 1, -0.5)))
summary(contrast)
```

# orthogonal contrasts

When several planned comparisons are tested, the contrasts should be mutually orthogonal. To show that two contrasts are orthogonal, list their coefficients in the same group order and show that the dot product equals zero. Orthogonal contrasts prevent multiple hypotheses from overlapping in their conclusions.
For example, orthogonality precludes testing all three pairwise comparisons $\mu_1 = \mu_2$, $\mu_2 = \mu_3$, and $\mu_1 = \mu_3$ as planned contrasts, because if any one of these null hypotheses were false, at least one of the others would have to be false as well. (Tukey's method can be used for all pairwise comparisons in this case.)
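As a quick illustration of the dot-product check, the sketch below compares the contrast used above, $(-\tfrac12, 1, -\tfrac12)$, with a second contrast $(1, 0, -1)$ (group 1 vs. group 3, introduced here only for illustration). With $J = 3$ groups there can be at most $J - 1 = 2$ mutually orthogonal contrasts.

```R
c1 = c(-0.5, 1, -0.5)  #group 2 vs. the average of groups 1 and 3 (from the example above)
c2 = c(1, 0, -1)       #group 1 vs. group 3 (hypothetical contrast, for illustration only)

sum(c1); sum(c2)       #both sums are 0, so each vector is a valid contrast
sum(c1 * c2)           #dot product is 0, so the two contrasts are orthogonal
```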