up: [[Independent Measure T Tests]]

# Welch's t test

Welch's t test lets us run a t test without assuming [[Homogeneity of variance]], unlike [[Student's t test]]. It also does not require the sample sizes to be equal, although it is most robust when the group sizes are similar. Unlike Student's t test, where df = n1 + n2 - 2, we must use this equation (the Welch–Satterthwaite equation) to find the degrees of freedom:

![[Pasted image 20221003150340.png]]

$$df = \frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}$$

We round these degrees of freedom down and use them to find the t-crit value on the [[T Unit Table]]. Finding the t statistic works exactly the same way as [[Independent Measure T Tests|explained here]].

Related:

### Rstudio

```r
# Independent-measures t-test examples (PSYCH 2500)
# ... with repeated-measures t-test illustration following

samp50 = read.csv("Samples50.csv")
# ctrl    = scan("UntreatedSample_50.txt")
# treated = scan("TreatedSample_50.txt")

hist(samp50$ctrl, col=3, xlim=c(100,180), ylim=c(0,15), breaks=seq(100,180,5))
abline(v=mean(samp50$ctrl), lty=1, col="green")

# Axes and bins forced to be the same as for ctrl
hist(samp50$treated, col=2, xlim=c(100,180), ylim=c(0,15), breaks=seq(100,180,5))
abline(v=mean(samp50$ctrl), lty=1, col="green")
abline(v=mean(samp50$treated), lty=1, col="red")

# Student's t-test (uses pooled variance, because var.equal=TRUE)
t.test(samp50$ctrl, samp50$treated, var.equal=TRUE)

# Welch's t-test (equal variances not assumed; this is the default)
# Note that the df is not an integer in the Welch test
t.test(samp50$ctrl, samp50$treated, var.equal=FALSE)

# Welch test again -- identical, but report an 80% confidence interval
# instead of the default 95% CI
t.test(samp50$ctrl, samp50$treated, var.equal=FALSE, conf.level=0.80)
```

Notice that the t.test() function output is a data structure that prints as formatted text:

```r
> t.test(Samp.Tr, Samp.NZ)

	Welch Two Sample t-test

data:  Samp.Tr and Samp.NZ
t = 0.18882, df = 48.261, p-value = 0.851
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.153704  8.636813
sample estimates:
mean of x mean of y
 24.24334  23.50179
```

We can take this apart if we want to. The str() command gives us a hint about how to do so:

```r
> Output = t.test(Samp.Tr, Samp.NZ)
> str(Output)
List of 10
 $ statistic  : Named num 0.189
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 48.3
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 0.851
 $ conf.int   : num [1:2] -7.15 8.64
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2] 24.2 23.5
  ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means"
 $ stderr     : num 3.93
 $ alternative: chr "two.sided"
 $ method     : chr "Welch Two Sample t-test"
 $ data.name  : chr "Samp.Tr and Samp.NZ"
 - attr(*, "class")= chr "htest"
```

Lots of information. The basic idea is that we can use the same "$" notation that we use to pull dataframes apart to pull this structure apart and get access to its individual parts. For example, to get just the p-value, we can say:

```r
> Output$p.value
[1] 0.8510253
```

To get just the t statistic, we say:

```r
> Output$statistic
        t
0.1888206
```

This is actually an odd sort of number called a "named num" (check it with the str() function), meaning that it is a number labeled "t". You can compute with it normally, but it will carry that "t" attribute around unless you "un-name" it:

```r
> unname(Output$statistic)
[1] 0.1888206
```

Now you have just the number itself (the value of the t statistic), and you can do with it as you will.

___
# Resources
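As a sketch of the degrees-of-freedom equation above, we can compute the Welch–Satterthwaite df by hand and compare it with what t.test() reports. The data here are made up for illustration (note the unequal n and unequal spread):

```r
# Hand-compute the Welch-Satterthwaite df and check it against t.test().
# Sample data below is invented purely for illustration.
set.seed(1)
x <- rnorm(30, mean = 120, sd = 10)
y <- rnorm(50, mean = 125, sd = 18)   # unequal n and unequal variance

v1 <- var(x) / length(x)              # s1^2 / n1
v2 <- var(y) / length(y)              # s2^2 / n2
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))

out <- t.test(x, y, var.equal = FALSE)
df_welch                              # hand-computed df
unname(out$parameter)                 # df reported by t.test(); should match
```

Because t.test() with var.equal=FALSE uses exactly this formula, the two df values agree, and neither is an integer.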