Assumptions - Statistics and Methodology

# *English version*

## The assumption of normality

The assumption of normality refers to the sampling distribution of the mean, which is assumed to be normally distributed. This distribution is obtained by repeatedly drawing samples of size n from the population and plotting the mean of each sample in a frequency histogram. According to the [central limit theorem](https://www.youtube.com/watch?v=_YOr_yYPytM), the distribution of the sample means approaches a normal distribution as the sample size n increases, regardless of the shape of the population. ^c7d337

However, we typically take only one sample from the population of interest, so the distribution of the sample means remains unknown. There are two situations in which we can assume that the assumption of normality is met:

- The sample size is 30 or larger (a common rule of thumb).
- The distribution of our sample is normal.

Therefore, to test the assumption of normality, we assess whether our sample, or in the case of a general linear model the residuals, is roughly normally distributed. This can be done using descriptives, graphs and normality tests.

### [[Assumptions for advanced statistics#multivariate normality| Advanced: multivariate normality]]

## Homogeneity of variance

The assumption of homogeneity of variance states that the [[Variance | variance]] in the dependent variable is similar across all levels of the independent variable.

| homogeneity of variance | no homogeneity of variance|
|----|----|
|![[Pasted image 20210104210758.png]]|![[Pasted image 20210104210809.png]]|
|![[Pasted image 20210106115201.png]]|![[Pasted image 20210106115240.png]]|

## [[Levels of measurement#interval scale | Interval scaled data]]

## Independence

It is assumed that the observations between groups are independent, meaning that a participant can only be assigned to one group.
In addition, the data of participants within each group are independent of each other, meaning that one participant's outcome cannot influence the other participants' outcomes.

**An example of non-independence**

> Imagine you count the number of cookies employees eat during their workday.
> Employees can choose how many cookies they eat and when they eat them.
> You put out a plate with 30 cookies.
> Because the number of cookies is limited, employees who go for cookies at the end of the day are influenced by the number of cookies already eaten by the others.
> For example, one employee wants to eat 6 cookies, but there is only 1 cookie left. This means that the outcome of this specific employee is influenced by the number of cookies eaten by other employees.

In general linear models, the assumption of independence applies to the residuals.
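
The central limit theorem claim from the section on normality can be illustrated with a small simulation. The sketch below is not part of the original note: the exponential population, the fixed seed, the 5000 replicates and the helper names `skewness` and `sample_means` are all assumptions chosen for illustration. It draws many samples of size n from a strongly skewed population and compares the skewness of the resulting sample means for n = 2 and n = 30; a skewness close to 0 indicates a more symmetric, more normal-looking distribution.

```python
import random
import statistics


def skewness(values):
    """Skewness as the mean cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_means(n, replicates, rng):
    """Draw `replicates` samples of size n and return the mean of each sample."""
    # Assumed population: exponential with mean 1, which is strongly right-skewed.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]


rng = random.Random(42)  # fixed seed so the run is reproducible

means_small = sample_means(n=2, replicates=5000, rng=rng)
means_large = sample_means(n=30, replicates=5000, rng=rng)

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(f"skewness of sample means, n=2:  {skew_small:.2f}")
print(f"skewness of sample means, n=30: {skew_large:.2f}")
```

With a skewed population like this one, the sample means for n = 30 should be noticeably less skewed than those for n = 2, which is the pattern the central limit theorem predicts. Plotting both lists as histograms would show the same thing graphically.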