up::
Tags:: #🌱

# Nonparametric tests

Parametric and nonparametric tests both use sample data to assess hypotheses about populations. Parametric tests evaluate hypotheses about specific parameters and require assumptions about the underlying population distribution, such as normality. Nonparametric tests are distribution-free: they make no assumption that the underlying population is normal and don't test hypotheses about specific parameters. This makes them ideal for nominal and ordinal data.

###### What does a Chi-square test measure?
The chi-square statistic X^2 measures the discrepancy between the distribution stated by the null hypothesis and the distribution of the observed data.

###### What are the differences and similarities between chi-squared tests and one sample t tests?
Chi-squared tests and one sample t tests are both intended to use sample data to test hypotheses about a single population. The main difference is that chi-squared tests are used on nominal and ordinal data, whereas one sample t tests are used on interval or ratio data.

###### What is the difference between the expected and observed frequencies?
![[Pasted image 20221116145934.png|700]]

###### How does sample size affect chi-square?
The higher the sample size, the higher the sensitivity to differences.

### Example
For example, in 100 coin flips the expected distribution might be 50/50, but the observed frequencies might be 47/53.

###### What does the chi-squared goodness of fit test ask?
It tests whether the distribution of a sample of categorical data differs from a defined distribution.

H0: the null hypothesis defines a distribution, usually one in which there is no preference between categories, and states that the set of categorical sample data doesn't differ from that distribution. Generally, this means there is no relationship between the variables.

H1: the alternative hypothesis states that the set of categorical sample data differs from the distribution defined by the null hypothesis.
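A minimal sketch of the goodness-of-fit statistic, using the 47/53 coin-flip example from above (the `chi_square` helper is just the X^2 formula, sum of (O − E)^2 / E, written out):

```python
# Chi-square goodness of fit for the coin-flip example:
# expected 50/50 under H0, observed 47/53.
def chi_square(observed, expected):
    """X^2 = sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

x2 = chi_square(observed=[47, 53], expected=[50, 50])
print(round(x2, 2))  # 0.36
# df = categories - 1 = 1; X^2crit at alpha = .05 is 3.84,
# so 0.36 fails to reach significance: the coin looks fair.
```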
Generally, this means there is a relationship between the variables.

###### How does sample size affect statistical power in chi-squared tests?
Just like in all other tests, the higher the sample size, the more statistical power.

###### How does the Chi-square distribution differ from the normal distribution, t distribution, and F distribution?
The higher the df, the closer the chi-squared distribution gets to looking like a normal distribution.
![[Pasted image 20221116151403.png]]

## Steps for hypothesis testing in a Chi-squared test
1. State the hypotheses. H0: the null hypothesis states that the set of categorical sample data doesn't differ from a known discrete distribution, usually the population. H1: the alternative hypothesis states that the set of categorical sample data didn't come from that discrete distribution.
2. Find the degrees of freedom by taking the number of categories and subtracting one. Then find the critical value using the [[Chi-squared distribution table]].
3. Calculate the Chi-square value: ![[Pasted image 20221116151452.png]]
4. Make a conclusion based on whether the chi-squared value exceeds the critical value.

To report a conclusion for a chi-squared test, write X^2(df, n = sample size) = X^2 value, p greater or less than the alpha value.

## Chi-square test for independence
Chi-squared independence tests test whether there is a relationship between two or more categorical variables. Like the other chi-squared test, there is no need for numerical scores, just frequencies.

### Hypothesis testing with Chi-square test for independence
1. Define H0: and H1:. In this scenario, H0: states that there is no relationship between the variables, and H1: states that there is a relationship between the variables.
2. Find the df and X^2crit using the equation ![[Pasted image 20221116154506.png]] where R is the number of rows and C is the number of columns.
3. Find the expected frequency for each cell with the expected frequency equation, and create another table with all of the expected frequencies.
![[Pasted image 20221116154256.png]]
4. Find the X^2 statistic using the X^2 statistic equation.
5. Come to a conclusion by measuring the X^2 value against the X^2crit value.

### Effect size for Chi-squared tests
![[Pasted image 20221116155127.png|700]]
![[Pasted image 20221116155201.png|700]]

Standard format for reporting a Chi-squared value:
![[Pasted image 20221206083601.png]]

###### What are the limitations of the chi-squared test?
1. There must be independence of observations. If someone contributed multiple data points, that would violate this assumption.
2. The expected frequency must be greater than five for each of the cells. This problem can be avoided by using sufficiently large samples.

## Chi Squared Tests for Ordinal Data

## Kruskal Wallis Test
A nonparametric alternative to the one factor independent measures ANOVA. It can be used on ordinal data or on interval/ratio data converted to ranks. However, like the chi-squared test, you must have sample sizes greater than 5 for each group.

### Hypothesis Testing
![[Pasted image 20221201094223.png]]
![[Pasted image 20221201094424.png]]
The df = k - 1.

###### How do you get the H statistic?
![[Pasted image 20221201094411.png]]
The smallest H statistic value occurs when all the T values (rank totals) for the groups are the same. Thus, the larger the H value, the more likely it is that the ranks are distributed non-uniformly across the groups.

## Friedman Test
The Friedman test is a one factor nonparametric alternative to the [[Repeated Measures ANOVA]]. It's great for ordinal data and for ratio/interval data that has been converted to ranks.

### Hypothesis testing
![[Pasted image 20221201094810.png]]
![[Pasted image 20221201094913.png]]
![[Pasted image 20221201095010.png]]

## Wilcoxon rank-sum test
The Wilcoxon rank-sum test works by rank ordering two separate samples on some variable and testing whether they differ significantly by being concentrated on opposite ends of the ranking spectrum.
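A minimal sketch of that rank-ordering idea, using made-up tie-free scores and the standard computational formula U_a = n_a·n_b + n_a(n_a + 1)/2 − R_a, where R_a is sample A's rank sum in the combined ranking:

```python
# Wilcoxon rank-sum sketch on made-up, tie-free scores.
def u_statistic(sample_a, sample_b):
    """Return the smaller of the two U statistics."""
    combined = sorted(sample_a + sample_b)
    rank = {score: i + 1 for i, score in enumerate(combined)}  # assumes no ties
    n_a, n_b = len(sample_a), len(sample_b)
    r_a = sum(rank[s] for s in sample_a)          # sample A's rank sum
    u_a = n_a * n_b + n_a * (n_a + 1) // 2 - r_a  # computational formula
    u_b = n_a * n_b - u_a                         # the two U values sum to n_a*n_b
    return min(u_a, u_b)

# Completely non-overlapping samples: maximum separation, U = 0.
print(u_statistic([3, 5, 8, 9], [10, 12, 15, 20]))  # 0

# Interleaved samples: heavy overlap, U close to the maximum n_a*n_b/2 = 8.
print(u_statistic([1, 4, 6, 9], [2, 5, 7, 10]))  # 6
```

The smaller U is then compared against the tabled critical value, rejecting the null when U falls at or below it.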
###### What do the null and alternative hypotheses state?
The null hypothesis states that the two samples will be randomly distributed across the rank distribution because there is no difference between them. The alternative hypothesis states that the two samples will be concentrated non-randomly on the rank distribution.
![[Pasted image 20221128150513.png]]

###### What is the reason to use Wilcoxon's test instead of a t test?
In principle, either the t test or the Wilcoxon test could be done on numerical data with means. However, the Wilcoxon test doesn't require that the population distributions be normally distributed, meaning it doesn't need sample sizes of over 30 for the [[Central Limit Theorem (CLT)]] to hold true.

### U statistic
For each sample, the U statistic counts the pairwise comparisons in which a score from the other sample holds the higher rank. We always choose the lesser of the two U statistics.

###### What does the U statistic represent?
Two completely non-overlapping samples have a U = 0, and unlike all the other statistics we have talked about, the lesser the U statistic, the more likely the rejection of the null hypothesis. The higher the U statistic relative to the sizes of the two samples, the more overlap. The highest overlap possible, in other words no difference between the samples, gives a U of half the product of the two sample sizes (n1·n2/2).

###### Computational formula for finding Up and Ur
![[Pasted image 20221128152957.png]]
R stands for the sum of ranks for group R or P.

#### Example Table to perform the Wilcoxon test
![[Pasted image 20221128152058.png]]
Once you have the smaller U statistic, use the [[U unit table]] to find the critical value.

### Normal approximation of the U statistic
![[Pasted image 20221128154037.png]]
By creating a [[Normal distributions|normal distribution]] out of the rank values through converting them into [[Z scores]], we can perform a hypothesis test using z testing instead.
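A minimal sketch of that z conversion, using mu_U = n1·n2/2 and sigma_U = sqrt(n1·n2·(n1 + n2 + 1)/12). The U = 0, n1 = n2 = 4 inputs are made up purely for illustration; in practice the approximation is reserved for larger samples:

```python
import math

def u_to_z(u, n1, n2):
    """Normal approximation of U: z = (U - mu_U) / sigma_U."""
    mu_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu_u) / sigma_u

# Made-up example: the smallest possible U (complete separation).
z = u_to_z(0, 4, 4)
print(round(z, 2))  # -2.31, beyond the two-tailed .05 cutoff of +/-1.96
```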
We can do this because we are finding the means and standard deviations of the ranks rather than of the original scores.

###### Finding the U statistic by converting it into a Z score
![[Pasted image 20221128155455.png]]

## Wilcoxon signed-ranks test
Wilcoxon signed-ranks tests are an alternative to [[Repeated measure t tests]]. Like the Wilcoxon rank-sum test, it's generally used for hypothesis testing with low sample sizes, here in a repeated measures study design. Like Welch's t test, it doesn't require homogeneity of variance. The Wilcoxon signed-ranks test analyzes differences in distribution, not differences in variability across the samples, because variability is accounted for when you change scores into ranks. Like the Wilcoxon rank-sum test, a statistic higher than the critical value means you fail to reject the null hypothesis; rejection happens when the statistic falls at or below the critical value. The test assumes that the data is continuous because, for it to work well, ties should be rare.

### Hypothesis Testing
###### What are the H0: and H1: for nonparametric tests?
![[Pasted image 20221130151627.png]]

Ranks are assigned based on the absolute values of the difference scores; differences closer to zero are always assigned lower ranks. The signs of the differences are accounted for after the ranks are assigned, by splitting the ranks into positive sign and negative sign columns. Zeros are split evenly between the positive and negative columns; if there is an odd number of zeros, you simply take one out of the data.

###### How do you get the Wilcoxon T statistic?
You add up each sign column and take the smaller total as the Wilcoxon T statistic. This is what the table looks like:
![[Pasted image 20221130150119.png]]

You find the Wilcoxon T statistic critical value by using the [[Wilcoxon t table]]. You report the result like this:
![[Pasted image 20221130151407.png]]

### Normal approximation for determining critical values of Wilcoxon T
You can use a normal approximation to find the Wilcoxon Tcrit values once you have a sample size greater than 20.
![[Pasted image 20221130152209.png]]

Related:
___
# Resources