Category Archives: Tukey

Tukey 1991 explains Null-Hypothesis Testing in 8 Paragraphs

1. We need to distinguish regions of effect sizes and precise values. The value 0 is a precise value. All positive values or all negative values are regions of values.

2. The most common use of null-hypothesis testing is to test whether the point-null or nil-hypothesis (Cohen, 1994) is consistent with the data.

3. Tukey explains that this hypothesis is likely to be false all the time. “All we know about the world teaches us that the effect of A and B are always different”. Many critics of NHST have suggested that this makes it useless to test the nil-hypothesis because we already know that it is false (the prior probability of H0 being true is 0, no data can change this).

4. NHST becomes useful when we think about the null-hypothesis (no difference) as the boundary value that distinguishes two regions. We are really testing the direction of the mean difference (or the sign of of a correlation coefficient). Once we can reject the nil-hypothesis (p < alpha) in a two-sided test, we are allowed to interpret the direction of the mean difference in a sample as the mean difference in the population (i.e., if we had studied all people from which the sample was drawn).

5. Some psychologists have criticized NHST because it can never provide evidence for the nil-hypothesis (Rouder, Wagenmakers). This criticism is based on a misunderstanding of NHST. Tukey explains we should never accept the nil-hypothesis because we can never provide empirical support FOR a precise effect size.

6. Once we have evidence that the nil-hypothesis is false and the effect is either positive or negative, we may ask follow-up questions about the size of an effect.

7. A good way to answer these questions is to conduct NHST with confidence intervals. If the confidence interval includes 0, we cannot draw inferences about the direction of the effect. However, if the confidence interval does not include 0, we can make inferences about the direction of an effect and the boundaries of the intervals provide information about plausible values for the smallest and the largest possible effect size.

8. In conclusion, we can think about two-sided tests as an efficient way of conducting two one-sided tests without inflating the type-I error probability. Rejecting the hypothesis that there is no effect is not interesting. Determining the direction of an effect is and NHST is a useful tool to do so.

9. I probably made things worse by paraphrasing Tukey. Therefore I also posted the relevant section of his article below.