Hypothesis Testing

Classical statistics following the work of R. A. Fisher relies heavily upon randomization to cover a multitude of sins. This was heavily emphasized in my graduate statistics classes, but only by reading Statistics for Experimenters have I come to appreciate this position more fully. R. A. Fisher was an experimentalist himself, and he understood the difficulties commonly encountered in designing experiments that produce valid results. One of the requirements of a typical significance test is that the parent populations follow a normal distribution. In practice, this requirement is often somewhat relaxed, because many of the most popular techniques are robust with respect to departures from normality.

Fisher wanted to create a technique that did not depend on a normality assumption, and this technique explains the importance of the null hypothesis in standard hypothesis testing. The null hypothesis states that the treatments in question, often food or fertilizer or soil in Fisher's experiments, literally have no effect. Thus, he reasoned, we ought to be able to take the actual data and mix up the treatment labels to create all the possible combinations.

Let us take a very simple example. If we have two fertilizers, A and B, and eight plots of land to apply them to, there are 8!/(4!(8-4)!) = 8!/(4!·4!) = 70 ways the two fertilizers could be assigned to those eight plots in groups of four. Fisher took all 70 assignments, applied each one to the actual data, and so produced a randomized reference distribution. Under the null hypothesis, every assignment ought to give the same result except for chance variation in unknown factors. You then check whether the difference in means for the assignment you actually ran is uncommonly large or small in comparison to the reference distribution.
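To make this concrete, here is a minimal sketch of the procedure in Python. The yields are made-up numbers purely for illustration; the code enumerates all 70 ways of labeling four of the eight plots as having received fertilizer A, and builds the reference distribution of differences in group means.

    import itertools

    # Hypothetical yields for the eight plots; suppose the first four
    # actually received fertilizer A and the last four fertilizer B.
    yields = [24.2, 19.8, 27.5, 22.1, 30.3, 28.7, 25.9, 31.4]

    observed = sum(yields[:4]) / 4 - sum(yields[4:]) / 4

    # Every way of choosing which four plots are labeled "A":
    # 8!/(4!*4!) = 70 assignments, including the actual one.
    reference = []
    for a_plots in itertools.combinations(range(8), 4):
        a_mean = sum(yields[i] for i in a_plots) / 4
        b_mean = sum(yields[i] for i in range(8) if i not in a_plots) / 4
        reference.append(a_mean - b_mean)

    # The p-value is the fraction of assignments whose difference in
    # means is at least as extreme as the one actually observed.
    p_value = sum(abs(d) >= abs(observed) for d in reference) / len(reference)
    print(f"observed difference: {observed:.2f}, p-value: {p_value:.3f}")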

It was determined that Student's t distribution closely approximates the reference distribution produced by this procedure, and in the days before ready access to computers or pocket calculators, consulting a t-table was much faster than calculating the reference distribution, especially for large numbers of data points. A big benefit of Fisher's technique is that it freed you from both the normality and random sampling assumptions. However, you still needed to randomize your treatments to make it work. Hence the heavy emphasis on randomization, and the relative de-emphasis of the other two assumptions.
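As a quick sanity check, one can compare the exact randomization p-value from the sketch above with the p-value a t-table would give. Assuming scipy is available and reusing the hypothetical yields list from before:

    from scipy import stats

    # Classic pooled-variance two-sample t-test on the same made-up data;
    # its p-value should land close to the exact randomization p-value.
    t_stat, t_pvalue = stats.ttest_ind(yields[:4], yields[4:], equal_var=True)
    print(f"t = {t_stat:.2f}, t-test p-value: {t_pvalue:.3f}")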

This is the origin of significance testing of hypotheses, and of p-values. In this day and age, it would actually be easy to calculate the randomization reference distribution for a given set of data, but p-values taken from parametric distributions remain the most common method. Bayesian techniques exist, but are far less commonly applied.