Page:Biometrika - Volume 6, Issue 1.djvu/2

But, as we decrease the number of experiments, the value of the standard deviation found from the sample of experiments becomes itself subject to an increasing error, until judgments reached in this way may become altogether misleading.
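The growth of this error can be illustrated by a small simulation. The following is a minimal sketch, not part of the original paper: the function name `sd_spread`, the sample sizes, the number of trials, and the seed are all illustrative assumptions, and the standard deviation is computed with divisor $$n$$.

```python
import random
import statistics


def sd_spread(n, trials=2000, sigma=1.0, seed=0):
    """Draw `trials` normal samples of size n and summarise their sample SDs.

    Returns (mean of the sample SDs, standard deviation of the sample SDs).
    The sample SD uses divisor n (statistics.pstdev); this is an assumption
    of the sketch.
    """
    rng = random.Random(seed)
    sds = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        sds.append(statistics.pstdev(sample))
    return statistics.mean(sds), statistics.stdev(sds)


# Illustrative sample sizes: the scatter of the sample SD shrinks as n grows.
for n in (4, 10, 100):
    centre, spread = sd_spread(n)
    print(f"n={n:3d}  mean of sample SDs={centre:.3f}  spread={spread:.3f}")
```

With a population standard deviation of 1, the sample values for small $$n$$ scatter widely and lie on average below 1, which is why a single small sample is a poor guide to the true variability.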

In routine work there are two ways of dealing with this difficulty: (1) an experiment may be repeated many times, until such a long series is obtained that the standard deviation is determined once and for all with sufficient accuracy. This value can then be used for subsequent shorter series of similar experiments. (2) Where experiments are done in duplicate in the natural course of the work, the root mean square of the differences between corresponding pairs is equal to the standard deviation of the population multiplied by $$\sqrt{2}$$. We can thus combine together several series of experiments for the purpose of determining the standard deviation. Owing, however, to secular change, the value obtained is nearly always too low, successive experiments being positively correlated.
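Method (2) can be sketched numerically: take the square root of the mean squared difference between duplicates and divide by $$\sqrt{2}$$. The code below is an illustrative sketch only. The names `sd_from_duplicates` and `simulate_pairs` are hypothetical, and the shared-component model used to make the two members of a pair positively correlated (with correlation `rho`) is an assumption, not something specified in the paper.

```python
import math
import random


def sd_from_duplicates(pairs):
    """Estimate the population SD as sqrt(mean squared difference / 2)."""
    mean_sq = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    return math.sqrt(mean_sq / 2.0)


def simulate_pairs(n_pairs, sigma=1.0, rho=0.0, seed=0):
    """Generate duplicate measurements with SD sigma and correlation rho.

    Assumed model: each member is a mix of a component shared by the pair
    and an independent component, so that 0 <= rho < 1 is the correlation.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, 1.0)
        a = sigma * (math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
        b = sigma * (math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
        pairs.append((a, b))
    return pairs


independent = sd_from_duplicates(simulate_pairs(20000, rho=0.0))
correlated = sd_from_duplicates(simulate_pairs(20000, rho=0.5))
print(f"independent duplicates: {independent:.3f}")
print(f"positively correlated:  {correlated:.3f}")
```

Under this model the variance of a difference is $$2\sigma^2(1-\rho)$$, so any positive correlation between duplicates pulls the estimate below the true standard deviation, in line with the remark about secular change.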

There are other experiments, however, which cannot easily be repeated very often; in such cases it is sometimes necessary to judge of the certainty of the results from a very small sample, which itself affords the only indication of the variability. Some chemical, many biological, and most agricultural and large scale experiments belong to this class, which has hitherto been almost outside the range of statistical enquiry.

Again, although it is well known that the method of using the normal curve is only trustworthy when the sample is “large,” no one has yet told us very clearly where the limit between “large” and “small” samples is to be drawn.

The aim of the present paper is to determine the point at which we may use the tables of the probability integral in judging of the significance of the mean of a series of experiments, and to furnish alternative tables for use when the number of experiments is too few.

The paper is divided into the following nine sections:

I. The equation is determined of the curve which represents the frequency distribution of standard deviations of samples drawn from a normal population.

II. There is shown to be no kind of correlation between the mean and the standard deviation of such a sample.

III. The equation is determined of the curve representing the frequency distribution of a quantity $$z$$, which is obtained by dividing the distance between the mean of a sample and the mean of the population by the standard deviation of the sample.

IV. The curve found in I. is discussed.

V. The curve found in III. is discussed.

VI. The two curves are compared with some actual distributions.

VII. Tables of the curves found in III. are given for samples of different size.

VIII. and IX. The tables are explained and some instances are given of their use.

X. Conclusions.
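Two of the statements in this outline can be checked empirically: that the mean and the standard deviation of a normal sample are uncorrelated, and that the quantity $$z$$ does not behave like a normal deviate when the sample is small. The following is a rough sketch under stated assumptions: the helper names `pearson` and `simulate`, the sample size of 4, the trial count, and the 1.96 cut-off are illustrative choices, and the standard deviation is taken with divisor $$n$$.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n, trials=20000, mu=0.0, sigma=1.0, seed=0):
    """Return lists of sample means, sample SDs and z = (mean - mu) / SD."""
    rng = random.Random(seed)
    means, sds, zs = [], [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        s = statistics.pstdev(sample)  # divisor n, an assumption of the sketch
        means.append(m)
        sds.append(s)
        zs.append((m - mu) / s)
    return means, sds, zs


n = 4
means, sds, zs = simulate(n)
r = pearson(means, sds)
# If the sample SD were the true SD, z * sqrt(n) would be a standard normal
# deviate and would exceed 1.96 in absolute value about 5% of the time.
tail = sum(abs(z) * math.sqrt(n) > 1.96 for z in zs) / len(zs)
print(f"correlation of mean and SD: {r:.3f}")
print(f"fraction beyond the normal 5% limit: {tail:.3f}")
```

For samples of four the observed fraction is well above 5%, which illustrates why separate tables are needed when the number of experiments is small.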