Page:EB1911 - Volume 22.djvu/415

table the columns for the normal distribution and for the discrepancy e should each be halved; and accordingly the column for e²/m should be halved. Thus e²/m being reduced to 22.9, P as found from Professor Pearson's table is between 995 and 629. That is, such a distribution might be expected to occur on an average some once or twice in a hundred times. If actual duplication of this sort is not common in statistics, yet in all such applications of the Pearsonian criterion (and in other calculations involving the number of observations, in particular the determinations of probable error) a good margin is to be left for the possibility that the n observations are not perfectly independent: e.g. the accidents of wind or nerve which affected one shot may have affected other shots immediately before or after.
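The effect of duplication on the criterion can be illustrated by a minimal sketch. The group frequencies below are invented for illustration and are not the shot-mark data of the text; the function name `pearson_criterion` is likewise hypothetical. The sketch only shows that counting every observation twice doubles the sum of e²/m, so that halving is the appropriate correction.

```python
def pearson_criterion(observed, expected):
    """Sum of e^2/m over the groups, with e = observed - expected and m = expected."""
    return sum((o - m) ** 2 / m for o, m in zip(observed, expected))

# Hypothetical group frequencies (assumption, not from the text).
expected = [10.0, 20.0, 40.0, 20.0, 10.0]
observed = [14, 15, 42, 23, 6]

single = pearson_criterion(observed, expected)

# Every observation counted twice: both columns are doubled.
doubled = pearson_criterion([2 * o for o in observed],
                            [2 * m for m in expected])

print(single, doubled, doubled / single)
```

The probability P corresponding to a given value of the criterion still has to be read from a table such as Professor Pearson's; that step is not attempted here.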

158. (2) The Generalized Law of Error. That the normal law of error should not be exactly fulfilled is not disconcerting to those who ground the law upon the plurality of independent causes. On that view the normal law would only be exact when the number of elements from which it is generated is very great. In general, when that number is large, but not indefinitely great, there is required a correction owing to one or other of the following imperfections: that the elements do not fluctuate according to the normal law of frequency; that their fluctuations are not independent of each other; that the function whereby they are aggregated is not linear. The correction is formed by a series of terms descending in the order of magnitude.

159. The first term of this series may be written

$$y = \frac{1}{\sqrt{\pi}\,c}\,e^{-x^2/c^2}\left[1 - \frac{2k_1}{c^3}\left(\frac{x}{c} - \frac{2x^3}{3c^3}\right)\right],$$

where c²/2 is the mean square of deviation for the compound and also the sum of the mean squares of deviations for the component elements, k₁ is the mean cube of deviations for the compound and the sum of the mean cubes for the components, and the elements are supposed to be such and so numerous that k₁/c³ is of the order 1/√n. This second approximation, first given by Poisson, was rediscovered by De Forest. The present writer has obtained it by a variety of methods. By a further extension of these methods a third and further approximations may be found. The corrected normal law is then of the form

where κ₁ = k₁/c³, κ₂ = k₂/c⁴; k₁ and c are defined as above; k₂ is the sum of the respective differences for each element between its mean fourth power of error and thrice the square of its mean square of error, and also the corresponding difference for the compound. The formula may be verified by the case of the binomial, considered as a simple case of the law of great numbers. Here

These values being substituted for the coefficients in the general formula, there results an expression which may be obtained directly by continuing to expand the expression for a term of the binomial.
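The binomial verification can be tried numerically. The following is a sketch under stated assumptions, not the writer's own computation: the displayed formulae are not reproduced on this page, so the code assumes the standard form of the series, y = y₀ − (k₁/3!) d³y₀/dx³ + (k₂/4!) d⁴y₀/dx⁴ + (k₁²/72) d⁶y₀/dx⁶, and the standard binomial values c² = 2npq, k₁ = npq(q − p), k₂ = npq(1 − 6pq). The values n = 40 and p = 0.2, and the names `hermite_phys` and `corrected_law`, are hypothetical choices made for illustration.

```python
import math

def hermite_phys(n, u):
    """Physicists' Hermite polynomial H_n(u) by the three-term recurrence."""
    h_prev, h = 1.0, 2.0 * u
    if n == 0:
        return h_prev
    for j in range(1, n):
        h_prev, h = h, 2.0 * u * h - 2.0 * j * h_prev
    return h

def corrected_law(x, c, k1, k2, order):
    """order 1: normal law; 2: adds the k1 term; 3: adds the k2 and k1^2 terms."""
    u = x / c
    y0 = math.exp(-u * u) / (math.sqrt(math.pi) * c)
    factor = 1.0
    if order >= 2:
        # Equals -2 (k1/c^3) (u - 2u^3/3), the second approximation.
        factor += k1 / (6 * c**3) * hermite_phys(3, u)
    if order >= 3:
        # Assumed standard third-approximation terms (see lead-in).
        factor += k2 / (24 * c**4) * hermite_phys(4, u)
        factor += k1**2 / (72 * c**6) * hermite_phys(6, u)
    return y0 * factor

n, p = 40, 0.2                      # illustrative values, not from the text
q = 1.0 - p
c = math.sqrt(2 * n * p * q)        # modulus: c^2/2 = npq
k1 = n * p * q * (q - p)            # mean cube of deviation
k2 = n * p * q * (1 - 6 * p * q)    # mean fourth power less thrice the squared mean square

# Largest absolute difference from the exact binomial term, for each approximation.
errs = {}
for order in (1, 2, 3):
    errs[order] = max(
        abs(math.comb(n, r) * p**r * q**(n - r)
            - corrected_law(r - n * p, c, k1, k2, order))
        for r in range(n + 1)
    )
print(errs)
```

For these values the greatest discrepancy from the exact binomial term diminishes with each successive approximation, which is what the verification described above would lead one to expect.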

In virtue of the second approximation a set of observations is not to be excluded from the affinity to the normal curve because, like the curve of barometric heights, it is slightly asymmetrical. In virtue of the third approximation it is not excluded because, like the group of shot-marks above examined, it is, though almost perfectly symmetrical, in other respects apparently somewhat abnormal.

160. If the third approximation is not satisfactory there is still available a fourth, or a still higher degree of approximation.

The general expression for y which (multiplied by ∆x) represents the probability that an error will occur at a particular point (within a particular small interval) may be written

where y₀ is (the normal error-function) $$\frac{1}{\sqrt{2\pi k}}\,e^{-x^2/2k}$$, k is the mean square of deviation; k₁, k₂, . . ., &c., are coefficients formed from the mean powers of deviation according to the rule that kₜ is the difference between the (t+2)th mean power as it actually is and what it would be if the tth approximation were perfectly correct. Thus k₁ is the difference between the actual mean third power and what the third power would be if the first approximation, the normal law, were perfectly correct, that is, the difference between the actual mean third power, often written μ₃, and zero, that is μ₃. Similarly k₂ is the difference between the actual mean fourth power of deviation, say μ₄, and what that mean power would be if the second approximation were perfectly correct, viz. 3k². Thus k₂ = μ₄ − 3k². The series k₁, k₃, k₅, &c., and k, k₂, k₄, &c., form each a succession of terms descending in the order of magnitude, when each k, e.g. kₜ, has been divided by the corresponding power, i.e. the power (t+2) of the parameter or modulus c = √(2k), which division is secured by the successive differentiations of y₀, with which each k is associated, e.g. kₜ with $$\left(\frac{d}{dx}\right)^{t+2}$$. Moreover, the first term of the odd series of k's when divided by the proper power of the parameter, viz. c³, is small in comparison with the first term of the even series, viz. k, properly referred, that is, divided by c² (= 2k).
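The rule for the first coefficients may be illustrated on a set of observations. This is a minimal sketch: the sample is made up, the function name `error_coefficients` is hypothetical, and only k, k₁ and k₂ are computed, the higher coefficients requiring further terms that are not attempted here.

```python
import math

def error_coefficients(sample):
    """Return k, c, k1, k2 and the referred ratios k1/c^3, k2/c^4 for a sample."""
    n = len(sample)
    mean = sum(sample) / n
    dev = [v - mean for v in sample]          # deviations from the centre of gravity
    k = sum(d**2 for d in dev) / n            # mean square of deviation
    mu3 = sum(d**3 for d in dev) / n          # actual mean third power
    mu4 = sum(d**4 for d in dev) / n          # actual mean fourth power
    k1 = mu3 - 0.0                            # the normal law would give zero
    k2 = mu4 - 3.0 * k**2                     # the second approximation would give 3k^2
    c = math.sqrt(2.0 * k)                    # modulus
    return {"k": k, "c": c, "k1": k1, "k2": k2,
            "k1/c^3": k1 / c**3, "k2/c^4": k2 / c**4}

# Made-up, strongly asymmetrical sample (assumption, for illustration only).
coeffs = error_coefficients([0, 0, 0, 0, 5])
print(coeffs)
```

The last two entries are the coefficients referred to the modulus, which are the quantities required to be small if the series is to descend in order of magnitude.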

161. Whatever the degree of approximation employed, it is to be remembered that the law in general is only applicable to a certain range of the compound magnitude here represented by the abscissa x. The curve of error, even when generalized as here proposed, coincides only with the central portion (the body, as distinguished from the extremities) of the actual locus; a greater or less proportion.
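One symptom of this limitation can be shown by a short sketch. It assumes the usual form of the second approximation, in which the normal ordinate is multiplied by 1 − 2κ₁(u − 2u³/3), with u = x/c and κ₁ = k₁/c³; the value κ₁ = 0.1 and the name `second_approx_factor` are hypothetical. Far enough from the centre the multiplier, and with it the computed ordinate, becomes negative, which no actual frequency can be.

```python
def second_approx_factor(u, kappa1):
    """Multiplier 1 - 2*kappa1*(u - 2u^3/3) applied to the normal ordinate; u = x/c."""
    return 1.0 - 2.0 * kappa1 * (u - 2.0 * u**3 / 3.0)

kappa1 = 0.1  # hypothetical value of k1/c^3
grid = [i / 10 for i in range(-40, 41)]          # u from -4 to 4 in steps of 0.1

# Abscissae at which the corrected ordinate would be negative.
negative = [u for u in grid if second_approx_factor(u, kappa1) < 0]
print(max(negative) if negative else None)
```

With a positive κ₁ the failure occurs only in the lower extremity; near the centre the multiplier stays close to unity.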

162. The law thus generalized may be extended, with similar reservations, to two or more dimensions. For example, the second approximation in two dimensions may be written

where z0 is (the normal error-function)

x and y are (as before) co-ordinates measured from the centre of gravity of the group as origin, each referred to (divided by) its proper modulus; r is the ordinary coefficient of regression; $${}_{3,0}k$$ is the mean value of the cubes x³, $${}_{2,1}k$$ is the mean value of the products x²y, and so on; all these k's being quantities of an order less than unity. This form lends itself readily to the determination of a second approximation to the regression-curve, which is the locus of that y which is the most probable value of the ordinate corresponding to an assigned value of x. Form the logarithm of the above-written expression (for the frequency-surface); and differentiate that logarithm with respect to x. The required locus is given by equating this