Page:Proceedings of the Royal Society of London Vol 60.djvu/512

The only theory of correlation at present available for practical use is based on the normal law of frequency, but, unfortunately, this law is not valid in a great many cases which are both common and important. It does not hold good, to take examples from biology, for statistics of fertility in man, for measurements on flowers, or for weight measurements even on adults. In economic statistics, on the other hand, normal distributions appear to be highly exceptional: variation of wages, prices, valuations, pauperism, and so forth, are always skew. In cases like these we have at present no means of measuring the correlation by one or more “correlation coefficients” such as are afforded by the normal theory.

It seems worth while noting, under these circumstances, that in ordinary practice statisticians never concern themselves with the form of the correlation, normal or otherwise, but yet obtain results of interest, though always lacking in numerical exactness and frequently in certainty. Suppose the case to be one in which two variables are varying together in time, curves are drawn exhibiting the history of the two. If these two curves appear, generally speaking, to rise and fall together, the variables are held to be correlated. If on the other hand it is not a case of variation with time, the associated pairs may be tabulated in order according to the magnitude of one variable, and then it may be seen whether the entries of the other variable also occur in order. Both methods are of course very rough, and will only indicate very close correlation, but they contain, it seems to me, the point of prime importance at all events with regard to economic statistics. In all the classical examples of statistical correlation (e.g., marriage-rate and imports, corn prices and vagrancy, out-relief and wages) we are only primarily concerned with the question, is a large x usually associated with a large y (or small y); the further question as to the form of this association and the relative frequency of different pairs of the variables is, at any rate on a first investigation, of comparatively secondary importance.
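The second rough method described above, tabulating the associated pairs in order of one variable and seeing whether the entries of the other also occur in order, may be sketched as follows. This is only an illustration under stated assumptions: the text speaks of inspection by eye, so the summary figure used here (the share of consecutive steps in which the second variable does not fall) is an assumption of the sketch, and the function name `fraction_in_order` and the sample data are hypothetical.

```python
def fraction_in_order(pairs):
    """Tabulate (x, y) pairs in order of x and return the share of
    consecutive steps in which y does not decrease.

    Assumption: this share is an illustrative stand-in for judging
    by eye whether the y entries "also occur in order".
    """
    ordered = sorted(pairs, key=lambda p: p[0])  # order by magnitude of x
    ys = [y for _, y in ordered]
    steps = len(ys) - 1
    if steps < 1:
        raise ValueError("need at least two pairs")
    # Count the steps where the next y is at least as large as the last.
    rising = sum(1 for a, b in zip(ys, ys[1:]) if b >= a)
    return rising / steps


# Hypothetical data for illustration only; not taken from the paper.
pairs = [(3, 30), (1, 10), (4, 25), (2, 20), (5, 50)]
share = fraction_in_order(pairs)
print(share)
```

A share near 1 suggests that large x usually goes with large y, and a share near 0 that large x usually goes with small y. Like the tabulation itself, this is very rough and says nothing about the form of the association.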

Let Ox, Oy be the axes of a three-dimensional frequency-surface drawn through the mean O of the surface parallel to the axes of measurement, and let the points marked (x) be the means of successive x-arrays, lying on some curve that may be called the curve of regression of x on y. Now let a line, RR, be fitted to this curve,