Page:Popular Science Monthly Volume 65.djvu/136

A FEW months ago, while studying the variation and interrelation of certain sentence constants, as average sentence-lengths, predication averages and simple-sentence frequencies in prose composition, my attention was called to an allied investigation, directed by Dr. T. C. Mendenhall, which takes for its basis the words used by an author rather than the sentences. The investigation in which I was then employed made it clear that the theory which asserts that an author uses invariable average sentence proportions is not true except when modified in essential respects, and I recognized at once that similar modifications would become necessary if the word instead of the sentence were taken as the element of composition.

The allied investigation to which I refer is set forth in two papers by Dr. T. C. Mendenhall: one in Science, March 11, 1887, entitled 'The Characteristic Curves of Composition,' the other, 'A Mechanical Solution of a Literary Problem,' in The Popular Science Monthly for December, 1901.

These papers deal with the relative frequency of words of different lengths employed by an author. It was found that different groups of a thousand words each, taken from the same author, manifested a rather remarkable uniformity in the frequency of words containing a given number of letters. Larger groups showed still greater uniformity, and hence it was inferred that if sufficiently large groups of words from the same writer were examined, they would yield practically the same relative frequencies of words with a given number of letters.
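The counting described above can be illustrated with a short modern sketch. It is not Dr. Mendenhall's own procedure, which was carried out by hand. It assumes that a word is any unbroken run of the letters A to Z (so a hyphen or apostrophe divides a word in two), and the function names `word_lengths`, `per_thousand` and `group_spectra` are hypothetical, chosen only for this illustration.

```python
import re
from collections import Counter


def word_lengths(text):
    """Return the number of letters in each word of the text.

    Assumption: a word is an unbroken run of the letters A-Z or a-z.
    """
    return [len(word) for word in re.findall(r"[A-Za-z]+", text)]


def per_thousand(lengths):
    """Map each word length to its frequency per thousand words."""
    counts = Counter(lengths)
    total = len(lengths)
    return {n: 1000 * c / total for n, c in sorted(counts.items())}


def group_spectra(text, group_size=1000):
    """Split the text into consecutive groups of `group_size` words and
    return one per-thousand frequency table for each complete group.

    A trailing group with fewer than `group_size` words is discarded.
    """
    lengths = word_lengths(text)
    complete = len(lengths) - len(lengths) % group_size
    return [
        per_thousand(lengths[i:i + group_size])
        for i in range(0, complete, group_size)
    ]
```

Calling `group_spectra` on a long passage from a single author gives one frequency table for each group of a thousand words, and these tables may then be compared with one another. Passing a larger `group_size` corresponds to the larger groups mentioned above.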

The results were exhibited graphically. The number of letters per word was used as the abscissa, the number of words per thousand containing a definite number of letters was taken as the ordinate, and the resulting points were connected by straight lines. Thus a graph or diagram was obtained which presents to the eye in a simple manner the relative frequencies of words of different lengths. Two such diagrams from the same author will agree more or less closely, depending upon the number of words in the groups upon which the averages are based. In the writer's own words: "When the number of