Page:The World Within Wikipedia： An Ecology of Mind.pdf/11

Information 2012, 3

analyses were conducted to test for multicollinearity of COALS, ESA, and WLM by regressing each on the other two. The obtained tolerances, all between 0.49 and 0.60, suggest that the three models are not collinear. The explanation that each model is contributing substantially and equitably to the prediction is further supported by the similar magnitudes of β in Table 6.
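The tolerance check described above can be illustrated with a minimal sketch. It assumes the standard definition of tolerance as 1 − R², where R² comes from regressing one predictor on the other two, and it uses the closed-form R² for a two-predictor regression expressed through pairwise Pearson correlations. The function names `pearson` and `tolerance` are hypothetical, and the source does not state which software was used for the analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance(target, other1, other2):
    """Tolerance of `target` given two other predictors: 1 - R^2.

    R^2 is the squared multiple correlation from regressing `target`
    on `other1` and `other2`, computed from pairwise correlations.
    Undefined (division by zero) if the two other predictors are
    themselves perfectly correlated.
    """
    r1 = pearson(target, other1)
    r2 = pearson(target, other2)
    r12 = pearson(other1, other2)
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return 1 - r_squared
```

A tolerance near 1 means a predictor is nearly independent of the other two, and a tolerance near 0 means it is almost fully determined by them. Applied to the three models, the function would be called once per model, for example `tolerance(coals, esa, wlm)`, with each argument holding that model's scores (or ranks) for the word pairs.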

Table 5. Average distance by WordSimilarity-353 semantic category.

{{center|Table 6. Regression on ranks of COALS, ESA, and WLM, for human judgment ranks (N = 353).}}

Notes: R = 0.80, ∗p < 0.0001.

To address the question of the maximum potential of the COALS, ESA, WLM, and W3C3 models for correlation with human ratings, an oracle analysis was undertaken. The oracle first converts the output of each model to ranks. Then for each word pair, the oracle selects the output of the model whose rank most closely matches the rank of the human rating. This procedure generates the best possible correlation with the human ratings, based on the assumption that the oracle will choose the closest model output every time. Using this methodology with all four models, the oracle correlation is r(351)=0.93. Using only the three constituent models, the oracle correlation is r(351)=0.92, which is equivalent to the previous best reported oracle correlation that used roughly an order of magnitude more data than the present study. So the maximum potential correlation of the three constituent models matches the previous best result, with a minor improvement due to including the W3C3 model in the oracle.
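The oracle procedure can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: the helper names (`average_ranks`, `oracle_ranks`, `pearson`) are hypothetical, tied values are assumed to receive average ranks, and the final correlation is assumed to be a Pearson correlation computed on ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def oracle_ranks(human, model_outputs):
    """For each item, pick the model rank closest to the human rank."""
    human_r = average_ranks(human)
    model_r = [average_ranks(m) for m in model_outputs]
    chosen = []
    for i, target in enumerate(human_r):
        candidates = [ranks[i] for ranks in model_r]
        chosen.append(min(candidates, key=lambda r: abs(r - target)))
    return human_r, chosen


# Toy data, invented for illustration only (not from the study).
human = [1, 2, 3, 4]
model_a = [0.1, 0.9, 0.5, 0.7]
model_b = [0.8, 0.2, 0.6, 0.9]
human_r, chosen = oracle_ranks(human, [model_a, model_b])
oracle_r = pearson(human_r, chosen)
```

Because the oracle consults the human rank when choosing, the resulting correlation is an upper bound on what any per-item selection among the models could achieve, not a score attainable by a deployed model.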

The preceding analyses provide fairly strong evidence for the reason behind the W3C3 model’s efficacy. The W3C3 model has significantly higher correlations than the constituent models on the entire dataset,