
 * accidental correlations (on average, about 1 of every 20 comparisons of random data is ‘significant’ at the 95% confidence level);
 * two effects of a third variable that is causal and possibly unknown (X⇒A & X⇒B);
 * causally linked, but only indirectly through intervening factors (A⇒X1⇒X2⇒B, or B⇒X1⇒X2⇒A); or
 * directly causally related (A⇒B or B⇒A).
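The first category can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the sample size (30 points), the number of trials, the normally distributed data, and the function names are all our own choices, and the critical value 2.048 is the two-sided 5% point of Student's t for 28 degrees of freedom. It correlates many pairs of unrelated random series and counts how often the correlation passes the 95% significance test.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_significant(n_trials=2000, n_points=30, t_crit=2.048, seed=0):
    """Fraction of unrelated random pairs whose correlation looks 'significant'.

    t_crit is an assumed constant: the two-sided 5% critical value of
    Student's t for n_points - 2 = 28 degrees of freedom.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # x and y are generated independently, so any correlation is accidental.
        x = [rng.gauss(0, 1) for _ in range(n_points)]
        y = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson_r(x, y)
        # Convert r to a t statistic for the test of zero correlation.
        t = r * math.sqrt((n_points - 2) / (1 - r * r))
        if abs(t) > t_crit:
            hits += 1
    return hits / n_trials


fraction = fraction_significant()
print(f"fraction of accidental 'significant' correlations: {fraction:.3f}")
```

The printed fraction should fall near 0.05, which is the 1-in-20 rate of accidental correlations expected at the 95% confidence level.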

Earlier in this chapter, we examined quantitative measures of correlation strength and of the significance of correlations. Only an inductive conceptual model, however, can provide grounds for assigning an observed correlation to one of the four categories of causality/correlation. No quantitative proof is possible, and the quantitative statistical measures only provide clues.

Many factors affect or ‘cause’ change in a variable. Usually, our interest in these factors decreases with decreasing strength of correlation between the causal variables Ai and the effect B. In general, we judge the relative importance of various causal variables based on two factors: the strength of correlation and the rate of change dB/dAi. High correlation strength means that much of the observed variation in effect B is somehow accounted for by variation in possible causal variable Ai. High rate of change means that a substantial change in effect B is associated with a modest change in causal variable Ai. However, rate of change alone can be misleading, for the total natural range of two causal variables A1 and A2 may be so different that dB/dA1 could be larger than dB/dA2 and yet A2 causes more variation in B than A1 does. Earlier in this chapter, we employed the correlation coefficient as a quantitative measure of correlation strength and the linear-regression slope as a measure of rate of change.
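The caution about rate of change can be made concrete with synthetic data. The following sketch assumes a hypothetical linear relation B = 5·A1 + 1·A2 + noise, in which A1 varies over a narrow natural range (standard deviation 0.1) and A2 over a wide one (standard deviation 2); these coefficients, ranges, and names are invented purely for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope of y regressed on x (an estimate of dy/dx)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 5000

# Assumed natural ranges: A1 is narrow, A2 is wide.
a1 = [rng.gauss(0, 0.1) for _ in range(n)]
a2 = [rng.gauss(0, 2.0) for _ in range(n)]

# Assumed relation: B responds steeply to A1 and gently to A2.
b = [5.0 * u + 1.0 * v + rng.gauss(0, 0.5) for u, v in zip(a1, a2)]

slope_a1, slope_a2 = slope(a1, b), slope(a2, b)
r_a1, r_a2 = pearson_r(a1, b), pearson_r(a2, b)

print(f"A1: dB/dA1 = {slope_a1:.2f}, r = {r_a1:.2f}")
print(f"A2: dB/dA2 = {slope_a2:.2f}, r = {r_a2:.2f}")
```

Under these assumptions the regression slope for A1 is several times that for A2, yet the correlation of B with A2 is much the stronger of the two, because the wide range of A2 accounts for most of the observed variation in B.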

If one has three variables (C, D, and E) that are correlated, correlation strength can be used to infer likely relationships among them. Statistical techniques such as path analysis and analysis of covariance are best for determining these interconnections, but we will confine the present discussion to a more qualitative consideration of the problem. For example, suppose the correlation strengths among C, D, and E are as follows: C/D strong, D/E strong, and C/E weak. Probably, the weak relationship C/E is a byproduct of the two stronger correlations C/D and D/E, each of which may be causal. Direct causal connections (A⇒B) usually generate much stronger correlations than indirect ones (A⇒X1⇒X2⇒B). Extraneous factors affect each of the steps (A⇒X1, X1⇒X2, and X2⇒B) of the indirect correlation, thus weakening the overall correlation between A and B. Note, however, that relative strengths of correlations cannot establish causality; they only provide evidence about relative proximity of links among variables. For example, the pattern of C/D strong, D/E strong, and C/E weak could result either from C⇒D⇒E or from E⇒D⇒C.

Many surveys of U.S. voting patterns have shown that those who vote Republican have, on average, more education than Democratic voters. Does this mean that education instills Republican voting, or perhaps that higher intelligence inspires both greater education and Republican voting? Hoover [1988] uses this example to illustrate how social sciences need to beware of correlations induced by an unidentified third variable. More detailed and well-controlled surveys demonstrate that family wealth is the third variable: children of wealthier families tend to acquire a higher level of education and to be wealthier than average, and the voting pattern of wealthier individuals is more likely to be Republican than Democratic.
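Both patterns discussed above, the indirect chain and the unidentified third variable, can be imitated with synthetic data. The sketch below is an illustration only: the linear relations, unit noise levels, and variable names are our own assumptions, and removing the linear dependence on X (a simple form of partial correlation) merely stands in for the statistical control that a well-designed survey provides; it is not the method of any study cited here.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after removing its least-squares linear dependence on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    k = sxy / sxx
    return [(b - my) - k * (a - mx) for a, b in zip(x, y)]


rng = random.Random(1)
n = 5000

# Assumed chain C => D => E: extraneous noise enters at every step.
c = [rng.gauss(0, 1) for _ in range(n)]
d = [ci + rng.gauss(0, 1) for ci in c]
e = [di + rng.gauss(0, 1) for di in d]
r_cd, r_de, r_ce = pearson_r(c, d), pearson_r(d, e), pearson_r(c, e)
print(f"chain: C/D = {r_cd:.2f}, D/E = {r_de:.2f}, C/E = {r_ce:.2f}")

# Assumed common cause X => A and X => B, with no direct A-B link.
x = [rng.gauss(0, 1) for _ in range(n)]
a = [xi + rng.gauss(0, 1) for xi in x]
b = [xi + rng.gauss(0, 1) for xi in x]
r_ab = pearson_r(a, b)
# Correlate what is left of A and B once the effect of X is removed.
r_ab_given_x = pearson_r(residuals(a, x), residuals(b, x))
print(f"common cause: A/B = {r_ab:.2f}, A/B with X removed = {r_ab_given_x:.2f}")
```

In the chain, the C/E correlation comes out weaker than either direct link and close to the product of the two. Because a correlation coefficient is symmetric in its two variables, the same three values would arise from E⇒D⇒C, so they cannot reveal the direction of causality. In the common-cause case, A and B are clearly correlated even though neither affects the other, and the correlation nearly vanishes once the variation due to X is removed.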