The World Within Wikipedia: An Ecology of Mind (Information 2012, 3)

Since the meaning representation of each word is defined by its co-occurrence with other words, COALS can be considered a word-word level model. The COALS matrix is constructed using the following procedure. For each word in the corpus, the four words preceding and following that word are considered as context. The word at the center of the window is identified with a row of a matrix, and each of the eight context words in the co-occurrence window is identified with a column. Thus for a particular window of nine words, eight matrix cells can be identified, corresponding to the row of the center word and the columns of the eight context words. These eight cells are incremented using a ramped window: the immediate neighbors of the center word are assigned a value of 4, the next neighbors a value of 3, and so on, such that the outermost context words are assigned a value of 1. The eight cells are thus incremented according to a weighted co-occurrence, where the weight is determined by the distance of the context word from the center word.

After the matrix has been updated with all the context windows in the corpus, the entire matrix is normalized using Pearson correlation. However, since the correlation is computed over the joint occurrence of the row and column words (a binary variable), this procedure is equivalent to calculating the phi coefficient, which we present as a simpler description of the normalization process. Let υ be the value of a cell in the co-occurrence matrix, c be the column sum of the column containing υ, r be the row sum of the row containing υ, and T be the sum of all cells in the matrix. Table 1 summarizes the entries for calculating the phi coefficient.
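As a concrete illustration of the windowed counting procedure described above (a minimal sketch, not the authors' code), the ramped-window accumulation might look as follows in Python; the function name `coals_counts` and the sparse dictionary layout are assumptions for the example:

```python
from collections import defaultdict

def coals_counts(tokens, window=4):
    """Accumulate raw COALS co-occurrence counts from a token list.

    Each context word within `window` positions of a center word
    increments the cell (center, context) by a ramped weight:
    immediate neighbors add 4, the next neighbors 3, and so on,
    with the outermost context words adding 1.
    """
    counts = defaultdict(float)  # (center word, context word) -> weighted count
    for i, center in enumerate(tokens):
        for offset in range(1, window + 1):
            weight = window + 1 - offset  # 4, 3, 2, 1 as distance grows
            for j in (i - offset, i + offset):  # left and right context
                if 0 <= j < len(tokens):
                    counts[(center, tokens[j])] += weight
    return counts
```

For example, in the token sequence `a b c`, the pair `(a, b)` is at distance 1 and so receives weight 4, while `(a, c)` is at distance 2 and receives weight 3.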

Table 1. Per cell calculation of the phi coefficient.

The corresponding phi coefficient is

$$\phi = \frac{T\upsilon - cr}{\sqrt{c(T-c)\,r(T-r)}}$$

In addition to transforming each cell value into its corresponding phi value, COALS “sparsifies” the matrix by replacing all non-positive cells with zero, such that for any cell value υ

$$\upsilon = \begin{cases} \phi & \text{if } \phi > 0 \\ 0 & \text{otherwise} \end{cases}$$
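The phi normalization and the sparsification step can be combined into one vectorized pass over the matrix. The following is a sketch under the assumption of a dense NumPy matrix whose rows are center words and whose columns are context words; `coals_normalize` is a hypothetical name:

```python
import numpy as np

def coals_normalize(M):
    """Phi-normalize a raw co-occurrence matrix, then sparsify.

    For each cell v with row sum r, column sum c, and grand total T,
    phi = (T*v - c*r) / sqrt(c*(T-c)*r*(T-r)); all non-positive
    values are then replaced with zero.
    """
    M = np.asarray(M, dtype=float)
    r = M.sum(axis=1, keepdims=True)   # row sums, shape (rows, 1)
    c = M.sum(axis=0, keepdims=True)   # column sums, shape (1, cols)
    T = M.sum()                        # grand total
    denom = np.sqrt(c * (T - c) * r * (T - r))
    with np.errstate(divide="ignore", invalid="ignore"):
        phi = (T * M - c * r) / denom
    phi = np.nan_to_num(phi)           # empty rows/columns yield 0
    return np.maximum(phi, 0.0)        # keep only positive correlations
```

Broadcasting the `(rows, 1)` row sums against the `(1, cols)` column sums computes every cell's phi value at once, so no explicit loop over cells is needed.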

Thus the final representation for a given word is its associated row vector, whose only non-zero components are positive correlations between that word and context words. The semantic similarity between two words may then be computed by locating their corresponding row vectors and calculating the correlation between them.
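The similarity comparison just described reduces to a Pearson correlation between two rows of the normalized matrix. A minimal sketch, assuming a dense NumPy matrix `M` whose rows index words (`coals_similarity` is a hypothetical name):

```python
import numpy as np

def coals_similarity(M, i, j):
    """Similarity of words i and j as the Pearson correlation
    between their row vectors in the normalized matrix."""
    M = np.asarray(M, dtype=float)
    return np.corrcoef(M[i], M[j])[0, 1]
```

Identical rows yield a correlation of 1, and rows that vary in opposite directions yield negative values, so higher scores indicate greater semantic similarity.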

It is worth noting that the original COALS article proposes several variants based around the above process. One such variation removes 157 stop words before processing the corpus; another restricts