Page:The Paging Game.djvu/4

Vol. 4/No. 7

X(1) must be in real storage before X(1) can be referenced. If the page happens to be on the paging device, it must first be read into real storage, or "paged-in."
 * X(1999)=40.
 * Action: X(1999) is set to 40. However, since X(1999) is not in the same page as X(1), X(1999)'s page may first have to be read from a paging device.
 * X(2)=30.
 * Action: X(2) is set to 30. X(2) is in the same page as X(1); unfortunately, since some time has elapsed since X(1) was referenced, X(2)'s (and X(1)'s) page may have been "paged-out" to a paging device again. Remember that we have a timesharing system in which all jobs take turns using the same (or parts of the same) real storage, just as all the players in the crating game use the same workshop. Consequently, the whole paging process is more active than your job alone would have it be. So X(2)'s page might have to be reread from a paging device into real memory.
 * X(2000)=20.
 * Action: X(2000) is set to 20. X(2000) is in the same page as X(1999). Unfortunately, just as for X(2)'s and X(1)'s page, X(2000)'s page might have to be paged-in.
 * STOP
 * Action: Program terminates.
 * END
 * Action: None

Now we'll trace Program B.


 * Statement / Action
 * DIMENSION X(2000)
 * Action: None
 * X(1)=50.
 * Action: X(1) is set to 50. As in Program A, X(1)'s page may have to be read from a paging device.
 * X(2)=30.
 * Action: X(2) is set to 30. Unlike in Program A, X(2)'s page will almost certainly not have to be reread, since it was referenced only an instant ago.
 * X(1999)=40.
 * Action: X(1999) is set to 40. As in Program A, X(1999)'s page may well have to be read from a paging device.
 * X(2000)=20.
 * Action: X(2000) is set to 20. As with X(2), X(2000)'s page probably won't need to be reread, since X(1999), which is in the same page, was referenced only an instant ago.
 * STOP
 * Action: Program terminates.
 * END
 * Action: None

Program B is better than Program A because it tends to require a smaller number of page-reads. This is true because Program B localizes its references; it does all its work on one page before moving on to the next, whereas Program A bounces its references back and forth from page to page.
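
The difference between the two reference orders can be sketched with a small simulation. This is only an illustration in Python, not part of the original article: it assumes 4096-byte pages holding 1024 four-byte elements, a least-recently-used replacement rule, and a hypothetical worst case in which the job is allowed just one page of real storage. The names `page_of` and `count_page_reads` are made up for the sketch.

```python
from collections import OrderedDict

ELEMENTS_PER_PAGE = 1024  # assumption: 4096-byte pages, 4-byte REAL elements


def page_of(index):
    """Page number holding the 1-based FORTRAN element X(index)."""
    return (index - 1) // ELEMENTS_PER_PAGE


def count_page_reads(references, frames):
    """Count page-reads for a sequence of element indices under LRU replacement."""
    resident = OrderedDict()  # pages in real storage, least recently used first
    reads = 0
    for index in references:
        page = page_of(index)
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            reads += 1  # page must be read from the paging device
            if len(resident) >= frames:
                resident.popitem(last=False)  # page out the least recently used
            resident[page] = True
    return reads


program_a = [1, 1999, 2, 2000]  # order of assignments in Program A
program_b = [1, 2, 1999, 2000]  # order of assignments in Program B

# Hypothetical worst case: only one page frame is available to the job.
reads_a = count_page_reads(program_a, frames=1)
reads_b = count_page_reads(program_b, frames=1)
print(reads_a, reads_b)  # prints "4 2"
```

Under these assumptions Program A needs four page-reads and Program B needs two. If the sketch is rerun with `frames=2`, both programs need only two, because neither page is ever paged out.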

"So what?" someone says. "Page-reads are free, aren't they?" No, we say. Although the system doesn't charge for page-reads per se, each page-read uses a certain amount of CPU time, part of which you are charged for. Also, page-reads increase the elapsed time of a run by making the system (which operates much faster than the paging device) suspend your job until your page can be recalled into main memory from the paging device.

As we said, the example we've used isn't quantitatively accurate; in actual fact, Program A probably would have needed no more page-reads than Program B. The four statements would probably have been executed so quickly that once both pages came into real memory, either program, A or B, would have finished before the vagaries of the paging mechanism found it necessary to put any pages back onto the paging device.

However, for large arrays, these subtleties can grow into expensive disasters. How does this happen? Let's look at a more true-to-life example.




Program C:
 * DIMENSION X(1000,250)
 * DO 100 I=1,1000
 * DO 100 J=1,250
 * 100 X(I,J)=0.0
 * STOP
 * END




Program D:
 * DIMENSION X(1000,250)
 * DO 100 J=1,250
 * DO 100 I=1,1000
 * 100 X(I,J)=0.0
 * STOP
 * END

These two programs were compiled and run by your author. Program D (the right way) took 4.12 seconds of CPU time and required 17 page-reads. Program C was stopped before it had finished; when it was stopped, about one-eighth of the way to completion, it had used 0.50 seconds of CPU time and had needed more than 3560 page-reads! The test was made on the 360/67 under conditions of moderately heavy loading. At a very busy time, the difference would have been greater; at a slack time, the difference would have been less. Why? Let's look at how these programs referenced their data.

Multi-dimensioned arrays in FORTRAN are stored internally in column-major order. This means that the array X in Programs C and D would have been stored in the following order:
 * X(1,1),...,X(1000,1),X(1,2),...,X(1000,2),...,X(1,250),...,X(1000,250)
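
The storage order listed above amounts to a simple position formula, and the two loop orders can be compared against it. The following Python sketch is an illustration only. It assumes 4096-byte pages of 1024 four-byte elements, least-recently-used replacement, and a hypothetical allotment of 20 page frames. It also counts every first touch of a page as a page-read and ignores other jobs, so its counts are not comparable to the measured figures quoted above.

```python
from collections import OrderedDict

ROWS, COLS = 1000, 250     # DIMENSION X(1000,250)
ELEMENTS_PER_PAGE = 1024   # assumption: 4096-byte pages, 4-byte elements
FRAMES = 20                # hypothetical number of page frames for the job


def offset(i, j):
    """Zero-based position of X(i,j) in column-major storage (i, j are 1-based)."""
    return (i - 1) + (j - 1) * ROWS


def count_page_reads(offsets, frames):
    """Count page-reads for a stream of element positions under LRU replacement."""
    resident = OrderedDict()  # pages in real storage, least recently used first
    reads = 0
    for position in offsets:
        page = position // ELEMENTS_PER_PAGE
        if page in resident:
            resident.move_to_end(page)
        else:
            reads += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # page out the least recently used
            resident[page] = True
    return reads


def program_c():
    """I is the outer loop: consecutive references are 1000 elements apart."""
    for i in range(1, ROWS + 1):
        for j in range(1, COLS + 1):
            yield offset(i, j)


def program_d():
    """J is the outer loop: consecutive references are adjacent in storage."""
    for j in range(1, COLS + 1):
        for i in range(1, ROWS + 1):
            yield offset(i, j)


reads_c = count_page_reads(program_c(), FRAMES)
reads_d = count_page_reads(program_d(), FRAMES)
print("Program C:", reads_c, "Program D:", reads_d)
```

In this model Program D reads each of the array's 245 pages exactly once, while Program C needs close to one page-read per element reference, since each pass of its inner loop sweeps across nearly the whole array before returning to the first page.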

That is, the array elements for column 1 (for all rows) are stored at the beginning addresses of the array; next, in ascending addresses, are all of the elements of column 2, and so forth. The elements of column 250 occupy the highest addresses in the region of storage allocated to the array X. Note that the total size of this array is 1,000,000 bytes— four 5