
 what we set out to do to begin with! It need not trouble us that the pattern we’ve identified doesn’t hold everywhere in S—identifying that pattern (if indeed there is one to be identified) is another project entirely.

When we’re investigating a sequence like S, then, our project is two-fold: we first pick a region of S about which we want to make predictions, and then attempt to identify a pattern that will let us make those predictions. When we have a candidate pattern, we can apply it to heretofore unobserved segments of our target region and see if the predictions we’ve made by using the pattern are borne out. That is: we first identify a particular way of carving up our target data-set and then (given that carving) see what patterns can be picked out. That any patterns identified by this method will hold (or, better, that we have good reason to think they'll hold) only in a particular region is (to borrow the language of computer programmers) a feature rather than a bug. It's no criticism, in other words, to say that a putative pattern that we've identified relative to a particular carving of our subject-matter holds only for that carving; if our goal is just to make predictions about a restricted region of S, then identifying a pattern that holds only in that region might well make our job far easier, for it will give us license to (sensibly) ignore data from outside our restricted region.
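The two-step procedure just described can be made concrete with a small sketch. Everything in it is an assumption introduced for illustration: the digit string `sequence` is invented rather than drawn from S, the candidate pattern ("the digits repeat with a fixed period") is a stand-in and not the rule R, and the names `check_pattern`, `region`, and `observed` are hypothetical. The sketch fixes a region, treats the first few digits of that region as already observed, and then checks whether the pattern's predictions about the remaining digits are borne out.

```python
def check_pattern(sequence, region, period, observed):
    """Test a hypothetical periodic pattern inside one region of a sequence.

    region   -- (start, stop) indices picking out the part we care about
    period   -- the candidate pattern: each digit equals the one `period` back
    observed -- how many digits of the region count as already seen
    Returns (hits, trials) over the unobserved remainder of the region.
    """
    if observed < period:
        raise ValueError("need at least one full period of observed digits")
    start, stop = region
    target = sequence[start:stop]  # data outside the region is ignored
    hits = 0
    trials = 0
    for i in range(observed, len(target)):
        prediction = target[i - period]  # what the pattern says comes next
        trials += 1
        if prediction == target[i]:
            hits += 1
    return hits, trials


# Invented data (an assumption): regular at first, irregular afterwards.
sequence = "110110110110" + "0100111010"

# Restricted region: every prediction about the unobserved part succeeds.
restricted = check_pattern(sequence, region=(0, 12), period=3, observed=6)

# Whole sequence: the same pattern fails on some of the later digits.
whole = check_pattern(sequence, region=(0, len(sequence)), period=3, observed=6)
```

On this invented data the pattern is a perfect predictor within the restricted region and an imperfect one over the whole string, which is the situation described above: the failure outside the region is no mark against the pattern if the region is all we set out to predict.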

Let's think about another potentially problematic case. Suppose now that we're given yet another piece of S:

$S_3$: 0010100100010

$S_3$ is almost consistent with having been generated by R: only a single digit is off (the bolded
