Page:Login USENIX Newsletter feb1983.djvu/9

;login: space saving technique for ordered data, known as incremental encoding, has been adapted for the similar task of dictionary compression [Morris/Thompson, 1974]. Here, a count of the longest prefix of the preceding name is computed. For example,
 * /usr/src
 * /usr/ src/cmd/aardvark.c
 * /usr/src/cmd/armadillo.c
 * /usr/tmp/zoo

transforms to
 * 0 /usr/src
 * 8 /cmd/aardvark.c
 * 14 armadillo.c
 * 5 tmp/zoo

If we choose to delimit the pathname residue with parity-marked count bytes, decoding can be as simple as (omitting declarations):
 * fp = fopen ( COMPRESSEDFILELIST, "r" );
 * while ( (count = (getc ( fp ) & 0177)) != EOF ) {
 * for ( p = path 4- count; (*p++ = getc ( fp )) < 0200; )
 * ; /* overlay old path with new */
 * ungetc ( *-p, fp );
 * *p- = NULL;
 * if ( match ( path, name ) = = YES )
 * puts ( path );
 * }

where match is a favorite routine to determine if string path contains name. In fact, since the coded filelist is about five times shorter than the uncoded one, and the decoding is very easy, this program runs about three to four times as fast as the efficient grep on the expanded file.

Speedier Yet
Useful as it is, there is still room for improvement. (Aside: this code is best inserted into the distributed find. There is no need to burden UNIX with another command [and manual page] when we can improve an existing similar program. Conveniently, there is no two-argument form of find so we can fill the vacuum with an unadorned
 * find name

to perform the function.) Notice that the above code fragment still searches through all the characters of expanded list, albeit in main memory instead of disk. It turns out that this can be avoided by matching the name substring backwards against a reversed pathname, until the boundary delineated by the repetition count. Assuming namend points to the final character of a NULL-byte prefixed name, then replace match by
 * for ( s = p, cutoff = path -I- count; s > = cutoff; s— ) {
 * if ( *s == *namend ) { /* quick first char check */
 * for ( p = namend - 1, q = s - 1; *p != NULL; p—, q— )
 * if (*q != *p)
 * break;
 * if ( *p = = NULL ) {
 * puts ( path );
 * break;
 * }