Page:10 Rules for Radicals.djvu/31

 intended for casual use only. I'll grant you that 20 million pages had perhaps exceeded the expectations of the people running the pilot public access project, but surprising a bureaucrat isn't illegal.

From previous experience putting Court of Appeals decisions on-line, I was pretty sure this PACER data was going to be a mess. Rather than release the data on the net, I started an audit looking for privacy violations.

For the next two months, a series of scripts ran that looked for personal identifiers. Any files with a hit were manually examined. Many of them were false positives, such as government contract numbers.
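A scan of this kind can be sketched in a few lines. What follows is only an illustration, not the scripts used in the audit: the pattern, the function names, and the sample documents are all assumptions, and a simple 3-2-4 digit pattern will flag things like contract numbers too, which is why every hit still needs a human look.

```python
import re

# Hypothetical pattern: nine digits grouped 3-2-4, separated by hyphens or
# spaces, and not embedded in a longer run of digits.
SSN_LIKE = re.compile(r"(?<!\d)\d{3}[- ]\d{2}[- ]\d{4}(?!\d)")


def find_candidates(text):
    """Return (line_number, matched_text) pairs for SSN-like strings."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SSN_LIKE.finditer(line):
            hits.append((lineno, match.group()))
    return hits


def flag_for_review(documents):
    """Map each document name that has at least one hit to its hits.

    `documents` is an assumed dict of {name: extracted_text}.
    """
    flagged = {}
    for name, text in documents.items():
        hits = find_candidates(text)
        if hits:
            flagged[name] = hits
    return flagged


if __name__ == "__main__":
    sample = {
        "filing_a.txt": "Exhibit 4\nTaxpayer SSN 123-45-6789 on return",
        "filing_b.txt": "Nothing sensitive here",
        "filing_c.txt": "Contract no. 987-65-4321",  # likely false positive
    }
    for name, hits in flag_for_review(sample).items():
        print(name, hits)
```

The output is just a list of files and line numbers to open by hand. It cannot tell a Social Security number from a contract number, and it sees only text that can be extracted from the file.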

But there were also a whole bunch of files that did have problems. For each of those, I looked around for things the regex didn't catch and ended up finding even more Social Security numbers, and other illegal data like the names of minors and bank account numbers.

There was the obvious stuff, like the IRS suing a citizen and forgetting to redact their Social Security number on tax returns filed as evidence. Or redacting the number by placing a black rectangle on top of the text or turning the color of the text to white, which hides the number on the page but leaves it sitting in the file.

There was also some really heart-wrenching stuff, like a list of 350 patients of a doctor who was being sued for malpractice. For each patient, the supporting document listed their home address, birth date, Social Security number, and a list of all their medical problems. Or the list of the members of labor unions involved in pension disputes, with their personal identifying information, home address, and earnings history.