User:GrafZahl/How to digitalise works for Wikisource

This is a quick rundown of how I usually create a Wikisource-ready DjVu scan of an old public domain or otherwise free work.
 * Often, I do not own the relevant works myself, so I visit a library which has them. I'm most interested in old mathematics journal articles, and most libraries provide those only for reference, not for borrowing, so I have to use a photocopier at the library. Normally, I let the machine send the copies directly to my private e-mail address as a 300 dpi or 400 dpi TIFF or PDF file. Not only are the fees much lower than for hard copies, but this also saves an additional A/D step, leading to higher output quality. Plus, library photocopiers are much faster than my personal scanner. Unfortunately, this option is not available in small libraries with old photocopiers.
 * When I work with the raw scans, I use the PBM file format (that's the format created by my personal scanner).
 * To convert TIFF files to PBM, I first create a subdirectory called tifdir and split the original TIFF into its individual pages with the tiffsplit program from libtiff:

mkdir tifdir
tiffsplit scan.tif tifdir/article

 * This creates one file per page in the tifdir subdirectory, each named article followed by a letter suffix and the extension .tif. Then I use the convert program from ImageMagick to convert the files to PBM (and possibly rotate them in the process). For example:

cd tifdir/
for file in *.tif; do convert -rotate 90 "$file" "$file".pbm; done
 * This creates PBM files rotated by 90 degrees. (Warning to mathematicians: positive angles rotate clockwise, so the rotation is left-handed.)
 * To convert PDF files to PBM, I use the pdfimages utility from the Xpdf suite. The output file format depends on the format of the image embedded in the PDF. If it's not already PBM, you can use convert as above to convert the files to PBM.
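The PDF route described above can be sketched roughly as follows. This is only an illustration, not a script I have tuned: the function names and the page prefix are made up for the example, and it assumes that pdfimages and ImageMagick's convert are on the PATH. Extracted files that are not already PBM are detected by their extension and converted.

```shell
# Sketch only: the function names and the "page" prefix are hypothetical.

# Succeeds if the file is not already a PBM and so still needs converting.
needs_conversion() {
    case "$1" in
        *.pbm) return 1 ;;
        *)     return 0 ;;
    esac
}

# Extract the images embedded in a PDF into a directory, then convert
# every extracted file that is not already PBM.
pdf_to_pbm() {
    pdf=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # pdfimages names its output <root>-<number>.<ext>; the extension
    # depends on the kind of image stored in the PDF.
    pdfimages "$pdf" "$outdir/page" || return 1
    for file in "$outdir"/page-*; do
        [ -e "$file" ] || continue
        if needs_conversion "$file"; then
            # ImageMagick picks the output format from the .pbm extension.
            convert "$file" "${file%.*}.pbm"
        fi
    done
}
```

With these definitions, pdf_to_pbm scan.pdf pdfdir would leave the PBM pages in pdfdir, where scan.pdf and pdfdir are placeholder names.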

 * Sometimes, the PBMs need to be cropped before they are converted to DjVu. I use a quick-and-dirty home-brewed program for that, which lets you specify the coordinates of the extraction rectangle (so you can read them off directly from an image manipulation program like The GIMP). The reason I don't use off-the-shelf image manipulation software is that it is often not sufficiently capable of handling bitonal files.
 * Once I have the PBM files ready, I convert them to DjVu using the cjb2 and djvm programs from the DjVuLibre suite:

for file in *.pbm; do cjb2 -dpi 400 -clean "$file" "$file".djvu; done
djvm -c finished_work.djvu *.djvu
 * Obviously, you may have to change the -dpi option depending on your situation. The -clean option removes "flyspecks", leftover artefacts from the scanning process. Of course, this also means the compression is no longer lossless, so depending on your source material you may want to omit this option.
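For the cropping step mentioned above (which happens before the DjVu conversion), my own program is not shown here. As a stand-in, the following sketch uses ImageMagick's convert with the -crop option. It assumes the rectangle is given by its left, top, right and bottom pixel coordinates as read off in an image editor, with right and bottom exclusive. The function names and the cropped/ output directory are hypothetical, and I have not checked how well this approach copes with large bitonal files.

```shell
# Sketch only: ImageMagick stands in for the home-brewed cropping program.

# Turn rectangle corners (left, top, right, bottom in pixels, with right
# and bottom exclusive) into ImageMagick's WIDTHxHEIGHT+X+Y geometry.
crop_geometry() {
    left=$1 top=$2 right=$3 bottom=$4
    if [ "$right" -le "$left" ] || [ "$bottom" -le "$top" ]; then
        echo "crop_geometry: empty rectangle" >&2
        return 1
    fi
    echo "$((right - left))x$((bottom - top))+${left}+${top}"
}

# Crop every PBM in the current directory to the same rectangle and
# write the results to the (hypothetical) cropped/ subdirectory.
crop_all() {
    geometry=$(crop_geometry "$@") || return 1
    mkdir -p cropped || return 1
    for file in *.pbm; do
        [ -e "$file" ] || continue
        # +repage drops the offset of the original canvas from the result.
        convert "$file" -crop "$geometry" +repage "cropped/$file"
    done
}
```

For example, crop_all 100 200 1100 1700 would cut a region 1000 pixels wide and 1500 pixels high, starting 100 pixels from the left and 200 pixels from the top, out of every page.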


 * The finished DjVu can then be uploaded to the Commons. Don't forget to fill out the info template, specify a licence, and categorise. Example: commons:Image:Über die Vertauschung von Argument und Parameter in den Integralen der linearen Differentialgleichungen.djvu.
 * Once the file is uploaded to the Commons, the text should be transcribed and proofread for Wikisource (or OCR'd, but OCR software does not work very well on texts with a lot of mathematical symbols). The ProofreadPage extension makes this process quite easy. See also Help:Side by side image view for proofreading.