Wikisource:Sources/Ebook.lib.hku.hk notes


 * Examining the JavaScript at the HKU site (http://ebook.lib.hku.hk/res/js/heading.js) reveals that the PDF files are stored in a “pdf” subdirectory under each title’s directory. So, for The Art of Cross-Examination (http://ebook.lib.hku.hk/CADAL/B31423735/index.html), the page scans are in http://ebook.lib.hku.hk/CADAL/B31423735/pdf/. Files are named nnnnnnnn.pdf, where nnnnnnnn is the page number, padded out with zeroes to make an 8.3 filename (so 00000001.pdf, 00000002.pdf, etc.). Although you can’t browse the contents of the pdf directory, you can request individual files. If you know that the last page number is (for instance) 289, you should be able to write a short script that calls wget 289 times and get them all that way. Tarmstro99 00:15, 16 May 2008 (UTC)


 * And if you were to do such a thing and concatenate the scans, you might end up with something like Image:The Art of Cross-Examination.djvu. :-) Tarmstro99 01:39, 16 May 2008 (UTC)


 * Sweet. It's alive!  It's alive! --❨Ṩtruthious ℬandersnatch❩ 04:05, 16 May 2008 (UTC)


 * It sounds like an interesting book: here is column one of a two-column NYT book review. John Vandenberg (chat) 07:48, 17 May 2008 (UTC)

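Here's a Python 3 program to download a 700-page book:

import urllib.request

BASE_URL = 'http://ebook.lib.hku.hk/CADAL/B31440708/pdf/'

# range() excludes its end value, so 701 is needed to include page 700.
for page in range(1, 701):
    # File names are the page number zero-padded to eight digits.
    name = '%08d.pdf' % page
    urllib.request.urlretrieve(BASE_URL + name, name)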

and you can then merge the files with:

pdftk *.pdf cat output combined.pdf
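
If you want something you can re-run after an interrupted download, here is a rough sketch along the same lines. It assumes the site still serves files at the pattern described above. The function names (page_filename, page_url, download_book, merge_command) and the one-second pause are made up for illustration, not anything the site requires.

```python
import os
import time
import urllib.request

# URL pattern taken from the notes above; book_id is e.g. "B31423735".
BASE = "http://ebook.lib.hku.hk/CADAL/{book_id}/pdf/"


def page_filename(page):
    """Return the eight-digit, zero-padded file name for a page number."""
    return "%08d.pdf" % page


def page_url(book_id, page):
    """Return the full URL of one page of one title."""
    return BASE.format(book_id=book_id) + page_filename(page)


def download_book(book_id, last_page, dest=".", pause=1.0):
    """Fetch pages 1..last_page into dest; return the page numbers that failed."""
    os.makedirs(dest, exist_ok=True)
    failed = []
    for page in range(1, last_page + 1):
        target = os.path.join(dest, page_filename(page))
        if os.path.exists(target):
            continue  # already fetched on an earlier run
        try:
            urllib.request.urlretrieve(page_url(book_id, page), target)
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            failed.append(page)
        time.sleep(pause)  # arbitrary pause (an assumption) to go easy on the server
    return failed


def merge_command(last_page, output="combined.pdf"):
    """Build the pdftk argument list with the pages listed in explicit order."""
    pages = [page_filename(p) for p in range(1, last_page + 1)]
    return ["pdftk"] + pages + ["cat", "output", output]
```

For The Art of Cross-Examination that would be download_book("B31423735", 289), followed by subprocess.run(merge_command(289), check=True) from the directory holding the pages. Listing the files explicitly, rather than using *.pdf, keeps an existing combined.pdf from being picked up as an input on a second run.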