Wikisource:Scriptorium/Help

Scan resolution (question for the technical people)
I'm getting frustrated with the poor quality of the scan image when proofreading A Dictionary of Hymnology. Have a look at Page:Dictionary of Hymnology 1908.pdf/44—the fine print is barely legible, even though I have increased the "Scan resolution in edit mode" to 2000. When viewing the PDF directly, the print is perfectly crisp.

I am guessing that the Wikimedia software takes the scan image at its default resolution, heavily JPG-compresses it, then increases the resolution of the compressed image, rather than scaling up before converting and compressing. This results in high-fidelity images of JPEG artefacts instead of actually usable scan images. I also have found a related task T38597, to replace JPG with PNG in these images, which would presumably mitigate this issue—but this ticket is ten years old and hasn't been touched for years.

Anyway, my question is this: is there any way to improve the scan image inside ProofreadPage? Or do I just have to open the PDF in a separate window (which is what I have been doing)? —Beleg Tâl (talk) 18:06, 2 July 2024 (UTC)


 * Don't know much about it, but there was a discussion a few months ago about the same problem and there the answer given was to use DjVu, not PDF. — Alien333 (what I did & why I did it wrong) 18:36, 2 July 2024 (UTC)
 * Lol thanks, should have searched the archives first :D —Beleg Tâl (talk) 18:49, 2 July 2024 (UTC)
 * Taking a quick look at the code, the PdfHandler extension generates jpgs which are then retrieved by us. Which jpg is retrieved might vary but it doesn't regenerate the images at a higher resolution if the original conversion is a poor representation. MarkLSteadman (talk) 21:31, 2 July 2024 (UTC)
 * It may be possible to regenerate the pdf outside and then upload it such that the conversion goes smoother. MarkLSteadman (talk) 21:36, 2 July 2024 (UTC)
 * I use User:Inductiveload/jump to file, which is a very useful workaround if the file is from one of the sources it supports, although it is a workaround rather than a proper fix. —CalendulaAsteraceae (talk • contribs) 02:00, 3 July 2024 (UTC)
 * Let's do a little math… The file as uploaded has 1796 pages and is 194.31 MB, which works out to about 110 kB per page. A modern smartphone photo averages about 6 MB, which means each page image here is somehow 1/55 the size of the photos your iPhone makes. How is that possible given bulk book-scanning rigs are literally a DSLR mounted over a plate with some lights and other gizmos? Well, if you go look at the raw scan images at IA you'll find they add up to 2.7 GB, and even the cropped and colour-corrected images are 1.3 GB. But surely that's non-compressed images? Oh no, these are JPEG 2000 (.jp2) wavelet-compressed images (i.e. the 1.3 GB is already compressed size), which works out to about 760 kB per page. This means the PDF that IA produces takes already compressed images and then compresses them a further 7x.
Then we get to the images on Commons. When a page image is requested, MediaWiki essentially uses Ghostscript to "print" that page out of the PDF and into a JPEG file. It does this by extracting the image data out of the glorified PostScript format (PDF is just PS with some sugar on top) into its own internal raster representation and then serializing that into the requested file format, in this case JPEG, including (lossy) compression. Proofread Page always requests thumbnails that are 1024 pixels wide (height is set automatically to preserve the original aspect ratio), which means that for page images that were originally less than 1024 pixels wide the extracted image is then decompressed, scaled up to 1024 pixels wide, and then recompressed before being sent to the web browser as a JPEG. Now the OpenSeadragon embedded in Proofread Page takes over and crams that image into the Page:-namespace viewer (OSD requests 1.5x, 2x, and 3x assets too, which complicates this a bit, but let's simplify for illustration purposes). This multiply-rescaled and recompressed image data is then what OSD and the web browser zoom in and out of, and what you're trying to proofread from. That is, you're looking at an image that has been lossily recompressed at least 3 times and twice upscaled beyond what image data was there to begin with.
So why is DjVu better? Well, the purely technical advantage isn't all that huge, but as it happens IA over-compresses their PDF files (they deliberately use very aggressive compression settings when making the PDF in order to achieve a small file size). When I make a DjVu file I grab the original scan images (the 1.3 GB zip), extract and convert the JPEG 2000 files to PPM (lossless), and then directly convert them to DjVu with moderate compression settings. That saves one recompression, and the compression settings are a lot less lossy. In addition, the DjVu compression algorithm (also a wavelet-based algorithm), designed specifically for scanned text (vs. JPEG, which was designed for general photos), does a lot better at preserving original image data (it's a lot less lossy for this case). And finally, instead of the awful Ghostscript-based method for PDFs, MediaWiki uses the native DjVuLibre tools to extract a single page image, and it does a much better job at extracting the page image. MediaWiki (Thumbor) still rescales the resulting image based on Proofread Page's request, but since the starting image is of much higher quality with fewer compression artefacts, the resulting output is usually also much better. There are pathological cases where the result is bad, but these are extremely rare (usually from some random web service that converts the IA PDF into DjVu, which only makes things worse).
So… When I say I strongly recommend using DjVu whenever possible I really mean "Come on people, why would you ever use the IA PDF?!?! Get with the program and use DjVu because even if you have to bend over backwards and jump through hoops to get that DjVu it's still going to be better!" And it's why I have an open invitation to anyone to ask me to make DjVu files for them, which I try to prioritise as much as I can (which isn't very much just now, but…). There are issues with the lack of user-friendly end-user tools for DjVu (i.e. you can't view DjVu inline in web browsers any more), and there are big questions about the long-term viability of the format (there's no commercial backing and no significant community around it), but it is still a much, much better choice than the current state of PDF and PDF tooling. Longer term (much longer) the new target is probably support for "Collections" on Commons so that we can upload the original JPEG 2000 scans (zero loss) but still get an atomic pseudo-"file" for Proofread Page to work on. But given the pace of development and lack of resources the WMF assigns to both Commons and Wikisource this is still a long way in the future, so we still need the lesser evil in the meantime. Xover (talk) 08:40, 3 July 2024 (UTC)
 * Thanks for the detailed info! I knew that IA highly compresses the PDF files, but since I am able to see the page clearly in a PDF viewer I would not have expected that to be the issue. In fact, DjVu vs. PDF claims that PDF has a higher resolution. Most discussions I have seen (here on enWS, and also on commons) seem to take the view that there is no longer any reason to use DJVU ...
 * Perhaps I'll need to update DjVu vs. PDF with some additional reasons why DJVU should be preferred where possible :D —Beleg Tâl (talk) 13:21, 3 July 2024 (UTC)
 * Oh, and the "Scan resolution in edit mode" option in Index: pages… It's been a long time since I dug into what that actually did, so I'm very vague on the details, but as I recall its effect was essentially about how big to display the image in the web browser but the image generated was exactly the same. I.e. it's a kind of hard-to-use zoom that's been obsolete for years. I could be wrong, but my conclusion at the time was that the option was useless. Xover (talk) 08:45, 3 July 2024 (UTC)
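For anyone who wants to check the per-page figures quoted in the long reply above, here is a minimal Python sketch of the arithmetic. It assumes the quoted sizes are binary units (MiB, GiB), since that appears to be what reproduces the "about 110 kB" and "about 760 kB" numbers; the constant and function names are made up for illustration.

```python
# Figures quoted in the discussion. Treating them as binary units (MiB/GiB)
# is an assumption; it is what reproduces the rounded numbers in the thread.
PAGES = 1796
PDF_MIB = 194.31          # size of the uploaded PDF
PHONE_PHOTO_MIB = 6.0     # "about 6 MB" for a smartphone photo
JP2_GIB = 1.3             # cropped, colour-corrected JPEG 2000 scans at IA


def kib_per_page(total_kib, pages=PAGES):
    """Average size of one page image in KiB."""
    return total_kib / pages


pdf_kib = kib_per_page(PDF_MIB * 1024)            # PDF bytes spread over pages
jp2_kib = kib_per_page(JP2_GIB * 1024 * 1024)     # JPEG 2000 bytes spread over pages
photo_ratio = PHONE_PHOTO_MIB * 1024 / pdf_kib    # phone photo vs. one PDF page
recompression = jp2_kib / pdf_kib                 # extra squeeze applied by the PDF

print(f"PDF page:      {pdf_kib:.0f} KiB")
print(f"JP2 page:      {jp2_kib:.0f} KiB")
print(f"Photo / page:  {photo_ratio:.0f}x")
print(f"JP2 / PDF:     {recompression:.1f}x")
```

The results land close to the rounded figures in the thread: roughly 110 kB and 760 kB per page, a factor of about 55 against a phone photo, and close to 7x further compression in the PDF.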

Brackets for vocal + piano scores
I'm transcribing a score that follows the common pattern of one line of vocal music together with a treble and bass piano part, the piano parts marked with a curly bracket. At the moment, on the pages I've transcribed (2 and 3), the vocal line is also included in the bracket. Could someone fix this? —CalendulaAsteraceae (talk • contribs) 18:41, 4 July 2024 (UTC)
 * You've got all three Staffs inside the PianoStaff. You need to nest them like this:

<< \new Staff { ... } \new PianoStaff << \new Staff { ... } \new Staff { ... } >> >>
 * —Beleg Tâl (talk) 22:24, 4 July 2024 (UTC)
 * Great, thanks! —CalendulaAsteraceae (talk • contribs) 23:00, 4 July 2024 (UTC)

Unfamiliar chord notations
Page:Hello Hello Who's Your Lady Friend.pdf/4 has what appear to be chords, but I'm not familiar with the notation, and I'd appreciate help from someone who is. (I expect once I've seen an example I'll be able to do subsequent pages myself.) —CalendulaAsteraceae (talk • contribs) 18:31, 6 July 2024 (UTC)


 * Hi, this is sol-fa notation, which is used as an alternate way of representing the melody for those who don't read graphical music. d=doh or the tonic; r=re (supertonic); m=mi (mediant); &c. The lines and colons indicate how long to hold the note. There's currently no satisfactory way of representing this in Lilypond and I have quietly ignored such in the transcriptions I've done. Beeswaxcandle (talk) 02:43, 7 July 2024 (UTC)
 * Cool, thanks! —CalendulaAsteraceae (talk • contribs) 13:54, 7 July 2024 (UTC)
 * Follow-up question, what version of LilyPond are we on and does it support repeats with alternate endings? This is for Page:Hello Hello Who's Your Lady Friend.pdf/7. —CalendulaAsteraceae (talk • contribs) 02:14, 8 July 2024 (UTC)
 * We're on 2.22.0. And yes, repeats with alternate endings are supported. Beeswaxcandle (talk) 06:59, 8 July 2024 (UTC)

Footnote query
I can't figure out how to address the following footnote variation in Index:The Remains of Hesiod the Ascraean, including the Shield of Hercules - Elton (1815).djvu. On page 129 there is a footnote beginning Tis time to sow. This footnote continues onto the following page 130, where there is a footnote within the footnote. I have dealt with such footnotes before where both the main footnote and the sub-footnote are on the same page, but I can't figure this one out. I have tried using the new-ish refn, but it doesn't seem to support 'name' and 'follow' (unlike ordinary <ref> tags), which are needed to cover the spread of the main note over two pages. Any suggestions? Chrisguise (talk) 21:51, 6 July 2024 (UTC)


 * @Chrisguise The documentation of refn may not look like it supports name and follow, but clicking edit on the template (without touching anything) seems to indicate that name and follow are parameters. I have attempted to use them on said pages (and 131 for the additional follow-on), and checking your transclusion, it looks like it is working. Thanks for all your efforts, and sorry for Wikisource's (lack of) documentation. Regards, TeysaKarlov (talk) 23:37, 6 July 2024 (UTC)
 * Thanks for that. I did try refn with 'name' and 'follow' and couldn't get it to work. I guess I must have made a mistake somewhere. Thanks again. Chrisguise (talk) 07:43, 7 July 2024 (UTC)

Page missing in Index:Comparative Grammar of the Sanskrit, Zend, Greek, Latin, Lithuanian, Gothic, German and Slavonic languages (Bopp 1885).pdf
The first page of the Preface to the Second Edition (two pages below the title page) is missing in the index, but present in the Google Books scan. Can someone who knows how add it? I assume, also, that the pages following it in the Index should be shifted by +1, but I do not know how to do this. Mårtensås (talk) 22:36, 9 July 2024 (UTC)

How to deal with poetry.
I'm trying to proofread the 11th page of the 1st issue of Punch, which has a poem about some actors who got caught drinking. There are lines that have multiple spaces in front of them in-poem, and I was wondering how to best represent that in-text. CitationsFreak (talk) 06:31, 13 July 2024 (UTC)


 * For the record, the way they say (as per Help:Poetry) is with Ppoem. CitationsFreak (talk) 06:42, 13 July 2024 (UTC)
 * Yes, you should use ppoem and put :'s at the beginning of lines, one per em. — Alien333 (what I did & why I did it wrong) 13:01, 13 July 2024 (UTC)
 * Also:
 * for use ld
 * for (with varying parameters) use ***
 * for stanza breaks just leave a blank line
 * for italics use  rather than
 * — Alien333 (what I did & why I did it wrong) 13:24, 13 July 2024 (UTC)

Index:World Fiction 1922–1923.djvu
Any idea what might be wrong with Index:World Fiction 1922–1923.djvu? Why is there "Error: Invalid interval" instead of the pagelist and why is there no thumbnail? -- Jan Kameníček (talk) 20:58, 13 July 2024 (UTC)
 * Jan Kameníček: I cleared the caches; does it work now? TE(æ)A,ea. (talk) 21:20, 13 July 2024 (UTC)
 * Yes, it does, thanks! I also used all "Purge", "Hard purge" and "Null edit" buttons before, but none of it helped. Did you do the same or did you do anything else? --Jan Kameníček (talk) 21:30, 13 July 2024 (UTC)
 * Jan Kameníček: I cleared the cache for the file on Commons, and then for the index here. It is usual for files to not render properly on Commons; a cache purge generally fixes the problem. TE(æ)A,ea. (talk) 22:09, 13 July 2024 (UTC)
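The same two-step purge (the file on Commons first, then the index here) can presumably also be done through the MediaWiki Action API rather than the purge buttons. The Python sketch below only builds the POST requests and does not send them; that the API route behaves the same as what was done in this thread is an assumption, and the helper name build_purge_request is hypothetical.

```python
from urllib.parse import urlencode

# Standard Action API endpoints for the two wikis involved.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"
ENWS_API = "https://en.wikisource.org/w/api.php"


def build_purge_request(api_url, titles):
    """Return (url, body) for an action=purge POST; nothing is sent here.

    build_purge_request is a hypothetical helper written for this example.
    """
    params = {
        "action": "purge",
        "titles": "|".join(titles),   # several titles are joined with "|"
        "forcelinkupdate": "1",       # also refresh the links tables
        "format": "json",
    }
    return api_url, urlencode(params).encode("ascii")


# Purge the file on Commons first, then the index page on Wikisource.
requests_to_send = [
    build_purge_request(COMMONS_API, ["File:World Fiction 1922–1923.djvu"]),
    build_purge_request(ENWS_API, ["Index:World Fiction 1922–1923.djvu"]),
]
for url, body in requests_to_send:
    print(url, body.decode("ascii"))
```

Each pair would then be sent as a POST with whatever HTTP client is to hand; sending is deliberately left out so the sketch has no side effects.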

User:ShakespeareFan00/SPC_Colors
Hi.. I found something on Hathi - https://babel.hathitrust.org/cgi/pt?id=osu.32435075986307

So I was using some rather recent (as of 2023) CSS support to make the color tables. However, doing this manually seems to be a waste of time. Is there someone versed in Lua who can come up with a way to generate this programmatically? Also, having a Lua script that can convert Munsell Colors back to sRGB would be nice. ShakespeareFan00 (talk) 11:26, 14 July 2024 (UTC)
 * Also this is a mid 1970's "spec" - Was this updated? ShakespeareFan00 (talk) 11:29, 14 July 2024 (UTC)
 * This would be possible to do with a module to support the CMYK values... However, it needs someone to write a module to do roughly...
 * Normalise the ratios of CMYK amounts to percentages.
 * For the Yellow, write
 * calculate the remaining % to 100% and store it as x0
 * For the Magenta, scale the m value by x0 and call it z1, then work out the remaining amount to 100% and store that as x1...
 * Append to the above
 * Recurse Steps 3-4, for Red Brown, Cyan, and Key if values present, using z2,z3,..zn x2,x3,...xn etc.
 * On the last pair of values scale them both by the xn value reached,
 * write
 * This should write a series of nested color-mix functions that mix a pseudo-CMYK color. The same approach could be used to make an n-tuple color-mix function that could be invoked from elsewhere, like the Pages of "The Color Painter" for example. ShakespeareFan00 (talk) 22:50, 19 July 2024 (UTC)
 * I have no idea what it is you want me to do here. Convert from CMYK to RGB? Xover (talk) 06:28, 20 July 2024 (UTC)
 * (sigh) - Essentially, I was wanting to reconstruct the charts in the PDF linked. It gives various CMYK values (as percentages), and the base colors for the CMYK (plus a Red Brown that it additionally uses) as xyz or xyY values. The template I had written was a first attempt at doing the conversion. However, I was subsequently advised (off-wiki) that I needed to scale values in the nested color-mix statements, which gets complicated in pure wikitext. I was therefore asking if someone was able to produce a Lua module to generate the nested color-mix functions, scaling the values as needed. The list of steps above was my attempt to explain what the module needed to do. ShakespeareFan00 (talk) 07:38, 20 July 2024 (UTC)
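As a rough illustration of the steps listed above, the following Python sketch builds nested CSS color-mix() expressions from a list of colour/weight pairs, rescaling each weight against whatever is left at that nesting level. This is only one reading of the steps: the function name nested_color_mix, the named CSS colours standing in for the actual ink colours, and the choice of srgb as the interpolation space are all assumptions, and the logic would still need porting to Lua to run as a module.

```python
def nested_color_mix(components, space="srgb"):
    """Build a nested CSS color-mix() string from (css_colour, weight) pairs.

    Weights are arbitrary positive ratios; they are normalised implicitly by
    expressing each colour as a share of the weight still remaining.
    Hypothetical helper, written for this example only.
    """
    parts = [(colour, weight) for colour, weight in components if weight > 0]
    if not parts:
        raise ValueError("at least one component needs a positive weight")

    def build(items, remaining):
        colour, weight = items[0]
        if len(items) == 1:
            return colour                      # innermost colour, no mix needed
        share = 100 * weight / remaining       # this colour's share of what is left
        rest = build(items[1:], remaining - weight)
        # With only the first percentage given, CSS assigns the rest to the second colour.
        return f"color-mix(in {space}, {colour} {share:.2f}%, {rest})"

    return build(parts, sum(weight for _, weight in parts))


# Placeholder inks: plain CSS colour names, not the values from the 1970s spec.
print(nested_color_mix([("yellow", 1), ("magenta", 1), ("cyan", 1)]))
```

Equal parts of three colours come out as a 33.33% outer mix wrapped around a 50% inner mix, which is the kind of rescaling the off-wiki advice seems to describe. Note that color-mix() interpolates between colours in the chosen space, so this approximates rather than models how inks combine.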