User talk:Mpaa/Archives/2016

Re: Scriptorium
Hello. Are you still interested in this topic? I ask as an interested party myself; and of course the Scriptorium entry is shortly destined for archive oblivion. I did initiate some experiments (of a variant upon your scheme) with the assistance of Ineuw but held off in light of George Orwell III revealing this is but the resurrection of an older abandoned scheme. Even if this devolves into a project of interest to only a few individuals I am happy to advertise the result; but then again if your advice is to "let it die" I'll just keep it to myself. Please pardon the name-dropping above but it does serve to set context (especially after Wikisource-bot wields its specialised memory-axe!) In short, what do you advise? AuFCL (talk) 21:42, 17 January 2016 (UTC)
 * Yes, I am still interested. IMO it should be a gadget that, upon a preview action, highlights the "usual suspects" and, on a save action, asks for a confirmation if there are "suspects". Even better if "suspects" could be selected/loaded per user (per work would also be nice).
 * As I see it, in a project like PSM or POTM this would be of great help. I scanned PSM and found a lot of minor mistakes undetected that could have been intercepted by this.
 * For expert proofreaders with their own set of tools, this might be superfluous; for beginners it might be a help.— Mpaa (talk) 20:38, 18 January 2016 (UTC)
 * Thank you for replying. Well I have an 80% solution (and yes I am well aware that completing the final 20% of any project takes 200% of the time or thereabouts!)
 * Before proceeding I hope I am repeating the obvious when I caution you against executing un-trusted code with bureaucrat privileges. I have done my best to write safe code but lay no claim to being a security expert.
 * So in case you take the prudent path, but still want to see what this thing actually does, installing User:AuFCL/common.js/typoscan.js and then opening Page:Folk-lore - A Quarterly Review. Volume 10, 1899.djvu/219 yields File:St George screen capture.png.
 * Now for the hurdles/drawbacks etc.:
 * It performs this mark-up at all times except for an embarrassingly long and apparently ever-growing number of exception cases, instead of only when "Save" is selected upon "Preview". Implementing this would solve a lot of problems but introduces ones when false-positives are flagged and/or the user wants to force a save in any case.
 * I don't know the jQuery library well and as a consequence I suspect I have reinvented a lot of wheels a purist might improve mightily upon.
 * Although I have attempted to make this as gadget-ready as I am able to it is clearly not there yet and needs to be formally (somehow?) split three ways:
 * Stable code
 * Shared scan-set
 * User-customisable scan set
 * The actual highlighting code might be done better using an entirely different technique.
 * Some means of gracefully handling "false positives" needs to be added (entirely missing here.)
 * So far as possible the temptation to turn this into a spell-checker needs to be resisted as that is a function better handled by other utilities.
 * In short this is only really a proof of concept, and I am not pretending it is by any means ready for prime-time. At best it is a stop-gap until another alternative steps forward. AuFCL (talk) 03:00, 19 January 2016 (UTC)
 * Hi. At least someone gave it a try ... :-) I hope that this could be useful and someone will "productify" it. I will try it and give you feedback. Right now actually I do most of my checks using pywikibot. An idea could be to advertise it through POTM.
 * As I said, I do not have enough skills in this area to attempt to carry on the development, otherwise I would be glad to help you.— Mpaa (talk) 21:16, 19 January 2016 (UTC)
 * You got it in one. I lack the skills to do much better than this but at least it is as you say "a try" and I sincerely hope it might prompt somebody with better skills than my own to improve/supplant it. And at least doing this much has made me better appreciate features that a fuller solution might involve. AuFCL (talk) 06:01, 20 January 2016 (UTC)
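
A minimal sketch of the scan idea discussed in this thread, written in Python rather than as the JavaScript gadget itself. It assumes a shared scan set that a per-user scan set can extend, and a save-time check that only asks for confirmation when something is flagged. The names `SHARED_SUSPECTS`, `find_suspects` and `confirm_needed`, and the three example patterns, are hypothetical illustrations and are not taken from typoscan.js.

```python
import re

# Hypothetical shared scan set: label -> regex. Illustrative only; these are
# not the patterns used by typoscan.js.
SHARED_SUSPECTS = {
    "curly brace": r"[{}]",
    "caret": r"\^",
    "space before punctuation": r" [,;:]",
}


def find_suspects(text, user_suspects=None):
    """Return (label, start, matched_text) tuples sorted by position."""
    patterns = dict(SHARED_SUSPECTS)
    # A per-user scan set extends (or overrides) the shared one.
    patterns.update(user_suspects or {})
    hits = []
    for label, pattern in patterns.items():
        for match in re.finditer(pattern, text):
            hits.append((label, match.start(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


def confirm_needed(text, user_suspects=None):
    """True if a save action should ask the user for confirmation."""
    return bool(find_suspects(text, user_suspects))
```

Keeping the patterns in plain dictionaries separates the stable scanning code from the shared and user-customisable scan sets, which is the three-way split suggested above. Handling of false positives is deliberately left out of this sketch.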

Typo corrections
Hi and thanks for all your corrections as you hover over me like the angel of typos. I recommend that you install AuFCL's script because it makes noticing typos much easier. This is my version, and it is installed as a subfolder of common.js for the time being, until it's debugged. It is called by the following code placed in the common.js:

// mw.loader.load('//en.wikisource.org/w/index.php?title=User:Mpaa/common.js/typoscan.js&action=raw&ctype=text/javascript'); //

My version distinguishes [] square brackets from {} curly braces by different color highlights. The procedure highlights typos in both the page and the main namespace, where I use it. In the main ns, it's easier to scroll through an article. When a typo is found, I open the page, where it's also highlighted. The script highlights most, if not all typos, but spell check is another matter. The most common spelling errors caused by poor scanning are already flagged by the scripts. But there are other words whose meaning the scan alters while still producing a valid English word; these are problematic but few and far between.

Finally, I found that my most common errors overlooked, are the curly brace "{" instead of a parenthesis "(", followed by the caret "^". Volumes 1 to 5 were riddled with typos and errors, but this diminished gradually to almost none. — Ineuw talk 03:27, 1 February 2016 (UTC)
 * Thanks. I prefer to use pywikibot directly. It also opens a page in the browser in case of need. I am scanning PSM against the most common errors. E.g. one of the reasons why "{" instead of a parenthesis "(" is diminishing might be that I scanned against them :-) ? (see https://en.wikisource.org/wiki/Special:Contributions/MpaaBot). I noticed that the typos are very "volume-dependent". Now I am addressing ^.--Mpaa (talk) 19:38, 1 February 2016 (UTC)
 * WOW! At least you can say that I am consistent in my errors and their oversight. — Ineuw talk 18:26, 2 February 2016 (UTC)

... and ellipsis
Hi. I see, after doing some replacements, that your bot was putting in Template:... rather than &hellip; I am not sure whether there was a request for that, or what, as I wasn't aware that it was a practice that we were looking to undertake a stylised look-alike, rather than utilise the respective character. — billinghurst  sDrewth  04:41, 15 February 2016 (UTC)
 * I didn't start to use it, but after it became the de-facto standard for this work, I took care to align everything with it. See Index_talk:Hunger_(Hamsun).djvu.— Mpaa (talk) 18:50, 15 February 2016 (UTC)
 * I can align to … if it is preferred.— Mpaa (talk) 20:07, 17 February 2016 (UTC)

Typos of PSM
Greetings and salutations, or in another word, Hi.

Thanks for checking and correcting typos in my favourite project. Would you consider to check some articles randomly in volume 44, and let me know your findings? I ask because it would help me to further analyze AuFCL's script, and ask him to make modifications if need be. I recently proofread and scanned V44 for typos, before listing it for validation.

Once AuFCL's script was running, I undertook to check every volume (currently checking volume 16), from the main namespace because it's easy to scroll through the articles. The results (to me) were interesting because it made me realize how many typos the proofreaders (meaning me), and the validators overlooked. So far I completed typo check of volumes 1 to 15, but spell check is another matter. Did fix a number of spelling errors, seen while fixing typos, but I am sure that there are many more. — Ineuw talk 20:57, 14 February 2016 (UTC)


 * I wouldn't mind helping. If you have a list of the most common errors, or regexes to detect them, send it to me. It would be very quick for me to scan against it.— Mpaa (talk) 21:14, 18 February 2016 (UTC)


 * My sincere apologies for the confusion and misunderstanding I caused. I am doing fine with AuFCL's tool and don't need help. Would never ask anyone to undertake such a task, since I feel that it's my "responsibility." I meant, whether you would be inclined to randomly check PSM V44 with your pywiki tools to see if I missed anything, and whether there is room for improvement. — Ineuw talk 03:42, 19 February 2016 (UTC)
 * Without knowing what to look for, pywiki tools are lost ... they need hints or clues.— Mpaa (talk) 17:47, 19 February 2016 (UTC)

Author:Thomas Adams
is likely to stand for w:Thomas Adam and therefore should lose an "s"? -- Gymel (talk) 15:19, 8 March 2016 (UTC)
 * Good catch! There are a few "typos" in the printed book itself, one of which is this, as well as alphabetic order of names. This definitely should be fixed. (Missed the fact that there wasn't an "s". Sorry.) Humbug26 (talk) 16:38, 8 March 2016 (UTC)

Need help in Bengali Wikisource
Hi, I need help with Bengali Wikisource. We had one Index file with proofreading completed but with 2 missing pages (a 200-page book). Now we have found a scan of the book with all pages and uploaded it to Commons. The missing pages were 12 and 13, so the total is now 202 pages. I now need to shift/move 182 pages to their proper positions (for example 200-->202, 199-->201, 198-->200). Is there any pywiki script/tool for this? I ask because I have seen from the history that you have done similar work here. Thanks in advance for your help. Jayantanth (talk) 17:07, 8 March 2016 (UTC)
 * Moving the pages in Page ns is not an issue. I need the Index filename and the definition of what needs to move where. Better to use page positions in the DjVu file, not the book page numbering. So something like: Dnnn-Dmmm -> Dnnn+2-Dmmm+2. The tricky part is if you have already transcluded the work; I need to revamp an old script of mine there.— Mpaa (talk) 19:42, 8 March 2016 (UTC)
 * Thanks Mpaa, for the reply. I shall share all the problematic files. I just wanted to know how you do that. Is there any AWB custom module or script? If so, could you please share it? Jayantanth (talk) 16:33, 9 March 2016 (UTC)
 * I use pywikibot scripts.— Mpaa (talk) 18:31, 9 March 2016 (UTC)
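
The "Dnnn-Dmmm -> Dnnn+2-Dmmm+2" shift described above can be sketched as a small planning function. This is only an illustration of the ordering problem, not Mpaa's script: the function name `plan_page_moves` and the file name used in the usage note are hypothetical, the actual moves would still have to be carried out by a bot, and the sketch assumes the target titles outside the shifted range are free.

```python
def plan_page_moves(index_name, first, last, offset):
    """Return (old_title, new_title) pairs in a collision-free order.

    first and last are page positions in the DjVu file (not printed page
    numbers); offset is how many positions each page should shift.
    """
    if offset == 0:
        return []
    positions = range(first, last + 1)
    # Shifting upwards: start from the last page so that every target has
    # already been vacated. Shifting downwards: start from the first page.
    if offset > 0:
        positions = reversed(positions)
    return [
        (f"Page:{index_name}/{n}", f"Page:{index_name}/{n + offset}")
        for n in positions
    ]
```

For example, `plan_page_moves("Book.djvu", 12, 200, 2)` would list the move of position 200 to 202 first and the move of position 12 to 14 last, so no move ever targets a title that is still occupied.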

Bot is breaking index pages
Oh dear. Your bot is breaking pages. Hesperian 02:05, 24 March 2016 (UTC)
 * Any page where the table of contents field begins with a table is now broken, because the brace that initiates a table must be at the start of the line. Hesperian 02:11, 24 March 2016 (UTC)
 * Fixed them (hopefully). I think they were in the order of 15-20. Thanks for spotting that. Let me know in case I did not find all of them— Mpaa (talk) 19:14, 24 March 2016 (UTC)
 * Hi. This may or not relate to "breaking" but I have these three pages which show up on the index page as not proofread. Would you please tell me how to correct them (teach me how to fish)?


 * Page:Popular Science Monthly Volume 45.djvu/800


 * Page:Popular Science Monthly Volume 45.djvu/823


 * Page:Popular Science Monthly Volume 45.djvu/827


 * Thanks. — Ineuw talk 08:41, 25 March 2016 (UTC)
 * They are fine for me. Maybe usual caching ... (purge, null edit, etc. already tried I guess ...?).— Mpaa (talk) 16:58, 25 March 2016 (UTC)


 * Did it all, of course by this hour it's OK. Have you purged it today? Or, perhaps Mediawiki purged the cache? — Ineuw talk 00:35, 26 March 2016 (UTC)


 * Might I venture a theory? By modifying this it is possible you loaded up the job queue with requests (i.e. for each and every of the roughly 10,000 pages affected) which had to be processed before your purge was able to be acted upon? AuFCL (talk) 02:11, 26 March 2016 (UTC)


 * Moved topic to my talk page. — Ineuw talk 03:56, 26 March 2016 (UTC)

Maybe I'm misunderstanding this, but does https://en.wikisource.org/w/index.php?title=Index:1917_Dubliners_by_James_Joyce.djvu&curid=1173751&diff=6163499&oldid=6055949 look right to you? Dubliners was transcluded, I think. Outlier59 (talk) 23:44, 25 March 2016 (UTC)
 * Page:1917 Dubliners by James Joyce.djvu/5 was not marked with Category: Not transcluded when the bot made the edit, AuFCL marked it after the bot edit.— Mpaa (talk) 07:58, 26 March 2016 (UTC)
 * Pardon if I only confused matters further. I was unsure whether this was the correct thing to do; let alone retrospectively after MpaaBot had made its pass. The checker display does not appear to change, so I am not sure how useful an activity this is at this point in time in any case. AuFCL (talk) 08:07, 26 March 2016 (UTC)
 * You made the right tagging. The idea is to check that everything that needs to be transcluded is transcluded, and what is not, is done on purpose (this is expressed by assigning the page to 'Not transcluded').— Mpaa (talk) 08:10, 26 March 2016 (UTC)
 * Actually, even if it were, I am breaking down the work in small steps, and for now I only accept 'Without text' pages as acceptable Not transcluded pages. These cases will be handled later on. Feel free to mark it as transcluded=yes.— Mpaa (talk) 07:58, 26 March 2016 (UTC)

Filling Pages with OCR text
Hi. You have a script that might be able to help with something I posted on the help section of Scriptorium.

Much appreciated if you took a look. ShakespeareFan00 (talk) 17:38, 21 April 2016 (UTC)
 * I am afraid my script cannot help. It just fetches the text 'as is' from the pdf file.— Mpaa (talk) 18:30, 21 April 2016 (UTC)
 * OK do you know of a different script that may help? I've not got very far so I don't mind losing the few odd pages I've done if you are able to extract direct from the PDF. ShakespeareFan00 (talk) 18:32, 21 April 2016 (UTC)
 * No, what I can do is just save the page with the same text you find when clicking on the redlink.— Mpaa (talk) 20:29, 21 April 2016 (UTC)

Index transcluded tags
Hi, I noticed that the MpaaBot inserted the tag HERE and maybe on others. FYI, all PSM indexes between volumes 1 to 87 have been transcluded to the main namespace. Just thought to bring to your attention. — Ineuw talk 05:40, 4 May 2016 (UTC)
 * I asked for this to be done as I am slowly working through the validated and proofread works to check the transclusion status. For all sorts of works there have been errors and omissions in transclusions, and PSM has been better though not perfect in that regard. It is a slower maintenance task as we identify complete works that have not been transcluded, or missing pages, or purposefully marking pages that will not be transcluded. — billinghurst  sDrewth  07:11, 4 May 2016 (UTC)
 * And technically, that is true after the checks that were done by the recent run as MpaaBot. This page Page:Popular Science Monthly Volume 40.djvu/742 is not transcluded (or marked as Not transcluded), so the tag is correct.— Mpaa (talk) 18:12, 4 May 2016 (UTC)


 * Knowledge liberates, so thanks for the explanation. Should have known that there is a reason. :-) — Ineuw talk 06:31, 5 May 2016 (UTC)


 * Found where two images were supposed to be, and inserted them in their respective main namespace articles. When the tagging of PSM is completed, please let me know and I will deal with them.
 * Up to vol 87 is OK. The only one missing is Index:Popular Science Monthly Volume 26.djvu, due to the TOC, which I didn't know how to handle.— Mpaa (talk) 18:29, 5 May 2016 (UTC)


 * Sorry, but I don't understand the TOC problem. Every Volume is laid out the same. However, they are inverse transclusions — that is, they are defined in the Main namespace and then referred to by the Index pages. I also compared Index 26 to other Index pages and they are laid out identically. — Ineuw talk 21:52, 5 May 2016 (UTC)
 * See at the end of the index.— Mpaa (talk) 06:06, 6 May 2016 (UTC)


 * I marked them as without text, hoping this satisfies you. If I was wrong, so be it, and anyone who wishes to proofread them, they are welcome to do so. At the beginning of this undertaking, I searched for TOC's, and found some volumes which had a page or two, but most none, so I designed my own TOC's. I wasn't going to duplicate them. Currently, I have another stored Vol 26 .JP2 copy downloaded from IA, which has no advertisements and no TOC's. About the advertisements, in later volumes one can find hundreds of duplicated ads. Those that were possible to clean up I cleaned and uploaded them. — Ineuw talk 19:55, 6 May 2016 (UTC)
 * Is this still needed: Popular Science Monthly/Volume 26/Advertisements? — Mpaa (talk) 10:49, 7 May 2016 (UTC)

Mass populate
I note pp. 315-334 of Index:UK Traffic Signs Manual - Chapter 8 - Part 1 (Traffic Safety Measures and Signs for Road). Designs 2009.pdf seem to be the same index as pp. 210-299 of Index:UK Traffic Signs Manual - Chapter 8 - Part 2- Traffic Safety Measures and Signs for Road Works and Temporary Situations) - Operations 2009.pdf

I.e.: Page:UK Traffic Signs Manual - Chapter 8 - Part 1 (Traffic Safety Measures and Signs for Road). Designs 2009.pdf/315 is identical to (except page numbers)

Page:UK Traffic Signs Manual - Chapter 8 - Part 2- Traffic Safety Measures and Signs for Road Works and Temporary Situations) - Operations 2009.pdf/212 and so on from there...

Any chance of a semi-automated populate from the former to the latter? Thanks ShakespeareFan00 (talk) 00:09, 17 May 2016 (UTC)
 * I guess you have already done it. Or I am lost ...— Mpaa (talk) 19:21, 18 May 2016 (UTC)
 * Yes, already done. ShakespeareFan00 (talk) 18:09, 19 May 2016 (UTC)
 * However I did have another page population issue,

namely User:ShakespeareFan00/Sandbox/TSGRD2016... I'd already manually put some of the pages into the relevant positions but it looks like it could be done by some automated process. 18:09, 19 May 2016 (UTC)

Pre-editing and the preparation of PSM pages
Hi,

Please proofread ten or so pages of PSM, which you edited/prepared for proofreading. Please select pages that contain hyphenated words and references, and then tell me if it was worth doing what you did. — Ineuw talk 20:21, 27 July 2016 (UTC)
 * Sorry, I lost you ... Did not even get if I did something good or bad ... I guess bad ... If you give some examples, that might help, so I can take a look and possibly learn. And it would be fair to weigh some possible unlucky cases vs. the positive ones. I am pretty sure the benefits will win.— Mpaa (talk) 21:12, 28 July 2016 (UTC)


 * Much appreciate your efforts to help with the PSM pages by cleaning the &#xFFFD; characters, by adding the page headers, and indicating smallrefs in the footer. However, merging hyphenated words at the end of a row is forcing me to proofread according to your method, which is different from the methods developed by proofreading thousands of pages, and it slows me down.


 * Merging hyphenated words is incorrect because it shifts the text of the following line and I use the original to locate words in the text, akin to an X - Y coordinate.


 * I leave merging of hyphenated words to line wrapping after proofreading. Line wrapping by my Autohotkey macro, or pathoschild's proofreading script, identify end of line hyphenation. Since some words must remain hyphenated, I add a hyphen to the beginning of the second segment before line wrapping.


 * I don't bother with the reference tags until I proofread the page sequentially line by line, and this includes the references at the end of the page. Only then are the tags applied, and moved to where they belong.


 * The placement of in the text is another time consuming and confusing issue. I have to erase the word "Footnote" and restore the * because it's easier to notice in the text. Your system identifies only the first footnote. If there is more than one, then the remainder need to be tagged which means that two different methods of identifying footnotes are used. — Ineuw talk 21:06, 29 July 2016 (UTC)


 * Then you should have said "Please proofread ten or so pages of PSM, according to my current method, and ...". How am I supposed to know what you have or will have in mind as way of working to proofread now or in the future? Some of those volumes were done some time ago. I will stop doing that, but I guess there are not many volumes left untouched.— Mpaa (talk) 22:24, 29 July 2016 (UTC)


 * I can fix the . Just state what you want instead.— Mpaa (talk) 23:00, 29 July 2016 (UTC)


 * Don't waste your time, it's not worth correcting it. I wrote this in case you were planning to continue cleaning. Didn't check how far you got with the cleanup. As for proofreading some of your corrected pages, you can still do it for your future reference, as to what not to do. — Ineuw talk 04:09, 30 July 2016 (UTC)


 * Unfortunately, your script made other unacceptable errors. It deleted commas "," and periods ".", when they were before a "quotation mark". It also split end of line words which ended with "!", "?" and "'s" and forced these to the following row. I wish that you consider resetting the pages to the Thomasbot original. There are just way too many unnecessary errors to correct and are often missed. Just compare the originals to not-proofread pages. — Ineuw talk 21:42, 31 July 2016 (UTC)
 * Can you give some examples? So I can see what happened. I can restore it, it will take a while, even if I strongly discourage it as one needs to weigh pros and cons. How many errors out of how many fixes?— Mpaa (talk) 21:57, 31 July 2016 (UTC)
 * Nevermind, it does not really matter. Previous text should be there for Vol. 53. Let me know in case you want also the others back.— Mpaa (talk) 22:51, 31 July 2016 (UTC)
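
The line-wrapping convention Ineuw describes in this thread (end-of-line hyphens are merged when lines are unwrapped, unless the second segment has been marked with a leading hyphen) might be sketched as follows. This is a hedged illustration in Python only; it is not the Autohotkey macro, Pathoschild's script, or the bot's code, and the function name `unwrap_lines` is hypothetical.

```python
import re


def unwrap_lines(text):
    """Join wrapped lines, merging end-of-line hyphenation.

    Assumed convention: "exam-" + newline + "ple" becomes "example", while
    "well-" + newline + "-known" (second segment marked with a leading
    hyphen) keeps its hyphen and becomes "well-known".
    """
    # Marked case first: the hyphen is kept, the line break is dropped.
    text = re.sub(r"-\n-", "-", text)
    # Plain end-of-line hyphenation: drop both hyphen and line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Remaining single line breaks become spaces; blank lines that separate
    # paragraphs are left alone.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text
```

Because the merge happens only at unwrap time, the text stays aligned line for line with the scan while it is being proofread, which is the point Ineuw raises above.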

Rolland Life of Tolstoy ellipses
Please pardon my tardy reaction. Regarding this edit should I avoid use of &hellip; altogether in this work, or is the criticism limited to that one page? (I notice the style of this publication is to use either three or four full-stops in a series and perhaps leaving them as separate characters is the safest choice in any case.) I am currently aware of four other pages I have (possibly?) incorrectly changed and would like to know your view before changing them all back to separate dots or otherwise proceeding with the other pages. AuFCL (talk) 23:49, 3 August 2016 (UTC)
 * Hi. Yes, the style is three or four dots as suspension, but do not bother. I will make an alignment pass at the end of the work. It is very fast for me.— Mpaa (talk) 06:08, 4 August 2016 (UTC)
 * I just wanted to make sure I did not make the situation worse. (I was mainly working through reducing Category:Empty ref tag and proof/validating more as a side-task, so had not picked up on the "house style.") AuFCL (talk) 06:30, 4 August 2016 (UTC) Thank you for your indulgence. I have finished fooling around with Rolland'Tolstoy per my original intent, and await your review and corrections. (This came across far more cynically than I had intended at the point of composition. No offence intended—I plead tiredness!) AuFCL (talk) 11:13, 4 August 2016 (UTC)
 * No problems, hope the missing ref tags will disappear quickly.— Mpaa (talk) 19:15, 4 August 2016 (UTC)

question about "p.m." and "p. m."
Thank you for Mpaabot. I do not understand this edit:. The source says "8 p.m." and "10 p.m." and uses small caps. Is "8 p. m." better, or more standard? It takes up more space and varies from the source. Stakes are small here and I don't really care, but if the extra space is considered better then perhaps this task and the reasoning should be listed at User:MpaaBot. Yours in perfectionist precision, -- econterms (talk) 21:16, 11 October 2016 (UTC)


 * I can't remember. Best guess I can do right now is that there were several 'flavours' across different volumes and I picked one.— Mpaa (talk) 21:25, 11 October 2016 (UTC)


 * E.g. here .— Mpaa (talk) 21:31, 11 October 2016 (UTC)


 * My apologies for butting in, but in earlier volumes, the originals and  always included the space, and was always in small caps. From the very beginning, my intent was to be consistent, regardless of the style change made by various type setters. Unfortunately I haven't recorded it in the PSM Guide.


 * Am also aware that I place the comma consistently within the template enclosure. I did this because in editing mode, it seemed to me to be floating between the words. However, it makes no difference when in read mode. — Ineuw talk 16:35, 30 October 2016 (UTC)

Font Size
As responding to what you said, I have a question. I am already half way through the text, should I mimic the formatting for the rest of the way, or should I delete the previous examples from the pages before? Thanks! -Khu'hamgaba Kitap
 * You can leave it as is. They can be quickly fixed later, once the whole work is done.— Mpaa (talk) 13:09, 30 October 2016 (UTC)

SQL statement for sale - very cheap.
Greetings. I've written this SQL statement which extracts names of the proofreaders/validators of the Book of the Month/ Project of the Month, from the Page namespace. One has to replace the monthly project name common to all pages (no page number which would extract only a single page) and terminate it with the % sign which is the MariaDB/MySQL wildcard character. Beeswaxcandle mentioned that you are working on something like this. Interestingly, now, the results are also offered formatted as a wikitable.

This example is the list of contributors of "Tom Swift and His Airship.djvu" — Ineuw talk 09:02, 7 November 2016 (UTC)


 * Hi. I made a one-time query and gave it to BWC for his convenience. I am not active on this right now.— Mpaa (talk) 18:33, 7 November 2016 (UTC)


 * Thanks, I don't know if he used it, but will leave a message to read this page. — Ineuw talk 19:05, 7 November 2016 (UTC)

Spacing around dashes in Aaron's Rod
Please don't do this. I have reverted the edits of this kind that you just made. I deliberately left those spaces in to reflect how the text was printed: note that most dashes are unspaced, but those ones are. BethNaught (talk) 23:23, 7 December 2016 (UTC)
 * Sorry.— Mpaa (talk) 18:34, 8 December 2016 (UTC)

Removing the old pages from a deleted index file
Could you delete the old pages displayed on this Index:Life in Mexico vol 1.djvu? The original source file was deleted from the commons, but the pages showing on the index are still from the old book. If you look at the publisher at the bottom of Page:Life in Mexico vol 1.djvu/5 (Chapman and Hall) and compare it to the cover on the Index itself (Charles Little and James Brown, Boston) you can see that they are different. All pages are from the old copy and they can all be deleted. The contents of the proofread pages were saved. — Ineuw talk 08:04, 15 December 2016 (UTC)
 * Done.— Mpaa (talk) 19:33, 15 December 2016 (UTC)
 * Much thanks. Could I have removed the pages using SQL? — Ineuw talk 22:13, 15 December 2016 (UTC)
 * Do not know but I am inclined to think not possible.— Mpaa (talk) 18:27, 16 December 2016 (UTC)