User:SnowyCinema/QT.py/Workflow


 * How can we keep track of how much time it took for each stage of the process below? Is there a timer we can build?
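
One possible answer is a small context-manager timer. The following is a minimal sketch, not part of QT.py itself: the class name `StageTimer`, its methods, and the stage labels passed to it are all hypothetical, and it only measures wall-clock time within a single run.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raised an exception.
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def report(self):
        """Return (stage, seconds) pairs, slowest first."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


timer = StageTimer()
with timer.stage("Preparation"):
    time.sleep(0.01)  # stand-in for real work
with timer.stage("Transcription"):
    time.sleep(0.01)  # stand-in for real work

for name, seconds in timer.report():
    print(f"{name}: {seconds:.3f}s")
```

Re-entering the same stage name adds to its total, so a stage that is interrupted and resumed would still be counted once in the report.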

Preparation

 * How can we create or modify author pages?


 * How can we create or connect a Wikidata item for that author?


 * How can we create and maintain author disambiguation pages?
 * What if it needs to be in the mainspace? See also the general disambiguation section.


 * How can we find the best and most original scan of the work?


 * How can we create a versions page if necessary linking to other versions of the work?


 * How can we download the scan of the work?
 * What if it's from HathiTrust?
 * What if it's from Google Books?
 * What if it's from another site?


 * How can we upload the scan to Wikimedia Commons?


 * How can we crop, process, and save all images in the work?

Transcription

 * How can we take the OCR and initially correct likely scannos?
 * What if it was from Gutenberg? What corrections need to be made if so?


 * If a certain scanno or similar error needs mass-fixing throughout the text, how can the fix be applied quickly?
 * QT markup needs to be separated out first, see below


 * On that note, how can QT markup be explicitly identified by the parser, and be separated from the rest of the text if need be?
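
The two questions above could be handled together: split the text around markup spans, then apply the fix only to what is left. This is a rough sketch under a big assumption, namely that QT markup can be matched by a regular expression. The pattern `PLACEHOLDER_MARKUP` below is an invented stand-in, not QT's real syntax, and `fix_outside_markup` is a hypothetical name.

```python
import re


def fix_outside_markup(text, markup_pattern, wrong, right):
    """Replace whole-word `wrong` with `right`, leaving markup spans untouched.

    `markup_pattern` is whatever regex identifies markup; it is supplied by
    the caller because the real QT syntax is not assumed here.
    """
    word = re.compile(r"\b" + re.escape(wrong) + r"\b")
    parts = []
    pos = 0
    for match in re.finditer(markup_pattern, text):
        # Fix the plain text before this markup span, then copy the span as-is.
        parts.append(word.sub(lambda _: right, text[pos:match.start()]))
        parts.append(match.group(0))
        pos = match.end()
    parts.append(word.sub(lambda _: right, text[pos:]))
    return "".join(parts)


# Invented placeholder syntax, for illustration only.
PLACEHOLDER_MARKUP = r"<<[^>]*>>"

sample = "Tbe cat sat. <<Tbe>> And tbe dog; Tbe end."
fixed = fix_outside_markup(sample, PLACEHOLDER_MARKUP, "Tbe", "The")
print(fixed)
```

The replacement is case-sensitive on purpose, so a lowercase variant of the same scanno would need its own pass.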


 * How can we quickly, and with reasonable accuracy, split the OCR content page by page before beginning transcription?
 * What if the text layer was from Gutenberg and pages are listed?
 * What if it's from Gutenberg and they aren't listed, so a match and split method needs to be employed?
 * What if it's from IA?
 * What if it's from Google Books itself (strongly discouraged)?


 * How can we properly implement a page list?
 * The system basically
 * So make a subpage of documentation on that


 * How can we deal with running header and footer automation? How can we set up rules for that?


 * How can we get a quick MediaWiki preview of what a page, chapter, or maybe the entire work will look like when transcluded?
 * Jump to parts of the verification process with QT parsing errors?

Verification

 * How can we identify and fix hyphenation inconsistencies?
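
A first pass at the identification half could simply look for words that appear both with and without a hyphen in the same text. The sketch below assumes plain ASCII-letter words and a hypothetical function name `hyphenation_conflicts`. It does not catch open forms such as "to day", and it leaves the decision about which form is correct to the transcriber.

```python
import re
from collections import Counter


def hyphenation_conflicts(text):
    """Map each hyphenated word to (its count, joined form, joined form's count)
    when both forms occur in the text."""
    words = Counter(
        w.lower() for w in re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    )
    conflicts = {}
    for word, count in words.items():
        if "-" in word:
            joined = word.replace("-", "")
            if joined in words:
                conflicts[word] = (count, joined, words[joined])
    return conflicts


sample = "To-day we walk. Today we rest. A well-known path, to-day."
conflicts = hyphenation_conflicts(sample)
print(conflicts)
```

The counts give a rough hint about which form the work prefers, which could feed into whatever rule is used for the fix.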


 * How can we identify and correct very likely scannos in order?


 * How can we keep track of different types of scannos, how likely it is that they are transcription errors, and report on them in the long run?


 * How can we identify and fix QT parsing errors?


 * How can we identify likely author names and work names used in the transcription for linking?


 * How can we pull existing data from works by the same author and use it to find potential hyphenation inconsistencies against those works only?

Transclusion

 * How can we upload our work images to Commons?
 * Use descriptive file names based on labels given within the work
 * Or DEFAULT file name is  + iterative number, if no label is given
 * Automatically insert description as "This image was cropped from  () by ."


 * How can we insert the correct information into the Index page?


 * How can we copy the correct content into the Page namespace?


 * How can we transclude the content into the mainspace?
 * Depends somewhat on the type of work...


 * How can we create and manage disambiguation pages and redirects?


 * How can we create a Wikidata item for both WORK and VERSION?
 * How can we connect those Wikidata items to the correct pages?
 * How can we identify possible duplicate Wikidata items based on similar titles and similar entered information, and what should we do when that is found?
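
For the title-similarity part of the duplicate question, the standard library's `difflib` may be enough to flag candidates for a human to check. In this sketch the candidate list is assumed to have been fetched already as `(item_id, label)` pairs. The IDs shown are placeholders, the function names are hypothetical, and the 0.85 threshold is an arbitrary starting point, not a tested value.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(title.lower().split())


def likely_duplicates(new_title, candidates, threshold=0.85):
    """Return (item_id, label, score) for candidates similar to new_title,
    best match first."""
    target = normalize(new_title)
    hits = []
    for item_id, label in candidates:
        score = SequenceMatcher(None, target, normalize(label)).ratio()
        if score >= threshold:
            hits.append((item_id, label, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Placeholder IDs and labels, not real Wikidata items.
candidates = [
    ("Q-EXAMPLE-1", "The  Secret Garden"),
    ("Q-EXAMPLE-2", "A Secret Life"),
    ("Q-EXAMPLE-3", "The Secret Gardens"),
]
hits = likely_duplicates("The Secret Garden", candidates)
for hit in hits:
    print(hit)
```

Title similarity alone will produce false positives for different editions and unrelated works with the same name, so any hit should only prompt a manual comparison of the other entered information.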


 * How can we add a list of poems/short stories/articles as works from a collection to an author page?


 * How can we be notified remotely if the transclusion throws an obvious error?
 * Maybe a Discord bot?

Review

 * How can we quickly review all automated transclusion edits, especially the ones that could be wrong?
 * Review by task # with a WS gadget


 * How can we identify if the Wikidata item has a Wikipedia page to give our new Wikisource transcription free advertising?


 * How can we also speed up the process of adding to New texts? (IT IS AGAINST THE [UNSAID] RULES TO USE A BOT TO ADD TO NEW TEXTS DIRECTLY, but you can autogenerate the text and add it manually with your main account)
 * Easy peasy.

Retrospection

 * How and where can we properly log the actions performed during this job, when and where they were done, and what the revision code is?
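
One simple option for the "how" is an append-only log with one JSON object per line. The sketch below is only an illustration: the field names (`job`, `action`, `page`, `revision`, `timestamp`), the function `log_action`, and the example page titles are all assumptions, and it writes to an in-memory buffer where a real job would presumably use a file.

```python
import io
import json
from datetime import datetime, timezone


def log_action(stream, job_id, action, page, revision=None):
    """Append one action record as a JSON line and return the record."""
    entry = {
        "job": job_id,
        "action": action,
        "page": page,
        "revision": revision,  # None if the edit did not produce a revision
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    stream.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


buffer = io.StringIO()  # stand-in for an open log file
log_action(buffer, 1, "create_index", "Index:Example.pdf")
log_action(buffer, 1, "create_page", "Page:Example.pdf/1", revision=12345)

entries = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(len(entries), entries[-1]["page"])
```

Because each line is independent, a job that is aborted partway still leaves a readable log of everything done up to that point.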


 * How and where can data on the transcription, not the bot task, be logged?
 * That's separate from the job, because the job will be an attempt which could have to be aborted.
 * A transcription could also be postponed or aborted.
 * Data on the work itself but also STATISTICS on how the transcription was done (time it took, time each element took, page-by-page estimation, length of each page, number of pages, etc.) that can be reviewed to see general and specific performance
 * How long pages took to proofread, compared to the amount of change that had to be made to the original, barely corrected OCR text


 * How can we find and add probable scannos collected during this session to the primary collection list (see Transcription)?


 * What else can we learn from this transcription project? How else could the software or workflow of the QT system be improved?
 * Are there any scannos not commonly occurring, or that cause other problems, that might need to be removed from the list?


 * Is there any way this transcription experience could help build better QT documentation?