Wikisource:Proposed deletions/Archives/2013-06

=Kept=

The Screaming
=Deleted=

Book of Common Prayer chapters

 * Note that the title page and about 5 sections remain. Those 5 sections contain content that was not specifically included in the deletion discussion. Jeepday (talk) 11:09, 6 May 2013 (UTC)

Pages in category "Index - Text Layer Requested"
{{closed|Delete, no objections to recreation with quality scans. |text= The following 47 pages are in this category, out of 47 total.

OCR failure details
I've gone 12 rounds with all 47 source files for the above-listed Indexes in hopes of embedding a text-layer in each, one way or another. Sadly, none of the source files can be processed by the free OCR routines out there, for one of two reasons (if not both):


 * 1) The original conversion to DjVu, from whatever type of parent file the work was originally secured in, collapsed all the layers (foreground, background & hidden) into a single layer. In short, this "confuses" the OCR engine: each page appears to be a single image rather than dozens & dozens of black words (text foreground) on a colored background. Poor or outdated sourcing prevents the [re]processing of the original file into a workable DjVu at this point in time; and/or
 * 2) The original conversion to DjVu, from whatever type of parent file the work was originally secured in, was made with a DPI (dots per inch) setting too low (between ~72 and ~151, sometimes even 200) for the free online OCR services to recognize any text characters at all. Poor or outdated sourcing prevents the [re]processing of the original file into a workable DjVu at this point in time as well.

Because these Indexes are based on what amount to flawed source files, and because most have 3 or more columns of text per page to begin with, the likelihood that anybody is going to transcribe them by eye is remote. In addition, if any pages normally found under any of the listed Index: pages were to be created in their respective Page: namespaces, the content would come up as "blank", and the Bot maintenance script used to detect and mark pages as either missing images or without text would mark all those pages as "without text" in error.

I propose the deletion of the listed Indexes (and any Pages that may have already been created under them) for the reasons given. -- George Orwell III (talk) 05:10, 6 April 2013 (UTC)


 * I have no issue with most of these nominated for deletion, and agree with the assessment, given the problems. However, is there no means of salvaging volume 2 of Gibbon's Decline and Fall of the Roman Empire?  Deleting the Index would not solve any problem, and would leave a gap in this seven-volume work.  I'd prefer to see someone locate a better copy and see if we can upload over the current problematic copy at Commons. If the DjVu file is replaced, then the Index can be kept and adjusted. --EncycloPetey (talk) 05:16, 6 April 2013 (UTC)
 * Yeah, that one and the three on Nullification are a sore spot here too. You'd have to ask the mad scientist who took the original 150 DPI, 13M DjVu upload to a 600 DPI, 86M DjVu overwrite why a text-layer was not generated at that point in time as well. In 2009, the original DjVu with "tan" coloring from 2007 was overwritten, so the "good" parent PDF converted at the "bad" DPI to DjVu is gone (the current DjVu seems to have been re-generated from the "flattened" 2007 original PDF, but I can't say that for sure). The pointer to Google Books is still valid, but starting around page 400, every other page is now clipped in half or worse. I'm open to solutions & working with others to resolve the issue rather than delete the Index:, but the clock is still ticking until then. -- George Orwell III (talk) 05:37, 6 April 2013 (UTC)
 * Problem averted - I restored the original DjVu and now there is a (poor) text-layer generated by a DjVu created at 150 DPI. If and when somebody wants to work on the Index:, we can revisit how to get a higher DPI and a superior text-layer at the same time. -- George Orwell III (talk) 06:55, 6 April 2013 (UTC)

As uploader of the Gazettes I do not object to them being deleted. I concede that proofing is not going to happen if OCR can't be provided. Hesperian 12:33, 6 April 2013 (UTC)

}}
 * Keep. Just because there is no text layer is no reason to delete. The images are at Commons and shouldn't be deleted, and any person may want an article and come and type it. PrP still works for validation. They are not taking up space, and they still have value. — billinghurst  sDrewth  12:22, 14 April 2013 (UTC)
 * Errr... maybe the nuances here have not been entirely digested (i.e. the title of this section, Pages in category "Index - Text Layer Requested", for starters). I totally agree with the statement that just because there is no text layer is no reason to delete, and all the rest... provided the work(s) remain properly status-ed on their Index: pages to indicate that proofreading may in fact begin. The issue here is that either your next potential contributor has already come along and returned the status to "Needs OCR" after PR'ing 2 or 3 pages the hard way from scratch, -or- the maintenance script(s) have poked a Page creation or two under the Index:, detected no text-layer dump taking place in the edit box, and automatically changed the status of the Index: to "Needs OCR" if it wasn't set to that already.  I've tried to get these files to OCR, but there are structural flaws (basically the DPIs are too low) within the source files themselves that prevent the current-day [free] OCR routines from properly recognizing any meaningful text (only chicken scratch, if anything at all). Instead of continuing with the circular logic of "let's keep" followed by "oh wait - they're bad" soon after, -or- running with the underhanded practice of generating phony, pasted or chicken-scratched text-layers like I've seen done in a few instances to date, I brought the issue here to be resolved once and for all, one way or the other (not both).  If you are willing to change all these Indexes to "ready to proofread", make note of the inability of the current source files that these Indexes are generated from to produce workable text-layers, and then lock & monitor those Indexes so nobody changes the status levels back again, I'd have no problem with "keeping" them around for as long as it takes to fully proofread each & every single one of them.  Otherwise, I see no justification to keep playing grab-ass with the proofreading status levels.
-- George Orwell III (talk) 14:23, 14 April 2013 (UTC)

Muktigāthā mahāmānavācī: pūrṇayogī Aravinda: jīvana āṇi tattvajñāna.
=Other=