Index talk:The National Gazetteer - A Topographical Dictionary of the British Islands, Volume 1.djvu

Project notes
Please take note of the following notes and guidelines when proofreading or validating this project. —Pipian (talk) 17:36, 11 February 2017 (UTC)

Image quality
The image quality is largely workable, but slightly out of focus. As a result, the DjVu plugin will sometimes threshold letters in such a way that they become difficult to decipher, even at high zoom levels. You may want to keep a tab open to reference the original Internet Archive page, which is slightly easier to read at high zoom levels.

OCR quality
The OCR quality can vary drastically from page to page, likely due to the small margins and slightly out-of-focus source material. You may want to keep the OCR button enabled for handling such pages.

Scannos to watch for

 * The original OCR tends to substitute "o" and "c" for "e" (and sometimes "s").
 * "6" is sometimes substituted for "5". Pay attention to whether the flag (the top of the digit) is curved ("6") or not ("5").
 * Watch for periods becoming commas, especially after the abbreviation "patron." (for "patronage")
 * The original OCR adds a space at the end of every line. Remember to remove these at the end of a paragraph.
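The last two checks in the list above lend themselves to a quick script. What follows is only a minimal sketch, not part of the project tooling: the function names are hypothetical, and it assumes that stripping trailing whitespace from every line (not just at paragraph ends) is harmless, since wikitext treats a line break inside a paragraph as a space.

```python
import re


def strip_trailing_spaces(text: str) -> str:
    """Remove the spaces (or tabs) the OCR leaves at the end of each line."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def flag_patron_commas(text: str) -> list[int]:
    """Return 1-based line numbers containing "patron,".

    These are only candidates for the period-to-comma scanno; each hit
    still needs to be checked against the page image.
    """
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if re.search(r"\bpatron,", line)
    ]
```

The comma check only flags lines rather than rewriting them, because a comma after "patron" is not necessarily an error.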

Abbreviations
This work makes extensive use of abbreviations, which should be correctly annotated with the {{abbr}} template. Below is a non-exhaustive list of common abbreviations and text substitutions that may speed up your proofreading if applied before starting to read/proof the text:


 * The abbreviations noted on page 1, as well as their plurals (* in the following table marks plural abbreviations not yet observed in the source text through page 5, and some notes are made to abbreviations that may need to be carefully ordered when doing automated search/replace):


 * Being a gazetteer of the British Isles, some colloquial abbreviations of county names may be employed (without periods):


 * Several assorted other abbreviations see use within the text:

The * character
As noted in the list of abbreviations on page 1, the "*" character is sometimes employed to note a living with a parsonage and glebe. This should be noted by including it in an appropriate {{abbr}} template, such as in:

{{abbr|*|, with parsonage and glebe,}}

or like:

{{abbr|vic.*|vicarage, with parsonage and glebe,}}
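For anyone applying the abbreviations by automated search/replace, the following is a rough sketch of one way to do it. The function name and the table are hypothetical. Only the expansions for "patron." and "vic." and the wording for "*" are taken from these notes; the other expansions are assumptions for illustration and should be checked against the list on page 1. Longer abbreviations are tried first, and text that already follows a "|" is skipped so that a second pass does not re-wrap existing templates.

```python
import re

# "patron." and "vic." are documented in these notes; the rest are assumed.
ABBREVIATIONS = {
    "patron.": "patronage",
    "vic.": "vicarage",
    "tnshp.": "township",  # assumed expansion
    "par.": "parish",      # assumed expansion
    "co.": "county",       # assumed expansion
}
STAR_SUFFIX = ", with parsonage and glebe,"


def wrap_abbreviations(text: str, table: dict = ABBREVIATIONS) -> str:
    """Wrap known abbreviations (and a trailing "*") in {{abbr}} templates."""
    # Longest keys first, so a longer abbreviation wins over a shorter one.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w|])(" + "|".join(re.escape(k) for k in keys) + r")(\*?)"
    )

    def replace(match: re.Match) -> str:
        abbr, star = match.group(1), match.group(2)
        expansion = table[abbr] + (STAR_SUFFIX if star else "")
        return "{{abbr|" + abbr + star + "|" + expansion + "}}"

    return pattern.sub(replace, text)
```

This depends on the OCR having been cleaned up first; an abbreviation that was mis-recognised will simply not be found.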

Entry names
Do not enter the name of the entry in all capital letters. Capitalize the entry name normally within the {{uc}} template instead:

ABBERWICK, a tnshp. in the par. of Edlingham and north div. of Coquetdale ward, union of Alnwick, in the co. of Northumberland, 3 miles W. of Alnwick. It is situated on the river Alne.
## Abberwick ##
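The entry-name rule can be roughed out in a few lines. This is only a sketch with a hypothetical function name: it assumes the entry name is the run of capitals at the very start of the entry, ending at the first comma or period, and it uses a naive title-casing that will need manual correction for unusual names.

```python
import re


def wrap_entry_name(entry: str) -> str:
    """Replace a leading ALL-CAPS entry name with {{uc|Name}}."""
    match = re.match(r"([A-Z][A-Z'\- ]*[A-Z])(?=[,.])", entry)
    if not match:
        return entry  # no leading all-caps name found; leave untouched
    name = match.group(1)
    # str.title() is a naive choice; review the result by eye.
    return "{{uc|" + name.title() + "}}" + entry[match.end():]
```

For example, `wrap_entry_name("ABBERWICK, a tnshp. in the par. of Edlingham")` gives `{{uc|Abberwick}}, a tnshp. in the par. of Edlingham`.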

Columns
Please mark the start of text in the second column with the comment, even if it interrupts a word. If the column breaks between words, you can place the comment on its own line, and Wikisource will join the two separated lines as if they were one line. For example:

Here is some text that describes a particular place in England.

Here is some text,

and here is some more text.

will format the same as:

Here is some text that describes a particular place in England.

Here is some text, and here is some more text.

Sections
As this is a gazetteer consisting of many entries, the entries will eventually be transcluded into pages of their own in the main namespace. Please take care to mark them using the section syntax, with the name of the entry as the section name, as in the Abberwick example above.

Where a place name is repeated across several entries, add the county name (and any other disambiguating locations in order of increasing importance) as a parenthetical to disambiguate:

ABBEY, a hmlt. in the par. of St. Dogmells, in the hund. of Kemess, in the co. of Pembroke, South Wales. It is not far from Cardigan.

ABBEY, a tythg. in the par. and hund. of Axminster, in the co. of Devon. It is not far from the town of Axminster.

ABBEY, near Hartland, in the par. and hund. of Hartland, in the co. of Devon, 46 miles W.N.W. of Exeter. The seat of Mrs. Orchard, a mansion built on the site of the abbey founded by Githa, wife of Earl Godwin, in the 11th century, and rebuilt by Geofrey Dinant in 1184: it passed at the Reformation to Serjeant Abbot.

## Abbey (Pembroke) ##
## Abbey (Axminster, Devon) ##
## Abbey (Hartland, Devon) ##
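The naming rule above can be expressed as a small helper, with a check for names that still collide on a page. This is a sketch only: the function names are hypothetical, and the section tag form is the one quoted in the discussion further down this page.

```python
from collections import Counter


def section_name(name: str, *locations: str) -> str:
    """Build a section name, e.g. ("Abbey", "Axminster", "Devon")
    becomes "Abbey (Axminster, Devon)". Locations run from most
    specific to least specific."""
    if not locations:
        return name
    return f"{name} ({', '.join(locations)})"


def section_tags(name: str) -> tuple[str, str]:
    """Return the begin and end section markers for a section name."""
    return (f'<section begin="{name}" />', f'<section end="{name}" />')


def duplicate_sections(names: list[str]) -> list[str]:
    """Return the section names that occur more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

Running `duplicate_sections` over the names collected from a page gives a quick list of entries that still need a parenthetical.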

Linking between entries
Because this is a reference work, entries will occasionally reference other entries explicitly, usually in small caps. Please link to these as subpages of The National Gazetteer: A Topographical Dictionary of the British Islands, by using the {{NatGaz lkpl}} template:

ABBAS-STOKE. See.
## Abbas-Stoke ##

Take care that the link lands on the correct entry, using the section naming rules in case there are multiple entries with the same name.

NOTE: There are many implied references between entries, simply because entries describe a location with respect to other places. Please do not link every place name; link only those that the text explicitly directs the reader to (e.g. through a See reference).

Local OCR can definitely improve (sometimes)
Concur about local OCR being an improvement, though I can note from experience that sometimes the improvement applies to only part of a page. Before re-running OCR, it may be worthwhile to copy the existing text out to an external text editor (Notepad or Notepad++), then run the OCR and choose one version or the other, or sometimes blend the two. — billinghurst  sDrewth  00:41, 13 February 2017 (UTC)


 * Agreed. The small text of the page does not obviously make one OCR or the other better.  Based on the patterns I'm seeing (recto pages are generally pretty good as is, verso pages are more consistently a mess) it may be worth retweaking an OCR configuration to specifically tackle the verso pages. —Pipian (talk) 22:24, 13 February 2017 (UTC)

Suggest change to link template name
Generally we have been using the nomenclature "... lkpl" (link plain) for article to article (internal) linking of a work, and "... link" for fully qualified, so from outside the work to an article component. Have a poke at Category:internal link templates.

Also note that we have a couple of master templates to make these things easier: Template:Authority/link and Template:Authority/lkpl. — billinghurst  sDrewth  00:45, 13 February 2017 (UTC)


 * Yeah, I suspect lkpl will be more helpful than link. The EB1911 links were where I first saw the idea, but I don't anticipate this will be used externally as much as the EB1911 ones are. So in that sense, lkpl may be the better option to follow. I'll take a closer look tomorrow. —Pipian (talk) 22:24, 13 February 2017 (UTC)


 * Template:Authority/lkpl seems to work well enough, so I've added Template:NatGaz lkpl, though I haven't deleted Template:Natgaz link (yet) —Pipian (talk) 16:39, 18 February 2017 (UTC)

How do you propose to present
Worthwhile getting this question in early. How were you thinking to present the finished product? In a section by section approach per little article, where we end up with (A) lots of mini-pages "work/Town A, work/Town A junction, work/Town A railway station"; or were you looking to (B) present in clumps, either as "work/B, work/C ..." or "work/Ba, work/Bb, work/Bc ..."?

Before you answer that, can I ask how you envisage the end product, and its use xwiki?

If you are looking to link from enWP article to article, or to link to parts in Wikidata, then the mini-pages approach works well, and is how we generally do our biographical and encyclopaedic works. To do that you need to get section markers in place, and they need to be uniquely distinguishable to be extractable, e.g. <section begin="work/Town A" /><section end="work/Town A" /> ... <section begin="work/Town A junction" /><section end="work/Town A junction" />, and do include the " ".

If it is ultimately to just be a collected and presented work, that may have some anchors placed into it (either by manual addition or linking to a page number), and available as a reference, then the clumped approach will work fine.

PS. Depending on how you see this transcription working, you may wish to create a project to capture volunteers, field questions and house a central repository for the style guide. It is how we managed the DNB. — billinghurst  sDrewth 


 * I'm still working through a few early pages to make sure I've covered the bases here (and to tweak the long-term style that will apply equally to volumes 2 and 3), but by and large, the entries cover populated places (though this does have exceptions). I assumed that mini-pages would work well for exactly the reason you state (the description for various British places can be linked through Wikidata, which isn't possible with the clumped approach). That said, there will probably need to be a multi-tiered clump approach for people who approach the gazetteer directly from Wikisource (much like how the Encyclopedia Britannica clumps by volume, then by three-letter range in a table, finer in the next tier of pages, and finer still in the tables in those pages).


 * That said, I can definitely see this spawning far more mini-pages than the EB (with 81 entries in the first 5 pages alone), so the relatively short article size may be a strike against a mini-page approach for most articles (but hardly all; contrast Sheffield, which gets parts of 4 pages in Vol. 3 (437–440)).


 * So I think there's value in both options, and it could go either way, though I'm partial to the mini-page approach myself (It's why I advised that users include county name, and, if needed, more specific location information, to disambiguate otherwise identical names, using the -style section names)


 * I do want to get to the point of bringing in volunteers owing to the sheer size of the work (it takes me about 30-45 minutes per page owing to text size and properly proofing/reflowing the text between entries, and there are 2700 pages across all three volumes, so it's far outside the scope of one person), but I want to make sure that a first draft of the style is pretty well set in place before I do so. —Pipian (talk) 22:24, 13 February 2017 (UTC)

How do you think that the transcription will progress?
(Last of the series of initial questions.) Can you give me an idea of how you think the transcription process will work, whether you have initial milestones, and whether there is any intended priority use of components from the work? Your answers will shape my next set of questions about how we can best look to assist. — billinghurst  sDrewth  01:15, 13 February 2017 (UTC)


 * At the moment I don't have any proper milestones set up, though the rough plan was as follows:


 * 1. Finish a couple pages to my satisfaction to be confident that the above draft of notes/guidelines provides pretty good coverage (e.g. have I had the opportunity to see most of the abbreviations that will be encountered?) I'm getting pretty close to that at the moment, but was thinking I might tackle one or two more pages to be sure.


 * 2. Ideally try to rope more volunteers to help push the transcription at a faster pace.


 * 3. Once a workable chunk of the volumes is in play, start actually setting up proper landing spaces in the Wikisource namespace (no need to leave a ton of red links if it's not anywhere close to being transclusion-ready), and move on to automating cross-wiki links from there.


 * In terms of long-term goals, most of my thinking has oriented around my own possible external use-cases (tinkering with mapping old parishes/counties of England), rather than within Wikimedia. There's certainly plenty of room to add cross-links to Wikipedia though, and it's one of the best places I could see some proper cross-wiki benefits coming out of.  —Pipian (talk) 22:24, 13 February 2017 (UTC)

Leave some standard word and work for bots, or text replacement scripts, or work specific template
For the transcription process with your abbreviations, you may wish to consider a) the use of bots, or b) a refined use of TemplateScript (which we can set up), to populate the   parameter, or c) we create template:ngabbr to house all the abbreviations and tips.  Adding it manually over and over and over is just going to lead to errors, and take valuable time. — billinghurst  sDrewth  01:34, 13 February 2017 (UTC)


 * Yeah, the automation of that process is something that I think would be necessary to scale, so I'm all ears to any tools that can help with that, though they will depend heavily on cleaning up the OCRs first (or the right abbreviations won't be found). —Pipian (talk) 22:24, 13 February 2017 (UTC)