Wikisource:WikiProject Popular Science Monthly/Proofreading guide

Background information

 * This manual is focused on the proofreading formats used in the Popular Science Monthly (PSM) project. The information here specifically lays out the style for this project; information on templates and methods that are not used is omitted from this guide. Nevertheless, the information provided here may be of help in other projects.


 * This project began in the autumn of 2009. At the time of writing, in the autumn of 2014, five years later, 28 of the 87 volumes have been proofread. The additional volumes no longer follow the original premise of the publication, but they are also available for proofreading.


 * The elements of the project are standardized and consistency is enforced. This applies to the page layout, font sizes, and article headers.

Breakdown of the approach used to proofread

 * The volumes have no table of contents, so one was constructed by paging through each volume, collecting article titles and authors, tagging pages that contain images and tables, and proofreading the indexes located at the back of each volume.
 * The content of the volume indexes revealed which subjects need to be linked (anchored) to the index.
 * The collected titles and volume indexes are stored in 87 offline Microsoft Access databases, which are linked into a single master database. This database contains nearly 2,500 author names, identifies the monthly recurring article sections, formats the multipart article list and the main namespace article headers, and generates the author page headers and the volume indexes.

What was completed overall

 * Article title pages.
 * Index pages at the end of each volume.
 * Pages with images.
 * Pages with tables.
 * Table of contents.
 * Main namespace article pages.
 * Major categorization of the main namespace articles.

What needs to be done

 * Proofreading of pages.
 * Inserting overlooked tables.
 * Inserting overlooked images.
 * Replacing font size templates with PSM specific font templates.
 * Linking the volume index entry anchors to the titles.

Relevant namespaces, a short explanation

 * Index namespace - Container storing the individual pages of a book.
 * Page namespace - page by page storage contained by the Index container. It is where most of the work takes place, the proofreading of the pages.
 * Transclusion process - links the proofread pages and the Main namespace.
 * Main namespace - Assembled display of the transcluded pages from the Page namespace.

Hyphenation

 * Hyphenated words that are correct in themselves are left as is, since they reflect the typesetting style of the time.
 * A hyphen at the end of a line is often used to justify the text. Use your judgement to decide whether the word should remain hyphenated.
 * If the last word of a page is hyphenated, check the following page for the complete word. Enclose the first part of the word at the bottom of the page with the hyphenated word start template, and the second part at the top of the following page with the hyphenated word end template. The two parts then merge into the complete word when transcluded in the main namespace.
 * The abbreviated forms of the hyphenation templates are hws and hwe. Click to see this example: Pages 16 and 17.
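The merge described above can be sketched in Python. This is a minimal illustration, assuming the parameter order {{hws|first part|whole word}} and {{hwe|second part|whole word}}. The simulate_transclusion helper is hypothetical: it only models the end result (hws displays the whole word, hwe displays nothing), not how the Proofreading extension actually works.

```python
import re

# Matches {{hws|first part|whole word}} and captures the whole word.
HWS = re.compile(r"\{\{hws\|[^|{}]*\|([^|{}]*)\}\}")
# Matches {{hwe|second part|whole word}}; it contributes nothing when transcluded.
HWE = re.compile(r"\{\{hwe\|[^|{}]*\|[^|{}]*\}\}")


def hyphenation_markup(first_part: str, second_part: str) -> tuple[str, str]:
    """Wikitext for the bottom of one page and the top of the next."""
    whole = first_part + second_part
    bottom = "{{hws|" + first_part + "|" + whole + "}}"
    top = "{{hwe|" + second_part + "|" + whole + "}}"
    return bottom, top


def simulate_transclusion(page_bodies: list[str]) -> str:
    """Rough model of the main namespace: hws shows the whole word, hwe shows nothing."""
    text = " ".join(page_bodies)
    text = HWS.sub(r"\1", text)
    text = HWE.sub("", text)
    # Collapse runs of whitespace, as a browser would when rendering.
    return " ".join(text.split())


bottom, top = hyphenation_markup("exam", "ple")
print(simulate_transclusion(["a worked " + bottom, top + " follows"]))
```

With the two halves "exam" and "ple", the bottom of the first page carries {{hws|exam|example}}, the top of the next page carries {{hwe|ple|example}}, and the simulated main namespace text reads "a worked example follows".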

Single and double quotes

 * Use standard English typewriter double quotes "...." (ANSI 034) or curved quotes “....” (ANSI 147 and 148), but not guillemets «....» (ANSI 171 and 187).


 * Check for matching opening and closing quotes and close up the space between the marks and the enclosed text.
 * An occasionally used typographical style applies to a series of quoted paragraphs: each paragraph opens with a double quotation mark that is not closed at the end of that paragraph.
 * For single quotes, use the standard English typewriter single quote '....' (ANSI 039). Single quotes enclose text within, or in place of, double quotation marks.
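A rough quote checker for a single proofread paragraph, written as a hypothetical Python helper. It flags guillemets, unbalanced curved quotes, an odd number of straight quotes, and spaces just inside curved quotes. It will also flag the multi-paragraph style described above, where an opening mark is intentionally left unclosed, so its output is a prompt for review rather than a verdict.

```python
import re

# A space just inside curved quotes, e.g. "“ text" or "text ”".
INNER_SPACE = re.compile(r"\u201c\s|\s\u201d")


def check_quotes(paragraph: str) -> list[str]:
    """Return a list of possible quotation problems in one paragraph."""
    problems = []
    if "\u00ab" in paragraph or "\u00bb" in paragraph:
        problems.append("guillemets found; use double quotes instead")
    if paragraph.count("\u201c") != paragraph.count("\u201d"):
        problems.append("unbalanced curved double quotes")
    if paragraph.count('"') % 2 == 1:
        problems.append("odd number of straight double quotes")
    if INNER_SPACE.search(paragraph):
        problems.append("space inside curved quotes")
    return problems
```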

Typographic characters, ligatures, and symbols

 * In some volumes, symbols and characters are ignored by the OCR. These include the em dash (—), currency symbols ($ and £), the degree sign (°) used for temperatures, and the centered decimal point.
 * Check for italics in the text. Referenced publication names are always italicized.
 * Check for missing em dash (—, ANSI 0151) characters. The em dash is available on the advanced editor toolbar, can be added to the user's Charinsert preferences on request, or can be inserted with the -- template.
 * Check for ambiguous text, which may be incorrectly rendered scientific, technical, or currency symbols such as fractions, degrees '°' (ANSI 0176), the pound sign '£' (ANSI 0163), or centered decimal points '·' (ANSI 0183).
 * Check the letter pairs 'ae' and 'oe', which are most likely the ligatures 'æ' (ANSI 0230) and 'œ' (ANSI 0156). Whether a ligature is present can be judged from the article's subject matter.
 * Characters, symbols and ligatures can also be inserted by using their HTML equivalents. See References for the HTML and ANSI codes.
 * All the symbols mentioned above can be added to the "User" selection of the Charinsert gadget; just post a request in the Scriptorium/Help.
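The ANSI codes quoted above are Windows-1252 byte values, and each of these characters also has a named HTML entity. The short Python sketch below checks both against the standard library (the cp1252 codec and html.entities.codepoint2name), and adds a hypothetical helper that lists words containing 'ae' or 'oe' as ligature candidates. The candidate list will include many ordinary words such as "does", so it only narrows down what to compare with the scan.

```python
import re
from html.entities import codepoint2name

# Characters discussed above, with the Windows-1252 ("ANSI") codes given in this guide.
ANSI_CODES = {
    "\u2014": 151,  # em dash
    "\u00a3": 163,  # pound sign
    "\u00b0": 176,  # degree sign
    "\u00b7": 183,  # middle dot (centered decimal point)
    "\u00e6": 230,  # ae ligature
    "\u0153": 156,  # oe ligature
}

LIGATURE_CANDIDATE = re.compile(r"\b\w*(?:ae|oe)\w*\b", re.IGNORECASE)


def html_entity(char: str) -> str:
    """Return the named HTML entity for char after checking its ANSI code."""
    code = ANSI_CODES[char]
    if bytes([code]).decode("cp1252") != char:
        raise ValueError(f"ANSI {code} is not {char!r} in Windows-1252")
    return "&" + codepoint2name[ord(char)] + ";"


def ligature_candidates(text: str) -> list[str]:
    """Words containing 'ae' or 'oe' that may need to become æ or œ."""
    return LIGATURE_CANDIDATE.findall(text)


for ch, code in ANSI_CODES.items():
    print(ch, code, html_entity(ch))
```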

Fonts sizes and font templates

 * Font sizes larger than 100% are of no concern. Use any size that matches the original.
 * For font sizes smaller than 100%, the following templates were designed for the project because they include line heights proportional to the font size.


 * fs90 is used to enclose author names.
 * fs90/s and fs90/e are used to enclose a block of paragraphs and/or to span pages. When used to span pages, fs90/e is placed in the footer of the first page to terminate the block, and fs90/s is placed in the header of the following page to begin the new block. Because headers and footers are excluded from transclusion, the text in the main namespace is then enclosed by a single set of templates. Click on this link to see an example.
 * fs85 - 85% font size and 100% line height. Used exclusively for image captions and the subtitled sections of recurring monthly features.
 * fs75 - 75% font size and 95% line height. Used to diversify the font sizes of article subtitles.
 * fs70 - 70% font size and 90% line height. Used inline to match the line height of fraction templates and.
 * Named font templates are not used in the PSM project.
 * Link to the 100% and smaller font size and style comparisons table.
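Why the page-spanning fs90/s and fs90/e block works can be modelled in a few lines of Python. In this hypothetical sketch each page is a dictionary with header, body and footer text, and transclusion is modelled as joining the bodies only (the newline separator is an assumption). Each page is balanced on its own in the Page namespace, while the transcluded text contains a single fs90/s and fs90/e pair.

```python
PAGES = [
    {   # first page: the block opens in the body and is closed in the footer
        "header": "",
        "body": "{{fs90/s}}\nFirst paragraphs of the block",
        "footer": "{{fs90/e}}",
    },
    {   # following page: the block is reopened in the header
        "header": "{{fs90/s}}",
        "body": "remaining paragraphs of the block.\n{{fs90/e}}",
        "footer": "",
    },
]


def page_is_balanced(page: dict) -> bool:
    """True if the page, header and footer included, opens and closes the block equally often."""
    full = page["header"] + page["body"] + page["footer"]
    return full.count("{{fs90/s}}") == full.count("{{fs90/e}}")


def transclude(pages: list) -> str:
    """Headers and footers are excluded; only the bodies reach the main namespace."""
    return "\n".join(page["body"] for page in pages)


text = transclude(PAGES)
print(text.count("{{fs90/s}}"), text.count("{{fs90/e}}"))
```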

Titles

 * Article titles are set by templates. See list of title templates in the Reference section.
 * This page provides examples of the article title, author, subtitles and Roman numeral paragraph numbering.

Author names

 * The main title is followed by the author's name, for which there is no dedicated template. The name is enclosed in the small caps template, set to 90% font size, always with the fs90 template, and then centered on the page.

Secondary title font sizes

 * If there is a subtitle below the author's name, it is centered and set to 75% font size with the fs75 template.
 * If there is a secondary subtitle, center it and set it in a font size comparable to the original: 100%, or, if smaller, 85% with the fs85 template.
 * This page has one main title and five subtitles. Otherwise, article titles consist of one main title and at most three subtitles. Because the styles differ, there is good visual contrast even when the font size difference is less than 10%. New articles can start anywhere on a page.
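The nesting described for author names and subtitles can be sketched as a small Python helper that emits the wikitext. It assumes the small caps and centering templates are {{sc}} and {{center}} (standard Wikisource templates, although this guide does not name them) and uses fs90 and fs75 as described; the helper names and the placeholder text are hypothetical.

```python
def author_line(name: str) -> str:
    """Small caps innermost, then 90% font size, then centered."""
    return "{{center|{{fs90|{{sc|" + name + "}}}}}}"


def subtitle_line(text: str) -> str:
    """A subtitle below the author's name: centered, at 75% font size."""
    return "{{center|{{fs75|" + text + "}}}}"


def byline(name: str, subtitle: str = "") -> str:
    """Author line, followed by the optional subtitle line."""
    lines = [author_line(name)]
    if subtitle:
        lines.append(subtitle_line(subtitle))
    return "\n".join(lines)


print(byline("Author Name", "Subtitle"))
```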

Paragraphs

 * Unlike the original scan, proofread paragraphs are not indented. The exceptions are poems in which alternate lines are indented, and indented lists where inserting a table is not warranted. Two templates are available for such cases:
 * Use the gap template where there is a wide gap or indent in the text.
 * Use the spaces template where there is a short gap or indent in the text.
 * All titles numbered with Roman numerals are set in 90% font size with the fs90 template.
 * Use the {{tl|Dropped initial}} or {{tl|Di}} template to format an article's first letter.
 * The double height row template {{tl|Dhr}} is used where two or more empty lines separate paragraphs. The template also accepts a height specification: {{Dhr|4em}} between two sections produces a vertical spacing of 4em. Click to see this page by opening it in edit mode.
 * If the end of a paragraph is also the end of the page, terminate the page with the {{tl|nop}} template. This prevents the transclusion process from joining this paragraph to the following one. The template must be placed on its own line and must not be followed by any character or space.
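The {{nop}} rule is easy to check mechanically. A minimal Python sketch with hypothetical helper names: ends_with_nop tests that the page text ends with {{nop}} alone on its last line with nothing after it, and add_nop strips trailing whitespace and appends the template if it is missing.

```python
def ends_with_nop(page_text: str) -> bool:
    """True if {{nop}} is alone on the final line and nothing follows it."""
    lines = page_text.split("\n")
    return len(lines) >= 2 and lines[-1] == "{{nop}}"


def add_nop(page_text: str) -> str:
    """Terminate a page that ends a paragraph; nothing may follow {{nop}}."""
    body = page_text.rstrip()
    if ends_with_nop(body):
        return body  # already present; only trailing whitespace was removed
    return body + "\n{{nop}}"
```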

Paragraph titles

 * Paragraph titles of the CORRESPONDENCE sections are set in capital letters, centered, and enclosed with the fs85 template.
 * Paragraph titles of the Editor's Table use the same font size, but are italicized.

Paragraph spacing and separators

 * Where a line separates topics in the original, the separator is standardized as a rule 4em in length, padded before and after with Dhr.

Example: the end of one topic, a space, the 4em rule, a space, and the start of the next topic (header).

Poems

 * Poems, without exception, are wrapped starting innermost with the fs90/s fs90/e font templates, followed by the poem tags, and then enclosed in the block center/s block center/e templates.
 * An alternate way of achieving these settings is using  and terminating it with.
 * This template order is necessary because the font template's line height is applied to the contents only when it is the innermost template.
 * The block center/s template is the most versatile template for multiple paragraphs and page spanning.
 * The poem tags can't span pages. In poems that span pages, the tag must be closed at the last line of the poem on the page and opened anew on the following page.
 * Most poems begin with a double quote, which requires the Floating quotation mark (fqm) template to retain the proper centering of the poem.

 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."

 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
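The wrapping order for poems can be sketched as a Python function that builds the wikitext for a poem on a single page, nesting from the inside out: fs90/s and fs90/e, then the poem tags, then block center/s and block center/e. It assumes the tags are <poem> tags. Placing the font templates on the same lines as the first and last verse lines is a choice of this sketch, meant to avoid extra line breaks inside the poem; it does not handle poems that span pages or the fqm template.

```python
def wrap_poem(verse_lines: list[str]) -> str:
    """Wrap one page of verse: font template innermost, then <poem>, then block center."""
    if not verse_lines:
        raise ValueError("a poem needs at least one line")
    lines = list(verse_lines)
    # Innermost: the font template, so that its line height applies to the verse.
    lines[0] = "<poem>{{fs90/s}}" + lines[0]
    lines[-1] = lines[-1] + "{{fs90/e}}</poem>"
    # Outermost: block center keeps the poem centered on the page.
    return "\n".join(["{{block center/s}}", *lines, "{{block center/e}}"])


print(wrap_poem(["First line,", "second line."]))
```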

References, footnotes and endnotes

 * Use the smallrefs template in the page footer to render footnotes in small font. Footnote references are automatically numbered.
 * Footnotes that span pages require a named reference tag on the page where the footnote begins and a "follow" reference tag on the subsequent page(s). Click to see an example of a footnote spanning two pages.
 * In the main namespace footnotes are converted into numbered endnotes.
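A page-spanning footnote can be illustrated with a hypothetical Python helper that produces the two reference tags: a named <ref> on the page where the note begins and a <ref follow="..."> with the same name on the next page, each page carrying {{smallrefs}} in its footer. Naming the reference after the .djvu page number is only a suggestion of this sketch.

```python
FOOTER = "{{smallrefs}}"  # placed in the footer of each page with footnotes


def split_footnote(name: str, first_part: str, continuation: str) -> tuple[str, str]:
    """Return the <ref> for the page where the note begins and the <ref follow> for the next page."""
    begin = '<ref name="' + name + '">' + first_part + "</ref>"
    follow = '<ref follow="' + name + '">' + continuation + "</ref>"
    return begin, follow


begin, follow = split_footnote("p27", "The note begins", "and ends here.")
print(begin)
print(follow)
```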

OCR shortcomings

 * The OCR process has difficulty in distinguishing certain characters and commonly misreads the following:
 * Words beginning with "W" in particular are often preceded by a double quotation mark. Compare with the original.
 * Short words beginning with 'w' are occasionally garbled, as in 'w T here', which is supposed to be 'where'. Find these by searching for a lone ' w ' surrounded by spaces.
 * Occasionally, the lowercase 'h' is rendered as 'b'.
 * Words containing 'g' are problematic.
 * In words containing 'p', the 'p' is often rendered as 'jj'.
 * Uppercase 'N' is often rendered incorrectly.
 * Uppercase 'R' is often rendered as 'K', 'E', or 'B'. Spell check finds the error unless the misreading produces a valid word.
 * Ligatures.
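Several of these OCR errors can be located with regular expressions before comparing against the scan. A minimal Python sketch with hypothetical pattern labels: it reports a lone 'w' surrounded by whitespace, a double quote directly before 'W', and the sequence 'jj'. Legitimate text will sometimes match (a quotation that really opens with "W, for instance), so every hit still has to be checked against the original.

```python
import re

OCR_SUSPECTS = {
    "lone w, as in 'w T here'": re.compile(r"(?<!\S)w(?!\S)"),
    "double quote before W": re.compile(r'"W'),
    "jj, possibly a misread p": re.compile(r"jj"),
}


def scan_ocr(text: str) -> list[tuple[int, str]]:
    """Return (offset, label) pairs for suspicious spots, in order of appearance."""
    hits = []
    for label, pattern in OCR_SUSPECTS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label))
    return sorted(hits)
```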

Tags

 * When a page-specific code is required, the .djvu number of the page is used because it guarantees uniqueness; printed page numbers are neither unique nor accurate. If the code contains no spaces, the quotes around it can be omitted.

Section tags

 * The codes are made up of the following segments:


 * End of article begin and end section code segments:

E = End of article; 27 = .djvu page number


 * The article that follows on the same page uses the same code segments, but prefixed with 'B' to indicate the beginning section of the article.

B = Beginning of article; 27 = .djvu page number


 * Click to see an example of the above coding scheme in edit view of .djvu page 27/Page 17. The final results are visible on this transcluded page.
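The coding scheme can be sketched in Python, assuming that a section code consists only of the two segments shown above: the letter E or B followed by the .djvu page number. The helper names are hypothetical. The section tags follow the note in this guide that quotes can be omitted when the code contains no spaces.

```python
def section_code(kind: str, djvu_page: int) -> str:
    """E = end of the previous article, B = beginning of the next one."""
    if kind not in ("E", "B"):
        raise ValueError("kind must be 'E' or 'B'")
    return f"{kind}{djvu_page}"


def section_tag(action: str, code: str) -> str:
    """<section begin=...> or <section end=...>; quote the code only if it has spaces."""
    value = f'"{code}"' if " " in code else code
    return f"<section {action}={value} />"


def split_page(djvu_page: int, ending_text: str, beginning_text: str) -> str:
    """One page holding the end of one article and the start of the next."""
    parts = []
    for kind, text in (("E", ending_text), ("B", beginning_text)):
        code = section_code(kind, djvu_page)
        parts += [section_tag("begin", code), text, section_tag("end", code)]
    return "\n".join(parts)


print(split_page(27, "End of one article.", "Start of the next."))
```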

Single image FIS template layout for center, or offset images

 * George Orwell III's FreedImg/span template (FIS, the abbreviated form) makes displaying an image and its accompanying caption much simpler. The caption is part of the template.


 * In the offset floating mode the text flows unbroken around the template.
 * PSM uses only the following parameters; unused parameters are to be removed from the template.
 * The required parameters are the file name, the image width, and the float position.

Centered image examples

 * Stacked images with various primary and secondary captions & styles.
 * Full page width 500px image.
 * Mixed text style in caption.

Offset image examples

 * Note how the text abuts the opening and closing brackets of the template, providing seamless flow without a paragraph break.


 * Image offset to the right, less than half the page width (215px), with mixed text style.
 * Image offset to the left half page width and a short caption.
 * Image offset to the left half page width and a long caption.

One image with two captions

 * The image width and the combined caption width must equal the overall table width.

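 * Table layout: a centered table (align=center width=500). The first row holds the frameless image spanning all three columns (colspan=3). The second row holds two caption cells (width=210px) separated by a spacer cell (width=10px).
 * The layout is shown with two caption styles: centered captions, and hanging indent justified text captions.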

Click to see an example of a single image with two short captions.

Two separate images with two captions
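 * Table layout: a centered table (align=center width=500) with the two frameless images in cells of width=210px separated by a spacer cell (width=20px), followed by their captions.
 * The layout is shown with two caption styles: centered captions, and hanging indent justified text captions.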

Click to see an example of three images side by side.

Tables

 * As an introduction, this page contains a written conversation about a particular advanced table design: Page talk:Mexico, Aztec, Spanish and_Republican, Vol 2.djvu/178

Formatting codes declared in the table header (which affect the whole table)

 * For a complete reference, see the List of table style shorthand codes.


 * mc = centers the table on the page (margin:0 auto 0 auto;). The margin values are listed clockwise: top, right, bottom, left.
 * ar|al|ac|aj = aligns the contents of all cells (right, left, center, or justified). Base this table-wide alignment on the content alignment of the majority of columns.
 * bc = border collapse. If omitted, cell borders are double.
 * border/border=1 = single line border of every cell.
 * bt|br|bb|bl = single line border on the top, right, bottom, or left side of the table.
 * |-ac|bb = when declared on a table row indicator, centers the content of the row's cells and draws a single line border along the bottom of the row.


 * sm90|lh12 = font size of 90% and the matching line height of 120%.
 * sm85|lh11 = font size of 85% and the matching line height of 110%.
 * pt.5|pb.5 = cell padding of .5em top OR bottom.
 * ptb.5 = cell padding of .5em top AND bottom.
 * pr1|pl1 TO pr5|pl5 = cell padded 1em to 5em (increments of 1em) on the right OR left of the cell.
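To make the shorthand concrete, the Python sketch below expands a '|'-separated list of the codes described above into inline CSS. The declarations follow the meanings given in this guide; the exact CSS the real templates emit may differ, and the 1px solid border used for bt, br, bb and bl is an assumption. Codes the sketch does not know, such as border, raise an error rather than being guessed.

```python
import re

# Codes with a fixed meaning, per the list above.
FIXED = {
    "mc": "margin:0 auto 0 auto;",
    "ar": "text-align:right;",
    "al": "text-align:left;",
    "ac": "text-align:center;",
    "aj": "text-align:justify;",
    "bc": "border-collapse:collapse;",
    "bt": "border-top:1px solid;",  # border width and style are assumptions
    "br": "border-right:1px solid;",
    "bb": "border-bottom:1px solid;",
    "bl": "border-left:1px solid;",
}
SIDES = {"t": ["top"], "b": ["bottom"], "r": ["right"], "l": ["left"], "tb": ["top", "bottom"]}


def expand(codes: str) -> str:
    """Expand codes such as 'mc|bc|ac' or 'sm90|lh12' into CSS declarations."""
    css = []
    for code in codes.split("|"):
        if code in FIXED:
            css.append(FIXED[code])
        elif m := re.fullmatch(r"sm(\d+)", code):
            css.append(f"font-size:{m.group(1)}%;")
        elif m := re.fullmatch(r"lh(\d+)", code):
            # lh12 means a line height of 120%
            css.append(f"line-height:{int(m.group(1)) * 10}%;")
        elif m := re.fullmatch(r"p(tb|t|b|r|l)(\d*\.?\d+)", code):
            for side in SIDES[m.group(1)]:
                css.append(f"padding-{side}:{m.group(2)}em;")
        else:
            raise ValueError(f"unknown shorthand code: {code}")
    return "".join(css)
```

For example, expand("mc|bc|ac") gives margin:0 auto 0 auto;border-collapse:collapse;text-align:center; and expand("ptb.5") gives padding-top:.5em;padding-bottom:.5em;.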


 * The top row of the template is the header row for centered column titles, with top and bottom cell padding.
 * The second row is the first data row, padded at the top.
 * The third row has no padding.
 * The fourth row is the last table row, padded at the bottom.

Table layout for tables with single line borders in various font sizes, with matching line heights:
 * standard 100% font size with the matching standard line height of 140%;
 * 90% font size with the matching line height.

An analysis of table design
Click this link to see this page.

Dictionary and spell check

 * Using the spell check of the browser is sufficient.
 * Bad spelling in the original is indicated by [sic]. The sic template is invisible in read mode, but in edit mode indicates that a previous editor was aware of the error.
 * Outdated, but correct spelling, is left as is.
 * Spelling variations of English words are accepted as they are.
 * An alphabetic list of archaically spelled words and proper names collected from the volumes can be found on this page.
 * An alphabetic list of archaic spellings and proper names collected from Volume 1 can be found on this page, although the list needs to be cleaned up.
 * Recommended word reference: Wiktionary.

Comments on the contents
For students of science, technology and social history, the publication provides a fascinating view through the window of the printed word, and what a view it is. The articles promulgated by the great minds of the 19th century, with the depth and diverse range of subjects they cover, are a mine of pure gold. The language, the terminology, and the spelling of the day, coupled with an occasional tone of condescension toward the reading audience, enhance the experience.

The publication aimed to reach a wide audience by disseminating information and publicizing issues of wide-ranging interest to the emerging 19th century middle class, thirsty for knowledge. Its novel approach, fusing the perceived desires of the public with a platform for the dissemination of academic thought, was well received.

Of great interest is the level of scientific knowledge and the social issues of the day. It is somewhat eerie to read that the then prevailing views on matters of public health, education, nutrition, employment, natural resources and pollution are still familiar in our time. The knowledge espoused ranges from the quaint to the surprisingly advanced, with many theories still being debated and formulated at the time of writing.

The typesetting
The increased confidence in the viability of the enterprise is palpable in the reviews proudly published on this page of the June 1872 issue. There was a positive reception by the press, the academic community, and the interested public. Subtle changes appear after this issue: the typesetting style is progressively streamlined and displays increased professionalism.

The composition improves progressively, and the payoff for the Wikisource proofreader is the reduced frequency of typographical embellishments. Quotes, italics and em dashes appear no more than a couple of times per page, and that's good news. Of course there are some extreme exceptions, such as idiosyncratic writing in which the word "practical", enclosed in quotes, appears nine times on a single page.

Anyone who has spent time at a daily paper, observing Linotype machines in operation amid typesetters and proofreaders at work, will feel a sense of amazement on considering that these pages were set manually, character by character, space by space, block by block, into justified paragraphs, laid in reverse, and with very few errors.

Subject and style
When randomly sampling articles by topic, an interesting relationship can be discerned between an article's subject, its typographical style, and even its word count per page. Articles on morality, religion, and religious thought contain more typographical embellishments to emphasize their absolute, exhorting, admonishing and cautionary messages. This shows in an increased number of em dashes, double quotes, single quotes, italics, and capitalized text. There is a lot of rolling of the holy.

PSM article title templates

 * Project templates category page


 * This is the list of title templates designed specifically for the project. All template names begin with the letter "P". The multiple templates of similar names reflect changes in the original style. Many changes are minor, but separate templates also allow the use of different font styles and sizes when a greater range of web fonts is available, to match the original as closely as possible.

General templates

 * An attempt was made to use templates almost exclusively. HTML tags are kept to a minimum and used only where a template wasn't available or possible.

Changes in Wikisource

 * Recent changes implemented in the Wikisource Proofreading extension between the fall of 2014 and the spring of 2015, relating to font style, font size, and line height, had some minor effects on the templates referred to above.