Wikisource:Style guide/Orthography

Scope
English texts commonly contain characters from other Latin alphabets: letters not used in modern English but used in other European languages (e.g. ß), or diacritics (e.g. é, ö). Older texts can also contain characters once used in English but no longer understood by the average reader (e.g. þ, ð, ƿ, œ, ſ). Ligatures are also common in many older scans (e.g. ƈt, ﬃ), representing an archaic typography. This page provides guidance on their use. Non-Latin characters (Hebrew, Cyrillic, Chinese, etc.) are beyond the scope of this guide and should simply be reproduced using the appropriate non-Latin character set. A tooltip translation or interwiki link (e.g. to Wiktionary) may be desired but would be treated as an annotation.

Considerations
There are numerous considerations when deciding whether to use a particular letter form or to modernize the orthography/typography.

Phonological significance
Ligatures have no phonological significance. Other characters such as þ and œ have distinct phonological importance; i.e. there is a sound associated with the orthography that cannot be properly indicated using the modern English alphabet (e.g. þ does not equal "th" perfectly and œ (German ö) has no equivalent in modern English—although the sound does occur, for example in "earth").

Search engine compatibility
Search engines, such as Google, do not recognize most letter forms, including ligatures, as the modern English equivalents. For example, Google will not match "tact" and "taƈt". Search engine compatibility is a serious consideration, as the inability to find a text using common search engines reduces the accessibility of texts on Wikisource. Because it is most important that users can find the text displayed in the mainspace, lack of search engine compatibility in the pagespace is of less concern. Therefore, ligatures should never be added directly but may be added using templates which display the ligature only in the pagespace. There is some question, though, as to the usefulness of displaying pure ligatures that have no phonological value, and it is generally acceptable, if not preferable, to omit them.
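The mismatch is easy to reproduce with plain string comparison. The sketch below is only an illustration and does not model how any particular search engine indexes text. It uses Python's standard `unicodedata` module to show that the precomposed ligature ﬃ (U+FB03) matches "ffi" only after compatibility normalization (NFKC), while the hooked ƈ (U+0188) used on this page to write the ct ligature has no such decomposition and never matches "c". The helper names `naive_match` and `folded_match` are hypothetical.

```python
import unicodedata


def naive_match(query: str, text: str) -> bool:
    """Literal substring search with no Unicode folding (hypothetical helper)."""
    return query in text


def folded_match(query: str, text: str) -> bool:
    """Substring search after NFKC compatibility normalization of the text."""
    return query in unicodedata.normalize("NFKC", text)


ligature_word = "o\ufb03ce"  # "office" written with the single ffi ligature
hooked_word = "ta\u0188t"    # "tact" written with the hooked c (U+0188)

print(naive_match("office", ligature_word))   # literal comparison
print(folded_match("office", ligature_word))  # comparison after NFKC folding
print(naive_match("tact", hooked_word))
print(folded_match("tact", hooked_word))
```

Only the second comparison succeeds, which is why text entered with such characters cannot be assumed to be findable under its modern spelling.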

Consistency
When templates are used that display an archaic orthography in the pagespace but a modern orthography in the mainspace (e.g. Long s), an inconsistent result may occur in the mainspace where other archaic characters are transcribed directly (such as ð or ß). Furthermore, the line between modernizing orthography and modernizing spelling is arguable.

Clarity
The difficulty in distinguishing some letter forms from others (e.g. "ſ" and "f", especially in certain typefaces) should be taken into account. Additionally, many older texts mix the letters u and v, and sometimes their relatives w, y, and f, as well as the letters i, j, and g, because at the time of writing the letters had not established fixed sounds and were often seen as variants of the same letter. Although the phonological value of a character may not be certain, it is often relatively easy to determine whether a consonantal or a vocalic value was intended.

Understandability
Modern readers may have difficulty reading documents which use ſ. Because it is phonologically equivalent to "s", it may be desirable to display the modern orthography in the mainspace.

Historical value
Some users may find it useful to see the changes in orthography. This is lost when orthography is modernized.

Flexibility and future development
Templates make it simple to change the display of characters throughout the project and make finding characters easier. In addition, templates can be used to display a different character in the mainspace than in the pagespace, if desired (e.g. ls displays "ſ" in the pagespace and "s" in the mainspace). Finally, templates make it easy to implement technical improvements that allow user preferences or other forms of alternate display, if those become available in the future.
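The namespace-dependent display can be pictured with a small Python sketch. This is not how MediaWiki expands templates; it only mimics the described behaviour of the ls template. The function name `render` and the namespace labels "Page" and "Main" are assumptions made for the illustration.

```python
LONG_S_CALL = "{{ls}}"  # the template call as it would appear in the wikitext


def render(wikitext: str, namespace: str) -> str:
    """Show the long s in the pagespace and a modern s everywhere else.

    Assumption: the pagespace is labelled "Page"; any other label is
    treated like the mainspace.
    """
    shown = "\u017f" if namespace == "Page" else "s"
    return wikitext.replace(LONG_S_CALL, shown)


source = "Congre{{ls}}s"
print(render(source, "Page"))  # archaic form for proofreading against the scan
print(render(source, "Main"))  # modern form for readers and search engines
```

Because the choice is made in one place, changing the default display later means editing one function (or one template) rather than every transcribed page.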

Ease of use
Characters that are not in the modern English alphabet require extra work to enter but may be found in the box below the edit summary in edit mode or may easily be made into shortcuts. Templates are more work to add, but they may also be turned into shortcuts, or a regex script may be created to replace other, more easily typed characters. However, text that is full of templates is hard to read in edit mode and does not facilitate proofreading.
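A regex script of the kind mentioned above might look like the following Python sketch. It assumes a proofreader types the hypothetical placeholder `~s` (or pastes a literal ſ) wherever the scan shows a long s, and it rewrites both into a call to the ls template. The placeholder and the function name `to_template` are inventions for this example, not an existing Wikisource convention.

```python
import re

# Matches either the hypothetical typed placeholder "~s" or a literal long s.
LONG_S_INPUT = re.compile("~s|\u017f")


def to_template(text: str) -> str:
    """Replace every typed placeholder or literal long s with {{ls}}."""
    return LONG_S_INPUT.sub("{{ls}}", text)


typed = "Congre~ss and Congre\u017fs"
print(to_template(typed))
```

Running the script once before saving keeps typing simple while the saved wikitext still uses the template.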

Best Practices
There is no attempt here to dictate practices, and much must be determined by weighing the above considerations with respect to a particular work. These are merely general guides to the display of orthography in Wikisource works.

Phonetically distinct archaic letter forms
For phonetically distinct archaic English letter forms (e.g. ð, œ), the archaic letter form should generally be displayed by default. A modern equivalent, if one exists, may be displayed by alternate text (e.g. via tooltip), so long as search engine results are not affected. A template may be useful to track and create optional displays of such characters but is not generally necessary.

Phonetically equivalent archaic letter forms
For phonetically equivalent archaic English letter forms (e.g. ſ, ꝛ), a template (e.g. Long s) is generally desirable to track and maintain flexibility for the display of such characters. However, in those cases where the archaic form is necessary (e.g. a work that is comparing letter forms or satirizing archaic styles), ſ may simply be entered. Although some major search engines recognize ſ as equivalent to "s", it is unlikely that all do, so generally "s" should be displayed by default in the mainspace.

Non-English letter forms
Diacritics and ß characters should always be displayed by default; these are rare in English except in some loanwords. A tooltip is not normally desirable unless the entire word needs further clarification.

Ligatures
Typographic ligatures such as ƈt, ﬀ, ﬁ, ﬂ, and ﬅ should not be used in page text even if they appear in the original source, as they interfere with the searchability of the text. The ligatures æ and œ, however, are allowed, since they are typically matched to "ae" and "oe" by search engines. Note that using templates such as ff does not address the problem with search (and is therefore deprecated), as a search engine will still fail to interpret the result as "ff". It is best simply to type the separate characters.
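For text that already contains typographic ligatures (for example, from OCR output), the replacement can be scripted. The Python sketch below is one possible approach, using an explicit table so that æ and œ are left alone. The table and the function name `expand_ligatures` are assumptions for this example; ﬅ is deliberately left out because its expansion involves the long s, which is handled separately above.

```python
# Explicit expansions for the typographic ligatures named on this page.
# "\u0188t" is the hooked c followed by t, used here to write the ct ligature.
LIGATURE_EXPANSIONS = {
    "\ufb03": "ffi",  # ffi ligature
    "\ufb00": "ff",   # ff ligature
    "\ufb01": "fi",   # fi ligature
    "\ufb02": "fl",   # fl ligature
    "\u0188t": "ct",  # ct ligature as written on this page
}


def expand_ligatures(text: str) -> str:
    """Replace each listed ligature with its separate letters."""
    for ligature, plain in LIGATURE_EXPANSIONS.items():
        text = text.replace(ligature, plain)
    return text


print(expand_ligatures("o\ufb03ce, ta\u0188t, \ufb02ag"))
print(expand_ligatures("\u00e6ther, \u0153uvre"))  # æ and œ are not in the table
```

An explicit table is used rather than blanket Unicode normalization so that only the characters this guide discourages are changed.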