Wikisource:Portal classification system adaptation

This page describes the method by which the Library of Congress Classification system (LCCS) was adapted to Wikisource and the subsequent changes to that system.

In addition to casual interest, this page can be used to:
 * Explain the differences between the two systems and solve related problems with the classification of works.
 * Solve any problems that may occur when cross-referencing the original and adapted systems.
 * Provide a basis for future versions of the system, if necessary, or for incidental changes.
 * Provide a blueprint if it is necessary to revert or repair parts of the system.

Original
The outline of the Library of Congress Classification system can be read in full on Wikisource at Library of Congress Classification, which includes a search function to find topics and their classifications. It can also be read on the Library of Congress's website. More information can be found via the Wikipedia article.

The LCCS uses 21 classes, each represented by a letter of the alphabet. Each class is broken down into subclasses, represented by one or two further letters of the alphabet. More specific classification then uses a number from 1 to 9999, which is itself followed by a Cutter number.

Background
Wikisource's portal space was not initially organised in any way. Most indices were, at that time, in the Wikisource namespace and were also unorganised, which led to duplication in some cases. The aim was for any subject index for Wikisource, which is essentially the purpose of portal space on the project, to use a published and authoritative system rather than something homegrown. Such a system also has the advantage of being complete and already tested in practice.

The Dewey Decimal system qualifies under these terms but it is largely under copyright (the original version is in the public domain but it has gaps where reference to the modern world would be needed; adding these will either cause confusion or risk copyright infringement). LCCS is a work of the United States government and therefore in the public domain.

As this classification system is for use with portals and not individual books, the level of detail in the original system is unnecessary. The numeric portions of the call numbers were therefore dropped, leaving just the subclass.

For example, the official Library of Congress call number for ''On the origin of species by means of natural selection'' by Charles Darwin is QH365.O2. On Wikisource, any portals corresponding to this book, and those like it, would be classified by just the subclass, the initial alphabetic portion of this call number, or QH.
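The reduction from a full call number to a portal classification can be illustrated with a short Python sketch. The function name `subclass_of` and the regular expression are assumptions made for this illustration; they are not part of any Wikisource tool. The pattern simply treats the leading run of one to three capital letters as the subclass.

```python
import re

# Assumption for this sketch: the subclass is the leading run of
# one to three letters, and everything after it (number, Cutter) is dropped.
_SUBCLASS = re.compile(r"^[A-Z]{1,3}")


def subclass_of(call_number: str) -> str:
    """Return the alphabetic subclass portion of a call number."""
    match = _SUBCLASS.match(call_number.strip().upper())
    if match is None:
        raise ValueError(f"no alphabetic prefix in {call_number!r}")
    return match.group(0)


print(subclass_of("QH365.O2"))  # prints QH
```

Three letters are allowed in the pattern because some classes, such as Class K, already use a third letter.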

Further, as explained below, the system needed to be adapted during implementation to fit Wikisource's specific needs. Two more classes (I & X) and several more subclasses were added, while one class (E) was slightly reinterpreted to fit pre-existing material. This is justified as an existing practice when adapting the LCC system to local or specific use. Wikipedia notes that "The National Library of Medicine classification system (NLM) uses the classification scheme's unused letters W and QS–QZ. Some libraries use NLM in conjunction with LCC, eschewing LCC's R (Medicine). Others prefer to use the LCC scheme's QP-QR schedules and include Medicine R."

Split subclasses
UPDATE: In a later re-examination of this particular example, subclass PT, it was obvious that all of these languages were actually in the Germanic language family. So they could all be conflated together as simply "Germanic literature" without the need for any subclasses. This was done to simplify the list of subclasses in Class P and remove minor subclasses that were unlikely to be used. The point stands, however, that some other official subclasses needed to be split to meet Wikisource's needs.

The subclasses as they exist are not all directly usable as portals on Wikisource. The most extreme example is Subclass PT: German literature - Dutch literature - Flemish literature since 1830 - Afrikaans literature - Scandinavian literature - Old Norse literature: Old Icelandic and Old Norwegian - Modern Icelandic literature - Faroese literature - Danish literature - Norwegian literature - Swedish literature. The directly equivalent portal, Portal:German literature - Dutch literature - Flemish literature since 1830 - Afrikaans literature - Scandinavian literature - Old Norse literature: Old Icelandic and Old Norwegian - Modern Icelandic literature - Faroese literature - Danish literature - Norwegian literature - Swedish literature, is impractical for use on Wikisource.

Therefore, where this problem occurs, the Library of Congress Classification system subclasses have been divided into new subclasses. The first term of each retains the old classification; subsequent terms add a third letter to create the new classification.

For example, Subclass PT: German literature - Dutch literature - Flemish literature since 1830 - Afrikaans literature - Scandinavian literature - Old Norse literature: Old Icelandic and Old Norwegian - Modern Icelandic literature - Faroese literature - Danish literature - Norwegian literature - Swedish literature becomes:
 * Subclass PT: German literature
 * Subclass PTA: Dutch literature
 * Subclass PTB: Flemish literature
 * Subclass PTC: Afrikaans literature
 * Subclass PTD: Scandinavian literature
 * Subclass PTE: Old Norse literature
 * Subclass PTF: Modern Icelandic literature
 * Subclass PTG: Faroese literature
 * Subclass PTH: Danish literature
 * Subclass PTI: Norwegian literature
 * Subclass PTJ: Swedish literature
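The general splitting rule (first term keeps the old code, each later term gains a third letter) can be sketched in Python as follows. The function name `split_subclass` is hypothetical and the sketch assumes the extra letters are assigned in plain alphabetical order, as in the PT list above. It does not model the exceptions noted on this page, such as Subclass AC or Class K.

```python
from string import ascii_uppercase


def split_subclass(code: str, terms: list[str]) -> list[tuple[str, str]]:
    """Pair each term of a compound subclass with its own code.

    The first term keeps `code`; later terms get `code` plus A, B, C, ...
    """
    if not terms:
        raise ValueError("at least one term is required")
    if len(terms) - 1 > len(ascii_uppercase):
        raise ValueError("too many terms for a single extra letter")
    result = [(code, terms[0])]
    # zip stops at the shorter sequence, so only as many letters
    # are consumed as there are remaining terms.
    for letter, term in zip(ascii_uppercase, terms[1:]):
        result.append((code + letter, term))
    return result


pt_terms = [
    "German literature", "Dutch literature", "Flemish literature",
    "Afrikaans literature", "Scandinavian literature",
    "Old Norse literature", "Modern Icelandic literature",
    "Faroese literature", "Danish literature",
    "Norwegian literature", "Swedish literature",
]
for new_code, term in split_subclass("PT", pt_terms):
    print(f"Subclass {new_code}: {term}")
```

Run on the eleven PT terms, this yields the codes PT and PTA through PTJ in the same order as the list above.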

Note: Subclass AC is an exception to this pattern. There was an existing Collective works index when this system was implemented, so this was used as the first term instead of Collections, which became the second term.

This is more complicated within Class K, Law, which is explained separately (below).

Some classes contain subclasses with the same code as the class itself, for example subclass P (Philology and Linguistics) within Class P (Language and Literature). These are represented by a non-alphabetic symbol as the second character of the classification. The examples here use an asterisk; however, this causes problems in practice because wikicode interprets an asterisk as a bullet point in some cases. Any other symbol or number can be used in its place (for example, a hyphen).
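As a minimal illustration of this convention, the following Python sketch appends a placeholder character when a subclass shares its code with its class. The function name `portal_code` and the default placeholder (a hyphen) are assumptions chosen for this example only.

```python
def portal_code(class_letter: str, subclass_code: str, placeholder: str = "-") -> str:
    """Return the classification used for a portal.

    A subclass with the same code as its class gets a non-alphabetic
    placeholder as its second character; other subclasses are unchanged.
    """
    if len(placeholder) != 1 or placeholder.isalpha():
        raise ValueError("placeholder must be a single non-alphabetic character")
    if subclass_code == class_letter:
        return class_letter + placeholder
    return subclass_code


print(portal_code("P", "P"))        # prints P-
print(portal_code("P", "PN"))       # prints PN
print(portal_code("B", "B", "*"))   # prints B*
```

A hyphen is used as the default here because, unlike the asterisk, it is not read as a bullet point at the start of a wikicode line.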

Classes E & F: History of the Americas
The Library of Congress Classification system has two classes that cover the history of the Americas. Class E covers the United States while Class F covers the "local history" of the United States in addition to the history of British America, Canada, Dutch America, French America, Latin America and Spanish America.

At the time this classification system was implemented, Wikisource already had Portal:States of the United States with subportals for each state. Therefore, this was used as the equivalent of Class E with little change to the pre-existing portal. Class F was left to cover all other aspects of the history of the Americas, including any aspects of United States history that apply to more than one state.

New classes X & I
During implementation of the system, it was necessary to create two entirely new classes unique to Wikisource. Both use one of the letters omitted from the original classification system.

First, some pre-existing indices on Wikisource did not fit into the Library of Congress Classification system. In order to accommodate these, the new Class X ("Wikisource") was created (X being a traditional wildcard term). This class is generally for Wikisource-specific classification. Subclasses are added to Class X as and when a situation arises where one is needed, starting with subclasses for WikiProjects and specific eras (e.g. Ancient, Medieval, etc.).

Second, there was another pre-existing index, Texts by Country (and its subportals and indices), that did not easily fit any class in the system. These portals were national indices that covered each nation in general instead of the LCCS's more specialised areas (history of-, law of-, literature of-, etc.). Instead of dismantling or severely modifying a functioning index, this was declared to be a new Class I (I was the first unused letter in the alphabet). Each portal in this class serves as a hub for that nation, including works and/or linking to more specialised portals as necessary.

Class K: Law
Class K of the Library of Congress Classification system already makes extensive use of the third letter of the classification, which makes some adaptation (as described above) more difficult. Subclasses could not always be created by adding a letter; some were created by changing the existing third letter to the nearest unused letter. Others required more drastic alterations, changing the second letter of the classification for a batch of subjects and then selecting appropriate third letters from there.

The complete list of subclasses is extensive and can be found at: Portal:Law/Subclasses

Some sections from the Law of the Caribbean in subclass KG were moved to the vacant subclass KC due to space limitations.

Class Z
Update: In the official LCCS, class Z is divided into just two subclasses, subclass Z and subclass ZA. Subclass Z covers several different areas: ''Books (General). Writing. Paleography. Book industries and trade. Libraries. Bibliography''. This needs to be split to be used on Wikisource. The first version of this split attempted to preserve the order as seen in the LCCS. The official subclass ZA prevented the second letter of the call number being used, so this was left blank and the third letter was used. For example, "Writing" was split to subclass Z_A. This was unwieldy and awkward, so the second version drops the attempt to preserve the order and moves all of the new subclasses to succeed subclass ZA. For example, "Writing" becomes subclass ZC. The following table shows both versions of this scheme:

Subsequent changes
This section is an appendix to the essay. It may be helpful in understanding the adaptation and the classification system if all alterations to it are clearly logged.


 * 18:16, 10 November 2010: Subclass GE changed from "Environmental Sciences" to "Environment" (to match existing portal)
 * 19:25, 11 November 2010: Subclass HTB changed from "Races" to "Race studies" (to match existing portal)
 * 23:27, 12 November 2010: Subclass BL changed from "Religions" to "Religion" (to match existing portal)
 * 22:51, 29 November 2010: Subclass BPA changed from "Bahaism" to "Bahá'í Faith" (to match existing portal)
 * 13:54, 18 January 2011: Subclass KNP changed from "Law of Taiwan" to "Law of the Republic of China" (following page move)
 * 12:47, 15 March 2011: Subclass B* changed from "Philosophy (general)" to "General Philosophy" (improving the readability and clarity of the portal title)
 * 12:51, 15 March 2011: Subclass D* changed from "History (general)" to "General History" (improving the readability and clarity of the portal title)
 * 12:53, 15 March 2011: Subclass G* changed from "Geography (general)" to "Geography" (no disambiguation necessary in this case)
 * 12:55, 15 March 2011: Subclass H* changed from "Social Sciences (general)" to "General Social Sciences" (improving the readability and clarity of the portal title)
 * 12:57, 15 March 2011: Subclass JA changed from "Political Science (general)" to "General Political Science" (improving the readability and clarity of the portal title)
 * 13:30, 15 March 2011: Subclass PN changed from "Literature (general)" to "General Literature" (improving the readability and clarity of the portal title)
 * 17:28, 22 March 2011: Subclass KA changed from "Law (general)" to "General Law" (improving the readability and clarity of the portal title)
 * 17:34, 22 March 2011: Subclass L* changed from "Education (general)" to "General Education" (improving the readability and clarity of the portal title)
 * 17:37, 22 March 2011: Subclass M* changed from "Music (general)" to "General Music" (improving the readability and clarity of the portal title)
 * 17:39, 22 March 2011: Subclass Q* changed from "Science (general)" to "General Science" (improving the readability and clarity of the portal title)
 * 17:41, 22 March 2011: Subclass R* changed from "Medicine (general)" to "General Medicine" (improving the readability and clarity of the portal title)
 * 17:42, 22 March 2011: Subclass S* changed from "Agriculture (general)" to "General Agriculture" (improving the readability and clarity of the portal title)
 * 17:44, 22 March 2011: Subclass T* changed from "Technology (general)" to "General Technology" (improving the readability and clarity of the portal title)
 * 17:46, 22 March 2011: Subclass U* changed from "Military Science (general)" to "General Military Science" (improving the readability and clarity of the portal title)
 * 17:48, 22 March 2011: Subclass V* changed from "Naval Science (general)" to "General Naval Science" (improving the readability and clarity of the portal title)
 * 18:29, 24 April 2011: Added subclasses to Class I (to add flexibility to the classification)
 * 21:40, 6 February 2013: Removed subclass TEA ("Roads and pavements"), merged content back into subclass TE ("Highway engineering") as per the original LCCS. The difference in content between the two potential portals was slim and confusing; having both separate portals was redundant.
 * 02:01, 9 February 2013: Removed subclass JVB ("International migration"), merged into subclass JVA ("Emigration and immigration"). Same reason as above.
 * 20:32, 16 February 2013: Removed subclass UEA ("Armor"). Error in original interpretation; this is not distinct enough from UE ("Cavalry").
 * 10:20, 18 February 2013: Merged GF (Human ecology) and GFA (Anthropogeography) into GF (Human geography)
 * 17:02, 18 February 2013: Collapsing all of the PTx subclasses into one master "Germanic literature" subclass. Too many minor subclasses overloading Class P; on a later look at the list of languages, they were all in the Germanic family making this an obvious choice for merging them all into one simple subclass.
 * 17:06, 18 February 2013: Collapsing all of the PQx subclasses into one master "Romanic literature" subclass. Following the example of the previous change.
 * 16:51, 20 February 2013: Recoded Class Z. See Class Z above.
 * 23:14, 7 August 2013: Collapsed subclasses VM (Shipbuilding), VMA (Naval architecture) and VMB (Marine engineering) back into one master subclass for VM: Shipbuilding and naval architecture
 * 22:24, 20 August 2013: Removed subclass DKA ("History of the Soviet Union"), merged into subclass DK ("History of Russia"). As above, the two portals largely cover the same subject.
 * 22:48, 20 August 2013: Removed subclass DLA ("History of Scandinavia"), merged into subclass DL ("History of Northern Europe") but kept the name "History of Scandinavia". The other countries of Northern Europe are covered by other subclasses, leaving only Scandinavia anyway; no need for two subclasses.