User:Arbitan/Classification

Proposal for parallel classification system for Wikisource portals
As the number of works included on Wikisource continues to grow, it will become increasingly challenging for readers to find works. It is in our best interest to make works as discoverable as possible, and the best way to do that is to provide several paths to any given work.

There are currently three ways a reader can find a work: (1) search in the search bar, (2) click through the portals system, or (3) click on a category. All of these methods are valuable and necessary, and each has its advantages and disadvantages.

The usefulness of Wikisource's offerings will be increased if we also provide a fourth method: browsing using a familiar decimal classification. Decimal classification is the most widely used kind of system in libraries today and is therefore the most likely to be familiar to Wikisource users.

A decimal classification was considered for the Wikisource portals system back in 2010 but was rejected because the Dewey Decimal Classification is under copyright. The decision was therefore made to go with the Library of Congress Classification (LCC), since as a work of an agency of the United States Government it is in the public domain. I am very fond of the LCC. It is the system I am most familiar with, and the one I instinctively "think in" when I think about finding and organizing books. Nevertheless I think the reason for bypassing a decimal classification was mistaken.

I do not propose scrapping the LCC. A great advantage of an online repository of books is that multiple classification systems can exist side by side. What I propose is that a decimal system be added parallel to it.

But what about the copyright? The Dewey Decimal Classification (DDC) is not free to use! That is true, in its current edition. Any edition published within the past 95 years will be under copyright. But the first twelve editions were published before that and are therefore now in the public domain. There is no moral or legal impediment to anyone using one of them to classify their books. What we are not allowed to do is the following:
 * We cannot use the name "Dewey Decimal Classification", or indeed the word "Dewey" at all, as that is a trademark of the OCLC.
 * We cannot incorporate any changes to the DDC made in the 13th edition (published 1932) or later.

That second restriction is actually not that big a deal for Wikisource, since almost all the works included here were published before that. By using even the 12th edition (published 1927), we would be using an edition that is contemporaneous with or postdates almost the entire collection housed here.

I propose that we call it the Wikisource Decimal Classification (WSDC), and that we base it on the 12th edition (1927) of Decimal Clasification and Relativ Index [sic] by Melvil [sic] Dewey (https://archive.org/details/decimal12dewe). (Melville Dewey was a spelling reform enthusiast.) Alternatively, we may follow the MDS, a similar scheme created and maintained by LibraryThing (https://www.librarything.com/mds).

Adoption of the system

 * Oct–Nov 2009: Scriptorium/Archives/2009-12 expressed community frustration with the then-current state of portals
 * August 2010: Scriptorium/Archives/2010-09 raised the possibility of using an existing library classification system to organize portals
 * Aug–Sept 2010: Scriptorium/Archives/2010-12 in which User:AdamBMorgan discussed the pros and cons of different options, and proposed the LCCS
 * September 2010: AdamBMorgan created User:AdamBMorgan/LCC Proposal on how to apply the LCCS to Wikisource's portals
 * September 2010: Scriptorium/Archives/2010-10 in which AdamBMorgan announced he was commencing the project

Later comments on the system

 * November 2012: Scriptorium/Archives/2013-02 proposal for further structuring to allow cross-wiki correlations rejected
 * February 2013: Scriptorium/Archives/2013-03 brief discussion on reworking the structure of Class I, Portal:Texts by Country with no conclusion
 * June 2013: Portal_talk:Index complaint that the system is too U.S.-centric

About the Library of Congress Classification
The class and subclass have been implemented on Wikisource. User:AdamBMorgan created the portal classification system on Wikisource and used the class and subclass numbers. He included parameters in the portal header template where the LCC class (or rather, a modified version of it) could be included.

User:AdamBMorgan anticipated that at some time in the future the division numbers (which he mistakenly called Cutter numbers) could also be used as the portal system grew. He included a parameter in the header template for that, too, but on all portals that parameter is currently empty.

How it works
An LCC call number typically consists of four parts:
 * class letter(s) (typically 2 letters, but sometimes 1 and occasionally 3)
 * division number (between 1 and 4 digits)
 * Cutter number (a letter followed by 1 to 3 digits)
 * year of publication

Take for example the book The Decline and Fall of the Roman Empire by Edward Gibbon. In the LCC its call number is usually DG 311 .G5.


 * DG is the subclass History of Italy. D is the class History. DA is the history of the British Isles, DB was originally history of the Austro-Hungarian empire and now includes its successor countries (Austria, Hungary, Czech Republic, etc.), DC is the history of France, and so on. D by itself is history of the world and history of Europe in general.
 * 311 is the division number for "284–476 Decline and Fall, General works". The range DG 11 to DG 365 covers Ancient Italy, including the Roman empire. The imperial period of Roman history (27 BC to AD 476) is in the range DG 269.5 to DG 365. Within that is the subtopic "284–476 Decline and fall" which is in the range DG 310 to DG 365. The division DG 310 is sources and documents, DG 311 is general works, DG 313 is general works about Diocletian, DG 315 is Constantine the Great, etc.
 * .G5 is the Cutter number. In the LCCS, this is usually used to group books by author and make it easy to keep them alphabetized. You can read more about the Cutter system on Wikipedia (https://en.wikipedia.org/wiki/Cutter_Expansive_Classification) or on the Library of Congress website (https://www.loc.gov/aba/pcc/053/table.html). The letter, in this case G, is the first letter of the author's name. (In the case of multiauthor works, it'll be the first letter of the book title.) The numbers after the letter represent the second and third letters of the author's name (or book title). Both of the websites I linked to have tables. The i in Gibbon is a 5.
 * The fourth part is the year of the edition. An edition of Gibbon published in 1965 would be DG 311 .G5 1965. If two editions appear in the same year, one will have an a appended to the year, e.g. 1965a.

So the basic idea of the LCC system is each book is assigned to a narrowly-defined topic, and within that topic the books are alphabetized by author. "Fall of Rome" is DG 311. If you're looking for Gibbon's book about that topic, you go to DG 311 and it's under G for Gibbon.
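To make the four-part structure concrete, here is a minimal Python sketch that splits a call number such as DG 311 .G5 1965 into its parts and sorts a few call numbers into shelf order. The names `CallNumber`, `parse_call_number`, and `shelf_key` are my own inventions for illustration, and the pattern only handles the simple shape described above (one Cutter number, optional year); real call numbers have more variations than this covers.

```python
import re
from typing import NamedTuple, Optional


class CallNumber(NamedTuple):
    class_letters: str       # e.g. "DG"
    division: float          # e.g. 311.0 (may be fractional, as in DG 269.5)
    cutter: Optional[str]    # e.g. "G5", or None if absent
    year: Optional[str]      # e.g. "1965" or "1965a", or None if absent


# Hypothetical pattern covering only the four parts listed above.
_PATTERN = re.compile(
    r"""^\s*
    (?P<cls>[A-Z]{1,3})\s*                 # 1 to 3 class letters
    (?P<div>\d{1,4}(?:\.\d+)?)             # division number, optional fraction
    (?:\s*\.(?P<cutter>[A-Z]\d{1,3}))?     # Cutter number: letter + 1 to 3 digits
    (?:\s+(?P<year>\d{4}[a-z]?))?          # year, optional letter suffix
    \s*$""",
    re.VERBOSE,
)


def parse_call_number(text: str) -> CallNumber:
    """Split a simple LCC call number into its four parts."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError(f"not a recognised call number: {text!r}")
    return CallNumber(m["cls"], float(m["div"]), m["cutter"], m["year"])


def shelf_key(text: str):
    """Sort key: class, then division, then Cutter, then year."""
    cn = parse_call_number(text)
    # Cutter digits compare as text, so G5 < G53 < G6 (decimal-style order).
    return (cn.class_letters, cn.division, cn.cutter or "", cn.year or "")


shelf = ["DG 311 .G5 1965", "DG 300", "DG 311 .G5", "DG 269.5"]
print(parse_call_number("DG 311 .G5 1965"))
print(sorted(shelf, key=shelf_key))
```

Sorting by class, then division, then Cutter is what keeps Gibbon under G within DG 311: the topic is found first, and the author second.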

It's worth noting that there is no rhyme or reason to the division numbers assigned to the topics. They are only there to provide a way to order the books on the shelf in a reasonable sequence and keep books on similar topics together. This is different from the decimal systems most people are familiar with (e.g. the Dewey Decimal System). In the LCCS, subjects don't always divide up at round numbers.

Roman history is an example of this, as seen in the ranges above. Take DG 300, for example. Someone coming from a decimal classification system like Dewey might expect a round number to mark the start of a new topic, but there isn't anything special about DG 300 in the LCCS. DG 298 is general works about the period 180–284. DG 299 is the reign of Commodus. DG 300 is the reign of Septimius Severus. DG 301 is the reign of Caracalla. DG 302 is the reign of Macrinus.

Another example is the history of Iran, which is DS (History of Asia), division numbers DS 251 to DS 326. The Qajar dynasty (1794–1925) is DS 298 to DS 316. DS 300 is simply social life and customs during the Qajar dynasty.
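The nesting of ranges in the Roman history example can be sketched as a small lookup. This is only an illustration using the handful of DG ranges and divisions quoted on this page, not the full schedule; the names `DG_RANGES`, `DG_DIVISIONS`, and `describe` are hypothetical.

```python
from typing import List

# Ranges within subclass DG, as quoted on this page: (start, end, label).
DG_RANGES = [
    (11, 365, "Ancient Italy, including the Roman empire"),
    (269.5, 365, "Imperial period, 27 BC to AD 476"),
    (310, 365, "284–476 Decline and fall"),
]

# Individual divisions quoted on this page.
DG_DIVISIONS = {
    298: "General works, 180–284",
    299: "Commodus",
    300: "Septimius Severus",
    301: "Caracalla",
    302: "Macrinus",
    310: "Sources and documents",
    311: "General works",
    313: "Diocletian",
    315: "Constantine the Great",
}


def describe(division: float) -> List[str]:
    """Return the enclosing ranges (widest first), then the division itself."""
    path = [label for start, end, label in DG_RANGES if start <= division <= end]
    if division in DG_DIVISIONS:
        path.append(DG_DIVISIONS[division])
    return path


print(describe(311))
print(describe(300))
```

Note that the range boundaries (11, 269.5, 310) have to be stored explicitly. Nothing about the number 300 itself tells you where it sits in the hierarchy, which is exactly the difference from a decimal system.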

Resources
A detailed explanation of the LCC can be found on the Librarianship Studies & Information Technology blog.

Very detailed schedules of the classification system are available here: Library of Congress Classification PDF Files.

Ballooning
Ballooning is found in two places:
 * Ballooning as a flight technology: TL 616
 * Ballooning as a recreational activity: GV 762

Eventually there could be a single Portal:Ballooning, which could include different sections for tech and recreation. Then each class portal would link to each section.

In the meantime, differentiate and include a cross-link.

Photography
All books of photography go in TR, unless the book is entirely about a particular subject, like a historical event, in which case it is classed with that subject. But on Wikisource I see no hindrance to putting a work in both.

There is no photography section in Class N.

Bank note engraving
In the LCC, bank note engraving goes in HG (Money), not NE (Print media – engraving/etching/lithography). But I see no reason not to put it in both.

Draft proposals
I'm still mulling these over, so I'm not posting them in the Scriptorium yet, if ever.

Class I: Texts by Country
Class I has an awkward set of subclasses. IN redirects to I. IS is for states, IC for counties, and IT for towns. These subclasses were not part of the initial setup of the classes in October–November 2010. IS, IC, and IT were created five months later, in April 2011. Nothing in their talk pages or in the Scriptorium discusses why they were created that way. Then IR: Regions was created in January 2015 as a new subclass for the existing Portal:Regions.

These subclasses function fine as a place to stick portals, but they don't aid navigation. Right now it doesn't matter because there aren't many county or town portals. But in the future there could be a lot, at least for certain countries. Suppose that there's a fully fleshed-out portal system for Canada. If I'm in Portal:Toronto, my instinct when going up to the parent portal is that I'll be taken to Portal:Ontario. Instead, I'm taken to Portal:Towns and Cities. If I want to go to Portal:Quebec, here's how I can currently do it (again, assuming a future in which all these portals exist):
 * Through the header navigation: Portal:Toronto ⇒ Portal:Texts by Country ⇒ scroll to Portal:Canada ⇒ Portal:Quebec
 * Or, through the sidebar navigation: Portal:Toronto ⇒ Portal:Provinces ⇒ Portal:Quebec.

Neither of these options seems intuitive. What a typical user would expect is
 * Through the header navigation: Portal:Toronto ⇒ Portal:Ontario ⇒ Portal:Canada ⇒ Portal:Quebec
 * Or, through a new navigation sidebar specific to Portal:Canada's subportals: Portal:Toronto ⇒ Portal:Quebec, or else Portal:Toronto ⇒ Portal:Canada ⇒ Portal:Quebec, depending on how the sidebar is done.

In order to implement this, the subclasses would have to be organized into a hierarchy: World ⇒ Continent ⇒ Country ⇒ First-level subdivision (e.g. state, province, UK/IE county) ⇒ town/city

Here is a possible setup:

I Portal:World ⇒ IN Portal:North America ⇒ IN-CA Portal:Canada ⇒ IN-CA-ON Portal:Ontario ⇒ Portal:Toronto

The continent subclasses could be something like this:

Then all subclasses below that (for countries, states, etc.) would use the ISO 3166-1 and ISO 3166-2 abbreviations. ISO 3166 has the advantage of being an already established and widely understood system, and saves us from reinventing the wheel. The code is easy to find by looking in the sidebar of the respective article on Wikipedia.
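One consequence of building the codes this way is that a portal's parent chain can be read straight off its code, with no lookup table. Here is a rough Python sketch of that idea, using only the example codes from the setup above. The function names `ancestors` and `breadcrumb` and the `PORTALS` mapping are hypothetical; nothing like this exists in the portal header template today.

```python
from typing import Dict, List

# Example codes taken from the possible setup above.
PORTALS: Dict[str, str] = {
    "I": "Portal:World",
    "IN": "Portal:North America",
    "IN-CA": "Portal:Canada",
    "IN-CA-ON": "Portal:Ontario",
}


def ancestors(code: str) -> List[str]:
    """Return the chain of parent codes, nearest parent first."""
    parts = code.split("-")
    # Drop one trailing segment at a time: IN-CA-ON -> IN-CA -> IN.
    chain = ["-".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
    # Continent codes such as "IN" hang off the top-level class "I".
    if code != "I":
        chain.append("I")
    return chain


def breadcrumb(code: str) -> str:
    """Render the path from the top-level class down to the given code."""
    codes = list(reversed(ancestors(code))) + [code]
    return " ⇒ ".join(PORTALS.get(c, c) for c in codes)


print(ancestors("IN-CA-ON"))
print(breadcrumb("IN-CA-ON"))
```

Under this scheme, going "up" from Portal:Ontario lands on Portal:Canada simply because IN-CA is what remains when the last segment of IN-CA-ON is removed.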

Let me stress: With this system no new portals will be created. This is a reorganization of existing portals into a new hierarchy, but with the two added benefits of being easy to navigate and scalable.

Needed changes

 * PB: It makes no sense to set aside space in the system for Portal:Modern languages (it's a red link for a reason) and give it a two-letter class of its own. It's a small, niche topic that could easily be thrown into P*:Philology and linguistics. It will likely remain a red link for a very long time. By "Modern languages," the LCCS means pedagogical discussion of teaching modern languages as distinct from the classical languages of Greek and Latin.
 * Almost all of PB in the LCCS is dedicated to the Celtic languages and their literatures. That's what PB should be here, too.
 * In the LCCS, PB 1–431 is Modern languages, and PB 1001–3029 is Celtic languages and literatures. In practice, if you walk down the stacks of an American university library in the PBs, almost all the books you'll see on the shelf will be Celtic stuff. The "Modern languages" section will mostly consist of academic journals on language pedagogy, such as publications by the MLA.
 * Proposal: Reassign Portal:Celtic languages in the classification system from PBA to PB.

Possible changes?

 * P*: I wonder if this portal should be renamed from Portal:Philology and linguistics to Portal:Linguistics? I'm ambivalent about this. On the one hand, it might be clearer for modern readers, since "linguistics" is a more familiar term than "philology" nowadays, and there's a lot of overlap between philology and historical linguistics. On the other hand, older books do use the word "philology," and to be honest it has an antiquated feel that appeals to me aesthetically.

Subclass PA: Classical languages and literatures
If division numbers be implemented:

A new portal designated PA without the division number can serve as a landing page for the Classics, with a box header/footer for each of the seven major divisions listed above.

[1] The LCC splits the Greek and Latin series of Loeb, and I can't find a good call number for a portal that would combine them. The nearest I can think of is PA 3005. I recommend splitting them into two separate portals for convenience, anyway, since it's such a long list. Each portal could point to the other in a "see also" note.

[2] When Greek language is split into its own portal, it will be PA 0000.

As new child portals of PA are created, they would receive the following LCC classifications: