User:Inductiveload/Requests/Batch uploads

I can upload batches of files from the IA or Hathi Trust. However, I will require the metadata to do so. I will not do uploads if you don't give me the data (unless I really, really want to anyway).

I can also create files from batches of images. In this case, you will need to provide details of where I can get the images from. I can help you with batch downloading images if you need. If you already have the images, probably the easiest way to share them with me is to upload to the Internet Archive as an "image ZIP" following these instructions.

Data file format
I will need a spreadsheet (XLSX, CSV or ODS) with the following columns (the names are important, don't change them).


 * All data, like printer, that is available should be provided. It's a lot easier to put it in now than patch it in later.
 * You can add as many other columns as you like for your own purposes, such as building up strings. They will be ignored.

There are some examples here: https://drive.google.com/drive/folders/1fW5ozskDJiyVoQycUoGEB7d-L_Uh6N7b

Authors, etc
If you provide plain strings, they will be used as-is. If you provide a Wikidata ID like Q30875, it will be used in the creator template on Commons, and the linked Wikisource author page (in this case, Author:Oscar Wilde) will be used for the index page.

Separate multiple authors with slashes.
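To make the two rules above concrete, here is a minimal Python sketch of how an author cell could be interpreted. It is only an illustration, not the code the upload tooling actually runs: the function name `parse_authors` and the `"wikidata"`/`"string"` labels are hypothetical, and it assumes a Wikidata ID is the letter Q followed by digits.

```python
import re

# Assumption: a Wikidata item ID is "Q" followed by a positive integer.
WIKIDATA_ID = re.compile(r"Q[1-9]\d*")


def parse_authors(cell: str) -> list[tuple[str, str]]:
    """Split a slash-separated author cell into (kind, value) pairs.

    Hypothetical helper: 'wikidata' marks a value that looks like a
    Wikidata ID, 'string' marks anything else, which is kept as-is.
    """
    entries = []
    for part in cell.split("/"):
        value = part.strip()
        if not value:
            continue  # ignore empty pieces such as a trailing slash
        kind = "wikidata" if WIKIDATA_ID.fullmatch(value) else "string"
        entries.append((kind, value))
    return entries


print(parse_authors("Q30875 / Jane Example"))
```

Here `Jane Example` is a made-up name. Note that a plain-string author whose name itself contains a slash would be split in two under this scheme, so it is worth checking such cells by hand.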

Licenses and copyright
If needed, I can upload files locally to Wikisources when they are not suitable for Commons for copyright reasons.

If the file is not a US work (e.g. a non-US author), you must not specify  as the copyright if the file is going to go to Commons. You should specify a suitable template. Usually, this is PD-old-auto-expired: in that case you must also give  to show why the work is PD in the country of origin.

If the file is coming to Wikisource (usually because it's copyright in the country of origin, but not in the US), you should set  to , set   if not   and you must provide   and.

Spreadsheet automation
Note, you can often use the volume number to build the other cells with spreadsheet equations. For example, if the volume number is col G and the title is col C, then the filename for row 2 might be.

Likewise, you can increment numbers. If row 2's volume is 1, then you can make row 3's 2 using.

You can zero-pad number with, e.g.

In this way, you can save a lot of tedious typing. However, do make sure that the data stays accurate. Very often things like publisher, printer or even the date ranges of volumes can change halfway through a series.
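If you would rather generate the cells with a script than with spreadsheet formulae, the same three ideas (building a filename from the title and volume, incrementing the volume, and zero-padding) can be sketched in Python. This is only an illustration under stated assumptions: the filename pattern, the `djvu` extension, the function name `build_filename` and the title "Example Magazine" are all hypothetical, so substitute whatever naming scheme your series actually needs.

```python
def build_filename(title: str, volume: int, width: int = 2,
                   extension: str = "djvu") -> str:
    """Build a filename from a title and a zero-padded volume number.

    The "<title> - Volume NN.<extension>" pattern is an assumption made
    for this example, not a required naming scheme.
    """
    return f"{title} - Volume {volume:0{width}d}.{extension}"


# Increment the volume number for each row instead of typing it by hand.
filenames = [build_filename("Example Magazine", volume)
             for volume in range(1, 4)]

for name in filenames:
    print(name)
```

The same caution applies as with formulae: generated values are only as good as the pattern behind them, so check the rows where the publisher, printer or dates change.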

If you use formulae, I'd prefer to receive an XLSX file rather than a CSV file, since I can adjust the formulae if needed.

Authority control
The OCLC number is optional, but highly recommended, because the OCLC ID is a very good way to link the files and indexes with structured data, as it is (or should be) a unique key.

Sending the file
You can send me the file by creating a task on my Workboard at Phabricator and attaching your spreadsheet, or commenting on my talk page and providing a link to some other file host (e.g. Google Drive, Dropbox, etc).

If you use formulae in your spreadsheet, I'd rather have the original spreadsheet (XLSX/ODS) than an exported CSV file, because if I need to make changes to anything, it's easier if the formulae still work.

Known issues

 * Pagelists are generated from the source's upstream data. The quality of this ranges from near-perfect to complete junk. It will be your responsibility to deal with these. All indexes are created with "to be checked" statuses for this reason.
 * You can provide a  field, then it will be set to "to be proofread".

Your tasks
You have some work to do even once the batch upload is complete:


 * If the works are part of a series, any index volume list templates (e.g. American Printer volumes) in the  column should be created also
 * All the Commons categories you specify should exist and be categorised
 * Finishing the pagelists on the index pages (the upload will include an automatically-generated pagelist from the IA or Hathi metadata, but this is usually incomplete)
 * Adding small scan link templates to Author and Portal pages as appropriate
 * Generally tidying up if there are other rough edges.

By making a batch upload request, you agree to undertake these tasks.