Non-destructive uploading

Duncan Hall requested to merge duncanrefactor into master

Main improvements:

  • file structure from GB transfers is not altered
  • original files are only altered to add metadata
  • locally derived temporary files (like itemimage) are stored in a file structure that mirrors that generated by GB transfers
  • metadata spreadsheet is cross-validated with file structure before starting
  • logic has been divided by subject into modules

from 78rpm_uploading/main.py:

This script uploads 78rpm record files from George Blood into items, with metadata from a .xlsx file.

First, a series of checks is run on the spreadsheet and file structure. At this point, certain groups of barcodes are identified as having files that are inconsistent with the supplied metadata, or vice versa. These groups are skipped for the rest of the process and are reported at the end for review and retry.
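A minimal sketch of the two-way check described above, assuming both the spreadsheet and the file structure can be reduced to a collection of barcode strings. The function name `cross_check` and the example barcodes are hypothetical and are not taken from main.py.

```python
def cross_check(sheet_barcodes, file_barcodes):
    """Compare barcodes listed in the spreadsheet with barcodes found on disk.

    Returns the mismatches in both directions so they can be reported
    at the end of the run.
    """
    sheet = set(sheet_barcodes)
    files = set(file_barcodes)
    return {
        # metadata rows with no matching files
        "rows_without_files": sorted(sheet - files),
        # files with no matching metadata row
        "files_without_rows": sorted(files - sheet),
    }


if __name__ == "__main__":
    report = cross_check(["0001A", "0001B", "0002A"], ["0001A", "0002A", "0003A"])
    print(report)
```

Any barcode that appears in either list would then mark its whole group as skipped.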

Barcode Group - a set of rows whose barcodes are identical up to, but not including, any trailing letters. A group includes a "master" row and all album rows in the case of a multi-disc work, or both sides of a single-disc work. If there are issues with any of the metadata rows in a barcode group or their corresponding files, none of the rows in that group are processed into items.
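The grouping rule and the all-or-nothing behaviour can be sketched as follows. This is an illustration only: `group_key`, `split_groups`, and the barcode format (digits followed by optional letters) are assumptions, not names or formats confirmed by the script.

```python
import re
from collections import defaultdict


def group_key(barcode):
    """Return the barcode with any trailing letters removed."""
    return re.sub(r"[A-Za-z]+$", "", barcode)


def split_groups(barcodes, bad_barcodes):
    """Group barcodes by key, then set aside every group that contains
    at least one problematic barcode."""
    groups = defaultdict(list)
    for barcode in barcodes:
        groups[group_key(barcode)].append(barcode)

    bad_keys = {group_key(b) for b in bad_barcodes}
    ok = {k: v for k, v in groups.items() if k not in bad_keys}
    skipped = {k: v for k, v in groups.items() if k in bad_keys}
    return ok, skipped


if __name__ == "__main__":
    ok, skipped = split_groups(
        ["0001", "0001A", "0001B", "0002A", "0002B"],
        bad_barcodes=["0001B"],
    )
    print("process:", ok)
    print("skip:", skipped)
```

Here a single bad row ("0001B") causes the master row and every other row in group "0001" to be skipped, while group "0002" proceeds.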

Once initial checks and initial metadata processing are complete, the tasks defined in execute_process_steps() are executed in order for each barcode group. Each group is assigned to a single multiprocessing worker so that work on other groups can continue while waiting for file uploads to complete.
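One way the per-group scheduling could look is sketched below. It assumes a thread-backed pool (`multiprocessing.pool.ThreadPool`), which suits upload-bound work; the actual pool type used by the script is not stated here. `run_steps_for_group`, `run_all`, and the step signature `(key, rows)` are hypothetical and do not describe the real execute_process_steps().

```python
from multiprocessing.pool import ThreadPool


def run_steps_for_group(job):
    """Run every step in order for one barcode group.

    Stops at the first failing step and returns the error so the
    group can be reported at the end instead of aborting the run.
    """
    key, rows, steps = job
    done = []
    try:
        for step in steps:
            step(key, rows)
            done.append(step.__name__)
    except Exception as exc:
        return key, done, repr(exc)
    return key, done, None


def run_all(groups, steps, workers=4):
    """Assign each barcode group to one worker; results keep input order."""
    jobs = [(key, rows, steps) for key, rows in groups.items()]
    with ThreadPool(workers) as pool:
        return pool.map(run_steps_for_group, jobs)
```

Because each group is a single job, its steps never run out of order, while uploads for different groups can overlap.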

In some cases, original files transferred to the VM by GB associates (to /1/georgeblood///*) are modified before upload:

  • .tif files have an image description added
  • .flac files have metadata added including title, artist, etc.

These metadata changes are applied in such a way that they do not stack when the script is run multiple times in a row.
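The "do not stack" property amounts to overwriting tags rather than appending to them. The sketch below uses a plain dict as a stand-in for a file's tag container (FLAC comment fields can hold several values per name, and field names are case-insensitive); `apply_tags` is a hypothetical helper, and the tagging library the script actually uses is not shown here.

```python
def apply_tags(existing, new_tags):
    """Return a copy of `existing` with each new tag set to exactly one value.

    Overwriting (instead of appending) means a second run produces the
    same result as the first.
    """
    updated = {key.lower(): list(values) for key, values in existing.items()}
    for key, value in new_tags.items():
        updated[key.lower()] = [value]  # replace, never append
    return updated


if __name__ == "__main__":
    tags = {"title": ["old title"]}
    new = {"TITLE": "Some Title", "artist": "Some Artist"}
    once = apply_tags(tags, new)
    twice = apply_tags(once, new)
    print(once == twice)
```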

In other cases, entirely new files must be derived before uploading. Derived files are saved in a file structure that mirrors that of the source files, e.g. the metadata files for /1/georgeblood/// would be found in /1/georgeblood/preprocessed_files///. In the case of albums, master (in some cases merged) metadata would be written into /1/georgeblood/preprocessed_files///.
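The mirroring can be sketched with `pathlib` as below. The root /1/georgeblood and the preprocessed_files directory come from the description above; the function name `derived_path` and the subdirectory names in the usage example (`a/b/c`) are placeholders, since the real directory levels are not spelled out here.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("/1/georgeblood")
DERIVED_ROOT = ROOT / "preprocessed_files"


def derived_path(source):
    """Map a source path under ROOT to the same relative location
    under DERIVED_ROOT.

    Raises ValueError if `source` is not located under ROOT.
    """
    relative = PurePosixPath(source).relative_to(ROOT)
    return DERIVED_ROOT / relative


if __name__ == "__main__":
    # "a/b/c" stands in for whatever directory levels the transfer uses.
    print(derived_path("/1/georgeblood/a/b/c/item.tif"))
```

Keeping derived files under a separate root leaves the transferred directory tree untouched, which is the point of this merge request.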