Processing Ingested Digital Files

Tools to help process digital files in \\Romeo\processing

This documentation is out of date; use Arranging Digital Materials in ArchivesSpace instead.


Log on with:

ssh railsdev

Creating image derivatives with convertImages.py

convertImages.py creates compressed images (jpg, png) and pdfs from master images.

  1. Log on to the processing server or clone the ingest-processing-workflow repo from GitHub.
  2. cd or open a command line shell in the ingest-processing-workflow directory.
    1. on server: cd /opt/lib/ingest-processing-workflow
    2. on your local machine: cd to the git repo directory you cloned
  3. Run: python3 convertImages.py <package ID> -i input -o output
    • Examples:
      • python3 convertImages.py apap301_h4fMLPL48CuxFPLpYxTmkL -i tif -o jpg
      • python3 convertImages.py ua950.009_qUQbs7GYhzmB3uL3yjH5uX -i tif -o pdf
      • python3 convertImages.py ua809_JxkK2VWVFu7F8VWaTe72BG -i pdf -o pdf
  4. (optional) A -p flag with a subpath limits the input to that path, relative to the masters directory:
    • Example:
      • python3 convertImages.py ua802.011_xMHVAto2AuzLfd2NtP9STY -i tif -o pdf -p TIFFs/edited

      • This will only convert files in: [package]\masters\TIFFs\edited
  5. Files will be created in the \derivatives subfolder
    1. directory structure will also be duplicated
    2. for PDF outputs, all input images in the same folder will be joined into a single PDF, in filesystem order
    3. (Server only for now) with PDF inputs and PDF outputs, the PDF files in each folder will be combined, in filesystem order
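
The output layout described in step 5 can be sketched as below. This is only an illustration of the mirrored directory structure, not the code in convertImages.py. The function names are hypothetical, and it assumes a derivative keeps the file name of its master with only the extension changed. The list order passed to group_by_folder stands in for filesystem order.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def mirror_path(relative_master, output_ext):
    """Map a path relative to masters/ onto the same relative path under derivatives/.

    Assumption: the derivative keeps the master's file name and only the
    extension changes.
    """
    relative = PurePosixPath(relative_master).with_suffix("." + output_ext)
    return PurePosixPath("derivatives") / relative


def group_by_folder(relative_masters):
    """Group input files by parent folder, keeping the order they were given in.

    For PDF output, each group would be joined into one PDF.
    """
    groups = defaultdict(list)
    for name in relative_masters:
        groups[str(PurePosixPath(name).parent)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(mirror_path("TIFFs/edited/page001.tif", "jpg"))
    print(group_by_folder(["TIFFs/edited/page001.tif", "TIFFs/edited/page002.tif"]))
```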

Dependencies:

Arranging Digitized and Born-Digital Materials in ArchivesSpace

  • Use asInventory (asUpload.exe) to enter initial description in ArchivesSpace
  • Use asInventory (asDownload.exe) to export the same description from ArchivesSpace with the new identifiers
    • Be sure to use the whole spreadsheet, even if you are not adding digital objects for all items; otherwise the order will be altered. Just leave the DAO field blank for these items.
  • Export the changes you made in ArchivesSpace to ArcLight

  • Place a copy of the exported spreadsheet in the package's \metadata directory. It may help to move any other .xlsx files in that folder to a subfolder so they don't affect buildHyraxUpload.py, but you can also exclude these with a -f flag.

    \\Romeo\SPE\processing\<collectionID>\<packageID>\metadata


  • Use listFiles.py to make a .txt file of all derivatives

    python3 listFiles.py apap015_CijY985mDUy6hdLSPPYqRR

  • Use the derivatives.txt file in the package root to copy, paste, and arrange the derivatives' relative paths into the DAO column of the exported asInventory spreadsheet



  • If you need to add lines, run asUpload.exe and asDownload.exe again. This will create ASpace IDs for the new lines, which are needed for the next step.

  • Run buildHyraxUpload.py with the package ID as an argument to create the Hyrax Upload .tsv file.

    sudo python3 buildHyraxUpload.py ua950.012_Xf5xzeim7n4yE6tjKKHqLM

  • If there are other .xlsx files in the metadata folder, you can move these files to a subfolder or use the optional -f flag to only use specific files:

    sudo python3 buildHyraxUpload.py ua397_cv3E3okEhxKARzZunE4Dom -f "Tower Tribune_exported.xlsx"


  • Add a Resource Type and a License or Rights Statement to every object in the Hyrax upload .tsv file created by the script

Resource Types:

      • Audio
      • Bound Volume
      • Dataset
      • Document
      • Image
      • Map
      • Mixed Materials (Avoid)
      • Pamphlet
      • Periodical
      • Slides
      • Video
      • Other (Avoid)

Licenses:

      • BY-NC-ND: https://creativecommons.org/licenses/by-nc-nd/4.0/
      • Public Domain: http://creativecommons.org/publicdomain/mark/1.0/
      • Unknown
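
Before uploading, the values above can be checked with a short script. The sketch below is not part of the ingest-processing-workflow repo. The column names "Resource Type" and "License" are assumptions, as is the idea that the License column holds either one of the URLs above or "Unknown"; adjust both to match the .tsv that buildHyraxUpload.py actually produces.

```python
import csv
import io

RESOURCE_TYPES = {
    "Audio", "Bound Volume", "Dataset", "Document", "Image", "Map",
    "Mixed Materials", "Pamphlet", "Periodical", "Slides", "Video", "Other",
}
# Listed above as values to avoid.
DISCOURAGED = {"Mixed Materials", "Other"}
LICENSES = {
    "https://creativecommons.org/licenses/by-nc-nd/4.0/",
    "http://creativecommons.org/publicdomain/mark/1.0/",
    "Unknown",
}


def check_rows(tsv_text, type_col="Resource Type", license_col="License"):
    """Return (row number, message) pairs for values outside the lists above.

    The column names are hypothetical defaults, not confirmed headers.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for number, row in enumerate(reader, start=2):  # row 1 is the header
        resource_type = (row.get(type_col) or "").strip()
        license_value = (row.get(license_col) or "").strip()
        if resource_type not in RESOURCE_TYPES:
            problems.append((number, f"unrecognized resource type: {resource_type!r}"))
        elif resource_type in DISCOURAGED:
            problems.append((number, f"discouraged resource type: {resource_type!r}"))
        if license_value not in LICENSES:
            problems.append((number, f"unrecognized license: {license_value!r}"))
    return problems


if __name__ == "__main__":
    sample = "Resource Type\tLicense\nImage\tUnknown\nPhoto\tUnknown\n"
    for number, message in check_rows(sample):
        print(f"row {number}: {message}")
```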

Rights Statements (if License is "Unknown"):

  • Move a copy of all derivatives to be uploaded to \\Lincoln\Library\ESPYderivatives\files
    • (Optional) Use a collection ID subfolder if convenient: \\Lincoln\Library\ESPYderivatives\files\apap101

  • Move the Hyrax Upload .tsv file to the Hyrax import directory: \\Lincoln\Library\ESPYderivatives\import

  • Upload the files to Hyrax using the Batch Upload to Hyrax documentation (Starting with Step 4)
    Note: All files will be public by default

  • When the import is finished, copy the completed .tsv file back to the package's metadata directory from: \\Lincoln\Library\ESPYderivatives\complete

  • Run updateASpace.py with the package ID as an argument to add the correct URIs back into the ASpace export

    python3 updateASpace.py ua680_FbBxaYn8Jm9tBxuXsQ6R3L

  • Don't forget to unpublish/republish to make sure the collection is exported!

  • Use asInventory to re-import the ASpace spreadsheet with the correct URIs back into ArchivesSpace

  • Run packageAIP.py with the package ID as an argument to combine the processing package and the SIP into an AIP
    • Must be run from the processing server, not a local machine
    • If running without flags, use the "packageAIP" function from /etc/profile.d/processingFunctions.sh
      • Will log to \\Romeo\SPE\processing\log\<collection ID>
    • Use a -u flag to use the master files from the processing package instead of the files in the SIP
    • Use a -n flag for no derivatives; this will only package master files
    • If this is used, you must also delete the SIP manually after examination with safeRemoveSIP.py

    sudo python3 packageAIP.py ua950.012_Xf5xzeim7n4yE6tjKKHqLM

    packageAIP ua950.012_Xf5xzeim7n4yE6tjKKHqLM


    sudo python3 packageAIP.py -u ua435_LUUaFPvezhmdcwwnVX3drV

    sudo python3 packageAIP.py -n ua950.009_qUQbs7GYhzmB3uL3yjH5uX

    python3 safeRemoveSIP.py ua435_LUUaFPvezhmdcwwnVX3drV
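
The metadata and log locations mentioned in the steps above both follow from the package ID. The sketch below shows one way to derive them. It is a hypothetical helper, not part of the repo, and it assumes the collection ID is the text before the first underscore in the package ID, as the examples on this page suggest.

```python
from pathlib import PureWindowsPath

PROCESSING_ROOT = PureWindowsPath(r"\\Romeo\SPE\processing")


def collection_id(package_id):
    """Return the collection ID part of a package ID.

    Assumption: the collection ID is everything before the first underscore.
    """
    collection, separator, suffix = package_id.partition("_")
    if not separator or not collection or not suffix:
        raise ValueError(f"not a package ID: {package_id!r}")
    return collection


def package_paths(package_id):
    """Build the package, metadata, and log paths for a package ID."""
    collection = collection_id(package_id)
    package = PROCESSING_ROOT / collection / package_id
    return {
        "package": package,
        "metadata": package / "metadata",
        "log": PROCESSING_ROOT / "log" / collection,
    }


if __name__ == "__main__":
    for name, path in package_paths("ua950.012_Xf5xzeim7n4yE6tjKKHqLM").items():
        print(name, path)
```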

Replacing /masters with Masters in SIP

The masters directory in /processing is a duplicate of the SIP. If you edit or delete the masters