Creating Derivatives

For generating access files.

Screenshot of derivatives tab on processing app

Use the derivatives tab when the original file format is not jpg or png for images, mp3 or ogg for audio, or webm for video. The image server (IIIF) that works with Spe_DAO requires image files in order to create the pyramidal tifs used for the zoom functionality. IIIF also hosts audio and video in the web browser, so these files must be converted to compressed formats. You must create derivative files for every file that will be presented via the access system (ArcLight).
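
As a rough illustration of the rule above, the check for whether a file needs a derivative could look like the sketch below. It assumes the decision is made from the file extension alone and that matching is case-insensitive; the function name and the sample filenames are hypothetical and are not part of the Processing app.

```python
from pathlib import PurePath

# Access formats named in the paragraph above.
ACCESS_FORMATS = {".jpg", ".png", ".mp3", ".ogg", ".webm"}


def needs_derivative(filename: str) -> bool:
    """Return True when the extension is not one of the access formats."""
    # Assumption: extensions are compared case-insensitively.
    return PurePath(filename).suffix.lower() not in ACCESS_FORMATS


# Hypothetical filenames, for illustration only.
for name in ["minutes.pdf", "interview.wav", "photo.JPG", "reel.webm"]:
    print(name, needs_derivative(name))
```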

The separate masters/ and derivatives/ subfolders in the working packages are for managing this. After ingest, derivatives/ is empty, but archivists can optionally make derivative versions of files and place them there so that they'll be included in the AIP. Often the directory structure is repeated in both, so we can semantically associate original files and derivatives.
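
The mirrored directory structure can be sketched as a simple path mapping. This is only an illustration: the function name and the example file are hypothetical, and it assumes paths are written relative to the package root with / separators.

```python
from pathlib import PurePosixPath


def mirrored_derivative(master_rel: str, new_ext: str) -> str:
    """Map a path under masters/ to the same relative path under derivatives/."""
    rel = PurePosixPath(master_rel)
    if rel.parts[0] != "masters":
        raise ValueError("expected a path that starts with masters/")
    # Keep the folder structure, swap the top-level folder and the extension.
    target = PurePosixPath("derivatives", *rel.parts[1:])
    return str(target.with_suffix("." + new_ext))


# Hypothetical file, for illustration only.
print(mirrored_derivative("masters/Box 03/1986/issue.tif", "jpg"))
# derivatives/Box 03/1986/issue.jpg
```

Keeping the relative path identical in both folders is what lets an original and its derivative be matched up later.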

Manual Derivatives Creation

You can manually create derivatives using any tool by accessing the package through the filesystem.

Automated Derivatives Creation

The Processing app can automate the creation of common derivative files.

  1. Enter the Package ID for the package whose /masters directory contains the files you want to convert to derivatives.

  2. Select the input format, such as pdf, docx, wav, or mov.

  3. Select the output format you want to convert to, such as jpg or png for images, mp3 or ogg for audio, and webm for video.

  4. Optionally check the box and enter a subpath if you want to limit to a certain folder or file path.

    1. Use Unix-style (/) path separators or a double backslash (\\) for Windows-style path separators.

    2. Subpaths are relative to the \masters folder, so if you want to convert only the files in this folder:

      1. \\Lincoln\Library\SPE_Processing\backlog\ua809\ua809_PNTNy7MWUCgTDDPfLztoEU\masters\UniversityofAlbany SUNY 202213310\Box 03\1986\1986-02-04_v73i03

      2. Then enter:

      3. UniversityofAlbany SUNY 202213310/Box 03/1986/1986-02-04_v73i03

  5. Optionally, for images, enter a maximum pixel size in the Resize input.

    1. Example: 1500x1500

    2. In this example all images will be reduced to 1500 pixels on the longest edge. The aspect ratio will remain the same.

    3. This is useful for large yearbooks and other 300+ page items scanned at 400 dpi, which produces PDF derivatives that are too large for users to download.

    4. Screenshot of derivatives tab on processing app with resize option circled in red

  6. Optionally enter a Density number to convert to.

    1. This is a DPI, so 72 will convert images to 72 dpi and 300 will convert images to 300 dpi.

  7. Optionally select "Monochrome" to convert images to black and white (should be used rarely).

  8. Click "Submit"!

    1. The Resize, Density, and Monochrome inputs are simply passed to ImageMagick.
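
The optional inputs in steps 4 through 7 can be sketched roughly as follows. This is not the Processing app's code. All three function names are hypothetical, the `magick` executable name assumes ImageMagick 7 (older installs use `convert`), and the order in which the app actually passes the options is an assumption. The `-resize`, `-density`, and `-monochrome` options themselves are standard ImageMagick options.

```python
def subpath_from_full(full_path, root_name="masters"):
    """Return the part of full_path below the masters folder, using / separators."""
    # Treat both / and \ as separators and drop empty pieces (e.g. from \\server).
    parts = [p for p in full_path.replace("\\", "/").split("/") if p]
    # Raises ValueError if the path does not contain the masters folder.
    below = parts[parts.index(root_name) + 1:]
    return "/".join(below)


def fit_within(width, height, max_w, max_h):
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio."""
    scale = min(max_w / width, max_h / height)
    return round(width * scale), round(height * scale)


def build_magick_args(src, dst, resize=None, density=None, monochrome=False):
    """Assemble a hypothetical ImageMagick command line from the optional inputs."""
    args = ["magick", src]
    if resize:
        args += ["-resize", resize]        # e.g. "1500x1500"
    if density:
        args += ["-density", str(density)]  # DPI, e.g. 72 or 300
    if monochrome:
        args.append("-monochrome")
    args.append(dst)
    return args


full = r"\\Lincoln\Library\SPE_Processing\backlog\ua809\ua809_PNTNy7MWUCgTDDPfLztoEU\masters\UniversityofAlbany SUNY 202213310\Box 03\1986\1986-02-04_v73i03"
print(subpath_from_full(full))
# UniversityofAlbany SUNY 202213310/Box 03/1986/1986-02-04_v73i03

# A hypothetical 4000x6000 scan limited to 1500x1500 keeps its 2:3 shape.
print(fit_within(4000, 6000, 1500, 1500))
# (1000, 1500)

# Hypothetical file names; the list is only built here, not executed.
print(build_magick_args("page.tif", "page.jpg", resize="1500x1500", density=72))
```

The `fit_within` function only illustrates the arithmetic behind "1500 pixels on the longest edge"; ImageMagick's own rounding may differ slightly.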

Notes for consideration:

  • Derivatives can also be used if redactions are needed within the documents

    • Redactions are used when there is personal/private information in an archival object, but the rest of the object is meaningful/valuable to the collection. 

      • For example, there was a convention, and participants listed their personal cell phone numbers as a part of their contact information, but used their work addresses and phone numbers as well. We would want the participant list because it shows who and how many people attended the convention, but we do not want researchers or others to have access to the personal cell phone numbers. 

      • As the arranger, I would make the judgement call to include this list, but I would manually edit the document to strike the personal numbers.

        • Other things that you may want to strike can include:

          • Home addresses

          • Social Security numbers

          • Banking details 

        • Basically anything that is not easily found online/through a quick Google or social media search.

          • If you wouldn't want it listed online for you, chances are someone else would feel the same! 

  • You may need to create derivatives as a way of organizing digital packages.

    • If the arrangement (or disarrangement) of the package is not compliant with DACS standards, you may use the derivative files to clean up the master files.

    • You would NOT want to select the "update the master files" option when you are on the "Package AIP" step.