This workflow is for materials that already exist in our collections and have previously been accessioned, such as scans received from a vendor or digitized in house.

Ingesting scans of paper materials, digitized audio or video content, or born-digital materials places them in consistent storage with a backup. This prevents files from getting lost or accidentally deleted and ensures that they will be managed over time.

\\Lincoln\Library\SPE_Processing is like our stacks. Ingesting items is like placing them on the stacks instead of leaving them in an office or on a table somewhere.

During ingest, a second, read-only copy is created in \\Lincoln\Masters\Archives\SIP in a BagIt bag. You shouldn't have to worry about this copy, but it means we always have a second copy of the ingested files in case of errors or accidental deletion.

How to ingest digital materials

  1. Place the files you want to ingest in the ingest folder, inside a folder named with the collection ID.
        • The ingest folder is: \\Lincoln\Library\SPE_Processing\ingest
        • Digitized files will be logged in the DigitizationExtentTracker.xlsx at \\Lincoln\Library\SPE_Automated\DigitizationExtentTracker so we can track the size and quantity of what we're digitizing.
        • Files here can have subfolders and any structure that is useful for preserving any meaningful order.
      ingest/
          ├─ apap101/
          │  ├─ minutes.docx
          │  ├─ report.pdf
          ├─ ua950.012/
          │  ├─ Issue1/
          │  │  ├─ page1.tif
          │  │  ├─ page2.tif
          │  │  │  ...   
      
        • Derivatives and metadata files can be added pre-ingest by placing them in subfolders named "derivatives" and "metadata" within the collection ID folder. Note: this means that original files cannot have root directories named "derivatives" or "metadata".
      ingest/
          ├─ ua746/
          │  ├─ image1.png
          │  ├─ image2.png
          │  ├─ ...
          │  ├─ derivatives/
          │  │  ├─ image1.jpg
          │  │  ├─ image2.jpg
          │  │  ├─ ...
          │  ├─ metadata/
          │  │  ├─ image_list.csv
          ├─ ...
  2. Enter the collection ID in the Ingest tab of the processing app, and click "Submit".
  3. Check the log to see if the ingest was successful or had any errors.
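Before submitting, it can help to confirm how a collection folder will be read. The following is a minimal sketch, not part of the processing app: the function name summarize_ingest_folder is hypothetical, and it only assumes the rule stated above, that root-level folders named "derivatives" and "metadata" are reserved and everything else counts as original files.

```python
from pathlib import Path

# Root-level folder names reserved for pre-ingest derivatives and metadata.
RESERVED = {"derivatives", "metadata"}


def summarize_ingest_folder(collection_dir: Path) -> dict:
    """Group the files in a collection folder by role (hypothetical helper)."""
    summary = {"originals": [], "derivatives": [], "metadata": []}
    for entry in sorted(collection_dir.iterdir()):
        if entry.is_dir() and entry.name in RESERVED:
            # Files under a reserved root folder take that folder's role.
            summary[entry.name] = sorted(
                p.relative_to(entry).as_posix()
                for p in entry.rglob("*") if p.is_file()
            )
        elif entry.is_dir():
            # Any other subfolder is part of the original files.
            summary["originals"].extend(
                p.relative_to(collection_dir).as_posix()
                for p in sorted(entry.rglob("*")) if p.is_file()
            )
        else:
            summary["originals"].append(entry.name)
    return summary
```

For the ua746 example above, this would list the .png files as originals, the .jpg files as derivatives, and image_list.csv as metadata.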

Simple Ingest

  1. Create a folder named for the collection ID in \\Lincoln\Library\SPE_Processing\ingest
    1. Use Find-It to find the correct collection ID
    2. Examples:
      1. \\Lincoln\Library\SPE_Processing\ingest\apap101
      2. \\Lincoln\Library\SPE_Processing\ingest\ua809
      3. \\Lincoln\Library\SPE_Processing\ingest\apap138
  2. Log on to the railsdev Processing server
    1. Open a Command Line shell
    2. Run: ssh railsdev
  3. Run: ingest <collection ID>
  4. You can now type "exit" to log off the server and the command will run in the background
  5. Check if an ingest is running: check ingest
  6. Results will log to \\Lincoln\Library\SPE_Processing\ingest\log\<collection ID> as: <timestamp>-ingest-<collection ID>.txt
  7. Complete.
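To find the log for the run you just started, you can look for the newest file that matches the naming pattern in step 6. This is a minimal sketch: latest_ingest_log is a hypothetical helper, and it picks the newest log by file modification time because the timestamp format in the file name is not documented here.

```python
from pathlib import Path
from typing import Optional


def latest_ingest_log(log_root: Path, collection_id: str) -> Optional[Path]:
    """Return the most recently modified ingest log for a collection, or None."""
    log_dir = log_root / collection_id
    if not log_dir.is_dir():
        return None
    # Logs are named <timestamp>-ingest-<collection ID>.txt
    logs = list(log_dir.glob(f"*-ingest-{collection_id}.txt"))
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)
```

Here log_root would be the ingest\log folder from step 6, written in whatever form is reachable from the machine where the code runs.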

Results of Ingest

  1. Files will be packaged into a SIP bag here: \\Lincoln\Masters\Archives\SIP\<collection ID>\<package ID>
    1. SIP and AIP packages are here: https://github.com/UAlbanyArchives/packages

  2. Processing folder for package is created in \\Lincoln\Library\SPE_Processing\backlog\<collection ID>\<package ID>

  3. Master files are placed in the \masters subfolder

  4. Example Processing package:
    • ua809_JxkK2VWVFu7F8VWaTe72BG
      • derivatives
      • masters
      • metadata
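The package layout above can be checked with a short script. This is a sketch under assumptions: both function names are hypothetical, and the idea that a package ID is the collection ID, an underscore, and a unique suffix is inferred only from the single example ua809_JxkK2VWVFu7F8VWaTe72BG.

```python
from pathlib import Path
from typing import List, Tuple

# Subfolders shown in the example processing package.
EXPECTED_SUBFOLDERS = ("derivatives", "masters", "metadata")


def split_package_id(package_id: str) -> Tuple[str, str]:
    """Split '<collection ID>_<suffix>' at the last underscore (assumed format)."""
    if "_" not in package_id:
        raise ValueError(f"Not a package ID: {package_id!r}")
    collection_id, _, suffix = package_id.rpartition("_")
    return collection_id, suffix


def missing_subfolders(package_dir: Path) -> List[str]:
    """List the expected subfolders that are absent from a processing package."""
    return [name for name in EXPECTED_SUBFOLDERS
            if not (package_dir / name).is_dir()]
```

An empty list from missing_subfolders means the package has all three folders.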

Examining Running Ingest

  • When you run "ingest," you will see the PID or process ID:
[1] 16994
  • To list running ingest processes run: check ingest
$ check ingest
gw234478 16994 12.9  0.2 206436 21184 pts/0    D    10:13   0:30 python3 /opt/lib/ingest-processing-workflow/ingest.py apap301
  • If you need to stop the process (not recommended), use this command:
sudo kill -9 <PID>
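If you need to pull the PID out of the "check ingest" output, the line can be split on whitespace. This is a minimal sketch with a hypothetical function name; it assumes the standard "ps aux" layout of ten columns (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME) followed by the command.

```python
def parse_ps_line(line: str) -> dict:
    """Extract the user, PID, and command from one line of `ps aux` output."""
    # Split at most 10 times so the command, which contains spaces, stays whole.
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("Line does not look like `ps aux` output")
    return {"user": fields[0], "pid": int(fields[1]), "command": fields[10]}


line = ("gw234478 16994 12.9  0.2 206436 21184 pts/0    D    10:13   0:30 "
        "python3 /opt/lib/ingest-processing-workflow/ingest.py apap301")
info = parse_ps_line(line)
print(info["pid"])      # 16994
print(info["command"])  # python3 /opt/lib/ingest-processing-workflow/ingest.py apap301
```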

What is happening with "check"

  • "check" is a function defined in /etc/profile.d/processingFunctions.sh that runs:  ps aux | grep [i]ngest, etc.
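The brackets in [i]ngest are a common trick to keep grep from listing itself. The pattern [i]ngest matches the text "ingest", but the grep process's own command line contains the literal characters "[i]ngest", which the pattern does not match. The sketch below reproduces that behavior with Python's re module; the sample lines are illustrative, not real output.

```python
import re

# The same pattern text that the function passes to grep.
BRACKETED = re.compile(r"[i]ngest")
PLAIN = re.compile(r"ingest")

ps_lines = [
    "python3 /opt/lib/ingest-processing-workflow/ingest.py apap301",
    "grep [i]ngest",  # the grep process itself, as it appears in ps output
]


def matching_lines(pattern, lines):
    """Return the lines a grep with this pattern would print."""
    return [line for line in lines if pattern.search(line)]


# The bracketed pattern finds only the real ingest process.
print(matching_lines(BRACKETED, ps_lines))

# A plain pattern would not filter the bracketed grep line either, but
# "grep ingest" would appear in ps as a line containing "ingest" and match itself.
print(matching_lines(PLAIN, ps_lines + ["grep ingest"]))
```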

If the process is completed but the ingest folder is not deleted

  • Run python script \\LINCOLN\Library\SPE_Processing\checkIngest.py
    • It compares Ingest and Backlog to see whether all of the files were moved successfully.
  • Check logs: \\LINCOLN\Library\SPE_Processing\ingest\log
  • An example of a log report where the ingest folder couldn't be deleted:
  • If there is an error, sort the logs by date modified. A successful log should resemble this:
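The kind of comparison described above can be sketched as follows. This is not the contents of checkIngest.py, which are not shown on this page. It is a hypothetical illustration that compares two folders by relative file path only, and it assumes the folder structure is preserved when files are moved.

```python
from pathlib import Path
from typing import List, Set


def relative_files(root: Path) -> Set[str]:
    """Collect every file under root as a path relative to root."""
    return {p.relative_to(root).as_posix()
            for p in root.rglob("*") if p.is_file()}


def files_not_moved(ingest_dir: Path, backlog_dir: Path) -> List[str]:
    """List files still in the ingest folder that are absent from the backlog folder."""
    return sorted(relative_files(ingest_dir) - relative_files(backlog_dir))
```

An empty result suggests every file in the ingest folder has a counterpart in the backlog folder. A stricter check would also compare file sizes or checksums.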

Advanced Use

Ingesting from directory other than \\Lincoln\Library\SPE_Processing\ingest

  1. Must use path accessible to the railsdev server
  2. Must convert to Linux path:
    • \\Romeo\SPE\folder1\folder2 is /media/SPE/folder1/folder2
    • \\Lincoln\Masters\Special Collections\Electronic_Records_Library is /media/Masters/Special Collections/Electronic_Records_Library
  3. Run ingest using -p flag:

ingest apap101 -p "/media/Masters/Special Collections/Electronic_Records_Library/apap101"

  • Will still log to \\Lincoln\Library\SPE_Processing\ingest
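The path conversion in step 2 can be sketched as a small lookup. The function name and the SHARE_MOUNTS table are hypothetical. The table contains only the two mappings given above, so any other share raises an error rather than guessing a mount point.

```python
# Known share-to-mount mappings, taken from the two examples above.
SHARE_MOUNTS = {
    ("romeo", "spe"): "/media/SPE",
    ("lincoln", "masters"): "/media/Masters",
}


def to_linux_path(unc_path: str) -> str:
    """Convert a Windows UNC path to the matching Linux path on the server."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("Expected a UNC path starting with two backslashes")
    parts = [part for part in unc_path.split("\\") if part]
    if len(parts) < 2:
        raise ValueError("Expected at least a server and a share name")
    server, share, rest = parts[0], parts[1], parts[2:]
    key = (server.lower(), share.lower())
    if key not in SHARE_MOUNTS:
        raise ValueError(f"No known mount point for share: {server}\\{share}")
    return "/".join([SHARE_MOUNTS[key], *rest])


print(to_linux_path(r"\\Romeo\SPE\folder1\folder2"))
# /media/SPE/folder1/folder2
```

Paths with spaces, such as the Special Collections example, come through unchanged, so remember to quote the result when passing it to the -p flag.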

What is happening:

  • ingest is a function defined in: /etc/profile.d/processingFunctions.sh

  • It runs a Python script: /opt/lib/ingest-processing-workflow/ingest.py
    • relies on packages: /opt/lib/ingest-processing-workflow/packages/SIP/...

      usage: ingest.py [-h] [-p PATH] [-a ACCESSION] ID
      
      positional arguments:
        ID                    Collection ID for the files you are packaging.
      
      optional arguments:
        -h, --help            show this help message and exit
        -p PATH, --path PATH  Path of files to ingest. Folder will be removed
                              afterwords.
        -a ACCESSION, --accession ACCESSION
                              Optional ArchivesSpace Accession ID for new
                              acquisitions.
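The usage text above is the kind of help that Python's argparse module generates. The sketch below is a reconstruction that is consistent with that usage text. It is not the actual source of ingest.py, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the usage line: ingest.py [-h] [-p PATH] [-a ACCESSION] ID"""
    parser = argparse.ArgumentParser(prog="ingest.py")
    # Required positional argument.
    parser.add_argument("ID", help="Collection ID for the files you are packaging.")
    # Optional source path; without it the default ingest folder would be used.
    parser.add_argument("-p", "--path",
                        help="Path of files to ingest. Folder will be removed afterwards.")
    parser.add_argument("-a", "--accession",
                        help="Optional ArchivesSpace Accession ID for new acquisitions.")
    return parser


args = build_parser().parse_args(
    ["apap101", "-p", "/media/Masters/Special Collections/Electronic_Records_Library/apap101"]
)
print(args.ID)         # apap101
print(args.accession)  # None
```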

nohup python3 /opt/lib/ingest-processing-workflow/ingest.py apap301 >> /media/SPE/ingest/apap301-ingest.log 2>&1

  • No longer uses pyenv
    • Previously used a pyenv called "ingest", activated by running: pyenv activate ingest
      • List all pyenv environments with: pyenv versions
