How ArchivesSpace, ArcLight, and Hyrax are synced overnight

Overnight Export and Indexing Scripts

High-Level Overview

What Each Script Does

exportPublicData.py

  • Each night, exportPublicData.py uses ArchivesSnake to query ArchivesSpace for resources updated since the last run.
  • For collections with the complete set of DACS-minimum elements, it exports EAD 2002 files; for collections with only abstracts and extents, it saves the data to pipe-delimited CSVs.
  • It also builds a CSV of local subjects and collection IDs.
  • All of this data is pushed to GitHub.
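The split between EAD export and CSV export described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the element set in DACS_MINIMUM, the column names in FIELDS, and the function names are all assumptions, and the records are plain dictionaries rather than ArchivesSnake objects.

```python
import csv
import io

# Hypothetical column names; the real CSV layout is not given on this page.
FIELDS = ["id", "title", "abstract", "extent"]

# Assumed stand-in for "the complete set of DACS-minimum elements".
DACS_MINIMUM = {"title", "dates", "extent", "creator",
                "scope_content", "access", "language"}


def has_dacs_minimum(record):
    """Return True when every assumed DACS-minimum element is present and non-empty."""
    return all(record.get(key) for key in DACS_MINIMUM)


def split_by_description_level(records):
    """Separate records bound for EAD export from those bound for the CSV."""
    ead, csv_only = [], []
    for record in records:
        (ead if has_dacs_minimum(record) else csv_only).append(record)
    return ead, csv_only


def write_pipe_delimited(records, handle):
    """Write the selected columns of each record as pipe-delimited rows."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="|",
                            extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
```

Using the csv module rather than joining strings by hand means any value that happens to contain a pipe character is quoted automatically.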

staticPages.py

Indexing Shell Scripts

  • Later, the collection data is updated with git pull, and indexNewEAD.sh indexes the EAD files updated in the past day (selected with find -mtime -1) into the ArcLight Solr instance.
  • There are also additional indexing shell scripts for ad hoc updates.
    • indexAllEAD.sh reindexes all EAD files
    • indexOneEAD.sh indexes only one EAD by collection ID (./indexOneEAD.sh apap101)
    • indexOneNDPA.sh indexes one NDPA EAD file, necessary because they have the same collection ID prefixes
    • indexNewNoLog.sh indexes one EAD file, but logs to stdout instead of a log file
    • indexOneURL.sh indexes via a URL instead of from disk (not actively used)
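For readers more at home in Python than in shell, the selection that find -mtime -1 performs for indexNewEAD.sh can be approximated as below. This is only a sketch of the "modified in the last 24 hours" filter. The function name, the .xml extension, and the recursive search are assumptions, and the actual indexing step is left out.

```python
import time
from pathlib import Path


def recently_modified(directory, max_age_seconds=24 * 60 * 60, now=None):
    """List .xml files under directory modified within the last max_age_seconds."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return sorted(
        path for path in Path(directory).rglob("*.xml")
        if path.is_file() and path.stat().st_mtime > cutoff
    )
```

One caveat with any mtime-based filter: depending on how the working tree is updated, a git pull can rewrite files and so refresh their modification times, which is why a full reindex script is useful as a fallback.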

processNewUploads.py

  • Finally, processNewUploads.py queries the Hyrax Solr index for new uploads that are connected to ArchivesSpace ref_ids, but do not have accession numbers.
  • It downloads the new binaries and metadata and creates basic Archival Information Packages (AIPs) using bagit-python.
  • It then uses ArchivesSnake to add a new Digital Object Record in ArchivesSpace that links to the object in Hyrax.
  • Last, it adds a new accession ID in Hyrax.
  • (Also check out Noah Huffman's talk that probably does this better [Direct Link].)
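The first step above, finding uploads that have an ArchivesSpace ref_id but no accession number, might be expressed as a Solr query along these lines. The two field names are hypothetical placeholders, since the Hyrax schema is not given on this page, and the sketch only builds the query parameters without contacting Solr.

```python
from urllib.parse import urlencode

# Hypothetical Solr field names; substitute the fields from the real Hyrax index.
REF_ID_FIELD = "archivesspace_record_tesim"
ACCESSION_FIELD = "accession_tesim"


def new_upload_params(rows=100):
    """Build Solr parameters matching docs with a ref_id and no accession number."""
    return {
        "q": "*:*",
        "fq": [
            f"{REF_ID_FIELD}:[* TO *]",      # field has some value
            f"-{ACCESSION_FIELD}:[* TO *]",  # field has no value
        ],
        "rows": rows,
        "wt": "json",
    }


def new_upload_query_string(rows=100):
    """URL-encode the parameters, repeating fq once per filter."""
    return urlencode(new_upload_params(rows), doseq=True)
```

Because the final step writes an accession ID back to Hyrax, a processed object stops matching the second filter, so rerunning the query should not pick it up again.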

dacs.py

  • A simple library that converts POSIX timestamps and ISO 8601 dates to DACS-compliant display dates.
  • exportPublicData.py uses this to make dates for the static browse pages.
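As an illustration of the kind of conversion described above, the sketch below renders ISO 8601 dates and POSIX timestamps in the year-month-day order that DACS recommends for display dates. The function names are hypothetical and may not match those in dacs.py, and treating timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def iso_to_dacs(iso_date):
    """Convert YYYY, YYYY-MM, or YYYY-MM-DD (optionally with a time) to a display date."""
    date_part = iso_date.split("T")[0]  # drop any time component
    parts = date_part.split("-")
    year = parts[0]
    if len(parts) == 1:
        return year
    month = MONTHS[int(parts[1]) - 1]
    if len(parts) == 2:
        return f"{year} {month}"
    return f"{year} {month} {int(parts[2])}"


def posix_to_dacs(timestamp):
    """Convert a POSIX timestamp to a display date, interpreting it as UTC (an assumption)."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return iso_to_dacs(moment.strftime("%Y-%m-%d"))


print(iso_to_dacs("1999-03-05"))  # 1999 March 5
print(iso_to_dacs("1999-03"))     # 1999 March
print(posix_to_dacs(0))           # 1970 January 1
```

Spelling out the month names in a list, rather than relying on strftime("%B"), keeps the output independent of the server's locale.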

imageaday.py
