SPE_DAO
The SPE_DAO folder is the “Digital Object Discovery Storage” for Special Collections Digital Archival Objects (hence the name “SPE” for Special Collections and “DAO” for Digital Archival Objects). It replaces the traditional digital repository with filesystem storage. It is human-readable, editable, and well-structured to enable reliable automated use.
You will rarely need to work in the SPE_DAO folder directly, but it is still useful to understand how it works.
When you upload to ArchivesSpace, whether a single object or a batch, a folder for each individual object is created inside the corresponding collection folder in SPE_DAO.
The folder name matches the object's identifier (the refID in ArchivesSpace or the identifier in the manifest).
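The layout described above (SPE_DAO, then a collection folder, then one folder per object identifier) can be sketched as follows. This is only an illustration: the root path, the function names, and the collection and identifier values are hypothetical placeholders, not names used by the actual processing app.

```python
from pathlib import Path

# Assumption: wherever SPE_DAO is mounted on your machine; replace as needed.
SPE_DAO_ROOT = Path("/path/to/SPE_DAO")


def object_folder(collection_id: str, identifier: str,
                  root: Path = SPE_DAO_ROOT) -> Path:
    """Return the folder where one digital object is expected to live.

    The identifier is the refID in ArchivesSpace (or the identifier in
    the manifest), which is also the name of the object folder.
    """
    return root / collection_id / identifier


def list_object_ids(collection_id: str,
                    root: Path = SPE_DAO_ROOT) -> list[str]:
    """List the object folder names found inside one collection folder."""
    collection_dir = root / collection_id
    if not collection_dir.is_dir():
        return []  # the collection has no folder in SPE_DAO (yet)
    # Only subfolders count as objects; loose files are ignored.
    return sorted(p.name for p in collection_dir.iterdir() if p.is_dir())
```

Calling object_folder with a collection ID and a refID gives the path to look in when you want to confirm that an upload created its folder.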
Each individual object folder contains:
The original file format (in the example below, the JPG)
A PDF, if the object was originally uploaded to Hyrax, where it would have been converted to PDF (shown below)
The .ptif files, which are the file format used by our new image viewer. This format makes objects “zoomable,” so users can zoom in on high-resolution images to view all of their details.
The hocr (HTML-based OCR) folder contains the OCR files for the digital object, which allow users to search within the object for specific words or phrases. The hocr files are created automatically by the processing app.
There is also a txt folder containing the transcript for the digital object. Unlike the hocr files, which define each word individually so that it can be searched, a txt file contains all of the words in the file as complete running text.
For both hocr and txt, if the object has individual pages or images, there is a separate hocr file and txt file for each page or image.
The Content txt file combines all of the txt transcript files into a single file. It is the file that is uploaded to the digital object and shown to users when they click “Download Text transcription”.
If you need to correct a mistake in a transcript (transcripts are generated automatically and sometimes contain errors caused by image quality or unusual symbols), this is the file to edit.
Manifest.json- The manifest connects the contents to the image viewer (IIIF). It is a description of the structure and properties of the compound object and carries information needed for the client to present the content to the user, such as a title and other descriptive information about the object or the intellectual work that it conveys. Each Manifest usually describes how to present a single compound object such as a book, a statue or a music album.
Metadata.yml- The metadata.yml file is a YAML file containing digital object level metadata.
Thumbnail.jpg- Every object has a thumbnail image, which is used on the website as a representative sample of the object.
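The contents listed above lend themselves to a quick completeness check on an object folder. The sketch below is a minimal example under stated assumptions: the exact file and folder names (manifest.json, metadata.yml, thumbnail.jpg, content.txt, hocr, txt) are guesses based on the descriptions above, so names are compared case-insensitively, and the original file, PDF, and .ptif files are not checked because their names vary by object.

```python
from pathlib import Path

# Assumed names, taken from the descriptions above; the real files may be
# capitalized or named differently, so matching is case-insensitive.
EXPECTED_FILES = ("manifest.json", "metadata.yml", "thumbnail.jpg", "content.txt")
EXPECTED_DIRS = ("hocr", "txt")


def missing_items(object_dir: Path) -> list[str]:
    """Return the expected files and folders not found in object_dir."""
    # Map lowercase names to paths so "Manifest.json" matches "manifest.json".
    present = {p.name.lower(): p for p in object_dir.iterdir()}
    missing = []
    for name in EXPECTED_FILES:
        path = present.get(name)
        if path is None or not path.is_file():
            missing.append(name)
    for name in EXPECTED_DIRS:
        path = present.get(name)
        if path is None or not path.is_dir():
            missing.append(name + "/")  # trailing slash marks a folder
    return missing
```

An empty list means every expected item was found; anything else names what to look for before deciding whether the object needs to be recreated.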
The only times you will need to open the SPE_DAO folders are to confirm that the digital objects have been created (to double-check the logs) or, if you run into an error, to delete the affected object folders so that the objects can be recreated.
Also, if updates are made to the objects after you have uploaded them, you may need to Recreate the DAO as part of the Single Upload Process. Afterward, check SPE_DAO to confirm that the changes were applied.
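One way to confirm that a recreate actually touched an object folder is to look at file modification times. The following is a rough sketch, not part of the processing app; the function name is hypothetical, and it assumes that recreating a DAO rewrites the files so that their modification times change.

```python
from datetime import datetime
from pathlib import Path


def files_modified_since(object_dir: Path, since: datetime) -> list[str]:
    """List files under object_dir modified at or after `since`.

    A `since` value without a timezone is interpreted as local time.
    Paths are returned relative to object_dir, sorted alphabetically.
    """
    cutoff = since.timestamp()
    changed = []
    # rglob("*") walks subfolders too, such as the hocr and txt folders.
    for path in object_dir.rglob("*"):
        if path.is_file() and path.stat().st_mtime >= cutoff:
            changed.append(str(path.relative_to(object_dir)))
    return sorted(changed)
```

Passing the time you started the recreate as `since` should return the files that were rewritten; an empty list suggests the folder was not updated.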