Disk imaging workflow
Note: Do not insert your disk into the drive until Step 4.
Step 1: Create a Local Folder for your Computer
When using the Linux computer, create a folder on the desktop for the collection you are working on. Keeping it on the desktop makes it easy to find and means you are not writing anything to the Processing folder on the Library drive, which carries the risk of overwriting other people’s work or more “permanent” content. You can simply name it after the collection, or use the collection name and the medium if you are worried about confusing it with other disk images. At the end of the process you will move the content (or the whole folder, depending on your preference and the rate at which you image) into the ingest folder, which is why the Library SPE_Processing folder and ingest folder are linked.
Note: If you are planning to add multiple disk images to one larger folder, create a subfolder for each individual disk. The tools we use look for similar extensions and formats, and having several similar files in the same space can break or confuse them, so make sure each disk image has its own space (even within a larger folder).
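If you are comfortable with the terminal, the folder layout from this step can also be created in one command. This is a minimal sketch in bash; the collection folder name (apap000_floppies) and the disk labels (disk01, disk02) are hypothetical placeholders for your own names.

```bash
#!/usr/bin/env bash
# Sketch: one desktop folder per collection, with one subfolder per disk,
# so each disk image and its reports get their own space.
# All names used in the example call are hypothetical.

make_collection_folders() {
  local collection_dir="$1"   # e.g. "$HOME/Desktop/apap000_floppies"
  shift
  local disk
  for disk in "$@"; do        # remaining arguments are per-disk labels
    mkdir -p -- "$collection_dir/$disk"
  done
}

# Example (hypothetical names):
# make_collection_folders "$HOME/Desktop/apap000_floppies" disk01 disk02
```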
Step 2: Take a photo of the media
Use Guvcview, located on the left side bar.
Images will automatically be placed in /home/bcadmin unless you change the path. To change it, select “Photo”, then “File”, and create a new path to your desired destination. A folder called “Camera Capture” has been created that students currently use to hold their images until they can move them into their Desktop folder.
Taking the picture will need some finagling, so don’t worry if you don’t get it right away. Many things affect how the image comes out, including the lighting in the room and the state of the material. The most important thing is to capture the primary details of the material, i.e., identifying features like names or pictures that would help a researcher or future archivist tell the disk apart from others in the collection.
Find the images in “/home/bcadmin” (or the path you chose) and move them to the collection folder you created; dragging and dropping the files works. You should be making individual folders for each disk (if there is more than one), and the pictures of the physical media are the first “introduction” to the disk. You do not necessarily have to create subfolders for these photos, but you can if you feel they will get lost in the shuffle or if many materials are being captured.
Best practice is to rename the photo so it is easier to identify during the process. This is less crucial if you are the primary user of the materials than if you will be passing them on to another processing assistant or archivist. The picture will be added to the overall information of the digital object, so its title will become null, but it still needs to be identifiable outside of this context. You can rename it before ingest, or after ingest if you would like to treat it as a derivative item; either way, leave it in the folder of the item you are working on.
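Moving and renaming the photos can also be scripted. The sketch below assumes Guvcview saved the photos as .jpg files into a single source folder and that every .jpg there belongs to the disk you are working on; the folder paths and the label are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: move camera captures into a disk's folder with descriptive names.
# Assumptions (hypothetical): photos are .jpg files in one source folder,
# and all of them belong to the same disk.

file_disk_photos() {
  local src="$1"      # e.g. /home/bcadmin or the "Camera Capture" folder
  local dest="$2"     # e.g. "$HOME/Desktop/apap000_floppies/disk01"
  local label="$3"    # e.g. "apap000_disk01"
  local n=1 photo
  mkdir -p -- "$dest"
  shopt -s nullglob   # if no photos match, the loop simply does nothing
  for photo in "$src"/*.jpg; do
    # -n: never overwrite a photo that is already in the destination
    mv -n -- "$photo" "$dest/${label}_photo${n}.jpg"
    n=$((n + 1))
  done
  shopt -u nullglob
}
```

Files are numbered in alphabetical order of their original names, which may not match the order in which you took them, so glance at the results afterwards.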
Step 3: Ensure that you are in “READ-ONLY” mode or use a Writeblocker
At the very least, you should set the USB mount policy to “READ-ONLY”. You may be thinking “I am using a CD or a floppy disk, that’s not a USB,” and you are correct, but the reader you are using to access this aging technology is connected to the computer via USB.
Being in “READ-ONLY” mode prevents you from altering the materials on the disk you are imaging. The Writeblocker does the same, albeit in a more complicated fashion, and is especially effective and necessary for “high value” items or items you are more concerned about affecting. The Writeblocker is also more finicky and sometimes does not work with the drives that hold the disk; it may work one day and not the next. Setting the USB mount policy to “READ-ONLY” provides a baseline of protection so that you do not alter the materials on the disk.
Step 4: Insert disk into drive
At this point, you can insert the disk into the drive. The drive may have a light that indicates whether it is working. The disk may pop up in the Linux file explorer, but you may need to open Guymager to see whether it is actually being read. If it is not being read, try a different drive. Unfortunately, this is an imperfect process, and not all disks or readers react the same way. Guymager is what tells you whether the disk is being read: if the disk does not appear there, it is not being read.
Step 5: Use Guymager to image
Guymager is the program that “takes the picture” of (images) the disk. It is located in the left side bar and looks like a brown box/bag with a “G” on it. You can image several disks at once by using multiple drives, but I would only recommend doing this once you feel comfortable with the process and where the files go.
Guymager will ask for the computer login password, so have it handy. Open it after you have inserted the disk into the drive. Guymager lists every device the Linux computer can see, including the hard drives, inactive drives, and “snaps” (virtual loop devices created for installed snap packages). Sometimes the disk you are reading will be aptly named, like “USB Floppy Disk” or “Thumb Drive”, and other times the names will be less intuitive. DVD drives are named “TEAC DV_W28S-B”.
If you do not see your disk drive, try the “Rescan” button in the top left-hand corner. If that does not work, eject and reinsert the disk; if that still does not work, try a different drive. This process can be temperamental, so don’t get discouraged if a disk doesn’t scan: move on to another disk and come back (if you can), or ask an Archivist for help if you run out of options.
Once you see your disk drive, right click on the name and select “Acquire Image”.
You will then be prompted with a series of choices.
Please select the following:
Use Linux raw dd image
Split image files over 2047 MiB
Use a consistent or descriptive filename
Only md5 checksums are necessary
Verifying the image is a good idea (even if it adds time)
If you are imaging a large disk, like a hard drive or thumb drive, this step may take some time. Guymager will have a progress bar and estimated completion time. If it is going to be a while, feel free to leave the computer and move onto another task while it runs in the background. Check periodically to see if it has completed.
Once the image has been acquired, double-check that the destination folder contains a “.dd” and/or “.000” file. There will also be a log file detailing the steps Guymager took, the resulting hash, and confirmation that the image was verified.
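To double-check the image outside Guymager, you can recompute its MD5 and compare it with the hash recorded in the log file. This is a sketch, assuming a split image is named NAME.000, NAME.001, and so on (an unsplit one NAME.dd); the segments are hashed as one continuous stream so the result should correspond to the hash of the whole image. The folder and NAME are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: recompute the MD5 of a Guymager dd image for comparison with its log.
# Assumption: split images are named NAME.000, NAME.001, ...; unsplit ones NAME.dd.

image_md5() {
  local dir="$1"    # folder Guymager wrote the image into
  local name="$2"   # image file name without its extension
  local -a parts
  if [ -f "$dir/$name.dd" ]; then
    parts=("$dir/$name.dd")
  else
    shopt -s nullglob
    parts=("$dir/$name".[0-9][0-9][0-9])   # glob results come back sorted
    shopt -u nullglob
  fi
  if [ "${#parts[@]}" -eq 0 ]; then
    echo "No .dd or .000 file found for $name in $dir" >&2
    return 1
  fi
  # Hash all segments as one continuous stream, in order.
  cat -- "${parts[@]}" | md5sum | cut -d' ' -f1
}
```

For example, `image_md5 ~/Desktop/apap000_floppies/disk01 disk01` prints a single MD5 value to compare against the log.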
If you have confirmed that the file exists, eject the physical disk by right clicking the icon and selecting “eject”.
Step 6: Mount the disk
You will now mount the disk image you took. Right-click the “.dd” or “.000” file and select “Disk Image Mount” under “Scripts”.
The mounted image, named with the file name you chose, will appear in the left side bar. You can also use the file explorer to see which mount you are viewing. At this point, you can view the files within the image without worrying about affecting the original materials. You should still be in “READ-ONLY” mode, but you can reconfirm by following the same steps as earlier, which should bring up a message saying “The USB mount policy is already in the READ-ONLY state”.
Looking at the material (and clicking into it) will help you decide which of the following optional steps to take to learn more about the image. Does there appear to be personally identifying information? Do the file paths or contents look corrupted? Do the dates appear to conflict? Do you receive error messages when opening the files?
All of these issues give us more information that future archivists and researchers can use to understand the material on the disk and how it came to be that way. They support the context of the material.
[Screenshot: a mounted disk image]
If the image doesn't mount, use disktype to detect image format
Open a Terminal (CTRL + ALT + T)
Navigate to the folder containing your disk image, using cd to change directory and ls to list files and directories:
cd Desktop/myfolder
ls
Run disktype
disktype myimage.dd
View the output. [Screenshot: disktype output] In this example, the filesystem type for the image is FAT16.
Step 7: Copy Files to Your Desktop Folder
From your mounted image, you will be able to see the files on your disk. You will need to copy these files into the folder you created on your desktop.
Create a subfolder in the folder associated with the disk you are imaging labeled “Files”
Click into your mounted disk and select all the files (ctrl+a) and then copy them (ctrl+c)
Click into the “Files” folder you created and paste in the copied files (ctrl+v)
Now, when you move the files into the ingest folder in the final step, you will have the accessible versions.
Note: if you find corrupt or PII files in the following step, you will need to delete those files from the folder before ingesting.
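The copy in Step 7 can also be done from the terminal. This is a minimal sketch, assuming you know the mounted image's path (the file explorer shows it) and using hypothetical folder names. Copying `MOUNT/.` includes hidden files, which a Ctrl+A selection can miss if the file explorer is not showing them, and `--preserve=timestamps` keeps the original modification dates instead of stamping the copies with today's date.

```bash
#!/usr/bin/env bash
# Sketch: copy everything from the mounted image into the disk's "Files" folder.
# The mount point and disk folder passed in are hypothetical examples.

copy_image_files() {
  local mount_point="$1"   # path of the mounted image
  local disk_dir="$2"      # e.g. "$HOME/Desktop/apap000_floppies/disk01"
  mkdir -p -- "$disk_dir/Files"
  # -R copies the whole directory tree; "/." includes hidden files;
  # --preserve=timestamps keeps the original modification times.
  cp -R --preserve=timestamps -- "$mount_point/." "$disk_dir/Files/"
}
```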
Step 8: Forensics and Reports Applications
In the top left hand corner, there is an option under “Applications” for “Forensics and Reports”. Brunnhilde and Bulk-Reviewer can both be found here.
Brunnhilde produces an easy to read and share HTML report to supplement existing file format identification and characterization tools. It helps to identify different file formats, creation and editing dates, and more. It also creates a full .csv output from Siegfried and an additional folder of .csv results which have been separated and organized based on their respective categories. All of this will help future archivists to understand how we received the material and, possibly, what we had to do to disseminate it.
Note: Through trial and error we have learned that, especially for floppy disks, the “Directory” option works more consistently than the “Disk Image” option. The “Directory” in this case is the mounted disk in the side bar. Click into it, and then click the “Open” button in the top right-hand corner.
You will need to provide a space in the “Destination” folder for the Brunnhilde report. Do not create the folder before providing the path; instead, enter the same path as your input and then add some variation of “/Brunnhilde” (including the forward slash) to tell the system what to name the report folder and where to put it.
You do not need to link an accession number at this point, as the image will be enveloped in the same accession as the rest of the physical material in your collection. You do need an identifier, and you should once again name it something that makes sense to you and will hopefully make sense to a future archivist.
Once this information has been added, you can hit the “Start scan” button and the “Status” bar will begin to fill. You will receive a message that the scan is finished and you can look in your desktop collection folder for the reports within the folder you have labeled with your “identifier” field.
Brunnhilde will provide you with your carved files, or the usable files that you will upload into the system. You may need to add or edit file extensions if you get a file format mismatch, but it should provide the best capture of what is on the disk. If you are unable to run a Brunnhilde report, copy and paste the contents of the mounted disk image into the folder you created. Either way, you should move on from this step with the files in your possession (in the desktop collection folder).
Optional Step: For Personally Identifying Information concerns run BulkReviewer
If the context of your collection and its materials, or the information you have gathered in the steps above, makes you worried that the disk images may contain personally identifying information (PII), you can run BulkReviewer to look for it.
Bulk Reviewer is also located under “Applications” for “Forensics and Reports”.
Select “Scan new directory or disk image”. To scan the MOUNTED disk image, choose the “Directory” option.
You can also scan the disk image itself, but in that case use the unmounted .dd or .000 file. Be careful not to move these reports into the Brunnhilde folder, as some of the information may be duplicative and you want to keep it separate.
You should also expand the “Options” to make sure the scan is looking for the right information.
The most important option for this step is the “Social Security Number identification mode”, which lets you adjust the format BulkReviewer looks for when identifying SSNs.
Remember! Many schools previously used SSNs as student ID numbers, and these will not contain dashes, so if your materials relate to student records or data, you will want BulkReviewer to be more lenient in how it looks for SSNs.
The other options hold less weight for the work that we are doing. If the disk is more modern, you can request to “Include network data”. If you feel confident with your ability to read metadata files, you can also include that as well.
After the scan has concluded, BulkReviewer will provide buttons to download the reports: the .json file looks like a webpage (similar to the manifest pages for the digital objects hosted on the albany.archives.edu site), and the .csv file looks like an Excel sheet. You can download both or just one, depending on what makes sense to you and what displays the data clearly. After downloading, add the reports to your desktop folder. If you are concerned about any of the information in a report, or need help understanding it, ask an Archivist.
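BulkReviewer runs bulk_extractor behind the scenes, so you can also run bulk_extractor directly if the graphical tool gives you trouble. This is a hedged sketch: it assumes bulk_extractor is installed and on the PATH, and that `-S ssn_mode=2` is the lenient setting that does not require dashes (our reading of bulk_extractor's help text; confirm with `bulk_extractor -h` before relying on it). The output folder name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: run bulk_extractor directly (BulkReviewer uses it behind the scenes).
# Assumptions: bulk_extractor is on the PATH, and "-S ssn_mode=2" is the
# lenient SSN mode (no dashes required); confirm with `bulk_extractor -h`.

scan_for_pii() {
  local target="$1"   # mounted image folder, or the unmounted .dd/.000 file
  local outdir="$2"   # a new folder for the reports (hypothetical name)
  if [ -e "$outdir" ]; then
    # Refuse to reuse a folder so old and new reports never mix.
    echo "Output folder already exists: $outdir" >&2
    return 1
  fi
  local -a args=(-o "$outdir" -S ssn_mode=2)
  if [ -d "$target" ]; then
    args+=(-R)   # -R: treat the target as a directory and recurse through it
  fi
  bulk_extractor "${args[@]}" "$target"
}
```

Use a new output folder for each run, kept separate from the Brunnhilde folder.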
Optional Step: For virus concerns, run Clamtk
If you are concerned about the origin of the disk, or the materials you have seen via the disk image give you pause, feel free to run Clamtk on the directory or individual files.
Clamtk will alert you if it finds anything, but infected disks are not a common problem.
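Clamtk is a graphical front end for the ClamAV scanner, so the same check can be run with `clamscan` from the terminal. A minimal sketch, assuming clamscan is installed; it relies on clamscan's exit codes (0 when no virus is found, 1 when a virus is found, 2 on an error). The folder you pass in is up to you.

```bash
#!/usr/bin/env bash
# Sketch: command-line alternative to Clamtk using ClamAV's clamscan.
# Assumes clamscan is installed on the workstation.

virus_check() {
  local target="$1"   # e.g. the disk's "Files" folder or the mounted image
  clamscan -r -i "$target"   # -r: recurse into folders; -i: list only infected files
  local status=$?
  case "$status" in
    0) echo "No viruses found in $target" ;;
    1) echo "Possible infection found in $target; ask an Archivist"; return 1 ;;
    *) echo "clamscan reported an error (exit code $status)" >&2; return 2 ;;
  esac
}
```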
Step 9: Ingest the Image or the Files
After you have concluded all of your disk imaging for this collection, or are ready to move forward with adding your already captured disk image to this collection, you may create a folder within ingest (Library/SPE_Processing/ingest) named for the collection ID (ex: apap000, ger000, mss000 etc.). After you have done so, move your desktop folder into the ingest folder.
You will then follow the ingest workflow found here.
Make a folder in Library/SPE_Processing/ingest with the collection ID
Ingest with the Processing app using the collection ID
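The folder creation and move can be scripted as well. This is a sketch only: where the Library drive is mounted on the workstation is an assumption you will need to check, and the collection ID and folder names are hypothetical. Ingesting with the Processing app still happens afterwards, as described in the ingest workflow.

```bash
#!/usr/bin/env bash
# Sketch: create the collection-ID folder in ingest and move the desktop folder into it.
# Assumption: you pass in the local path to Library/SPE_Processing/ingest;
# where the Library drive is mounted on your workstation must be checked first.

move_to_ingest() {
  local ingest_dir="$1"       # local path to Library/SPE_Processing/ingest
  local collection_id="$2"    # e.g. apap000, ger000, mss000
  local desktop_folder="$3"   # e.g. "$HOME/Desktop/apap000_floppies" (hypothetical)
  mkdir -p -- "$ingest_dir/$collection_id"
  # -n: never overwrite something already sitting in ingest
  mv -n -- "$desktop_folder" "$ingest_dir/$collection_id/"
}
```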