The Council of State Archivists (CoSA) has designated today as Electronic Records Day, and we’d like to use this occasion to provide updates about our efforts to preserve and provide access to born-digital archival records within the University Archives. I wrote about born-digital records in a previous blog post, but as a reminder, challenges unique to born-digital records include bit rot, technological obsolescence, and file authenticity.
Because the last challenge, authenticity, is such a vital piece of the archival puzzle, the Princeton University Archives recently revised its instructions for donors who transfer or donate archival materials containing digital records. You can find those procedures freely available on our website, so rather than repeat them here, it’s more useful to explain why we made the change. Our new policies better reflect a core property that helps archivists demonstrate the authenticity of digital records: fixity.
Archivists understand fixity to be verifiable evidence that a digital file has remained the same over time or across a series of events. Any number of things could affect a file’s fixity, from the purely mundane to the absolutely sinister: a person opens a file to delete a punctuation mark, or a virus attacks a server and corrupts every sixth block of data on a disk. To generate fixity information at the University Archives, we rely on cryptographic hash values, known in other circles as checksums. Computer programs produce these fixed-length alphanumeric strings by running a file’s contents through a hash algorithm; the Message Digest family (specifically MD5) and the Secure Hash Algorithm family (specifically SHA-1 and SHA-256) are the most widely used in archives and libraries.
Examples of MD5 cryptographic hash values
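For readers curious about how such values are produced, here is a minimal sketch using Python's standard `hashlib` module. It is an illustration only, not the tooling the University Archives actually runs; the function name `file_digests` and the 64 KB chunk size are assumptions made for the example.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return {algorithm name: hex digest} for the file at `path`.

    Hypothetical helper for illustration. The file is read once, in
    chunks, and every chunk is fed to each hash object, so large files
    never have to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Changing even a single byte of the file yields a completely different digest, which is what makes these values useful as fixity evidence.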
With a cryptographic hash value created for each file, Mudd archivists are able to compile a manifest (yes, similar to a ship’s or flight manifest) and later verify whether the files that made it on board the ship (or disk or server or flash drive) are the same as those currently aboard: no additions, no subtractions, and no alterations.
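The manifest idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Archives' actual workflow: the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical, and the sketch uses SHA-256 alone for brevity.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Compare the files now under `root` against an earlier manifest."""
    current = build_manifest(root)
    shared = set(current) & set(manifest)
    return {
        "added": sorted(set(current) - set(manifest)),
        "missing": sorted(set(manifest) - set(current)),
        "altered": sorted(n for n in shared if current[n] != manifest[n]),
    }
```

If all three lists come back empty, every file that was on board at transfer is still aboard and unchanged.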
After a transfer is complete, we can quickly verify fixity on each file using our newly installed Forensic Recovery of Evidence Device (FRED). Running a highly customized Ubuntu Linux operating system tailored to meet the needs of archivists and librarians handling born-digital records, this machine is capable of verifying checksums as well as reading most contemporary varieties of solid-state, magnetic, and optical media. I’ll share more about FRED in a future post.
Forensic Recovery of Evidence Device (FRED)
It’s no secret that cryptographic hash algorithms occasionally “collide” (which is to say, an algorithm might produce the same hash value for more than one file) and that well-known attacks have been carried out against some of these algorithms. Such instances are extremely rare, however, and an archival repository can safeguard against collisions by using more than one algorithm, which Mudd most certainly does. The focus on fixity is one of many ways the University Archives is working to secure tomorrow’s digital history today, by providing future users with authentic digital records. Happy Electronic Records Day!