The University Archives and its Focus on Fixity

The Council of State Archivists (CoSA) has designated today as Electronic Records Day and we’d like to use this occasion to provide updates about our efforts to preserve and provide access to born-digital archival records within the University Archives. I wrote about born-digital records in a previous blog post, but as a reminder, challenges unique to born-digital records include bit rot, technological obsolescence, and file authenticity.

Because the last challenge, authenticity, is such a vital piece of the archival puzzle, the Princeton University Archives recently revised its instructions for donors who transfer or donate archival materials containing digital records. Those procedures are freely available on our website, so rather than repeat them here, it’s more useful to explain why we made the change. Our new policies better reflect a core property that helps archivists demonstrate the authenticity of digital records: fixity.

Archivists understand fixity to be verifiable evidence that a digital file has remained unchanged over time or across a series of events. Any number of things could affect a file’s fixity, from the purely mundane to the utterly sinister: a person opens a file to delete a punctuation mark, or a virus attacks a server and corrupts every sixth block of data on a disk. To generate fixity information at the University Archives, we rely on cryptographic hash values, known in other circles as checksums. Computer programs produce these fixed-length strings of alphanumeric characters by running a file’s contents through a hash algorithm, and even a one-character change to the file yields a completely different value. The algorithms most widely used in archives and libraries are Message Digest 5 (MD5) and the Secure Hash Algorithm family (specifically SHA-1 and SHA-256).

Examples of MD5 cryptographic hash values
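To make the idea concrete, here is a minimal Python sketch that uses the standard library’s `hashlib` module to compute MD5, SHA-1, and SHA-256 values for a single file. The helper name `file_digests` and the sample file contents are illustrative assumptions, not part of any Mudd workflow. The demonstration at the bottom mirrors the mundane case described above: deleting a single comma from a file changes each of its hash values.

```python
import hashlib
import os
import tempfile


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return {algorithm name: hex digest} for the file at `path`.

    `file_digests` is a hypothetical helper written for this example.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large disk images need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "letter.txt")  # illustrative sample file

        with open(path, "wb") as f:
            f.write(b"Dear Sir, thank you for the papers.")
        before = file_digests(path)

        # Simulate the "mundane" change: delete one punctuation mark.
        with open(path, "wb") as f:
            f.write(b"Dear Sir thank you for the papers.")
        after = file_digests(path)

        # Print each algorithm, both values, and whether they still match.
        for name in before:
            print(name, before[name], after[name], before[name] == after[name])
```

Every comparison prints `False`: the two versions differ by a single byte, yet none of the three values survives the edit.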

With these cryptographic hash values created for each file, Mudd archivists are able to compile a manifest (yes, similar to a ship’s or flight manifest) and later verify whether all the files that made it on board the ship (or disk or server or flash drive) are the same as those currently aboard: no additions, no subtractions, and no alterations.
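The manifest comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mudd’s actual tooling: the function names `hash_file`, `build_manifest`, and `verify` are hypothetical, the manifest is kept as an in-memory dictionary rather than written to a file alongside the transfer, and each entry records both an MD5 and a SHA-256 value to reflect the practice of using more than one algorithm. `verify` reports the three kinds of change a manifest is meant to catch: files added, files missing, and files altered.

```python
import hashlib
import os
import tempfile  # used only by the demonstration below

# Hypothetical choice: record two algorithms per file as a guard against collisions.
ALGORITHMS = ("md5", "sha256")


def hash_file(path, chunk_size=65536):
    """Return {algorithm: hex digest} for one file, reading it only once."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def build_manifest(root):
    """Map each file under `root` (relative, '/'-separated path) to its digests."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = hash_file(full)
    return manifest


def verify(manifest, root):
    """Compare a stored manifest with what is currently under `root`."""
    current = build_manifest(root)
    added = sorted(set(current) - set(manifest))
    missing = sorted(set(manifest) - set(current))
    # Flag a file as altered if ANY recorded algorithm's value has changed.
    altered = sorted(
        path
        for path in set(manifest) & set(current)
        if any(manifest[path][alg] != current[path].get(alg) for alg in manifest[path])
    )
    return {"added": added, "missing": missing, "altered": altered}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as transfer:
        for name, data in [("a.txt", b"one"), ("b.txt", b"two")]:
            with open(os.path.join(transfer, name), "wb") as f:
                f.write(data)
        manifest = build_manifest(transfer)  # recorded when the transfer arrives

        os.remove(os.path.join(transfer, "b.txt"))  # a subtraction
        with open(os.path.join(transfer, "a.txt"), "ab") as f:
            f.write(b"!")  # an alteration
        with open(os.path.join(transfer, "c.txt"), "wb") as f:
            f.write(b"three")  # an addition

        print(verify(manifest, transfer))
        # prints {'added': ['c.txt'], 'missing': ['b.txt'], 'altered': ['a.txt']}
```

Because both hashers are fed from the same read loop, recording a second algorithm adds computation but not a second read of each file, and an alteration would go unnoticed only if it produced a collision under both algorithms at once.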

After a transfer is complete, we can quickly verify fixity on each file using our newly installed Forensic Recovery of Evidence Device (FRED). Running a highly customized Ubuntu Linux operating system tailored to meet the needs of archivists and librarians handling born-digital records, this machine is capable of verifying checksums as well as reading most contemporary varieties of solid-state, magnetic, and optical media. I’ll share more about FRED in a future post.

Forensic Recovery of Evidence Device (FRED)

It’s no secret that cryptographic hash algorithms occasionally “collide,” which is to say that two different files can produce the same hash value, or that well-known attacks exploiting this weakness have been demonstrated against several algorithms, MD5 among them. Such instances are extremely rare, however, and an archival repository can safeguard against collisions by recording values from more than one algorithm, which Mudd most certainly does: an altered file would have to collide under every algorithm at once to slip through undetected. Ultimately, the focus on fixity is one of many ways the University Archives is working to secure tomorrow’s digital history today, by providing future users with authentic digital records. Happy Electronic Records Day!

Meet Mudd’s Jarrett M. Drake

Name/Title: Jarrett M. Drake, Digital Archivist

Responsibilities: As the digital archivist at Mudd, I’m responsible for developing, implementing, and executing processes that support the effective acquisition, description, and preservation of born-digital archival collections acquired by the University Archives, as well as access to them. The emphasis on ‘born-digital’ distinguishes my work from digitization, the process of converting analog material into digital formats. Born-digital records are those that originated as microscopic inscriptions of 0’s and 1’s on a piece of magnetic media.

“Magnetic Force Microscopy (MFM) of a Magnetic Hard Disk,” taken from MIT

Preserving and providing access to those 0’s and 1’s, or bits, is too challenging a problem for any single person to solve, so many of my duties require me to collaborate with others in the University Archives and across campus. This often involves meeting with our University Archivist, the Assistant University Archivist for Technical Services (to whom I report), and the University Records Manager. As exciting as it is to dive into the past by hacking away at old and new media (and trust me, it really is exciting), the most important element of my success is laying the infrastructure for our Digital Curation Program, which we initiated two months ago. Infrastructure is invisible to most of us but critical for all of us. More on that in future posts.

Lest I lead you to believe that I work exclusively in the digital realm, I also do things that archivists have always done: processing paper records and performing reference services in person, on the phone, and over email.

Ongoing projects: Because our Digital Curation Program is rather nascent, I spend a majority of my time drafting policy documents for the program as well as revising workflows for how we process born-digital records. Outside of that, I contribute to several Library-wide working groups and task forces. When I’m not doing one of those two things, you can probably find me working with a new digital preservation tool or strengthening my command of various operating systems.

Worked at Mudd since: I began at Mudd in November 2013. Prior to Princeton, I served as University Library Associate at the Special Collections Library of the University of Michigan, a position I held for nearly two years while I completed my master’s degree in information science at the university’s School of Information. Before Michigan, I had brief stints at the Maryland State Archives and the Beinecke Rare Book & Manuscript Library.

Why I like my job/archives: Contrary to general perception, archivists are as concerned with the future as they are with the past. Yes, we manage records that document past activities, but we do so for the sake of future researchers. In this way, I see my job as a digital archivist as one that preserves the past in order to promise the future. That promise is harder to keep when it comes to digital records, but it’s a challenge that I find terrifyingly exciting and incredibly meaningful. I also learn something new every day, which is one of the most fulfilling aspects of my work.

And though I put a lot of time and energy into curating bits, I joined the profession because I like people. I enjoy helping them with their research questions, and it gratifies me that I can contribute to the creation of new knowledge about the past. Even my roughest days turn around immediately when a researcher says “I can’t thank you enough for your assistance” or “without you, I’m not sure I could have answered this question.” Those are my reminders that I chose the right profession.

Favorite item/collection: Recently I responded to a researcher who sought information about the first Japanese student to graduate from Princeton. After some digging in our Historical Subject Files and our Undergraduate Alumni Records collection, I learned that Hikoichi Orita, who graduated in 1876, was the University’s first Japanese graduate.

Authored by Walter Mead Rankin, 1884. Found in “Orita, Hikoichi,” Box 148, Undergraduate Alumni Records, Princeton University Archives, Department of Rare Books and Special Collections, Princeton University Library.

In addition to his alumni files, we have a copy of his student diary, which I have promised myself I will read slowly over the course of my career. It’s in English, in case you’re interested in viewing it, too. This is a classic example of a researcher shaping the interests of the archivist, rather than the other way around.