Archiving Email at the Princeton University Archives

Changes in leadership, especially at universities, give archivists an opportunity to transfer records into the archives. Such was the case when the current Dean of the College, Valerie Smith, accepted a position as the new president of Swarthmore College, a post she will assume in just over a month. Dan Linke, the University Archivist, and I visited her office to meet with Dean Smith and her staff and explain our procedures for transferring office records: paper documents as well as born-digital material such as Word documents, SharePoint sites, and so on. Early in the conversation we began to discuss capturing email, a task we had previously handled only haphazardly, by preserving Microsoft Word documents used to compose memos, PDFs generated from email applications, and printouts included within paper collections.

Pictured here is the full email header from a message in the publicly available Enron Email Dataset.

Two compelling reasons pushed us to find a way to transfer email directly from Dean Smith's account. First, she is a pioneer at Princeton many times over: in addition to being the first black woman to earn tenure at the University, Dean Smith later served as the first director of the University's renowned Center for African-American Studies before becoming the first black person to serve as Dean of the College. Second, we knew that our previous methods of email transfer limited access and stripped emails of their context, resulting in lost attachments, missing header information, and poor searchability.

With the office's support and cooperation, the University Archives drafted a plan of action for how, in a few weeks' time, we would appraise, acquire, and accession close to 100,000 messages that were organized only by year. The first step in that sequence, appraisal, deserves a closer look because it is the easiest to overlook yet the most important.

This screenshot from AC359 Tiger Hockey Email Newsletters is an example of how a PDF conversion of email correspondence condenses header information and limits searching.

Most papers written about preserving email (indeed, about most digital records) begin by describing the technological issues these records raise. These issues are important, but they can inadvertently obscure the archivist's first job: appraisal. Archivists don't appraise for monetary value the way auction houses do; instead, we appraise by determining the historical value of records, which is more art than science and is usually done in the context of an archive's existing collections. This process typically results in keeping only a small fraction of the documentation from a person, office, or organization, somewhere between 5 and 10 percent. The remaining 90 or 95 percent can, and likely will, be destroyed over time, a fact one of my professors in grad school frequently cited to support his view that archivists are actually 'the ultimate destroyers.'

Derridean arguments aside, appraisal serves a practical purpose for researchers, too. Consider Leonard Rapport's 1981 article on reappraisal, in which he invites the reader to imagine two distinct worlds:

“In the first it suddenly becomes possible to keep a copy of every single document created, and, for these documents, a perfect, instantaneous retrieval system. In the second, and less blissful, vision the upper atmosphere fills with reverse neutron bombs, heading toward every records repository. These are bombs that destroy records only, not people. They come down and obliterate every record of any sort…Keeping these two events in separate parts of your mind, project forward a century. How different would the two resultant worlds be? I leave it to you to conjecture as you please. My own guess is that between these two worlds there wouldn’t be all that much difference.”

Perhaps most importantly, the need for appraisal reminds us that keeping all of everything says nothing of anything. Consider your own work or personal email, for example. In the time you've spent reading this post alone, you may have received one message from a family member, another from your boss, one from an incessant reply-all sender, and a final one threatening that sender with disciplinary action.

Headline from the April 28, 1997, issue of the Daily Princetonian.

Realizing that email users of all varieties have this mix of professional, personal, and mundane messages in a single account, we knew we didn't want to simply copy Dean Smith's entire email account for posterity. Instead of this blunt approach, we worked with the office (especially the office manager, Carla Hailey Penn, and the office technical expert, Marvin Waterman) to develop two separate lists: one of the email addresses of relevant correspondents and one of keywords. We used these two lists to determine which of the almost 100,000 messages we wanted to preserve, rather than trying to build an impossible list of everything we didn't want to preserve. In this sense, we treated Dean Smith's email as one large database that we could query to return a specific set of results.

The office staff was especially helpful in devising the lists; they knew how information flowed and to whom. They compiled more than two dozen important correspondents and close to 50 keywords covering projects, programs, and other significant activities related to Dean Smith's tenure. Waterman and I then built two separate rules in Outlook, one for each list, that combined each list's entries into an extended OR condition and copied (not moved!) every matching message into a new Outlook data file. This method was suggested by Waterman and is also endorsed in a Guggenheim report published last August.
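
The rules themselves were set up in Outlook's interface, so what follows is not that configuration. As a rough illustration of the same selection logic, here is a minimal Python sketch that assumes the messages have already been exported into simple records; the `Message` type, the `select` helper, and the sample addresses are hypothetical. It keeps a message if any listed correspondent appears in its sender or recipient fields OR any keyword appears in its subject or body, and it builds a new list instead of altering the source, mirroring the copy-not-move choice.

```python
from dataclasses import dataclass
from email.utils import getaddresses


@dataclass(frozen=True)
class Message:
    # Hypothetical, simplified record for one exported email.
    sender: str      # raw From header, e.g. "Dean <dean@example.edu>"
    recipients: str  # raw To/Cc header text, e.g. "A <a@x.edu>, b@y.edu"
    subject: str
    body: str


def addresses(header_text: str) -> set:
    """Return the lower-cased email addresses found in a header string."""
    return {addr.lower() for _, addr in getaddresses([header_text]) if addr}


def matches(msg: Message, correspondents: set, keywords: list) -> bool:
    """One extended OR: any correspondent or any keyword is enough."""
    people = addresses(msg.sender) | addresses(msg.recipients)
    if people & correspondents:
        return True
    text = (msg.subject + " " + msg.body).lower()
    # Plain substring test; crude, but easy to audit.
    return any(k in text for k in keywords)


def select(messages, correspondents, keywords) -> list:
    """Return matching messages in a new list; the source list is untouched."""
    corr = {c.lower() for c in correspondents}
    keys = [k.lower() for k in keywords]
    return [m for m in messages if matches(m, corr, keys)]
```

Substring matching is deliberately simple (a keyword like "art" would also match "start"), so a real keyword list would likely need testing against sample messages before a full run.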

Running these two rules across 100,000 emails took close to five hours spread across three sessions, but afterwards we were left with just over 20,000 emails: about 5,000 sent and 15,000 received. We then used the two lists to help inform the descriptive note for the finding aid, leaving the chronological arrangement in place because that was how Dean Smith had organized them. In keeping with our normal access policy, the emails will not open for research until 40 years after their creation, or July 2050 at the earliest.
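
The arrangement and access rule in that paragraph can be sketched in a few lines of Python. This is a hypothetical illustration, not the Archives' actual workflow: the `(date, direction)` input format and the helper names are assumed, and rolling a February 29 creation date to March 1 when the target year is not a leap year is a choice the policy as described does not settle.

```python
from collections import Counter
from datetime import date

CLOSURE_YEARS = 40  # the 40-year access restriction described above


def opening_date(created: date, years: int = CLOSURE_YEARS) -> date:
    """Earliest date a message created on `created` could open for research."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Feb 29 landing in a non-leap year (e.g. 2060 -> 2100):
        # conservatively open on March 1 rather than Feb 28.
        return date(created.year + years, 3, 1)


def arrange(messages) -> Counter:
    """Count messages per (year, direction), as in a year-by-year arrangement.

    `messages` is an iterable of (created_date, direction) pairs, where
    direction is a label such as "sent" or "received" (hypothetical format).
    """
    return Counter((created.year, direction) for created, direction in messages)
```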

Finding aid view of 2014’s incoming messages.

When the appraisal process was complete, University Archivist Dan Linke observed that the 20,000 emails averaged out to 5,000 for each year of Dean Smith's tenure, or more than a dozen for every day she served, and that if they were printed on paper, they would amount to a little more than one linear foot of documents per year. This is in line with the other records of the Dean of the College's office dating back to the early 20th century (136 linear feet for 90 years). “While a successful appraisal cannot be measured by quantity alone,” Linke said, “we can test our conclusions by various means, and using this information, it appears that we have not over- or under-collected material.”
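
Linke's figures can be checked with simple arithmetic. The sketch below assumes a 365-day year; the paper-equivalence estimate itself depends on page counts the post does not give, so only the rates are worked out here:

```latex
\begin{aligned}
\frac{20{,}000 \text{ emails}}{5{,}000 \text{ emails/year}} &= 4 \text{ years of tenure} \\
\frac{5{,}000 \text{ emails/year}}{365 \text{ days/year}} &\approx 13.7 \text{ emails/day} \\
\frac{136 \text{ linear feet}}{90 \text{ years}} &\approx 1.5 \text{ linear feet/year}
\end{aligned}
```

About 13.7 messages a day is "more than a dozen," and roughly 1.5 linear feet per year for the office's older records is the benchmark against which the email's estimated one-foot-plus per year is compared.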

Email appraisal and access techniques will certainly become more refined over time. The most promising technology to this end is Stanford's ePADD project, which is set to conclude at the end of June. In the meantime, it's important to make the most of existing tools in email applications in order to efficiently determine which records fit into a collecting area, especially before administrators depart from their positions and lose access to their email. The intimacy of email demands that those responsible for its preservation and access secure and maintain trust. Archivists' confidence, among other things, enables that trust, and that confidence is built by clearing the first hurdle: appraisal. Here at the Princeton University Archives, we're confident, and hopefully after reading this post, you will be too.
