Meet the Manuscripts Division 2017 Summer Fellow

Under the supervision of the Manuscripts Division processing team, the summer fellow will be assisting with several key projects, including “traditional” paper-based processing, processing born-digital media, inventorying AV materials, and researching access options for born-digital and digitized AV content.

Kat at Prambanan, a Hindu temple in Indonesia

Name: Kathryn Antonelli (but feel free to call me Kat!)

Educational background: I received my undergraduate education from Temple University. My degree was in Media Studies and Production, with a minor in French. I’m now about halfway through my Master’s program in Library and Information Science through the University of South Carolina’s distributed education option. This summer, I’m conducting an independent study on the ethics of archiving audiovisual materials (especially within collections of indigenous and minority cultural groups), so if you have any leads on interesting articles to read please do let me know. 🙂

Previous experience: Before finding my interest in archiving, I worked in event production at the Barnes Foundation in Philadelphia. More recently, after moving to Chicago and starting my MLIS, I’ve had the opportunity to intern at the Gerber/Hart Library, the Chicago Symphony Orchestra, the Oriental Institute, and the Newberry Library.

Why I like archives: I like archives for two reasons: the stories they tell, and the mysteries they solve. I do truly enjoy working with paper-based collections, but after my undergraduate program I became much more aware of how audiovisual media presents—or omits—information, which made those materials and the ways we can use them even more interesting to me. And, after a childhood full of Nancy Drew novels, I’ll count anything from puzzling out the (accurate!) birth date of a well-known dancer to identifying people in a photograph as a type of mystery solving.

Other interests: While baseball season is a lot of fun, and the weather is much nicer, I’m rarely sad for summer to end because it means college football is about to start. I’m an ardent Temple fan, of course, but I also watch every other game I can. My friends are always entertained by the irony, since outside of watching sports I am not a competitive person at all.

Projects this summer: I’m excited that my first task at Princeton is to process the Albert Bensoussan papers. The collection is in both French and Spanish and I love working with foreign language materials. Later this summer, I’ll be taking on more tasks with our born-digital holdings, so I’m also looking forward to learning how to use the new FRED machine to work with files in the Toni Morrison collection.

Moving Beyond the Lone Digital Archivist Model Through Collaboration and Living Documentation

Slides from the talk are available online.

Below is the text of a presentation Elvia Arroyo-Ramirez, Kelly Bolding, and Faith Charlton gave earlier this month at the 2017 Code4Lib conference in Los Angeles, CA. The talk focused on the Manuscripts Division Team’s efforts to manage born-digital materials and the challenges of doing this work as processing archivists without “digital” in their titles. 

Hello everyone, welcome to the last session of the last day of code4lib. Thank you for sticking around.

What we want to talk about in the next 10 minutes are the numerous challenges traditional processing archivists can face when integrating digital processing into their daily archival labor. Shout out to UCSB, NCSU, and RAC for presenting on similar topics. Knowledge, skills, and institutional culture about who is responsible for the management of born-digital materials can all be barriers for those who do not have the word “digital” in their job titles.

Our talk will discuss steps the Manuscripts Division at Princeton University has taken to manage its born-digital materials through collaboration, horizontal learning, and living documentation.

But first, we’ll introduce ourselves:

  • Elvia – I am the Processing Archivist for Latin American Collections
  • Kelly – I am a Manuscripts Processor
  • Faith – I am the Lead Processing Archivist for Manuscripts

We, along with two other team members, Allison Hughes and Chloe Pfendler, who both contributed to the efforts we will discuss here, form part of the Manuscripts Division in our department. And though we are all “traditional” processing archivists who do not have the word “digital” in our titles, we’ve increasingly encountered digital assets in the collections we are responsible for processing.

First, we wanted to give everyone a breakdown of our department. Princeton’s archival repositories are physically split between two libraries with three main divisions. The Manuscripts Division (where we are located) is in Firestone Library; Public Policy and the University Archives are located several blocks away at Mudd Library. The library currently employs one dedicated Digital Archivist for the University Archives, Jarrett Drake; without his expert guidance and skill sharing, we wouldn’t be giving this presentation. Jarrett has really set the tone for horizontal learning, opening opportunities for skill building and sharing across the divisions of the department and empowering his colleagues to take on digital processing work.

With that said, the Manuscripts Division has no digital archivist, so digital processing responsibilities are distributed across the team, which initially left us feeling like [gif of Ghostbusters team at the onset of meeting a ghostbusting challenge].

To dive into this type of work we needed to take some first steps. 

We literally jumped at the chance to begin managing our digital backlog by participating in SAA’s 2015 Jump In 3 initiative, which allowed us to gain intellectual control over legacy media within the division’s 1,600 or so manuscript collections. We also began updating pertinent documentation, such as our deed of gift, and drafting new guidelines for donors with born-digital materials. Finally, we began assembling our first digital processing workstation: a dual-booting BitCurator and Windows 7 laptop connected to various external drives, plus a KryoFlux for imaging problematic floppy disks.

With Jarrett’s assistance we began processing born-digital materials using the workflows he and his predecessors had developed for University Archives. We’ve also experimented with new tools and technologies; for example, setting up and using the KryoFlux and creating bash scripts to reconcile data discrepancies. So far, our work continues to be an ongoing process of trial, error, and, most importantly, open discussion.

Okay, let’s talk about horizontal learning. As an increasing number of archivists in our department were gaining the skills necessary to handle digital processing, the opportunity to share our expertise and experiences across divisions materialized. The following are two examples of how we’ve built this collaborative approach.

Over the last year, a group of archivists from across the department, including the Digital Archivist, came together to form DABDAC, the Description and Access to Born-Digital Archival Collections working group, as a means of maintaining an open forum for discussing born-digital description and access issues.

Members meet biweekly to discuss case studies, fails, potential new tools they are interested in experimenting with, readings, additions to workflows, etc. The workgroup follows a “workshop” model; whoever has a current description or access issue can bring it to the meeting’s agenda and ask the collective for advice.

Creating a horizontal skill-sharing environment has boosted our confidence as nascent digital archivists. With a baseline understanding of digital processing and the tools we need to do this type of labor, we then sought the advice of our peers within the profession to help inform the development of our very own digital archives workstation. The team developed a survey asking 20 peer institutions about their local setups, which ultimately informed our decision to purchase a FRED machine. Thanks to those who responded and provided us with in-depth and extremely helpful answers.

Another key theme that has emerged from our experiences is the importance of living documentation. By this, we mean workflow documentation that is:

  • collaboratively created;
  • openly accessible and transparent;
  • extensible enough to adapt to frequent changes; and  
  • flexible enough to use across multiple divisions.

Managing living documentation like our Digital Records Processing Guide on Google Drive allows us to maintain tried-and-true guidelines vetted by the Digital Archivist, and supplemented by other archivists who work with digital materials.

We currently use the Comments feature to link from specific steps in the workflow to separate Google Docs, or other online resources that can inform decision-making or provide working alternatives to specific steps. We also write and link to documents we call “reflections.” These reflection documents detail improvised solutions to problems encountered during processing so that others can reuse them. By expanding our workflows this way, we extend the value of time dedicated to experimentation by documenting it for future repurposing.

Digital processing also presents opportunities for archivists to develop workflows collaboratively across institutions, especially since archivists often adopt digital tools developed for other fields like forensics. These tools often come poorly documented or with documentation intended for users with very different goals. One example is the KryoFlux, pictured here, a forensic floppy controller that many archivists have adopted. While our KryoFlux arrived from Germany with a few packages of gummy bears, the setup instructions were not so friendly. Luckily, we have benefitted tremendously from documentation that other repositories have generously shared online, particularly guides created by Alice Prael and others at Yale. UCLA’s Digital Archivist Shira Peltzman also recently asked us to contribute our “Tale of Woe” to a collaborative KryoFlux User Guide currently being drafted.

Before we conclude, we want to acknowledge both the particular institutional privileges that allow us to conduct this work as well as the broader structural challenges that complicate it. We are fortunate that the structure of our department affords processing archivists the time necessary to collaborate and experiment, as well as the material resources to purchase tools.

At the same time, while archivists are shifting functionally into more technical roles, institutional structures do not always acknowledge this shift. In our collective experience as an all-female team, we’ve faced challenges due to gendered divisions of labor. Even though the library and archives profession swings heavily female, technical positions in libraries still remain predominantly male. When these gender-coded realities are not acknowledged or challenged, undue and sometimes stubborn expectations can be placed on those who are expected to do the “digital” work and those who are not. For those in “traditional” processing roles whose technical responsibilities now fall within their domain, that labor can often go underappreciated or unacknowledged.

To wrap up, the realities of contemporary manuscript collections have made it clear that the lone digital archivist model no longer works for some institutions, particularly larger ones. As a team, we have met the challenge of integrating digital processing into our regular work by focusing on collaboration, horizontal learning, and living documentation. Although digital processing is new for us, we’ve been able to apply many skills we’ve already developed through prior work with metadata management, and we encourage our fellow archivists to find confidence in these skills when jumping into this work. We wanted to share with you the work we’ve done locally in hopes that our case study may empower anyone in a “traditional” processing role to take on the work that’s often been confined to the “digital archivist,” particularly by reaching out to others, whether they be in a different division, department, or institution.

We look forward to further collaboration with other colleagues at our home base and hope to continue building relationships and collaborating with others in the profession at large.
We leave you with a bibliography of additional resources and our contact information, and some gummy bears. Thank you.

Princeton goes to RAC!

Portrait of John D. Rockefeller Jr. (1874-1960)

On Wednesday, January 18th, the Manuscripts Division team went on a field trip to the Rockefeller Archive Center (RAC) in Sleepy Hollow, NY. The team, which consists of Kelly Bolding, Faith Charlton, Allison Hughes, Chloe Pfendler, and myself, was graciously invited by RAC’s Assistant Digital Archivist Bonnie Gordon to meet with our RAC counterparts and discuss born-digital processing, specifically knowledge sharing, peer support, and horizontal leadership. The team also received a tour of the Center from the Director of Archives, Bob Clark.

The “seed” of this exchange was planted when Faith reached out to Bonnie about RAC’s digital processing workstation specifications (hardware, software, peripheral tools, etc.), part of a larger endeavor we have taken on to inform the building of our own workstation (more on this project to come!). In the exchange, Bonnie asked about DABDAC (Description and Access to Born-Digital Archival Collections), a peer working group here in our department that I had previously mentioned in a presentation I gave at last year’s PASIG meeting at MoMA in New York. Side note: both Bonnie and her colleague Hillel Arnold wrote about their PASIG experience on RAC’s Digital Team blog, Bits & Bytes, which is an excellent resource for keeping abreast of digital preservation news and RAC’s innovative projects. In our communication, it became clear that the respective processing teams at Princeton and RAC were undergoing similar efforts of collaborative knowledge and skill building with regard to born-digital processing. In fact, representatives of both teams are set to give presentations at the upcoming 2017 code4lib conference (*cough*, *cough*) that revolve around how to build competence (and confidence) across an entire team, regardless of whether the word “digital” appears in an archivist’s job title.

With both teams figuring out ways to get everyone up to speed with digital processing, we decided to meet, learn from each other, and talk strategy out loud. Here is what I learned from the Rockefeller Archive Center:

RAC’s institutional history:

I’ve always wondered what else could be found in the Rockefeller Archive Center other than the Rockefeller family archives. It turns out RAC is also home to the archives of many philanthropic and service organizations, like the Ford Foundation and the Commonwealth Fund, as well as other organizations founded by Rockefeller family members, like the Rockefeller Foundation. The Center operates in an actual house, originally built for Martha Baird Rockefeller, John D. Rockefeller Jr.’s second wife, which makes for a very interesting setup for an archival repository. There were many, many bathrooms in the house, which left me wondering whether each member of RAC’s archival staff of 25 has their own personal bathroom.

RAC’s digital processing workstation:

RAC’s digital processing workstation

RAC currently uses a FRED (Forensic Recovery of Evidence Device) in conjunction with a KryoFlux and a number of floppy disk controllers, like the FC5025, to disk image 3.5″ and 5.25″ floppies, and FTK Imager to image optical disks and hard drives. The fact that the KryoFlux and floppy disk controllers can be connected to the FRED, and that the FRED can access the contents of floppies through these devices, is perhaps the most important thing the Princeton team learned from RAC; we had previously been working under the assumption that the FRED’s internal Tableau write blocker would prevent the FRED from accessing them. The Center also uses a MacBook to image Mac-formatted materials, something our team is considering adding to our workstation in the near future since we have, and anticipate, legacy Mac-formatted materials in our collections.

Bonnie also mentioned that they have FTK installed on their FRED workstation, though it requires a lot of manual labor, and that they are considering getting a second processing workstation with BitCurator installed. Neither the Manuscripts Division nor our colleagues in the University Archives at Mudd Library use FTK or FTK Imager, and so far we’ve both been satisfied with the suite of tools BitCurator provides. However, because donor-imposed access restrictions are much more prevalent in manuscript collections than in university archives materials, FTK might be particularly useful for zeroing in on particular files and folders with varying access restrictions.

Manuscript Division’s processing archives team with Bonnie Gordon, Assistant Digital Archivist for the Rockefeller Archive Center.

One thing Bonnie said that I am beginning to understand, and think critically about, is the need for archivists to retool or hack their way through the currently available tools in order to use them for our needs. Many of these tools, FTK and FRED for example, are built not with archivists as primary customers in mind, but with forensic investigators who use them to analyze evidence for criminal investigations. These forensic tools at times require significant time investments to make them responsive to archivists’ needs, which makes hacking, or improvisation, necessary for folks who want to do, or are currently doing, archival work for cultural institutions. In our own limited but growing knowledge and experience preparing digital archives for long-term preservation, we’ve come across challenges configuring each discrete piece of equipment with the necessary operating system, hardware specifications, etc. It is like taking on a giant jigsaw puzzle:

  • when you first start, you definitely don’t know which piece goes where;
  • you may even question if you have all the pieces you need;
  • or if you know how to put a damn jigsaw puzzle together;
  • you get frustrated and want to give up mid-completed puzzle;
  • then you realize that, if you assemble a bigger team together, each one of you can take different sides of the puzzle and go from there.

And truly, that is at the core of what our processing team here is trying to do: empowering all of our processing staff with the skills and expertise to share the labor of processing born-digital archives, so that more diverse sets of skills and experiences can influence conversations about workflows and configurations. If we put more heads together, we invite more creative ways of working through roadblocks; and if we have more bodies, we can tackle the growing digital backlog more efficiently.

The Princeton team had an excellent field trip out to the Rockefeller Archive Center. Stay tuned as we return the favor to the RAC team and host them at our digs here in early spring!

Manuscripts Division Offers Its First Archival Fellowship!

The Manuscripts Division, a unit of Princeton University Library’s Department of Rare Books and Special Collections, is proud to offer the inaugural Manuscripts Division Archival Fellowship to one graduate student or recent graduate this year. This fellowship provides a summer of work experience for a current or recent graduate student interested in pursuing an archival career. For more information about the Manuscripts Division visit:

Fellowship Description: The 2017 Fellow will gain experience in technical services, with a focus this year on description and management of born-digital and audiovisual materials. The Fellow will work under the guidance of the Manuscripts Division processing team, which includes the Lead Processing Archivist, Latin America Processing Archivist, and General Manuscripts Processor. Projects for 2017 may include:

  • Processing one or more paper-based or hybrid format manuscript collections;
  • Conducting a survey of legacy audiovisual materials and assisting in the writing of a grant proposal for an audio digitization project;
  • Assisting with processing and analysis of born-digital media and implementation of related tools and software (e.g., FRED, ArchivesSpace, BitCurator, KryoFlux); and
  • Researching access options and permissions for digitized and born-digital materials.

The Manuscripts Division of Rare Books and Special Collections is located in Firestone Library, Princeton University’s main library, and holds over 14,000 linear feet of materials covering five thousand years of recorded history and all parts of the world, with collecting strengths in Western Europe, the Near East, the United States, and Latin America. The Fellow will primarily work with the Division’s expansive literary collections, including recently acquired collections of contemporary authors, and collections relating to the history of the United States during the 18th and 19th centuries. The Fellow will have an opportunity to gain considerable experience and to aid staff in formalizing comprehensive management of the Division’s born-digital and audiovisual materials, including developing methods of providing better access to this content.

The ten- to twelve-week fellowship program, which may be started as early as May, provides a stipend of $950 per week. In addition, travel, registration, and hotel costs to the Society of American Archivists’ annual meeting in July will be reimbursed.

Requirements: This fellowship is open to current graduate students or recent graduates (within one year of graduation). Applicants should have successfully completed at least twelve graduate semester hours (or the equivalent) applied toward an advanced degree in archives, library or information management, literature, American history/studies or another humanities discipline, public history, or museum studies; should have a demonstrated interest in the archival profession; and should have good organizational and communication skills. At least twelve undergraduate semester hours (or the equivalent) in a humanities discipline and/or foreign language skills are preferred.

The Library highly encourages candidates from under-represented communities to apply.

To apply: Applicants should submit a cover letter, resume, and two letters of recommendation to: Applications must be received by Monday, March 6, 2017. Skype interviews will be conducted with the top candidates, and the successful candidate will be notified by April 14.

Please note: University housing will not be available to the successful candidate. Interested applicants should consider their housing options carefully and may wish to consult the online campus bulletin board for more information on this topic.


Digital Processing Workflows & Improvisation: A Foray into Bash Scripting

Adventures in the command line.

Over the past year, processing archivists in the Manuscripts Division have begun to integrate digital processing into our regular work. So far, each group of digital materials we’ve put through our workflow has presented at least one unique challenge that required us to supplement the standard steps in order to meet our twin goals of preservation and access.

Improvisation, however, is no stranger to “traditional” archival processing in a paper-based setting. Professional standards and local guidelines lend structure to actions taken during processing, but there is always a degree of improvisation involved because no two collections are alike (hence the archivist’s favorite answer to any question: “It depends”). In the inevitable cases where existing documentation stops short of addressing a particular situation we encounter in the day-to-day, we use our professional judgment to get where we need to go a different way. We improvise.

Good guidelines have room for improvisation built in. By documenting the context and reasoning behind each step, they empower users to deviate from the particulars while achieving the same general result. We’re lucky to have inherited this kind of thoughtful documentation from our colleagues at Mudd Library (notably, Digital Archivist Jarrett Drake). Archivists across the department who are processing digital materials have recently begun writing informal narrative reflections on particular problems we’ve encountered while working with digital materials and the improvisations and workarounds we’ve discovered to solve them. By linking these reflections to our digital processing guidelines, we hope to aid future troubleshooting, repurpose what we’ve learned, and share knowledge horizontally with our peers.

One example (1) is a recent issue I encountered while working with a group of text files extracted from several 3.5” floppy disks and a zip disk from the papers of an American poet. After acquiring files from the disks, our workflow involves using DROID (Digital Record Object Identification), an open source file format identification tool developed by the U.K. National Archives, to identify file formats and report any file extension mismatches. In this case, the report listed a whopping 4,791 mismatches, nearly all of which lacked file extensions entirely.

While Mac and Linux operating systems rely on internal file metadata (MIME types) rather than file extensions to determine which program is needed to open a file, Windows operating systems (and humans) rely on file extensions. Reconciling file extension mismatches is important both because future digital preservation efforts may require moving files across operating systems and because file extensions provide important metadata that can help future users identify the programs they need to access files.
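As a quick aside, this difference is easy to see with the Unix `file` utility, which identifies a file from its internal signature rather than its name; the filename below is invented for the demo and is not one of the collection’s files:

```shell
# Create a plain-text file with no extension, then ask `file`
# to identify it from its contents alone ("magic" detection),
# ignoring the missing extension entirely.
printf 'Dear Editor,\n' > mystery_file
file --mime-type -b mystery_file   # typically reports text/plain
```

This is essentially what a Mac or Linux system does behind the scenes, and why those systems cope with extensionless files while Windows (and humans) struggle.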

Small quantities of files can be renamed pretty painlessly in the file browser, and larger numbers of files requiring uniform changes within a relatively flat directory structure can be handled with the rename or mv command in the terminal. In my case, however, the collection creator managed her files in a complex directory structure, and single directories often contained files of various types, making these manual solutions prohibitively time- and labor-intensive.
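For the uniform, flat-directory case, a one-line loop over mv is usually all it takes. A minimal sketch (the scratch directory and the `.TXT`-to-`.txt` normalization are invented for the demo):

```shell
# Demo in a scratch directory: two files with uppercase
# extensions get normalized in a single loop.
dir=$(mktemp -d)
printf 'draft' > "$dir/notes.TXT"
printf 'letter' > "$dir/intro.TXT"

for f in "$dir"/*.TXT; do
    mv -- "$f" "${f%.TXT}.txt"   # strip the old suffix, append the new one
done

ls "$dir"   # now intro.txt and notes.txt
```

The `${f%.TXT}` parameter expansion removes the old suffix before the new one is appended, so the same pattern works for any uniform extension change.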

If there’s one thing I’ve learned from working with metadata, it’s that you should never do a highly repetitive task if, in the time it would take you to do the task manually, you could learn how to automate it. (2) In addition to completing the task at hand, you’ll develop a new skill you can reuse for future projects. While sitting down and taking a comprehensive online class in various programming languages and other technologies is certainly useful for archivists venturing into the digital, it’s often hard to find the time to do so and difficult to retain so much technical information ingested all at once. Problems encountered in day-to-day work provide an excellent occasion for quickly learning new skills in digestible chunks in a way that adds immediate value to work tasks and leads to better information retention in the long run. This philosophy is how I decided to solve my mass file renaming problem with bash scripting. I was somewhat familiar with the command line in Linux, and I figured that if I knew the command to rename one file, with a little finagling, I should be able to write a script to rename all 4,791 based on data from DROID.

To create my input file, I turned the file extension mismatch report from DROID into a simple CSV file containing one field with the full path to each file missing an extension and a second field with the matching file extension to be appended. To do so, I looked up the correct file extensions in the PRONOM technical registry using the identification number supplied by the DROID report. I then inserted the extensions into my input file using the Find/Replace dialog in Google Sheets, deleted columns with extraneous information, and saved as a CSV file.
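Before unleashing a mass rename driven by a file like this, a cheap sanity check is to confirm that every path in the CSV actually exists on disk. A sketch, with the CSV rows invented for the demo (including one deliberately bad row):

```shell
# Build a tiny two-column CSV (full path, extension to append)
# in a scratch directory, then flag any rows whose path is missing.
workdir=$(mktemp -d)
printf 'poem draft' > "$workdir/draft_a"

{
  printf '%s,doc\n' "$workdir/draft_a"      # exists
  printf '%s,wpd\n' "$workdir/missing_one"  # does not exist
} > "$workdir/input.csv"

while IFS=, read -r path ext; do
    [ -e "$path" ] || echo "missing: $path"
done < "$workdir/input.csv"
```

Any “missing” lines point to rows that would make the rename step fail partway through, which is much easier to fix before the run than after.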

The script I wrote ended up looking like this: (3)

My bash script.

In a nutshell, a bash script is a series of commands that tells the computer what to do. The first line, which appears at the beginning of every bash script, tells the computer which shell is needed to interpret the script. Lines 3-9 first clear the terminal window of any clutter (clear) and then prompt the user to type the file path to the input file and the location where a log file should be stored into the terminal (echo); after the user types in each file path, the script reads it (read) and turns the user’s response into variables ($input and $log). These variables are used in the last line of the script in a process called redirection. The < directs the data from the input file into the body of the script, and the > directs the output into a log file.

Terminal window containing prompts initiated by the echo command, along with user input.

The real mover and shaker in this script is a handy little construct called a while loop (while, do, & done are all part of its syntax). Basically, it says to repeat the same command over and over again until it runs out of data to process. In this case, it runs until it gets to the last line of my input file. IFS stands for internal field separator; by using this, I’m telling the computer that the internal field separator in my input is a comma, allowing the script to parse my CSV file correctly. The read command within the while loop reads the CSV file, placing the data from field 1 (the full path to each file to be renamed) into the variable $f1 and the data from field 2 into the variable $f2 (the file extension to be appended). The mv command is responsible for the actual renaming; the syntax is: mv old_name.txt new_name.txt. In this case, I’m inserting my variables from the CSV file. Quotes are used to prevent any problems with filenames or paths that have spaces in them. -v is an option that means “verbose” and prompts the terminal to spit out everything that it’s doing as it runs. In this case, I’m redirecting this output into the log file so that I can save it along with other administrative metadata documenting my interventions as an archivist.
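Pieced together from the description above, the whole thing might look something like the following sketch. The function and variable names are my assumptions; the original script set $input and $log with interactive echo/read prompts, while here the loop is wrapped in a function so the two paths can be passed in directly, and the CSV’s second field is assumed to omit the leading dot:

```shell
#!/bin/bash
# append_extensions: read "full_path,extension" rows from a CSV
# and append each extension to its file, capturing mv's verbose
# output in a log file for the administrative record.
append_extensions() {
    local input="$1"   # path to the two-field CSV
    local log="$2"     # where the rename log should be written
    # IFS=, makes read split each line on commas: f1 is the full
    # path, f2 the extension to append. Quoting guards against
    # spaces in filenames; -v records each rename in the log.
    while IFS=, read -r f1 f2; do
        mv -v "$f1" "$f1.$f2"
    done < "$input" > "$log"
}
```

Called as `append_extensions input.csv rename.log`, the function renames every file listed in the CSV and leaves a line-per-file record of the changes in the log.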

In the end, what would have taken me or a student worker countless hours of tedious clicking now takes approximately four seconds in the command line. I’m glad I spent my time learning bash.


(1) For an even better example, see Latin American Collections Processing Archivist Elvia Arroyo-Ramirez’s “Invisible Defaults and Perceived Limitations: Processing the Juan Gelman Files” (here).

(2) Maureen Callahan’s “Computational Thinking and Archives” blog post certainly helped inspire this way of thinking.

(3) Here’s where I learned how to make my script executable and how to execute it from the command line.

Tooling Up: Building a Digital Processing Workstation

Learning about jumper settings on our 5.25" floppy disk workstation.


Since completing a comprehensive survey of born-digital holdings within the Manuscripts Division in 2015, the archival processing team at Firestone Library has been steadily gathering the equipment necessary to safely access and preserve digital data stored on obsolete computer media. In addition to the nearly 400 digital media items uncovered by our recent survey, the Manuscripts Division continues to acquire digital materials at an increasing pace, most recently within the papers of Toni Morrison, Juan Gelman, and Alicia Ostriker.

We’ve leaned heavily over the past year on the infrastructure and expertise of our colleagues at the Seeley G. Mudd Manuscript Library to get our feet wet with digital processing, including help with extracting data from over 150 floppy disks in the Toni Morrison Papers. This year, we’ve taken the deep dive into assembling a digital processing workstation of our own. As born-digital archival processing becomes a core part of “regular” archival processing, the tools available to archivists must expand to reflect the materials we encounter on a day-to-day basis in manuscript collections; and we, as archivists, have to learn how to use them.

Manuscripts Division Digital Processing Toolkit (Because everything looks better in a cool toolkit).

3.5″ and 5.25″ floppy disks are a common occurrence within personal papers dating from the mid-1970s through the mid-2000s. Disks often arrive on our desks labeled only with a few obscure markings (if we’re lucky), but their contents remain inaccessible without equipment to read them. Since contemporary computers no longer contain floppy disk drives or controllers, we had to get creative. Based on research and recommendations from Jarrett Drake, Digital Archivist at Mudd Library, we assembled a toolkit of drives, controller boards, and connectors that have enabled us to read both 3.5” and 5.25” floppy disks on our dedicated digital processing laptop, which dual boots Windows 7 and BitCurator (a digital forensics environment running in Linux’s Ubuntu distribution).

3.5″ Floppy drive with USB connector

Fortunately, external drives for 3.5” floppy disks are still readily available online for around $15 from Amazon, eBay, or Newegg. We purchased one that connects directly to the USB port on our laptop, which Latin American Collections Processing Archivist Elvia Arroyo-Ramirez and our student assistant Ann-Elise Siden ’17 recently used to read and transfer data from 164 floppy disks in the Juan Gelman Papers (which will be the subject of an upcoming post).

5.25″ floppy disks, which preceded the 3.5″ model, present a somewhat hairier challenge since new drives are no longer commercially available. Based on positive results with a similar set-up at Mudd Library, we purchased a FC5025 USB 5.25″ floppy controller from Device Side Data to use in conjunction with an internal TEAC FD-55GFR 5.25″ floppy disk drive we bought from a used electronics dealer on Amazon. The Device Side Data floppy controller came with a 34-pin dual-row cable to connect the controller board to the drive and a USB cable to connect to our laptop. After hooking everything up, we also realized we would need a molex AC/DC power adapter to power the 5.25″ drive from a wall outlet, which we were also able to secure online at Newegg. All in all, our 5.25″ floppy disk workstation cost us about $130. Compare that to the price of archival boxes, folders, and bond paper, and it’s actually pretty reasonable.

5.25″ Floppy drive (Purchased used from Amazon dealer)

5.25″ Floppy drive controller from Device Side Data

While these set-ups have been largely successful so far, there have been a handful of problem 3.5″ floppy disks our drive couldn’t read, likely due to prior damage to the disk or obscure formatting. After doing some additional research into methods adopted by peer institutions, we decided to try out the KryoFlux, a forensic floppy controller that conducts low-level reads of disks by sampling “flux transitions” and allows for better troubleshooting and handling of multiple encoding formats. While an institutional KryoFlux license is a significantly costlier option than the others we’ve discussed in this post, funds from our purchase will support future development of the tool, and it will be available for use by University Archives and Public Policy Papers staff as well as by those of us in the Manuscripts Division.

Very recently, we received our KryoFlux by mail from Germany. Upon opening and inspecting the package, among the hardware kit, controller boards, disk drive, and cables, we were delighted to find a gift: several packages of Goldbären (i.e. adorable German gummy bears). Our next steps will be installing the KryoFlux software on our laptop, connecting the hardware, and testing out the system on our backlog of problematic floppy disks, the results of which we will document in a future post. In the meantime, we are interpreting the arrival of these candies as a fortuitous omen of future success, and at the very least, a delicious one.

A gift from our friends at KryoFlux.

A fortuitous omen of future successes in disk imaging.


Wrangling Legacy Media: Gaining Intellectual Control Over (Born) Digital Materials

Not unlike many manuscript repositories, Princeton’s Manuscripts Division has been somewhat slow to act when it comes to managing born-digital materials. This largely has to do with the nature of our collections, most of which predate the 20th century, as well as the fact that we lacked the policies and infrastructure to deal with these materials. However, the number of more contemporary collections we’re acquiring is rapidly increasing, particularly the papers of still-active literary and cultural figures, as are the digital materials included in these collections. Coming to terms with this reality has led our division to begin taking the necessary steps to properly manage digital materials.

As a first step in doing so, we wanted to gain as much intellectual control as we could over our extant digital media. Our endeavor happened to coincide with SAA’s 2015 Jump In 3 initiative, which we decided to participate in as it provided structure, guidance, and a timeline for us to get the ball rolling. Jump In 3 was the third iteration of the “Jump In” initiative led by the Manuscript Repositories Section meant to help repositories begin to manage their born-digital records. It invited archivists to submit a report and survey of one or more collections in their repositories and also encouraged participants to take the additional steps of prioritizing collections for further treatment and developing the technical infrastructure for dealing with readable media.

STEP 1: Survey Finding Aids (XQuery Magic)

With the assumption that we had a minimal amount of digital media in our collections, we decided to survey all ~1600 of them. We first surveyed our EAD finding aids, which we manage in SVN (using the Subversion client within the Oxygen XML editor), to locate description indicative of possible digital materials. Since digital media haven’t been described in a consistent manner, if at all, we anticipated that existing descriptions would vary from finding aid to finding aid. This meant that our survey tool would need to capture descriptions of digital materials located in various EAD elements.

With the help of our colleague, Regine Heberlein (Principal Cataloger and Metadata Analyst), we wrote a simple XQuery that scanned all descriptive components of our EADs to locate text strings that matched a regular expression based on a list of words, including variant spellings, that we determined would indicate the likely presence of born-digital materials, such as “disk,” “floppy,” “CD,” “DVD,” “drive,” “digital,” “electronic,” etc.

xquery version "1.0";
declare namespace ead = "urn:isbn:1-931666-22-9";
declare default element namespace "urn:isbn:1-931666-22-9";
declare copy-namespaces no-preserve, inherit;
import module namespace functx = "http://www.functx.com"
  at "";
declare variable $COLL as document-node()+ := collection("path/to/EAD/directory");

let $contains_media := $COLL//ead:c/ead:*[not(self::ead:c) and matches(string(.), '(\s|^)flopp(y|ies)(\s|$)|(\s|^)dis(k|c|kette)s?(\s|$)|(\s|^)cd(-rom)?s?(\s|$)|(\s|^)dvds?(\s|$)|(\s|^)digital(\s|$)|(\s|^)(usb|hard|flash)\sdrives?(\s|$)', 'i')]
return
<results xmlns="urn:isbn:1-931666-22-9">
{
    for $media in $contains_media/parent::ead:c
    return <c level="{$media/@level}" id="{$media/@id}">{$media/node()}</c>
}
</results>

The XQuery generated a list of matching EAD components in XML, which we then imported into Excel. Each row in the spreadsheet represented a component that the XQuery located, and each column, an EAD element within that component.


With assistance from student workers, we meticulously examined and revised this spreadsheet, removing any irrelevant, extraneous, or redundant information; for example, false positives that resulted from other text strings matching our regular expression, or duplicate records for the same item resulting from multi-level description. We then extracted and transferred the most relevant data, including media type, file formats, and quantities, into additional columns, in order to provide more structure for this technical data, as well as to compute estimated totals. We also added columns to track our attempts to determine from the EAD data whether an item was likely born-digital or contained files of materials that had been digitized.

Controlled list for media type and estimated capacity (not always applicable as many descriptions were very general, i.e. “discs” and “floppies.”)

With the understanding that we were relying on imperfect metadata from older finding aids and, more importantly, that not all digital media were even described in the finding aid, our initial survey determined that the MSS Division held approximately 232 (born) digital media totaling 394 GB. In order to store two preservation copies and one access copy of all of the files on the digital media we found, we determined that we’d need a total of about 1.2 TB of storage space. While this number was likely somewhat inflated, since most media are probably not filled to capacity, we preferred to err on the side of overestimating our storage needs, especially given the anticipated presence of additional digital materials in our holdings for which we have no description. These were the figures we reported in our survey report for Jump In.
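The arithmetic behind that estimate is simple enough to show in one line: three copies (two preservation, one access) of roughly 394 GB.

```shell
# Two preservation copies plus one access copy of ~394 GB of surveyed media.
copies=3
survey_gb=394
total_gb=$((copies * survey_gb))
echo "${total_gb} GB"   # 1182 GB, rounded up to about 1.2 TB
```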


STEP 2: Physically Survey Collections

The next step entailed physically looking at the materials that were identified in the EAD survey to enhance the data we had captured. We had two students conduct the survey and create an item-level inventory of the materials.

For this survey, we were able to adhere more strictly to the controlled list we had drafted for media type and estimated capacity. We also captured any annotations on the items, including media labels (markings by the manufacturer) and media markings (notes added by the creator); fortunately, for some items, the markings included the actual media capacity used as well as file format information. We also wanted to continue determining, where possible, which items were actually born-digital as opposed to containing files of digitized materials, and to note whether or not the latter duplicated original paper copies that we also held, since this information would help us prioritize these materials.


The number of media found during the physical survey was much higher than what we had determined from the EAD survey: rather than 232 items, we identified 394 media. Our estimated storage need more than tripled as well: about 4.2 TB of space as opposed to 1.2 TB.



Since conducting the survey we’ve discovered more digital media in existing collections and have also received materials in new acquisitions. For example, the Toni Morrison Papers include over 150 floppy disks, both 3.5″ and 5.25″, as well as a number of CDs and DVDs. We also recently received the papers of Argentine poet Juan Gelman, with over 160 3.5″ floppies as well as several CDs, DVDs, and flash drives.

Next Steps

We in the Manuscripts Division are very fortunate in that a solid foundation of policy and infrastructure for managing born-digital materials has already largely been developed by our colleagues who oversee the University Archives and Public Policy Papers at Mudd Library. We’re currently in the process of trying to apply what they’ve established for our division, mirroring what they’ve developed in some respects, but also tweaking things as appropriate due to the differing nature of our collections.

We’ve begun to emulate Mudd’s environment here at Firestone, developing the infrastructure necessary to properly process and preserve digital records; and have also started to revisit and draft various related policies and procedures, specifically those that address new acquisitions (i.e. our donor agreement), processing workflows, preservation, and access.  Other next steps include working with curators to prioritize the extant media we identified in our survey; beginning to process media from new collections; and refining the workflows we currently have in place. (These issues will be discussed in more detail in future posts.)

Not Changing, But Expanding: Managing Digital Archives at Firestone Library

The following post introduces an upcoming series about managing digital archives at Firestone Library. In the next few months, the processing archivists of RBSC’s Manuscripts Division will be posting about:

– updating donor and purchase agreements to reflect language inclusive of managing digital content;
– gaining intellectual control over legacy born-digital materials;
– tools and programs we will use to capture, process, and preserve materials;
– access models for making this content publicly accessible;
– storage options for long-term preservation; and
– description of born-digital and digitized content.  

We plan to post case studies on how we process various types of digital media including audio, email, documents, and images. We will also share any relevant publications, presentations, webinars, etc., that helped inform our process.

Why now?

Simply put, archival processing of digital materials should not fall solely to digital archivists. Princeton University Library currently employs one Digital Archivist, whose primary responsibilities are to develop, implement, and execute workflows specific to the management of University Archives. And while the Digital Archivist and other colleagues at the Seeley G. Mudd Library have encountered a steady flow of born-digital materials in University Archives over the past several years, the Manuscripts Division has only recently received an increasing amount of ‘hybrid’ collections that include analog and born-digital materials on floppy disks, CDs, DVDs, USB drives, hard drives, and other removable storage media.  We are also taking steps to digitally migrate some of our audiovisual content. Acknowledging these concurrent realities, we see that our roles as traditional processing archivists are not changing but are expanding to include the management of digitized and born-digital content; and we’re ready to assume this responsibility. Looking to the foreseeable future, traditional processing archivists will eventually become digital archivists as backlogs shift from dusty, unprocessed boxes to terabytes of unprocessed data.

Institutionally, the timing is right for us to begin tackling this growing issue, in the hope that we can get a handle on our digital content rather than letting terabytes of unprocessed data sit on the shelf to “collect dust.” We hope that our efforts to begin managing digital materials now will prevent this new form of backlog from becoming a reality.

Absolute Identifiers for Boxes and Volumes

AbID in action in the RBSC vault.

Most library users are familiar with call numbers such as 0639.739 no.6, or D522 .K48 2015q, or MICROFILM S00888. These little bits of text look peculiar standing on their own. However, together with an indication of location such as Microforms Services (FLM) or Firestone Library (F) they can guide a user directly to a desired item and often to similar items shelved in the area. In Rare Books and Special Collections (RBSC), there’s no self-service. Instead, departmental staff members retrieve (or “page” as we call it) items requested by users. Finding those items has not always been easy. Over the years many unique locations and practices were established on an ad-hoc basis. The multiplicity of exceptional locations required the paging staff to develop a complex mental mapping that rivaled “The Knowledge” that must be mastered by London taxi drivers. The Big Move of collections in May 2015 provided the opportunity for a fresh start. Faced with growing collections and finite space, we imagined a system that would adapt to the new vault layout, which is strictly arranged by size. By “listening” to the space, we realized that shelving more of our collections by size would result in the most efficient use of shelving and minimize staff time spent on retrieval and stack maintenance. We couldn’t do anything immediately about the legacy of hundreds of subcollections–except to shelve them in a comprehensible order, by size and then alphabetically. However, the break with the past in terms of physical location of materials prompted some rethinking that eventually led to the use of “Absolute Identifiers” or “AbIDs” for almost all new additions to the collections.

RBSC has long used call number notations that are strictly for retrieval rather than subject-related classification for browsing by users. “Sequential” call numbers, which were adopted for most books in 2003, look like “2008-0011Q.”  Collection coding came along many years before that, in forms such as “C1091.” The Cotsen Children’s Library used database numbers as call numbers, such as “92470.” As a result, RBSC has vast runs of materials where one call number group has nothing to do with its neighbors in terms of subject or much of anything else. A book documenting the history of the Bull family in England sits between an Irish theatre program and the catalog of an exhibition of artists’ books in Cincinnati. In the Manuscripts Division, the Harold Laurence Ruland papers on the early German cosmographer Sebastian Münster (1489-1552) have a collection of American colonial sermons as a neighbor. So what’s different about AbID? Three things are different: the form of the call numbers, their uniform application across collections and curatorial domains, and the means by which they are created.

The form of AbIDs is simple: a size designation and a number. Something like “B-000201” provides the exact pathway to the item, reading normally. It tells the person paging an item to go to the area for size “B” and look along the shelves for the 201st item. Pretty simple. Size designations are critical in our new storage areas. In order to maximize shelving efficiency, and thus the capacity for on-site storage, everything is strictly sorted into 11 size categories. (Some apply to very small numbers of materials. So far only two sizes account for two thirds of AbIDs.) In a sense, using AbIDs is simply a way of conforming to the floor plan and the need to shelve efficiently. That’s why the designations can be called Absolute Identifiers. They are “absolute” because the text indicates unambiguously the type and location of each item in the Firestone storage compartments. In other words, all information needed to locate an object appears right in the call number, with no need for any additional data. Even if materials must be shifted within the vault in the future, AbIDs remain accurate since they are not tied to specific shelf designations.
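As a quick sketch (the AbID below is made up), the two parts of an identifier can be separated with no lookup table at all:

```shell
# Illustrative only: an AbID is a size designation plus a zero-padded
# sequence number, and both are recoverable from the identifier itself.
abid="B-000201"
size="${abid%%-*}"   # everything before the hyphen: the size category, "B"
item="${abid##*-}"   # everything after the hyphen: the sequence, "000201"
echo "size-$size shelving, item $item"
```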

AbIDs are applied across curatorial domains (with the notable exception of the Scheide Library). Manuscripts Division, Graphic Arts, Cotsen Children’s Library, Western Americana — all are included. A great step toward this practice was taken with adoption of sequential call numbers for books in 2003. To make the sequential system work, items from all curatorial units were mixed together. Since curatorial units were previously the primary determinant of shelving, the transition required some mental adjustment. The success of sequential call numbers as a means of efficient shelving and easy retrieval made the move to AbIDs easier. So, bound manuscript volumes of size “N” sit in the same shelving run as cased road maps for the Historic Maps Collection, Cotsen volumes, and others. The items are all safely stored on shelving appropriate for their size and are easy to find. Of course, designation of curatorial responsibility remains in the records and on labels, but it is no longer the first key to finding and managing items on the shelves.

Finally, AbIDs are created via a wholly new process that was developed by a small committee of technical services staff. At the heart of the process is a Microsoft Access database. The database has a simple structure, with only two primary tables. A user signs in, selects a metadata format (MARC, EAD, and “None” are the current options–“None” is for a special case), and designates a size, along with several other data elements required for EAD. The database provides the next unique number for the size a user declares and if items are not already barcoded provides smart barcodes (ones that know which physical item they go with). Rather than requiring users to scan each barcode individually, the database incorporates an algorithm that automatically assigns sequential barcodes after a user enters the first and last item number in a range. For collections described in EAD, the database exports an XML file containing AbIDs, barcodes, and other data. A set of scripts written by Principal Cataloger and Metadata Analyst, Regine Heberlein, then transforms and inserts data from the XML export file into the correct elements in the corresponding collection’s EAD file. (These scripts also generate printable PDF box and folder labels at the same time!) For books and other materials cataloged in MARC, the database uses MacroExpress scripts to update appropriate records in the Voyager cataloging client. Overall, the database complements and improves existing workflows, allowing technical services staff to swiftly generate AbIDs and related item data for use in metadata management systems.
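The “next unique number” step amounts to something like the following (purely illustrative; the real logic lives in the Access database and its tables):

```shell
# Hypothetical sketch: given the highest sequence already assigned for a
# size, mint the next zero-padded AbID for that size.
size="B"
last_assigned=201
next_abid=$(printf '%s-%06d' "$size" $((last_assigned + 1)))
echo "$next_abid"   # B-000202
```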

Getting started in the AbID database.

Making metadata and size selections in the AbID database.

With the 2015 move we are starting afresh. Old locations and habits are no longer valid. We have a chance to re-think the nature of storage and the purposes being served by our collection management practices. Our vault space is a shared resource, and inefficient use in any area of our department’s collection affects all. With AbIDs, the simple form of the call numbers, their uniform application across curatorial domains, and the means by which they are created make for efficient shelving and retrieval, which ultimately translates into better service for our patrons.

Opening the unmarked door: Communicating about technical services

The unmarked door

Welcome to our blog!

You are reading about library technical services.  Perhaps it’s your first time; perhaps you’re a fellow practitioner; perhaps you’re somewhere in between.  If you are new to this aspect of library work we are delighted by your initial interest and hope to provide posts that hold your attention.  We also hope that readers with some background in the field will find value in our experiences and perspectives.

Writing about technical services for a broad audience immediately brings up two conditions: we are out of public view (literally in the “back room” behind an unmarked door) and we habitually use vocabulary that can be difficult to interpret for those not in the know.  Let’s check on those topics here.  In subsequent posts we will get down to business.


In technical services we are in the business of creating infrastructure.  We provide the intellectual and physical control of library resources that enable users–including other library staff–to carry out their work.  Infrastructure is just that: “infra,” or below.  It’s meant to be used rather than to call attention to itself.  The best indication of an infrastructure job well done is invisibility.  For example, just about everybody drives across bridges without bestowing a thought on the engineers and construction workers who designed, built, and maintain (we hope) the structures that get riders from one side of something to the other side.  It’s the same with us in library technical services.  If the metadata we create readily gets users to resources that they are seeking (via a great deal of systems work: another realm of invisibility), then we have succeeded.  Normally metadata creators rise to the level of conscious thought only in cases of error or inadequacy.  (Users, understandably enough, typically consider our fundamental aim of “user service” to mean “service to me at this moment” and thus they tend to judge adequacy and correctness in terms of their own goals.  Our working perspective has to encompass all current and future users, and the limits on our capacity to serve them.  However, these are topics for other posts!)  There’s nothing unusual or lamentable about our general lack of visibility.  Our products such as catalog records and finding aids are eminently visible, and as long as they serve to connect users with the department’s amazing collections we are content.

So, a number of those who peruse our posts will encounter activities or ideas that they have not previously thought of very much, if at all.  Good!  We hope to convey information about the work we do, and along with it some hint of the intellectual vigor and liveliness that keep us engaged in our obscure but consequential functions on this side of processing.

The language barrier

We talk funny.

String. Property. Field. Tag. Element. WEMI. FISO. XSLT. Entification. Expression. 506. odd. Subdivision. Ancestor. Authorities.  These are all words (or “words” in some cases) that we use in our daily discourse.  They have specific contextual meanings, and behind those meanings are concepts and models that we use to construct our intellectual workspace.  For example, to us a field isn’t an expanse of property that one might find in a subdivision, possibly demarcated by a string.  It’s a constituent part of a MARC record, which is …  Well, we’d best move on before explanations overwhelm us.  All professional specialties have their own jargon and precise terminology.  You would likely be uneasy if you heard your doctor referring to your inner workings as a collection of thingamabobs and doohickeys rather than bronchi and glomeruli and so forth.  The technical services terms listed above and others like them provide a means for us to communicate effectively among ourselves.  Fluency in their use signals full in-group status, much like a prison tattoo.  However, in this blog we are generally going to avoid emphasizing our mastery of what for most people is esoteric vocabulary.  Instead, we are going to write in plain language, except when our primary target audience is our fellow practitioners.  Even then we will make some effort to explain the jargony terms that we use.

That said, we do expect to be writing with a technical services-oriented audience in mind.  Much of what we have to say is about innovation.  We are constantly thinking about ways to improve services and take advantage of developments in technology.  Members of our unit have lots of ideas on topics such as creating “absolute identifiers” for holdings or improving holdings management in finding aids, to name just two.  We are involved with leading-edge projects such as SNAC and LD4P (more specialized terminology!).  You’ll read about them here.  Discussion of such matters is necessarily laden with jargon and acronyms.  If you’re unfamiliar with the wording or underlying concepts and want more background, just let us know.  We want to give everyone a chance to learn about us and to gain some insight into activities that affect all users.  Our goal is to communicate with everyone and to make all members of our audience feel welcome in our world.