Meet Mudd’s Jarrett M. Drake


Name/Title: Jarrett M. Drake, Digital Archivist

Responsibilities: As the digital archivist at Mudd, I’m responsible for the development, implementation, and execution of processes that facilitate the effective acquisition, description, preservation, and access of born-digital archival collections acquired by the University Archives. The emphasis on ‘born-digital’ is to distinguish my work from that of digitization, which is a process that converts analog material into digital formats. Born-digital records are those that originated as microscopic inscriptions of 0’s and 1’s on a piece of magnetic media.


“Magnetic Force Microscopy (MFM) of a Magnetic Hard Disk,” taken from MIT

Preserving and providing access to those 0’s and 1’s, or bits, is too challenging a problem for any single person to solve, so many of my duties require me to collaborate with others in the University Archives and across campus. This often involves me meeting with our University Archivist, the Assistant University Archivist for Technical Services (to whom I report), and the University Records Manager. As exciting as it is to dive into the past by hacking away at old and new media—and trust me, doing this is really exciting—the most important element of my success is laying the infrastructure for our Digital Curation Program, which we initiated two months ago. Infrastructure is invisible to most of us but critical for all of us. More on that in future posts.

Lest I lead you to believe that I work exclusively in the digital realm, I also do things that archivists have always done: processing paper records and performing reference services in person, on the phone, and over email.

Ongoing projects: Because our Digital Curation Program is rather nascent, I spend a majority of my time drafting policy documents for the program as well as revising workflows for how we process born-digital records. Outside of that, I contribute to several Library-wide working groups and task forces. When I’m not doing one of those two things, you can probably find me working with a new digital preservation tool or strengthening my command of various operating systems.

Worked at Mudd since: I began at Mudd in November of 2013. Prior to Princeton, I served as University Library Associate at the Special Collections Library of the University of Michigan, a post I maintained for nearly two years while I completed my master’s degree in information science at the School of Information. Before Michigan, I had brief stints at the Maryland State Archives and Beinecke Rare Book & Manuscript Library.

Why I like my job/archives: Contrary to general perception, archivists are as concerned with the future as they are with the past. Yes, we manage records that document past activities, but we do so only for future use by researchers. In this way, I see my job as a digital archivist as one that preserves the past in order to promise the future. That promise is harder to ensure when it comes to digital records, but it’s a challenge that I find to be terrifyingly exciting and incredibly meaningful. Also, I learn something new each and every day, which is one of the most fulfilling aspects of my work.

And though I put a lot of time and energy into curating bits, I joined the profession because I like people. I enjoy assisting them with their research questions and it gratifies me that I can contribute to the creation of new knowledge about the past. The roughest days I encounter are immediately turned around when a researcher says “I can’t thank you enough for your assistance” or “without you, I’m not sure I could have answered this question.” Those are my reminders that I chose the right profession.

Favorite item/collection: Recently I responded to a researcher who sought information about the first Japanese student to graduate from Princeton. I spent some time digging around our Historical Subject Files and our Alumni Undergraduate Records collection to learn that in 1876, Hikoichi Orita was the University’s first Japanese student to graduate.


Authored by Walter Mead Rankin, 1884. Found in “Orita, Hikoichi,” Box 148, Undergraduate Alumni Records, Princeton University Archives, Department of Rare Books and Special Collections, Princeton University Library.

In addition to his alumni files, we have a copy of his student diary, which I told myself I would read slowly over my career. It’s in English, in case you’re interested in viewing it, too. This is a classic example where a researcher informs the interests of the archivist, instead of vice versa.

Records Management and University Archives: Perfect Together

The job of the Princeton University Archives is to keep in perpetuity the University records that should be kept, and the University Records Manager, Anne Marie Phillips, helps to identify them.  She also helps offices determine how long non-permanent records must be kept before they can be destroyed.

With the University’s first financial records retention schedule coming online, she identified almost 300 boxes of journal vouchers and check registers from the 1950s and 1960s held within the Archives that should have been destroyed long ago.

These records filled 21 bins (see above photo) and weighed over 6,000 pounds. Now that shelf space can be used for permanent records that the Archives will keep for as long as there is a Princeton.


Our NHPRC-Funded Digitization Project at Six Months

Late last year, the Mudd Manuscript Library was granted an award by the National Historical Publications and Records Commission (NHPRC) to digitize our most-used Public Policy collections, serve them online, and create a report for the larger archival community about cost-efficient digitization practices. Excerpts from our six-month progress report are below.


Work so far

  1. Project planning

From the time we were awarded the grant to the present, we have produced an overall project plan and timeline, a vendor RFQ and plan of work, in-house quality control procedures for vendor-supplied images, a workplan for in-house scanning, and hardware-specific instructions for in-house scanning. All activities are either on schedule or ahead of schedule. Vendor-supplied digitization is currently eight months ahead of schedule.

  1. Finding a vendor

After distributing an RFQ and collecting bids, we decided on The Crowley Company as our vendor, based on both price and our confidence that they would be able to manage the materials and the work carefully and efficiently.

  1. Managing vendor-supplied digitization

Before materials can go out to the vendor, we first create a manifest of everything we want to send by transforming the EAD-encoded finding aid into an easily-read Excel worksheet. Since we want each folder of material to have a cover sheet that explains the collection name, box number, folder number, URL, and copyright policy, we used collection manifests to make target sheets with this information. A total of 6,943 target sheets were created, printed, and inserted into the beginnings of folders by student workers before materials were sent out to the vendor.
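The manifest step can be sketched roughly as follows. This is a minimal illustration, not the transform we actually ran: it assumes a simplified, namespace-free EAD fragment in which each folder is a `<c level="file">` component with box and folder `<container>` elements, and it writes CSV (which Excel opens) rather than a native worksheet. The sample data, field names, and function names are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment (no namespace); real finding
# aids are larger and may nest components differently.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Example Papers</unittitle></did>
    <dsc>
      <c level="file" id="c001">
        <did>
          <unittitle>Correspondence, 1950</unittitle>
          <container type="box">1</container>
          <container type="folder">1</container>
        </did>
      </c>
      <c level="file" id="c002">
        <did>
          <unittitle>Speeches, 1951</unittitle>
          <container type="box">1</container>
          <container type="folder">2</container>
        </did>
      </c>
    </dsc>
  </archdesc>
</ead>"""

# Assumed column names for the manifest.
FIELDS = ["collection", "component_id", "title", "box", "folder"]


def build_manifest(ead_xml):
    """Return one dict per folder-level component in the finding aid."""
    root = ET.fromstring(ead_xml)
    collection = root.findtext("archdesc/did/unittitle", default="").strip()
    rows = []
    for comp in root.iter("c"):
        did = comp.find("did")
        if comp.get("level") != "file" or did is None:
            continue
        # Map container type ("box", "folder") to its text value.
        containers = {
            c.get("type"): (c.text or "").strip() for c in did.findall("container")
        }
        rows.append({
            "collection": collection,
            "component_id": comp.get("id", ""),
            "title": did.findtext("unittitle", default="").strip(),
            "box": containers.get("box", ""),
            "folder": containers.get("folder", ""),
        })
    return rows


def manifest_to_csv(rows):
    """Serialize manifest rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(manifest_to_csv(build_manifest(SAMPLE_EAD)))
```

Each row carries what a target sheet needs for collection name, box number, and folder number; the URL and copyright policy would be additional columns.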

Once materials have been imaged by the vendor, students sample ten percent of the collection to check for completeness and readability. So far, everything has passed quality control with flying colors.
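A ten percent sample like the one described above could be drawn along these lines. This is only a sketch: it assumes the sample is taken over image files, and the file names and the function name `sample_for_qc` are hypothetical rather than part of our actual procedure.

```python
import random


def sample_for_qc(image_names, percent=10, seed=None):
    """Pick roughly `percent` percent of the images, at random, for review."""
    names = list(image_names)
    if not names:
        return []
    # Ceiling division with integers, so small batches still get one check.
    count = max(1, -(-len(names) * percent // 100))
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return sorted(rng.sample(names, count))


if __name__ == "__main__":
    batch = [f"box001_folder01_{n:04d}.tif" for n in range(1, 251)]
    for name in sample_for_qc(batch, seed=2013):
        print(name)
```

Passing a seed lets a second reviewer reproduce the same sample; leaving it out gives a fresh random draw each time.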

Each month, Crowley sends us a report of how many images have been created that month, how many images have been created cumulatively, and average scanning rate per hour. This information is below:

| Month | Boxes Scanned | Pages Scanned |
|---|---|---|
| 2013 March | 15 | 17119 |
| 2013 April | 32 | 45761 |
| 2013 May | 50 | 49499 |
| 2013 June | 65 | 97896 |
| Totals | 162 | 210275 |

  1. In-house imaging

Imaging of the John Foster Dulles papers started in June. So far, we have completed a pilot of scanning with the sheet-feed of the photocopier, and pilots of microfilm scanning and scanning with a Zeutschel face-up scanner are underway.

Project goals and deliverables

  1. Twelve series or subseries from six collections digitized

To date, five series or subseries have been completely digitized, and three others are in the process of being digitized.

  1. Approximately 416,000 images created and posted online

As of July 1, 2013, 210,275 images have been scanned by the vendor. Of this total, 39,834 images have been posted online. Our vendor is several months ahead of schedule for this project, and in-house scanning is on track. Since beginning in-house scanning in June, 1,838 pages have been scanned by student workers. In the next months, we will calculate the per-page costs for scanning on a Zeutschel face-up scanner and with a microfilm scanner. From there, we plan to image fifty feet of materials with the sheet feeder of the photocopier, 10.3 feet with the Zeutschel face-up scanner, and 33.4 feet with the microfilm scanner.

  1. Six EAD finding aids updated to include links for 17,508 components (folders)

Two finding aids (Council on Foreign Relations Records and Adlai Stevenson Papers) have been updated to include links to digitized content. Another (George F. Kennan Papers) is ready to be updated. This process is managed semi-automatically with a series of shell scripts. After quality control, hard drives of images are sent to Princeton’s digital studios. Staff there verify and copy digital assets to permanent storage. After this, PDF and JPEG2000 files are derived from the master TIFFs, and the relationship between these objects is described in an automatically generated METS file. The digital archival object (<dao>) tag is added to the EAD-encoded finding aid for each component.
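The last step, adding a `<dao>` to each component, might look something like the sketch below. Our own process uses shell scripts; this Python version is a stand-in for illustration only. It assumes a simplified, namespace-free EAD fragment and a plain `href` attribute (the attribute naming depends on which version of EAD is in use), and the component ids and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified finding-aid fragment for illustration.
SAMPLE_COMPONENTS = """<dsc>
  <c level="file" id="c001">
    <did><unittitle>Correspondence, 1950</unittitle></did>
  </c>
  <c level="file" id="c002">
    <did><unittitle>Speeches, 1951</unittitle></did>
  </c>
</dsc>"""


def add_dao_links(ead_xml, links):
    """Add a <dao> to each component whose id has a URL in `links`.

    Returns the updated XML text and the number of links added.
    """
    root = ET.fromstring(ead_xml)
    added = 0
    for comp in root.iter("c"):
        href = links.get(comp.get("id"))
        did = comp.find("did")
        if href is None or did is None:
            continue
        if did.find("dao") is not None:
            continue  # already linked; do not add a second <dao>
        dao = ET.SubElement(did, "dao")
        dao.set("href", href)
        added += 1
    return ET.tostring(root, encoding="unicode"), added


if __name__ == "__main__":
    # Hypothetical mapping from component id to the digitized object's URL.
    urls = {"c001": "https://example.org/objects/c001"}
    updated, count = add_dao_links(SAMPLE_COMPONENTS, urls)
    print(count)
    print(updated)
```

Skipping components that already carry a `<dao>` means the script can be re-run after each new batch without duplicating links.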

  1. Digital imaging cost of less than 80 cents per page achieved

The plan of work with our vendor calls for scanning costs well below 80 cents per page. Our first (and likely least expensive) of three in-house scanning pilots estimates the cost of scanning with the sheet feeder of a copier to be two cents per page. We will have numbers for microfilm scanning and scanning with a face-up scanner at the time of our next report.

  1. Metrics for digital imaging of 20th century archival collections for

    1. In-house microfilm conversion

    2. Sheet feeding through a networked photocopier

    3. Vendor supplied images

The information that we have collected thus far is below. Our vendor metrics are based on the quote and plan of work with The Crowley Company. Sheet feed metrics are collected by having a student worker fill out a minimal, time-stamped form at the beginning and end of each scan, and then analyzing that information. These numbers are preliminary. Sheet-fed scans have not yet been checked for quality control — re-scans may increase the total time per page and dollars per page for this method.

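| | Vendor | Sheet Feed | Microfilm | Zeutschel |
|---|---|---|---|---|
| Total pages: | 270,600* | 1838 | | |
| Total feet: | 530.95 | 1.68 | | |
| Total time: | | 2:25:14 | | |
| Total time (decimal): | | 2.42 | | |
| Time per page: | | 0:00:04 | | |
| Pages per hour: | 270.75 | 759.33 | | |
| Hours per foot: | | 1:26:26 | | |
| Feet per hour: | | 0.69 | | |
| Cost per page: | TBD | $0.02 | | |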

*This number is an estimate, based on an assumed 1200 pages per box. Our reports from Crowley show anywhere from 1050-1750 pages in a box.

Note: in addition to these three methods, we plan to add a fourth – scanning with a face-up scanner (in our case, a Zeutschel scanner table).
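For readers who want to check the sheet-feed rates, the arithmetic can be sketched as below, using the reported totals of 1,838 pages, 1.68 feet, and 2:25:14 of scanning time. The function names are hypothetical, and cost per page is left out because it depends on a wage figure not given here.

```python
def parse_hms(text):
    """Convert an H:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scanning_metrics(total_pages, total_feet, total_time):
    """Derive rate figures from total pages, linear feet, and elapsed time."""
    seconds = parse_hms(total_time)
    hours = seconds / 3600
    return {
        "hours": hours,
        "seconds_per_page": seconds / total_pages,
        "pages_per_hour": total_pages / hours,
        "hours_per_foot": hours / total_feet,
        "feet_per_hour": total_feet / hours,
    }


if __name__ == "__main__":
    metrics = scanning_metrics(1838, 1.68, "2:25:14")
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Rounded to two places, these reproduce the 2.42 hours, 759.33 pages per hour, and 0.69 feet per hour reported for the sheet feeder.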

  1. Policies and documentation for large-scale digitization initiative created and shared with archival community

As we go forward with our project, we have been blogging not just about the content of our digitized collections, but also our methods and rationales. A blog post written in February explains how this project fits into our other digitization activities and our approach to access. In early June, we wrote about the reasons why this kind of project is so important, and how our materials will now reach researchers worldwide (and of all ages) who might otherwise never come to our reading room in Princeton, New Jersey.

A more formal report on our methods and results will be made available once more data has been gathered.

Archives for Everyone

In each of the last two springs, several staff of the Mudd Manuscript Library and other members of the Department of Rare Books and Special Collections have judged at the regional qualifier of the National History Day competition held on Princeton’s campus. This is a contest for middle and high school students who, based on rigorous guidelines, synthesize and analyze information about a historic event. They then create a paper, website, documentary, exhibit or performance explaining what they have learned.

Judging National History Day is a powerful touchstone about the value of archives in the production of history. Each year, I see students adroitly avoid some of the more common traps of historical production — their projects are clear, level-headed, open-minded, and support their claims with evidence. Students who submit the best projects don’t just have a clear argument and lengthy bibliography — they let the primary sources surprise them and challenge their previous conceptions of the past. Yes, they may start with textbooks and biographies, but stronger projects evaluate primary sources. And the very best projects tend to not just look at key documents that have been artificially assembled on a website (although this is valuable too) — they look at records in context and try to make arguments about subtext and authenticity.

The best place to find records in context is usually an archives. But of course, access to archives isn’t easy for students. Working parents may not be able to take their children to the New Jersey Historical Society or National Archives or Mudd Library, as much as they might like to provide that experience. Most archives are only open during the hours when parents are working and visiting these institutions can be intimidating. From a young student’s perspective, it’s often hard to tell what the holdings are and whether the trip will be worth it.

Our NHPRC-funded project hopes to be a model toward ameliorating this barrier to access. We believe that by scanning our records and making them available within the same context that one would see them in the reading room, anyone with an internet connection can have a meaningful scholarly experience without the cost and inconvenience of traveling to Princeton, New Jersey.

We hope that children will benefit as much as anyone from this project. As Cathy Gorn, the Executive Director of National History Day, noted in her letter of support for the grant:

Having primary source materials on the Cold War available via the Internet would allow many NHD students around the country to conduct research for their projects that they ordinarily would not be able to, and the Mudd collections to be digitized are broad enough to support a variety of NHD Projects.

Of course, students don’t just wish to access historical records for National History Day — they want access for the same reasons that any other researcher does. A teenager may want to know more about when and how his family came to America. He might want to know more about the history of his town, and how certain sites came to be created. Or he may be interested in the history of ideas, policies and customs that affect his life. The collections that we plan to digitize — the John Foster Dulles papers, the Allen Dulles papers, the James Forrestal papers, the Council on Foreign Relations records, the George Kennan papers and the Adlai Stevenson papers — document how cold war activities were conducted and understood. They also present an opportunity for students to understand through diaries and correspondence the false starts, misunderstandings, and possible alternatives that constitute all historical events.

The historian John Lewis Gaddis makes the argument for access more persuasively than I could. In his letter of support for our grant, he explained the cost, inconvenience and wear on records for professional researchers trying to do research on-site.

But the most fundamental shortcoming of this old system was the disservice it did to students of history who never got to see an archive in the first place. Maybe they lived abroad. Maybe they attended American universities or colleges that could not provide research support. Maybe they were high school or even elementary students who might have gotten hooked on history for life had they had the chance to work with original materials – but they didn’t have that chance.

Now, however, almost all of them have access to a new means of access, which is of course the internet- even if they’re stuck in a place like Cotulla, Texas, where I grew up. I mention this little town because it’s where the young Lyndon B. Johnson spent a year teaching, in 1928-29, in the then segregated Mexican-American school. What he tried to do for those kids is still remembered: it gets its own chapter in the first volume of Robert Caro’s massive biography. But just think what LBJ could have done as a teacher had he had the resources that are available now. That’s why this project is important.

It has the potential, quite literally, to globalize the possibility of doing archival research. That’s no guarantee that this will produce a greater number of great books than in the past. What it will ensure, however, is a quantum leap in the opportunities students and their teachers will have to bring the excitement of working with original documents into all classrooms. That’s easily as important, I think, as writing the kind of books that might get you tenure at a place like Yale.

Why — and How — We Digitize

It’s February, and we’re now in the second month of our NHPRC-funded digitization project. In twenty-three more months, we’ll have completed scanning and uploading 400,000 pages of our most-viewed material to our finding aids, and anyone with an internet connection will be able to view it.

This is just the most recent effort to introduce digitization as a normal part of our practice at Mudd. As I said in my previous post, we know that it’s well and good that we have collections that document the history of US diplomacy, economics, journalism and civil rights in the twentieth and twenty-first centuries. But for the majority of potential users, who may never be able to come to Princeton, NJ, this is irrelevant. However interested they may be, they may never be able to afford to visit us. And there’s a whole other subset of potential users — let’s call them working people — who can’t come between the hours of 9:00 and 4:45, Monday through Friday. Are we really providing fair and equitable access under these conditions? Since we have the resources to digitize, it’s imperative that we develop the infrastructure and political will to do so.

We know that it’s time to get serious — and smart — about scanning.

The ball has been rolling in this direction for some time. We have three “streams” of making digital content available, and with our new finding aids site, we have an intuitive way of linking descriptions of our materials to the materials themselves.

Images of the collection in the context of the finding aid


Our first is patron-driven digitization.

The Zeutschel -- our amazing German powerhouse face-up scanner

This is our Zeutschel scanner. It does amazing work, is easy on our materials, and usually requires very little quality control.

Archives have been providing photoduplication services since the advent of the photocopier. At Mudd, we have dedicated staff who have been doing this work for decades. Recently, we’ve just slightly tweaked our processes to create scans instead of paper copies and to (in many cases) re-use the scans that we make so that they’re available to all patrons, not just the one requesting the scan.

A patron (maybe you!) finds something in our finding aids that he thinks he may be interested in, and asks for a copy.

If he’s in our reading room, he flags the pages of material he wants. If he’s remote, he identifies the folders or volumes to be scanned. The archivist tells him how much the scan will cost, and he pre-pays.

Now, the scanning. This either happens on our photocopier (the technician can press “scan” instead of “photocopy” to create a digital file instead of a paper one) or on our Zeutschel scanner. And while we feel happy and lucky to have the Zeutschel, we don’t strictly need it to fulfill our mission to digitize.

The scan is named in a way that associates it with the description of the material in the finding aid, and is then linked up and served online. We currently send the patron an email of this scan, but in the future we may just send them a link to the uploaded content.

Our second stream is targeted digitization based on users’ viewing patterns


Our student receptionist, Ashley, scans materials at the front desk when she isn’t welcoming patrons.

We try to keep lots of good information about what our users find interesting. We use a service called Google Analytics to learn what users are browsing online, and we keep statistics about which physical materials patrons see in the reading room.

From these sources, we create a list of most-viewed materials, and set up a system for our students to scan them in their downtime when they’re working at the front desk.

We do this because we want to make sure that we’re putting the effort into digitizing resources that patrons actually want to see — there are more than 35,000 linear feet of materials at the Mudd Library. We probably won’t ever be able to digitize absolutely everything, and it wouldn’t make sense to start from “A” and go to “Z”. So, we pay attention to trends and try to anticipate what researchers might find useful.
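Combining the two sources into a single most-viewed list could be done along these lines. This is a sketch under stated assumptions: the counts and component ids are made up, and it treats one online page view as equal to one reading-room request, which is a simplification rather than a description of how we actually weigh them.

```python
from collections import Counter


def rank_most_viewed(online_views, reading_room_requests, top_n=3):
    """Merge two usage tallies and return the most-used items first."""
    combined = Counter(online_views)
    combined.update(reading_room_requests)  # adds counts for shared keys
    return combined.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical tallies keyed by finding-aid component id.
    online = {"c001": 120, "c002": 45, "c003": 8}
    in_person = {"c002": 30, "c004": 12}
    for component_id, uses in rank_most_viewed(online, in_person):
        print(component_id, uses)
```

The resulting list is what a student at the front desk would work from, starting at the top.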

Our final stream — and the one for which we currently have to rely on external support — is large-scale vendor-supplied digitization.

Our current cold war project is a great example of this. We’ve put together a project plan, chosen materials, called for quotes and chosen a vendor. We recently shipped our first collection to be digitized, and I’ll be posting information to the blog as we move forward.

Another good example of an externally-supported digitization activity is the scanning of microfilm from our American Civil Liberties Union Records. Our earliest records were microfilmed decades ago and recently, Professor Sam Walker supported the digitization of some of this microfilm so that the records could be made available online.

No single stream — externally-supported projects, left-to-right scanning, or patron-driven digitization — would be enough to support our goal of maximizing the content available online. We hope that the three, each pursued aggressively, will help us realize our mission of providing equitable access to our materials. And we think that focusing on this cold war project will help us reflect on and improve all of our digitization activities.

The Daily Princetonian is digitized and keyword searchable


The Princeton University Archives, working in conjunction with the Princeton University Library Digital Initiatives, has nearly completed a monumental project that will change the way researchers investigate University history. The student newspaper, The Daily Princetonian, has been digitized from its inception in 1876 through 2002. The site has been available in beta for almost two years, but all issues will be loaded as of June 30, 2012. At the suggestion of The Daily Princetonian alumni board who have been among the prime backers of this project, the site is named in honor of the newspaper’s long-serving production manager Larry Dupraz, and researchers are able to perform sophisticated keyword searches that can unlock the vast richness of the daily newspaper that documents so much of the University’s history. (For the years 2002- present, users may search online via the Daily Prince site.)


“I wrote my final paper for my Freshman Writing Seminar about how the presence of veterans on Princeton’s campus following World War II affected Princeton’s academic environment and social atmosphere,” said Jennifer Klingman ’13. “My research heavily relied on The Daily Princetonian archives, and I had to spend a lot of time and energy searching for relevant articles in Firestone’s microform versions of the newspaper. It was difficult to comb through the articles, and as a result my research was limited in scope. This spring, I wrote my history department junior paper on academic and social changes taking place at Princeton during the late 1940s and 1950s. The online Daily Princetonian archives proved to be invaluable. I was able to access the archives anywhere and at any time, and use the archives’ search function to find a number of extremely useful articles. My independent work has definitely benefited from the existence of the online archives.”


Freelance journalist W. Barksdale Maynard ’88 states “I am able to write about the social history of Princeton in an entirely new way and have restructured my research to take full advantage of this exciting new resource. For my Princeton Alumni Weekly article on the early history of automobiles at Princeton, the Dupraz Digital Archives allowed me to identify every reference to cars as early as 1901, to pinpoint who owned them and what kinds. I would never have attempted this article without The Dupraz Digital Archives.”

Maynard’s PAW colleague, Gregg Lange ’70, regularly uses the site for his column, “Rally Round the Cannon,” which examines and appraises University history. “You can piece together the story of Princeton football or Woodrow Wilson in a dozen ways. But the unique accessibility of a daily publication allows more subtle topics to arise and recede, and for cross-generational tales to emerge. Be it Ella Fitzgerald singing at a Princeton dance at age 19, then receiving an honorary degree 54 years later; or student revolts against the clubs’ Bicker selection system in 1917 and 1940 presaging its loss of monopoly in 1968, the combination of detail and long view is indispensable in understanding the ethos of the institution over time, and essentially inaccessible without the DuPraz technology and precision. And existentially, if I never see another microfiche in my life I will die a happy man.”

Maynard added, “My regular column in PAW, ‘From Princeton’s Vault,’ has benefited enormously. Recently I was able to identify the earliest references to Princetonians as ‘tigers,’ which had been guesswork previously. It turns out we were wrong by a decade.”

This has been an international project, with the newspapers sent from Princeton to Brechin Imaging in Canada, where TIFF images are generated using high end German cameras. The files are then sent via a hard drive to Cambodia, where Digital Divide Data analyzes the structure of each page and uses an optical character recognition (OCR) program to derive machine-readable text, which allows for keyword searching. The hard drive is then shipped to Austin, Texas, where the US office of New Zealand company DL Consulting loads the data into a content-management system called Veridian, which supports searching and browsing, online reading, article extraction and printing, and other features.

Within the library, many hands have worked for this project’s success. At Mudd Library, project archivists Dan Brennan and then Adriane Hanson have overseen the day-to-day work of the project, managing the shipment of the newspapers to Brechin, as well as supervising students with the quality control phase. University Archivist Dan Linke raised the funds from various University and alumni sources and coordinated the project.

Within the greater Library system, Cliff Wulfman, the Library’s Digital Initiatives Coordinator, took the lead in writing the Request for Proposals and then selecting and coordinating the work with DDD, as well as providing technical assistance, support and vision. The Library System Office’s Antonio Barrera designed the front end web page with Phil Menos providing server support, and Deputy University Librarian and Systems Librarian Marvin Bielawski allocated the funds to acquire the Veridian software.

The project employs the METS/ALTO markup standard, the same used by the Library of Congress’s Newspaper Digitization Project, which means that as software changes and improves, we will be able to sustain this resource for many years to come.


Most used Princeton theses

Dear Mr. Mudd, I was wondering what is the most popular/most requested senior thesis in the University Archives collection?

This is a perennial question and the short answer is that with the exception of celebrity alumni theses, there are few theses that are pulled with any regularity, yet the collection as a whole (totaling over 60,000 theses) is our most used collection within the University Archives. Last year over 1,000 theses were viewed by visitors–mostly Princeton undergraduates–to the Mudd Library, which accounted for about 1/4 of all Archives materials circulated.


Wendy Kopp’s thesis is always among those requested by remote researchers (that is, those who do not visit the library in person), and whenever a Princetonian makes news or is on a hit show, their thesis is often requested.

In the past, this included Wentworth Miller III (when Prison Break was a hit), David Duchovny (for the X Files) and Dean Cain (Adventures of Lois and Clark), as well as all three now sitting Supreme Court Justices: Samuel Alito, Elena Kagan, and Sonia Sotomayor.

The entire theses collection can be searched via this database, and Archives staff are working to make future senior theses available online to the Princeton community starting in 2013.

University Archives featured in Princeton Alumni Weekly

Every few weeks the Princeton Alumni Weekly devotes one segment of the magazine, entitled "From the Vault," to highlighting items from the Princeton University Archives.

The articles are researched and written by alumnus W. Barksdale Maynard ’88 who has been contributing the content to the PAW for two years. Mr. Maynard has also written a few books, two focusing on Princeton, which you can see here. The concept of the articles originated with Editor Marilyn H. Marks *86 who has an interest in the University Archives, which are housed at the Seeley G. Mudd Manuscript Library. http://www.princeton.edu/mudd/

The most recent article focuses on a Princeton alumnus who was aboard the Titanic when it sank. http://paw.princeton.edu/issues/2012/04/04/pages/7288/

Recently PAW photographer Riccardo Barros and Art Director Marianne Gaffney Nelson came to Mudd to photograph physical items included in the collections for upcoming issues of the PAW. Here you can see a behind-the-scenes view of how those articles come to life.
Keep checking the next few issues of the PAW to see these items explained!!
For more about the University Archives click here.

University Archives materials in new Art Museum exhibition

A new exhibition at the Princeton University Art Museum features items borrowed from the Princeton University Archives. Princeton and the Gothic Revival: 1870-1930 is a look into "Americans’ changing attitudes to the art, architecture, and style of the Middle Ages through the lens of Princeton University around the turn of the twentieth century" and opens to the public this Saturday, February 25, 2012.

Chapel exterior
Alexander Hoyle for Cram and Ferguson, architects

The exhibit includes 10 items loaned from the Princeton University Archives, including the signature image for the exhibition, a watercolor of the University Chapel (above). Other items include architectural drawings of the Marquand Chapel, Holder Hall, Madison Hall and the South Court Tower, and some suggested additions for the university library from 1898, which at that time was housed in Chancellor Green.

One piece needed some intricate and delicate conservation efforts from University Paper Conservator Ted Stanley. A watercolor of the proposed exterior of the A. Page Brown, Class of 1877 Biological Laboratory had split in half. Stanley was able to restore the watercolor and the board it was mounted on to its original form to hide the separation. We challenge you to find the seam!

This is the first time that any of the archives material has been loaned and displayed at the Princeton Art Museum. The exhibit will run from February 25 to June 24, 2012.

For more about Princeton and the Gothic Revival: 1870-1930 or the Princeton Art Museum, visit their website.