Our NHPRC-Funded Digitization Project at Six Months

Late last year, the Mudd Manuscript Library received an award from the National Historical Publications and Records Commission (NHPRC) to digitize our most-used Public Policy collections, serve them online, and create a report for the larger archival community about cost-efficient digitization practices. Excerpts from our six-month progress report are below.

Work so far

  1. Project planning

From the time we were awarded the grant to the present, we have produced an overall project plan and timeline, a vendor RFQ and plan of work, in-house quality control procedures for vendor-supplied images, a workplan for in-house scanning, and hardware-specific instructions for in-house scanning. All activities are either on schedule or ahead of schedule. Vendor-supplied digitization is currently eight months ahead of schedule.

  2. Finding a vendor

After distributing an RFQ and collecting bids, we decided on The Crowley Company as our vendor, based on both price and our confidence that they would be able to manage the materials and the work carefully and efficiently.

  3. Managing vendor-supplied digitization

Before materials can go out to the vendor, we create a manifest of everything we want to send by transforming the EAD-encoded finding aid into an easily read Excel worksheet. Because we want each folder of material to have a cover sheet that gives the collection name, box number, folder number, URL, and copyright policy, we use the collection manifests to generate target sheets with this information. Student workers printed a total of 6,943 target sheets and inserted them at the front of the folders before materials were sent out to the vendor.
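
For readers who want to try something similar, here is a minimal sketch of the manifest step in Python. It is not our production workflow: the sample finding aid, the component ids, and the function names are invented for illustration, it assumes un-namespaced EAD with unnumbered `<c>` components, and it writes CSV (which Excel opens directly) rather than a native Excel file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment. Real finding aids are far
# larger and may use a namespace or numbered components (<c01>, <c02>, ...).
SAMPLE_EAD = """<ead>
  <archdesc>
    <did><unittitle>Example Papers</unittitle></did>
    <dsc>
      <c level="file" id="ex_c001">
        <did>
          <container type="box">1</container>
          <container type="folder">1</container>
          <unittitle>Correspondence, 1962</unittitle>
        </did>
      </c>
      <c level="file" id="ex_c002">
        <did>
          <container type="box">1</container>
          <container type="folder">2</container>
          <unittitle>Memoranda, 1963</unittitle>
        </did>
      </c>
    </dsc>
  </archdesc>
</ead>"""

FIELDS = ["collection", "component_id", "box", "folder", "title"]


def build_manifest(ead_xml):
    """Return one manifest row (a dict) per folder-level component."""
    root = ET.fromstring(ead_xml)
    collection = root.findtext("./archdesc/did/unittitle", default="")
    rows = []
    for comp in root.iter("c"):
        did = comp.find("did")
        if did is None:
            continue
        # Map container type ("box", "folder") to its text value.
        containers = {
            c.get("type"): (c.text or "").strip()
            for c in did.findall("container")
        }
        if "folder" not in containers:
            continue  # skip series-level components with no folder
        rows.append({
            "collection": collection,
            "component_id": comp.get("id", ""),
            "box": containers.get("box", ""),
            "folder": containers["folder"],
            "title": (did.findtext("unittitle") or "").strip(),
        })
    return rows


def manifest_to_csv(rows):
    """Serialize manifest rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(manifest_to_csv(build_manifest(SAMPLE_EAD)))
```

Each row carries what a target sheet needs except the URL and copyright statement, which would have to come from elsewhere; the same rows could feed a mail-merge template to print the sheets.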

Once materials have been imaged by the vendor, students sample ten percent of the collection to check for completeness and readability. So far, everything has passed quality control with flying colors.

Each month, Crowley sends us a report of how many images have been created that month, how many images have been created cumulatively, and the average scanning rate per hour. The monthly box and page counts so far are below:

| Month | Boxes Scanned | Pages Scanned |
|---|---|---|
| 2013 March | 15 | 17,119 |
| 2013 April | 32 | 45,761 |
| 2013 May | 50 | 49,499 |
| 2013 June | 65 | 97,896 |
| Totals | 162 | 210,275 |

  4. In-house imaging

Imaging of the John Foster Dulles papers started in June. So far, we have completed a pilot of scanning with the sheet feeder of the photocopier, and pilots of microfilm scanning and scanning with a Zeutschel face-up scanner are underway.

Project goals and deliverables

  1. Twelve series or subseries from six collections digitized

To date, five series or subseries have been completely digitized, and three others are in the process of being digitized.

  2. Approximately 416,000 images created and posted online

As of July 1, 2013, 210,275 images have been scanned by the vendor. Of this total, 39,834 images have been posted online. Our vendor is several months ahead of schedule for this project, and in-house scanning is on track. Since beginning in-house scanning in June, 1,838 pages have been scanned by student workers. In the next months, we will calculate the per-page costs for scanning on a Zeutschel face-up scanner and with a microfilm scanner. From there, we plan to image fifty feet of materials with the sheet feeder of the photocopier, 10.3 feet with the Zeutschel face-up scanner, and 33.4 feet with the microfilm scanner.

  3. Six EAD finding aids updated to include links for 17,508 components (folders)

Two finding aids (Council on Foreign Relations Records and Adlai Stevenson Papers) have been updated to include links to digitized content. Another (George F. Kennan Papers) is ready to be updated. This process is managed semi-automatically with a series of shell scripts. After quality control, hard drives of images are sent to Princeton’s digital studios. Staff there verify the digital assets and copy them to permanent storage. After this, PDF and JPEG2000 files are derived from the master TIFFs, and the relationship between these objects is described in an automatically generated METS file. Finally, a digital archival object (<dao>) tag is added to the EAD-encoded finding aid for each component.
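
To illustrate the last step, here is a rough sketch of adding `<dao>` links to a finding aid. Our own process uses shell scripts; this Python version is only an approximation for readers who want to experiment. The sample EAD, the component id, the example.org URL, and the function name are all hypothetical, and the sketch assumes un-namespaced EAD 2002 in which `<dao>` sits inside `<did>` and carries a plain `href` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-component finding aid, for illustration only.
SAMPLE_EAD = """<ead>
  <archdesc>
    <dsc>
      <c level="file" id="ex_c001">
        <did>
          <container type="box">1</container>
          <container type="folder">1</container>
          <unittitle>Correspondence, 1962</unittitle>
        </did>
      </c>
    </dsc>
  </archdesc>
</ead>"""


def add_dao_links(ead_xml, links):
    """Add a <dao href="..."/> to each component whose id is in `links`.

    `links` maps component ids to URLs. Components that already have a
    <dao> are left alone, so the function can safely be re-run.
    Returns the updated XML text and the number of links added.
    """
    root = ET.fromstring(ead_xml)
    added = 0
    for comp in root.iter("c"):
        url = links.get(comp.get("id"))
        did = comp.find("did")
        if url is None or did is None:
            continue
        if did.find("dao") is not None:
            continue  # already linked
        dao = ET.SubElement(did, "dao")
        dao.set("href", url)
        added += 1
    return ET.tostring(root, encoding="unicode"), added


if __name__ == "__main__":
    # Placeholder URL; a real link would point at the delivered object.
    links = {"ex_c001": "https://example.org/objects/ex_c001"}
    updated, count = add_dao_links(SAMPLE_EAD, links)
    print(count, "link(s) added")
    print(updated)
```

Skipping components that already contain a `<dao>` keeps the update repeatable, which matters when a finding aid is linked in several batches as series come back from the vendor.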

  4. Digital imaging cost of less than 80 cents per page achieved

The plan of work with our vendor calls for scanning costs well below 80 cents per page. Our first (and likely least expensive) of three in-house scanning pilots estimates the cost of scanning with the sheet feeder of a copier to be two cents per page. We will have numbers for microfilm scanning and scanning with a face-up scanner at the time of our next report.

  5. Metrics for digital imaging of 20th century archival collections for

    1. In-house microfilm conversion

    2. Sheet feeding through a networked photocopier

    3. Vendor supplied images

The information that we have collected thus far is below. Our vendor metrics are based on the quote and plan of work with The Crowley Company. Sheet feed metrics are collected by having a student worker fill out a minimal, time-stamped form at the beginning and end of each scan, and then analyzing that information. These numbers are preliminary. Sheet-fed scans have not yet been checked for quality control — re-scans may increase the total time per page and dollars per page for this method.

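| | Vendor | Sheet Feed | Microfilm | Zeutschel |
|---|---|---|---|---|
| Total pages | 270,600* | 1,838 | | |
| Total feet | 530.95 | 1.68 | | |
| Total time | | 2:25:14 | | |
| Total time (decimal hours) | | 2.42 | | |
| Time per page | | 0:00:04 | | |
| Pages per hour | 270.75 | 759.33 | | |
| Hours per foot | | 1:26:26 | | |
| Feet per hour | | 0.69 | | |
| Cost per page | TBD | $0.02 | | |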

*This number is an estimate, based on an assumed 1200 pages per box. Our reports from Crowley show anywhere from 1050-1750 pages in a box.

Note: in addition to these three methods, we plan to add a fourth – scanning with a face-up scanner (in our case, a Zeutschel scanner table).
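
As a worked illustration of how the sheet-feed figures above follow from the time-stamped forms, here is a small Python sketch. Only the totals (1,838 pages, 1.68 feet, 2:25:14 of scanning time) come from our data; the split into two sessions, the dates, and the function name are invented for the example.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical session log: (start, end, pages scanned). The individual
# sessions are made up; they are chosen only so that the totals match the
# reported 1,838 pages and 2:25:14 of scanning time.
SESSIONS = [
    ("2013-06-10 09:00:00", "2013-06-10 10:15:14", 950),
    ("2013-06-11 13:00:00", "2013-06-11 14:10:00", 888),
]


def scanning_metrics(sessions, total_feet):
    """Derive per-page and per-foot rates from time-stamped sessions."""
    total_seconds = 0.0
    total_pages = 0
    for start, end, pages in sessions:
        elapsed = (datetime.strptime(end, TIME_FORMAT)
                   - datetime.strptime(start, TIME_FORMAT))
        total_seconds += elapsed.total_seconds()
        total_pages += pages
    hours = total_seconds / 3600
    return {
        "total_pages": total_pages,
        "total_hours": hours,
        "seconds_per_page": total_seconds / total_pages,
        "pages_per_hour": total_pages / hours,
        "hours_per_foot": hours / total_feet,
        "feet_per_hour": total_feet / hours,
    }


if __name__ == "__main__":
    for name, value in scanning_metrics(SESSIONS, total_feet=1.68).items():
        print(f"{name}: {value:.2f}")
```

With these totals the sketch gives about 2.42 hours, 759.33 pages per hour, and 0.69 feet per hour, matching the preliminary sheet-feed numbers. Any time spent on re-scans after quality control would be added as further sessions and would lower the rates accordingly.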

  6. Policies and documentation for large-scale digitization initiative created and shared with archival community

As we go forward with our project, we have been blogging not just about the content of our digitized collections, but also our methods and rationales. A blog post written in February explains how this project fits into our other digitization activities and our approach to access. In early June, we wrote about the reasons why this kind of project is so important, and how our materials will now reach researchers worldwide (and of all ages) who might otherwise never come to our reading room in Princeton, New Jersey.

A more formal report on our methods and results will be made available once more data has been gathered.

Records of Adlai Stevenson, Ambassador to the United Nations, Now Available to View Online

In October 1962, at the height of the Cuban missile crisis, Adlai Stevenson spoke the most famous line of his career. The former Illinois governor and two-time presidential candidate was the United States’ ambassador to the United Nations.

After a series of provocative political moves and a failed US attempt to overthrow the Cuban regime, Nikita Khrushchev proposed in May 1962 to place Soviet nuclear missiles in Cuba to deter any future invasion attempt. By October 14, American spy planes had captured images showing sites for medium-range and intermediate-range ballistic nuclear missiles under construction in Cuba.

Tensions mounted quickly. Concurrent with other negotiations, the United States requested an emergency meeting of the United Nations Security Council on October 25. There, Adlai Stevenson confronted Soviet Ambassador Valerian Zorin, challenging him to admit the existence of the missiles. Ambassador Zorin refused to answer.

“Do you, Ambassador Zorin, deny that the U.S.S.R. has placed and is placing medium- and intermediate-range missiles and sites in Cuba? Don’t wait for the translation! Yes or no?”

“I am not in an American courtroom, sir,” Zorin responded, “and therefore I do not wish to answer a question put to me in the manner in which a prosecutor does–”

“You are in the courtroom of world opinion right now,” Stevenson interrupted, “and you can answer yes or no. You have denied that they exist, and I want to know whether I have understood you correctly.”

“You will have your answer in due course,” Zorin replied. “I am prepared to wait for my answer until hell freezes over, if that’s your decision,” countered Stevenson. “And I am also prepared to present the evidence in this room.”

The Mudd Manuscript Library holds the papers of Adlai Stevenson, and as part of our NHPRC-funded project, we have digitized records relating to his tenure as United States Ambassador to the United Nations. Here, especially in his section on Cuba, we get more of the story behind the story — notes, memoranda and letters of congratulations after this memorable speech, and records from 1963-1965, after the crisis and when the cold war was icier than ever.

Patrons can view thumbnails of a file to get a sense of what’s available

Scroll through to see all 164 images.

Simply click on any of the thumbnail images to see a larger view.

The entire file is also available for download in PDF form.

Clicking on this button will download a pdf of the entire file.

We hope that researchers everywhere will be able to make use of these newly available materials. As always, please contact the Mudd Library with questions about any of our collections.

Archives for Everyone

In each of the last two springs, several staff of the Mudd Manuscript Library and other members of the Department of Rare Books and Special Collections have judged at the regional qualifier of the National History Day competition held on Princeton’s campus. This is a contest for middle and high school students who, based on rigorous guidelines, synthesize and analyze information about a historic event. They then create a paper, website, documentary, exhibit or performance explaining what they have learned.

Judging National History Day is a powerful reminder of the value of archives in the production of history. Each year, I see students adroitly avoid some of the more common traps of historical production: their projects are clear, level-headed, and open-minded, and they support their claims with evidence. Students who submit the best projects don’t just have a clear argument and lengthy bibliography; they let the primary sources surprise them and challenge their previous conceptions of the past. Yes, they may start with textbooks and biographies, but stronger projects evaluate primary sources. And the very best projects tend to not just look at key documents that have been artificially assembled on a website (although this is valuable too); they look at records in context and try to make arguments about subtext and authenticity.

The best place to find records in context is usually an archives. But of course, access to archives isn’t easy for students. Working parents may not be able to take their children to the New Jersey Historical Society or National Archives or Mudd Library, as much as they might like to provide that experience. Most archives are only open during the hours when parents are working, and visiting these institutions can be intimidating. From a young student’s perspective, it’s often hard to tell what the holdings are and whether the trip will be worth it.

Our NHPRC-funded project aims to be a model for lowering this barrier to access. We believe that by scanning our records and making them available in the same context in which one would see them in the reading room, we can give anyone with an internet connection a meaningful scholarly experience without the cost and inconvenience of traveling to Princeton, New Jersey.

We hope that children will benefit as much as anyone from this project. As Cathy Gorn, the Executive Director of National History Day, noted in her letter of support for the grant:

Having primary source materials on the Cold War available via the Internet would allow many NHD students around the country to conduct research for their projects that they ordinarily would not be able to, and the Mudd collections to be digitized are broad enough to support a variety of NHD Projects.

Of course, students don’t just wish to access historical records for National History Day — they want access for the same reasons that any other researcher does. A teenager may want to know more about when and how his family came to America. He might want to know more about the history of his town, and how certain sites came to be created. Or he may be interested in the history of ideas, policies and customs that affect his life. The collections that we plan to digitize — the John Foster Dulles papers, the Allen Dulles papers, the James Forrestal papers, the Council on Foreign Relations records, the George Kennan papers and the Adlai Stevenson papers — document how cold war activities were conducted and understood. They also present an opportunity for students to understand through diaries and correspondence the false starts, misunderstandings, and possible alternatives that constitute all historical events.

The historian John Lewis Gaddis makes the argument for access more persuasively than I could. In his letter of support for our grant, he described the cost and inconvenience that professional researchers face when working on-site, and the wear that such use puts on the records.

But the most fundamental shortcoming of this old system was the disservice it did to students of history who never got to see an archive in the first place. Maybe they lived abroad. Maybe they attended American universities or colleges that could not provide research support. Maybe they were high school or even elementary students who might have gotten hooked on history for life had they had the chance to work with original materials – but they didn’t have that chance.

Now, however, almost all of them have access to a new means of access, which is of course the internet- even if they’re stuck in a place like Cotulla, Texas, where I grew up. I mention this little town because it’s where the young Lyndon B. Johnson spent a year teaching, in 1928-29, in the then segregated Mexican-American school. What he tried to do for those kids is still remembered: it gets its own chapter in the first volume of Robert Caro’s massive biography. But just think what LBJ could have done as a teacher had he had the resources that are available now. That’s why this project is important.

It has the potential, quite literally, to globalize the possibility of doing archival research. That’s no guarantee that this will produce a greater number of great books than in the past. What it will ensure, however, is a quantum leap in the opportunities students and their teachers will have to bring the excitement of working with original documents into all classrooms. That’s easily as important, I think, as writing the kind of books that might get you tenure at a place like Yale.

Applying “More Product, Less Process” to very large collections: Mudd archivist presents at professional conference

Project archivist Adriane Hanson recently participated in a panel at the spring conference of the Mid-Atlantic Regional Archives Conference (MARAC) in Cape May, NJ. The topic of her talk was how she is handling the size of her current project: processing 2,500 linear feet of American Civil Liberties Union records in a two-year project funded by the National Historical Publications and Records Commission (NHPRC).
In a nutshell, her advice is to:
1. Stay on top of the schedule through careful project management, collecting metrics to have realistic data on how long each task requires, and frequently revisiting and adjusting the timeline of the project.
2. Be flexible about the workflow, examining the way you have always done things and adjusting as needed to better work with a massive collection.
3. Think of it as data management. Use tools to repurpose data from one step of the project to another, and to analyze and transform the data once the box inventories are complete.
4. Spend extra time writing descriptions of each part of the collection, giving the researcher important keywords to search for and the context to understand the section's significance. But do not spend time on description that does not aid searching, such as lists of document types in the collection inventory. Time should be spent on value-added description.
The slides and text for her presentation are available here.
If you have any questions for her, you can reach her by email: ahanson@princeton.edu