Our NHPRC-Funded Digitization Project at Six Months

Late last year, the Mudd Manuscript Library received an award from the National Historical Publications and Records Commission (NHPRC) to digitize our most-used Public Policy collections, serve them online, and write a report for the larger archival community about cost-efficient digitization practices. Excerpts from our six-month progress report are below.


Work so far

  1. Project planning

Since the grant was awarded, we have produced an overall project plan and timeline, a request for quotation (RFQ) and plan of work for the vendor, in-house quality control procedures for vendor-supplied images, a workplan for in-house scanning, and hardware-specific instructions for in-house scanning. All activities are on or ahead of schedule. Vendor-supplied digitization is currently eight months ahead of schedule.

  2. Finding a vendor

After distributing an RFQ and collecting bids, we decided on The Crowley Company as our vendor, based on both price and our confidence that they would be able to manage the materials and the work carefully and efficiently.

  3. Managing vendor-supplied digitization

Before materials go out to the vendor, we create a manifest of everything we plan to send by transforming the EAD-encoded finding aid into an easily read Excel worksheet. Because we want each folder to have a cover sheet listing the collection name, box number, folder number, URL, and copyright policy, we used the collection manifests to generate target sheets with this information. A total of 6,943 target sheets were created and printed, and student workers inserted them at the front of the folders before the materials were sent to the vendor.
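The manifest and target-sheet steps can be sketched in Python. This is a minimal illustration, not the library's actual tooling: it assumes the finding aid uses the EAD 2002 namespace with unnumbered `<c>` components whose folder-level entries carry `level="file"`, it writes CSV (which Excel opens directly) instead of a native worksheet, and `BASE_URL` and `COPYRIGHT_NOTE` are hypothetical placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: EAD 2002 namespace, unnumbered <c> components.
NS = {"ead": "urn:isbn:1-931666-22-9"}
# Hypothetical placeholders, not the library's real URL scheme or policy text.
BASE_URL = "https://example.org/findingaids/{component_id}"
COPYRIGHT_NOTE = "See the repository's copyright policy before reuse."


def build_manifest(ead_xml: str) -> list:
    """Flatten folder-level (<c level="file">) components into manifest rows."""
    root = ET.fromstring(ead_xml)
    collection = root.findtext(
        "ead:archdesc/ead:did/ead:unittitle", default="", namespaces=NS
    ).strip()
    rows = []
    for comp in root.iterfind(".//ead:c[@level='file']", NS):
        # Index this folder's <container> elements by their type attribute.
        containers = {
            (c.get("type") or "").lower(): (c.text or "").strip()
            for c in comp.findall("ead:did/ead:container", NS)
        }
        rows.append({
            "collection": collection,
            "box": containers.get("box", ""),
            "folder": containers.get("folder", ""),
            "title": comp.findtext("ead:did/ead:unittitle", default="", namespaces=NS).strip(),
            "url": BASE_URL.format(component_id=comp.get("id", "")),
        })
    return rows


def manifest_to_csv(rows: list) -> str:
    """Serialize manifest rows as CSV, which opens directly in Excel."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["collection", "box", "folder", "title", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def target_sheet(row: dict) -> str:
    """Render the cover-sheet text placed at the front of each folder."""
    return (
        f"Collection: {row['collection']}\n"
        f"Box {row['box']}, Folder {row['folder']}\n"
        f"URL: {row['url']}\n"
        f"{COPYRIGHT_NOTE}\n"
    )


# Hypothetical two-folder finding aid for illustration.
sample = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did><unittitle>Example Papers</unittitle></did>
    <dsc>
      <c level="series" id="s1">
        <c level="file" id="f1"><did><container type="box">1</container><container type="folder">1</container><unittitle>Correspondence</unittitle></did></c>
        <c level="file" id="f2"><did><container type="box">1</container><container type="folder">2</container><unittitle>Speeches</unittitle></did></c>
      </c>
    </dsc>
  </archdesc>
</ead>"""

rows = build_manifest(sample)
print(manifest_to_csv(rows))
print(target_sheet(rows[0]))
```

Driving both the spreadsheet and the cover sheets from the same manifest rows keeps box and folder numbers consistent between what is shipped and what is tracked.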

Once materials have been imaged by the vendor, students sample ten percent of the collection to check for completeness and readability. So far, everything has passed quality control with flying colors.
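A ten percent quality-control sample could be drawn along these lines. This sketch assumes sampling is done over a list of image filenames and rounds up so that even a small batch gets at least one image reviewed; the `qc_sample` helper and the filenames are hypothetical.

```python
import random
from typing import List, Optional


def qc_sample(items: List[str], percent: int = 10, seed: Optional[int] = None) -> List[str]:
    """Pick a random `percent` of items (rounded up, at least one) for manual QC."""
    if not items:
        return []
    # Integer ceiling division avoids float rounding surprises (e.g. 30 * 0.1).
    k = max(1, (len(items) * percent + 99) // 100)
    rng = random.Random(seed)
    return sorted(rng.sample(items, k))


# Hypothetical box of 1,200 vendor images.
files = [f"img_{i:04d}.tif" for i in range(1, 1201)]
print(len(qc_sample(files, seed=1)))  # 120
```

Passing a fixed `seed` makes a sample reproducible, which helps if a reviewer needs to revisit exactly the same images.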

Each month, Crowley sends us a report of how many images were created that month, how many have been created cumulatively, and the average scanning rate per hour. The monthly box and page counts are below:

| Month | Boxes scanned | Pages scanned |
|---|---:|---:|
| March 2013 | 15 | 17,119 |
| April 2013 | 32 | 45,761 |
| May 2013 | 50 | 49,499 |
| June 2013 | 65 | 97,896 |
| Totals | 162 | 210,275 |
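The cumulative figures in the vendor's monthly reports are running sums of the monthly counts. A short sketch that recomputes them from the four monthly rows in the table:

```python
from itertools import accumulate

# Monthly vendor figures: (month, boxes scanned, pages scanned).
monthly = [
    ("2013-03", 15, 17119),
    ("2013-04", 32, 45761),
    ("2013-05", 50, 49499),
    ("2013-06", 65, 97896),
]

boxes = [b for _, b, _ in monthly]
pages = [p for _, _, p in monthly]
cumulative_pages = list(accumulate(pages))  # running total after each month

for (month, b, p), cum in zip(monthly, cumulative_pages):
    # Average pages per box for the month; individual boxes vary.
    print(f"{month}: {b:>3} boxes, {p:>6} pages, {cum:>7} cumulative, {p / b:,.0f} pages/box")

print("Totals:", sum(boxes), "boxes,", sum(pages), "pages")
```

The final cumulative value (210,275 pages) and the box total (162) agree with the Totals row. The per-box figures are monthly means only and say nothing about the spread across individual boxes.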

  4. In-house imaging

Imaging of the John Foster Dulles papers started in June. So far, we have completed a pilot using the photocopier's sheet feeder, and pilots of microfilm scanning and of scanning with a Zeutschel face-up scanner are underway.

Project goals and deliverables

  1. Twelve series or subseries from six collections digitized

To date, five series or subseries have been completely digitized, and three others are in the process of being digitized.

  2. Approximately 416,000 images created and posted online

As of July 1, 2013, the vendor has scanned 210,275 images, of which 39,834 have been posted online. Our vendor is several months ahead of schedule for this project, and in-house scanning is on track: since in-house scanning began in June, student workers have scanned 1,838 pages. In the coming months, we will calculate the per-page costs of scanning with a Zeutschel face-up scanner and with a microfilm scanner. From there, we plan to image fifty feet of materials with the photocopier's sheet feeder, 10.3 feet with the Zeutschel face-up scanner, and 33.4 feet with the microfilm scanner.
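The progress figures in this section reduce to simple ratios. This sketch treats the approximate 416,000-image goal as the denominator and counts only vendor scans against it, which is an assumption made for illustration:

```python
TARGET_IMAGES = 416_000   # approximate project goal
vendor_scanned = 210_275  # as of July 1, 2013
posted_online = 39_834

# Planned in-house imaging by method, in linear feet.
planned_feet = {"sheet feeder": 50.0, "Zeutschel": 10.3, "microfilm": 33.4}


def pct(part: float, whole: float) -> float:
    """Express part as a percentage of whole."""
    return 100 * part / whole


print(f"Vendor scans: {pct(vendor_scanned, TARGET_IMAGES):.1f}% of target")
print(f"Posted online: {pct(posted_online, TARGET_IMAGES):.1f}% of target, "
      f"{pct(posted_online, vendor_scanned):.1f}% of vendor scans")
print(f"Planned in-house imaging: {sum(planned_feet.values()):.1f} feet")
```

With these inputs it reports about 50.5% of the target scanned by the vendor, about 9.6% of the target posted online (18.9% of vendor scans), and 93.7 feet of planned in-house imaging.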

  3. Six EAD finding aids updated to include links for 17,508 components (folders)

Two finding aids (Council on Foreign Relations Records and Adlai Stevenson Papers) have been updated to include links to digitized content. Another (George F. Kennan Papers) is ready to be updated. This process is managed semi-automatically with a series of shell scripts. After quality control, hard drives of images are sent to Princeton's digital studios, where staff verify the digital assets and copy them to permanent storage. PDF and JPEG2000 files are then derived from the master TIFFs, and the relationships among these objects are described in an automatically generated METS file. Finally, a digital archival object (<dao>) tag is added to the EAD-encoded finding aid for each component.
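The project performs the `<dao>` step with its own shell scripts, which are not shown here. As a hedged illustration of what that step does to a finding aid, the Python sketch below adds a `<dao>` to each component's `<did>`. It assumes the EAD 2002 schema (with `xlink:href` on `<dao>`), component `id` attributes as keys, and a hypothetical `links` mapping from component ID to object URL.

```python
import xml.etree.ElementTree as ET

EAD = "urn:isbn:1-931666-22-9"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"ead": EAD}
# Keep the default EAD namespace and the xlink prefix when re-serializing.
ET.register_namespace("", EAD)
ET.register_namespace("xlink", XLINK)


def add_daos(ead_xml: str, links: dict) -> tuple:
    """Add a <dao> to the <did> of each component whose id is in `links`.

    Components that already have a <dao> are skipped, so reruns are safe.
    Returns the updated XML and the number of components changed.
    """
    root = ET.fromstring(ead_xml)
    added = 0
    for comp in root.iterfind(".//ead:c", NS):
        url = links.get(comp.get("id", ""))
        did = comp.find("ead:did", NS)
        if url is None or did is None or did.find("ead:dao", NS) is not None:
            continue
        dao = ET.SubElement(did, f"{{{EAD}}}dao")
        dao.set(f"{{{XLINK}}}type", "simple")
        dao.set(f"{{{XLINK}}}href", url)
        added += 1
    return ET.tostring(root, encoding="unicode"), added


sample = """<ead xmlns="urn:isbn:1-931666-22-9"><archdesc level="collection"><dsc>
<c level="file" id="f1"><did><unittitle>Correspondence</unittitle></did></c>
<c level="file" id="f2"><did><unittitle>Speeches</unittitle></did></c>
</dsc></archdesc></ead>"""
# Hypothetical component-to-object mapping, e.g. produced after ingest.
links = {"f1": "https://example.org/objects/f1"}

updated, n = add_daos(sample, links)
print(n)  # 1
```

Skipping components that already carry a `<dao>` makes the step idempotent, which matters when a finding aid is updated in several passes as more series are digitized.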

  4. Digital imaging cost of less than 80 cents per page achieved

The plan of work with our vendor calls for scanning costs well below 80 cents per page. The first (and likely least expensive) of our three in-house scanning pilots estimates the cost of scanning with a copier's sheet feeder at two cents per page. We will have numbers for microfilm scanning and for scanning with a face-up scanner in our next report.
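At its simplest, the per-page cost of an in-house method is hourly labor cost divided by pages per hour. The sketch below pairs a hypothetical $15.00 hourly rate (a placeholder, not the project's actual figure) with the measured sheet-feed rate of 759.33 pages per hour, and counts labor only, leaving out equipment and overhead:

```python
def cost_per_page(hourly_cost: float, pages_per_hour: float) -> float:
    """Labor cost per page: hourly cost divided by throughput."""
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return hourly_cost / pages_per_hour


def min_pages_per_hour(hourly_cost: float, max_cost_per_page: float) -> float:
    """Throughput needed to keep labor cost at or below a per-page ceiling."""
    return hourly_cost / max_cost_per_page


HOURLY_RATE = 15.00  # hypothetical placeholder wage
print(f"${cost_per_page(HOURLY_RATE, 759.33):.2f} per page")
print(f"{min_pages_per_hour(HOURLY_RATE, 0.80):.2f} pages/hour to stay under 80 cents")
```

The second function inverts the relationship: it shows how slow a method could be and still meet the 80-cent goal at a given labor rate.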

  5. Metrics for digital imaging of 20th-century archival collections for

    1. In-house microfilm conversion

    2. Sheet feeding through a networked photocopier

    3. Vendor supplied images

The information we have collected so far is below. Vendor metrics are based on the quote and plan of work from The Crowley Company. Sheet-feed metrics come from a minimal, time-stamped form that a student worker fills out at the beginning and end of each scan, which we then analyze. These numbers are preliminary: sheet-fed scans have not yet been checked for quality control, and re-scans may increase the time per page and cost per page for this method.

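| Metric | Vendor | Sheet feed | Microfilm | Zeutschel |
|---|---:|---:|---:|---:|
| Total pages | 270,600* | 1,838 | | |
| Total feet | 530.95 | 1.68 | | |
| Total time (h:mm:ss) | | 2:25:14 | | |
| Total time (hours, decimal) | | 2.42 | | |
| Time per page (h:mm:ss) | | 0:00:04 | | |
| Pages per hour | 270.75 | 759.33 | | |
| Hours per foot (h:mm:ss) | | 1:26:26 | | |
| Feet per hour | | 0.69 | | |
| Cost per page | TBD | $0.02 | | |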

*This number is an estimate based on an assumed 1,200 pages per box; our reports from Crowley show anywhere from 1,050 to 1,750 pages per box.

Note: in addition to these three methods, we plan to add a fourth, scanning with a face-up scanner (in our case, a Zeutschel scanner table).
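The sheet-feed column can be reproduced from the start and end timestamps on the student workers' forms. In this sketch the two sessions are invented so that their totals (8,714 seconds and 1,838 pages) match the table; the form's real layout is not described in this report, and `sheet_feed_metrics` and `hms` are hypothetical helpers.

```python
from datetime import datetime


def sheet_feed_metrics(sessions, total_feet):
    """Summarize (start, end, pages) tuples from time-stamped scan forms."""
    seconds = sum((end - start).total_seconds() for start, end, _ in sessions)
    pages = sum(p for _, _, p in sessions)
    hours = seconds / 3600
    return {
        "total_pages": pages,
        "total_seconds": seconds,
        "total_hours": hours,
        "seconds_per_page": seconds / pages,
        "pages_per_hour": pages / hours,
        "hours_per_foot": hours / total_feet,
        "feet_per_hour": total_feet / hours,
    }


def hms(seconds):
    """Format seconds as H:MM:SS, truncating fractional seconds."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


# Invented sessions whose totals match the table (not real log data).
sessions = [
    (datetime(2013, 6, 10, 10, 0, 0), datetime(2013, 6, 10, 11, 15, 0), 950),
    (datetime(2013, 6, 12, 14, 0, 0), datetime(2013, 6, 12, 15, 10, 14), 888),
]

m = sheet_feed_metrics(sessions, total_feet=1.68)
print("Total time:", hms(m["total_seconds"]))
print(f"Total time (decimal): {m['total_hours']:.2f}")
print("Time per page:", hms(m["seconds_per_page"]))
print(f"Pages per hour: {m['pages_per_hour']:.2f}")
print("Hours per foot:", hms(m["hours_per_foot"] * 3600))
print(f"Feet per hour: {m['feet_per_hour']:.2f}")
```

Run as written, it prints 2:25:14, 2.42, 0:00:04, 759.33, 1:26:26, and 0.69, matching the sheet-feed column. Because `hms` truncates, the time per page shows as 0:00:04 even though the underlying value is about 4.7 seconds.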

  6. Policies and documentation for large-scale digitization initiative created and shared with archival community

As the project has gone forward, we have been blogging not just about the content of our digitized collections but also about our methods and rationales. A blog post written in February explains how this project fits into our other digitization activities and our approach to access. In early June, we wrote about why this kind of project is so important, and how our materials will now reach researchers worldwide (and of all ages) who might otherwise never come to our reading room in Princeton, New Jersey.

A more formal report on our methods and results will be made available once more data has been gathered.
