Our NHPRC-Funded Digitization Project at Six Months

Late last year, the Mudd Manuscript Library received an award from the National Historical Publications and Records Commission (NHPRC) to digitize our most-used Public Policy collections, serve them online, and create a report for the larger archival community about cost-efficient digitization practices. Excerpts from our six-month progress report are below.


Work so far

  1. Project planning

From the time we were awarded the grant to the present, we have produced an overall project plan and timeline, a vendor RFQ and plan of work, in-house quality control procedures for vendor-supplied images, a workplan for in-house scanning, and hardware-specific instructions for in-house scanning. All activities are either on schedule or ahead of schedule. Vendor-supplied digitization is currently eight months ahead of schedule.

  1. Finding a vendor

After distributing an RFQ and collecting bids, we decided on The Crowley Company as our vendor, based on both price and our confidence that they would be able to manage the materials and the work carefully and efficiently.

  1. Managing vendor-supplied digitization

Before materials can go out to the vendor, we first create a manifest of everything we want to send by transforming the EAD-encoded finding aid into an easily read Excel worksheet. Because we want each folder of material to have a cover sheet that gives the collection name, box number, folder number, URL, and copyright policy, we use the collection manifests to make target sheets with this information. Student workers created, printed, and inserted a total of 6,943 target sheets at the front of folders before materials were sent out to the vendor.
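The manifest step can be sketched roughly as follows. This is only an illustration, not the transformation actually used in the project: it assumes a simplified, namespace-free EAD fragment in which each folder is a file-level `<c>` element, and it writes CSV (which Excel can open) rather than a native worksheet. The sample XML, the function names `build_manifest` and `manifest_to_csv`, and the column names are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified EAD fragment (no namespace); real finding aids are richer.
SAMPLE_EAD = """\
<ead>
  <archdesc>
    <did><unittitle>Example Papers</unittitle></did>
    <dsc>
      <c level="file" id="c001">
        <did>
          <unittitle>Correspondence, 1950</unittitle>
          <container type="box">1</container>
          <container type="folder">1</container>
        </did>
      </c>
      <c level="file" id="c002">
        <did>
          <unittitle>Speeches, 1951</unittitle>
          <container type="box">1</container>
          <container type="folder">2</container>
        </did>
      </c>
    </dsc>
  </archdesc>
</ead>
"""

COLUMNS = ["collection", "box", "folder", "title", "component_id"]


def build_manifest(ead_xml):
    """Return one dict per file-level component in the finding aid."""
    root = ET.fromstring(ead_xml)
    collection = root.findtext("archdesc/did/unittitle", default="").strip()
    rows = []
    for comp in root.iter("c"):
        did = comp.find("did")
        if comp.get("level") != "file" or did is None:
            continue
        # Map container type ("box", "folder") to its label.
        containers = {
            c.get("type"): (c.text or "").strip() for c in did.findall("container")
        }
        rows.append({
            "collection": collection,
            "box": containers.get("box", ""),
            "folder": containers.get("folder", ""),
            "title": did.findtext("unittitle", default="").strip(),
            "component_id": comp.get("id", ""),
        })
    return rows


def manifest_to_csv(rows):
    """Serialize manifest rows to CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


manifest = build_manifest(SAMPLE_EAD)
print(manifest_to_csv(manifest))
```

Each row carries the fields a target sheet needs except the URL and copyright statement, which would have to come from elsewhere in the finding aid or from local policy.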

Once materials have been imaged by the vendor, students sample ten percent of the collection to check for completeness and readability. So far, everything has passed quality control with flying colors.
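A ten percent sample of this kind could be drawn along the following lines. This is a minimal sketch under stated assumptions: the report does not say whether sampling is by folder, by box, or by image, so sampling by folder is assumed here, and the folder identifiers and the function name `sample_for_qc` are hypothetical.

```python
import math
import random


def sample_for_qc(folder_ids, fraction=0.10, seed=None):
    """Pick a random subset of folders (at least one) for manual review."""
    if not folder_ids:
        return []
    sample_size = max(1, math.ceil(len(folder_ids) * fraction))
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return sorted(rng.sample(folder_ids, sample_size))


# Hypothetical folder identifiers for one shipment: 5 boxes of 10 folders each.
folders = [f"box{b:02d}_folder{f:02d}" for b in range(1, 6) for f in range(1, 11)]
to_check = sample_for_qc(folders, seed=2013)
print(len(to_check), "of", len(folders), "folders selected for review")
```

Recording the seed alongside the results would let a reviewer reproduce exactly which folders were checked.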

Each month, Crowley sends us a report of how many images have been created that month, how many images have been created cumulatively, and average scanning rate per hour. This information is below:

| Month | Boxes Scanned | Pages Scanned |
| --- | --- | --- |
| 2013 March | 15 | 17,119 |
| 2013 April | 32 | 45,761 |
| 2013 May | 50 | 49,499 |
| 2013 June | 65 | 97,896 |
| Totals | 162 | 210,275 |

  1. In-house imaging

Imaging of the John Foster Dulles papers started in June. So far, we have completed a pilot of scanning with the sheet feeder of the photocopier, and pilots of microfilm scanning and scanning with a Zeutschel face-up scanner are underway.

Project goals and deliverables

  1. Twelve series or subseries from six collections digitized

To date, five series or subseries have been completely digitized, and three others are in the process of being digitized.

  1. Approximately 416,000 images created and posted online

As of July 1, 2013, 210,275 images have been scanned by the vendor. Of this total, 39,834 images have been posted online. Our vendor is several months ahead of schedule for this project, and in-house scanning is on track. Since in-house scanning began in June, student workers have scanned 1,838 pages. In the coming months, we will calculate the per-page costs for scanning on a Zeutschel face-up scanner and with a microfilm scanner. From there, we plan to image fifty feet of materials with the sheet feeder of the photocopier, 10.3 feet with the Zeutschel face-up scanner, and 33.4 feet with the microfilm scanner.

  1. Six EAD finding aids updated to include links for 17,508 components (folders)

Two finding aids (Council on Foreign Relations Records and Adlai Stevenson Papers) have been updated to include links to digitized content. Another (George F. Kennan Papers) is ready to be updated. This process is managed semi-automatically with a series of shell scripts. After quality control, hard drives of images are sent to Princeton’s digital studios. Staff there verify the digital assets and copy them to permanent storage. After this, PDF and JPEG2000 files are derived from the master TIFFs, and the relationship between these objects is described in an automatically generated METS file. Finally, a digital archival object (<dao>) tag is added to the EAD-encoded finding aid for each component.
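The last step, adding a `<dao>` to each component, might look something like the sketch below. The project itself uses shell scripts; this Python version is a stand-in for illustration only. It assumes a namespace-free EAD component, places the `<dao>` inside the component's `<did>`, and uses a plain `href` attribute. The sample component, the `add_dao` function, and the example.org URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical folder-level component from a finding aid (no namespace).
SAMPLE_COMPONENT = """\
<c level="file" id="c001">
  <did>
    <unittitle>Correspondence, 1950</unittitle>
    <container type="box">1</container>
    <container type="folder">1</container>
  </did>
</c>
"""


def add_dao(component, href):
    """Attach a <dao> pointing at href to the component's <did>.

    If a <dao> with the same href is already present, it is returned
    unchanged, so the function can safely be re-run on the same file.
    """
    did = component.find("did")
    if did is None:
        raise ValueError("component has no <did> element")
    for existing in did.findall("dao"):
        if existing.get("href") == href:
            return existing
    dao = ET.SubElement(did, "dao")
    dao.set("href", href)
    return dao


component = ET.fromstring(SAMPLE_COMPONENT)
add_dao(component, "https://example.org/objects/c001")
add_dao(component, "https://example.org/objects/c001")  # second call adds nothing
print(ET.tostring(component, encoding="unicode"))
```

Making the step idempotent matters for a semi-automated workflow, since a batch that fails partway through can simply be run again.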

  1. Digital imaging cost of less than 80 cents per page achieved

The plan of work with our vendor calls for scanning costs well below 80 cents per page. The first (and likely least expensive) of our three in-house scanning pilots puts the cost of scanning with the sheet feeder of a copier at an estimated two cents per page. We will have numbers for microfilm scanning and scanning with a face-up scanner at the time of our next report.

  1. Metrics for digital imaging of 20th century archival collections for

    1. In-house microfilm conversion

    2. Sheet feeding through a networked photocopier

    3. Vendor supplied images

The information that we have collected thus far is below. Our vendor metrics are based on the quote and plan of work with The Crowley Company. Sheet feed metrics are collected by having a student worker fill out a minimal, time-stamped form at the beginning and end of each scan, and then analyzing that information. These numbers are preliminary. Sheet-fed scans have not yet been checked for quality control — re-scans may increase the total time per page and dollars per page for this method.

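| | Vendor | Sheet Feed | Microfilm | Zeutschel |
| --- | --- | --- | --- | --- |
| Total pages | 270,600* | 1,838 | | |
| Total feet | 530.95 | 1.68 | | |
| Total time | | 2:25:14 | | |
| Total time (decimal hours) | | 2.42 | | |
| Time per page | | 0:00:04 | | |
| Pages per hour | 270.75 | 759.33 | | |
| Hours per foot | | 1:26:26 | | |
| Feet per hour | | 0.69 | | |
| Cost per page | TBD | $0.02 | | |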

*This number is an estimate, based on an assumed 1200 pages per box. Our reports from Crowley show anywhere from 1050-1750 pages in a box.

Note: in addition to these three methods, we plan to add a fourth – scanning with a face-up scanner (in our case, a Zeutschel scanner table).
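The derived sheet-feed figures follow directly from the three measured totals (1,838 pages, 1.68 feet, and 2:25:14 of scanning time). The sketch below shows the arithmetic; the function names are hypothetical, and it assumes that clock-style values are truncated to whole seconds, which matches the figures reported above.

```python
def parse_hms(text):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(seconds):
    """Format a duration as 'H:MM:SS', dropping any fraction of a second."""
    whole = int(seconds)
    hours, remainder = divmod(whole, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def scanning_metrics(pages, feet, total_time):
    """Derive per-page and per-foot rates from measured totals."""
    seconds = parse_hms(total_time)
    hours = seconds / 3600
    return {
        "total_hours": round(hours, 2),
        "time_per_page": format_hms(seconds / pages),
        "pages_per_hour": round(pages / hours, 2),
        "hours_per_foot": format_hms(seconds / feet),
        "feet_per_hour": round(feet / hours, 2),
    }


sheet_feed = scanning_metrics(pages=1838, feet=1.68, total_time="2:25:14")
for name, value in sheet_feed.items():
    print(f"{name}: {value}")
```

The same function could be reused unchanged once totals for the microfilm and Zeutschel pilots are available.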

  1. Policies and documentation for large-scale digitization initiative created and shared with archival community

As we go forward with our project, we have been blogging not just about the content of our digitized collections, but also our methods and rationales. A blog post written in February explains how this project fits into our other digitization activities and our approach to access. In early June, we wrote about the reasons why this kind of project is so important, and how our materials will now reach researchers worldwide (and of all ages) who might otherwise never come to our reading room in Princeton, New Jersey.

A more formal report on our methods and results will be made available once more data has been gathered.

Why — and How — We Digitize

It’s February, and we’re now in the second month of our NHPRC-funded digitization project. In twenty-three more months, we’ll have completed scanning and uploading 400,000 pages of our most-viewed material to our finding aids, and anyone with an internet connection will be able to view it.

This is just the most recent effort to introduce digitization as a normal part of our practice at Mudd. As I said in my previous post, we know that it’s well and good that we have collections that document the history of US diplomacy, economics, journalism and civil rights in the twentieth and twenty-first centuries. But for the majority of potential users, who may never be able to come to Princeton, NJ, this is irrelevant. However interested they may be, they may never be able to afford to visit us. And there’s a whole other subset of potential users — let’s call them working people — who can’t come between the hours of 9:00 and 4:45, Monday through Friday. Are we really providing fair and equitable access under these conditions? Since we have the resources to digitize, it’s imperative that we develop the infrastructure and political will to do so.

We know that it’s time to get serious — and smart — about scanning.

The ball has been rolling in this direction for some time. We have three “streams” of making digital content available, and with our new finding aids site, we have an intuitive way of linking descriptions of our materials to the materials themselves.

Images of the collection in the context of the finding aid


Our first stream is patron-driven digitization.

The Zeutschel -- our amazing German powerhouse face-up scanner

This is our Zeutschel scanner. It does amazing work, is easy on our materials, and usually requires very little quality control.

Archives have been providing photoduplication services since the advent of the photocopier. At Mudd, we have dedicated staff who have been doing this work for decades. Recently, we’ve just slightly tweaked our processes to create scans instead of paper copies and to (in many cases) re-use the scans that we make so that they’re available to all patrons, not just the one requesting the scan.

A patron (maybe you!) finds something in our finding aids that he thinks he may be interested in, and asks for a copy.

If he’s in our reading room, he flags the pages of material he wants. If he’s remote, he identifies the folders or volumes to be scanned. The archivist tells him how much the scan will cost, and he pre-pays.

Now, the scanning. This either happens on our photocopier (the technician can press “scan” instead of “photocopy” to create a digital file instead of a paper one) or on our Zeutschel scanner. And while we feel happy and lucky to have the Zeutschel, we don’t strictly need it to fulfill our mission to digitize.

The scan is named in a way that associates it with the description of the material in the finding aid, and is then linked up and served online. We currently send the patron an email of this scan, but in the future we may just send them a link to the uploaded content.

Our second stream is targeted digitization based on users’ viewing patterns.


Our student receptionist, Ashley, scans materials at the front desk when she isn’t welcoming patrons.

We try to keep lots of good information about what our users find interesting. We use a service called Google Analytics to learn what users are browsing online, and we keep statistics about which physical materials patrons see in the reading room.

From these sources, we create a list of most-viewed materials, and set up a system for our students to scan them in their downtime when they’re working at the front desk.

We do this because we want to make sure that we’re putting the effort into digitizing resources that patrons actually want to see — there are more than 35,000 linear feet of materials at the Mudd Library. We probably won’t ever be able to digitize absolutely everything, and it wouldn’t make sense to start from “A” and go to “Z”. So, we pay attention to trends and try to anticipate what researchers might find useful.
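Combining the two sources into a single ranked list could be as simple as the sketch below. It is an illustration only: it assumes online page views and reading-room requests are simply added together with equal weight, which may not be how the list is actually compiled, and the collection identifiers, counts, and the function name `most_viewed` are hypothetical.

```python
from collections import Counter


def most_viewed(online_views, reading_room_requests, top_n=3):
    """Rank collections by combined online views and reading-room requests."""
    combined = Counter(online_views)
    combined.update(reading_room_requests)  # adds counts for shared keys
    return [collection for collection, _ in combined.most_common(top_n)]


# Hypothetical usage counts for one quarter.
online = {"collection-a": 120, "collection-b": 45, "collection-c": 80}
reading_room = {"collection-b": 60, "collection-d": 10}

scan_queue = most_viewed(online, reading_room, top_n=2)
print(scan_queue)
```

In practice the two sources would probably deserve different weights, since one reading-room request represents more effort on the patron's part than one page view.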

Our final stream — and the one for which we currently have to rely on external support — is large-scale vendor-supplied digitization.

Our current Cold War project is a great example of this. We’ve put together a project plan, chosen materials, called for quotes and chosen a vendor. We recently shipped our first collection to be digitized, and I’ll be posting information to the blog as we move forward.

Another good example of an externally-supported digitization activity is the scanning of microfilm from our American Civil Liberties Union Records. Our earliest records were microfilmed decades ago, and recently Professor Sam Walker supported the digitization of some of this microfilm so that the records could be made available online.

No single stream (externally-supported projects, left-to-right scanning, or patron-driven digitization) would be enough to support our goal of maximizing the content available online. We hope that the three, each pursued aggressively, will help us realize our mission of providing equitable access to our materials. And we think that focusing on this Cold War project will help us reflect on and improve all of our digitization activities.

Mudd Technical Services Meeting Minutes: June 2012


Maureen Callahan

Maureen has finished managing the Princeton Weekly Bulletin digitization project – this resource is now available online. In addition to her usual reference and accessioning work, she also created a number of orientation screencasts for the new finding aids site, and is finishing writing notes for the Bill Bradley Papers. She, Dan Linke, and (mostly) John Walako installed the new exhibit in the Millberg gallery, “The Election for Woodrow Wilson’s America,” which will be on display through the end of the year.

Lynn Durgin

Lynn oversaw data collection and processing of 2012 senior theses (completed 15 of 33 departments); implemented a new system for applying dissertation embargoes in DataSpace and ProQuest; and created ten new University Archives accession records.

Adriane Hanson

Adriane began work in earnest this month on her summer projects, preparing the next batch of Daily Princetonian newspapers (2003-2012) and the Western European Theater Political Pamphlets for digitization. She also worked with three patrons who came in to use the newly opened ACLU Records and is preparing to speak on the project at the annual meeting of the Society of American Archivists in August.

Christie Peterson

Christie finalized all remaining work and reports from the P collection shelf read/reconciliation project. She created three new collections and added materials to seven additional collections in an ongoing project to assimilate all unprocessed University Archives materials. In continuing her work with born-digital materials, Christie and Dan Santamaria attended an SAA workshop on digital forensics for archivists, and Christie began work on an accessioning workflow that incorporates these materials. She also trained a new summer student on cataloging photographs in the Historical Photographs Collection database, and he restarted work on that project. Finally, Christie has announced that she will be leaving to start a new job September 1.

Dan also updated the group on the progress of various initiatives, in particular new developments with the redesigned EAD site, Primo, Aeon, and related issues.   We also discussed Bethany Nowviskie’s keynote talk at the March 2012 code4lib conference on the concept of “Lazy Consensus.”

For more information or questions, email mudd@princeton.edu.

Technical Services at Mudd Library: What do they do?

Ever wonder what some of the staff here at Mudd spend their time working on? Our Technical Services department has been hard at work and here is a quick summary of what they have completed!

Maureen Callahan: Public Policy Papers Project Archivist

Maureen has been supervising the final inventory work for the Bill Bradley papers, working with Dan Linke on an exhibit about Woodrow Wilson and the 1912 election, and writing help text for the new finding aids site, which is now in beta testing, all alongside her usual reference and accessioning work. She is also organizing a June 26 Delaware Valley Archivists’ Group meeting about copyright, copyfraud and rights & permissions policies in archives.

Lynn Durgin: Special Collections Assistant for Technical Services

Lynn worked with ProQuest, the Graduate School and OIT to implement a policy change on Publishing Options for Princeton University Dissertations, which now allows for dissertation embargoes in ProQuest and in Princeton’s DataSpace.  She also completed processing of 13 University Archives accessions.

Adriane Hanson: Economic Papers Project Archivist

Adriane is wrapping up the 2-year grant project to process 2,500 linear feet of American Civil Liberties Union Records, which will be completed in June. This month, she finished the finding aid for the last series, so the description of the entire collection is now online, and researchers have started coming in to use it. We also physically put the boxes in order. She also started planning for the next phase of the Daily Princetonian digitization project, which will cover the years 2003-present and will repurpose PDFs saved by the Daily Princetonian staff where possible.

Christie Peterson: University Archives Project Archivist

Christie completed reconciling the results of last summer’s P collection (Princetoniana) shelf read with Voyager (our cataloging system). She continued to investigate tools and methods for accessioning and managing born-digital materials in the archives through a site visit with electronic records archivists at Yale University. She also integrated additions to 12 different collections, oversaw the processing of another collection by a Special Collections Assistant, and met with developers from OIT to plan and move forward on the creation of a new web interface for the redesigned photograph, AV and memorabilia databases.

The group also discussed readings selected by Lynn from Controlling the Past: Documenting Society and Institutions, Essays in Honor of Helen Willa Samuels. The selections (one by Richard Katz and Paul Gandel and one by Elizabeth Yakel) reflect on documentation strategy in the context of the digital age and social media.

Questions? Email: Mudd Library

Recent History of the Princeton University Library Catalog

The following essay by Richard J. Schulz, Associate University Librarian for Technical Services, was prepared in conjunction with the announcement that Firestone Library’s card catalog will be disassembled this summer. As the University Archives maintains the historical records of the University Library, we offer this for our patrons’ edification, with thanks to the author for his permission to post it.
The Card Catalog served as Princeton University Library’s primary database of acquired holdings until it was closed in 1981 when a major change in cataloging rules (AACR2) was adopted by the Library of Congress and all major research libraries in North America, Great Britain and many other libraries world-wide. As of 1981, no new cataloging was added to the Card Catalog. Updating of penciled-in bound volume holding notations to the records for existing serial and book-set titles continued to be made until 1989, when a project to retrospectively convert all active card serial and set titles was consummated. After 1989, therefore, the Card Catalog became a static partial representation of titles which the Library had acquired prior to 1981; in the terminology of the period, its status had changed from being “closed” to being “dead.”
In 1969, a microfilm copy was made of the pre-AACR2 Card Catalog as a backup for security reasons. This film copy is stored at the Library’s remote book shelving facility (ReCAP). A large number of the older hand-written card files in the Card Catalog had, at some earlier time, been re-typed, likely as a preservation measure. Documentation describing when this decision was made, and the extent to which it was applied, has been lost.
