Why — and How — We Digitize

It’s February, and we’re now in the second month of our NHPRC-funded digitization project. In twenty-three more months, we’ll have completed scanning and uploading 400,000 pages of our most-viewed material to our finding aids, and anyone with an internet connection will be able to view it.

This is just the most recent effort to introduce digitization as a normal part of our practice at Mudd. As I said in my previous post, it’s well and good that we have collections that document the history of US diplomacy, economics, journalism and civil rights in the twentieth and twenty-first centuries. But for the majority of potential users, who may never be able to come to Princeton, NJ, this is irrelevant. However interested they may be, they may never be able to afford to visit us. And there’s a whole other subset of potential users (let’s call them working people) who can’t come between the hours of 9:00 and 4:45, Monday through Friday. Are we really providing fair and equitable access under these conditions? Since we have the resources to digitize, it’s imperative that we develop the infrastructure and political will to do so.

We know that it’s time to get serious, and smart, about scanning.

The ball has been rolling in this direction for some time. We have three “streams” for making digital content available, and with our new finding aids site, we have an intuitive way of linking descriptions of our materials to the materials themselves.

Images of the collection in the context of the finding aid

Our first is patron-driven digitization.

The Zeutschel: our amazing German powerhouse face-up scanner

This is our Zeutschel scanner. It does amazing work, is easy on our materials, and usually requires very little quality control.

Archives have been providing photoduplication services since the advent of the photocopier. At Mudd, we have dedicated staff who have been doing this work for decades. Recently, we’ve slightly tweaked our processes to create scans instead of paper copies and (in many cases) to re-use the scans we make so that they’re available to all patrons, not just the one requesting the scan.

A patron (maybe you!) finds something in our finding aids that he thinks he may be interested in, and asks for a copy.

If he’s in our reading room, he flags the pages of material he wants. If he’s remote, he identifies the folders or volumes to be scanned. The archivist tells him how much the scan will cost, and he pre-pays.

Now, the scanning. This either happens on our photocopier (the technician can press “scan” instead of “photocopy” to create a digital file instead of a paper one) or on our Zeutschel scanner. And while we feel happy and lucky to have the Zeutschel, we don’t strictly need it to fulfill our mission to digitize.

The scan is named in a way that associates it with the description of the material in the finding aid, and is then linked up and served online. We currently send the patron an email of this scan, but in the future we may just send them a link to the uploaded content.

Our second stream is targeted digitization based on users’ viewing patterns.

Our friendly student receptionist, Ashley, scans materials at the front desk when she isn't welcoming patrons.

We try to keep lots of good information about what our users find interesting. We use a service called Google Analytics to learn what users are browsing online, and we keep statistics about which physical materials patrons see in the reading room.

From these sources, we create a list of most-viewed materials, and set up a system for our students to scan them in their downtime when they’re working at the front desk.
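For readers curious about the mechanics, here’s a minimal sketch in Python of how two sets of view counts could be merged into a single scanning queue. It assumes, hypothetically, that both the online statistics and the reading-room statistics can be exported as counts keyed by the same folder identifier, that the two sources are weighted equally, and that anything already scanned is skipped. The identifiers are made up for illustration, and this isn’t necessarily how our actual list is built.

```python
from collections import Counter


def most_viewed(online_views, reading_room_views, already_digitized, top_n=10):
    """Rank folders by combined online and reading-room views.

    online_views, reading_room_views: dicts mapping a (hypothetical)
    folder identifier to a view count.
    already_digitized: set of identifiers to leave off the queue.
    Equal weighting of the two sources is an assumption.
    """
    combined = Counter(online_views)
    combined.update(reading_room_views)  # adds the counts key by key
    # most_common() sorts by count, highest first
    queue = [(folder, count) for folder, count in combined.most_common()
             if folder not in already_digitized]
    return queue[:top_n]


# Usage with made-up folder identifiers
online = {"coll_A_box1_f3": 120, "coll_B_box4_f1": 45}
room = {"coll_B_box4_f1": 90, "coll_C_box2_f7": 30}
digitized = {"coll_C_box2_f7"}
print(most_viewed(online, room, digitized))
# [('coll_B_box4_f1', 135), ('coll_A_box1_f3', 120)]
```

Weighting the two sources equally is the simplest choice; a reading-room request arguably signals stronger interest than a page view, so a real list might weight them differently.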

We do this because we want to make sure that we’re putting our effort into digitizing resources that patrons actually want to see. There are more than 35,000 linear feet of materials at the Mudd Library; we probably won’t ever be able to digitize absolutely everything, and it wouldn’t make sense to start from “A” and go to “Z.” So, we pay attention to trends and try to anticipate what researchers might find useful.

Our final stream, and the one for which we currently have to rely on external support, is large-scale vendor-supplied digitization.

Our current Cold War project is a great example of this. We’ve put together a project plan, chosen materials, called for quotes and chosen a vendor. We recently shipped our first collection to be digitized, and I’ll be posting information to the blog as we move forward.

Another good example of an externally-supported digitization activity is the scanning of microfilm from our American Civil Liberties Union Records. Our earliest records were microfilmed decades ago, and recently Professor Sam Walker supported the digitization of some of this microfilm so that it could be made available online.

No single stream (externally-supported projects, left-to-right scanning, or patron-driven digitization) would be enough to support our goal of maximizing the content available online. We hope that the three, each pursued aggressively, will help us realize our mission of providing equitable access to our materials. And we think that focusing on this Cold War project will help us reflect on and improve all of our digitization activities.

The Daily Princetonian is digitized and keyword searchable

The Princeton University Archives, working in conjunction with the Princeton University Library Digital Initiatives, has nearly completed a monumental project that will change the way researchers investigate University history. The student newspaper, The Daily Princetonian, has been digitized from its inception in 1876 through 2002. The site has been available in beta for almost two years, but all issues will be loaded as of June 30, 2012. At the suggestion of The Daily Princetonian alumni board, who have been among the prime backers of this project, the site is named in honor of the newspaper’s long-serving production manager, Larry Dupraz. Researchers are able to perform sophisticated keyword searches that can unlock the vast richness of a daily newspaper that documents so much of the University’s history. (For 2002 to the present, users may search online via the Daily Prince site.)

“I wrote my final paper for my Freshman Writing Seminar about how the presence of veterans on Princeton’s campus following World War II affected Princeton’s academic environment and social atmosphere,” said Jennifer Klingman ’13. “My research heavily relied on The Daily Princetonian archives, and I had to spend a lot of time and energy searching for relevant articles in Firestone’s microform versions of the newspaper. It was difficult to comb through the articles, and as a result my research was limited in scope. This spring, I wrote my history department junior paper on academic and social changes taking place at Princeton during the late 1940s and 1950s. The online Daily Princetonian archives proved to be invaluable. I was able to access the archives anywhere and at any time, and use the archives’ search function to find a number of extremely useful articles. My independent work has definitely benefited from the existence of the online archives.”

Freelance journalist W. Barksdale Maynard ’88 states, “I am able to write about the social history of Princeton in an entirely new way and have restructured my research to take full advantage of this exciting new resource. For my Princeton Alumni Weekly article on the early history of automobiles at Princeton, the Dupraz Digital Archives allowed me to identify every reference to cars as early as 1901, to pinpoint who owned them and what kinds. I would never have attempted this article without The Dupraz Digital Archives.”

Maynard’s PAW colleague, Gregg Lange ’70, regularly uses the site for his column, “Rally Round the Cannon,” which examines and appraises University history. “You can piece together the story of Princeton football or Woodrow Wilson in a dozen ways. But the unique accessibility of a daily publication allows more subtle topics to arise and recede, and for cross-generational tales to emerge. Be it Ella Fitzgerald singing at a Princeton dance at age 19, then receiving an honorary degree 54 years later; or student revolts against the clubs’ Bicker selection system in 1917 and 1940 presaging its loss of monopoly in 1968, the combination of detail and long view is indispensable in understanding the ethos of the institution over time, and essentially inaccessible without the Dupraz technology and precision. And existentially, if I never see another microfiche in my life I will die a happy man.”

Maynard added, “My regular column in PAW, ‘From Princeton’s Vault,’ has benefited enormously. Recently I was able to identify the earliest references to Princetonians as ‘tigers,’ which had been guesswork previously. It turns out we were wrong by a decade.”

This has been an international project, with the newspapers sent from Princeton to Brechin Imaging in Canada, where TIFF images are generated using high-end German cameras. The files are then sent on a hard drive to Cambodia, where Digital Divide Data analyzes the structure of each page and uses an optical character recognition (OCR) program to derive machine-readable text, which allows for keyword searching. The hard drive is then shipped to Austin, Texas, where the US office of New Zealand company DL Consulting loads the data into a content-management system called Veridian, which supports searching and browsing, online reading, article extraction and printing, and other features.

Within the library, many hands have worked toward this project’s success. At Mudd Library, project archivists Dan Brennan and then Adriane Hanson have overseen the day-to-day work of the project, managing the shipment of the newspapers to Brechin, as well as supervising students during the quality control phase. University Archivist Dan Linke raised the funds from various University and alumni sources and coordinated the project.

Within the greater Library system, Cliff Wulfman, the Library’s Digital Initiatives Coordinator, took the lead in writing the Request for Proposals and then selecting and coordinating the work with Digital Divide Data, as well as providing technical assistance, support and vision. The Library System Office’s Antonio Barrera designed the front-end web page, with Phil Menos providing server support, and Deputy University Librarian and Systems Librarian Marvin Bielawski allocated the funds to acquire the Veridian software.

The project employs the METS/ALTO markup standard, the same standard used by the Library of Congress’s Newspaper Digitization Project. Because the page images and OCR text are described in an open, widely adopted format rather than one tied to a single software package, we will be able to sustain this resource for many years to come as software changes and improves.
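To give a sense of what that markup holds, here’s a minimal Python sketch that pulls the OCR’d words out of an ALTO file (the part of METS/ALTO that records the text on each page) and runs a simple keyword match over them. The sample is a simplified, hand-written fragment; real ALTO files also carry position attributes such as HPOS and VPOS for every word. The function names are hypothetical, and this is not how Veridian itself indexes the newspaper.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree writes namespaced tags as "{uri}Name"; keep only "Name".
    # ALTO versions use different namespace URIs, so matching on the local
    # name avoids hard-coding one version (an assumption of this sketch).
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the OCR text of an ALTO document, one string per TextLine.

    Each String element's CONTENT attribute holds one OCR'd word;
    SP (space) elements are skipped and words are joined with spaces.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in elem
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def search(lines, keyword):
    """Return indices of lines containing keyword as a whole word (case-insensitive).

    Splits on whitespace only, so punctuation attached to a word is not stripped.
    """
    kw = keyword.lower()
    return [i for i, line in enumerate(lines) if kw in line.lower().split()]


# Simplified, hand-written ALTO fragment for illustration only
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Tigers"/><SP/><String CONTENT="win"/></TextLine>
    <TextLine><String CONTENT="Automobile"/><SP/><String CONTENT="club"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

lines = alto_lines(SAMPLE)
print(lines)                    # ['Tigers win', 'Automobile club']
print(search(lines, "tigers"))  # [0]
```

Keeping the text in a standard structure like this, rather than inside one vendor’s database, is what lets a different system read the same files later.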

Most used Princeton theses

Dear Mr. Mudd, I was wondering: what is the most popular (most requested) senior thesis in the University Archives collection?

This is a perennial question, and the short answer is that, with the exception of celebrity alumni theses, few theses are pulled with any regularity, yet the collection as a whole (totaling over 60,000 theses) is our most used collection within the University Archives. Last year, visitors to the Mudd Library (mostly Princeton undergraduates) viewed over 1,000 theses, about a quarter of all Archives materials circulated.

Wendy Kopp’s thesis is always among those requested by remote researchers (that is, those who do not visit the library in person), and whenever a Princetonian makes news or is on a hit show, their thesis is often requested.

In the past, this included Wentworth Miller III (when Prison Break was a hit), David Duchovny (for The X-Files) and Dean Cain (Lois & Clark: The New Adventures of Superman), as well as all three of the Princeton alumni now sitting on the Supreme Court: Samuel Alito, Elena Kagan, and Sonia Sotomayor.

The entire theses collection can be searched via this database, and Archives staff are working to make future senior theses available online to the Princeton community starting in 2013.

University Archives featured in Princeton Alumni Weekly

Every few weeks, the Princeton Alumni Weekly devotes a segment of the magazine, entitled “From the Vault,” to highlighting items from the Princeton University Archives.

The articles are researched and written by alumnus W. Barksdale Maynard ’88, who has been contributing the content to PAW for two years. Mr. Maynard has also written a few books, two focusing on Princeton, which you can see here. The concept of the articles originated with Editor Marilyn H. Marks *86, who has an interest in the University Archives, which are housed at the Seeley G. Mudd Manuscript Library. http://www.princeton.edu/mudd/

The most recent article focuses on a Princeton alumnus who was aboard the Titanic when it sank. http://paw.princeton.edu/issues/2012/04/04/pages/7288/

Recently, PAW photographer Riccardo Barros and Art Director Marianne Gaffney Nelson came to Mudd to photograph physical items included in the collections for upcoming issues of PAW. Here you can see a behind-the-scenes view of how those articles come to life.
Keep checking the next few issues of PAW to see these items explained!
For more about the University Archives, click here.

University Archives materials in new Art Museum exhibition

A new exhibition at the Princeton University Art Museum features items borrowed from the Princeton University Archives. Princeton and the Gothic Revival: 1870–1930 is a look into “Americans’ changing attitudes to the art, architecture, and style of the Middle Ages through the lens of Princeton University around the turn of the twentieth century” and opens to the public this Saturday, February 25, 2012.

Chapel exterior
Alexander Hoyle for Cram and Ferguson, architects

The exhibit includes 10 items loaned from the Princeton University Archives, including the signature image for the exhibition, a watercolor of the University Chapel (above). Other items include architectural drawings of Marquand Chapel, Holder Hall, Madison Hall and the South Court Tower, and some suggested additions from 1898 for the University library, which at that time was housed in Chancellor Green.

One piece needed intricate and delicate conservation work from University Paper Conservator Ted Stanley. A watercolor of the proposed exterior of the A. Page Brown, Class of 1877 Biological Laboratory had split in half. Stanley was able to restore the watercolor and the board it was mounted on to their original form, hiding the separation. We challenge you to find the seam!

This is the first time that any archives material has been loaned to and displayed at the Princeton University Art Museum. The exhibit will run from February 25 to June 24, 2012.

For more about Princeton and the Gothic Revival: 1870–1930 or the Princeton University Art Museum, visit their website.