The Daily Princetonian is digitized and keyword searchable


The Princeton University Archives, working in conjunction with the Princeton University Library Digital Initiatives, has nearly completed a monumental project that will change the way researchers investigate University history. The student newspaper, The Daily Princetonian, has been digitized from its inception in 1876 through 2002. The site has been available in beta for almost two years, but all issues will be loaded as of June 30, 2012. At the suggestion of The Daily Princetonian alumni board, whose members have been among the prime backers of this project, the site is named in honor of the newspaper’s long-serving production manager, Larry DuPraz. Researchers are able to perform sophisticated keyword searches that can unlock the vast richness of the daily newspaper that documents so much of the University’s history. (For the years 2002–present, users may search online via the Daily Prince site.)


“I wrote my final paper for my Freshman Writing Seminar about how the presence of veterans on Princeton’s campus following World War II affected Princeton’s academic environment and social atmosphere,” said Jennifer Klingman ’13. “My research heavily relied on The Daily Princetonian archives, and I had to spend a lot of time and energy searching for relevant articles in Firestone’s microform versions of the newspaper. It was difficult to comb through the articles, and as a result my research was limited in scope. This spring, I wrote my history department junior paper on academic and social changes taking place at Princeton during the late 1940s and 1950s. The online Daily Princetonian archives proved to be invaluable. I was able to access the archives anywhere and at any time, and use the archives’ search function to find a number of extremely useful articles. My independent work has definitely benefited from the existence of the online archives.”


Freelance journalist W. Barksdale Maynard ’88 states, “I am able to write about the social history of Princeton in an entirely new way and have restructured my research to take full advantage of this exciting new resource. For my Princeton Alumni Weekly article on the early history of automobiles at Princeton, the DuPraz Digital Archives allowed me to identify every reference to cars as early as 1901, to pinpoint who owned them and what kinds. I would never have attempted this article without the DuPraz Digital Archives.”

Maynard’s PAW colleague, Gregg Lange ’70, regularly uses the site for his column, “Rally Round the Cannon,” which examines and appraises University history. “You can piece together the story of Princeton football or Woodrow Wilson in a dozen ways. But the unique accessibility of a daily publication allows more subtle topics to arise and recede, and for cross-generational tales to emerge. Be it Ella Fitzgerald singing at a Princeton dance at age 19, then receiving an honorary degree 54 years later; or student revolts against the clubs’ Bicker selection system in 1917 and 1940 presaging its loss of monopoly in 1968, the combination of detail and long view is indispensable in understanding the ethos of the institution over time, and essentially inaccessible without the DuPraz technology and precision. And existentially, if I never see another microfiche in my life I will die a happy man.”

Maynard added, “My regular column in PAW, ‘From Princeton’s Vault,’ has benefited enormously. Recently I was able to identify the earliest references to Princetonians as ‘tigers,’ which had been guesswork previously. It turns out we were wrong by a decade.”

This has been an international project. The newspapers are sent from Princeton to Brechin Imaging in Canada, where TIFF images are generated using high-end German cameras. The files are then sent on a hard drive to Cambodia, where Digital Divide Data analyzes the structure of each page and uses an optical character recognition (OCR) program to derive machine-readable text, which allows for keyword searching. The hard drive is then shipped to Austin, Texas, where the US office of New Zealand company DL Consulting loads the data into a content-management system called Veridian, which supports searching and browsing, online reading, article extraction and printing, and other features.

Within the Library, many hands have worked for this project’s success. At Mudd Library, project archivists Dan Brennan and then Adriane Hanson have overseen the day-to-day work of the project, managing the shipment of the newspapers to Brechin and supervising students during the quality-control phase. University Archivist Dan Linke raised the funds from various University and alumni sources and coordinated the project.

Within the greater Library system, Cliff Wulfman, the Library’s Digital Initiatives Coordinator, took the lead in writing the Request for Proposals and then selecting and coordinating the work with Digital Divide Data, as well as providing technical assistance, support and vision. The Library System Office’s Antonio Barrera designed the front-end web page, with Phil Menos providing server support, and Deputy University Librarian and Systems Librarian Marvin Bielawski allocated the funds to acquire the Veridian software.

The project employs the METS/ALTO markup standard, the same one used by the Library of Congress’s Newspaper Digitization Project, which means that as software changes and improves, we will be able to sustain this resource for many years to come.
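For readers curious about how OCR output in ALTO can support keyword searching, here is a minimal Python sketch. It is not the Veridian software's actual indexing method, and the sample XML is invented for illustration: the block IDs (TB1, TB2), the words, and the function names are all hypothetical. It relies only on the fact that ALTO stores each recognized word in the CONTENT attribute of a String element nested inside a TextBlock.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented ALTO fragment for illustration only; real files carry
# coordinates, styles, and many more elements.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Tigers"/><SP/>
            <String CONTENT="defeat"/><SP/>
            <String CONTENT="Yale"/>
          </TextLine>
        </TextBlock>
        <TextBlock ID="TB2">
          <TextLine>
            <String CONTENT="Students"/><SP/>
            <String CONTENT="cheer"/><SP/>
            <String CONTENT="the"/><SP/>
            <String CONTENT="tigers."/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so any ALTO version's tags match."""
    return tag.rsplit("}", 1)[-1]


def words_by_block(alto_xml):
    """Map each TextBlock ID to the list of OCR words it contains."""
    root = ET.fromstring(alto_xml)
    blocks = {}
    for elem in root.iter():
        if local_name(elem.tag) == "TextBlock":
            blocks[elem.get("ID")] = [
                s.get("CONTENT", "")
                for s in elem.iter()
                if local_name(s.tag) == "String"
            ]
    return blocks


def build_index(blocks):
    """Build a keyword -> set of block IDs index (lowercased, punctuation trimmed)."""
    index = defaultdict(set)
    for block_id, words in blocks.items():
        for word in words:
            key = word.strip(".,;:!?\"'()").lower()
            if key:
                index[key].add(block_id)
    return index


def search(index, keyword):
    """Return the sorted block IDs whose text contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


if __name__ == "__main__":
    index = build_index(words_by_block(SAMPLE_ALTO))
    print(search(index, "Tigers"))
```

Because the text lives in an open, documented XML format rather than inside one vendor's database, an index like this can be rebuilt with different software later, which is the sustainability point made above.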
