Lately I’ve been working on a slightly different library project than usual, and I’ve learned some important things about digital libraries, my job, and myself. (That sounds a bit like those revelations celebrities always make on the covers of women’s magazines. I always thought it strange that the celebrities learn everything in threes.)
A philosophy professor here wanted the library to acquire and digitize a copy of Histoire Naturelle de Mre. Francois Bacon, a seventeenth-century French translation of Bacon’s Sylva Sylvarum, published as far as I can tell just a few years after the English version. The library did manage to acquire a copy, which makes us one of the few libraries in the country to have one. I know you’re jealous. The finding and purchasing was relatively easy. I was pleasantly surprised that the book was under a thousand dollars considering its relative scarcity and decent condition. However, I wasn’t sure about the digitization because I hadn’t worked with the digital projects people before. Partly to get the project going, and partly to learn something new, I agreed to help out with the project if it was accepted into the queue. The project was swiftly approved, and within a couple of weeks of receipt the book was digitized (with excellent images).
I’m hesitant to admit my previous ignorance of what goes on behind the scenes of these digital projects, but I had little idea. I figured it was more than scanning pages and loading the images on the web, but that’s about it. I use these projects all the time, but hadn’t thought much of their creation, much like I’m happy to use the catalog but glad someone else does the cataloging. This may be the only project I’ve followed from selection to the very end, which I hope is near. I watched the digital photographer photographing some of the images. I watched the head rare books cataloger do some minor tweaking of the MARC record.
Then came my part, the METS record, which according to my favorite easily accessible encyclopedia–the Wikipedia–stands for “Metadata Encoding and Transmission Standard” and “is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium.” That sounds about like what the more knowledgeable programmers and catalogers told me, so I’ll stick with that. My task in this project was to help create the METS record, which among other things creates the points of entry into the digital document. You probably knew that already. I had no idea. I don’t know what I’ve been doing with myself the past few years. If you go to this project and click on the drop-down box marked SHOW, you’ll see one thing the METS record does.
With 628 images, I suspected the METS record might be a little tedious to create. I also knew that it was in XML, which I have no experience with. I foolishly thought there would be some sort of editor program to help with this. I don’t mean an XML editor, because I had one of those. I mean something more like Dreamweaver, which creates HTML and CSS and all that good stuff without having to hand-code it. I wanted a Dreamweaver for my METS record. Imagine my shock when I was told there wasn’t one, and that some people have been hand-coding parts of these projects. I’m always more or less happy I’m not a programmer or a cataloger, but this confirmed my feelings.
Fortunately, another of our catalogers has created a macro that translates an Excel spreadsheet into a proper XML METS record. Though I initially thought this project might be a good way to learn XML, I was counseled by wiser heads to use the spreadsheet method instead. Most of the work was me going through the images and creating points of entry on chapters or subchapters, labeling them, matching page and image numbers, etc. Very detailed work. I finished the spreadsheet, was told it looked pretty good, and am waiting to hear if it made its miraculous metamorphosis into a workable METS record.
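For the curious, here is a rough idea of what a spreadsheet-to-METS conversion involves. This is not the cataloger's macro, just a minimal Python sketch under my own assumptions: the spreadsheet is exported as CSV, the column names (`label`, `level`, `first_image`) and the sample rows are hypothetical, and the image IDs follow a made-up `img0001` pattern. It builds only the structural map (the part that holds the points of entry), not a complete METS record.

```python
import csv
import io
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Hypothetical spreadsheet export: one row per point of entry.
# "level" is the nesting depth, "first_image" the image where the entry starts.
SAMPLE_CSV = """label,level,first_image
Title page,1,1
Book 1,1,9
Chapter 1,2,15
"""


def build_struct_map(csv_text):
    """Turn rows of (label, level, first_image) into a METS structMap element."""
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "logical"})
    book = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "book"})

    # parents[n] holds the most recent div at depth n; the book itself is depth 0.
    parents = {0: book}
    rows = csv.DictReader(io.StringIO(csv_text))
    for order, row in enumerate(rows, start=1):
        level = int(row["level"])
        # Assumes levels never skip a depth (e.g. 1 straight to 3).
        div = ET.SubElement(
            parents[level - 1],
            f"{{{METS_NS}}}div",
            {"LABEL": row["label"], "ORDER": str(order)},
        )
        # Point the entry at its first image (assumed file ID pattern).
        image_id = f"img{int(row['first_image']):04d}"
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": image_id})
        parents[level] = div
    return struct_map


if __name__ == "__main__":
    print(ET.tostring(build_struct_map(SAMPLE_CSV), encoding="unicode"))
```

Each spreadsheet row becomes a nested `div` with a label and a pointer to an image, which is roughly what ends up behind that SHOW box. A real record would also need the file inventory and the descriptive metadata around it.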
What I learned about digital projects is how complicated they are and how much work goes into even the simplest one. Some of my colleagues have criticized the rate at which the digital projects have been moving, but after seeing how much had to be done, and how many people had to work together just to digitize one book properly, I understand why it’s not the same as scanning pages and loading them onto a website, as some people seem to think. The whole project has given me a better understanding of the behind-the-scenes work of a lot of my hardworking colleagues.
I can’t say I learned anything new about my job, but I needed a third insight to sound like a celebrity. Still, it was reinforced for me how dependent we all are on other people doing their work well. When I’m doing it, it seems to me that most of my work is independent. I don’t often work in teams, and I can do a lot of my job without interacting with other librarians or being in a particular location. In any given week, I’m much more likely to interact with a professor or student than with a colleague. I like the autonomy, but my autonomy depends on all the teams working behind the scenes making sure that when I click a button ordering a book, that book later shows up on a shelf, or even in a digital project.
What I learned about myself is that I’m still glad I’m not a cataloger or programmer, but now I have more concrete reasons. I’m just not cut out for that detailed work. I’m an end user and proud of it, but I’m even more thankful for the detail-oriented people in the background making all this stuff work for me.
Clay used to quote someone else in the digital library world (I can never remember who though) by saying “digital libraries are the Flintstones, not the Jetsons.” It does feel at times like we are working with stone tools. Slowly, though, things are getting better. Thanks for the interesting post!
Hi Wayne,
Great post. There’s so much behind-the-scenes work that goes on for digital library production above and beyond traditional web authoring that it’s like comparing apples and oranges. Or better yet, mountains to molehills. Your post helps illustrate that. Given the small staff size there working on digital collections, the problem is all that much greater.
Common logic for most non-techie librarians at any institution is that digital library production is somehow automated at an even higher level than even traditional cataloging workflows. Most seem to have visions of sci-fi movies and whiz bang programming going on behind the scenes, etc. That couldn’t be further from the truth. The tools available to most digital libraries amount to little more than glorified text editors on many occasions. My motto to those who have these pre-conceived notions is that digital libraries are The Flintstones, not The Jetsons (ibid with a nod to Kevin). We’re in the stone age in terms of having to create our own tools to do this properly. Often, custom workflow and processing tools have to be made for each project.
When the digital staff only has time to work on pushing out highly-demanded content, the ability to create tools to make this workflow simpler is greatly hindered. It’s great that some other folks are starting to take a bit of responsibility to help the digital collections team churn out projects, be it data entry, tool creation, etc.
Best to you and your family!
Clay
I meant to give credit where credit is due regarding that Flintstones quote. That can be attributed back to Michael Pelikan at Penn State, as heard at a recent(ish) DLF forum. — Clay
Wayne:
Thanks! It’s nice to see someone in another department investing some time and thought into appreciating all of the effort that goes into a digital collections resource — and helping us out with a METS file along the way.
I’m sure it’s obvious to you now why a Dreamweaver-like tool for METS doesn’t (yet) exist, but let me hold forth: Dreamweaver, on a good day, writes HTML and CSS that works when your browser looks at it. More than likely that code behind the scenes is poorly structured, semantically awkward, and may or may not validate against the schema it references. Structure and semantics, the first two of those problems, are precisely what a METS file is there to get right in a digital resource. If our goal is to represent and preserve the analog resource as accurately as possible, and taking into consideration the fact that the other resources that interpret the METS are much less forgiving than your average browser (which will interpret almost anything), there’s much less room for error. We’ll get there, I’m sure(!), but I hope when we do that our METS files don’t suffer as HTML has under Dreamweaver.
I really do appreciate your post — it’s heartening to hear that digital collections’ work isn’t going by unnoticed or misunderstood. Thanks,
–Jon