The Cairo Geniza is a collection of an estimated 750,000 manuscript pages found discarded for “burial” in the Geniza chamber of the Ben Ezra Synagogue in Cairo in the late 19th century. In addition to holding religious poems and fragments of Torah scrolls, the Cairo Geniza contains approximately 15,000 mundane papers that reflect the daily life of the Jewish community in Cairo during the medieval period (mainly in the 11th to 13th centuries) – letters, contracts, wills, and other legal documents preserved in the area’s arid climate. These “Geniza documents” range in size from a few words to long letters of 80-100 lines.
For more than two decades, Mark Cohen and his Princeton colleagues have been working to bring these medieval papers into the digital age. Their work, called the Princeton Geniza Project, has created the world’s only online, searchable-text database of the Cairo Geniza’s historical documents.
At the April 29 Lunch ‘n Learn seminar, Mark Cohen, Professor of Near Eastern Studies, discussed the background and challenges of the project. In 1986, Cohen and his colleague in the Near Eastern Studies department, Avrom Udovitch, proposed the creation of a computerized database of Geniza documents. IBM (through its Princeton Pegasus Project) and Princeton University’s Near Eastern Studies department supported the effort. Over the past 20 years, with help from technology upgrades and recent grants from the Friedberg Genizah Project and the University, the database has grown to include more than 4,000 documents (as much as a quarter of the historical Geniza), available online and searchable in Hebrew and Arabic script or by English keywords. The database, called TextGarden, contains transcriptions of Judeo-Arabic, Hebrew, and Arabic documents in XML format and stores not only the transcriptions themselves but also the images, genres, news stories, essays, locations, and people associated with these documents.
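As a rough illustration of what storing a transcription in XML alongside its genre, people, and locations might look like, here is a minimal Python sketch. It is not TextGarden's actual schema: the element and attribute names (`document`, `genre`, `language`, `person`, `place`, `transcription`), the sample record, and the `parse_record` helper are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the element and attribute names are illustrative only
# and are not taken from the real TextGarden database.
SAMPLE = """
<document id="example-1" genre="letter" language="Judeo-Arabic">
  <person>Sender (placeholder)</person>
  <person>Recipient (placeholder)</person>
  <place>Cairo</place>
  <transcription>placeholder transcription text</transcription>
</document>
"""

def parse_record(xml_text):
    """Turn one hypothetical XML record into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "genre": root.get("genre"),
        "language": root.get("language"),
        # A document may mention several people and places.
        "people": [p.text for p in root.findall("person")],
        "places": [p.text for p in root.findall("place")],
        "transcription": root.findtext("transcription", default=""),
    }

record = parse_record(SAMPLE)
print(record["genre"], record["people"], record["places"])
```

Keeping the descriptive fields next to the transcription in one record is what would let a single query filter by genre or place and then search the text itself.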
The project has transcribed documents from film copies, photocopies, draft texts typed by S. D. Goitein, and printed editions, creating a full-text retrieval database of transcribed documents. It has developed new tools such as dictionaries, semantic categories, and morphological aids to support the study of Geniza texts. The project disseminates its materials freely through the web to the international community of scholars who have an interest in the life of the medieval Middle East, as well as to all with an interest in Judaica. Ultimately, the project hopes to link to digitized images of the manuscripts in the corpus, as libraries pursue the imaging of their collections.
The fragments from the Cairo Geniza are dispersed among more than two dozen libraries worldwide. The three most important collections are in Cambridge; St. Petersburg, Russia; and New York. Princeton’s Near Eastern Studies Department has copies and microfilms of most of the historical fragments. The virtual environment of the Geniza provides obvious advantages: scholars gain much easier access to the manuscripts and have a refined index. But one advantage has been especially exciting: Cohen and other scholars have been able to reunite fragments of the manuscripts, restoring their integrity for future academic research.
The current edition of the Geniza database and the TextGarden web application that hosts it were developed in 2005 by Rafael Alvarado, then Manager of Humanities Computing Research Applications at Princeton (now Director of Academic Technology Services at Dickinson College). The new edition replaced and incorporated the original browser developed by Peter Batke in the late 1990s.
Ben Johnston of Princeton’s Humanities Resource Center, who has maintained the TextGarden database since 2006, spoke about its features. Use of Unicode on web pages permits the project to archive the documents and transcriptions on the same page, even when both Hebrew and Arabic appear in the same document. The database permits scholars to search for words and phrases and to explore the often complex interrelationships among the documents in the collection.
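To show why Unicode makes mixed Hebrew and Arabic text manageable, here is a minimal Python sketch. It is not TextGarden's code: the sample records and the helper names `script_of`, `scripts_in`, and `search` are hypothetical. Because every Hebrew and Arabic letter has its own code point, a single string can hold both scripts and be searched with ordinary substring matching.

```python
import unicodedata

def script_of(ch):
    """Return 'HEBREW' or 'ARABIC' based on the Unicode character name, else None."""
    name = unicodedata.name(ch, "")
    for script in ("HEBREW", "ARABIC"):
        if name.startswith(script):
            return script
    return None

def scripts_in(text):
    """Set of scripts (Hebrew and/or Arabic) that occur in a string."""
    return {s for s in map(script_of, text) if s}

def search(records, term):
    """Return the ids of records whose text contains the search term."""
    # Normalize both sides so equivalent encodings of the same text compare equal.
    term = unicodedata.normalize("NFC", term)
    return [
        doc_id
        for doc_id, text in records.items()
        if term in unicodedata.normalize("NFC", text)
    ]

HEBREW_WORD = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew letters shin, lamed, vav, final mem
ARABIC_WORD = "\u0633\u0644\u0627\u0645"  # Arabic letters seen, lam, alef, meem

# Hypothetical sample records, keyed by made-up document ids.
records = {
    "doc-1": HEBREW_WORD,
    "doc-2": ARABIC_WORD,
    "doc-3": HEBREW_WORD + " " + ARABIC_WORD,  # both scripts in one document
}

print(sorted(scripts_in(records["doc-3"])))
print(search(records, HEBREW_WORD))
```

A real search tool would likely add more, such as ignoring vowel marks or handling spelling variants, but the basic idea of one encoding covering both scripts is the same.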
A podcast and the presentation are available.