Todd R. Hanneken
St. Mary’s University
This document is based on a paper presented July 3, 2019 at the 2019 International Meeting of the Society of Biblical Literature. The original conference paper was written May 21-26, 2019, and formatted and published online August 2, 2019.
Before digital technology most of us had access to manuscripts by way of photographic plates and critical editions. The few who had the privilege of handling the manuscript itself faced their own challenges. Digital technology opens new possibilities for extending and improving the role of manuscripts in our research and teaching. The impact of digitizing manuscripts is unmistakable, but digitizing is not an end in itself. We should think about why we digitize and what we expect a digital surrogate of a manuscript to be able to do. In turn, this leads to consideration of how to digitize and what to do with the digital surrogate. This essay discusses three priorities I consider most essential when digitizing manuscripts for the benefit of research and teaching in biblical literature. I draw from my experience directing the Jubilees Palimpsest Project, which faced challenges in these areas in the pursuit of the study of the only copy of Jubilees in Latin and the only copy anywhere of the Testament (or Assumption) of Moses. I started off with an interest in recovering more text from the manuscript, which dates from the fifth century and was abused to the point of illegibility since. As I went I encountered questions and potential for digitization of manuscripts to do much more than add to critical editions. The first priority is access. One of the core advantages of digital information is that it can be copied and transmitted with no loss and little cost over the Internet or other digital media. But we do not reap the benefit without thinking about standards for interoperability, permissions, discoverability, permanence, and ease of visualization interface. The second priority is comparability to first-hand experience. What can we say to someone who says there is no substitute for first-hand experience? What exactly do we do with a manuscript when we handle it in a reading room? 
Can digital technology provide something like that experience, or at least help us answer the same scholarly questions? The third consideration is the value of improving on first-hand experience. Even direct access does not guarantee answers to all our questions. Spectral imaging gives us superpowers—the ability to see what the natural human eye cannot. These priorities become more complex when one imagines that a scholar might want to study a manuscript in a way other than reading its text beginning to end.
Before examining the three priorities individually, it will be helpful to consider three examples that illustrate the range of possibilities of what it can mean to digitize a manuscript. Conversations about the benefits and pitfalls of the study of manuscripts by way of digital surrogates often fail to consider the variety of what it can mean to digitize. See most recently, L.W.C. van Lit, O.P., Among Digitized Manuscripts: Philology, Codicology, Paleography in a Digital World (Handbook of Oriental Studies 137; Leiden: Brill, 2020). A variety of views are expressed in Ancient Manuscripts in Digital Culture: Visualisation, Data Mining, Communication (D. Hamidović, et al., eds.; Digital Biblical Studies 3; Leiden: Brill, 2019). See especially, L. Lied, “Digitization and Manuscripts as Visual Objects: Reflections from a Media Studies Perspective,” and B. Landau, et al., “‘What No Eye Has Seen’: Using a Digital Microscope to Edit Papyrus Fragments of Early Christian Apocryphal Writings.” See also the September 2018 special issue of Archive Journal, “Digital Medieval Manuscript Cultures,” edited by M. Hanrahan and B. Whearty, and especially the articles by A.S.G. Edwards, “The Digital Archive, Scholarly Enquiry, and the Study of Medieval English Manuscripts,” and A. Prescott and L. Hughes, “Why Do We Digitize? The Case for Slow Digitization,” http://www.archivejournal.net/essays/digital-medieval-manuscript-cultures/. For the first example, consider Figures 1–6, all of which are digital images representing the same page of the Jubilees Palimpsest. Figure 1 is a one-bit image of Ceriani’s edition from 1861. A.M. Ceriani, Fragmenta latina evangelii S. Lucae, parvae genesis et assumptionis Mosis, Baruch, threni et epistola Jeremiae versionis syriacae Pauli telensis cum notis et initio prolegomenon in integram ejusdem versionis editionem (Monumenta Sacra et Profana ex Codicibus praesertim Bibliothecae Ambrosianae 1.1; Milan: Typis et impensis Bibliothecae Ambrosianae, 1861). 
It may be the most legible of all, but gives the least sense of the manuscript as anything other than a text container. Even as a transcription, it is incomplete and questionable. It is highly accessible, thanks to Google Books. My first look at the manuscript was on microfilm, shown in Fig. 2 as an eight-bit digital image. I had access to it only by special request to my dissertation director, James VanderKam. It captures a moment in the conservation history of the manuscript, but is useless for reading the erased text. I saw the manuscript in real life for the first time in 2011, again thanks to inside connections. The Biblioteca Ambrosiana was willing to digitize one bifolio, shown here as Fig. 3. It would look good if not juxtaposed to Fig. 4, which was captured in 2017 with high-precision spectral imaging equipment calibrated for color accuracy. Just because something is digitized in color does not mean the color is accurate. Figure 5 is captured with raking light to show the texture of the folio. Texture sometimes helps us recover the writing, but more importantly comes closest (among still images) to representing what it is like to handle the folio. Figure 6 departs from efforts to accurately represent the artifact in favor of enhancing the text for legibility. Using data captured using light range and resolution beyond human capability, the processed image shows more of the erased text than any of the others. All six images represent the same page, and all are digital. But if we are going to talk about digitizing manuscripts and working with digital manuscripts, we need to be aware that “digital” encompasses a range of possibilities with different advantages and limitations. Any one of them may be better than nothing, and no one of them conveys all that a scholar might want to investigate.
A second example illustrates the implications of spatial resolution, color resolution, and non-natural color enhancements. The most familiar distinction is spatial resolution, which determines whether an image looks pixelated or blurry, as in Fig. 7. Color resolution begins at binary black and white (Fig. 8), which may be adequate for printed material. A single channel of color, monochrome, is a big improvement (Fig. 9). A second channel of color is a further improvement, but still what we would call “color blind” in humans (Fig. 10). A third channel of color defines “normal” for most of us, especially if it is calibrated for accuracy (Fig. 11). However, normal human vision is quite poor compared to what can be seen by other species and by spectral imaging. Figure 12 shows an enhancement that squeezes the multispectral color range and resolution into what can be seen by the human eye. Ultraviolet appears as blue and infrared appears as red (note the black microfiber background, which does not reflect visible light but does reflect infrared light). Finally, Figure 13 shows a Pseudocolor enhancement, which does not attempt to resemble accurate color but rather highlights, in contrasts visible to the human eye, the contrasts that processing can identify in multispectral data sets.
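The degrees of color resolution described above can be illustrated with a minimal sketch. The pixel values, the averaging formula, the threshold, and the choice of which channels to merge are all assumptions made for illustration; they are not taken from the figures.

```python
def to_monochrome(rgb):
    """Collapse an (R, G, B) pixel to one 8-bit channel using a plain average."""
    r, g, b = rgb
    return (r + g + b) // 3


def to_binary(rgb, threshold=128):
    """Reduce a pixel to a single bit: 1 for light, 0 for dark."""
    return 1 if to_monochrome(rgb) >= threshold else 0


def to_two_channel(rgb):
    """Keep two channels by merging red and green (a hypothetical choice)."""
    r, g, b = rgb
    return ((r + g) // 2, b)


# Hypothetical pixel values for brownish parchment and brownish erased ink.
parchment = (181, 150, 110)
erased_ink = (160, 131, 95)

for name, pixel in (("parchment", parchment), ("erased ink", erased_ink)):
    print(name, to_binary(pixel), to_monochrome(pixel), to_two_channel(pixel), pixel)
```

With these sample values both pixels collapse to the same single bit, so the distinction between parchment and erased ink disappears entirely at binary resolution and reappears only as channels are added.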
A third example illustrates the limitations and possibilities of a variety of methods of digitization for purposes of reading erased text from a palimpsest. If the goal is to read letters on a page, the metric is fairly straightforward. The first four degrees of digitization are clearly inadequate for textual recovery. Figure 14 (binary color resolution) and Fig. 16 (two-channel color resolution) would not be seriously proposed as formats for digitization of manuscripts, but they do illustrate that color resolution can have multiple degrees. However, many would consider a color image (Fig. 17) to be the gold standard, or at least normal when speaking of what it means to digitize a manuscript. Notably, by the metric of readability it is no better than the monochrome image (Fig. 15) that some would associate with old-fashioned media such as microfilm or plates in critical editions. Legibility of the text does not become possible until a combination of multispectral and texture imaging is applied. Figures 18 and 20 use Extended Spectrum processing to squeeze in contrasts outside the human visible color spectrum. Figure 21 uses Pseudocolor to highlight in sharp contrast the greatest contrasts computer algorithms can detect in the massive multispectral data set. Figures 19–21 use raking light to show the texture. Texture aids legibility when ink slightly corrodes the surface of parchment and leaves its impact even after the ink itself is removed.
Besides illustrating the range of possibilities of what it can mean to digitize a manuscript, these introductory examples preview the three priorities to be discussed in the following sections. The Jubilees Palimpsest, prior to the Jubilees Palimpsest Project, was not accessible to anyone without a travel budget and academic connections. Capturing digital information and sharing it on the Internet are only the first steps toward making it accessible. The difference between the diffuse and raking images anticipates the concern of approximating first-hand experience. As someone who has handled the Jubilees Palimpsest in real life, I would not hesitate to identify Fig. 5 (the raking image) as the truest to first-hand experience of the manuscript. Figure 4, though impeccably accurate in color, seems cartoonish in obscuring the feel of the manuscript, no less so than Fig. 3 (the conventional color image) seems cartoonish in obscuring the color tones. Finally, the benefits of Extended Spectrum and Pseudocolor anticipate the concern that sometimes realistic is not good enough. It can be beneficial to utilize multispectral capture and processing to surpass the natural limitations of the human eye.
The first priority is access. We want scholars and students to be able to investigate manuscripts even without special connections and privilege. Digital technology makes that possible, but not automatic. We can also learn from the mistakes of past digital projects. Many a website, once impressive, will die alone in its own isolated silo, its data and formats never to be carried over into next-generation systems. There are a couple of points here, and they all revolve around the theme that we can achieve sustainable access through standards for interoperability. Consider that information is lost in human communication if language, genres, and conventions are not held in common. What is new here is the importance of machine readability. If we follow conventions that allow machines to aggregate and index the information that will be important to scholars, most of the other considerations of access will fall into place.
One issue of access is permission. Debates about open access often focus on free or not free. Especially in the past, publishers feared that free access would devalue traditional business models. Protectors of cultural heritage feared that third-party commercial exploitation could devalue their holdings. Overall, institutions are increasingly aware that access to digital surrogates creates value. Having seen pictures of the Mona Lisa or ceiling of the Sistine Chapel does not make people less interested in seeing the originals. A book or journal that only a few can find, read, and cite does not gain value by virtue of obscurity. Considerations of open access should go beyond cost. Requirements to create accounts and agree to tracking or convoluted privacy statements may deter casual discovery, impede interoperability (such as linking directly to resources without going through a designated portal), and raise concerns of principle in the case of publicly funded projects. No less important is the consideration of how standard a license is. A custom license, no matter how generous in content, creates a barrier to human and machine access. A human may get lost in legalese. A computer has no hope of knowing whether data and metadata can be aggregated into a resource unless a standard license is included in the appropriate metadata field. For example, Creative Commons defines a set of standard licenses, not all of which are particularly generous in detailing what a user may do with the content, but they are all standard and recognizable to anyone who works with permissions on a regular basis. When the uniform resource identifier (URI) of that license is included in the metadata of a webpage or IIIF Presentation manifest, for example, machine as well as human aggregators can make the information accessible to users in new ways. 
Search engines such as Google will index webpages unless asked not to (for example with a “robots.txt” file), while libraries and many databases will do the opposite. Information may not be included in a database or index without explicit permission, preferably following a machine-readable standard.
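To make the point about machine-readable licenses concrete, the following is a minimal sketch of how a standard license URI might appear in an IIIF Presentation manifest and how an aggregator might check it. The manifest identifier, label, and helper names are hypothetical, and the fragment omits the canvases that a complete manifest would list under `items`.

```python
import json

# Creative Commons Attribution 4.0, written in the URI form used in the
# IIIF Presentation API 3.0 examples.
CC_BY = "http://creativecommons.org/licenses/by/4.0/"


def minimal_manifest(manifest_id, label, rights_uri):
    """Build a skeletal manifest; the canvases are left out of this sketch."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label]},
        "rights": rights_uri,  # the machine-readable license statement
        "items": [],
    }


def may_aggregate(manifest, accepted_licenses):
    """An aggregator can decide only if the license URI is one it recognizes."""
    return manifest.get("rights") in accepted_licenses


# Hypothetical identifier and label, for illustration only.
manifest = minimal_manifest(
    "https://example.org/iiif/manuscript/manifest.json",
    "Example palimpsest",
    CC_BY,
)
print(json.dumps(manifest, indent=2))
print(may_aggregate(manifest, {CC_BY}))
```

A manifest with a custom license, or with no `rights` value at all, would fail the same check, which is the barrier to machine access described above.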
If present and future aggregators and search indexes have permission to organize information about manuscripts, then discoverability often falls into place. Discoverability is the fundamental question: does anyone know it exists? Can scholars find it? Some searches and filters are easy. Google works well if you know you are looking for Latin Jubilees. The Leuven Database of Ancient Books works well if you know you are looking for palimpsests at the Biblioteca Ambrosiana. A reasonable but harder search is for fifth-century manuscripts written in two columns. What if your goal is not to read the manuscript beginning to end but to trace development of scribal practices across centuries and continents? Open standards and linked data will allow information to be used outside of the originating silo to answer questions different from those of the original designers.
Another important part of access is permanence. Will it be there when you go back to look for it again? If you copy the address bar from the browser, will that take you back to what you were looking at? Will any of it make sense to a human who might want to make a modification? In the case of images of manuscripts, the most important standard is the International Image Interoperability Framework (IIIF) Image API. This lets us store the image once on a public repository, and refer to and access any portion and scale we might want. If I want to collect examples of a particular scribal practice, such as ligatures or nomina sacra, I can store the IIIF Image URL rather than downloading, modifying, and reposting the image. The coordinates are understandable by a human with familiarity with the standard and can be easily modified to get a broader or higher-resolution image. More importantly, we always know where the image came from. Similarly, the IIIF Presentation API describes codices and other large collections of images in meaningful and standard ways. For more on the International Image Interoperability Framework (IIIF) and how it can be useful to scholarship and teaching in biblical studies, see T.R. Hanneken, “International Image Interoperability Framework (IIIF),” Pre-Publisher Research for Brill’s Textual History of the Bible. Accessed January 7, 2020.
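As an illustration of how readable and modifiable an IIIF Image URL is, here is a minimal sketch that assembles one from its parts following the Image API pattern of identifier, region, size, rotation, quality, and format. The server address, the identifier, and the pixel coordinates are hypothetical, and the `max` size keyword assumes a server that supports a recent version of the API.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Assemble {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    if region != "full":
        x, y, w, h = region  # pixel coordinates on the full image
        region = f"{x},{y},{w},{h}"
    # Identifiers must be URL-encoded so that slashes do not break the path.
    return f"{base}/{quote(identifier, safe='')}/{region}/{size}/{rotation}/{quality}.{fmt}"


BASE = "https://example.org/iiif"   # hypothetical image server
FOLIO = "folio-001r"                # hypothetical image identifier

# A small detail, such as a single ligature.
detail = iiif_image_url(BASE, FOLIO, region=(1200, 800, 400, 300))
# The same spot with more surrounding context: only the numbers change.
context = iiif_image_url(BASE, FOLIO, region=(1000, 700, 800, 500))

print(detail)
print(context)
```

Because the region is expressed as plain coordinates in the address itself, a collection of such URLs records both what was excerpted and where it came from, with nothing downloaded or reposted.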
Another concern is not just whether the information will be there when we go back for it, but whether it will be there when our great-grandchildren go back for it. In some fields permanence can mean only across browser sessions. In biblical studies, working with manuscripts that have been around for millennia, we raise harder questions. Will the information be there for future generations? Longevity is partly a function of commitments from stable institutions, such as major universities. If we look further out, stability is not enough. Information survives if it is copied. It is copied if it continues to be useful, and can be copied. The information in a proprietary format, such as a WordPerfect document, has the best chance of survival if it is preserved in a format such as PDF that is open, standard, and can be easily upgraded to whatever replaces it. The current standard does not need to last forever as long as it can be understood and converted by future aggregators and archives.
The final consideration for access is the ease of use of the visualization interface. Users would rank this highly. However, if the standards for interoperability are done right this is the least of the concerns. No one viewer needs to be a permanent solution, and no one viewer needs to have the perfect balance of power and ease that is right for everyone. Once images and collections are defined with IIIF standards, users can choose between several viewers according to personal preference. Mirador (Fig. 22) stands out as being most actively developed, so there is a greater chance that users will already be familiar with it, and a greater chance that power and ease of use will increase in the future. But if a better viewer comes along later, nothing will have to change about how the information is stored.
The second priority for digitally enabled research and teaching is that digital surrogates of manuscripts should be able to help us answer the questions we could answer with first-hand experience. When people say there is no substitute for first-hand experience, they don’t mean that the text is more legible, certainly not more than a critical edition. My point is not to argue that a digital experience will ever be the same as an unmediated experience. It may be better in some ways, and worse in others, but always different. We can examine the differences and consider how many of them can be addressed. An ineffable value of unmediated experience will keep museums and libraries in business. It may be helpful to think of the analogy of musical performance. No one will equate having seen a musician perform live with having heard a very good recording of that performance. If an essential part of a concert is the shared experience with friends and strangers, could social media play a role? If the visuals are an essential part, could video play a role? If the freedom to look around is an essential part, could virtual reality play a role? Concert goers (and concert venues) have found value in holographic projections of deceased performers. Any such supplements or enhancements do not devalue the live performance and the ineffable value of the experience thereof. The question is not whether concert venues (or cultural heritage institutions) can survive digital access and enhancement. The question is whether they can survive without it. To imagine the optimal digital surrogate, we can focus on the scholarly questions that can be answered with first-hand experience and work from there.
One way to think about this is to picture what we do in a reading room. There is a lot of movement involved. We step back to grasp the overall structure, but then we move closer to examine detail. This means the spatial resolution has to be high enough to allow zoom without pixelation. We also move the object, or our heads, or the light. This is partly our way of correcting for abnormalities in the light that create artificial hues or other false artifacts. By holding a page up to the light we can see holes and scoring lines made to guide the scribe. We can see thin spots and even holes where the ink corroded the parchment. From movement we perceive specularity, how shiny the surface is. Most of all we perceive texture and depth. We track the distorted shape of a letter over distortions in the parchment. We note the hair and flesh sides of the parchment, especially when codicological reconstruction is necessary. We try to read dry-point notation, the hidden notes scribes and readers made to themselves that could only ever be seen from texture.
To these examples from parchment manuscripts, many more could be added when studying objects for which texture was the primary conveyor of meaning. Inscriptions, coins, and cuneiform are major examples from writing, and brush strokes and other techniques can be important in arts and crafts. A diffuse light photograph of an inscription is completely illegible. A good photographer can find the right angle of illumination to make the intended feature visible. If no one angle of light makes all the features visible, or if the photographer cannot anticipate every feature a scholar would want to examine, it becomes important to photograph multiple angles of illumination. The next step is to capture a complete set of angles of illumination. With all this information the data can be processed to record the texture and specularity of each pixel as a function of the light position. This method, called Reflectance Transformation Imaging, can extrapolate light positions and enhance the specularity of the surface.
More importantly, the image becomes interactive, as the user, even through a web browser, can move a virtual light around the object. This brings us back to our image of a scholar examining a manuscript in a reading room and the importance of movement. No one still image is adequate, and even a sum of still images falls short of the experience of motion and interactivity in real time. With interactive texture, we can get a feel for the condition of the folio and answer questions as basic as distinguishing a trace of ink level with the surface from a hole below it or an accretion above it. For a student who might not have had the privilege of first-hand experience, it can be moving as well as informative.
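The idea of moving a virtual light can be sketched with the biquadratic model used by Polynomial Texture Maps, one common way of storing Reflectance Transformation Imaging data. Whether a given project stores its data this way is an assumption here, and the per-pixel coefficients below are invented for illustration rather than fitted from real captures.

```python
def relight(coeffs, lu, lv):
    """Evaluate one pixel's brightness for a light direction (lu, lv).

    coeffs holds the six per-pixel terms of the biquadratic model:
    a0*lu^2 + a1*lv^2 + a2*lu*lv + a3*lu + a4*lv + a5
    where (lu, lv) is the light direction projected onto the image plane.
    """
    a0, a1, a2, a3, a4, a5 = coeffs
    return a0 * lu * lu + a1 * lv * lv + a2 * lu * lv + a3 * lu + a4 * lv + a5


# Hypothetical coefficients: a flat spot and a spot sloping toward +u.
flat_pixel = (0.0, 0.0, 0.0, 0.0, 0.0, 0.6)
sloped_pixel = (-0.2, -0.2, 0.0, 0.5, 0.0, 0.5)

# Sweep the virtual light from left to right, as a viewer would on drag.
for lu in (-0.7, 0.0, 0.7):
    print(lu, round(relight(flat_pixel, lu, 0.0), 3),
          round(relight(sloped_pixel, lu, 0.0), 3))
```

The flat pixel stays the same brightness wherever the light goes, while the sloped pixel brightens and darkens as the light crosses it, which is the cue that lets a viewer perceive relief on the surface.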
The third priority picks up where the second priority leaves off. Sometimes first-hand experience is not enough to tell us everything we might like to know. Our memories are notoriously unreliable when we try to play back in our minds what we saw. Memory is also insufficient for persuading others. Even in the moment, our ability to see what we’re looking at is remarkably limited. Our low color resolution means that browns can be hard to distinguish. This is particularly a problem when we look at brownish parchment with brownish erased ink and brownish secondary ink and brownish reagent. There is much more to light and color than what humans can see naturally. Mantis shrimp, for example, have a much greater range and resolution of color receptivity. Note that the existence of color receptors may not simply equate to “perception” as humans understand it. See B.R. Cronin and N.J. Marshall, “A Retina with at least Ten Spectral Types of Photoreceptors in Mantis Shrimp.” Nature 339 (1989) pp. 137–40 doi: 10.1038/339137a0; H.H. Thoen et al., “A Different Form of Color Vision in Mantis Shrimp.” Science 343 (2014) pp. 411–13 doi: 10.1126/science.1245824; Z. Qasim et al., “Evolution of Neural Computations: Mantis Shrimp and Human Color Decoding.” i-Perception 5 (2014) pp. 492–96 doi: 10.1068/i0662sas. Every color we see is a combination from three color receptors in the eye. When we look at a rainbow we see seven bands. These bands are to color resolution what pixelation is to spatial resolution. With more receptors we would see a smooth gradient from violet to red. We call people color blind if they have only two kinds of color receptors. Shrimp could call us color blind. More usefully, a multispectral imaging system could call a shrimp color blind, resolving fourteen discrete wavelengths.
Besides color resolution, our vision is also limited in color range. We do not see wavelengths shorter than violet, called ultraviolet, or longer than red, called infrared. Infrared is especially good at distinguishing organic and inorganic pigments. Ultraviolet is also useful because, in addition to reflecting off parchment, it causes materials to fluoresce. Fluorescence is commonly experienced as the glow of shoelaces in what appears to be a dark room with a blacklight. We have known about the usefulness of infrared and ultraviolet for studying manuscripts since long before digital imaging. A film camera can be modified to capture a monochrome photo showing infrared as white. A blacklight in a reading room can allow the eye to see the fluorescence, though not the reflected ultraviolet.
What is new with digital multispectral imaging is that for each pixel in each capture the light coming from that spot is given a numerical value. Digital multispectral imaging was pioneered most substantially by the Archimedes Palimpsest Project. R.L. Easton and W. Noel, “Infinite Possibilities: Ten Years of Study of the Archimedes Palimpsest.” Proceedings of the American Philosophical Society 154 (2010) pp. 50–76. As long as the object and camera remain motionless, many captures can be acquired and the pixel will represent the same spot on the object. Multispectral imaging today might capture fifty images under different conditions. We measure how much light passes through the object at four different wavelengths. We measure how much visible light reflects at ten wavelengths. We measure how much invisible light (ultraviolet and infrared) reflects at six wavelengths. We measure how much light fluoresces in each of six color ranges, stimulated by five wavelengths of ultraviolet. Multispectral imaging doesn’t just take a nice picture, it captures a huge amount of digital data.
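The capture counts above add up to the fifty images mentioned, on the assumption that each of the six fluorescence color ranges is recorded under each of the five ultraviolet wavelengths. The short sketch below checks that arithmetic and estimates the raw data volume; the sensor size and bit depth are hypothetical values chosen only to illustrate the scale.

```python
# Capture counts as described in the text.
transmissive = 4           # light passing through the object
reflective_visible = 10    # visible wavelengths
reflective_invisible = 6   # ultraviolet and infrared wavelengths
# Assumption: six color ranges recorded under each of five UV wavelengths.
fluorescence = 6 * 5

total_captures = (transmissive + reflective_visible
                  + reflective_invisible + fluorescence)


def bytes_per_page(captures, pixels, bytes_per_sample):
    """Raw size of one page: one numerical value per pixel per capture."""
    return captures * pixels * bytes_per_sample


# Hypothetical sensor: 50 megapixels at 16 bits (2 bytes) per sample.
raw_bytes = bytes_per_page(total_captures, 50_000_000, 2)

print(total_captures)
print(raw_bytes / 1e9, "GB per page side")
```

Even with these modest assumed figures, a single side of a single folio amounts to gigabytes of measurements before any processing begins.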
Sometimes one of the captures is useful, the way an old film infrared photo could be useful. More often it takes additional processing to find meaningful contrasts in that huge collection of data. Fortunately, this is what computers are good at. For each pixel they can look at all the numbers captured under fifty or more conditions and clearly distinguish the brown with one spectral fingerprint from the brown with another spectral fingerprint. The images produced are not intended to represent true color, but to render contrasts invisible to the human eye in contrasts that are visible to the human eye. For a complex folio there may be more contrasts to show than can be shown at once. We should not expect multispectral imaging to produce one definitive image that will show everything one might wish to see. This is especially true if we think of manuscripts as more than just text containers. For this reason it is important for viewers to be able to flicker or fade between many color renderings.
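A minimal sketch can show what it means for two browns to share a visible color yet differ in spectral fingerprint. The reflectance values and band layout are invented for illustration, and the plain distance measure stands in for the statistical methods, such as principal component analysis, that real processing would more likely use.

```python
import math


def distance(a, b):
    """Euclidean distance between two spectral fingerprints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_contrasting_band(a, b):
    """Index of the capture in which the two materials differ most."""
    differences = [abs(x - y) for x, y in zip(a, b)]
    return differences.index(max(differences))


# Hypothetical reflectance values (0 to 1) per capture, ordered as
# three visible bands followed by two infrared bands.
erased_ink = [0.42, 0.35, 0.25, 0.30, 0.28]
later_ink = [0.42, 0.35, 0.25, 0.70, 0.75]

visible_only = distance(erased_ink[:3], later_ink[:3])
all_bands = distance(erased_ink, later_ink)

print(visible_only)                               # identical to the eye
print(round(all_bands, 3))                        # clearly separable
print(most_contrasting_band(erased_ink, later_ink))
```

An enhanced rendering then maps the captures where the difference lies onto colors the eye can see, which is why the result shows contrast without claiming to show true color.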
These three priorities are all important to consider when thinking about what it means to digitize manuscripts in a way that will satisfy the needs of scholars today and well into the future. Multispectral data—beyond simple pictures of what the human eye can see—are necessary for erased or damaged manuscripts, or to study non-textual features of scribal culture. Texture and interactivity are necessary to provide conservators, researchers, and students with a sense of the physicality of the artifact as more than a text container. The images will not have the appropriate impact on scholarship if they cannot be accessed, discovered, and studied in a helpful viewer, both today and well into the future.
If we were to consider objects other than manuscripts, we would also have to include structure. A folio can be reasonably reduced to recto and verso with texture but not deep dimensionality on each side. For objects with many sides, or not simply reducible to sides, technologies such as laser-scanning and photogrammetry can capture truly three-dimensional structure. Full 3D imaging includes both boundary structure of an object and the texture and specularity of each surface on the structure. Thus RTI and laser-scanning are not competing technologies, but complementary aspects of optimal three-dimensional visualization. While work remains to be done on integrating texture imaging with structure modeling, the three priorities discussed above are all fully compatible and integrated today. For the most promising work on the full integration of 3D structure models and surface texture captured from RTI, see B. Endres, Digitizing Medieval Manuscripts: The St. Chad Gospels, Materiality, Recoveries, and Representation in 2D and 3D (Medieval Media Cultures. Leeds: Arc Humanities Press, 2019), and follow @BillEndres on Twitter. With support from the National Endowment for the Humanities, my own project developed and made public tools for combining Reflectance Transformation Imaging and multispectral imaging. The software also outputs formats conducive to publication using IIIF standards for interoperability and WebRTI for interactivity in any web browser. T.R. Hanneken, “The Jubilees Palimpsest Project: Pioneering the Recovery of Illegible Text from Ancient Manuscripts Through New Tools in Digital Archaeology,” The Jubilees Palimpsest Project, Accessed January 7, 2020, http://jubilees.stmarytx.edu/; T.R. Hanneken, “Guide to Creating Spectral RTI Images,” The Jubilees Palimpsest Project, Accessed January 7, 2020, http://jubilees.stmarytx.edu/spectralrtiguide/; T.R. Hanneken and B. 
Haberberger, “SpectralRTI_Toolkit,” GitHub, Accessed October 26, 2019, https://github.com/thanneken/SpectralRTI_Toolkit/.
One last consideration is cost. To date, mostly known high-value and mysterious objects have been imaged with the full set of multispectral and texture imaging. The only real solution to the problem of cost, in my opinion, is to increase efficiency and economy of scale. Once spectral imaging moves from a niche to the standard for conservation-quality imaging, the cost will decrease dramatically. One might think that many objects do not need texture imaging or do not need multispectral imaging, but that list shrinks if we imagine that other researchers may bring different questions to the artifact, particularly questions other than reading the main text. In many cases we don’t know what we can’t see until we try advanced imaging. Dry-point notation and erased text have appeared where not expected. One might say that anything is better than nothing, but it could do harm if scholarship is done using a digital surrogate that fails to provide the necessary information. Work is also being done on inexpensive options that may not match the quality of a high-end system, but could be used to test many objects to determine candidates for further imaging.
Manuscripts and the literature they preserve are only one aspect of a complete study of the ancient world. I do not believe, however, that the literary record and the study of scribal cultures have been exhausted by the critical editions and digitized collections available today. The study of the most primary of primary sources in biblical literature, the manuscripts themselves, is very different today than twenty years ago. We can expect, or at least hope, that it will be even more different twenty years from now.