For the past three years, I have worked on several digitization projects in two special collections libraries. The two libraries take vastly different approaches to digitization, especially of books. These differences of course reflect large differences in project funding and in workplace culture, but I believe they also express a difference in the philosophy and purpose of digitization. Beginning with these examples, I’d like to explore those differences more deeply and apply to them the framework Michele Cloonan lays out in her essay “W(h)ither Preservation?”
The first library I worked for wanted books scanned as whole objects: scans were of spreads of pages, blank pages were included, spines were scanned, and images were cropped to show a sizable margin around the edge of the book pages. I remember being trained on the book scanner, and my manager expressing her desire that future people who had never seen a book would be able to understand what books were like from these scans.
The library where I work now scans books according to the philosophy of many other libraries, in a style best known from Google Books: left and right pages are split into separate images, spines and blank pages are not scanned, and the scans are cropped close to the text on the page, without showing the edge of the book or much of the page’s margins.
“When is the object part of the information?” asked Michele Cloonan in her essay “W(h)ither Preservation?”, invoking the assumed divide between a book’s content and a book’s form. These two libraries’ different approaches reflect different answers to this question. Neither library displayed digital images of high enough quality for a viewer to deduce the type of paper used or the binding stitch of the book; both also depicted books as flat surfaces (not employing 3D scanning, obviously). What is information to the first library (the structure, wholeness, and look of a book) was not necessarily important enough information to be digitized for the second library.
Still, the choices made by most library staff in book scanning enable books to become text-searchable through Optical Character Recognition (OCR), which reads the printed text in a scanned book and embeds it as searchable text in the digital copy. OCR cannot currently process handwritten text, nor can it really distinguish between stray marks and what is supposed to be text; edges of books, stitching, and other high-contrast areas can confuse it, and thus they’re best cropped out when OCR is employed.
At the end of the day, in the case of most digital libraries scanning books, one ends up with a nice, neat, OCR’d PDF that bears little resemblance to the book that was scanned, but looks tidy and readable (on the high-quality monitor used for image editing). Information in this case means simply textual content and illustrations. There are an almost infinite number of ways in which a PDF is unlike a book and fails to reproduce all of the information a book holds.
Walter Benjamin explores the way that photography and film can “put the copy of the original in situations which would be out of reach for the original itself,” in his essay “The Work of Art in the Age of Mechanical Reproduction.” This is the intrinsic idea of book digitization, of course. The ability to create text-searchable documents out of standard books, and to send and share them in the blink of an eye, is totally beyond the capabilities of physical books, and that speed and search power are now the standard by which information is shared.
“By making many reproductions it substitutes a plurality of copies for a unique existence. And in permitting the reproduction to meet the beholder and listener in his own particular situation, it reactivates the object reproduced. These two processes lead to a tremendous shattering of tradition which is the obverse of the contemporary crisis in the renewal of mankind.” Benjamin’s observations on the analog photography of artworks are just as relevant to the digital photography of books: scanned books no longer look or act like unique physical books. They can be text-searchable, and can potentially exist everywhere at once, but they contain little reference to their former form (bound paper).
“Digital documents force us to preserve them on their own terms,” writes Cloonan, and we can perhaps begin to see digital preservation as something other than the sequestering of digital files for an imagined future where the original physical books don’t exist. I wonder, then, if we can begin to think of scanned books in a digital library as a kind of hybrid information form, one that transforms the content of analog books into a powerfully sharable and searchable digital form. It is perhaps wisest to operate under the assumption that digital files are just as perishable as physical books, if not more so, as Cloonan and others point out. It’s too much to assume that these digital files will still be usable and accessible by the time most books have perished, or that they will serve as the only source of those books’ implicit and explicit information, as my first manager imagined.
Cloonan, M. V. (2001). “W(h)ither preservation?” The Library Quarterly 71(2): 231–242.
Benjamin, W. (2005). “The work of art in the age of mechanical reproduction,” trans. Andy Blunden. https://www.marxists.org/reference/subject/philosophy/works/ge/benjamin.htm