Karen Hanson is Senior Research Developer for Portico.


The last decade or so has seen the emergence of a new kind of scholarly work: the enhanced digital monograph. While still recognizable as monographs, these resources include a variety of dynamic features that cannot be replicated in print. These works represent a leap forward for scholarship, but their formats, use of dynamic features, and composite nature present complex preservation challenges.

To help address these challenges, a new collaborative project funded by the Andrew W. Mellon Foundation brings together preservation institutions, libraries, and university presses that are producing enhanced monographs. The goal is to examine which aspects of these works can be preserved at scale, and to produce guidelines that publishers and authors can use to make these works more preservable as they create them.

The story of digital monographs (it’s a real page turner)

It’s fair to say that scholarly monographs have made the shift to digital, in that anything published today will likely have a digital distribution option. Though e-book reader devices add features that the paper copy cannot offer, the digital version is still typically a proxy for the print copy. It is composed of text and images that you can navigate linearly by flipping through the pages.

As the e-book format has matured, there have been efforts to create works that are not mere print proxies. These are instead more interactive, take advantage of the technology available, and engage with readers in different ways. This is best explained through some examples:

From these you might note that while most are recognizable as e-books, the digital medium provides options for a variety of advanced features that could not be reproduced in a print version.

The project

Many of the examples above were developed with support from the Andrew W. Mellon Foundation as part of its digital monograph initiative [1]. This initiative funded a variety of efforts to build a next generation of innovative digital monographs, as well as the infrastructure to support their creation and management. Preservation was identified as a necessary part of that infrastructure. This led the foundation to fund a new project focusing on the preservation of emerging forms of scholarship.  

The new grant, entitled Enhancing Services to Preserve New Forms of Scholarship, was awarded to NYU Libraries in April 2019 [2]. The project connects preservation service institutions with libraries and university presses that are producing enhanced monographs. The presses in the examples listed above are all participants in the project.

With NYU Libraries coordinating the effort, Portico and CLOCKSS will attempt to preserve a variety of enhanced works provided by the university presses. The goal is to:

  • examine what can be preserved at scale using existing tools and based on current research
  • produce a set of guidelines for authors and publishers to use to make these enhanced works more preservable from the time of creation

Preservation hat

If you reviewed some of the examples of enhanced monographs, you may have observed that they are quite different from a standard monograph. Switching to a preservation hat, below are some of the challenges that stand out to me in these examples:

  • Their composite nature means a separate preservation strategy may be required for each component part as well as for the work as a whole.
  • Many are experimental, meaning they may use combinations of novel technology that might not be easily supported or available in the future, such as a specific content management system or third-party JavaScript library.
  • They may depend on live web services, such as annotation services, for the experience to be complete. Some of these services may not be controlled by the author or publisher. One example is SUP’s The Chinese Deathscape, which utilizes the ArcGIS map API.
  • They may utilize a variety of formats for supplementary or embedded material, ranging from streamed multimedia and custom-built executables to data in any format.
  • The more complex or innovative the work, the more it may be prone to degrade quickly in the years following publication. Over time, links might be broken or point to modified content [3], functionality might be lost [4] and require significant effort to repair, and security issues might become critical [5]. This makes early preservation important.
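To make the dependency on live web services concrete, here is a minimal sketch of how one might inventory the external hosts a single page of an HTML-based work relies on. It is an illustration only, not a tool used by the project: the function name `external_hosts`, the list of attributes inspected, and the example hostnames are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Attributes that commonly pull in or point at remote resources.
# This list is an illustrative assumption, not an exhaustive one.
URL_ATTRS = {"src", "href", "data", "poster"}


class ExternalHostCollector(HTMLParser):
    """Collects hosts referenced by a page that differ from its own host."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.external = {}  # host -> set of tag names that reference it

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                host = urlparse(value).netloc
                # Relative URLs have an empty netloc and are treated as local.
                if host and host != self.own_host:
                    self.external.setdefault(host, set()).add(tag)


def external_hosts(html, own_host):
    """Return {host: [tags]} for every external host referenced in `html`."""
    collector = ExternalHostCollector(own_host)
    collector.feed(html)
    collector.close()
    return {host: sorted(tags) for host, tags in collector.external.items()}
```

A host that appears under `script` or `iframe` is likely a functional dependency, while one that appears only under `a` is an outbound link. Both can break over time, but they call for different preservation strategies.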

Clearly these challenges are different from the usual concerns when preserving a standard monograph. They are more akin to the concerns of web archivists and those working with digital humanities projects, fields that perhaps provide better frameworks for conceptualizing these works as artifacts for preservation.

Current Portico e-book workflow: If it’s BITS it fits!

To put these challenges in the context of Portico’s current workflow, which has processed over 900,000 e-books to date, I’ve described our current process below.

Most e-books are provided to Portico as batches of XML with one or more PDFs, figure graphics, and supplementary files. These are validated using JHOVE. Any book XML is normalized to BITS (Book Interchange Tag Suite), an extension of the JATS XML format that can handle the structural aspects of a book. There are additional checks to ensure that all supporting files, such as embedded images, have been provided. In the event that the publisher version is no longer available at some point in the future, books will be made available for access to end users through the Portico website. Few publishers supply EPUB files for books. Where they do, they are secondary to PDFs or XML.
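As an illustration of the supporting-file check described above, the sketch below lists files that a BITS document references but that are absent from the supplied batch. It is a simplified example, not Portico's actual implementation. It assumes that references are carried in `xlink:href` attributes on elements such as `graphic` and `supplementary-material` (as in JATS and BITS), and that the reference matches the supplied filename exactly. The function name `missing_files` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fully qualified name of the xlink:href attribute as ElementTree exposes it.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Elements assumed to point at supporting files (an illustrative subset).
FILE_ELEMENTS = {"graphic", "inline-graphic", "media", "supplementary-material"}


def missing_files(book_xml, supplied_files):
    """Return referenced filenames that are not in `supplied_files`.

    Results are in document order with duplicates removed.
    """
    root = ET.fromstring(book_xml)
    supplied = set(supplied_files)
    missing = []
    for element in root.iter():
        if element.tag in FILE_ELEMENTS:
            href = element.get(XLINK_HREF)
            if href and href not in supplied and href not in missing:
                missing.append(href)
    return missing
```

A real workflow would also need to handle references that omit the file extension or include a relative path, which this sketch deliberately ignores.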

The Preserving New Forms project gives us an opportunity to broaden our support for e-books by experimenting with new ways to collect, process, and provide access (in the event the original is no longer available) to more dynamic materials. 

Technologies

The team has identified several formats that are used for enhanced monographs among the participating presses: EPUB3, HTML5, and web publications. Works to be assessed for preservation will include one or more of the features showcased in the earlier examples (embedded multimedia, supplemental data, dynamic visualizations, etc.).

The goal of the project is not to build new technologies to support preservation. It is to evaluate tools that currently exist, to identify where these can be integrated into current workflows to create scalable solutions, and to identify gaps in our preservation capacities. 

Fortunately, there are a variety of emerging and established preservation technologies that may offer solutions. Our explorations will include a variety of web archiving technologies including Rhizome’s Webrecorder and some of the related projects. We will also look at emulation in cases where a specific software environment is necessary for accurate playback. For this we will consult with the Emulation as a Service Infrastructure (EaaSI) team at Yale University. 

Though creating new tools is not a focus of the project, there may be opportunities to enhance existing ones. For example, Portico is currently working on a new EPUB validation module for JHOVE, built around the EPUBCheck tool. We will be working with the Open Preservation Foundation to make this available in a future release.

Where limitations in support for enhanced monographs are revealed, these will be included in the guidelines document, which is a key objective of the project.

Finally

The Preserving New Forms project cuts across a wide range of preservation work. We’re excited to spend the next 15 months with our library, publisher, and preservation partners, digging into some of the established and cutting-edge tools used by the preservation community to see how they play with our workflows. In my early research, I’ve already seen some really, really neat things and I’m looking forward to continuing this exploration!

With a steep climb ahead of us, we very much welcome your feedback and look forward to sharing the results. Watch this space!


[1] Waters D. Monograph Publishing in the Digital Age (2016). https://mellon.org/resources/shared-experiences-blog/monograph-publishing-digital-age/

[2] NYU Receives Major Grant from The Andrew W. Mellon Foundation; Collaborative Effort Aims to Meet the Challenge of Preserving New Forms of Digital Scholarship (April 2019). https://www.nyu.edu/about/news-publications/news/2019/april/nyu-receives-major-grant-from-the-andrew-w--mellon-foundation--c.html

[3] Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, Grover C (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLoS ONE 11(12): e0167475. https://doi.org/10.1371/journal.pone.0167475

[4] Sites that use third-party web services are especially vulnerable, since these can be subject to change over time. An example can be seen in this announcement of the deprecation of the Google Earth API, but there are many such examples that could affect the functioning of a website and result in the need for significant code rewrites.

[5] Smithies J, Westling C, Sichani A, Mellen P, Ciula A (2019) Managing 100 Digital Humanities Projects: Digital Scholarship & Archiving in King’s Digital Lab. Digital Humanities Quarterly 13(1). http://www.digitalhumanities.org/dhq/vol/13/1/000411/000411.html

