Peter May is the British Library’s Digital Preservation Technical Architect.
Preservation planning is a long-established function in digital preservation. Its purpose is to ensure that digital content can move forwards through time for future users without suffering unacceptable loss, either to intellectual content or functionality. Many different activities support preservation planning, and at the British Library these have included collection profiling, format sustainability assessments, defining digital preservation policy, content sampling, and preservation risk modelling. These activities have led to an excellent understanding of what is needed to preserve our digital content and the risks that are likely to manifest.
Missing from this picture, however, was the ability to put this knowledge into practice in an automated manner, so that technical risks could be mitigated effectively and efficiently, at scale, and across all the collections. Our approach, formalised in our Integrated Preservation Suite (IPS) project, is our developing solution to this challenge.
IPS is an internally funded project at the British Library that builds upon our preservation activities to develop and enhance the Library’s preservation planning capability, focussed on addressing the technical risks and opportunities specific to the Library’s heterogeneous collections. Through agile development practices, the project is iteratively designing and implementing the technical infrastructure for the Suite as well as populating it with the content required for the infrastructure to work in a business environment.
The Suite comprises several integrated foundation components: a knowledge base, a software repository, a policy and planning repository, an execution platform, and an overarching web-based workbench. These are all designed to meet separate but complementary goals, such as the gathering and curation of technical knowledge about formats, or the preservation of institutionally relevant access software.
Preservation Workbench
The Workbench forms the main entry point to IPS. It provides a web-based user interface for digital preservation practitioners that currently supports three main tasks: searching for information in the Knowledge Base; curating incoming data into the Knowledge Base; and creating and editing format-based preservation plans. Over time, this functionality will be enhanced as needs dictate, for example to monitor the preservation environment and notify users about potential technical preservation risks.
Knowledge Base
Fundamental to the success of IPS is the Knowledge Base, which provides a curated, graph-based store of, initially, technical information about file formats, software, and wider technical environments relevant to the Library's digital collection. Data is gathered from a variety of sources, such as PRONOM and File-Extensions.org, and added through a curation process that ensures the data is appropriately linked whilst retaining its provenance. This process, in combination with the underlying data model, enables more relational searching, such as asking which software can be used to migrate one file format to another, or which file formats a given piece of software can import. Going forwards, we envisage enhancing the Knowledge Base with details from, among other sources, our own file format sustainability research, enabling more advanced queries such as which risks relate to a specific format.
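To make the idea of relational searching concrete, here is a minimal sketch in Python of how linked statements with provenance might answer the two example questions above. It is an illustration only, not the IPS data model: the `KnowledgeGraph` class, the `imports`/`exports` relation names, the tool names, and the source labels are all hypothetical, and the data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One curated statement, e.g. ToolA --imports--> TIFF (hypothetical model)."""
    subject: str
    relation: str
    obj: str
    source: str  # provenance: where the statement came from


class KnowledgeGraph:
    """Tiny in-memory stand-in for a graph store; not the real Knowledge Base."""

    def __init__(self):
        self.edges = []

    def add(self, subject, relation, obj, source):
        self.edges.append(Edge(subject, relation, obj, source))

    def objects(self, subject, relation):
        """E.g. which formats can this software import?"""
        return {e.obj for e in self.edges
                if e.subject == subject and e.relation == relation}

    def subjects(self, relation, obj):
        """E.g. which software can import this format?"""
        return {e.subject for e in self.edges
                if e.relation == relation and e.obj == obj}

    def migration_tools(self, src_format, dst_format):
        """Software that both imports src_format and exports dst_format."""
        return (self.subjects("imports", src_format)
                & self.subjects("exports", dst_format))

    def provenance(self, subject, relation, obj):
        """All sources that assert a given statement."""
        return {e.source for e in self.edges
                if (e.subject, e.relation, e.obj) == (subject, relation, obj)}


def build_example_graph():
    """Invented data for illustration; tool names and sources are made up."""
    kg = KnowledgeGraph()
    kg.add("ToolA", "imports", "TIFF", "example-registry")
    kg.add("ToolA", "imports", "TIFF", "manual-curation")
    kg.add("ToolA", "imports", "PNG", "example-registry")
    kg.add("ToolA", "exports", "JP2", "example-registry")
    kg.add("ToolB", "imports", "TIFF", "manual-curation")
    kg.add("ToolB", "exports", "PDF", "manual-curation")
    return kg


if __name__ == "__main__":
    kg = build_example_graph()
    print(kg.migration_tools("TIFF", "JP2"))
    print(kg.objects("ToolA", "imports"))
    print(kg.provenance("ToolA", "imports", "TIFF"))
```

Keeping the source on every statement, rather than merging duplicates, is one simple way to retain provenance when the same fact arrives from more than one place.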
Preservation Software Repository
One core capability IPS aims to deliver is the preservation of rendering software for all collection content within our care. Our Preservation Software Repository provides this capacity. Integration of cataloguing information with the Knowledge Base will enable Workbench software search results to indicate whether the software is captured in the repository. When considered within preservation planning, such integration between these components will start to facilitate the selection of suitable software to mitigate risks associated with collection formats.
Policy and Planning Repository
Whilst preservation plans are created within the IPS Workbench, approved versions need to be stored and preserved along with the collection content they relate to, so that it is possible to understand what reasoning and justification went into the preservation of those objects. Our Policy and Planning Repository is the place where versioned plans can be held and approval workflows can be implemented. On top of this, we are finding that we're using the repository to store increasing amounts of other documentation: file format assessments, collection profiles, even reports from events that we've attended.
Execution Platform
Lastly, developing a preservation plan can require evaluating different approaches (testing different format migration software, for example) in order to provide evidence for the appropriate mitigation of a risk. Executable scripts, using software preserved in the Software Repository and run against a sample of content in the Execution Platform, would achieve this. These scripts and their results would be referenced and captured in the preservation plan, so that there is a record of the evaluations that have taken place.
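As a rough illustration of what such an evaluation could record, the following Python sketch runs each candidate approach over a small content sample and keeps a per-item result that could be referenced from a plan. It is a sketch under stated assumptions, not the Execution Platform itself: the `evaluate` and `success_rate` functions, the `EvaluationResult` fields, and the two stand-in "migrations" are all hypothetical, and real candidates would wrap actual migration software.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvaluationResult:
    """One candidate applied to one sample item (hypothetical record shape)."""
    candidate: str
    item: str
    succeeded: bool
    output_sha256: str  # empty string when the candidate failed
    note: str           # error message on failure, otherwise empty


def evaluate(candidates: Dict[str, Callable[[bytes], bytes]],
             sample: Dict[str, bytes]) -> List[EvaluationResult]:
    """Run every candidate over every sample item and record the outcome."""
    results = []
    for name, migrate in candidates.items():
        for item, data in sample.items():
            try:
                output = migrate(data)
                digest = hashlib.sha256(output).hexdigest()
                results.append(EvaluationResult(name, item, True, digest, ""))
            except Exception as exc:  # a failure is evidence too, so record it
                results.append(EvaluationResult(name, item, False, "", str(exc)))
    return results


def success_rate(results: List[EvaluationResult], candidate: str) -> float:
    """Fraction of sample items the candidate processed without error."""
    runs = [r for r in results if r.candidate == candidate]
    return sum(r.succeeded for r in runs) / len(runs) if runs else 0.0


# Stand-in "migrations" for illustration only; they are not real tools.
def to_upper(data: bytes) -> bytes:
    return data.upper()


def strict_reverse(data: bytes) -> bytes:
    if not data:
        raise ValueError("empty input")
    return data[::-1]


if __name__ == "__main__":
    sample = {"a.txt": b"hello", "b.txt": b""}
    results = evaluate({"to_upper": to_upper, "strict_reverse": strict_reverse},
                       sample)
    for name in ("to_upper", "strict_reverse"):
        print(name, success_rate(results, name))
```

Recording failures alongside successes, with a checksum of each output, gives a plan something verifiable to point to when justifying why one approach was chosen over another.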
Where are we at?
From that description you may have judged that IPS is an ambitious project. It is. The majority of our efforts to date have focussed on the Workbench and Knowledge Base, with particular emphasis on the data model, data curation, and the integration of these with the Workbench UI layer. We also have a functional Software Repository (using RODA) and a working Policy and Planning Repository (using Mayan EDMS) installed and accepting their respective content files. For a fuller description, please see our iPRES paper and, if you’re attending iPRES this year, come and see our demo and talk to us. We’d really like to get your feedback.