DiAGRAM is a probabilistic model of digital preservation risk, and an online tool that archivists can use as part of their preservation planning.

Firstly, some context. Digital archives play a crucial role in keeping evidence for future generations, a role never more important than during and in the aftermath of a global pandemic. Today, the digital archivist is confronted by a sharp increase in the amount of data they need to preserve, against a backdrop of rapid technological change and budgetary pressures. The challenge for the archivist is to translate their finite resources into maximum benefit, in terms of managing and mitigating digital preservation risk.

Where should the archivist target their efforts? The answer from experts is that it depends. In this project we have researched and developed ways for archivists to try out different interventions, through DiAGRAM, a digital preservation risk model and interactive online tool.

How does the archivist win the case for investment? DiAGRAM quantifies the benefits of digital preservation. Using techniques developed by the Applied Statistics & Risk Unit (AS&RU) at the University of Warwick, DiAGRAM helps the archivist to assess the impact of multiple related risks and then see the benefit of different interventions. It can also be used to quantify the return on investment for digital preservation actions, in terms of reduced risk.

DiAGRAM was made through the collaboration of various archives and archivists. It combines empirical data with expert judgement into a single model. The IDEA (“Investigate”, “Discuss”, “Estimate” and “Aggregate”) protocol has been used to harness the collective wisdom of many experts who have participated in the project.
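As a rough illustration of the “Aggregate” step, the sketch below pools three experts' estimates of a single quantity using an equal-weight average. The numbers are invented for illustration, and the averaging rule is an assumption: it is one common way to combine structured expert judgements, not necessarily the rule used in this project.

```python
from statistics import mean

# Hypothetical final-round estimates from three experts for one quantity,
# e.g. the yearly probability that a storage medium fails.
# Each tuple is (lowest plausible, best estimate, highest plausible).
estimates = [
    (0.01, 0.05, 0.10),
    (0.02, 0.04, 0.08),
    (0.03, 0.09, 0.15),
]


def aggregate(estimates):
    """Equal-weight average of the experts' lower, best and upper values."""
    lows, bests, highs = zip(*estimates)
    return mean(lows), mean(bests), mean(highs)


low, best, high = aggregate(estimates)
print(f"aggregated estimate: {best:.3f} (range {low:.3f} to {high:.3f})")
```

An equal-weight average treats every expert the same; weighting experts by past performance is an alternative that this sketch does not attempt.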

This project was initiated by The National Archives (UK), who wanted a better method to understand and manage the digital preservation threats to their rapidly growing collection. The practice of treating digital preservation as risk management can be traced back to at least Paul Conway’s 1997 report for CLIR, “Preservation in the Digital World”, and then on through Rosenthal et al.’s 2005 “Requirements for digital preservation systems: a bottom up approach”; the development of risk models such as the SPOT model and DRAMBORA; and maturity modelling.

However, previous digital preservation risk models have tended to be narrowly focused (for example, they might consider only storage). Prior to DiAGRAM, all our risk models have been qualitative: useful, but not compelling evidence for a funder who needs to see measurable benefits. Previous models did not allow users to compare the threats from different types of risk, to see the impact of possible actions, or to prioritise competing actions. Using Bayesian Networks for digital preservation risk modelling overcomes these limitations.

To ensure we had the right skills to develop a useful model, we engaged AS&RU, accessing skills and expertise not typically found in the digital preservation world. AS&RU introduced us to Integrated Decision Support Systems, which (over the top of a Bayesian Network risk model) provide additional resources for scenario planning and trying out different interventions.

As the lead body for the UK archives sector, The National Archives reached out to the wider archival community for help to develop a more generalised digital preservation risk model for archives. A range of archives, representing a cross-section of the UK’s archives sector, participated in the project, including county record offices, higher education archives and special collections, and a corporate archive.

By widening the pool of archives, we were able to create a model that was more generally applicable. However, we realised we would also need additional funding to support the costs involved. The National Archives bid for and was awarded £93,500 by the National Lottery Heritage Fund (NLHF) in January 2020, setting an important precedent: that digital heritage is eligible for NLHF support in its own right. NLHF funding has enabled the direct involvement of the DPC, to conduct the project evaluation, to assist with workshops, and to introduce the risk model to the wider digital preservation community. The project has also benefitted from two computer science placement students from Monash University, who spent two months at Warwick and built the initial iterations of the online tool which has brought the project to life.

The partner archives and risk modellers worked together, discussing various risks and the relationships between them, building up the basic Bayesian Network. Then we identified where suitable data sources existed in order to develop the conditional probability tables that drive the model. Where suitable data did not exist, we prepared elicitation questions so that we could use structured expert judgement from a panel of digital archivists. The COVID-19 pandemic gave us the chance to move our events online and broaden participation, including experts from the BFI and Cambridge University Library among others.
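To make the role of a conditional probability table concrete, here is a minimal sketch of a three-node network in plain Python. The node names, the probabilities, and the assumption that the two parent risks are independent are all hypothetical; they are not taken from DiAGRAM itself.

```python
# Hypothetical network: "storage failure" and "format obsolescence" are
# parent nodes that both influence whether a file remains "renderable".
p_storage_fail = 0.10  # assumed P(storage failure)
p_obsolete = 0.20      # assumed P(format obsolescence)

# Conditional probability table: P(renderable | storage_fail, obsolete).
cpt_renderable = {
    (False, False): 0.99,
    (False, True): 0.60,
    (True, False): 0.30,
    (True, True): 0.05,
}


def p_renderable(p_fail, p_obs, cpt):
    """Marginal P(renderable), summing over every combination of parent states."""
    total = 0.0
    for fail in (False, True):
        for obs in (False, True):
            # Probability of this parent combination (parents assumed independent).
            weight = (p_fail if fail else 1 - p_fail) * (p_obs if obs else 1 - p_obs)
            total += weight * cpt[(fail, obs)]
    return total


baseline = p_renderable(p_storage_fail, p_obsolete, cpt_renderable)
# What-if: an intervention (say, an extra storage copy) halves storage failure.
with_intervention = p_renderable(0.05, p_obsolete, cpt_renderable)
print(f"baseline: {baseline:.4f}, with intervention: {with_intervention:.4f}")
```

Changing one input probability and recomputing the outcome is the same kind of what-if comparison that a full model performs across many more nodes.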

With the data gathering complete, we populated the prototype model and presented it at three online workshops. The workshops combined an explanation of the theory behind our approach, the development process, and a chance for archivists to try the model for themselves following a couple of demonstration case studies. Between workshops, we updated the tool based on feedback from the sessions, to improve its usability and provide policy suggestions bespoke to the user’s model.

The workshops have attracted a worldwide audience of archivists. It is important to note that some of our data is UK-based, so the model might not be as applicable to archives in other parts of the world in its current form.

DiAGRAM is now substantially complete. The National Archives will use it as a source of evidence for its Spending Review submission. We will be doing further work to improve the user interface until early October. With the tool, archives can get an initial view of their current risk levels, and then compare the effect of different “policies” on those risk levels to help them build a business case for action. They can also see which nodes of the network are most likely to reduce their risk. We hope that this model will become part of the suite of tools and resources supported by The National Archives, for the benefit of the whole community, alongside PRONOM and DROID.
