Jenny Mitcham

Last updated on 29 July 2020

Last week the DPC held an online briefing day on the topic of Preserving Semi-current Records and we heard from a range of speakers who were all facing or considering the challenge of preserving semi-current or semi-active records.

Defining what a semi-active or semi-current record actually is was one of the first challenges of the day and perhaps differs in different contexts and disciplines. Kevin Ashley from the Digital Curation Centre described them as ‘the undead’ - records that are still with us but not quite alive. There is an implication of less use and perhaps greater neglect. 

If a record is complete and finished with, it can be neatly packed off to a suitable archive for long-term preservation and ongoing access as appropriate. For semi-active records, though, there is a debate around when they should go to an archive and how long they can be effectively managed outside of that archive. Complex situations emerge for digital records (which of course can live in more than one place at once) where a record is both submitted to an archive and remains in situ, perhaps being updated and added to over time.

Kevin gave us an example of a curated scientific database as a semi-current record. There may be an ongoing piece of work to maintain and update a database over time, but snapshots of the data still need to be archived. Users of the dataset need to be able to cite it at a particular point in time in order that others can verify their research. A nice example of this kind of workflow was later presented by Tim Evans of the Archaeology Data Service.

Gordon Reid from the Nuclear Decommissioning Authority and Andy Harris from Radioactive Waste Management presented the first case study of the day, describing the radioactive waste package records they are charged with managing for the long term. These records consist of collated and structured data relating to each individual waste package. They contain vital information relevant to the future handling and storage of the packages and continue to be added to over time as checks and inspections are carried out and logged. As the creation of radioactive waste packages has been in progress for many years now (and will continue for many more), the information is held in a diverse range of formats (sometimes as physical records that need to be digitised). Andy described their approach to ‘information informed design’ and discussed the competing approaches of perfectionism and conservatism. A key challenge now is to ensure that the records they are creating today are designed in such a way that they can be used in the future with a high level of trust and confidence.

Bryony Leventhall from the Bank of England described semi-current records as being records that are complete and can be accessed but not edited or deleted. She described the processes currently in place at the Bank of England to manage records within their EDRMS (in the absence of a digital preservation system). She talked us through their records management systems and processes, pointing out that these are so effective that they have to some extent hindered their ability to move forward with plans to establish a digital archive. The EDRMS has however helped them prepare for the implementation of more robust digital preservation processes in the future, in particular through records classification and disposal tiers which will allow preservation priorities to be highlighted more efficiently.

The last two case studies of the day came from Jaana Pinnick from the British Geological Survey and its National Geoscience Data Centre and Tim Evans from the Archaeology Data Service. Jaana gave an interesting overview of the challenges of preserving geoscience data (which has very long validity as geology changes very slowly). In fact, in both of these disciplines, ‘old’ data frequently gets reused and repurposed and sometimes reinterpreted, which can lead to challenges as data changes or is resubmitted over time. Tim gave us some great examples of this, including a dendrochronology database which was first archived with the ADS in 2000 but is still being updated 20 years on. Tim talked about how the ADS manages content that is resubmitted over time, taking snapshots of the data and maintaining DOIs and previous versions as appropriate.

Tim Gollins of the National Records of Scotland rounded up the day with some reflections about ongoing work of the DPC’s EDRMS Preservation Task Force (more about this in another blog post soon) and a simple framework to help us understand some of these challenges. 

The framework presented some of the different scenarios around semi-active records, with rows and columns to define how ‘active’ they really were and the length of time they needed to be kept. A recommended approach was suggested for each scenario: maintaining the records in the current active records system, transferring them to a digital archive, or maintaining a copy in both locations.

What came out of this quite strongly for me was the realisation that we don’t need to try and solve all of the problems at once, especially if long time periods are thrown into the mix. If something needs to be kept forever, it is not necessary for us to have all the answers. We only need to focus our efforts on handing it over to the next generation (who will no doubt make further decisions about how best to preserve the content over time).

Discussions in the panel session at the end of the day delved further into some of the issues raised during the day, reflecting on the framework that Tim Gollins presented, talking about the costs of inaction and also whether emulation has a place in this space. There was some discussion about whether the fact that semi-current records are still being occasionally used offers them some level of protection against obsolescence and whether transformations and migrations that happen outside of the digital archive should be documented for future users.

At the end of the day, each of the speakers was asked to share their takeaway thoughts on the day or put forward one piece of advice for others working in this space to think about.

Some speakers mentioned that planning for the future of the information is the most important thing. Gordon Reid suggested that the information needs are the one thing that remains constant and that getting this right at the beginning will lay down the foundations for future users. Andy Harris stressed that careful design and structuring of the records was key and talked about understanding the future use cases and ensuring we keep the right amount of information without over-specifying or keeping too much. Jaana Pinnick also picked up on the issue of appraisal and careful selection of what to keep. 

Several speakers touched on the issue of when to capture semi-current records. Tim Gollins mentioned web archives as a current challenge. Kevin Ashley talked about understanding when a semi-current record becomes an obsolete record...and perhaps when action would need to be taken. Bryony mentioned how neglected semi-current records can be and supported the idea of having a preservation version of those records that may be held separately to the access version. Tim Evans highlighted the fact that there are many semi-current records that archives may never get to see and that perhaps more work is needed to approach potential donors and depositors.

I wanted to end on a comment from Kevin Ashley that was raised both at the start and the end of the day. 

Why are we worried about semi-current records? What is it that is different and special about them, and what actions can we take to address these particular challenges?

It is a good question.

  • Is it the implied neglect and a concern that this may lead to potential loss?

  • Is it that valuable records may be altered and that this change won’t be effectively managed outside of a digital archive?

  • Is it that system migrations carried out over time (outside of a digital archive) will lead to changes and losses that will not be documented?

  • Is it that a nice, neat and tidy one-off ‘deposit to archive’ model is broken by records that may still be being added to over time?

  • Is it that digital records can be in more than one place at a time and that increases the challenges we have around control, versioning and citation?

  • Is it that where records are still in active use it is harder to persuade the creators or owners to relinquish control of them and deposit them in an archive?

I think it is a combination of all of these things.

In terms of actions and solutions, the things that really jump out at me are as follows:

  • Having clear systems in place for managing active data - for example the classification and disposal tiers mentioned by Bryony.

  • Having a clear understanding of what records need to be kept and why. As mentioned by Jaana, Gordon and Andy, understanding future users and use cases is always going to help.

  • Having clear, documented processes and procedures for managing complex deposit scenarios - such as the methods of managing multiple versions and enabling accurate citation described by Tim Evans.

  • Ensuring access to records is appropriate and adequate - this may help persuade reluctant record creators and owners to deposit them in an archive, happy in the knowledge that they can still use them.

I’d encourage you to have a look back at some of the presentations for yourself (DPC Members, do log in to see the recordings) and see what you think.

