Filling the Digital Preservation Gap is exploring how we can apply digital preservation tools to research data management. Funders now expect Higher Education Institutions to retain research data in a usable form for longer periods of time. Rolling retention periods, such as those mandated by the Engineering and Physical Sciences Research Council (EPSRC), state that data should be retained for ten years from the date of last access [1]; the Natural Environment Research Council (NERC) states that, for projects of major importance, data may need to be retained for 20 years or longer [2]; and the Science and Technology Facilities Council (STFC) expects data that cannot be re-measured to be retained indefinitely [3]. When retaining data for long periods of time, preservation becomes increasingly important.

The project team from the Universities of York and Hull

Many Higher Education Institutions have been working on processes and systems to manage and provide access to research data, but few have actively utilised digital preservation tools to help curate the data for the longer term. The invisible and slightly intangible nature of digital preservation means that resources tend to gravitate to the more visible and immediate areas of need, for example data deposit, storage and access.

The project team were keen to investigate an open source preservation system called Archivematica to assess its potential use for the preservation of research data. Making use of a freely available tool that allowed institutions to automate many of the processes and activities around digital preservation could offer a pragmatic solution to the problem of preserving research data, particularly given the lack of resource available in most institutions for carrying out this work. We were inspired by the concept of ‘Parsimonious Preservation’, a phrase coined by Tim Gollins [4], which suggests that sustainable steps towards digital preservation can be achieved by using free tools and automated processes.

In the first phase the project team explored whether Archivematica had potential for use in this  context. The project teams installed Archivematica locally for testing purposes and this was  complemented by wider research into its capabilities and discussion with the user community. A  further strand of work investigated the nature of the research data we would be looking to preserve. A detailed phase one report summarised our findings and included an accessible set of  FAQs to help inform others about the need for digital preservation and suitability of  Archivematica. We concluded that Archivematica had potential to be used in this context and  were also able to highlight several areas where it did not quite meet our requirements and would  benefit from further development.    

A second phase of the project aimed to address some of these areas and initiate a number of  enhancements to Archivematica. We did this by sponsoring the development of Archivematica in  six discrete areas. We worked with Artefactual Systems (the lead developers for Archivematica) to  specify our requirements and test the resulting code. The developments sponsored were designed  specifically to deal with issues relating to the nature of research data (its large size and diverse  nature) and integration with other systems (for example repository systems and reporting  systems). We also aimed to reduce some of the barriers to uptake by improving available  documentation – specifically regarding tools for setting up automated workflows. These  enhancements and resources will be made available to all Archivematica users and have value for  use cases beyond the sphere of research data management.    

During phase two, the project team also drew up detailed implementation plans to inform  subsequent work in a future phase of the project. It was recognised that just recommending the  use of Archivematica was not enough and that other practitioners would be interested in exactly  how we would approach the implementation. For example, how would Archivematica integrate  with other systems and how would the workflow be configured?    

One of the important themes running through the project relates to the nature of research data  and how we can use available tools and registries to identify such a diverse set of file formats. This  strand of the project has been of relevance to many other digital preservation practitioners  working with different types of data and is of primary concern to our community. As well as  engaging with The National Archives to discuss how we can increase the coverage of their  technical registry (PRONOM) to include research data file formats, we have also been considering  workflows within digital preservation tools such as Archivematica and how we can encourage the  community to engage with this problem in a practical way.  

A key element of this project has been advocacy and awareness raising. We have been keen to talk to others about the project and disseminate our findings as widely as possible. Alongside the  publication of our reports at the end of each phase of work, we have also spread the word  through a number of different channels.    

The project team have presented at several conferences and meetings. In order to reach a wide range of individuals, we have targeted events aimed at a variety of different audiences, including the digital preservation community, archivists, research data managers and librarians, and those working with a particular technology.

We have also maintained a project webpage to promote the project and have released numerous  blog posts which have been read by an international audience. This has enabled us to disseminate  information about progress in a more dynamic and immediate way as the project moves forward  and has helped to highlight some of the thought processes we have gone through as we consider  Archivematica and the nature of research data.


