Jenny Mitcham

Last updated on 3 November 2021

Almost a year ago now, the DPC published a Technology Watch Guidance Note from Matthew Addis entitled ‘Which checksum algorithm shall I use?’. It was really helpful at the time to summarise and condense advice and good practice on checksums and their use in digital preservation (and of course help people to make that all important decision on which checksum algorithm to pick).

Without a doubt, checksums are of key importance to our work in digital preservation. Use of them is regarded as good practice (as encapsulated in models such as the NDSA Levels of Preservation and DPC’s Rapid Assessment Model and certification standards such as CoreTrustSeal and ISO16363), but there is still a gap in our knowledge.

How do we know exactly when to create them, which tools to use, how frequently to check them, which events would trigger their use, what to do in the event of a failure...and of course, which algorithm we should use to create them? Where is the step-by-step handbook to provide all the answers?

Of course, there isn’t one.

We could try to write one but it would be impossible to get it right. Like pretty much any digital preservation question, there is no one-size-fits-all solution that will suit every organization in every context. Decisions on checksums and their use should be balanced alongside other factors such as the value of the content, specific risks and available resources within an organization. Fixity practices do not exist in isolation and other aspects relating to the long term management of digital content will also have an impact (such as how many copies are stored, where they are stored and how long backups are retained). The resulting solution for any organization will typically be unique to them.

What is everyone doing?

With a lack of prescriptive guidance our next best option is to look around and see what everyone else is doing. There is a huge benefit in being able to benchmark practices against others within the digital preservation community.

This is where the value of a community survey such as the NDSA Fixity Survey really becomes obvious. It allows us to find out about fixity practices across many organizations at once as well as seeing the evolution of practice over time.

The NDSA first carried out a fixity survey in 2017 and the report that was produced provided a useful snapshot of fixity practices. It was noted that there wasn’t a clear consensus on how checksums are created and verified across the digital preservation community and it was clear that for many respondents current procedures were a work in progress.

Surveying the community

This year it was agreed that the NDSA Fixity Survey should be re-run to see how much things had changed. Seeing the value of work like this, I was keen to volunteer to join the Fixity Survey working group and get involved in reviewing the survey questions and analyzing the results.

It was an interesting, productive and highly collaborative experience. Our co-chairs, Carol Kussmann and Sibyl Schaefer, scheduled regular meetings and kept us to an ambitious timeline. It is gratifying to see the results of all of this hard work released just in time for World Digital Preservation Day.

The report is available for all to read and I’d recommend you take a look. The survey responses demonstrate that there is still a huge amount of variety in fixity practices across the digital preservation community. The value of using fixity information such as checksums is clear, but there are many differences in the way practitioners use them and it is clear there is not always a one-size-fits-all solution even within an organization. Many respondents reported multiple working practices within their own local context.

The often manual nature of fixity checking procedures is also apparent from the survey results and it remains a theme that many respondents still saw their fixity practices as a work in progress. This is no bad thing and reflects a desire for continuous improvement and a willingness to respond to evolving good practice as it emerges.

Survey questions this year were expanded to include (among other things) a new section on fixity failures. It was particularly interesting to review this section of the results for the first time. Many of us within the community are well rehearsed in discussing the reasons for regular fixity checking and how important it is for catching and fixing errors, so it is useful to see a snapshot of how often fixity checking errors actually occur and in which circumstances. I would speculate that this might be the most useful section for anyone who wants to understand the risks in order to inform their own fixity checking practices.
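To make that risk concrete, here is a minimal sketch (my own illustration, not something prescribed by the survey or the report) of how a routine fixity check might surface such a failure. It assumes a simple manifest of previously recorded SHA-256 checksums in the two-space 'checksum  filepath' format written by tools such as sha256sum, and the manifest filename is purely hypothetical:

import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Recompute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_manifest(manifest_path):
    """Compare stored checksums against freshly computed ones.

    Assumes one '<checksum>  <filepath>' pair per line, as written by
    tools such as sha256sum (the manifest name below is illustrative).
    """
    failures = []
    for line in Path(manifest_path).read_text().splitlines():
        stored, _, filepath = line.strip().partition("  ")
        actual = sha256_of(filepath)
        if actual != stored:
            failures.append((filepath, stored, actual))
    return failures

if __name__ == "__main__":
    for filepath, expected, actual in check_manifest("manifest-sha256.txt"):
        print(f"FIXITY FAILURE: {filepath}\n  expected {expected}\n  got      {actual}")

In practice, of course, the interesting decisions sit around this loop rather than inside it: how often it runs, which events trigger it, and what happens when a mismatch is reported.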

This year for the first time, the notion of environmental responsibility was touched on in the survey questions, specifically in relation to the factors that are considered with regard to defining the frequency of fixity checking. Responses suggest that this issue was not a key factor in the decision making of many respondents at this point in time.

One of the most interesting sections of the Fixity Survey Report for me is the case studies. The working group followed up the survey with a number of interviews with a range of organizations of different sizes and contexts to gather further information about how fixity is used in the long term management of digital archives. These case studies really capture the complexity and evolving nature of fixity practices within the digital preservation community, providing a much more complete picture than survey questions alone can capture. It was certainly worth the extra effort to carry out this additional work alongside the analysis of survey responses and I hope you will find the case studies as interesting as I did.

So which checksum algorithm is the most widely used?

I expect you are all wondering...

The survey results show that the most common fixity-checking algorithm currently in use is MD5...but the answers to this survey question also reveal greater complexity, with many organizations employing multiple checksum algorithms. I’d definitely recommend you check out the full breakdown of results for this and other questions in the Fixity Survey Report for 2021.
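If you are wondering what maintaining multiple algorithms looks like in practice, here is a small sketch using Python's standard hashlib module; reading each file once and feeding every algorithm from the same pass keeps the extra cost low. The function name and the example path are my own, purely illustrative choices:

import hashlib

def multi_checksum(path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """Read the file once and compute several checksums in a single pass."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: d.hexdigest() for name, d in digests.items()}

# Example usage (the filename is purely illustrative):
# print(multi_checksum("archive/item-0001.tif"))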

Thanks

I’d like to end by thanking the NDSA Fixity Survey Working Group members, all those who agreed to be interviewed for the case studies and everyone within the community who took the time to respond to the survey. The engagement of a wide section of the digital preservation community is needed to carry out a piece of work like this, but I hope you will agree it has been well worth the effort.

Happy reading...and happy World Digital Preservation Day 2021!

