Matthew Addis

Last updated on 13 July 2023

A Minimum Viable Product (MVP) is a common term in the tech sector to describe something that has just enough features to satisfy early customers and to provide feedback for future product development. A similar term, 'Minimal Viable Preservation', came up in discussion at iPRES this year during the apres-ipres unconference that followed. Many organisations have a large volume of digital material that needs some form of basic treatment to 'stabilise it' from a preservation point of view, pending more detailed action in the future (when budgets and resources permit!). This is just like a product MVP - just enough to satisfy initial preservation needs and to give an understanding of what's needed in the future. In many cases I suspect this 'basic treatment' might be all the attention that some content will get. The best laid intentions today of 'further action' in the future can often become lost in the deluge of new content and new challenges that institutions face when that future arrives. That's why some form of Minimal Viable Preservation done today can be so important - especially when automated and made cost-effective at scale so it can be applied as a matter of routine - and we all know that if something isn't routine then there's a very real chance that it won't get done at all!

Paul Wheatley used the term Minimal Effort Ingest in his blog on a 'Valediction for Validation', which was posted after the same apres-ipres session. Paul's question was whether the benefits of rigorous file-format validation justify the effort it entails. Validation is often part of a QA step at the beginning of a digital preservation lifecycle. It front-loads effort into the 'ingest' process, can be very labour intensive, and can provide inconclusive benefits. This doesn't seem to fit with a Minimal Viable Preservation ideal. Whilst Paul didn't use the term MVP, I think his suggestion of focussing more on useful content rendering rather than formal conformance to file format specifications is a good one. Paul said that a sensible objective is to answer the question "Does this digital object render without error, accurately and usefully for the user, and is it likely to render in the future? If not, should we do something about that?" That sounds more like MVP to me!

So what questions should Minimal Viable Preservation answer? My first stab would be that MVP should be able to answer three things: (1) Do I know what the content is? (2) Can I render it in a useful way, and do so in an independent environment? and (3) Have I got the content and the renderer stashed away somewhere safe? This isn't dissimilar to Tim Gollins' 'Parsimonious Preservation' approach and how to put it into practice. I'm fond of referencing this work because it boils preservation down to two main things: 'know what you've got' and 'keep the bits safe'. Very MVP. In effect, I've added 'know that you can render it' and 'keep the renderer safe'. It's the combination of having the content plus the capability to render it that gives some hope of being able to use that content in the future, and of being able to do so independently from the original environment in which the content was created. I'm less concerned with the intricacies and effort associated with exhaustive characterisation, formal file format validation or even normalisation to purportedly long-lived formats. I just want to know that I've got stuff safe and that I have some form of independent means to access and use it. You could say it's moving the focus to the 'means to use' and away from 'polishing the bits'. There's a lot more that could be said about the rendering side of the house, including whether renderers should be open-source, cross-platform, unencumbered by licenses or restrictions, provide export to other formats, be based on standards, be supported by a community, be runnable in 'headless' mode so rendering checks can be automated, and whether they can be packaged up in docker containers or VMs to make it easier to run them in the future. But whilst all of that is ideal, and a topic that would lead us into the interesting and related area of Software Sustainability, as well as the need for Registries and Repositories of Renderers (I've just made a mental note to follow up with a blog post called 'the three Rs'), maybe this is deviating from the bare minimum needed for MVP - simply having any renderer that you know can read the content is much better than having no renderer at all!
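
To make the 'know that you can render it' part a little more concrete, here's a minimal sketch of what an automated, headless rendering check might look like. It assumes FFmpeg's ffprobe is installed and on the PATH, and it treats a clean exit with nothing on stderr as a reasonable proxy for 'renders without error'; the function name and the 'content' directory are illustrative only, not any kind of standard.

```python
# A minimal sketch of an automated, headless rendering check for AV content.
# Assumes FFmpeg's ffprobe is installed and on the PATH; the function name
# and the 'content' directory are illustrative only.
import subprocess
from pathlib import Path


def headless_render_check(path: Path) -> bool:
    """Ask ffprobe to parse the file with no UI; '-v error' limits output
    to genuine decode/parse problems, so a clean exit and an empty stderr
    are treated here as 'renders without error'."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", str(path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and not result.stderr.strip()


if __name__ == "__main__":
    for f in sorted(Path("content").rglob("*.mp4")):
        status = "renders" if headless_render_check(f) else "needs attention"
        print(f"{f}: {status}")
```

The same idea extends to any renderer that can be driven from the command line - the point is simply that the check is scriptable, so it can be run routinely and at scale rather than relying on someone remembering to open each file by hand.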

So how do we do all this in practice? How do we commoditise MVP and make it accessible to the widest possible set of stakeholders? How do we define what MVP is, and how do we enact it easily and effectively? Everyone could go away and define their own version of MVP and implement it in their own way - and there's a long tradition of many organisations thinking that their content and users are somehow 'special' and 'unique' and need their own specific approaches. But what interests me is whether we can define a common baseline for MVP that can be adopted across the community and deployed easily by anyone who needs it. There are a couple of new initiatives that could come together very nicely to help get us closer. The first is Community Owned Workflows (COW). MVP is a workflow. It should be defined, documented and shared as a workflow (or workflows) by the community. COW provides a mechanism for this. The second is Preservation Action Registries (PAR). I was part of the team that presented this at iPRES and it too generated lots of discussion at the apres-ipres session. PAR is about how to specify, exchange and execute digital preservation actions in a way that is open, unambiguous, machine understandable and machine executable. In effect, PAR allows people to define and share their good preservation practice in a way that can be exchanged and automatically run by preservation software and systems. If MVP workflows were to be defined in COW, then PAR provides a way to formalise the steps and allow MVP workflows to be run by systems such as those from Arkivum, Artefactual or Preservica (all founding contributors to PAR), but also independently by anyone else or any other system that supports the open PAR specification and APIs. Whilst COW and PAR provide rich and flexible frameworks for MVP, implementation doesn't have to get too complicated - and if it does then it probably wouldn't be MVP anymore! For example, for many content types, MVP may need to be little more than establishing fixity, identifying file formats and selecting/testing/storing a renderer that can deal with those formats. For AV formats, ffmpeg and VLC immediately come to mind. That's not to say that other content types won't be harder or need more steps, especially complex digital objects where an authentic user experience is a key part of preservation, and iPRES was a source of inspirational work in this area, such as the British Library's work on emerging formats. But maybe there is a place for a simple MVP approach to complex objects too - if it's not easy to find or keep or run a full-on renderer, then simply recording 'a previous rendering of' the content (video, screencams, webrecorder.io etc.) could be an MVP approach to take.
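
To give a flavour of how little the 'straightforward' end of this might involve, here's a minimal sketch of the fixity and format-identification steps, producing a simple manifest that can be kept alongside the content. The 'content' directory, the manifest layout and the use of extension-based identification are illustrative assumptions only - a fuller workflow would call a proper identification tool (Siegfried or DROID, say) and would be expressed and shared as a COW/PAR workflow rather than a standalone script.

```python
# A minimal sketch of the 'establish fixity' and 'identify formats' steps.
# Assumes the content sits in a local 'content' directory; the names and the
# manifest layout are illustrative, not part of any PAR or COW specification.
import hashlib
import json
import mimetypes
from pathlib import Path


def sha256(path: Path) -> str:
    """Establish fixity: a checksum that can be re-verified in the future."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def identify(path: Path) -> str:
    """'Know what you've got': a coarse, extension-based identification.
    A real workflow would use a format identification tool here instead."""
    mime, _ = mimetypes.guess_type(path.name)
    return mime or "application/octet-stream"


def build_manifest(directory: Path) -> list[dict]:
    """One record per file: path, checksum and (coarse) format."""
    return [
        {"file": str(p), "sha256": sha256(p), "format": identify(p)}
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    ]


if __name__ == "__main__":
    # Keep the manifest - and, separately, a copy of the renderer - somewhere safe.
    print(json.dumps(build_manifest(Path("content")), indent=2))
```

Combine something like this with the headless rendering check sketched earlier, plus a safely stored copy of the renderer itself, and the three MVP questions start to be answerable routinely and at scale rather than as a one-off project.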

I thought I'd end this post with a story of preservation work at the BBC. Years ago, when I worked on the PrestoSpace project, I talked to a team at the BBC working on the migration of some of their video tapes (over 1 million of them, to be precise). The statistic that stuck in my head is that when transferring content from their old tapes to a new carrier, if a particular tape didn't play back first time and needed specialist intervention, then the cost was 5X that of a tape that could be transferred easily and automatically. The temptation is to identify the problematic material first and focus efforts on it because it is clearly more 'endangered' and more 'at risk'. But in doing that, the 'good' stuff turns into 'problematic stuff' because of the time it spends at the 'back of the queue', quietly deteriorating from lack of attention. Overall costs spiral, as does content loss. Better is a pragmatic, 'engineering' approach to preservation. Focus on the easy stuff first and stabilise it. Then worry about the 'hard stuff' that needs expensive and specialist intervention. I think there is a strong case for MVP digital preservation to follow a similar approach - a simple and lightweight process that can be applied to the more 'straightforward' end of digital content. 'Stabilise' it from a preservation point of view using a minimal set of activities that guard against it becoming 'much harder' to deal with in the future. Do this automatically and at scale to keep the costs down. And most importantly, do this now. We all know that digital content has a habit of becoming derelict if left unattended and unloved. This is MVP.

I'd be the first to agree that MVP isn't a new concept, but what's interesting for me is how COW and PAR, along with emerging approaches in the software world such as workflow automation, containerisation and virtualisation, could all come together to provide us as a community with the tools needed to put MVP into practice.
