There seems to be a lot of chatter around at the moment, and has been for some time, about how digital preservation should be ‘business as usual’. I like the idea; preservation becoming a core part of business activity. What we do every day. The only thing is I think this approach is wrong.
Somehow the implication (still) seems to be that ‘preservation’ is something that exists outside of our regular business activities. (Let’s put a definition of ‘preservation’ aside for another day.) It’s possible to see preservation as being separate from the routine activities we do, as though it were a separate set of activities to appraisal, acquisition, arrangement and access. It does make for a slick conference paper title: ‘Making Progress in Digital Preservation … from basic steps to business as usual’. (I stole that example from a DPC presentation.) I’d argue that this emphasis places preservation in a context in which it’s possible to carry out all these ‘regular’ activities but to ignore aspects of preservation as something to be tackled later. It positions preservation as an optional activity.
Why preserve digital content? Preservation is a way to maintain access to content over time. We acquire content so as to make it available for our users. Acquire it for the benefit of research. Acquire it to be made available. For the long term. We all want our collections to have a return on our (not insignificant) investment. What we call preservation should be embedded within the whole end-to-end process of life cycle management.
So why is ‘business as usual’ wrong? I’m not a fan of the term ‘digital preservation’; it’s too general and too vague. I prefer to use the phrase ‘life cycle management’ – taken from work done by the Digital Curation Centre (DCC). This model puts flesh on the bones and sets out steps necessary ‘…for successful curation and preservation of data from initial conceptualisation or receipt through the iterative curation cycle.’ Preservation as a continuous process, as a set of ongoing tasks, not all of which are performed against the data, that collectively add up to ongoing access to data.
Let’s start with descriptive metadata. If we look at what we do with our data we can see that a lot of things happen to it: data is created, transferred, copied, arranged, stored, backed up, probed for metadata, opened, manipulated, tested and so on. For life cycle management we need metadata that describes and supports these ‘events’. PREMIS is a way of capturing metadata that supports understanding our collections by creating administrative metadata around these events. This administrative metadata is an essential part of knowing what we hold, understanding the diversity of our content and using this to support the creation of rich chain of custody metadata. We tend to call these events appraisal, acquisition, arrangement and access, and the connection between them and preservation is not always made. At #ONLINE17 Karen Smith Yoshimura of OCLC proposed that MARC is obsolete, as it was designed for description not discovery and has no place on the semantic web. (Taken from a Tweet by Libraryak, @ksaunders0.) MARC does little to support life cycle management and, some argue, little to support exploitation of data.
In her blog Helen Hockx talks about working to preserve a data set and notes: ‘Metadata really is essential. Without the notes, finding aid and scanned codebook, we would not be able to make sense of the dataset.’ If the process of preservation is an ongoing one then so is the collection of administrative metadata. Hockx uses a wide range of metadata sources to create a data set that can be exploited; in other words, to ‘preserve’ it.
There is a more interesting activity going on here: metadata as a collection item. As more administrative metadata is collected, it becomes a key data set not only for collection and life cycle management but also of interest to a wider community of users. The more we treat our digital collections as data, the more administrative metadata supports the exploitation and manipulation of that data, and so better opens it to re-use. Especially so when we create administrative metadata to support a designated community of users.
Using tools such as the Duke University Data Accessioner can place more security around copying data from portable media. These are events that can be recorded as PREMIS metadata and that speak directly to chain of custody. Automating the creation of checksums, and subsequently validating them as the processing of content progresses, is another PREMIS event. The collection of validation metadata answers the question: how do I know that what I think I have is still what I actually have? The history of validation directly supports authenticity.
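To make the checksum idea a little more concrete, here is a minimal sketch in Python. It is not how the Data Accessioner works and it does not produce real PREMIS XML; the function names (`sha256_of`, `fixity_event`) and the dictionary keys are my own illustrative assumptions that only loosely echo PREMIS event semantics. It simply records a checksum once, then re-computes it later and writes down the result as an event.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_event(path: Path, expected: str) -> dict:
    """Re-compute the checksum and describe the outcome as an event record.

    The keys are hypothetical and only loosely modelled on PREMIS events.
    """
    actual = sha256_of(path)
    return {
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "object": str(path),
        "outcome": "pass" if actual == expected else "fail",
        "detail": {"algorithm": "SHA-256", "expected": expected, "actual": actual},
    }
```

In use, the checksum from `sha256_of` would be stored at acquisition, and `fixity_event` would be run at each later processing step. The accumulated list of those event records is the ‘history of validation’.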
Baking in preservation, and not talking about moving to business as usual, shifts the focus from preservation as a resource-hungry and separate activity to one that is just what we do. I’d argue that re-working the processes around appraisal, acquisition, arrangement and access achieves two key goals: it focusses preservation activity as something that is done from the very beginning, and it creates a more attractive proposition for management in that preservation is seen as an activity to directly support long-term access to content.
It’s also a more agile way to approach preservation. If a preservation intervention (a PREMIS event) is triggered by external factors such as perceived format obsolescence, and this is something that can change over time, then activities around appraisal, acquisition, arrangement and access can also be changed in response to the same events. Then not only is preservation baked in from the very beginning, but it also becomes adaptable to external change through the formal processes of appraisal, acquisition, arrangement and access.
I’ve been using PREMIS as an example of a way of creating administrative metadata. By adopting a formal approach to documenting ‘events’ from acquisition onwards we support not only preservation but also a more comprehensive chain of custody. Metadata that records who did what and when leads to better authenticity of the content.
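As a rough illustration of ‘who did what and when’, the sketch below keeps an append-only log of events against an item. It is an assumption-laden toy, not a PREMIS implementation: the `Event` and `CustodyLog` names, the fields and the example identifiers are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    event_type: str   # what was done, e.g. "transfer" or "fixity check"
    agent: str        # who did it (a person or a piece of software)
    object_id: str    # which item it was done to
    outcome: str = "success"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )  # when it was done, in UTC


class CustodyLog:
    """Append-only list of events (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def record(self, event_type: str, agent: str, object_id: str,
               outcome: str = "success") -> Event:
        event = Event(event_type, agent, object_id, outcome)
        self._events.append(event)
        return event

    def history(self, object_id: str) -> list[dict]:
        """Return every recorded event for one item, oldest first."""
        return [asdict(e) for e in self._events if e.object_id == object_id]


log = CustodyLog()
log.record("transfer", "archivist-a", "item-001")
log.record("fixity check", "checksum-script", "item-001")
```

Calling `log.history("item-001")` returns the two events in the order they happened, which is the chain of custody for that item in miniature.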
It’s a key message to senior managers: build preservation into everything that we do with our digital content. It’s a message that has been resisted on the basis of a perceived need for additional resources.
What I’m describing could equally be called ‘business as usual’. What I’m arguing for is a closer integration of ‘preservation’ with the regular activities of appraisal, acquisition, arrangement and access. Preservation is not a separate activity, not something done outside of the routine of appraisal, acquisition, arrangement and access. Baking in preservation to regular activities is an opportunity, and one that represents quite a win because we have to do those activities anyway.
I’ve tried to argue here that we are already doing life cycle management during appraisal, acquisition, arrangement and access. We just don’t call it that or create administrative metadata out of these processes for preservation purposes. I suspect my argument is possibly a case of semantics; of what we say and how we say it getting in the way of a good idea. But however we do preservation it’s not a move towards business as usual. Preservation has always been business as usual.