At the turn of the millennium, no clear recommendation for digital preservation metadata existed. Various element sets had been released by different organizations, but each served a different scope and purpose. By 2002 it had become clear that a central Preservation Metadata Framework could significantly benefit the community by giving it a lingua franca, making it possible not only to build interoperable workflows but also to compare preservation processes more easily. This preservation metadata should include, in a rigorously defined yet technically neutral way, all the core pieces of information an institution needs to know about a digital object it wants to preserve for the long term. Out of this need, the international PREMIS (Preservation Metadata: Implementation Strategies) working group was kicked off in June 2003. In 2006 the PREMIS working group was superseded by the PREMIS Editorial Committee, which continues to maintain and promote the standard and to interact with the digital preservation community to this day.
In 2005, version 1.0 of the PREMIS Data Dictionary was released, followed by version 2.0 in March 2008 and version 3.0 in January 2016. While PREMIS is implementation neutral, meaning that it makes no assumptions about the system or technology in which it is used, sample implementations such as the PREMIS OWL Ontology and the PREMIS XML schemas have also been developed and released by the Editorial Committee. In addition, a wealth of supporting information, guidelines, tutorials and workshops has been released over the past 20 years and translated into languages including Japanese, Spanish, German and Czech.
The impact of PREMIS is reflected in the many implementations around the world. It is used by large end-to-end (commercial) systems such as Archivematica, Preservica, RODA and Rosetta, by digital repository solutions such as DSpace and Fedora Commons, and in countless bespoke solutions at institutions of various types and sizes.
The PREMIS data model reflects the lowest common denominator of digital preservation: in order to preserve information for the long term, we need to know who (Agent Entity) has given us the right (Rights Entity) to perform actions for the desired preservation outcome (Event Entity) on a digital object (Object Entity). The Data Dictionary gives more granular insight into the information needed. It describes, for example, that it is helpful to know a file's format, its size, its filename and its significant properties. While this might sound obvious to those dealing with digital preservation day by day, the rigorous descriptions in the Data Dictionary and the relationships it defines between properties make it understandable to software developers with no digital preservation background as well as to digital preservation analysts and to library or information science students wanting to learn more about the field.
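The four entities and their relationships can be illustrated with a small Python sketch. This is only a simplified illustration under stated assumptions, not the PREMIS schema: the class names, field names, and the `is_permitted` helper are hypothetical and chosen for readability, and the actual Data Dictionary defines many more semantic units and its own naming.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Who: a person, organization, or piece of software."""
    identifier: str
    name: str


@dataclass
class DigitalObject:
    """What is preserved, with a few of the properties named in the text."""
    identifier: str
    filename: str
    file_format: str
    size_bytes: int
    significant_properties: List[str] = field(default_factory=list)


@dataclass
class Rights:
    """Which acts an agent has permitted on which object (hypothetical fields)."""
    identifier: str
    granted_by: Agent
    applies_to: DigitalObject
    permitted_acts: List[str]


@dataclass
class Event:
    """An action carried out on an object by an agent."""
    identifier: str
    event_type: str
    performed_by: Agent
    target: DigitalObject


def is_permitted(event: Event, rights: Rights) -> bool:
    """Check that the event targets the covered object and is a permitted act."""
    same_object = event.target.identifier == rights.applies_to.identifier
    return same_object and event.event_type in rights.permitted_acts


# Illustrative data only; none of these values come from a real repository.
archive = Agent("agent-1", "Example Archive")
report = DigitalObject(
    "object-1", "report.pdf", "PDF 1.7", 204800,
    significant_properties=["page count"],
)
rights = Rights("rights-1", archive, report, ["migration", "replication"])
migration = Event("event-1", "migration", archive, report)

print(is_permitted(migration, rights))
```

Keeping the entities as separate records linked by reference mirrors the idea in the text: the same agent or rights statement can relate to many objects and events without being duplicated.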
Since its first version in 2005, the data dictionary has evolved alongside digital preservation knowledge and good practice. Version 2.0, released in 2008, introduced Significant Properties as structured descriptions as well as the concept of the Preservation Level to capture the “intensity” of the preservation process to be applied. This reflects what the digital preservation community was working on at that time: building on the ideas of the CEDARS project, EU-funded projects like PLANETS (Preservation and Long-term Access through Networked Services) and JISC-funded projects like InSPECT (Investigating the Significant Properties of Electronic Content Over Time) demonstrated a shift towards implementable methodologies and tools to capture and codify information about a digital object’s properties. Simultaneously, a wider range of digital preservation tools became available and implemented PREMIS. While all agreed that a common set of core digital preservation metadata is essential, the need for tool-specific or use-case-specific information also became apparent, which resulted in the introduction of the “extension containers” in PREMIS v2.0.
While emulation had played a minor role in projects like PLANETS, it became the key focus in projects from 2009 onwards. The EU-funded project KEEP (Keeping Emulation Environments Portable, 2009-2012) laid the foundation for developments like the DPA 2014 winner bwFLA, which presented a functional networked emulation service. Again, these developments in the digital preservation community were incorporated into the PREMIS Data Dictionary: v3.0, released in 2016, revised the data model to allow software and hardware environments to be described with PREMIS and thus preserved.
Seventeen years after the first release of the PREMIS Data Dictionary, the digital preservation community has grown in both size and experience. We see a strong focus on legal and ethical questions related to digital preservation. How has the mandate for an object changed over the years? How can we model different levels of rights, such as privacy, property and access rights? The Editorial Committee is currently working towards an updated Rights Entity to be released in version 4.0 of the PREMIS Data Dictionary.
Since the PREMIS Working Group was kicked off in 2003, tools have been developed and have vanished, countless projects have been funded and completed, and communities and networks have been founded and have ceased to exist. PREMIS persists: because it is maintained by practitioners for practitioners; because it has grown with the digital preservation community; and because implementable core preservation metadata is key to digital preservation success.