The Oxford Common File Layout (OCFL) initiative began as a discussion among digital repository practitioners about the ideal layout and characteristics for a repository’s persisted objects. The need for a common layout grew out of a lack of commonality in the way objects are stored by repositories. It has since grown into an open community effort defining an application-independent way of storing versioned digital objects with a focus on long-term digital preservation. The OCFL represents a specification of the community’s collective recommendations, addressing five primary requirements:

  1. Completeness, so that a repository can be rebuilt from the files it stores,
  2. Parsability, both by humans and machines, to ensure content can be understood in the absence of original software,
  3. Robustness against errors, corruption, and migration between storage technologies,
  4. Versioning, so that repositories can update objects while preserving the history of each object, and
  5. The ability to store content on various storage infrastructures, including cloud object stores.

Over the last two decades, the repository community has struggled with the reality of supporting and maintaining repository software. Over time, repository software has changed significantly, either through normal upgrades and changes to the underlying technology, or because strategic decisions are made to choose a new repository solution. Either way, institutions have struggled with how to make sure that their preserved content outlasts the evolving technology.

[Image: OCFL, 20190614. Credit: Simeon Warner]

Experience with two previous formats serves as the basis for OCFL. The first is the Library of Congress "BagIt" format, which is a robust and widely used format for transporting digital objects from one system or location to another. Some repository systems have used BagIt as a storage format for archival information packages (AIPs), but that was not its goal and, given the focus on transport, it rightly does not support versioning. BagIt nevertheless demonstrates good approaches to completeness, parsability, and robustness.

The second format is the "Moab" design for managing the lifecycle of a digital object within a repository. Moab was developed for, and is in use in, the Stanford Digital Repository but has not been adopted elsewhere. Most importantly, Moab added a mechanism to efficiently create new versions of objects while retaining the immutability of existing versions. The approach is efficient in that only files that have changed or been added need to be stored in the new version. Other parts of Moab have proved less well suited as storage has evolved from local disk and tape to include remote cloud object stores. Notably, the large number of small administrative files that Moab uses has proved cumbersome.
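The versioning idea described above, where a new version stores only files that have changed or been added, can be sketched in a few lines of Python. This is an illustration only and not Moab's or OCFL's actual on-disk format: the function name `add_version`, the in-memory dictionaries, and the `vN/content/` path pattern are assumptions made for the example.

```python
import hashlib


def add_version(manifest, versions, files):
    """Record a new version, storing only content not already held.

    manifest: dict mapping content digest -> stored path (hypothetical layout)
    versions: list of dicts mapping logical path -> digest, one per version
    files:    dict mapping logical path -> bytes for the complete new version
    Returns the list of paths newly stored for this version.
    """
    version_name = f"v{len(versions) + 1}"
    state = {}
    newly_stored = []
    for logical_path, data in sorted(files.items()):
        digest = hashlib.sha512(data).hexdigest()
        if digest not in manifest:
            # Content not seen in any earlier version: store it once.
            stored_path = f"{version_name}/content/{logical_path}"
            manifest[digest] = stored_path
            newly_stored.append(stored_path)
        # Every version records its full state by reference to digests.
        state[logical_path] = digest
    versions.append(state)
    return newly_stored


manifest, versions = {}, []
first = add_version(manifest, versions, {"a.txt": b"alpha", "b.txt": b"beta"})
second = add_version(manifest, versions, {"a.txt": b"alpha", "b.txt": b"beta 2"})
print(first)   # both files are new in the first version
print(second)  # only the changed file is stored in the second version
```

Earlier entries in `versions` are never modified, which mirrors the immutability of existing versions: the second version simply points back at content already stored for the first.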

The OCFL Editorial Group was formed in March 2018, made up of participants from Cornell, Stanford, DuraSpace (now LYRASIS), Oxford, and Emory. Over the following six months the Editorial Group worked with the community to identify use cases and scope the initial release of the specification. Work after that focused on drafting the specification, creating test data, and creating validation software. There have been bi-weekly Editorial Group calls and monthly community calls. Version 1.0 of the specification (https://ocfl.io/1.0/spec/) was released in July 2020 along with implementation notes (https://ocfl.io/1.0/implementation-notes/).

[Image: OCFL editors, Oxford, 20180907. Credit: Simeon Warner]

The OCFL specifications describe two aspects of content storage. The first is the storage of a single object, a group of one or more content files and administrative information that are identified together. The object may contain a sequence of versions of the files that represent the evolution of the object's contents. The second is an approach for the arrangement of OCFL objects under an OCFL Storage Root. A key goal of the OCFL is the rebuildability of a repository from an OCFL Storage Root without additional information resources. Consequently, a key implementation consideration should be to ensure that OCFL Objects contain all the data and metadata required to achieve this. With reference to the OAIS model, this would include all the descriptive, administrative, structural, representation and preservation metadata relevant to the object.
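The rebuildability goal can be illustrated with a minimal Python sketch that rediscovers objects by walking a storage root and reading each object's own inventory, with no external database. It assumes the OCFL 1.0 conventions of an object declaration file named `0=ocfl_object_1.0` and an `inventory.json` with `id` and `head` fields in each object root. The function name `find_objects`, the directory names, and the identifier `urn:example:1` are hypothetical, and the demo inventory is a stripped-down stand-in rather than a valid OCFL inventory.

```python
import json
import os
import tempfile

# Name of the object conformance declaration file in OCFL 1.0.
OBJECT_MARKER = "0=ocfl_object_1.0"


def find_objects(storage_root):
    """Yield (object_id, head_version, path) for each object under storage_root."""
    for dirpath, dirnames, filenames in os.walk(storage_root):
        if OBJECT_MARKER in filenames:
            inventory_path = os.path.join(dirpath, "inventory.json")
            with open(inventory_path, encoding="utf-8") as fh:
                inventory = json.load(fh)
            yield inventory["id"], inventory["head"], dirpath
            # Do not descend into the object's own version directories.
            dirnames[:] = []


# Demo: build a throwaway storage root holding one simplified object.
with tempfile.TemporaryDirectory() as root:
    obj_dir = os.path.join(root, "ab", "cd", "example-object")
    os.makedirs(os.path.join(obj_dir, "v1", "content"))
    # Simplification: a real declaration file is not empty.
    open(os.path.join(obj_dir, OBJECT_MARKER), "w").close()
    with open(os.path.join(obj_dir, "inventory.json"), "w", encoding="utf-8") as fh:
        json.dump({"id": "urn:example:1", "head": "v1"}, fh)
    found = [(oid, head) for oid, head, _ in find_objects(root)]

print(found)
```

A real implementation would also validate each inventory and its digests. The point here is only that the identifier and current version of every object are recoverable from the files themselves.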

Although only just released, the OCFL specification is being adopted as the basis for the Fedora 6 repository system, and other implementations are planned at a number of institutions. All information and feedback are tracked openly on GitHub (https://github.com/ocfl/spec), and there are ongoing regular community calls.

