Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

Introduction

 

In a digital environment, decisions taken regarding creation and selection have significant implications for preservation. The link between access and preservation is far more explicit than for paper and other traditional materials, as access to a digital resource can be lost within a relatively brief period of time if active steps are not taken to maintain (i.e. preserve) it from the beginning. As the interactive Decision Tree indicates, if it is neither feasible nor desirable to preserve a digital resource across various changes in technology, then its acquisition may need to be re-evaluated. While many of the same principles from the traditional preservation environment can usefully be applied, policies and procedures will need to be adapted to the digital environment.

In a print environment, the physical dimensions of an archive mean the reasons to select are relatively well understood, while the decision to preserve can be taken quite separately and within a timeframe which may span several decades. In contrast, digital resources can proliferate and appraisal can be a daunting task. Moreover, they can become inaccessible relatively quickly, so decisions about selection and preservation may need to be taken simultaneously for digital collections.

While this may mean that greater rigour is required in selecting digital resources than for printed or other analogue material, it will avoid costs which would otherwise occur later, as retrospective preservation of digital resources is not recommended.

Accurate documentation is also crucial in the digital environment. This will provide not only essential details for managing the resource over time but also information on context without which there may be little point in preserving the digital object itself even if it is technically feasible to do so. In the accompanying Decision Tree, it is suggested that acquisition be re-evaluated if documentation is inadequate.

In the case of networked digital resources, where providing access to a resource does not necessarily require bringing the resource physically into a collection, the concept of acquisition is quite different from that in traditional collections. There is a range of options available to provide access or to build 'virtual collections': for example, making copies or mirrors for access, providing a hyperlink to a resource, or offering online catalogues and finding aids.

In some cases an institution may be reluctant to take primary preservation responsibility for material if it feels that interest in its preservation is so widely shared that it would constitute an unfair burden on the institution itself. This emphasises the need for collaboration between institutions and the need to establish equitable agreements for shared efforts where necessary. A number of services have emerged in recent years, like the Keepers Registry for electronic journals or the DLF/OCLC Registry of Digital Masters, which allow institutions to declare preservation intent, that is, their commitment to preserve material that may be of general interest.

The accompanying Decision Tree for appraisal and selection is based on the assumption that the resource has not yet been acquired and indicates a number of points at which cost implications will need to be taken into account before the decision to proceed with acquisition. It suggests that, at these points, difficult choices may need to be made about whether the resource justifies the costs or whether it is preferable not to proceed with acquisition.

 

Development of policy and procedure

 

Before embarking on the acquisition and ingest of digital collections it may be necessary to establish whether current policies (e.g. collection development) and procedures are still fit for purpose in the digital age (see Institutional policies and strategies). Depending on the structure and wording of existing documents, this review may result in anything from small changes that extend the scope to include digital objects through to the creation of new policy documents specifically covering digital collections. Additions or alterations to the policy may include descriptions of the types of objects that will be acquired, in relation to format and/or content, as well as addressing other issues such as intellectual property rights, sensitivity and access considerations. It is essential that any changes are ratified by the relevant management committees within your organisation to ensure support and buy-in.

Appraisal/retention policy

The Decision Tree accompanying this section may be used as a tool to construct or test the selection, or appraisal/retention policy for your organisation.

Appraisal of born digital objects should include a measured assessment of their value to the parent organisation set against the challenges of long-term preservation and providing access. These challenges may include an organisation's ability to read or open a version of the master file, the ability to secure sufficient rights to manage and provide access to current and future versions of the file, or simply staffing and funding resources. Organisations should therefore initially focus on a balance between acquisition of high value digital objects and these longer term curatorial obligations. It should be remembered that organisations can provide access to resources that they have accessioned without placing them in specific preservation or retention workflows. A detailed policy document which clearly identifies the most important digital resources (from either a format or content perspective) can give guidance on appraisal of born digital objects destined for such pathways. For lesser value digital acquisitions, which often come bundled with higher value acquisitions, it may be enough to outline the level of access and preservation an organisation will provide to them. This outline should include an indication of a retention schedule suitable to this type of content. This may mean also including a disposal schedule or de-accessioning policy if appropriate (see Retention and review).

Agreements and Guidance for depositors - file formats, required documentation

Once policy has been established there will be a number of additional supporting documents that will be required to facilitate the acquisition and appraisal process. Alongside the standard procedural documents an organisation may wish to create a suite of standard depositor agreements and licences to aid in the negotiation process. These will be particularly useful in ensuring that the minimum permissions and intellectual property rights required for preservation are granted. Without sufficient licence agreements an organisation may find itself in possession of digital collections that it does not hold the rights to actively preserve or provide access to (see Legal compliance). These may also be complemented by guidance notes for depositors that set out requirements for material to be transferred and accompanying documentation.

 

Standards for acquisition and transfer

 

Experience shows that the transfer from a producer to an archive can be tortuous and therefore any tools which can streamline the process are likely to benefit both sides. Two initiatives have attempted to standardise the interface between Producers and Archives into a consistent, well-understood process, cultivating a mutual understanding between producers and archives in regard to their respective roles: the Producer-Archive Interface Methodology Abstract Standard (PAIMAS, ISO 20652:2006); and the Producer-Archive Interface Specification (PAIS, ISO 20104:2015).

PAIMAS provides a standardised description of the interactions between producers and an archive. It segments the transfer process into a number of phases, providing a detailed description of the anticipated outcome of each phase and the actions required to bring about this outcome. The four principal phases (Preliminary Phase, Formal Definition Phase, Transfer Phase, Validation Phase) serve as a basis for identifying areas within the Producer-Archive interface that would benefit from more focused standards, recommendations, and best practices, and also provide a foundation for the development of automated processes and software tools to support the information transfer process. PAIMAS implicitly expands the detailed requirements for Ingest and Administration within the OAIS reference model.

PAIS provides a standard method for formally defining the digital information objects to be transferred by an information Producer to an Archive and for effectively packaging these objects in the form of Submission Information Packages. It is intended to support more precise definitions of the digital objects, helping archives process and validate objects received during submission.

 

Acquisition workflow

 

Negotiation

Negotiating the terms of the deposit should take place before any records have been transferred. Many aspects of the deposit agreement may be covered in the acquisition policy of the organisation, but details about each deposit, especially for local and specialist archives, may be required at a collection level. The depositor should state any limitations on which records can be published and when: for example, whether some material can be opened immediately while other material can only be opened on the death of the depositor or after a set period of time.

A key consideration is the right of the organisation to alter the record for preservation purposes, for example migration to a format that can be preserved in the long term or accessed.

If the transfer includes content that is essential to the understanding of the records but does not constitute a record itself there should be an agreement that the organisation can delete those files when their content has been captured for use elsewhere (for example as metadata for the records).

Transfer

Most institutions will need to develop procedures and documents to support the smooth transfer of digital resources from suppliers into their collections.

When transferring born digital objects into an organisation's IT environment, consideration should be given to how this is to take place so as to ensure the security and completeness of the transfer. For smaller organisations it may be sufficient to have the relevant digital files delivered on a drive or similar hardware and to check their contents against the descriptive file manifest. Alternatively, organisations may wish to transfer digital files using an in-house FTP server, or a paid-for third-party solution (such as a cloud-based file sharing service), to help maintain the chain of custody.

The table below outlines options for transfer and accessioning of digital materials. Decisions on file formats and, if relevant, storage media (see Storage, Legacy media, and File formats and standards) will support and be interdependent with this process.

Options for Transfer and Accessioning of File Formats and Storage Media

Option: Limit range of file formats received; limit range of media received (most cost-effective long-term option)

Issues:

  • Simplifies management and reduces overall costs.
  • Depositor may lack resource or expertise to comply.
  • Wide variety of file formats used and proprietary extensions to open standards.
  • Physical storage media used for transfer may only be temporary carriers and content will be transferred to long-term storage used.

Requirements:

  • Guidelines on preferred file formats.
  • Degree of influence over the deposit.
  • Advocacy and Collaboration strategies to achieve desired outcomes.
  • Guidelines on preferred transfer media and transfer procedures.

Option: Accept file formats as received but convert to standard file format; accept storage media as received but transfer contents to standard storage used

Issues:

  • Simplifies management and reduces longer term costs.
  • May not be technically feasible to convert to standard file format.
  • It will be necessary to check that accidental loss of data has not occurred.

Requirements:

  • Legal compliance, Copyright permissions or statutory preservation rights.
  • Resources and technical expertise at host institution.
  • Election of preferred formats.
  • Documentation of native formats to allow conversion.
  • Integrity checks for conversion process.

Option: Accept and store as received (least cost-effective option long-term, despite lower initial costs)

Issues:

  • Complicates management and increases costs of managing resources over time.
  • High risk option, particularly if large numbers of digital materials are being collected.
  • A choice of file formats may be available. That deposited may not be the most suitable for preservation.
  • Storage media may be of unknown quality and suitability for long-term preservation.
  • Formats may be obsolete or not supported within the institution.

Requirements:

  • Clearly defined priorities for both short and long-term preservation.
  • Ability to address issues such as encryption, proprietary software etc. in received items.
  • Ability to ensure future access to information contained in the item.
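
As an illustration only, the three options above could be applied to an incoming deposit with a simple triage step. The sketch below is in Python and assumes that file extensions are an adequate first indicator of format, which will not always hold. The lists PREFERRED and CONVERTIBLE and the function name triage are hypothetical and would need to reflect an institution's own guidance on preferred formats.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Hypothetical policy values: replace with the institution's own preferred formats.
PREFERRED = {".pdf", ".tif", ".wav", ".csv"}
# Hypothetical mapping from a received format to the standard format it is converted to.
CONVERTIBLE = {".doc": ".pdf", ".bmp": ".tif", ".xls": ".csv"}


def triage(filenames: Iterable[str]) -> Dict[str, List]:
    """Sort received file names into accept / convert / review groups by extension."""
    plan: Dict[str, List] = {"accept": [], "convert": [], "review": []}
    for name in filenames:
        extension = PurePosixPath(name).suffix.lower()
        if extension in PREFERRED:
            # Already in a preferred format: accept as received.
            plan["accept"].append(name)
        elif extension in CONVERTIBLE:
            # Accept as received, but schedule conversion to the standard format.
            plan["convert"].append((name, CONVERTIBLE[extension]))
        else:
            # Unknown or unsupported format: needs a curator's decision.
            plan["review"].append(name)
    return plan


if __name__ == "__main__":
    print(triage(["report.pdf", "letter.doc", "game.exe"]))
```

A triage of this kind only sorts the work. Files in the "review" group correspond to the higher-risk "accept and store as received" option and still need a decision by staff.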

 

Validation

Once transfer has taken place, the files should be located in a secure, quarantined, backed-up environment and a check-in process should be promptly initiated. Having transferred and housed a copy of a born digital collection, an organisation takes on certain legal responsibilities, such as responding to Freedom of Information requests if it is in the public sector. Following this, a letter of acknowledgement of receipt should be sent to the donor. It is important at this early stage that no instruction to destroy the original files is given.

Records should be virus checked at the earliest opportunity to ensure that the material has not been infected with malware or viruses. If any are found the depositing organisation should be alerted and the media either returned to them (if they do not have a copy) or formatted and returned or destroyed according to the depositor's preference. Once the records are confirmed as virus free a check should be carried out to ensure that all the records are present and undamaged. The most reliable method for this is to verify the files against the manifest. Create checksums for the files and compare them against those listed on the manifest pre-transfer. If the checksums match you can be sure that the records have not been corrupted or accidentally altered between the points of transfer from the depositor and arrival at the organisation. If no verifiable manifest was provided with the deposit, it may be impossible to comprehensively verify the integrity of the files and manual viewing of a sample of files may be necessary to provide some indication of completeness and quality. In this case, a verifiable manifest should be generated to enable subsequent fixity checking.
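
The manifest check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: it assumes the depositor's manifest has already been read into a dictionary mapping relative file paths to SHA-256 values, and the function names sha256_of and verify_against_manifest are hypothetical. A depositor may have used a different checksum algorithm, in which case the same algorithm must be used for the comparison.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the files under root with a {relative_path: sha256} manifest."""
    report: Dict[str, List[str]] = {"ok": [], "mismatch": [], "missing": [], "unexpected": []}
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file():
            report["missing"].append(relative_path)      # listed but not received
        elif sha256_of(target) == expected.lower():
            report["ok"].append(relative_path)           # checksum matches
        else:
            report["mismatch"].append(relative_path)     # corrupted or altered
    # Files that arrived but were never listed on the manifest.
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    report["unexpected"] = sorted(on_disk - set(manifest))
    return report
```

A deposit would be treated as complete and intact only if the "mismatch", "missing" and "unexpected" lists are all empty. Anything else is a prompt to go back to the depositor.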

At this stage the records will hopefully have been confirmed as complete (according to the manifest), retaining their integrity (exactly what the depositor supplied) and are virus and malware free. They can now be ingested into the digital preservation system.

Metadata describing the deposited material will assist in ensuring the fixity (see Fixity and checksums) of the material during the transfer process as well as supporting subsequent preservation and access. This might include:

  • A verifiable manifest consisting of a list of the file and folder names and checksums/fixity values for each file
  • The size of the files (with a total volume)
  • A list of the file formats
  • A statement detailing any IPR associated with the records

Where possible the onus of providing information on the IPR of the records should reside with the depositor.
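
Where a depositor cannot supply such a manifest, or where one has to be generated after receipt, the technical elements listed above (file names, checksums, sizes and formats) could be gathered along the following lines. This Python sketch is illustrative only. The function names build_manifest and summarise are hypothetical, SHA-256 is an assumed choice of algorithm, and the file extension is used as a rough stand-in for format, whereas a dedicated format identification tool would give more reliable results. The IPR statement cannot be derived from the files and still has to come from the depositor.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> List[Dict[str, object]]:
    """List every file under root with its relative path, size, SHA-256 and extension."""
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    entries: List[Dict[str, object]] = []
    for path in files:
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": digest.hexdigest(),
            # Extension only: a rough proxy for format, not a true identification.
            "extension": path.suffix.lower() or "(none)",
        })
    return entries


def summarise(entries: List[Dict[str, object]]) -> Dict[str, object]:
    """Return the file count, total volume in bytes and the distinct extensions seen."""
    return {
        "file_count": len(entries),
        "total_bytes": sum(entry["size_bytes"] for entry in entries),
        "extensions": sorted({entry["extension"] for entry in entries}),
    }
```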

Ingest Process

The period between transfer to the organisation and ingest into the organisation's repository or digital preservation environment may be substantial. This accession phase can be especially prolonged for large born digital collections, sometimes amounting to years, but it is during this phase that a qualitative appraisal of the objects can be made. Items are examined, their technical metadata harvested, their descriptive metadata enhanced and the general accession processes of the organisation applicable to any object take over.

It is during this sometimes prolonged appraisal period that items can be reconsidered for ingest, or rejected if on examination it is felt they do not meet the acquisition or collection profile of the organisation, the file format specification laid out in the guidance documents, or for any other reason. A moratorium may be imposed on items of particular sensitivity such as personal information, commercially sensitive information, or items that break libel laws for instance. In such cases it is important to clearly specify the closure period of the file.

Ingest Procedures to prepare data and documentation for storage and preservation

Unique numbering

Each digital resource accessioned by an institution should be allocated a unique identifier. This number will identify the resource in the Institution's catalogue and be used to locate or identify physical media and documentation. In the event of a resource being de-accessioned for any reason, this unique number should not be re-allocated. See Persistent identifiers for advice if you use a persistent identifier scheme.
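
The rule that a de-accessioned number is never re-allocated can be shown with a small Python sketch. It is an assumption-laden illustration rather than a recommended system: the class name AccessionRegister, the "ACC-000001" identifier pattern and the in-memory storage are all hypothetical, and a real register would be held in the institution's catalogue or collection management database.

```python
from typing import Dict, Optional


class AccessionRegister:
    """Allocate sequential identifiers that are never handed out twice."""

    def __init__(self, prefix: str = "ACC") -> None:
        self._prefix = prefix
        self._next_number = 1
        # A value of None marks a de-accessioned item whose identifier stays reserved.
        self._records: Dict[str, Optional[str]] = {}

    def allocate(self, title: str) -> str:
        """Assign the next unused identifier to a newly accessioned resource."""
        identifier = f"{self._prefix}-{self._next_number:06d}"
        self._next_number += 1  # the counter only ever moves forward
        self._records[identifier] = title
        return identifier

    def deaccession(self, identifier: str) -> None:
        """Retire an item but keep its identifier on record so it is not re-used."""
        if identifier not in self._records:
            raise KeyError(f"unknown identifier: {identifier}")
        self._records[identifier] = None

    def is_active(self, identifier: str) -> bool:
        """Return True if the identifier refers to an item still in the collection."""
        return self._records.get(identifier) is not None
```

Keeping a retired entry in the register, rather than deleting it, is what prevents the number from being issued again.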

Handling and Storage transfer guidelines

Handling and transfer guidelines for accessioning staff should be developed reflecting IT and preservation staff advice on best practice for different storage media and file transfer to long-term storage systems (see Legacy media, Digital forensics, and Storage).

Re-formatting file formats

Where the file formats used to transfer the resource are unsuitable for long-term preservation, the Institution may re-format the resource into its preferred file formats. In addition to archive formats, versions in other formats suitable for delivery to users may also be produced from the original (see File formats and standards, and Storage).

Copying

Multiple backup copies of an item may be generated during accessioning as part of institutions' storage and preservation policy and to enable disaster recovery procedures (see Storage).
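
When copies are made, it is prudent to confirm that each copy is identical to the source. The Python sketch below shows one way this could be done using checksums. The function names sha256_of and copy_with_fixity_check are hypothetical, and the destination directories stand in for whatever storage locations the institution's policy specifies.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity_check(source: Path, destinations: List[Path]) -> str:
    """Copy source into each destination directory and verify every copy."""
    expected = sha256_of(source)
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = directory / source.name
        shutil.copy2(source, copy_path)  # copy2 also tries to preserve timestamps
        if sha256_of(copy_path) != expected:
            raise OSError(f"copy at {copy_path} does not match the source")
    return expected  # record this value for later fixity checks
```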

Security

System and physical security policies and procedures should be in place to ensure the care and integrity of items during accessioning. These should be developed from and reflect the institutional policies and procedures on security (see Information security).

Edition and version control

Procedures should be established for updating and edition control of any dynamic digital materials accessioned (e.g. annual snapshots of databases which are regularly being updated) and for version control of accessioned items where appropriate (e.g. items accessioned in different formats or for which different formats for preservation and access have been generated).

Cataloguing and documentation standards

Metadata and documentation received or created during transfer, validation and ingest are essential in order to exchange information and documents effectively between platforms and individuals. At a minimum, they should provide information about an item's provenance and administrative history (including any data processing involved since its creation), content, structure, and about the terms and conditions attached to its subsequent management and use, including IPR and the period over which they pertain (see Metadata and documentation). They should be sufficiently detailed to support:

  • Resource discovery (e.g. the location of a resource which is at least briefly described along with many other resources).
  • Resource evaluation (e.g. the process by which a user determines whether s/he requires access to that resource).
  • Resource ordering (e.g. that information which instructs a user about the terms and conditions attached to a resource and the processes or other means by which access to that resource may be acquired).
  • Resource use (e.g. that information which may be required by a user in order to access the resource's information content).
  • Resource management (e.g. administrative information essential to a resource's management and preservation as part of a broader collection and including information about location, version control, etc).

Processing times

Ideally targets should be set and monitored for the maximum time between acquisition and cataloguing to prevent backlogs of unprocessed and potentially at risk materials developing during the accessioning process.
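
Monitoring against such a target can be as simple as listing the items that have waited too long. The following Python sketch is one hypothetical way to do this. The function name overdue_items and the default target of 90 days are assumptions for illustration only, and each organisation would set its own target.

```python
from datetime import date
from typing import Dict, List, Tuple


def overdue_items(
    acquired: Dict[str, date],
    catalogued: Dict[str, date],
    today: date,
    target_days: int = 90,  # hypothetical target, not a recommended value
) -> List[Tuple[str, int]]:
    """Return (identifier, days waiting) for uncatalogued items past the target."""
    overdue: List[Tuple[str, int]] = []
    for identifier, acquired_on in acquired.items():
        if identifier in catalogued:
            continue  # already processed, so not part of the backlog
        waiting = (today - acquired_on).days
        if waiting > target_days:
            overdue.append((identifier, waiting))
    # Longest-waiting items first, as these are potentially most at risk.
    return sorted(overdue, key=lambda item: item[1], reverse=True)
```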

 

Skills, resources and capacity

Organisations should consider whether they have sufficient technical and staffing resources to acquire digital collections. This information may however not be apparent at the outset of an acquisition, as the various challenges of curation of specific digital collections may only reveal themselves over time. Organisations should therefore plan for knowledge, skill and staffing gaps and where possible address these through training, recruitment or engagement of specialist professional digital curation services. Where funding cannot cover these, a dedicated in-house knowledge building drive may often suffice for the interim (see Staff training and development, and Procurement and third party services).

Costs of acquisition and ingest

Trying to establish indicative costs for digital preservation activities is always problematic. These should not just include storage (the most obvious) but should also look at the cost of the staff time required to manage the accession and ingest of each born digital object, a process which can mirror time-wise the accession pathway of physical artefacts. Other anticipated costs might include curation processes like normalisation, analysis, enrichment of metadata, increased robustness of storage, disaster recovery etc.

Although an organisation should find best value solutions for these lifecycle costs, it should be recognised that the investment needed to provide a robust preservation pathway that can safeguard our digital heritage may be significant, and that certain processes must be adhered to irrespective of the nature of an accessioned object. Organisations should therefore bear this in mind when acquiring born digital collections (see Business cases, benefits, costs, and impact).

 

Summary of recommendations

 

Acquisition and appraisal - recommendations checklist

Agreements and Guidance for depositors

  • Create a suite of standard depositor agreements and licences
  • Create appropriate guidance for depositors

Transfer procedures

  • Provide documentation to guide and support transfer of digital materials from suppliers
  • Decide how your transfer procedures can best be developed to support your storage and preservation policies

Validation procedures

  • Check media, content, and structure

Procedures to prepare data and documentation for storage and preservation

  • Unique numbering of each item accessioned
  • Handling and storage transfer guidelines for different media
  • Re-formatting of file formats if required according to agreed guidelines
  • Generating multiple copies of an item as part of an institution's storage and preservation policy
  • System and physical security policy and procedures for items during accessioning

Procedures for cataloguing and documentation

  • A minimum standard of information required for cataloguing including IPR information
  • Guidelines for retrospective documentation or catalogue enhancement
  • Procedures for updating, and managing versions or editions of an item
  • Procedures to update collection management databases
  • Selection of cataloguing and documentation standards
  • Targets for accessioning tasks and timescales for their completion

Review of procedures

  • Guidelines and schedules should ideally be reviewed annually, or as often as is practical to keep pace with an organisation's developing requirements and collections development policies

Staff training

  • Plan for knowledge, skill and staffing gaps and where possible address these through training, recruitment or engagement of specialist third-party services

Costs

  • Evaluate and plan for lifecycle costs of acquisitions

 

Resources

ISO 20104:2015 Space data and information transfer systems -- Producer-Archive Interface Specification (PAIS)

CCSDS 651.1-B-1, Producer-Archive Interface Specification (PAIS), Recommended Standard, Blue Book, February 2014

http://public.ccsds.org/publications/archive/651x1b1.pdf

The Blue Book is a free to access pre-print of ISO 20104:2015. The PAIS standard aims to provide a standard method for formally defining the digital information objects to be transferred by an information Producer to an Archive and for effectively packaging these objects in the form of Submission Information Packages (SIPs). This supports effective transfer and validation of SIP data (104 pages).

What is appraisal?

http://www.nationalarchives.gov.uk/documents/information-management/what-is-appraisal.pdf

This guidance from The National Archives applies to UK public records in any format, including paper, digital, audio, film or model format as defined by the Public Records Act 1958, and all organisations responsible for such records (2013, 7 pages).

Preserving eBooks, DPC Technology Watch Report 14-01 July 2014

http://dx.doi.org/10.7207/twr14-01

This report discusses current developments and issues with which public, national, and higher-education libraries, publishers, aggregators, and preservation institutions must contend to ensure long-term access to eBook content and which affect acquisition as well as preservation (31 pages).

Preservation, Trust and Continuing Access for e-Journals, DPC Technology Watch Report 13-04 September 2013

http://dx.doi.org/10.7207/twr13-04

This report discusses current developments and issues which libraries, publishers, intermediaries and service providers are facing in the area of digital preservation, trust and continuing access for e-journals. It is not solely focused on technology, and covers relevant legal, economic and service issues in acquiring access to networked digital resources and the unique preservation challenges this presents (43 pages).

The UNESCO/PERSIST Guidelines for the selection of digital heritage for long-term preservation

https://www.unesco.nl/sites/default/files/dossier/persistcontentguidelinesfinal1march2016.pdf?download=1

The UNESCO/PERSIST (Platform to Enhance the Sustainability of the Information Society Transglobally) Project released these Guidelines on the selection of digital heritage for long-term preservation in March 2016. The aim of the Guidelines is to provide an overarching starting point for libraries, archives, museums and other heritage institutions when drafting their own policies on the selection of digital heritage for long-term sustainable digital preservation (19 pages).

Community Owned digital Preservation Tool Registry (COPTR)

http://coptr.digipres.org/Main_Page

COPTR describes tools useful for long term digital preservation and acts primarily as a finding and evaluation tool to help practitioners find the tools they need to preserve digital data. COPTR captures basic, factual details about a tool, what it does, how to find more information (relevant URLs) and references to user experiences with the tool. The scope is a broad interpretation of the term "digital preservation". In other words, if a tool is useful in performing a digital preservation function such as those described in the OAIS model or the DCC lifecycle model, then it's within scope of this registry. You can use the POWRR Tool Grid to see which technical tools in COPTR can help to support acquisition, ingest, or multiple functions.

DLF/OCLC Registry of Digital Masters

http://www.diglib.org/community/groups/rdm/

The DLF/OCLC Registry of Digital Masters provides a central place for library staff to search for, and find, digitally preserved materials. Typical items include digitised monographs and serials. Registration indicates that the object (already digitised or soon to be digitised) follows established standards and best practices for digitisation and that the institution that digitised it has made a commitment to digital preservation of this object.

Keepers Registry

http://thekeepers.org

The Keepers Registry acts as a global monitor on the archiving arrangements for electronic journals. It has three main purposes: to enable librarians and policy makers to find out who is looking after which e-journal, how and with what terms of access; to highlight the e-journals which are still "at risk of loss"; and to showcase the organisations (the keepers) which act as digital shelves for access over the long term. It has a Title List Comparison feature to help you discover the archival status of a list of serial titles important to you, reporting those which are being archived and those which are "at risk".

MediaRIVERS (Media Research and Instructional Value Evaluation and Ranking System)

https://github.com/IUMDPI/MediaSCORE

Software created by Indiana University in collaboration with AVPreserve guides a structured assessment of research and instructional value for media holdings. The free, open source version requires installation and configuration on a server, and a hosted application is available on a monthly subscription basis.

Practical E-Records: software and tools for archivists

http://e-records.chrisprom.com/

Pages created by Chris Prom for Transfer Guidelines, E-Records Deposit Policy, and Submission Agreement Form provide sample templates that you can modify and/or provide to record producers whose records your repository wishes to accession. Permission to modify and republish these transfer guidelines is provided under a Creative Commons Attribution 3.0 United States License.

Archaeology Data Service Guidelines for Depositors

http://archaeologydataservice.ac.uk/advice/guidelinesForDepositors

The ADS Guidelines for Depositors provide guidance on how to correctly prepare data and compile metadata for deposition with ADS and describe the ways in which data can be deposited. There is also a series of shorter summary worksheets and checklists covering: data management; selection and retention; preferred file formats and metadata. Other resources for the use of potential depositors include a series of Guides to Good Practice, which complement the ADS Guidelines and provide more detailed information on specific data types.

Selecting and transferring records

http://www.nationalarchives.gov.uk/information-management/manage-information/selection-and-transfer/

These pages provide guidance on the selection and transfer of records. UK bodies transferring records to The National Archives or to places of deposit under the Public Records Act 1958 should follow this process for records in all formats and media, including paper and digital records. It consists of guidance on six steps:

Step 1: Appraising your records

Step 2: Selecting your records

Step 3: Sensitivity reviews of selected records

Step 4: Cataloguing and preparation of records

Step 5: Planning and arranging delivery of records

Step 6: Accessioning your records

The Work of Appraisal in the Age of Digital Reproduction

http://archival-integration.blogspot.co.uk/2015/06/the-work-of-appraisal-in-age-of-digital.html#pii

The Bentley Historical Library's ArchivesSpace-Archivematica-DSpace Workflow Integration project discussion highlights current digital archives appraisal techniques employed by the Bentley, many of which they are hoping to integrate into Archivematica (June 2015).

Acquisition & management of digital collections at the Library of Congress

http://www.slideshare.net/NASIG/acquisition-management-of-digital-collections-at-the-library-of-congress-34244613

The Library of Congress, as the national library and the home of the US Copyright Office, is heavily involved in digital acquisition and management. This concise and informative PowerPoint presentation by Ted Westervelt shares the experiences that the Library of Congress has had and the lessons it has learned (2014, 30 slides).

Trust Me, I'm an Archivist: Experiences with Digital Donors

http://www.ariadne.ac.uk/issue65/hilton-et-al

This 2010 article by staff at the Wellcome Trust Library discusses four common scenarios that seem to act as new blocks to the transfer of digital material: Lack of Long-term Planning; IT vs Records Management; Duplication and Abundance; and The Fear of Digital. It concludes that we need to change the way we present information, how we work with digital material and how we can support and assist our donors. The degree of engagement that is standard practice with paper records will not suffice for born-digital material: our interaction with depositors will ideally be even closer and even more frequent, as we help them deal not merely with new technical challenges but with the plethora of soft-skills issues, of preconceptions and of attachments that surround them.