Organisational Activities - Storage and Preservation

Maintaining access to digital resources over the long term involves interdependent strategies: preservation in the short to medium term, based on safeguarding storage media, content and documentation, and computer software and hardware; and long-term preservation strategies that address software and hardware obsolescence. This section is therefore divided into two parts: the first deals with storage and maintenance of digital resources, and the second with strategies for their long-term preservation.

A preservation strategy for digital resources is most effective if it addresses the full life-cycle of the resource, allowing the greatest efficiencies between data creation, preservation and use. This section should therefore be read in conjunction with related sections and chapters, particularly the other sections of this chapter and Media and Formats.

Storage of digital resources supports both access and preservation. Depending on the needs of the organisation and the media, it may be necessary to create both preservation and access copies and to have strategies for each. We have used the term "digital preservation" in this handbook to define all the activities employed to ensure continued access to digital resources which have retained properties of authenticity, integrity and functionality. The term "archiving" can be substituted for preservation provided this definition remains. Within the computing industry, archiving is usually interpreted simply to indicate that something has been stored and is no longer immediately accessible. The richer interpretation used here means that more thought and preparation will need to be given to what resources are stored, how they are maintained, and how and by whom they are subsequently accessed.

There is no single definitive solution which can be applied for the preservation of any digital resource. However, an approach based on good management practices, begun as early as possible in the lifecycle of a resource, will safeguard the initial investment and facilitate authorised access at least for the short to medium term. Preventive preservation is as crucial a strategy in preservation programmes for digital resources as it is for non-digital material, and good storage practice plays a major role in both. The key initial decisions for institutions taking responsibility for short- or long-term preservation of digital resources are:

1) Whether storage and/or preservation will be undertaken by the host institution or under contract to a trusted third party (see Third Party Services for discussion of issues relating to whether or not to outsource);
2) Which resources justify preservation and for what period of time.

The assumption in 2) is that not all resources can or need to be preserved forever: some will not need to be preserved at all, others will need to be preserved only for a defined period of time, and a relatively small sub-set will need to be preserved indefinitely. Making this decision as early as possible will help to conserve resources for the most valuable digital assets.

This section deals with the range of strategies and approaches which will help to ensure important digital resources do not become inaccessible prematurely. Many constitute a relatively modest investment compared to the initial costs of creating the resource, which are often substantial. They can also represent significant cost savings in the longer term. In any event, failure to commit resources to managing digital resources throughout their lifecycle will inevitably result in their loss and/or costly restoration, so investment in strategies to prevent this is eminently justified.

4.3.1 Storage and Maintenance

Storage media and file formats

General advice on storage media and file formats is provided in Media and Formats. Policy and selection of storage media and file formats will have implications for institutional strategies such as outreach and development of standards and best practice guidelines (see Outreach and Standards and Best Practice Guidelines) and for accessioning (see Acquisition and Appraisal). Decisions will need to be made during accessioning on whether to store resources as received or to reformat. A table outlining options, issues and requirements to assist with this decision process is provided in Accessioning.

Management of media and systems

Media refreshing and reformatting

Rationale

Media refreshing and reformatting are an essential management component for all digital media, both to avoid media degradation and to facilitate longer-term preservation strategies.

Requirements

Disaster recovery planning

Rationale

The development and use of a disaster recovery plan based on sound principles, endorsed by senior management, and able to be activated by trained staff will greatly reduce the severity of the impact of disasters and incidents.

"The assumption is that with good disaster planning data recovery will be, under most circumstances, unnecessary. The problem is that while attention has been paid to disaster planning and the identification of good recovery procedures the effectiveness of these tend to depend upon pre-disaster effort. This effort often never takes place." (Waters and Garrett 1996)

Requirements

Case study - disaster recovery procedures - Data Archive, University of Essex

The Data Archive is the UK national data centre for the Social Sciences, funded by the Economic and Social Research Council (ESRC) and the Joint Information Systems Committee (JISC). The Data Archive has over 4000 mainstream digital datasets or studies, comprising over 125,000 individual files.

The digital storage system at the Data Archive is based on a Hierarchical Storage Management System (HSM) where the files appear to be local to the user but are mainly based on tape. As each file is requested it is either brought back from the disk cache on the system or automatically "restored" from the required tape. Any subsequent requests for that file are returned from disk cache.
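The retrieval behaviour described above can be illustrated with a minimal Python sketch. It is a toy model only, not the Data Archive's actual system: the class name, method names and file names are hypothetical, and real tape handling is reduced to a dictionary lookup.

```python
class HierarchicalStore:
    """Toy model of an HSM: a disk cache in front of tape storage."""

    def __init__(self, tape_files):
        self.tape = dict(tape_files)   # file name -> bytes held on tape
        self.disk_cache = {}           # file name -> bytes staged on disk
        self.tape_restores = 0         # number of times tape had to be read

    def fetch(self, name):
        """Return (data, source), where source is 'disk' or 'tape'."""
        if name in self.disk_cache:
            return self.disk_cache[name], "disk"
        data = self.tape[name]         # raises KeyError for unknown files
        self.tape_restores += 1
        self.disk_cache[name] = data   # later requests are served from disk
        return data, "tape"


store = HierarchicalStore({"study_0001.dat": b"survey data"})
first = store.fetch("study_0001.dat")    # restored from tape
second = store.fetch("study_0001.dat")   # served from the disk cache
```

To the user both requests look the same; only the source of the bytes differs, which is the property the case study describes.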

Disaster recovery at the Data Archive is based around the resilience provided by creating multiple copies of the data and specified recovery procedures. Each file from any dataset has at least four copies and these are as follows:

Main copy: This copy is held on the main area on the HSM file system.

Shadow copy: At least one shadow copy is made. As files are updated, they are "shadowed" onto a separate tape in the main system. Multiple versions of these files are kept to allow staff to go back to a previous version of a file.

CD-ROM copy: A CD-ROM is created for each dataset as part of the preservation procedure. This allows staff to access an alternate local source in the case of downtime of the main system and serves as an alternative long-term storage medium. For each study all of the files are compressed, stored as a single zip file and written on to a CD-ROM. Subsequent updates to this study are created as complete zip files (xxxx_2.zip) and appended to the existing CD-ROM for that study.

Off-site near-line copy: An off-site, near-line copy is kept in case of a major disaster at Essex. Due to restrictions on small file sizes on these systems, these copies are kept as groups of datasets which have been bundled together, compressed and encrypted.

Disasters can occur in different forms and at varying levels. The Data Archive has in place a range of recovery measures designed to meet any conceivable disaster.

Source: The Data Archive. Systems and Preservation Procedures (1999, unpublished), reproduced with the kind permission of the Data Archive.

Environmental conditions

Rationale

Appropriate environmental conditions will increase the longevity of digital storage media and help prevent accidental damage to a data resource or its documentation.

Requirement

The following figure summarises British Standard 4783.

Figure 6

Summary of Environmental Conditions Recommended in BS 4783 for Data Storage Media

| Device | Operating | Non-operating | Long-term storage |
|---|---|---|---|
| Magnetic tape cassettes 12.7mm | 18 to 24°C, 45 to 55% RH | 5 to 32°C, 5 to 80% RH | 18 to 22°C, 35 to 45% RH |
| Magnetic tape cartridges | 10 to 45°C, 20 to 80% RH | 5 to 45°C, 20 to 80% RH | 18 to 22°C, 35 to 45% RH |
| Magnetic tape 4 & 8mm helical scan | 5 to 45°C, 20 to 80% RH | 5 to 45°C, 20 to 80% RH | 5 to 32°C, 20 to 60% RH |
| CD-ROM | 10 to 50°C, 10 to 80% RH | -10 to 50°C, 5 to 90% RH | 18 to 22°C, 35 to 45% RH |

Extracts from BS 4783 reproduced with the permission of the British Standards Institution under licence number 2001/SK0280
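As an illustration of how the long-term storage ranges in Figure 6 might be used for routine monitoring, the following Python sketch checks a temperature and relative humidity reading against the range for a given medium. The figures are taken from Figure 6; the dictionary keys, function name and message wording are assumptions made for this example, and the standard itself should be consulted before relying on the values.

```python
# Long-term storage ranges from Figure 6: (min, max) temperature in °C
# and (min, max) relative humidity in %. Keys are illustrative labels.
LONG_TERM_STORAGE = {
    "magnetic tape cassette 12.7mm": ((18, 22), (35, 45)),
    "magnetic tape cartridge": ((18, 22), (35, 45)),
    "magnetic tape 4 & 8mm helical scan": ((5, 32), (20, 60)),
    "cd-rom": ((18, 22), (35, 45)),
}


def check_storage_conditions(device, temp_c, rh_percent):
    """Return a list of problems; an empty list means within range."""
    (t_lo, t_hi), (rh_lo, rh_hi) = LONG_TERM_STORAGE[device.lower()]
    problems = []
    if not t_lo <= temp_c <= t_hi:
        problems.append(f"temperature {temp_c}°C outside {t_lo} to {t_hi}°C")
    if not rh_lo <= rh_percent <= rh_hi:
        problems.append(f"humidity {rh_percent}% RH outside {rh_lo} to {rh_hi}% RH")
    return problems


# A store room at 25°C and 40% RH is too warm for CD-ROMs held long term,
# but within range for 4 & 8mm helical scan tape.
cd_problems = check_storage_conditions("CD-ROM", 25, 40)
tape_problems = check_storage_conditions("Magnetic tape 4 & 8mm helical scan", 25, 40)
```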

Care and handling

Rationale

Appropriate care and handling will protect fragile digital media from damage.

Requirements

Audit

Rationale

Assurance is needed that the resource has not been inadvertently or deliberately changed following refreshment and/or migration procedures. Auditing also checks the readability and integrity of the data over time.
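One common way of providing this kind of assurance is to record a checksum for every file and to recompute and compare the checksums after each refresh or at regular intervals. The Python sketch below shows the idea under stated assumptions: SHA-256 is chosen for illustration only, and the function names and report layout are hypothetical rather than drawn from any particular repository system.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root, manifest):
    """Compare the files under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name for name in manifest
            if name in current and manifest[name] != current[name]
        ),
    }
```

A manifest built when a resource is accessioned can be stored with its documentation; an audit that reports nothing missing, unexpected or changed indicates that the files are still readable and bit-for-bit identical to those recorded.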

Requirements

Security

Rationale

Rigorous security procedures will a) ensure compliance with any legal and regulatory requirements; b) protect digital resources from inadvertent or deliberate changes; c) provide an audit trail to satisfy accountability requirements; d) act as a deterrent to potential internal security breaches; e) protect the authenticity of digital resources; f) safeguard against theft or loss.

It is important to note that not all digital resources will require identical levels of security. Some, for example commercial in-confidence, will require more rigorous security regimes than less sensitive material. Guidance on levels of security can be found in BS 7799 Information Security Management (Tanner and Lomax-Smith 1999). All personal data will need to conform with the requirements of the Data Protection Act (1998) (PRO 1999).

Requirements

Management of computer storage

Rationale

Unlike storage space for physical collections, computer storage is both reducing in cost and increasing in capacity all the time. Costs for processor capacity and storage media are expected to continue to drop (at least halving every 18 months, according to Moore's Law) for several years to come (Kenney and Chapman 1996). However, while storage is much less of a problem than it was, it is good practice to establish policies and procedures which clarify what digital resources need to be accessible online, nearline or offline. Digital resources can be generated relatively easily, and there is a high risk of storage space becoming cluttered with several versions of documents and other less valuable digital resources. It makes sense to establish when certain categories of resources may be automatically removed from online storage after a defined period of time, when others will be re-assessed, and which resources will be considered to be sacrosanct.

These decisions will need to be well documented and understood by all stakeholders within the institution.
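Documenting such decisions in a form that can also be applied automatically helps to keep policy and practice aligned. The Python sketch below is one possible way of expressing the three kinds of decision described above (automatic removal, re-assessment, and sacrosanct resources). The category names and time periods are invented for illustration and would need to be replaced by an institution's own policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    remove_after_days: Optional[int] = None  # auto-remove from online storage
    review_after_days: Optional[int] = None  # re-assess at this age


# Hypothetical categories and periods, for illustration only.
POLICY = {
    "working draft": RetentionRule(remove_after_days=180),
    "project dataset": RetentionRule(review_after_days=365),
    "preservation master": RetentionRule(),  # sacrosanct: never auto-removed
}


def online_storage_action(category, age_days):
    """Return the action the policy prescribes for a resource of this age."""
    rule = POLICY[category]
    if rule.remove_after_days is not None and age_days >= rule.remove_after_days:
        return "remove from online storage"
    if rule.review_after_days is not None and age_days >= rule.review_after_days:
        return "re-assess"
    return "keep online"
```

Keeping the rules in a single table makes them easy to publish to stakeholders alongside the written policy.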

Requirements

4.3.2 Preservation Strategies

This section is divided into primary preservation strategies and secondary preservation strategies. Primary preservation strategies as defined here are those which might be selected by an archiving repository for medium to long-term preservation of digital materials for which it has accepted preservation responsibility. Secondary preservation strategies are those which might be employed in the short to medium term either by the repository with long-term preservation responsibility and/or by those with a more transient interest in the materials. Chronologically, secondary strategies may precede primary strategies. Some secondary strategies may substantially defer the need for, or alternatively greatly strengthen, primary preservation strategies, so describing them as secondary does not necessarily imply their inferiority.

Two strategies dominate current options for preserving digital resources long-term: migration and emulation. Both have champions and detractors, and both have acknowledged difficulties. The need for both may also be deferred and/or simplified if appropriate preventive preservation procedures such as storage and maintenance (see Storage and Maintenance) and selected secondary preservation strategies have been used.

The other potential long-term strategy, conversion to an analogue preservation format, differs from the other strategies in two important ways:

  1. It can only sensibly be considered for a relatively small category of digital resources and is patently inappropriate for the increasing numbers of more complex digital resources being created.
  2. By its nature, it loses the digital characteristics of the resources it converts and is therefore a preservation strategy for some digital resources, as opposed to a digital preservation strategy, where the essential aim is to retain the digital characteristics of the resource. The latter should be preferred.

Another option represented here as a secondary strategy is digital archaeology (secondary strategy 7). This is not precisely a preservation strategy at all, but rather a last resort when the absence of preservation strategies has left valuable resources inaccessible.

It should be emphasised that employing a judicious mix of secondary strategies 1-5 combined with responsible storage and maintenance decisions in Acquisition and Appraisal has the potential significantly to reduce both risks of losing access to digital resources in the short-term and costs of preserving access to them in the long-term.

Primary preservation strategies

Primary preservation strategies are those selected by archiving repositories with long-term preservation responsibility for the digital materials in their care. It should be noted that discussion of costs in this context is of necessity based on educated assumptions as opposed to empirical evidence gathered over a very long timeframe. Cost models for complex digital materials, particularly those of recent origin, are still at the research stage at the time of writing.

Migration

Description

A means of overcoming technological obsolescence by transferring digital resources from one hardware/software generation to the next. The purpose of migration is to preserve the intellectual content of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration differs from the refreshing of storage media in that it is not always possible to make an exact digital copy or replicate original features and appearance and still maintain the compatibility of the resource with the new generation of technology.

(Note: There are differing degrees of migration, ranging from relatively straightforward conversion to a major paradigm shift. Obviously the latter category will be most relevant to the disadvantages outlined below. It should also be noted that by using the secondary preservation strategy of standards, it may be possible to delay the need for migration).

Advantages

Disadvantages

Requirements

Related strategies

Emulation

Description

A means of overcoming technological obsolescence of hardware and software by developing techniques for imitating obsolete systems on future generations of computers.

Advantages

Disadvantages

Requirements

Related strategies

Secondary preservation strategies

Secondary preservation strategies are those which might be selected either by the archiving repository with long-term responsibility for the preservation of digital materials and/or by those with a more transient interest in the digital materials they have created and/or acquired. A judicious combination of secondary strategies and appropriate storage and maintenance (see Storage and Maintenance) can be a cost-effective means of ensuring continued access to digital materials for as long as they are needed, either deferring or in some cases, even avoiding, the need for primary preservation strategies.

Technology preservation

Description

A means of overcoming technological obsolescence by retaining the hardware and software used to access the digital resource. It should be noted that the current definition of this strategy involves individual institutions needing to maintain both hardware and software for all materials they create and/or acquire. A variation of this strategy has been suggested which involves the setting up of a facility offering documentation for hardware and software and file format specifications (PRO 1999; DLM Forum 1997). If these recommendations were implemented, this variation on the technology preservation strategy could become a much more feasible proposition and provide valuable support for genuinely long-term emulation or migration strategies.

Advantages

Disadvantages

Requirements

Related strategies

Adherence to standards

Description

Adhering to stable and widely adopted open standards when creating and archiving digital resources. These are not tied to specific hardware/software platforms and thus can defer inaccessibility of digital resources due to technological obsolescence. Adherence can either be self-imposed by institutions creating digital resources, or imposed by agencies receiving digital resources (see also Standards and Best Practice Guidelines and Media and Formats).

Advantages

Disadvantages

Requirements

Related strategies

Backwards compatibility

Description

Being able to retain accessibility to a digital resource following upgrade to new software and/or operating systems.

Advantages

Disadvantages

Related strategies

Encapsulation

Description

Grouping together a digital resource and whatever is necessary to maintain access to it. This can include metadata, software viewers, and discrete files forming the digital resource.

Advantages

Disadvantages

Related strategies

Permanent identifiers

Description

A means of locating a digital object even when its location changes. Examples are Uniform Resource Names (URNs); Handles; Digital Object Identifiers (DOIs); and Persistent Uniform Resource Locators (PURLs).
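The principle common to these schemes is a level of indirection: users cite a stable identifier, and a resolver maintained by the responsible organisation maps it to the current location. The Python sketch below illustrates that principle only. It does not implement any of the schemes named above, and the class, identifier and URLs are hypothetical.

```python
class IdentifierResolver:
    """Toy resolver mapping permanent identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        """Record, or update, the current location for an identifier."""
        self._locations[identifier] = location

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._locations[identifier]


resolver = IdentifierResolver()
resolver.register("example:study-0001", "https://old-server.example.org/data/0001")

# The resource is moved to a new server. Only the resolver entry changes;
# citations that use the identifier continue to work.
resolver.register("example:study-0001", "https://new-server.example.org/archive/0001")
current_location = resolver.resolve("example:study-0001")
```

The approach depends on the resolver table itself being maintained, which is an organisational commitment rather than a purely technical one.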

Advantages

Disadvantages

Related strategies

All, except Conversion to Analogue Formats.

Converting to stable analogue format

Description

Converting certain valuable digital resources to a stable analogue medium such as permanent paper or preservation microfilm or, more recently, nickel disk readable by electron microscope. This cannot be recommended as more than a pragmatic interim strategy for a small category of digital materials, pending the development of more appropriate digital preservation strategies.

Advantages

Disadvantages

Requirements

Related strategies

Digital archaeology

Description

Rescuing digital resources which have become inaccessible as a result of technological obsolescence and/or media degradation. Not so much a strategy in itself as a substitute for one when digital materials have fallen outside a systematic preservation programme.

Advantages

Disadvantages

See Exemplars and Further Reading