Handbook

Legacy media

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

Introduction

 

Many organisations will have large amounts of data stored on legacy media, such as magnetic and optical media, and data will continue to be received on old carriers. Ultimately, the best long-term strategy for the preservation of the data will be migration to file-based storage and active management thereafter (see Storage section). Often the original media will continue to be preserved alongside this, so it will be necessary to understand their preservation and storage requirements. For organisations with large collections of legacy media, understanding the risks facing each media type will also help with prioritising collections for migration. The application of digital forensics tools and methods will also be helpful (see Digital forensics section).

For the preservation of magnetic and optical media, two aspects need to be considered: the media themselves and the hardware and software needed to interpret them. In some cases the second aspect will be the most challenging. As the popularity of a media format declines, the manufacture of hardware ceases and the hardware becomes more difficult to procure and maintain.

 

Preserving legacy media

In most cases, the simplest way to mitigate risks with storage media is to transfer all content into a managed storage system. This means that the content can be managed without reference to the original storage medium. This would probably be adequate for the vast majority of digital content requiring preservation. However, there may be a few instances where it is necessary to retain the original media carrier in some way. In some cases, the storage medium could simply be retained as an artifact, with no expectation of long-term access, e.g. where it forms part of a hybrid collection or has some kind of value by association (e.g. part of the collections of a prominent author). However, where continued access to the content is required, careful thought needs to be given to how it could be accessed in the future.

One thing that we do know from experience is that digital storage media types change frequently over time. For example, the previous version of this handbook contained an overview of magnetic and optical storage media and provided estimates of the lifetimes of selected storage media types that were popular in the mid-1990s (a digital preservation handbook written in previous decades would presumably have included assessments of punched cards and paper tape). Given current trends in storage technology, it is perhaps better now to provide a framework that supports the ongoing evaluation of storage media, which might now include flash memory sticks or external hard drives. One such framework has been provided by The National Archives (Brown, 2008). This uses a scorecard approach, measuring selected storage media against six criteria:

  • longevity (e.g., proven operational lifetimes)
  • capacity
  • viability (e.g., in terms of retaining evidential integrity)
  • obsolescence
  • cost
  • susceptibility (e.g., to physical damage and to different environmental conditions).
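
A scorecard of this kind can be sketched in a few lines of Python. The sketch below is illustrative only: the six criteria are those listed above, but the 1 to 3 scoring scale, the media names, and every score are assumptions made for the example and are not published assessments.

```python
# Illustrative media scorecard. The six criteria come from the list above;
# the 1-3 scale and every score below are hypothetical values.
CRITERIA = (
    "longevity",
    "capacity",
    "viability",
    "obsolescence",
    "cost",
    "susceptibility",
)


def total_score(scores):
    """Sum the scores for one medium, checking all six criteria are present."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for criterion in CRITERIA:
        if not 1 <= scores[criterion] <= 3:
            raise ValueError(f"{criterion} must be scored from 1 to 3")
    return sum(scores[c] for c in CRITERIA)


def rank_media(scorecards):
    """Return (medium, total) pairs, highest total first."""
    totals = {name: total_score(s) for name, s in scorecards.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example scores (higher is better on every criterion).
example = {
    "medium A": {"longevity": 3, "capacity": 2, "viability": 3,
                 "obsolescence": 2, "cost": 2, "susceptibility": 3},
    "medium B": {"longevity": 1, "capacity": 3, "viability": 2,
                 "obsolescence": 1, "cost": 3, "susceptibility": 1},
}

for medium, total in rank_media(example):
    print(f"{medium}: {total} out of {3 * len(CRITERIA)}")
```

Keeping the scores in a simple structure like this makes it easy to repeat the evaluation as new media types appear or as existing ones age.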

In practice, however, these kinds of assessment can only get you so far. There is a growing body of evidence that suggests that variation in manufacturing quality also plays a major role in media longevity (Harvey, 2011). That is why, in the end, digital preservation normally depends upon the transfer of content from media into a managed storage environment.

 

Resources

Selecting storage media for long-term preservation, TNA Digital Preservation Guidance Note 2: August 2008

https://www.nationalarchives.gov.uk/documents/selecting-storage-media.pdf

This document is one of a series of guidance notes produced by The National Archives, giving general advice on issues relating to the preservation and management of electronic records. It is intended for use by anyone involved in the creation of electronic records. It provides information for the creators and managers of electronic records about the selection of physical storage media in the context of long-term preservation. Note that the guidance dates from August 2008. (7 pages).

Care, Handling and Storage of Removable media, TNA Digital Preservation Guidance Note 3: August 2008

http://www.nationalarchives.gov.uk/documents/information-management/removable-media-care.pdf

This document is one of a series of guidance notes produced by The National Archives, giving general advice on issues relating to the preservation and management of electronic records. It provides advice on the care, handling and storage of removable storage media. Note that the guidance dates from August 2008. (10 pages).

You've Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Received on Physical Media

http://www.oclc.org/content/dam/research/publications/library/2012/2012-06.pdf

A step-by-step guide to getting born-digital material off various physical media. It focuses on identifying and stabilizing your holdings so that you'll be in a position to take additional steps as resources, expertise, and time permit. The POWRR project document Resources for Technical Steps (3 pages) provides further resources for some of the steps. (7 pages).

KryoFlux: Commercial tool for reading floppy disks

http://www.kryoflux.com/

KryoFlux is a USB-based device designed specifically to acquire reliable low-level reads suitable for software preservation. This is the official hardware developed by The Software Preservation Society.

Digital Preservation Management: Chamber of Horrors

http://dpworkshop.org/dpm-eng/oldmedia/disks.html

Some examples of obsolete and endangered disks.

Lost Formats

http://www.experimentaljetset.nl/archive/lostformats

Web page from the Lost Formats Preservation Society with a very nice overview of media formats, using silhouettes of their shapes to allow quick identification, together with a brief history and key features such as dimensions and storage capacity. All silhouettes are shown at the same size rather than to scale. The last major update appears to be c.2008 but the content is still valuable for all but the most recent formats.

Museum Of Obsolete Media

http://www.obsoletemedia.org/category/format/

Great resource covering a very wide range of obsolete audio, video, data, and film storage media. You can browse the categories or the Gallery and Timeline. Particularly good if you know what you are looking for. The content is derived mostly from the relevant Wikipedia entries.

 

Case studies

A Fistful of Floppies: Digital Preservation in Action

https://ischool.uw.edu/sites/default/files/capstone/posters/JStanley_Capstone_Landscape.pdf

The University of Washington Library system currently holds a small collection of electronic thesis and dissertation (ETD) accompanying materials from the late 1980s to 2011 on floppy disks and CD-Rs. These materials will soon reach or have already exceeded the limit of their expected lifespans. This 2015 project looked at the digital preservation possibilities for this collection of materials using digital forensics as a model. (1 page).

Elford, D., et al., 2008. Media Matters: developing processes for preserving digital objects on physical carriers at the National Library of Australia. Papers from 74th IFLA General Conference and Council

http://archive.ifla.org/IV/ifla74/papers/084-Webb-en.pdf

The National Library of Australia had a relatively small but important collection of digital materials on physical carriers, including both published materials and unpublished manuscripts in digital form. The Digital Preservation Workflow Project aimed to produce a semi-automated, scalable process for transferring data from physical carriers to preservation digital mass storage, helping to mitigate the major risks associated with the physical carriers. (17 pages).

Digital Preservation Planning Case Study

http://www.dpconline.org/component/docman/doc_download/863-2013-may-getting-started-london-planning-case-study-ed-fay

Presentation on getting started with digital preservation planning, including scoping, risk assessing and prioritising your collection (including legacy media examples), and staff roles and responsibilities. 2013 (20 pages).

 

References

 

Brown, A., 2008. Selecting storage media for long-term preservation. TNA Digital Preservation Guidance Note 2: August 2008. Available: https://www.nationalarchives.gov.uk/documents/selecting-storage-media.pdf

Harvey, R., 2011. Preserving Digital Materials 2nd edition. De Gruyter Saur.


Preservation planning

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

What is preservation planning?

 

Preservation planning is the function within a digital repository for monitoring changes that may impact on the sustainability of, or access to, the digital material that the repository holds. It should be proactive: both current and forward-looking in terms of acquisitions and trends. Changes might occur within the repository, within the organisation in which the repository resides, or external to the repository and organisation themselves. Changes might be monitored in the following areas:

  • Technology watch
      • packaging
      • storage
      • formats
      • tools
      • environment
      • access mechanisms

  • Designated communities
      • needs and expectations of users
      • needs and expectations of producers
      • emerging tools for machine to machine access
      • formal feedback from users and producers

The concept of preservation planning is defined within the functional model of the OAIS standard (CCSDS, 2012). This section focuses primarily on the Monitoring components within the OAIS definition. The 'Monitor Technology' and 'Monitor Designated Community' functions of OAIS provide surveys that inform preservation planning activities. These alert the repository about changes in the external environment and risks that could impact on its ability to preserve and maintain access to the information in its custody, such as innovations in storage and access technologies, or shifts in the scope or expectations of the Designated Community (see Lavoie, 2014, 13). Preservation planning then develops recommendations for updating the repository's policies and procedures to accommodate these changes. The Preservation planning function represents the OAIS's safeguard against a constantly evolving user and technology environment. It detects changes or risks that impact the repository's ability to meet its responsibilities, designs strategies for addressing them, and assists in the implementation of these strategies within the archival system.

 

What is the purpose of preservation planning?

 

Identifying triggers for taking action to preserve digital materials

Where change has been identified, a risk assessment process can be used to analyse and identify the change that represents a significant risk to the digital material in the repository. Risks can then be addressed and hopefully mitigated following a preservation planning exercise to decide on appropriate preservation action. In this case, the monitoring or technology watch process is identifying trigger points for further analysis, preservation planning and, where relevant, action to preserve digital materials.

Building a knowledge base to inform preservation activities

The process of monitoring internal and external factors as part of a preservation planning activity can inform the knowledge base of an organisation, and in doing so improve its ability to perform digital preservation activities effectively. For example, the "knowledge base" of an organisation might be augmented with information about the capabilities of a new software tool, or the obsolescence and unavailability of an existing tool. In some cases this process might be best performed individually or within an organisation, but alternatively might be more usefully performed in a collaborative manner. The vast depth and breadth of knowledge required for digital preservation naturally favours a collaborative approach, whereby particular organisations are able to specialise in a particular area and contribute that knowledge to an open or shared knowledge base.

Implementations of a preservation planning service

The degree to which technology watch will be necessary will vary according to the degree of uniformity or control over formats and media that can be exercised by the institution. Those with little control over media and formats received and a high degree of diversity in their holdings will find this function essential. For most other institutions the IS strategy should seek to develop corporate standards so that everybody uses the same software and versions and is migrated to new versions as the products develop.

Failure to implement an effective technology watch, or an IS strategy incorporating one, will risk potential loss of access to digital holdings and higher costs. It may be possible, for example, to re-establish access through digital forensics (see Digital forensics) but this may be expensive compared to pre-emptive strategies.

A retrospective survey of digital holdings (see Getting started) and a risk assessment and action plan (see Risk and change management) may be a necessary first step for many institutions, prior to implementing a technology watch.

Good preservation metadata in a computerised catalogue identifying the storage medium, the necessary hardware, operating system and software will enable a technology watch strategy (see Metadata and documentation).
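
As a minimal sketch of how such catalogue metadata could support a technology watch, the Python example below flags catalogue records that depend on anything named on a watch list. The record fields, identifiers, and watch-list entries are all hypothetical and chosen for illustration; a real catalogue would have its own schema and controlled vocabularies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueRecord:
    """Minimal technical metadata for one item (hypothetical field names)."""
    identifier: str
    storage_medium: str
    hardware: str
    operating_system: str
    software: str

    def dependencies(self):
        """Return everything this item needs in order to be accessed."""
        return {self.storage_medium, self.hardware,
                self.operating_system, self.software}


def records_at_risk(records, watch_list):
    """Map each record identifier to the at-risk dependencies it relies on."""
    flagged = {}
    for record in records:
        hits = record.dependencies() & set(watch_list)
        if hits:
            flagged[record.identifier] = sorted(hits)
    return flagged


# Hypothetical catalogue entries and watch list, for illustration only.
catalogue = [
    CatalogueRecord("item-001", "3.5 inch floppy disk", "floppy disk drive",
                    "MS-DOS", "WordPerfect"),
    CatalogueRecord("item-002", "file-based storage", "standard PC",
                    "current OS", "PDF viewer"),
]
watch_list = {"3.5 inch floppy disk", "floppy disk drive"}

print(records_at_risk(catalogue, watch_list))
```

The output of a query like this could feed the risk assessment and action plan described above, since it shows which holdings are affected when a medium, device, or application is added to the watch list.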

Integrated preservation systems, and individual tools and registries can also support this function (see Technical solutions and tools).

 

Resources

Some of the core preservation watch activities are generic and therefore ready made for collaboration while others are highly localised and not easily shared.

DPC Technology Watch Report Series

http://www.dpconline.org/advice/technology-watch-reports

These reports provide an advanced introduction to specific issues for those charged with establishing or running services for long term preservation and access. They are updated and new reports added periodically.

Scout – a preservation watch system, OPF blog post 16th Dec 2013

http://openpreservation.org/blog/2013/12/16/scout-preservation-watch-system/

The SCAPE Project designed a demonstrator for an automated preservation watch service, called SCOUT. SCOUT was described by its developers as providing "...an ontological knowledge base to centralize all necessary information to detect preservation risks and opportunities. It uses plugins to allow easy integration of new sources of information, as file format registries, tools for characterization, migration and quality assurance, policies, human knowledge and others."

Assessing file format risks: searching for Bigfoot? OPF blog post 30th Sep 2013

http://openpreservation.org/blog/2013/09/30/assessing-file-format-risks-searching-bigfoot/

This detailed blog post raises concerns about challenges with automating preservation watch.

Barbara Sierman, Paul Wheatley, 2010. Evaluation of Preservation Planning within OAIS, based on the Planets Functional Model. Planets Deliverable no. PP7-D6.1

http://www.planets-project.eu/docs/reports/Planets_PP7-D6_EvaluationOfPPWithinOAIS.pdf

The Planets Project realised various aspects of the concepts defined within the OAIS Preservation Planning function, and performed an evaluation of OAIS based on these practical experiences. 2010 (34 pages).

Community Owned digital Preservation Tool Registry (COPTR)

http://coptr.digipres.org/Main_Page

COPTR describes tools useful for long term digital preservation and acts primarily as a finding and evaluation tool to help practitioners find the tools they need to preserve digital data. COPTR aims to collate the knowledge of the digital preservation community on preservation tools in one place. It was initially populated with data from registries run by the COPTR partner organisations, including those maintained by the Digital Curation Centre, the Digital Curation Exchange, National Digital Stewardship Alliance, the Open Preservation Foundation, and Preserving digital Objects With Restricted Resources project (POWRR). COPTR captures basic, factual details about a tool, what it does, how to find more information (relevant URLs) and references to user experiences with the tool. The scope is a broad interpretation of the term "digital preservation". In other words, if a tool is useful in performing a digital preservation function such as those described in the OAIS model or the DCC lifecycle model, then it's within scope of this registry.

 

Case studies

OCLC Research Report - Preservation Health Check: Monitoring Threats to Digital Repository Content

http://www.oclc.org/research/themes/research-collections/phc.html

The OCLC Research Preservation Health Check activity was initiated by Open Planets Foundation. The Pilot used a sample of preservation metadata provided by the Bibliothèque Nationale de France. The report presents the preliminary findings of Phase 1 of the Pilot and suggests that there is an opportunity to use PREMIS preservation metadata as an evidence base to support a threat assessment exercise based on the Simple Property-Oriented Threat (SPOT) model.

Digital Preservation Planning Case Study

http://www.dpconline.org/component/docman/doc_download/863-2013-may-getting-started-london-planning-case-study-ed-fay

Presentation on getting started with digital preservation planning, including scoping, risk assessing and prioritising your collection (including legacy media examples), and staff roles and responsibilities. 2013 (20 pages).

 

References

 

Consultative Committee for Space Data Systems, 2012. Reference model for an open archival information system (OAIS): Recommended practice (CCSDS 650.0-M-2: Magenta Book), CCSDS, Washington, DC. Available: https://public.ccsds.org/pubs/650x0m2.pdf

(Note this is a free to download version of ISO 14721:2012, Space Data and Information Transfer Systems – Open Archival Information System (OAIS) – Reference Model, 2nd edn).

Lavoie, B., 2014. The Open Archival Information System (OAIS) Reference Model: Introductory Guide (2nd Edition) DPC Technology Watch Report 14-02 October 2014. Available: http://dx.doi.org/10.7207/TWR14-02

 


Preservation action

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

Introduction

 

We know by now that digital preservation is comprised of a series of challenges emanating from organisational, resourcing, managerial, cultural, and technical issues. This section of the Handbook will focus specifically on actions that can be taken to help mitigate the technical challenges of preserving digital materials over time.

 

Technological obsolescence of formats

 

Technological obsolescence has long been considered a significant challenge of long term digital preservation. However, in recent years studies have suggested that format obsolescence isn't always as prevalent as previously feared (Rosenthal 2015a, Jackson 2012). It is one issue that must be recognised and countered if digital materials are to survive over generations of technological change, but it is certainly not the only challenge. Many established file formats are still with us, still supported, and still usable. It is quite likely that the majority of file formats you deal with will be commonly understood and well supported rather than obsolete.

A simple definition of obsolescence is the process of becoming outdated or no longer used. When talking about technological obsolescence, we might say, for example, 'this WordPerfect 3.1 software is obsolete' or 'this BBC Micro computer is obsolete'. The exact moment at which obsolescence occurs can be difficult to pinpoint, particularly for materials that have only recently become obsolete. For example, just because the original application (e.g. MS Word) no longer supports a given format, it doesn't mean that no other software that can read the format is available. Similarly, one institution may continue to use and maintain a piece of legacy software long after others have upgraded to new versions. It is perhaps therefore more useful to talk about 'institutional obsolescence', namely that the technology in question is no longer in use or easily accessed by a particular institution.

Obsolescence is an issue because all files have their own hardware and software dependencies. This was particularly the case in the early days of computing.

Change becomes an issue when it compromises the meaning of the content or its interpretation by a user. A core goal of digital preservation actions is to preserve the integrity and authenticity of the material being preserved, despite these generational changes in computing technology. In the next section we will discuss some common strategies to help minimise these changes.

 

Preservation strategies

 

In this section we review the technical strategies that can be employed to preserve digital information. After a flurry of activity in the late 1990s there has been relatively little progress in finding new strategies, though there has been significant research and development into varying implementation options and supporting technologies such as quality assurance, digital forensics (see Digital forensics), and technical representation information registries (see Technical solutions and tools in the Handbook). The techniques we will cover here are:

  • Format Migration
  • Emulation
  • Computer Museums

 

Format migration

 

Format migration is one of the most widely utilised preservation strategies, and most digital preservation systems contain functionality or system data that assumes a migration solution. Format migration is different from storage media migration. It involves transferring or transforming (i.e. migrating) data from an ageing/obsolete format to a new format, possibly using new application systems at each stage to interpret information. Moving from one version of a format standard to a later version is one form of this method, for example moving from MS Word Version 6 (from 1993) to MS Word for Windows 2010. For frameworks and tools that are helpful for evaluating technical obsolescence of file formats see File formats and standards.

Format migration, like any intervention that has the potential to change the structure and content of data, can introduce errors and loss of information. Therefore, it is important to define metrics to measure possible loss of information and use these to do tests on the correctness and quality of format migration.

Recent work touching on quality assurance and digital preservation actions includes the work of the AQUA, SPRUCE, and SCAPE projects. To measure error rates, it is necessary to determine some very specific metrics. You might need to define what you count as an error and whether you weight some errors as being more important than others. This depends on the context/content of the record and what characteristics of the material are deemed 'significant' to preserve, as well as the migration tools and successive formats used in any migration pathway.
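
One way to make such a metric concrete is a weighted error rate, sketched below in Python. This is an illustration under stated assumptions rather than a method taken from the projects named above: the property names, the counts, and the weights are all hypothetical, and deciding what counts as a loss for each property remains a local judgement.

```python
def weighted_error_rate(checked, failures, weights):
    """Return the weighted share of property checks that failed.

    checked:  property name -> number of files checked for that property
    failures: property name -> number of files where the property was lost
    weights:  property name -> relative importance (hypothetical values)
    """
    total = sum(weights[p] * checked[p] for p in checked)
    if total == 0:
        raise ValueError("nothing was checked")
    lost = sum(weights[p] * failures.get(p, 0) for p in checked)
    return lost / total


# Hypothetical results from testing a migration on 100 sample files.
checked = {"text content": 100, "page count": 100, "embedded fonts": 100}
failures = {"text content": 1, "embedded fonts": 20}
weights = {"text content": 5, "page count": 2, "embedded fonts": 1}

rate = weighted_error_rate(checked, failures, weights)
print(f"weighted error rate: {rate:.3f}")
```

Because the weights express which characteristics are deemed significant, the same test results can give different rates for different collections, which is the point made above about context.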

Some practical issues involved in this process include when to migrate – is it better to migrate from generation to generation, or should some generations be skipped? You will need to keep a record of all transformations, their results and to document detected losses of information so as to maintain evidence of authenticity and authority. PREMIS can be a useful tool for this - see the Handbook section on Metadata and documentation for more information about this standard. It is good practice always to retain the original file format as deposited to return to if required.
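
The record keeping described above can be sketched as a simple append-only event log. The Python example below is a minimal illustration, not a PREMIS implementation: the field names, file names, and tool name are hypothetical, and an organisation using PREMIS would map this kind of information onto that standard's own entities.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migration_event(source, target, tool, outcome, losses=()):
    """Build a record describing one migration (hypothetical field names)."""
    return {
        "event_type": "migration",
        "event_datetime": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "outcome": outcome,
        "detected_losses": list(losses),
        "source": {"path": str(source), "sha256": sha256_of(source)},
        "target": {"path": str(target), "sha256": sha256_of(target)},
    }


def append_event(log_path, event):
    """Append the event as one JSON line so earlier entries are not rewritten."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


# Demonstration with throwaway files standing in for a real migration.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "report.doc"
    migrated = Path(workdir) / "report.odt"
    original.write_bytes(b"original bytes")
    migrated.write_bytes(b"migrated bytes")
    log_file = Path(workdir) / "events.jsonl"

    event = migration_event(
        original, migrated,
        tool="example-converter 1.0",  # hypothetical tool name
        outcome="success",
        losses=["embedded macro not carried over"],
    )
    append_event(log_file, event)
    logged = json.loads(log_file.read_text(encoding="utf-8").splitlines()[0])

print(logged["event_type"], logged["outcome"], logged["detected_losses"])
```

Recording checksums for both the original and the migrated file ties each log entry to specific bitstreams, which supports later checks that the retained original has not changed.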

 

Emulation

 

Emulation offers an alternative solution to migration that allows archives to preserve and deliver access to users directly from original files. This technique attempts to preserve the original behaviours and the look and feel of applications, as well as informational content. It is based on the view that only the original programme is the authority on the format. This is particularly useful for complex objects with multiple interdependencies, such as games or interactive apps.

An emulator, as the name implies, is a programme that runs on a current computer architecture but provides the same facilities and behaviour as an earlier one. This approach has been endorsed by a number of heritage organisations, often in collaboration with technical experts, and in recent years there has been some notable success in implementing emulation solutions for cultural heritage (see Resources below). However, some significant challenges remain, not least that there are often rights issues associated with software licensing that need to be resolved (Rosenthal 2015b).

A particular benefit of emulation is that a single solution can be deployed to provide access to a large number of objects, so long as all those objects require delivery on the same operating system or hardware stack. Use of legacy computing equipment may however prove difficult for users, though they will almost certainly be accessing an 'authentic' representation of the records. Of course emulators have to be built and maintained, requiring a pool of expertise to be available and this cannot always be assumed. New emulators will be needed as computer architectures become obsolete, and both of these present costs and resource needs.

 

Computer museums

 

This methodology proposes the keeping of computers and their systems software (operating systems, drivers, etc.) as well as the data and applications programmes. Effort must be expended to keep all platforms in good order, and to retain all the knowledge necessary to maintain and use the machines and their programmes. The idea relies on having a source of spare parts too, but these will dwindle, as will pools of expertise. Hence this strategy tends to be an interim measure rather than a long-term solution. Some formal museums do exist, such as the Computer History Museum in California and the Centre for Computing History in Cambridge. These typically maintain machines in working order though do not provide preservation services. See also the Legacy media section of the Handbook for further information on historic file formats and media.

 

Implementation

 

The DPC Technology Watch Reports are a particularly useful guide to most common genres and file formats (including email, social media, Audio-Visual, eBooks, e-Journals, GIS, CAD, web archiving etc.) and show which strategies tend to be used most commonly in each of these areas. Tools to assist with implementation of preservation strategies are discussed in the Technical solutions and tools area of the Handbook particularly in File formats and standards.

 

Resources

DPC Technology Watch Report series

http://www.dpconline.org/publications/technology-watch-reports

The DPC Technology Watch Report series is intended as an advanced introduction to specific issues for those charged with establishing or running services for long term access. They identify and track developments in IT, standards and tools which are critical to digital preservation activities. They are commissioned by experts on these developments and are thoroughly scrutinised by peers before being released.

Emulation & Virtualization as Preservation Strategies

https://mellon.org/media/filer_public/0c/3e/0c3eee7d-4166-4ba6-a767-6b42e6a1c2a7/rosenthal-emulation-2015.pdf

This 2015 report on Emulation and Virtualization as Preservation Strategies by David Rosenthal was funded by the Mellon Foundation, the Sloan Foundation and IMLS. It concludes that recent developments in emulation frameworks make it possible to deliver emulations to readers via the Web in ways that make them appear as normal components of Web pages. This removes what was the major barrier to deployment of emulation as a preservation strategy. Barriers remain. The two most important are that the tools for creating preserved system images are inadequate, and that the legal basis for delivering emulations is unclear, and where it is clear it is highly restrictive. Both of these raise the cost of building and providing access to a substantial, well-curated collection of emulated digital artefacts beyond reach. If these barriers can be addressed, emulation will play a much greater role in digital preservation in the coming years. (37 pages).

Systematic planning for digital preservation: evaluating potential strategies and building preservation plans

http://www.ifs.tuwien.ac.at/~becker/pubs/becker-ijdl2009.pdf

This article published in 2009 describes a systematic approach for evaluating potential alternatives for preservation actions and building thoroughly defined, accountable preservation plans for keeping digital content alive over time. The work was undertaken as part of the European Union-funded PLANETS project. (25 pages).

File format conversion

http://www.nationalarchives.gov.uk/documents/information-management/format-conversion.pdf

Format conversion can help you maintain access to and use of your information and mitigate risks that arise from obsolescence. This 2011 guidance from The National Archives gives you the steps you should go through in performing a file format conversion process. (29 pages).

What organizations are preserving software

http://qanda.digipres.org/1068/what-organizations-are-preserving-software

This post and its responses from August 2015 on the Digital Preservation Q&A site provide a useful list of, and links to, institutions preserving software for emulation strategies.

SCAPE Project Final best practice guidelines and recommendations

http://scape-project.eu/wp-content/uploads/2014/02/SCAPE_D20.6_KB_V1.0.pdf

This SCAPE project report published in 2014 covers three major areas: implementation of large-scale migration as a preservation strategy; preservation of research data; and bit preservation. (127 pages).

 

Case studies

The Internet Arcade

https://archive.org/details/internetarcade

The Internet Arcade is a web-based library of arcade (coin-operated) video games from the 1970s through to the 1990s from the Internet Archive, implemented using an in-browser emulation solution to provide access to the collection.

Rhizome

http://rhizome.org/editorial/2015/apr/17/theresa-duncan-cd-roms-are-now-playable-online/

In the 1990s, Theresa Duncan and collaborators made three videogames that exemplified interactive storytelling at its very best. Two decades later, the works (like most CD-ROMs) have fallen into obscurity. This online exhibition, co-presented by Rhizome and the New Museum, brings them back, making them playable online via emulation.

Assessing Migration Risk for Scientific Data Formats

http://www.ijdc.net/index.php/ijdc/article/view/202/271

This paper explores a simple hypothesis – that, where migration paths exist, the majority of scientific data files can be safely migrated leaving only a few that must be handled more carefully – in the context of several scientific data formats that are or were widely used. The approach is to gather information about potential migration mismatches and, using custom tools, evaluate a large collection of data files for the incidence of these risks. The results support the initial hypothesis, though with some caveats.

Portico - Preservation Step-by-Step

http://www.portico.org/digital-preservation/services/preservation-approach/preservation-step-by-step

A useful step-by-step guide to the preservation planning and migration strategies employed by Portico. The preservation plan may include an initial migration of the packaging or files in specific formats (for example, Portico migrates publisher-specific e-journal article XML to the NLM archival standard).

Trash to treasure: Retro computer, software collection helps National Library access digital pieces

http://www.abc.net.au/news/2015-06-20/collecting-retro-computer-technology-to-save-digital-treasures/6560494

The National Library of Australia made public its own efforts to develop a collection of legacy computing hardware and software. It uses it to support data recovery and then implements other preservation strategies and does not rely on the computer museum for long-term preservation.

 

 References

David Rosenthal, 2015a. "The Prostate Cancer of Preservation" Re-examined. Available: http://blog.dshr.org/2015/09/the-prostate-cancer-of-preservation-re.html

David Rosenthal, 2015b. Emulation & Virtualization as Preservation Strategies. Available: https://mellon.org/media/filer_public/0c/3e/0c3eee7d-4166-4ba6-a767-6b42e6a1c2a7/rosenthal-emulation-2015.pdf

Andrew N. Jackson, 2012. Formats over Time: Exploring UK Web History. Available: http://arxiv.org/abs/1210.1714

 


Access

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

Introduction

 

There has always been a strong link between preservation and access. The major objective of preserving the information content of traditional resources is so that they can remain accessible for both current and future generations. Preserving access to digital objects is the key objective of digital preservation programmes but requires more active management throughout the lifecycle of the resource before it can be assured. It is, therefore, essential to consider issues important to access provision from the beginning of the preservation process, ideally as early as the acquisition phase. This is represented within the Decision Tree for Selection of Digital Material for Long-Term Retention included within the Acquisition and appraisal section of this handbook. With this in mind this section aims to identify the main issues that must be considered, the decisions that should be made when planning for access provision and how these may impact on preservation more generally.

 

Understanding users

 

Understanding potential users is essential when planning for the provision of access to digital objects as well as being a key consideration of broader digital preservation activities. The importance of such work is perhaps most evident in the focus on 'Designated Communities' within the Open Archival Information System Reference Model. Knowledge gained about these potential users will inform decisions made throughout the lifecycle but will likely hold most weight when choosing suitable access delivery solutions, balanced with resource and technological considerations. It is important to approach the identification of user communities and their needs systematically and objectively. In short, this means understanding what users want to do and what functionality the repository can provide.

The methodology used for the gathering of this information will vary depending on the organisational context. Potential options and tools may include the following:

  • Analysis of current usage (access requests for both physical and digital objects, website statistics etc.)
  • Surveys
  • Focus groups
  • Interviews
  • Use cases
  • Task analysis

When carrying out user analysis it is important to consider both existing users and non-users. Although interaction with non-users is inherently more difficult, it can be a useful step towards understanding current barriers to use as well as identifying potential new market sectors.

Once collected this information should be used to inform decisions that are made in relation to the implementation of access delivery solutions. It is also important to continue to monitor the development of user communities and this should be incorporated in the standard Preservation planning activities within your organisation.

 

Access formats

 

A key consideration when planning for access is the format in which the digital objects will be delivered to the users. While there is a strong link between preservation and access in terms of the overriding objective of a digital preservation programme, there is also a need to make a clear distinction between them. There may be a combination of technical, legal, and pragmatic reasons to separate the access copy from the preservation copy, so it may be desirable, or even necessary, to deliver an access copy of the digital object to the user in a different format from that held within the preservation system's storage. Indeed, an organisation may wish to offer different 'flavours' of format depending on the needs of the particular user or community in question. When selecting formats for access there are several questions an organisation will need to consider. These may include the following:

  • What is the most commonly used/widely supported format for the object type?
  • Will users have access to free viewers/software that support the proposed file type?
  • What file size is produced and what are the implications for delivery to the user?
  • Is the format easy to use?
  • Will users require guidance or supporting documentation to allow them to access/use the objects?
  • Does the organisation have separate user communities with different requirements for access?

See also File formats and standards for details of common preservation and access formats.

 

Legal issues for access

 

There are a variety of legal issues that will probably need to be addressed when providing access to digital objects. They will affect both the technological solutions that are deployed and who can access the material and when. This is one of the main access considerations that overlaps with acquisitions, as mentioned above, and it is essential that the correct information is gathered at that time to facilitate access requirements later in the life cycle. Without this information it may not be possible to manage access properly, and the organisation may be exposed to a number of potential legal risks.

Legal issues to be considered will include:

  • Restrictions of use relating to sensitivity and data protection
  • Agreed embargoes on content where early access may represent a breach of contract
  • Management of intellectual property rights, e.g. copyright

Management of IPR, in particular, should be aligned with the acquisition process with careful consideration given to transfer and ownership agreements and copyright licences put in place at that time. Licences must clearly state permitted access and reuse permissions, including third party licensing. These must then be clearly represented in policy and procedures for access, whether managed through a rights management system or by other methods.
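The three legal issues listed above can be represented in a simple rule-based check that an access procedure might apply before releasing an item. The following is only a minimal sketch: the record fields, the order of the checks and the wording of the outcomes are all hypothetical, and a real rights management system would need to reflect the organisation's own agreements and legal advice.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AccessRecord:
    """Hypothetical rights information gathered at acquisition."""
    identifier: str
    contains_personal_data: bool      # sensitivity / data protection flag
    embargo_until: Optional[date]     # None means no embargo was agreed
    licence_permits_access: bool      # taken from the deposit licence


def access_decision(record: AccessRecord, today: date) -> str:
    """Return an illustrative access outcome for one item."""
    if record.contains_personal_data:
        return "restricted: sensitivity or data protection"
    if record.embargo_until is not None and today < record.embargo_until:
        return "restricted: under embargo"
    if not record.licence_permits_access:
        return "restricted: licence does not permit access"
    return "open"


# Made-up example item with an embargo that has not yet expired.
item = AccessRecord("item-1", False, date(2030, 1, 1), True)
print(access_decision(item, date(2025, 6, 1)))
```

The value of a sketch like this lies less in the code than in the reminder that each rule depends on information captured at acquisition.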

 

Forms of access provision

 

The final key decisions an organisation must make concern the form that access provision will take, including:

  • Policy
  • Procedure
  • Free or charged services
  • Online/Offline access, and the access environment provided
  • Access for the disabled
  • Storage and security

If the access copy is the only copy of a digital resource, then the danger of loss from theft or damage is clearly very high. If this approach is taken a risk assessment needs to be undertaken consisting of some of the following questions (See also Acquisition and appraisal and Storage):

 

Conclusions

 

Access is closely linked to many other digital preservation issues and technologies covered in the Handbook. In particular you may wish to look at Institutional policies and strategies, Legal compliance, Metadata and documentation, Acquisition and appraisal, Storage, Legacy media, File formats and standards, and Information security.

 

Resources

Born-Digital Access in Archival Repositories: Mapping the Current Landscape, Preliminary Report August 2015

https://docs.google.com/document/d/15v3Z6fFNydrXcGfGWXA4xzyWlivirfUXhHoqgVDBtUg/preview?sle=true#

This interesting document represents preliminary findings and analysis of a study and survey on current born-digital access practices in over 200 cultural heritage institutions. Respondents were primarily from the USA.

Reference model for an open archival information system (OAIS): Recommended practice (CCSDS 650.0-M-2: Magenta Book), Consultative Committee for Space Data Systems 2012

https://public.ccsds.org/pubs/650x0m2.pdf

This was later published as ISO 14721:2012, Space Data and Information Transfer Systems – Open Archival Information System (OAIS) – Reference Model, 2nd edition. The Access function within OAIS manages the processes and services by which consumers – and especially the Designated Community – locate, request, and receive delivery of items residing in the OAIS's archival storage. As such, it is the primary mechanism by which the archive meets its responsibility to make its information available to its user community. (135 pages).

Adrian Brown, 2013. Practical Digital Preservation: a how-to guide for organizations of any size

Chapter 9 (28 pages) of this book is devoted to the topic of providing access to users.

Community Owned digital Preservation Tool Registry COPTR

http://www.digipres.org/tools/

There are a large number of tools for access or that have access functionality incorporated in them. The Handbook recommends searching for them via the POWRR Tool Grid within COPTR. The POWRR Tool Grid provides a set of interactive views designed to help practitioners identify and select tools that they need to solve digital preservation challenges. The Access, Use and Reuse column of the Grid identifies access tools for specific types of content or generic tools and systems that have access functions. Everything in the Grid is hyperlinked, so simply click through the displays until you find the information you are looking for. Clicking on the name of a specific preservation tool will reveal more detail on the COPTR wiki, which is where you should go to expand or update the tools information.

AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship. University of Hull, Stanford University, University of Virginia, and Yale University (2012)

http://dcs.library.virginia.edu/files/2013/02/AIMS_final.pdf

The AIMS (An Inter-Institutional Model for Stewardship) framework is a methodology for stewarding born-digital materials. It is divided into four main sections for high-level best practices for born-digital workflows: collection development, accessioning, arrangement and description, and discovery and access. Access primarily focuses on redaction and sensitive information. The appendices include, for example, sample processing workflow diagrams, an analysis of tools, and donor surveys. (195 pages).

 

Case studies

TNA case studies: Online access

http://www.nationalarchives.gov.uk/archives-sector/online-access.htm

A series of nine case studies published by TNA on how collections have been made more accessible by putting records online. They are drawn from a wide variety of archives.

Codebreakers: makers of modern genetics

https://digirati.com/work/galleries-libraries-archives-museums/case-studies/wellcome-library/

A case study by Digirati, the developers of the Wellcome Library player, focussing on the player and its use in accessing the Francis Crick collection. The Wellcome Library's digital player is freely available for anyone to download and use. The player can be used to display all types of digital content, including cover-to-cover books, archives, works of art, videos and audio files. The software can be downloaded from the Wellcome Library GitHub account (https://github.com/wellcomelibrary/player).

Managing Risk with a Virtual Reading Room: Two Born Digital Projects, Michelle Light

http://digitalscholarship.unlv.edu/cgi/viewcontent.cgi?article=1462&context=lib_articles

Between 2010 and 2013, the University of California, Irvine, launched a site to provide online access to the personal papers of Richard Rorty and Mark Poster in the form of a virtual reading room. The virtual reading room mitigated the risks involved in providing this kind of access to personal, archival materials with privacy and copyright issues by limiting the number of qualified users and by limiting the discoverability of full-text content on the open web. The case study goes through each phase of research and thinking, including comparable projects happening at other institutions and lessons learned in a very open and informative way.

From Accession to Access: A Born-Digital Materials Case Study, by Cyndi Shein Journal of Western Archives Volume 5 Issue 1 (2014): 1-42

http://digitalcommons.usu.edu/cgi/viewcontent.cgi?article=1036&context=westernarchives

Between 2011 and 2013 the Getty Institutional Records and Archives made its first foray into the comprehensive ingest, arrangement, description, and delivery of unique born-digital material when it received oral history interviews generated by some of the Pacific Standard Time: Art in L.A. project partners. This case study touches upon the challenges and affordances inherent to this hybrid collection of audiovisual recordings, digital mixed-media files, and analog transcripts. It describes the Archives' efforts to develop a basic processing workflow that applies the resource-management strategy commonly known as "MPLP" in a digital environment, while striving to safeguard the integrity and authenticity of the files, adhere to professional standards, and uphold fundamental archival principles. The study describes the resulting workflow and highlights a few of the inexpensive technologies that were successfully employed to automate or expedite steps in the processing of content that was transferred via easily-accessible media and consisted of current file formats.


Metadata and documentation

 

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

Introduction

 

This section provides a brief novice to intermediate level overview of metadata and documentation, with a focus on the PREMIS digital preservation metadata standard. It draws on the 2nd edition of the DPC Technology Watch Report on Preservation Metadata. The report itself discusses a wider range of issues and practice in greater depth, with extensive further reading and advice (Gartner and Lavoie, 2013). It is recommended to readers who need a more advanced level briefing.

Metadata is data about a digital resource that is stored in a structured form suitable for machine processing. It serves many purposes in long-term preservation, providing a record of activities that have been performed upon the digital material and a basis on which decisions on preservation activities can be made in the future, as well as supporting discovery and use. The information contained within a metadata record often encompasses a range of topics. There is no clear line between what is preservation metadata and what is not, but ultimately the purpose of preservation metadata is to support the goals of long-term digital preservation, which are to maintain the availability, identity, persistence, renderability, understandability, and authenticity of digital objects over long periods of time.

Documentation is the information (such as software manuals, survey designs, and user guides) provided by a creator and the repository that supplements the metadata and provides enough information to enable the resource's use by others. It is often the only material providing insight into how a digital resource was created, manipulated, managed and used by its creator, and it is often the key that enables others to make informed use of the resource.

There are a number of factors which make metadata and documentation particularly critical for the continued viability of digital materials and they relate to fundamental differences between traditional and digital resources:

  • Technology. Digital resources are dependent on hardware and software to render them intelligible. Technical requirements need to be recorded so that decisions on appropriate preservation and access strategies may be made.
  • Change. While traditional materials may be preserved by predominantly passive preventive preservation programmes, digital materials will be subject to repeated actions, and there will be many different operators and quite possibly different institutions influencing the management of digital materials over a prolonged period of time. Recording actions taken on a resource and changes occurring as a result will provide a key to future managers and users of the resource.
  • Authenticity. Metadata and documentation may be the major, if not the only, means of reliably establishing the authenticity of material following changes.
  • Rights management. While traditional resources may or may not be copied as part of their preservation programme, digital resources must be copied if they are to remain accessible. Managers need to know that they have the right to copy for the purposes of preservation, what (if any) technologies have been used to control rights management and what (if any) implications there are for controlling access.
  • Future re-use. It may not be possible for others to use the material without adequate documentation.
  • Cost. It is expensive to create metadata manually and preservation metadata may not always be easily generated automatically. Creating additional metadata for digital preservation therefore requires careful cost/benefit trade-offs.

 

The PREMIS (PREservation Metadata: Implementation Strategies) Standard

 

PREMIS (PREservation Metadata: Implementation Strategies) is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. Developed by an international team, PREMIS is implemented in digital preservation projects around the world, and support for PREMIS is incorporated into a number of commercial and open-source digital preservation tools and systems.

The PREMIS Data Dictionary (PREMIS, 2013) is organized around a data model consisting of five entities associated with the digital preservation process:

  1. Intellectual Entity - a coherent set of content that is described as a unit: e.g., a book
  2. Object - a discrete unit of information in digital form, e.g., a PDF file
  3. Event - a preservation action, e.g., ingest of the PDF file into the repository
  4. Agent - a person, organization, or software program associated with an Event, e.g., the publisher of a PDF file
  5. Rights - one or more permissions pertaining to an Object, e.g., permission to make copies of the PDF file for preservation purposes

Taken together, the semantic units defined in the PREMIS Data Dictionary represent the 'core' information needed to support digital preservation activities in most repository contexts. However, the concept of 'core' in regard to PREMIS is loosely defined: not all of the semantic units are considered mandatory in all situations, and some are optional in all situations. The Data Dictionary attempts to strike a balance between recognizing that there will be a significant overlap of metadata requirements across different repository contexts, while at the same time acknowledging that all contexts are different in some way, and therefore their respective metadata requirements will rarely be exactly the same.
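The five entities and the relationships between them can be illustrated with a few simple data structures. This is a minimal sketch for orientation only: the class and field names are hypothetical simplifications chosen for this example and are not the semantic units defined in the PREMIS Data Dictionary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """A person, organisation, or software program associated with an Event."""
    identifier: str
    name: str
    agent_type: str  # e.g. "person", "organization", "software"


@dataclass
class Event:
    """A preservation action, such as ingest, carried out on an Object."""
    identifier: str
    event_type: str
    date_time: str  # ISO 8601 string
    agents: List[Agent] = field(default_factory=list)


@dataclass
class Rights:
    """A permission pertaining to an Object."""
    identifier: str
    permitted_act: str  # e.g. "copy for preservation"


@dataclass
class DigitalObject:
    """A discrete unit of information in digital form, e.g. a PDF file."""
    identifier: str
    format_name: str
    events: List[Event] = field(default_factory=list)
    rights: List[Rights] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    """A coherent set of content described as a unit, e.g. a book."""
    identifier: str
    title: str
    objects: List[DigitalObject] = field(default_factory=list)


# Made-up example: a book represented by one PDF that has been ingested.
publisher = Agent("agent-1", "Example Press", "organization")
ingest = Event("event-1", "ingest", "2015-06-01T10:00:00Z", [publisher])
permission = Rights("rights-1", "copy for preservation")
pdf = DigitalObject("object-1", "PDF", [ingest], [permission])
book = IntellectualEntity("ie-1", "An Example Book", [pdf])

print(book.objects[0].events[0].event_type)
```

A repository tailoring PREMIS to its own context would decide which properties of each entity it actually needs to record, in line with the loose definition of 'core' described above.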

 

Implementation

 

Although the PREMIS Data Dictionary is not a formal standard, in the sense of being managed by a recognized standards agency, it has achieved the status of the accepted standard for preservation metadata in the digital preservation community. A strength but also a limitation of the PREMIS Data Dictionary is that it must be tailored to meet the requirements of the specific context; it is not an off-the-shelf solution in the sense that an archive simply implements the Data Dictionary wholesale. Only a portion may be relevant in some digital preservation circumstances; alternatively, the repository may find that additional information beyond what is defined in the Dictionary is needed to support their requirements. For example, the Data Dictionary makes no provisions for documenting information about a repository's business/policy dependencies, which may be needed to support preservation decision-making.

In short, each repository will need to invest some effort to adapt preservation metadata and documentation standards to its particular circumstances and requirements.

During implementation an institution normally identifies its own minimum standard of information required for catalogued items in the collection. Each institution can also identify its preferred levels of metadata and documentation for acquisitions and may notify and encourage suppliers or depositors to supply this information. Staff review and revise supplied information to ensure it conforms to institutional guidelines and they generate catalogue records for deposited data incorporating cataloguing and documentation standards to ensure that information about those items can be made available to users through appropriate catalogues. In many cases the contextual information for resources will be crucial to their future use and this aspect of documentation should not be overlooked.

The level of cataloguing and documentation accompanying or subsequently added to an item, and any limitations these may impose, can be documented for the benefit of future users. Where data resources are managed by third parties but made available via an institution, information may be supplied by the third party in an agreed form which conforms to institution guidelines or in the supplier's native format.

Where a need for enhanced access exists, an Institution may undertake to enhance documentation and cataloguing information to a higher standard to meet new requirements. Retrospective documentation or catalogue enhancement should also occur when the validation or audit of the documentation and cataloguing for a resource shows this to be below a minimum acceptable standard.

A significant number of both users and suppliers of preservation metadata have adopted PREMIS and many of the initial obstacles to implementation have been addressed by them. The process of implementing PREMIS in a working environment is made easier by a number of tools which can extract metadata from digital objects and output PREMIS XML. The PREMIS Maintenance Activity maintains a webpage listing the most important tools available for use with PREMIS. It also includes an active email discussion list and a wiki for sharing documents. For further information see Resources and case studies below.
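To give a sense of what such tools do, the sketch below reads a file, computes its size and a SHA-256 checksum, and writes them into a small PREMIS-style XML record using only the Python standard library. It is an illustration under stated assumptions rather than a substitute for the tools listed by the PREMIS Maintenance Activity. The element names follow semantic units of the Data Dictionary and the namespace is assumed to be that of PREMIS version 3, but the output is simplified and has not been validated against the PREMIS XML schema. The function name, identifier value and format name are hypothetical.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumed PREMIS v3 namespace; check against the published schema before use.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Return a namespace-qualified tag name."""
    return f"{{{PREMIS_NS}}}{tag}"


def describe_file(path: str, identifier: str, format_name: str) -> ET.Element:
    """Build a simplified PREMIS-style Object record for one file."""
    with open(path, "rb") as handle:
        data = handle.read()

    obj = ET.Element(q("object"))
    ident = ET.SubElement(obj, q("objectIdentifier"))
    ET.SubElement(ident, q("objectIdentifierType")).text = "local"
    ET.SubElement(ident, q("objectIdentifierValue")).text = identifier

    chars = ET.SubElement(obj, q("objectCharacteristics"))
    fixity = ET.SubElement(chars, q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = hashlib.sha256(data).hexdigest()
    ET.SubElement(chars, q("size")).text = str(len(data))

    fmt = ET.SubElement(chars, q("format"))
    designation = ET.SubElement(fmt, q("formatDesignation"))
    ET.SubElement(designation, q("formatName")).text = format_name
    return obj


# Demonstration on a small temporary file so the example is self-contained.
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".txt")
tmp.write(b"hello")
tmp.close()
record = describe_file(tmp.name, "object-1", "Plain text")
os.remove(tmp.name)

print(ET.tostring(record, encoding="unicode"))
```

In practice the format name would come from a format identification tool rather than being supplied by hand, and the record would usually be packaged alongside other metadata.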

See also related sections of the Handbook including Acquisition and appraisal, and Preservation planning.

 

Resources

PREMIS Data Dictionary for Preservation Metadata, Version 3.0

http://www.loc.gov/standards/premis/v3/index.html

The PREMIS Data Dictionary and its supporting documentation is a comprehensive, practical resource for implementing preservation metadata in digital archiving systems. The Data Dictionary is built on a data model that defines five entities: Intellectual Entities, Objects, Events, Rights, and Agents. Each semantic unit defined in the Data Dictionary is a property of one of the entities in the data model. Version 3.0 was released in June 2015 (273 pages).

Preservation Metadata (2nd edition), DPC Technology Watch Report

http://dx.doi.org/10.7207/twr13-03

This report focuses on new developments in preservation metadata made possible by the emergence of PREMIS as a de facto international standard. It focuses on key implementation topics including revisions of the Data Dictionary; community outreach; packaging (with a focus on METS), tools, PREMIS implementations in digital preservation systems, and implementation resources. Published in 2013 (36 pages).

Tools for preservation metadata implementation

http://www.loc.gov/standards/premis/tools_for_premis.php

The PREMIS Maintenance Activity maintains a webpage listing the most important tools available for use with PREMIS. This contains entries on tools, in addition to pointers to others which may be used to generate METS (Metadata Encoding and Transmission Standard - an XML schema for packaging digital object metadata) files in conjunction with PREMIS. The majority of the tools listed are for extracting technical metadata from digital objects and converting it for encoding within the PREMIS Object entity. Others can be used for checking formats, or validating files against checksums.

PREMIS website

http://www.loc.gov/standards/premis/index.html

The PREMIS Editorial Committee coordinates revisions and implementation of the PREMIS standard, which consists of the Data Dictionary, an XML schema, and supporting documentation. The PREMIS Implementers' Group forum, hosted by the PREMIS Maintenance Activity, includes an active email discussion list and a wiki for sharing documents. The wiki is a particularly useful resource for new implementers, as it includes materials from PREMIS tutorials, a collection of examples of PREMIS usage and links to information on PREMIS tools. The PREMIS Maintenance Activity maintains an active registry of PREMIS implementations.

Documenting your data

http://www.data-archive.ac.uk/create-manage/document

An excellent set of resources to assist researchers with the documentation and metadata for their research studies, drawn together by the UK Data Archive.

Archaeology Data Service Guidelines for Depositors

http://archaeologydataservice.ac.uk/advice/guidelinesForDepositors

The ADS Guidelines for Depositors provide guidance on how to correctly prepare data and compile metadata for deposition with ADS and describe the ways in which data can be deposited. There is also a series of shorter summary worksheets and checklists covering: data management; selection and retention; preferred file formats and metadata. Other resources for the use of potential depositors include a series of Guides to Good Practice, which complement the ADS Guidelines and provide more detailed information on specific data types.

 

Case studies

DPC case note: British Library ASR2 using METS to keep data and metadata together for preservation

http://www.dpconline.org/component/docman/doc_download/474-casenoteasr2.pdf

This Jisc-funded case study examines the 'Archival Sound Recordings 2' project from the British Library, noting that one of the challenges for long term access to digitised content is to ensure that descriptive information and digitised content are not separated from each other. The British Library has used a standard called METS to prevent this. July 2010 (4 pages).

Designing Metadata for Long-Term Data Preservation: DataONE Case Study

https://www.asis.org/asist2010/proceedings/proceedings/ASIST_AM10/submissions/435_Final_Submission.pdf

A short description of how PREMIS was utilized to specify the requirements for preservation metadata for DataONE (Data Observation Network for Earth) science data. 2010 (2 pages).

Preservica Case Study: Q&A with Glen McAninch, Kentucky Department for Libraries and Archives

http://preservica.com/resource/qa-glen-mcaninch-kentucky-department-libraries-archives/

Glen McAninch discusses the Importance of Provenance, Context and Metadata in Preserving Digital Archival Records.

PREMIS Implementations Registry

http://www.loc.gov/standards/premis/registry/index.php

The PREMIS Maintenance Activity maintains an active registry of over 40 PREMIS implementations with details of the repository and its use of PREMIS. Although not formally case studies, entries have details of practical experience e.g., Creating a digital repository at the Swedish National Archives using PREMIS.

 

References

 

Gartner, R. and Lavoie, B., 2013. Preservation Metadata (2nd edition), DPC Technology Watch Report 13-3 May 2013. Available: http://dx.doi.org/10.7207/twr13-03

PREMIS, 2013. Data Dictionary for Preservation Metadata, Version 3.0. Available: http://www.loc.gov/standards/premis/v3/index.html


Technical solutions and tools

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

Who is it for?

Operational managers (DigCurV Manager Lens) and staff (DigCurV Practitioner Lens) in repositories, publishers and other data creators, third party service providers.

 

Assumed level of knowledge

Novice to Intermediate.

 

Purpose

  • To focus on technical tools and applications that support digital preservation: software, applications, programs and technical services.
  • To consider the practical deployment of preservation techniques and technologies whether as relatively small and discrete programs (like DROID) or enterprise wide solutions that integrate many tools.
  • This section excludes other more strategic or policy issues and standards that are sometimes described as tools: these are covered elsewhere in the Handbook.



Tools

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

A beginner's guide to digital preservation tools

 

The utility of technical tools for digital preservation depends on the context of their deployment. A community recommendation may be strong but if it does not align with your specific function or organisational context then there is a significant chance that the tool will fail to perform. So before selecting digital preservation tools it is important to consider carefully the technical workflow and institutional setting in which they are embedded. A practical example of this has been presented by Northumberland Estates who developed a straightforward evaluation framework to assess tools in context.

An alternative way to consider this topic is to review the extent to which any given tool will deliver preservation actions arising from an agreed preservation plan, which in turn derives from a given policy framework.

 

Thinking about digital preservation tools

 

The following issues are frequently encountered in the process of deploying digital preservation tools. This is not a comprehensive list but consideration of these issues will help you to make sensible and realistic choices.

Open source versus commercial software

Some organizations - often in higher education and especially institutional research repositories - are comfortable with the use of open source software, especially where they have an in-house group of developers. 'Open source' software is where the underlying code is made available for free, enabling a free flow of additions, amendments or development. Other organizations, which don't have easy access to developers, tend to have procurement rules that prefer 'off-the-shelf' commercial solutions backed by on-going support contracts. The distinction between open source and commercial software is often over-stated because each influences the other. Nonetheless you may need to consider your organization's norms and culture while you select tools.

Enterprise-level solutions versus micro-services

Some digital preservation tools are designed to offer 'soup to nuts' solutions, meaning that they provide an integrated end-to-end process that enables all (or most) digital preservation functions to be delivered for a whole organisation. In fact enterprise-level solutions are most often constructed by aggregating individual tools integrated into a single interface. The solution to any given problem might be relatively simple and your organisation may be happy assembling a series of small tools for discrete functions. This encourages rapid progress and is helpful with testing and trialling tools; but it can be hard to maintain over an extended period. In other organisations there is much tighter control over the deployment of software and an expectation that solutions are built across an entire workflow - requiring comprehensive solutions. This can be slower to respond but can be more sustainable in the long term. Before selecting a tool it is helpful to consider where on this spectrum your organization normally sits.

Describing workflows

A key consideration for tools is where they sit in an overall workflow, so before selecting tools it helps to consider and map out the entire workflow. Being explicit about a workflow can also help identify redundant processes as well as major bottlenecks. One frequent challenge is that tools solve a problem in one element of a workflow, only to create a problem elsewhere. In addition, organisations may have multiple workflows with different requirements that conflict in some way. Describing a workflow therefore provides a basis for anticipating difficulties and can provide a roadmap for ongoing development.

Specifying clear requirements

In order to evaluate the usefulness and value to your organisation of the many tools available it helps to have an explicit statement of requirements. Tools can be compared and benchmarked transparently and decisions justified accordingly. Properly executed, requirements-gathering activities can involve a range of stakeholders and therefore maximise the potential for alignment and efficiency, achieving wider strategic and organisational objectives.

Changing and evolving requirements

It is normal for requirements to change through time. Indeed digital preservation is largely concerned with meeting the challenges associated with inevitable changes in technology. So it is necessary to monitor and review tools to ensure that they remain fit for purpose and that any changes in requirements are made explicit. A periodic review of the specification of requirements is recommended.

Sustainability of tools and community participation

An important consideration in any decision over the tools you use for digital preservation is the sustainability element. Sustainability in terms of tools may include an active user base, support, and development. For instance, a large user base, both in terms of commercial and open source providers, can be a vital indicator for identifying a viable tool. It's worth noting that a community can change rapidly and for reasons that might not be easily predicted. 'New kids on the block' can quickly become mainstream while large communities can dwindle as quickly as new technologies overtake existing ones. Consequently it may be necessary to monitor the health of the developer community supporting your tools.

 

Finding digital preservation tools: tools and tools registries

 

One of the welcome features of digital preservation in the last two decades has been the rapid development of software, tools and services that enhance and enable digital preservation workflows. As the digital preservation community has grown in size and sophistication so our tools have become more powerful and more refined. This proliferation and increased specialism can also act as a barrier to deployment: especially when tools have been the product of relatively short lived research projects with limited reach. Consequently the diversity of tools can seem increasingly bewildering to new users, while the route to market for developers is increasingly complicated.

Tools registries have emerged in recent years as a way to help users find tools that they need. A number of registries now exist that describe digital preservation tools. Depending on the interests of the people behind them, they can also provide detailed descriptions, reviews or comments about tools from the wider community. So they are not just helpful for users: by allowing experts to review tools and assess their performance they signpost strengths and weaknesses and provide a basis for future development; by connecting tools to users they help developers reach a much wider audience and get feedback to improve their tools.

Registries are a common way for the digital preservation community to share information. Other types of registries exist such as 'format registries' that outline the performance of given file formats, or 'environment registries' that describe the technology stack necessary to create an execution environment to emulate or virtualize software. These are covered elsewhere in the Handbook.

 

Too many registries?

 

While registries are a good way to manage the proliferation of tools, it is now recognised that a proliferation of registries is also a potential barrier to use. The COPTR registry was designed specifically to address this problem, drawing on data from multiple sources including DCC, POWRR, and the Library of Congress.

 

Practical support and guidance

 

Having considered some of the tools registries and digital preservation tools that are available to organisations, the next question that often arises is which one to choose that fits your organisational purpose. First and foremost it is important that your selection is aligned to organisational need and strategic direction; the resources and case studies below provide evaluation tools and advice to support successful implementation.

 

Resources

Tool registries

Community Owned digital Preservation Tool Registry COPTR

http://coptr.digipres.org/Main_Page

COPTR describes tools useful for long term digital preservation and acts primarily as a finding and evaluation tool to help practitioners find the tools they need to preserve digital data. COPTR aims to collate the knowledge of the digital preservation community on preservation tools in one place. It was initially populated with data from registries run by the COPTR partner organisations, including those maintained by the Digital Curation Centre, the Digital Curation Exchange, the National Digital Stewardship Alliance, the Open Preservation Foundation, and the Preserving digital Objects With Restricted Resources (POWRR) project (http://digitalpowrr.niu.edu/), some of which are listed below. COPTR captures basic, factual details about a tool, what it does, how to find more information (relevant URLs) and references to user experiences with the tool. The scope is a broad interpretation of the term "digital preservation". In other words, if a tool is useful in performing a digital preservation function such as those described in the OAIS model or the DCC lifecycle model, then it's within scope of this registry.

AV Preserve tools list

http://www.avpreserve.com/avpsresources/tools/

A list of tools of particular use in the long term preservation of audio visual materials, both digitised and born-digital.

Digital Curation Centre (DCC) tools and services list

http://www.dcc.ac.uk/resources/external/tools-services

The DCC is a centre of excellence supporting researchers in the UK in tackling challenges for the preservation and curation of digital resources. To achieve this goal it offered a number of support and advisory services, backed by targeted research and development. These services include a catalogue of tools and services which categorises tools for researchers and curators. The information is also integrated in COPTR (see above).

DCH-RP registry

http://www.dch-rp.eu/index.php?en/137/registry-of-services-tools

The Digital Cultural Heritage Roadmap for Preservation (DCH-RP) tools registry collected and described information and knowledge related to tools, technologies and systems that can be applied for the purposes of digital cultural heritage preservation. Version 3 of the registry was created in 2014.

Inventory of FLOSS (Free/libre open-source software) in the cultural heritage domain

https://docs.google.com/spreadsheet/ccc?key=0Ag_7rVJwt0CpdFRJOEJxdEk4ZEMxQ01jaDgxQXFSTkE#gid=0

Produced by the EU funded Europeana Project, this inventory lists free open source software which may be of use in the cultural heritage sector. While not limited to digital preservation tools the inventory does contain information on a variety of tools with digital preservation applications, assessing their purpose, quality of documentation, level of support, license requirements and providing links to project information and source code. Background information on FLOSS is available on the Europeana site http://www.europeana.eu/portal/.

Library of Congress NDIIPP tools showcase

http://www.digitalpreservation.gov/tools/

The Library of Congress's digital preservation tools registry is a selective list of tools and services of interest to those working in digital preservation. It is no longer being actively maintained and content is integrated in COPTR (see above).

Preserving digital Objects With Restricted Resources (POWRR) Tool Grid

http://digitalpowrr.niu.edu/tool-grid/

POWRR investigated, evaluated, and recommended scalable, sustainable digital preservation solutions for organisations with relatively small amounts of data and/or fewer resources. A significant output of the project was the tool grid produced in early 2013 based on the OAIS Reference Model functional categories. An up to date version of the POWRR Tool Grid can now be generated in COPTR (see above).

Digital Preservation Q&A

http://qanda.digipres.org/

This is a site where you can post queries and answers to help each other make best use of tools, techniques, processes, workflows, practices and approaches to ensuring long term access to digital information. Digital Preservation Q&A is currently moderated by representatives from NDSA and OPF member organizations.

Practical e-records

http://e-records.chrisprom.com/author/prom/

Software and Tools for Archivists blog from Chris Prom. Although some information may be several years old the blog provides a useful starting point for understanding the uses of a variety of tools for digital preservation and a standardised evaluation of the tools against set criteria, including ease of installation, usability, scalability etc. In addition to information on tools the blog contains a host of other useful resources, including policy and workflow templates, recommended approaches.

 

Case studies

Diary of a repository preservation project

http://blog.soton.ac.uk/keepit/

A record of progress (between April 2009 and September 2010) as the Jisc-funded KeepIt project tackled the challenges of preserving digital repository content in research, teaching, science and the arts. It includes helpful experience for assessing preservation tools.

Northumberland Estates

http://wiki.dpconline.org/index.php?title=Northumberland_estates_case_study

Northumberland Estates developed a straightforward evaluation framework to assess tools in context. The project set out to survey digital repository options currently available for small to medium organisations with limited resources. Note the recommendations reached in the final business case reflect the organisational needs of Northumberland Estates and may not align themselves with your own goals. The case study was prepared as part of the Jisc-funded SPRUCE project.


Fixity and checksums

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

Fixity

 

“Fixity, in the preservation sense, means the assurance that a digital file has remained unchanged, i.e. fixed.” (Bailey, 2014). Fixity doesn’t just apply to files, but to any digital object that has a series of bits inside it where that ‘bitstream’ needs to be kept intact with the knowledge that it hasn’t changed. Fixity could be applied to images or video inside an audiovisual object, to individual files within a zip, to metadata inside an XML structure, to records in a database, or to objects in an object store. However, files are currently the most common way of storing digital materials and fixity of files can be established and monitored through the use of checksums.

 

Checksums

 

A checksum on a file is a ‘digital fingerprint’ whereby even the smallest change to the file will cause the checksum to change completely. Checksums are typically created using cryptographic techniques and can be generated using a range of readily available and open source tools. It is important to note that whilst checksums can be used to detect if the contents of a file have changed, they do not tell you where in the file the change has occurred.
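As a minimal sketch of the ‘fingerprint’ behaviour described above, the following Python example uses the standard library hashlib module to compute SHA-256 checksums of two byte strings that differ by a single character. The sample strings and the helper name checksum are made up for illustration.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal checksum of a byte string."""
    return hashlib.new(algorithm, data).hexdigest()


original = b"Digital preservation handbook"
altered = b"Digital preservation handbooK"  # only the last character differs

digest_original = checksum(original)
digest_altered = checksum(altered)

# Count how many hexadecimal digits differ between the two checksums.
differing = sum(a != b for a, b in zip(digest_original, digest_altered))

print(digest_original)
print(digest_altered)
print(f"{differing} of {len(digest_original)} hex digits differ")
```

The two printed values bear no obvious relation to each other, which is why a checksum can show that something has changed but not what or where.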

Checksums have three main uses:

    1. To know that a file has been correctly received from a content owner or source and then transferred successfully to preservation storage
    2. To know that file fixity has been maintained when that file is being stored.
    3. To be given to users of the file in the future so they know that the file has been correctly retrieved from storage and delivered to them.

This allows a ‘chain of custody’ to be established between those who produce or supply the digital materials, those responsible for its ongoing storage, and those who need to use the digital material that has been stored. In the OAIS reference model (ISO, 2012) these are, respectively, the producers, the OAIS itself (the repository), and the consumers.
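The first of the uses listed above, checking a file on receipt, could be sketched as follows. This is an illustrative example only: the function names file_checksum and check_on_receipt and the three status strings are assumptions, not part of any particular tool.

```python
import hashlib
from pathlib import Path
from typing import Optional, Tuple


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Compute a checksum by reading the file in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_on_receipt(path: Path, supplied: Optional[str] = None,
                     algorithm: str = "sha256") -> Tuple[str, str]:
    """Return (checksum, status) for a file received from a producer.

    status is "verified" if the supplied checksum matches, "mismatch" if it
    does not, and "created" if no checksum was supplied, in which case the
    computed value becomes the baseline for later checks.
    """
    computed = file_checksum(path, algorithm)
    if supplied is None:
        return computed, "created"
    if computed == supplied.strip().lower():
        return computed, "verified"
    return computed, "mismatch"
```

The same file_checksum value can later be handed to a user alongside the delivered file, so that they can repeat the comparison at their end of the chain.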

 

Application in digital preservation

 

If an organisation has multiple copies of their files, for example as recommended in the Storage section, then checksums can be used to monitor the fixity of each copy of a file and if one of the copies has changed then one of the other copies can be used to create a known good replacement. The approach is to compute a new checksum for each copy of a file on a regular basis and compare this with the reference value that is known to be correct. If a deviation is found then the file is known to have been corrupted in some way and will need replacing with a new good copy. This process is known as ‘data scrubbing’.
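The checking and repair cycle described above can be sketched in a few lines of Python. This is a simplified illustration that assumes all copies are reachable as local file paths and that a trusted reference checksum is already held; the function name scrub and the shape of its report are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Compute a checksum by reading the file in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scrub(copies: List[Path], reference: str,
          algorithm: str = "sha256") -> Dict[str, List[Path]]:
    """Check every copy against the reference value and repair bad copies."""
    good: List[Path] = []
    bad: List[Path] = []
    for copy in copies:
        if copy.is_file() and file_checksum(copy, algorithm) == reference:
            good.append(copy)
        else:
            bad.append(copy)

    repaired: List[Path] = []
    if good:
        for copy in bad:
            # Overwrite the damaged or missing copy from a known good one.
            shutil.copyfile(good[0], copy)
            # Re-check the replacement before trusting it.
            if file_checksum(copy, algorithm) == reference:
                repaired.append(copy)

    unrepaired = [copy for copy in bad if copy not in repaired]
    return {"good": good, "repaired": repaired, "unrepaired": unrepaired}
```

If no copy matches the reference value, nothing is overwritten and every copy is reported as unrepaired, so the problem can be escalated rather than hidden.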

Checksums are ideal for detecting if unwanted changes to digital materials have taken place. However, sometimes the digital materials will be changed deliberately, for example if a file format is migrated. This causes the checksum to change. This requires new checksums to be established after the migration which become the way of checking data integrity of the new file going forward.

Files should be checked against their checksums on a regular basis. How often to perform checks depends on many factors including the type of storage, how well it is maintained, and how often it is being used. As a general guideline, checking data tapes might be done annually and checking hard drive based systems might be done every six months. More frequent checks allow problems to be detected and fixed sooner, but at the expense of more load on the storage system and more processing resources.

Checksums can be stored in a variety of ways, for example within a PREMIS record, in a database, or within a ‘manifest’ that accompanies the files in a storage system.
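As one illustration of the manifest option, the sketch below writes a plain text manifest with one line per file, in the form of a checksum, two spaces, and a relative path (a layout similar to the output of md5sum), and then verifies files against it. The function names and the choice of layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import List


def file_checksum(path: Path, algorithm: str = "md5") -> str:
    """Compute a checksum by reading the file in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path, algorithm: str = "md5") -> int:
    """Write '<checksum>  <relative path>' for each file under root."""
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself if it lives under root.
        if path.is_file() and path.resolve() != manifest.resolve():
            relative = path.relative_to(root).as_posix()
            lines.append(f"{file_checksum(path, algorithm)}  {relative}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(root: Path, manifest: Path,
                    algorithm: str = "md5") -> List[str]:
    """Return relative paths that are missing or whose checksum has changed."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = root / relative
        if not target.is_file() or file_checksum(target, algorithm) != expected:
            problems.append(relative)
    return problems
```

An empty list from verify_manifest means every listed file is present and unchanged; files added after the manifest was written are not reported.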

Tool support is good for checksum generation and use. As they are relatively simple functions, checksums are integrated into many other digital preservation tools. For example, tools may generate checksums as part of the ingest process and add this fixity information to the Archival Information Packages generated, or allow manifests of checksums to be generated for multiple files and for the manifest and files to be bundled together for easy transport or storage. In addition, md5sum and md5deep provide simple command line tools that operate across platforms to generate checksums on individual files or directories.

There are several different checksum algorithms, e.g. MD5 and SHA-256 that can be used to generate checksums of increasing strength. The ‘stronger’ the algorithm then the harder it is to deliberately change a file in a way that goes undetected. This can be important for applications where there is a need to demonstrate resistance to malicious corruption or alteration of digital materials, for example where evidential weight and legal admissibility is important. However, if checksums are being used to detect accidental loss or damage to files, for example due to a storage failure, then MD5 is sufficient and has the advantage of being well supported in tools and is quick to calculate.
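Where an organisation wants both a fast checksum for routine checks and a stronger one for evidential purposes, both can be computed in a single read of the file. The following is a minimal sketch using Python's hashlib module; the function name multi_checksum and the default choice of algorithms are illustrative assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def multi_checksum(path: Path,
                   algorithms: Iterable[str] = ("md5", "sha256")) -> Dict[str, str]:
    """Compute several checksums while reading the file only once."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            # Feed the same chunk to every algorithm.
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

An MD5 checksum is 32 hexadecimal digits long and a SHA-256 checksum is 64, so the two values are easy to tell apart when stored side by side.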

The Handbook follows the National Digital Stewardship Alliance (NDSA) preservation levels (NDSA, 2013) in recommending four levels at which digital preservation can be supported through file fixity and data integrity techniques. Many of the benefits of fixity checking can only be achieved if there are multiple copies of the digital materials, for example allowing repair if integrity of one of the copies has been lost.

Level 1

Activity

  • Check file fixity on ingest if it has been provided with the content.
  • Create fixity info if it wasn’t provided with the content.

Risks addressed and benefits achieved

  • Corrupted or incorrect digital materials are not knowingly stored.
  • Authenticity of the digital materials can be asserted.
  • Baseline fixity established so unwanted data changes have potential to be detected.

Level 2

Activity

  • Check fixity on all ingests.
  • Use write-blockers when working with original media.
  • Virus-check high risk content.

Risks addressed and benefits achieved

  • No digital material of unconfirmed integrity can enter preservation storage. Evidential weight supported for authenticity.
  • Assurance can be given to all content providers that their content has been safely received. Original media is protected.
  • No malicious content can enter preservation storage.

Level 3

Activity

  • Check fixity of content held on preservation storage systems at regular intervals.
  • Maintain logs of fixity info and supply audit on demand.
  • Ability to detect corrupt data.
  • Virus-check all content.

Risks addressed and benefits achieved

  • Protection from wide range of data corruption and loss events. Problems with storage are detected earlier.
  • Data corruption or loss does not go undetected due to ‘silent errors’ or ‘undetected failures’. Digital materials are not in a state of ‘unknown’ integrity.
  • Ongoing evidential weight can be given that digital materials are intact and correct.

Level 4

Activity

  • Check fixity of all content in response to specific events or activities.
  • Ability to replace/repair corrupted data.
  • Ensure no one person has write access to all copies.

Risks addressed and benefits achieved

  • Failure modes that threaten digital materials are proactively countered. All copies of digital materials are actively maintained.
  • Assurance to users of the integrity and authenticity of digital materials being accessed.
  • Effectiveness of preservation approach can be measured and demonstrated.
  • Compliance with standards, e.g. ISO 16363 Audit and certification of trustworthy digital repositories.

Write-blocking

Note that the National Digital Stewardship Alliance (NDSA) recommends the use of write-blockers at level 2. This is to prevent write access to media that digital materials might be on prior to being copied to the preservation storage system. For example, if digital material is delivered to an organisation on a hard disc drive or USB key then a write blocker would prevent accidental deletion of this digital material when the drive or key is read. Digital material might not be on physical media, e.g. it could be on a legacy storage server or delivered through a network transfer, e.g. an ftp upload. In these cases write blockers wouldn't apply and other measures would be used to make the digital material 'read only' on the source and hence immutable before confirmation that the digital material has been successfully transferred to preservation storage. Write blockers also don't exist for all types of media. If a write-blocker is applicable then the costs/skills required to use them should be balanced against the risk of damage to the original digital material or the need to have rigorous data authenticity. Therefore, some organisations might consider use of write blockers to be unnecessary or a level 3 or level 4 step.

 

Resources

Bailey, J., 2014, Protect Your Data: File Fixity and Data Integrity, The Signal, Library of Congress.

http://blogs.loc.gov/thesignal/2014/04/protect-your-data-file-fixity-and-data-integrity/

Checking Your Digital Content: What is Fixity and When Should I Be Checking It?

http://digitalpreservation.gov/ndsa/working_groups/documents/NDSA-Fixity-Guidance-Report-final100214.pdf?loclr=blogsig

Many in the preservation community know they should be checking the fixity of their content, but how, when and how often? This document published by NDSA in 2014 aims to help stewards answer these questions in a way that makes sense for their organization based on their needs and resources (7 pages).

AVPreserve Fixity Tool

http://www.avpreserve.com/tools/fixity/

MD5

https://tools.ietf.org/html/rfc1321

SHA-1

http://csrc.nist.gov/publications/fips/fips180-4/fips-180-4.pdf

SHA-256

https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/examples/sha256.pdf

Md5deep and hashdeep

http://coptr.digipres.org/Md5deep_and_hashdeep

md5sum

http://coptr.digipres.org/Md5sum_Unix_command

The "Checksum" and the Digital Preservation of Oral History

https://www.youtube.com/watch?v=Emom_ncMqu0

A good short overview not limited to oral history, this video provides a brief introduction to the role of the checksum in digital preservation. It features Doug Boyd, Director of the Louie B. Nunn Center for Oral History at the University of Kentucky Libraries. (3 mins 25 secs)

 References

 

Bailey, J., 2014. Protect Your Data: File Fixity and Data Integrity. The Signal. [blog]. Available: http://blogs.loc.gov/thesignal/2014/04/protect-your-data-file-fixity-and-data-integrity/

ISO, 2012. ISO 14721:2012 - Space Data and Information Transfer Systems – Open Archival Information System (OAIS) – Reference Model, 2nd edn. Geneva: International Organization for Standardization. Available: https://www.iso.org/standard/57284.html

NDSA, 2013. The NDSA Levels of Digital Preservation: An Explanation and Uses, version 1 2013. National Digital Stewardship Alliance. Available: http://www.digitalpreservation.gov/ndsa/working_groups/documents/NDSA_Levels_Archiving_2013.pdf

 


File formats and standards

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

Introduction

 

The management of file formats should be considered in the wider strategic context of preservation planning. What can your organisation afford to do? How much developer effort will it require? What do the users require from your collections? Are you committing yourself to a storage problem? At all times, the answer to digital preservation issues is not to try and “do everything”. Your strategy ought to move you towards simple and practical actions, rather than trying to support more file formats than you need.

The purpose of this section is not to provide a detailed or exhaustive list of current formats for different types of content but to draw attention to the broader implications of file formats for their application, and implications for preservation.

A substantial part of this chapter refers to the possible selection of a file format for migration purposes. While migration is a valid preservation strategy, and quite common for many file formats, it is not the only approach or solution. Where appropriate, the chapter will refer to other suitable methods for preservation.

 

File formats organised by content types

 

Different content types have, over time, developed their own file formats as they strive to accommodate functionality specific to their needs. The main content types are images, video, audio and text; however, a growing number of formats are being structured to address the demands of new media, including formats for 3D models and archiving the web.

File formats vary enormously in terms of complexity, with some data being encoded in many layers. In some cases the file formats involved are just one part of a larger picture, a picture that includes software, hardware, and even entire information environments.

For further advice on preservation of specific types of digital content and associated file formats see the Content-specific preservation case studies in the Handbook.

 

File formats - what should we be worrying about?

 

Obsolescence

Formats evolve as users and developers identify and incorporate new functionality. New formats, or versions of formats, may introduce file format obsolescence as newer generations of software phase out support for older formats. When software does not provide for backwards compatibility with older file formats, data may become unusable. Both open source and commercial formats are vulnerable to obsolescence: vendors sometimes use planned obsolescence to entice customers to upgrade to new products while open source software communities may withdraw support for older formats if these are no longer generally needed by the community. Obsolescence can also be accidental: both businesses and open source communities can fail.

File format obsolescence is a risk that needs to be understood. That said, the problem may not be as severe as the digital preservation community perceived it to be some 10 years ago. Many established file formats are still with us, still supported, and still usable. It is quite likely that the majority of file formats you deal with will be commonly understood and well supported.

Proliferation

Arguably, in some sectors, proliferation is more of a challenge than obsolescence. If formats aren’t normalised then an organisation can end up with a large number of different file formats, and versions of those formats: e.g. lots of different versions of PDF, Word, image formats etc. In domains which develop rapidly evolving bespoke data formats this problem can be exacerbated. Tracking and managing all these formats - which ones are at risk, and which tools can be used for each one - can be a serious challenge.

Your digital preservation strategy should strive to mitigate the effects of obsolescence and proliferation. Strategies such as migration, emulation, normalisation and a careful selection of file formats are all valid and worth considering, in the context of your collections and your organisation.

 

Aspects of file formats for digital preservation

 

Selecting target formats for preservation

Not all digital formats are suited or indeed designed for archiving or preservation. Any preservation policy should therefore recognise the requirements of the collection content and decide upon a file format which best preserves those qualities. This means identifying what is important in the content and pairing the content with a suitable choice of preservation format or access format.

Below we suggest some factors to consider in selecting your preferred file formats:

Open source vs proprietary?

Open source formats, such as JPEG2000, are very popular due to their non-proprietary nature and the sense of ownership that stakeholders can attain with their use. However, the choice of open source versus proprietary formats is not that simple and needs to be looked at closely. Proprietary formats, such as TIFF, are seen as being very robust; however, these formats will ultimately be susceptible to upgrade issues and obsolescence if the owner goes out of business or develops a new alternative. Similarly, open source formats can be seen as technologically neutral, being non-reliant on business models for their development; however, they can also be seen as vulnerable to the susceptibilities of the communities that support them.

Although such non-proprietary formats can be selected for many resource types this is not universally the case. For many new areas and applications, e.g. Geographical Information Systems or Virtual Reality, only proprietary formats are available. In such cases a crucial factor will be the export formats supported to allow data to be moved out of (or into) these proprietary environments.

Documentation and standards

The availability of documentation - for example, published specifications - is an important factor in selecting a file format. Documentation may exist in the form of vendor’s specifications, an international standard, or may be created and maintained within the context of a user community. Look for a standard which is well-documented and widely implemented. Make sure the standard is listed in the PRONOM file format registry.

Adoption

A file format which is relied upon by a large user group creates many more options for its users. It is worth bearing in mind levels of use and support for formats in the wider world, but also finding out what organisations similar to you are doing and sharing best practice in the selection of formats. Wide adoption of a format can give you more confidence in your preservation strategy.

Lossless vs lossy

Lossy formats are those where some data is discarded as part of the encoding in order to achieve greater compression. The MP3 format is widely used for commercial distribution of music files over the web, because the lossy encoding process results in smaller file sizes.

TIFF is one example of an image format that is capable of supporting lossless data. It could hold a high-resolution image. JPEG is an example of a lossy image file format. Its versatility, and small file size, makes it a suitable choice for creating an access copy of an image of smaller size for transmission over a network. It would not be appropriate to store the JPEG image as both the access and archival format because of the irretrievable data loss this would involve.

One rule of thumb could be to choose lossless formats for the creation and storage of "archival masters"; lossy formats should only be used for delivery / access purposes, and not considered to be archival. A rule like this is particularly suitable for a digitisation project, especially one producing still images.

Support for metadata

Some file formats have support for metadata. This means that some metadata can be inscribed directly into an instance of a file (for example, JPEG2000 supports some rights metadata fields). This can be a consideration, depending on your approach to metadata management.

Significant properties of file formats

This is a complex area. One view regards significant properties as the "essence" of file content; a strategy that gets to the heart of "what to preserve". What does the user community expect from the rendition? What aspects of the original are you trying to preserve? This strategy could mean you don’t have to commit to preserving all aspects of a file format, only those that have the most meaning and value to the user.

Significant properties may also refer to a very specific range of technical metadata that is required to be present in order for a file to be rendered (e.g. image width). Some migration tools may strip out this metadata, or it may become lost through other curation actions in the repository. The preservation strategy needs to prevent this loss happening. It thus becomes important to identify, extract, store and preserve significant properties at an early stage of the preservation process.

 

Things we can do

 

There are many things you could do to support file formats in your digital archive, and there are many tools available to help you with these tasks. There are now so many that digital preservation tool registries are being developed to help you locate and assess them (see the Tools and the Resources sections).

 

Tools for migration

Broadly, these are tools that transform a file from an obsolete format into a newer format which can be supported. Many tools exist for doing this migration. They tend to confine themselves to doing one thing (e.g. ImageMagick only works for digital image objects).

A migration tool is just one part of a migration pathway. The pathway must include a destination / target format, which you will have selected in line with guidance as suggested above.

Migration tools may introduce risks. One of these risks is “invisible” changes happening to the content or to the data in the migration. To reduce this risk, one strategy is to devise a set of acceptance criteria for what the transformed object must keep, e.g. in terms of formatting, look and feel, or even functionality, and confirm desired outcomes with a process of quality assurance.
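One way to make acceptance criteria concrete is to record the properties that must survive a migration and compare them before and after. The sketch below assumes the properties have already been extracted into simple dictionaries by some other tool; the function name check_acceptance and the property names in the usage note are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def check_acceptance(source: Dict[str, Any], migrated: Dict[str, Any],
                     must_match: Iterable[str]) -> List[str]:
    """Return a list of failed criteria; an empty list means acceptance."""
    failures = []
    for name in must_match:
        if name not in source:
            failures.append(f"{name}: not recorded for source")
        elif name not in migrated:
            failures.append(f"{name}: missing after migration")
        elif source[name] != migrated[name]:
            failures.append(
                f"{name}: {source[name]!r} became {migrated[name]!r}")
    return failures
```

For example, comparing {"width": 4000, "height": 3000} with {"width": 4000, "height": 1500} on the criteria ["width", "height"] reports a single failure for height. Checks of this kind complement, rather than replace, visual or functional quality assurance.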

File format migration is not always the solution. Some CAD and CAM file formats cannot easily be migrated, for example. The aerospace industry has found that migration of older CAD files to a newer format requires a lot of validation, mainly because they are required by a regulatory framework to demonstrate that their data is sound and meets very strict standards. In short, the cost of migration and validation is (for them) much higher than an emulation solution, an approach which (in this case) involves keeping the CAD software and maintaining it.

See also the Tools and Content-specific preservation sections.

Tools for rendition

Broadly, these are tools that can read and play back a file format, so that the user community can read and interpret the resource; it’s most commonly applied to files stored in accessible formats. A basic rendition tool would be PDF Reader. A more sophisticated rendition tool would be the Wellcome Library media player, which supports OCR texts, images, and audio-visual files.

Tools for file format identification

These are tools that can identify aspects of a file's format which are not immediately obvious from its file extension. They do this by reading the file header, and thus can identify e.g. mimetype, size, version. Examples of such tools include DROID (which draws on the PRONOM registry), JHOVE, and the NZ Metadata Extraction Tool (see Resources below).

These tools are usefully deployed at point of ingest, so that you know from the start what sort of file formats you are taking into the archive.

Some identification tools can also point to likely rendition tools, or even (like PRONOM) suggest a migration path based on file format identification.
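The principle of identifying a format from the file header rather than the extension can be illustrated with a short Python sketch. The signature table and the `identify` and `identify_file` functions below are illustrative assumptions only; tools such as DROID work from a far larger and more precise set of signatures and can also report format versions.

```python
# A few well-known leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"II*\x00", "image/tiff"),          # little-endian TIFF
    (b"MM\x00*", "image/tiff"),          # big-endian TIFF
    (b"PK\x03\x04", "application/zip"),  # also the container for many office formats
]


def identify(header: bytes) -> str:
    """Guess a mimetype from the first bytes of a file."""
    for magic, mimetype in SIGNATURES:
        if header.startswith(magic):
            return mimetype
    return "application/octet-stream"    # unknown


def identify_file(path) -> str:
    """Read only the start of the file and identify it, ignoring its name."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))
```

Because the file name plays no part in the result, a PNG image that has been renamed with a `.txt` extension is still reported as `image/png`.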

Tools for file format validation

JHOVE is one of the few tools able to validate files against their format. It does this by comparing a file with the set of expected behaviours for that format, which it stores in its library. For certain file formats, JHOVE can report whether a file is well-formed and valid.
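To give a feel for what comparing a file against expected behaviours involves, the sketch below checks a few structural rules of the PNG format in Python: the file signature, the length and CRC of each chunk, and the presence of IHDR as the first chunk and IEND as the last. This is not how JHOVE is implemented, and the `check_png` function is a hypothetical name; a real validator checks far more, including whether the content of each chunk is valid.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_png(data: bytes) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    if not data.startswith(PNG_SIGNATURE):
        return ["missing PNG signature"]
    problems = []
    chunk_types = []
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        if pos + 8 > len(data):
            problems.append("truncated chunk header")
            break
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            problems.append(f"truncated chunk {ctype!r}")
            break
        body = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", data[end - 4:end])
        # The CRC covers the chunk type and body, not the length field.
        if zlib.crc32(ctype + body) != stored_crc:
            problems.append(f"bad CRC in chunk {ctype!r}")
        chunk_types.append(ctype)
        pos = end
    if not chunk_types or chunk_types[0] != b"IHDR":
        problems.append("first chunk is not IHDR")
    if not chunk_types or chunk_types[-1] != b"IEND":
        problems.append("last chunk is not IEND")
    return problems
```

A file that passes these checks is structurally plausible, which is roughly what "well-formed" means; it may still fail the stricter test of being valid.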

Collection surveys

Survey the file formats in use so that you know what you have; this is sometimes called characterisation of your collections. This again ties into a planning strategy, letting you know what you need to support, and the likely effort required to do this.

A survey should pay particular attention to versions of file formats, and software needed for their reading / rendition. If possible, gather any information about published specifications for these formats; some specs are published on the web.
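A first, rough survey can be produced with a few lines of Python that count files by extension. This is a sketch under the assumption that extensions are a usable proxy for format; they are often wrong or missing and say nothing about versions, so a fuller survey would run a format identification tool instead. The function name `survey_extensions` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def survey_extensions(root) -> Counter:
    """Count the files under root by lower-cased file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling `survey_extensions("/path/to/collection").most_common()` lists the extensions from most to least frequent, which gives an initial indication of where support effort is likely to be needed.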

Useful emerging work in this area has taken place at the British Library, with projects on Sustainability Assessments (Maureen Pennock, Paul Wheatley, Peter May) and Collection Profiling (Michael Day, Maureen Pennock, Ann MacDonald). At time of writing there are no active links to these projects, but it is anticipated that the Sustainability Assessment work will be published on the DPC wiki. These are useful approaches and can be regarded as examples of current best practice. Even if you don’t assess or profile to the same depth as the BL, the exercise is a practical and applicable one.

Avoid Proliferation of File Types

Where possible, reduce the range of file formats you support, in order to reduce complexity. A sound approach to preservation planning is to normalise, rather than add multiple migration formats to your collection. The smaller the range of formats, the lower the overheads.

Community

Look for consensus on target file formats; collaborate with institutions that hold similar collections to yours. What formats do they choose to work with?

 

Conclusion

 

For some kinds of content, there is consensus around the choice of preservation format, for example audio archiving, where WAV is commonly used. In other areas consensus is much more difficult to achieve. The preservation of digital video is a complex area where progress has been stymied by a lack of agreement, and an uncontrolled proliferation of wrapper formats, delivery methods, and encoding methods. The choice of image file formats is slightly clearer, with a limited choice of formats for archiving and others for delivery. It has been generally agreed that TIFF is the appropriate format for archiving master files (the RAW or DNG format is also considered appropriate for archiving), but this is now being challenged by the JPEG 2000 format, which provides a far greater level of lossless compression than TIFF and is an open standard.

 

Resources

Library of Congress recommended format specifications

http://www.loc.gov/preservation/resources/rfs/index.html

The Library of Congress has developed a set of specifications of formats which it recommends, both internally to its own professionals and externally to creators, vendors and archivists, as the preferred ones to use to ensure preservation and long-term access. It covers both digital and analogue formats and is divided into six broad categories: Textual Works and Musical Compositions; Still Image Works; Audio Works; Moving Image Works; Software and Electronic Gaming and Learning; and Datasets/Databases.

Jisc significant properties reports

Between 2007 and 2008 Jisc funded five studies of significant properties for different types of content and files. Note that discussion in the reports is as of 2007-2008. The reports are as follows:

inSPECT Significant Properties Report 2007 (10 pages)

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.7923&rep=rep1&type=pdf

Significant Properties of E-learning Objects 2008 (65 pages)

http://www.webarchive.org.uk/wayback/archive/20140616090345/http://www.jisc.ac.uk/media/documents/programmes/preservation/spelos_report.pdf

The Significant Properties of Moving images 2008 (62 pages)

http://www.webarchive.org.uk/wayback/archive/20140616090254/http://www.jisc.ac.uk/media/documents/programmes/preservation/spmovimages_report.pdf

The Significant Properties of Software: A Study 2008 (97 pages)

http://www.webarchive.org.uk/wayback/archive/20100624233431/http://www.jisc.ac.uk/media/documents/programmes/preservation/spsoftware_report_redacted.pdf

The Significant Properties of Vector Images 2007 (61 pages)

http://www.webarchive.org.uk/wayback/archive/20140616090304/http://www.jisc.ac.uk/media/documents/programmes/preservation/vector_images.pdf

British Library File Formats Assessments

http://wiki.dpconline.org/index.php?title=File_Formats_Assessments

The Digital Preservation Team at the British Library has undertaken preservation risk file format assessments to capture knowledge about the gaps in current best practice, understanding and capability in working with specific file formats. The focus of each assessment is on capturing evidence-based preservation risks and the implications of institutional obsolescence which lead to problems maintaining the content over time. The assessments are hosted as a new section on the DPC Wiki. Three assessments covering JP2, TIFF and PDF have commenced the series.

Library of Congress sustainability factors

http://www.digitalpreservation.gov/formats/index.shtml

This site is concerned with the formats associated with media-independent digital content, i.e., content that is typically managed as files and which is generally not dependent upon a particular physical medium. It is not concerned with the formats associated with media-dependent digital content, i.e., formats that are dependent upon and inextricably linked to physical media, e.g., DVDs, audio CDs, and videotape formats like DigiBeta. It identifies and describes the formats that are promising for long-term sustainability, and develops strategies for sustaining these formats including recommendations pertaining to the tools and documentation needed for their management.

Help Solve the File Format Problem

http://fileformats.archiveteam.org

A crowd-sourced file format information wiki on the Archive Team site. All content is available under a Creative Commons 0 licence.

Is JPEG 2000 a digital preservation risk?

http://blogs.loc.gov/digitalpreservation/2013/01/is-jpeg-2000-a-preservation-risk/

An interesting guest blog and discussion thread on the JPEG 2000 image format.

OPF File Format Risk Registry

http://wiki.opf-labs.org/display/TR/OPF+File+Format+Risk+Registry

This focuses specifically on file format issues and risks that have implications for long-term preservation and accessibility and how to deal with these in a practical way. It aims to be complementary to more formal format registries.

PRONOM

http://apps.nationalarchives.gov.uk/pronom/Default.aspx

This file format registry is a major resource for anyone requiring impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value.

DROID (Digital Record Object Identification)

http://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/droid/

This is an automatic file format identification tool providing categories of format identification for unknown files in a digital collection. It uses internal signatures to identify and report the specific file format and version of digital files. These signatures are stored in an XML signature file, generated from information recorded in the PRONOM registry.

 

Case studies

See the Detailed content preservation case studies section of the Handbook for relevant case studies.


Information security

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

 

Introduction

 

This section is intended as guidance for practitioners at a novice or intermediate level on the implications of information security for digital preservation. Information Security issues relate to system security (e.g., protecting digital preservation and networked systems / services from exposure to external / internal threats); collection security (e.g., protecting content from loss or change, the authorisation and audit of repository processes); and the legal and regulatory aspects (e.g. personal or confidential information in the digital material, secure access, redaction). Information security is a complex and important topic for information systems generally. It is important to rely on relevant expertise within your organisation and beyond it through government and other networks for general information security procedures and advice. You may also need appropriate advocacy for specific digital preservation procedures and requirements.

Rigorous security procedures will:

  1. Ensure compliance with any legal and regulatory requirements;
  2. Protect digital materials from inadvertent or deliberate changes;
  3. Provide an audit trail to satisfy accountability requirements;
  4. Act as a deterrent to potential internal security breaches;
  5. Protect the authenticity of digital materials;
  6. Safeguard against theft or loss.

Many types of digital material selected for long-term preservation may contain confidential and sensitive information that must be protected to ensure it is not accessed by non-authorised users. In many cases there may be legal or regulatory obligations on the organisation to do so. These materials must be managed in accordance with the organisation's Information Security Policy to protect against security breaches. ISO 27001 describes the manner in which security procedures can be codified and monitored (ISO, 2013a). ISO 27002 provides guidelines on the implementation of ISO 27001-compliant security procedures (ISO, 2013b). Conforming organisations can be externally accredited and validated. In some cases your own organisation's Information Security Policy may also impact on digital preservation activities and you may need to enlist the support of your Information Governance and ICT teams to facilitate your processes.

Information security methods such as encryption add to the complexity of the preservation process and should be avoided if possible for archival copies. Other security approaches may therefore need to be more rigorously applied for sensitive unencrypted files; these might include restricting access to locked-down terminals in controlled locations (secure rooms), or strong user authentication requirements for remote access. However, these alternative approaches may not always be sufficient or feasible. Encryption may also be present on files that are received on ingest from a depositor, so it is important to be aware of information security options such as encryption, the management of encryption keys, and their implications for digital preservation.

 

Techniques for protecting information

 

Several information security techniques may be applied to protect digital material:

Encryption

Encryption is a cryptographic technique which protects digital material by converting it into a scrambled form. Encryption may be applied at many levels, from a single file to an entire disk. Many encryption algorithms exist, each of which scrambles information in a different way. These require the use of a key to unscramble the data and convert it back to its original form. The strength of the encryption method is influenced by the key size. For example, with the same algorithm, 256-bit encryption will be more secure than 128-bit encryption.
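The effect of key size can be made concrete with a little arithmetic, shown here as a short Python sketch. It assumes an attacker who can only try keys one by one against an algorithm with no other weakness; the `keyspace` function is a hypothetical helper. Each additional bit doubles the number of possible keys.

```python
def keyspace(bits: int) -> int:
    """Number of distinct keys of the given length in bits."""
    return 2 ** bits


# How many times larger the 256-bit key space is than the 128-bit one.
ratio = keyspace(256) // keyspace(128)
print(f"256-bit keys outnumber 128-bit keys by a factor of 2**{ratio.bit_length() - 1}")
```

Key size is only one factor: a long key offers little protection if the algorithm itself is weak or the key is poorly managed.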

It should be noted that encryption is only effective when a third party does not have access to the encryption key in use. A user who has entered the password for an encrypted drive and left their machine powered on and unattended will provide third parties with an opportunity to access data held in the encrypted area, which may result in its release.

Similarly, encryption security measures (if used) can lose their effectiveness over time in a repository: there is effectively an arms race between encryption techniques and computational methods to break them. Hence, if used, all encryption by a repository must be actively managed and updated over time to remain secure.

Encrypted digital material can only be accessed over time in a repository if the organisation manages its keys. The loss or destruction of these keys will result in data becoming inaccessible.

Access Control

Access controls allow an administrator to specify who is allowed to access digital material and the type of access that is permitted (for example read only, write). The Handbook follows the National Digital Stewardship Alliance (NDSA) preservation levels in recommending four levels at which digital preservation can be supported through access control. The NDSA levels focus primarily on understanding who has access to content, who can perform what actions on that content and enforcing these access restrictions (NDSA, 2013) as follows:

 

NDSA level and associated activity

Level 1

  • Identify who has read, write, move and delete authorisation to individual files
  • Restrict who has those authorisations to individual files

Level 2

  • Document access restrictions for content

Level 3

  • Maintain logs of who performed what actions on files, including deletions and preservation actions

Level 4

  • Perform audit of logs
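Levels 3 and 4 above, logging who performed which action and then auditing those logs, can be sketched in Python as follows. This is an illustration under simplifying assumptions, not a description of any particular repository system: the `LogEntry` and `AccessLog` names are hypothetical, and permissions are held in a plain dictionary that stands in for a real access control list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    when: datetime
    user: str
    action: str   # e.g. "read", "write", "move", "delete"
    path: str


class AccessLog:
    def __init__(self, permissions):
        # permissions maps each user to the set of actions they may perform.
        self.permissions = permissions
        self.entries = []

    def record(self, user, action, path):
        """Level 3: log who performed what action on which file."""
        self.entries.append(LogEntry(datetime.now(timezone.utc), user, action, path))

    def audit(self):
        """Level 4: return entries whose action the user was not authorised for."""
        return [e for e in self.entries
                if e.action not in self.permissions.get(e.user, set())]
```

In this sketch an unknown user has no permissions at all, so every action they perform is flagged by `audit`. In practice the log itself also needs protecting from alteration, which this example does not attempt.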

 

 

Redaction

Redaction refers to the process of analysing a digital resource, identifying confidential or sensitive information, and removing or replacing it. Common techniques applied include anonymisation and pseudonymisation to remove personally identifiable information, as well as cleaning of authorship information. When related to datasets this is usually carried out by the removal of information while retaining the structure of the record in the version being released. You should always carry out redaction on a copy of the original, never on the original itself.
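Pseudonymisation of a dataset while retaining the structure of each record might look like the following Python sketch. It replaces the values of chosen fields with a keyed hash, so that the same person receives the same pseudonym throughout the released copy. The function name, the field names and the truncation to 16 characters are assumptions made for illustration, and pseudonymised data may still be re-identifiable from the fields that remain.

```python
import hashlib
import hmac


def pseudonymise(records, fields, secret_key: bytes):
    """Return redacted copies of the records; the originals are left untouched."""
    redacted = []
    for record in records:
        copy = dict(record)   # work on a copy, never on the original
        for field in fields:
            if field in copy:
                digest = hmac.new(secret_key,
                                  str(copy[field]).encode("utf-8"),
                                  hashlib.sha256)
                # Truncated for readability; an illustrative choice only.
                copy[field] = digest.hexdigest()[:16]
        redacted.append(copy)
    return redacted
```

The secret key must itself be kept secure and separate from the released data, since anyone holding it could test guesses against the pseudonyms.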

The majority of digital materials created using office systems, such as Microsoft Office, are stored in proprietary, binary-encoded formats. Binary formats may contain significant information which is not displayed, and its presence may therefore not be apparent. They may incorporate change histories, audit trails, or embedded metadata, by means of which deleted information can be recovered or simple redaction processes otherwise circumvented. Digital materials may be redacted through a combination of information deletion and conversion to a different format. Certain formats, such as plain ASCII text files, contain displayable information only. Conversion to this format will therefore eliminate any information that may be hidden in non-displayable portions of a bit stream.

 

Resources

ENISA. 2013, Cloud Security Incident Reporting

https://www.enisa.europa.eu/activities/Resilience-and-CIIP/cloud-computing/incident-reporting-for-cloud-computing/

The EU's Agency for Network & Information Security offers recommendations on the ways in which cloud providers and their customers should respond to – and report – security breaches. (38 pages).

ISO 27001:2013, Information technology – Security techniques – Information security management systems – Requirements. Geneva: International Organization for Standardization

http://www.iso.org/iso/catalogue_detail?csnumber=54534

ISO 27001 describes the manner in which security procedures can be codified and monitored. Conforming organisations can be externally accredited and validated. A template for a set of policies aligned with the standard is available. Note that these are headings, to assist with policy creation, rather than policy statements. However, similar policy sets are in use in a substantial number of organisations. (23 pages).

ISO 27002:2013, Information technology – Security techniques – Code of practice for information security controls. Geneva: International Organization for Standardization

http://www.iso.org/iso/catalogue_detail?csnumber=54533

ISO 27002 provides guidelines on the implementation of ISO 27001-compliant security procedures. (80 pages)

ISO 27799:2008, Health informatics – Information security management in health using ISO/IEC 27002. Geneva: International Organization for Standardization

http://www.iso.org/iso/catalogue_detail?csnumber=41298

ISO 27799 provides specific advice on implementing ISO 27002 and 27001 in the healthcare sector. (58 pages)

Cabinet Office, 2009, HMG IA Standard No. 1 – Technical Risk Assessment

https://www.ncsc.gov.uk/guidance/information-risk-management-hmg-ia-standard-numbers-1-2

A detailed discussion and standard intended for UK Risk Managers and Information Assurance Practitioners who are responsible for identifying, assessing and treating the technical risks to systems and services that handle, store and process digital government information. (114 pages).

Redaction toolkit (TNA 2011)

http://www.nationalarchives.gov.uk/documents/information-management/redaction_toolkit.pdf

This TNA toolkit was produced in 2011 to provide guidance on editing exempt material from information held by public bodies. It covers generic principles for records in any media but has a small section specifically on electronic records and detailed guidance on methods for securely redacting electronic records of all types. (21 pages).

BitCurator

https://bitcurator.net/

BitCurator is a suite of open source digital forensics and data analysis tools to help collecting institutions holding born-digital materials. Parts of the toolset help locate private and sensitive information on digital media and prepare materials for public access.

Information Commissioner's Office (ICO): Information security

https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/security/

The ICO website has guidance on reporting of security breaches and use of IT. For those working in organisations falling under the ICO's jurisdiction, an understanding of what this guidance recommends is essential to starting conversations with ICT and Information Governance colleagues, as they will need to be assured that work can be carried out in compliance with ICO recommendations.

Access to the Secure Lab

https://www.ukdataservice.ac.uk/get-data/how-to-access/accesssecurelab

A number of confidential and sensitive microdata sources are becoming available through datalabs across the UK. These data are deemed potentially identifiable, and can only be accessed through a datalab facility (as opposed to download). In addition, researchers are asked to fulfil a number of additional application requirements. Some of these data may be accessed via the Secure Lab of the UK Data Service and this page provides useful overviews and access to relevant user agreements.

 

Case studies

Opening access to administrative data for evaluating public services: The case of the Justice Data Lab

http://evi.sagepub.com/content/21/2/232.full.pdf+html

The Justice Data Lab, a unit within a secure setting holding evaluation and statistical expertise, has enabled providers of programmes aimed at reducing re-offending to obtain evidence on how the impact of their interventions differs from that of a matched comparison group. This article explores the development of the Justice Data Lab, the methodological and other challenges faced, and the experiences of user organizations. The article draws out implications for future development of Data Labs and the use of administrative data for the evaluation of public services. (16 pages).

UK Data Service: Data Security

https://www.ukdataservice.ac.uk/manage-data/store/security

This webpage summarises how the UK Data Archive manages data security for its holdings. Data security may be needed to protect intellectual property rights, commercial interests, or to keep sensitive information safe. Arrangements need to be proportionate to the nature of the data and the risks involved. Attention to security is also needed when data are to be destroyed.

 

References

 

NDSA, 2013. The NDSA Levels of Digital Preservation: An Explanation and Uses, version 1 (2013). Available: http://www.digitalpreservation.gov/ndsa/working_groups/documents/NDSA_Levels_Archiving_2013.pdf

ISO, 2013a. ISO 27001:2013 - Information technology - Security techniques - Information security management systems - Requirements. Geneva: International Organization for Standardization. Available: http://www.iso.org/iso/catalogue_detail?csnumber=54534

ISO, 2013b. ISO 27002:2013 - Information technology – Security techniques – Code of practice for information security controls. Geneva: International Organization for Standardization. Available: http://www.iso.org/iso/catalogue_detail?csnumber=54533
