Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

Introduction

 

We know by now that digital preservation comprises a series of challenges arising from organisational, resourcing, managerial, cultural, and technical issues. This section of the Handbook will focus specifically on actions that can be taken to help mitigate the technical challenges of preserving digital materials over time.

 

Technological obsolescence of formats

 

Technological obsolescence has long been considered a significant challenge of long-term digital preservation. However, in recent years studies have suggested that format obsolescence is not always as prevalent as previously feared (Rosenthal 2015a, Jackson 2012). It is one issue that must be recognised and countered if digital materials are to survive over generations of technological change, but it is certainly not the only challenge. Many established file formats are still with us, still supported, and still usable. It is quite likely that the majority of file formats you deal with will be commonly understood and well supported rather than obsolete.

A simple definition of obsolescence is the process of becoming outdated or no longer used. When talking about technological obsolescence, we might say for example 'this WordPerfect 3.1 software is obsolete' or 'this BBC Micro computer is obsolete'. The exact moment at which obsolescence occurs can be difficult to pinpoint, particularly for materials that have only recently become obsolete. For example, just because the original application (e.g. MS Word) no longer supports a given format, it does not mean that no other software is available that can read the format. Similarly, one institution may continue to use and maintain a piece of legacy software long after others have upgraded to new versions. It is perhaps therefore more useful to talk about 'institutional obsolescence', namely that the technology in question is no longer in use or easily accessed by a particular institution.

Obsolescence is an issue because all files have their own hardware and software dependencies. This was particularly the case in the early days of computing.

Change becomes an issue when it compromises the meaning of the content or its interpretation by a user. A core goal of digital preservation actions is to preserve the integrity and authenticity of the material being preserved, despite these generational changes in computing technology. In the next section we will discuss some common strategies to help minimise these changes.

 

Preservation strategies

 

In this section we review the technical strategies that can be employed to preserve digital information. After a flurry of activity in the late 1990s there has been relatively little progress in finding new strategies, though there has been significant research and development into varying implementation options and supporting technologies such as quality assurance, digital forensics (see Digital forensics), and technical representation information registries (see Technical solutions and tools in the Handbook). The techniques we will cover here are:

  • Format Migration
  • Emulation
  • Computer Museums

 

Format migration

 

Format migration is one of the most widely used preservation strategies, and most digital preservation systems contain functionality or system data that assumes a migration solution. Format migration is different from storage media migration. It involves transferring or transforming (i.e. migrating) data from an ageing or obsolete format to a new format, possibly using new application systems at each stage to interpret the information. Moving from one version of a format standard to a later version is one form of this method; for example moving from MS Word Version 6 (from 1993) to MS Word for Windows 2010. For frameworks and tools that are helpful for evaluating technical obsolescence of file formats see File formats and standards.

Format migration, like any intervention that has the potential to change the structure and content of data, can introduce errors and loss of information. Therefore, it is important to define metrics to measure possible loss of information and use these to do tests on the correctness and quality of format migration.

Recent work touching on quality assurance and digital preservation actions includes the work of the AQUA, SPRUCE, and SCAPE projects. To measure error rates, it is necessary to determine some very specific metrics. You might need to define what you count as an error and whether you weight some errors as being more important than others. This depends on the context/content of the record and what characteristics of the material are deemed 'significant' to preserve, as well as the migration tools and successive formats used in any migration pathway.
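One way to picture a weighted error metric of this kind is the minimal Python sketch below. It is an illustration only, not a method taken from AQUA, SPRUCE, or SCAPE. The property names, the weights, and the idea of holding extracted properties in simple dictionaries are all assumptions; in practice the properties would come from whatever characterisation tools an institution uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyCheck:
    """One significant property compared before and after migration."""
    name: str
    weight: float  # hypothetical importance weight chosen by the institution


def weighted_loss(original: dict, migrated: dict, checks: list[PropertyCheck]) -> float:
    """Return the weighted share of significant properties that changed (0.0 to 1.0)."""
    total = sum(check.weight for check in checks)
    if total == 0:
        raise ValueError("at least one check must have a non-zero weight")
    # A property counts as lost if its value differs or is missing after migration.
    lost = sum(
        check.weight
        for check in checks
        if original.get(check.name) != migrated.get(check.name)
    )
    return lost / total


# Hypothetical significant properties: the text matters most, the font least.
checks = [
    PropertyCheck("text", 5.0),
    PropertyCheck("page_count", 2.0),
    PropertyCheck("font", 1.0),
]
original = {"text": "Annual report", "page_count": 12, "font": "Courier"}
migrated = {"text": "Annual report", "page_count": 12, "font": "Arial"}

score = weighted_loss(original, migrated, checks)
print(f"weighted loss: {score:.3f}")  # prints "weighted loss: 0.125"
```

Here only the font changed, so the score is the font's weight divided by the total weight. A different choice of weights would give a different score for the same migration, which is why the decision about what is 'significant' needs to be made and recorded before testing begins.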

Some practical issues involved in this process include when to migrate: is it better to migrate from generation to generation, or should some generations be skipped? You will need to keep a record of all transformations and their results, and to document any detected loss of information, so as to maintain evidence of authenticity and authority. PREMIS can be a useful tool for this; see the Handbook section on Metadata and documentation for more information about this standard. It is good practice always to retain the original file, in the format in which it was deposited, to return to if required.
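As a rough illustration of the record-keeping described above, the Python sketch below writes one log entry per migration, with checksums of the source and the result and a list of detected losses. The field names are loosely inspired by the idea of a preservation event but are assumptions for this example. They are not the PREMIS schema, and the file names and tool name are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migration_event(source: Path, target: Path, tool: str, losses: list[str]) -> dict:
    """Describe one migration: what went in, what came out, and what was lost."""
    return {
        "event_type": "migration",
        "event_date_time": datetime.now(timezone.utc).isoformat(),
        "tool": tool,  # hypothetical tool name supplied by the caller
        "source": {"file": source.name, "sha256": sha256_of(source)},
        "outcome": {"file": target.name, "sha256": sha256_of(target)},
        "detected_losses": losses,
    }


def append_event(log_path: Path, event: dict) -> None:
    """Append the event as one JSON line so earlier entries are never rewritten."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


# Demonstration with stand-in files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    original = folder / "report.doc"
    migrated = folder / "report.pdf"
    original.write_bytes(b"original bytes")
    migrated.write_bytes(b"migrated bytes")

    event = migration_event(
        original, migrated, "hypothetical-converter 1.0", ["embedded fonts substituted"]
    )
    append_event(folder / "events.jsonl", event)
    logged = (folder / "events.jsonl").read_text(encoding="utf-8").splitlines()
```

An append-only log is used here so that the history of successive migrations accumulates rather than being overwritten. An institution using PREMIS would map this kind of information onto the standard's own entities rather than an ad hoc structure like this one.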

 

Emulation

 

Emulation offers an alternative solution to migration that allows archives to preserve and deliver access to users directly from original files. This technique attempts to preserve the original behaviours and the look and feel of applications, as well as informational content. It is based on the view that only the original programme is the authority on the format. This is particularly useful for complex objects with multiple interdependencies, such as games or interactive apps.

An emulator, as the name implies, is a programme that runs on a current computer architecture but provides the same facilities and behaviour as an earlier one. This approach has been endorsed by a number of heritage organisations, often in collaboration with technical experts, and in recent years there has been some notable success in implementing emulation solutions for cultural heritage (see Resources below). However, some significant challenges remain, not least that there are often rights issues associated with software licensing that need to be resolved (Rosenthal 2015b).

A particular benefit of emulation is that a single solution can be deployed to provide access to a large number of objects, so long as all those objects require delivery on the same operating system or hardware stack. Use of legacy computing equipment may however prove difficult for users, though they will almost certainly be accessing an 'authentic' representation of the records. Of course emulators have to be built and maintained, requiring a pool of expertise to be available and this cannot always be assumed. New emulators will be needed as computer architectures become obsolete, and both of these present costs and resource needs.
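The point that one emulated environment can serve many objects can be sketched by grouping holdings according to the environment they need. The Python below is an illustration only. The object identifiers and the environment labels are hypothetical, and a real collection would draw this information from its own catalogue or technical metadata.

```python
from collections import defaultdict


def group_by_environment(objects):
    """Map each required environment to the identifiers of the objects that need it."""
    groups = defaultdict(list)
    for identifier, environment in objects:
        groups[environment].append(identifier)
    return dict(groups)


# Hypothetical holdings: (object identifier, environment required for access).
holdings = [
    ("game-001", "BBC Micro"),
    ("doc-014", "Windows 3.1"),
    ("game-002", "BBC Micro"),
    ("doc-015", "Windows 3.1"),
    ("cdrom-007", "Mac OS 8"),
]

plan = group_by_environment(holdings)
for environment, identifiers in sorted(plan.items()):
    print(f"{environment}: {len(identifiers)} object(s)")
```

In this made-up example five objects need only three environments. The more objects share an operating system or hardware stack, the further the cost of building and maintaining each emulator is spread.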

 

Computer museums

 

This methodology proposes the keeping of computers and their systems software (operating systems, drivers, etc.) as well as the data and application programmes. Effort must be expended to keep all platforms in good order, and to retain all the knowledge necessary to maintain and use the machines and their programmes. The idea relies on having a source of spare parts too, but these will dwindle, as will pools of expertise. Hence this strategy tends to be an interim measure rather than a long-term solution. Some formal museums do exist, such as the Computer History Museum in California and the Centre for Computing History in Cambridge. These typically maintain machines in working order, though they do not provide preservation services. See also the Legacy media section of the Handbook for further information on historic file formats and media.

 

Implementation

 

The DPC Technology Watch Reports are a particularly useful guide to the most common genres and file formats (including email, social media, audio-visual, eBooks, e-Journals, GIS, CAD, web archiving etc.) and show which strategies tend to be used most commonly in each of these areas. Tools to assist with implementation of preservation strategies are discussed in the Technical solutions and tools area of the Handbook, particularly in File formats and standards.

 

Resources

DPC Technology Watch Report series

http://www.dpconline.org/publications/technology-watch-reports

The DPC Technology Watch Report series is intended as an advanced introduction to specific issues for those charged with establishing or running services for long term access. They identify and track developments in IT, standards and tools which are critical to digital preservation activities. They are commissioned by experts on these developments and are thoroughly scrutinised by peers before being released.

Emulation & Virtualization as Preservation Strategies

https://mellon.org/media/filer_public/0c/3e/0c3eee7d-4166-4ba6-a767-6b42e6a1c2a7/rosenthal-emulation-2015.pdf

This 2015 report on Emulation and Virtualization as Preservation Strategies by David Rosenthal was funded by the Mellon Foundation, the Sloan Foundation and IMLS. It concludes that recent developments in emulation frameworks make it possible to deliver emulations to readers via the Web in ways that make them appear as normal components of Web pages. This removes what was the major barrier to deployment of emulation as a preservation strategy. Barriers remain; the two most important are that the tools for creating preserved system images are inadequate, and that the legal basis for delivering emulations is unclear and, where it is clear, highly restrictive. Both of these raise the cost of building and providing access to a substantial, well-curated collection of emulated digital artefacts beyond reach. If these barriers can be addressed, emulation will play a much greater role in digital preservation in the coming years. (37 pages).

Systematic planning for digital preservation: evaluating potential strategies and building preservation plans

http://www.ifs.tuwien.ac.at/~strodl/paper/becker-ijdl2009.pdf

This article, published in 2009, describes a systematic approach for evaluating potential alternatives for preservation actions and building thoroughly defined, accountable preservation plans for keeping digital content alive over time. The work was undertaken as part of the European Union-funded PLANETS project. (25 pages).

File format conversion

http://www.nationalarchives.gov.uk/documents/information-management/format-conversion.pdf

Format conversion can help you maintain access to and use of your information and mitigate risks that arise from obsolescence. This 2011 guidance from The National Archives gives you the steps you should go through in performing a file format conversion process. (29 pages).

What organizations are preserving software

http://qanda.digipres.org/1068/what-organizations-are-preserving-software

This post and its responses from August 2015 on the Digital Preservation Q&A site provide a useful list of, and links to, institutions preserving software for emulation strategies.

SCAPE Project Final best practice guidelines and recommendations

http://scape-project.eu/wp-content/uploads/2014/02/SCAPE_D20.6_KB_V1.0.pdf

This SCAPE project report, published in 2014, covers three major areas: implementation of large-scale migration as a preservation strategy; preservation of research data; and bit preservation. (127 pages).

 

Case studies

The Internet Arcade

https://archive.org/details/internetarcade

The Internet Arcade is a web-based library of arcade (coin-operated) video games from the 1970s through to the 1990s from the Internet Archive, implemented using an in-browser emulation solution to provide access to the collection.

Interject

http://www.webarchive.org.uk/interject/

This prototype from the British Library demonstrates how a mixture of preservation actions can be smoothly integrated into the search infrastructure of the UK Web Archive, by acting as an 'access helper' for end users. This web page picks a few specific examples of difficult or interesting cases, and allows you to inspect what the system knows about those formats. You can also view transformed versions of those resources (combining format conversion and emulation techniques).

Rhizome

http://rhizome.org/editorial/2015/apr/17/theresa-duncan-cd-roms-are-now-playable-online/

In the 1990s, Theresa Duncan and collaborators made three videogames that exemplified interactive storytelling at its very best. Two decades later, the works (like most CD-ROMs) have fallen into obscurity. This online exhibition, co-presented by Rhizome and the New Museum, brings them back, making them playable online via emulation.

Assessing Migration Risk for Scientific Data Formats

http://www.ijdc.net/index.php/ijdc/article/view/202/271

This paper explores a simple hypothesis – that, where migration paths exist, the majority of scientific data files can be safely migrated leaving only a few that must be handled more carefully – in the context of several scientific data formats that are or were widely used. The approach is to gather information about potential migration mismatches and, using custom tools, evaluate a large collection of data files for the incidence of these risks. The results support the initial hypothesis, though with some caveats.

Portico - Preservation Step-by-Step

http://www.portico.org/digital-preservation/services/preservation-approach/preservation-step-by-step

A useful step-by-step guide to the preservation planning and migration strategies employed by Portico. The preservation plan may include an initial migration of the packaging or files in specific formats (for example, Portico migrates publisher-specific e-journal article XML to the NLM archival standard).

Trash to treasure: Retro computer, software collection helps National Library access digital pieces

http://www.abc.net.au/news/2015-06-20/collecting-retro-computer-technology-to-save-digital-treasures/6560494

The National Library of Australia made public its own efforts to develop a collection of legacy computing hardware and software. It uses the collection to support data recovery and then implements other preservation strategies; it does not rely on the computer museum for long-term preservation.

 

References

David Rosenthal, 2015a. "The Prostate Cancer of Preservation" Re-examined. Available: http://blog.dshr.org/2015/09/the-prostate-cancer-of-preservation-re.html

David Rosenthal, 2015b. Emulation & Virtualization as Preservation Strategies. Available: https://mellon.org/media/filer_public/0c/3e/0c3eee7d-4166-4ba6-a767-6b42e6a1c2a7/rosenthal-emulation-2015.pdf

Andrew N. Jackson, 2012. Formats over Time: Exploring UK Web History. Available: http://arxiv.org/abs/1210.1714