The British Library’s digital collections are vast, including everything from modern Non-Print Legal Deposit content (such as eJournals, eBooks and web archives), to born-digital electoral registers, eTheses, sound content, newspapers, personal digital archives, patent specifications, and digitized material of many kinds. Having established a digital preservation team in 2005, the Library has invested heavily in building a repository and undertaking research & development efforts to identify best practice for preserving digital collections so that they can be accessed by readers. Efforts have particularly prioritized contemporary digital-only acquisitions and digitized content that is either in high demand or Non-Print Legal Deposit.

In 2015 we became aware of a potential preservation issue: low-use, born-digital legacy content acquired and stored on handheld media, dating from the 1970s onwards. The age of the content meant that much of it had become institutionally obsolete, in terms of both storage media and file formats. A second complication was the uncertain lifespan of the storage media, which is affected by many factors, including humidity and original disk quality, and the risk of both file and disk degradation increases over time. A proof-of-concept research project was therefore initiated to explore the scope of this potential problem and work out possible solutions. This became known as ‘Flashback’.

Flashback Image 1

‘Flashback’ was a proof of concept project in two phases. Phase one (July 2015 – January 2016) was a relatively small-scale investigation that looked at fifty sample items in detail in an attempt to scope the problem and test workflows that would identify viable solutions for preservation of and access to the content. Phase two (May 2016 – March 2017) was commissioned at the end of phase one to build on these initial findings and to scale this research up with a larger sample of approximately 1000 items, shifting the focus to identify and develop or otherwise document the very practical steps required to implement an end-to-end workflow for preservation, leading to end-user access in the Library’s reading rooms. A key difference between phase one and phase two was the level at which testing took place – phase one examined resources on an item-by-item basis and developed processes accordingly, whilst phase two tested the phase one processes on discrete collections of similar content in order to test how well they scaled.

Flashback Image 2

These two phases together provided the project team with valuable insight into the issues faced and possible solutions. We summarise the main outcomes here:

  • Investigations into how the content should be packaged and described so that it could be integrated with existing catalogue records and cataloguing approaches, particularly for hybrid analogue/digital acquisitions: Three main types of content were identified that would require different solutions, namely stand-alone items (i.e. purely digital with no corresponding physical artefact) that had already been catalogued; hybrid non-serial items (i.e. monographs with discs), where catalogue records already existed and linked to the physical artefact; and hybrid serial items (i.e. serials each with discs), where catalogue records typically existed per title and not at the issue level.
  • Indicative data around disk failure rates and processing times: Whilst newer content seems to be relatively stable, particularly that on optical disk, degradation rates do increase with the age of the content. Data collected during the two phases enabled the team to estimate the amount of resource required to complete imaging of the legacy collection in a subsequent implementation phase.
  • Development and testing of re-usable disk imaging workflows for 3.5” and 5.25” floppy disks, as well as CDs and DVDs, including virus checking and checksumming: after testing ISOBuster, BitCurator and Kryoflux, the project developed workflows around BitCurator and Kryoflux.
  • Two sizable collections of sample content for testing: In all, the project sampled over 700 items comprising just under 1000 disks, either 3.5” floppy disk, 5.25” floppy disk, CD or DVD. These items represented several different generations of computing environment, from the 1980s onwards, and multiple different content types, from datasets and spreadsheets to software, games, and guides.
  • Insight into the scale of the challenge: Investigations revealed that approximately 100,000 legacy items were potentially held in the physical stores on hand-held digital media. As might be expected, the intake of disk-based content has declined, from a peak of over 7,000 items in 2009 to just over 2,000 in 2015.
  • Establishment of a Flashback Legacy Lab with a selection of working legacy computing environments. By the end of the proof of concept the Lab had around fifteen working machines dating back to the early 1980s, including a Macintosh Classic, an Apple IIe, and a BBC Master.
  • Development of a decision tree that differentiated between content types in order to inform initial selection of a preservation approach for testing in the Lab. This included a migration option and two main emulation paths: a) for objects where the folder structure was integral to navigation of the content, and b) for objects where software applications were intended to be run directly.
  • Production of a simple ‘preservation planning and evaluation’ process to analyse, document, and compare the rendering of each item’s significant characteristics in both ‘native’ and ‘preserved’ form: In phase one the project team ran an initial analysis of each item, documented it on an experiment plan, then ran the experiment using the approach indicated by the decision tree. Some items underwent multiple experiments, depending on the viability of the first approach tested. In phase two this was scaled up to test a single emulation environment at a collection level.
  • A small number of different emulation environments, accessible on staff PCs through the University of Freiburg’s EaaS/EMiL emulation framework.
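
To illustrate the checksumming step mentioned in the disk imaging workflows above, the following is a minimal sketch in Python. It is not the project's actual tooling: the function names, the choice of SHA-256, and the chunk size are all assumptions made for the example. The idea is simply that a checksum recorded when a disk is imaged can later be recomputed to confirm that the image file has not changed.

```python
import hashlib


def checksum_image(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a disk image file.

    Hypothetical helper: the file is read in chunks so that large
    CD or DVD images do not have to fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path, expected_hex, algorithm="sha256"):
    """Return True if the image still matches the checksum recorded at imaging time."""
    return checksum_image(path, algorithm) == expected_hex.lower()
```

In use, `checksum_image` would be run once immediately after imaging and the result stored alongside the image; `verify_image` could then be run at any later point, for example after transfer into a repository, to confirm fixity.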

Following the proof of concept, a further project was funded to refine its workflows and recommendations and implement them as Business-As-Usual, in order to preserve and provide access to legacy digital collections acquired on disks. This project, known as Flashback Implementation, is currently underway.

