Planning for digital preservation version 2


Hugh J Campbell

Last updated on 29 March 2018

The Public Record Office of Northern Ireland (PRONI) completed a digital preservation project in March 2015. At that point, we had implemented a solution which included:

A standalone quarantine / virus checking solution;
The use of existing software, e.g. DROID, Bagger, TreeSize Pro;
An MS Excel tool to assist with the generation and manipulation of metadata;
Software to validate checksums and create archival information packages;
A digital repository to store and validate the digital records;
Software to add the digital records to our existing catalogue systems, both staff- and public-facing.

A lot has happened since the end of the project, some good, some not so good. The challenge now is to learn from the experience and start planning for the next version of the system.

We developed a metadata template based on our Electronic Document and Records Management (EDRM) system. This reflected our expectation that no other source was likely to give us more metadata than the EDRM system. We did, however, cover our backs by creating the means to handle additional metadata, albeit in a somewhat clumsy manner.

We built validation based on what we had identified as the mandatory elements of the metadata schema. What we missed, however, were the implications this had for ingesting some of our digitised material into the digital repository and making it available via the staff- and public-facing catalogues. Because the digitised material was processed through the digital preservation system, it had to have all the metadata identified as mandatory, even though none of it would be passed through to the catalogue applications (which already contained entries for the physical objects, to which we would attach the digitised versions). This is an obvious area for change if we are to make the ingest of digitised records, including linking them to the catalogue, more efficient.
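One way to avoid forcing full descriptive metadata onto digitised surrogates is to make the mandatory set depend on the type of ingest. The sketch below assumes two hypothetical profiles and invented field names; PRONI's actual schema is not described in this post.

```python
from typing import Dict, List

# Hypothetical ingest profiles and field names, for illustration only.
MANDATORY_FIELDS: Dict[str, List[str]] = {
    # Born-digital records: full descriptive metadata feeds new catalogue entries.
    "born_digital": ["title", "creator", "date", "description", "file_path", "checksum"],
    # Digitised surrogates: the catalogue already describes the physical original,
    # so only the link to that entry plus fixity information is required.
    "digitised": ["catalogue_reference", "file_path", "checksum"],
}


def missing_fields(record: Dict[str, str], profile: str) -> List[str]:
    """Return the mandatory fields that are absent or blank for the given profile."""
    required = MANDATORY_FIELDS[profile]
    return [field for field in required if not record.get(field, "").strip()]
```

A digitised image then passes with just a catalogue reference, a file path and a checksum, while a born-digital record with the same three fields would be reported as missing its title, creator, date and description.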

As noted above, MS Excel was used to develop a tool to assist with the generation and manipulation of metadata. The tool will never win any awards for user experience or user friendliness unless the DPC introduces its own version of the Golden Raspberry awards. Working with the tool is very manual and ‘clunky’, and it is prone to user error. This may be partly due to a decision, made early in the project, to request or create a .csv metadata file for each folder in an accession rather than a single metadata file. Ideally, I would like the system to be able to handle both options. At times, multiple files have been easier to work with because they keep each .csv file to a manageable size; on other occasions, juggling lots of small .csv files is just a lot of hassle. We need to think carefully about how to improve this tool.
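Supporting both options can be as simple as collecting every metadata file in an accession, whether there is one or many, and merging the rows. This is a minimal sketch; it assumes (hypothetically) that every metadata file is named `metadata.csv` and that all files share the same column headings.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_metadata(accession_dir: Path, filename: str = "metadata.csv") -> List[Dict[str, str]]:
    """Collect metadata rows for an accession, whether it has one .csv or one per folder.

    The file name convention is an assumption made for this sketch.
    """
    rows: List[Dict[str, str]] = []
    # rglob finds a single top-level file and per-folder files alike;
    # sorting keeps the merge order stable between runs.
    for csv_path in sorted(accession_dir.rglob(filename)):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Note which file each row came from, so errors are easier to trace.
                row["source_csv"] = str(csv_path.relative_to(accession_dir))
                rows.append(row)
    return rows
```

Recording the source file on each row means staff can still edit small per-folder files while the system validates the accession as a whole.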

The first exercise to validate the records in the digital repository threw up an issue. The validation included two checks: checksums and the number of records. Of course, validation failed. Inspection revealed that the culprit was our beloved friend, thumbs.db. These are hidden files that Windows uses to cache thumbnail images so that it can display them quickly when a folder is opened. Unfortunately, we appear to be unable to prevent thumbs.db files from being created. The problem was that the number of files in each accession had been recorded; when the validation routine counted the files again, the count now included the new thumbs.db files, so the counts did not match and validation failed. Another issue to be addressed.
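A validation routine can skip operating-system artefacts before counting and checksumming. In the minimal sketch below, only Thumbs.db comes from the post; `desktop.ini` and `.DS_Store` are assumed additions, and the manifest format (relative path mapped to SHA-256 digest) is hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# OS-generated files that may appear after ingest. Thumbs.db is the case
# described above; the other two names are assumptions for this sketch.
IGNORED_NAMES = {"thumbs.db", "desktop.ini", ".ds_store"}


def content_files(folder: Path) -> List[Path]:
    """List files under `folder`, skipping OS artefacts (case-insensitive match)."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.name.lower() not in IGNORED_NAMES
    )


def validate(folder: Path, expected_count: int, expected_sha256: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: List[str] = []
    files = content_files(folder)
    if len(files) != expected_count:
        problems.append(f"expected {expected_count} files, found {len(files)}")
    for path in files:
        rel = path.relative_to(folder).as_posix()
        # Reads each file whole for brevity; very large files would need chunked reads.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if expected_sha256.get(rel) != digest:
            problems.append(f"checksum mismatch or unexpected file: {rel}")
    return problems
```

Filtering in one place means the file count and the checksum check always agree on which files are "records", so a newly created Thumbs.db no longer fails the accession.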

Our digital preservation processes include the option to automatically create new entries in our catalogue or to link digital records to existing catalogue entries. When new entries are created at item level, the numbering always starts at /1. We have recognised that this will need to change, as we may want to add more records to an existing archive level within the catalogue. Another cataloguing-related issue is the need to enforce a particular order when catalogue items are created automatically. For example, a folder containing minutes of meetings would ideally be presented in chronological order. Again, this did not feature in version 1 of the digital preservation system. Thankfully, these catalogue-related issues have not caused any serious problems to date, but they will need to be addressed in a future release of the system.
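Both catalogue issues can be handled when references are assigned: continue numbering after the highest existing item under the parent, and sort the new items first. This is a sketch under assumptions: the `PARENT/n` reference shape follows the "/1" numbering mentioned above, while the example references, the date field and the function name are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def assign_references(parent_ref: str, existing_refs: List[str],
                      new_items: List[Tuple[str, date]]) -> List[Tuple[str, str]]:
    """Number new items after the highest existing item under `parent_ref`,
    in chronological order of their (hypothetical) date field."""
    prefix = parent_ref + "/"
    # Only direct children such as "ABC/1/7" count; deeper levels are ignored.
    used = [int(ref[len(prefix):]) for ref in existing_refs
            if ref.startswith(prefix) and ref[len(prefix):].isdigit()]
    next_number = max(used, default=0) + 1
    # sorted() is stable, so items sharing a date keep their original order.
    ordered = sorted(new_items, key=lambda item: item[1])
    return [(f"{prefix}{next_number + i}", title) for i, (title, _) in enumerate(ordered)]
```

For example, if an archive level already holds items /1 to /10, two new sets of minutes dated March and January would become /11 (January) and /12 (March) rather than restarting at /1.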

We knew from the outset of our project that we could be offered a wide variety of file formats. Although we were prepared for this, we expected the majority to be fairly standard, e.g. images, office documents and PDF files. One of the very first accessions, however, was a collection of audio-visual material, which caused a number of issues, including:

The need to research audio-visual compression, as one of our access systems had a file size limit (a fact we discovered the hard way!);
Concerns over the time taken to create and validate checksums for large files.
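For large audio-visual files, reading in fixed-size chunks keeps memory use flat, and timing each file gives a rough basis for estimating how long checksum creation and validation will take. Chunking does not reduce the work, since every byte is still read and hashed. The chunk size below is an assumed value, not a recommendation; on Python 3.11 and later, `hashlib.file_digest` offers a built-in alternative to the manual loop.

```python
import hashlib
import time
from pathlib import Path
from typing import Tuple

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read; an assumed value worth tuning


def sha256_large_file(path: Path, chunk_size: int = CHUNK_SIZE) -> Tuple[str, float]:
    """Hash a file in fixed-size chunks so memory use does not grow with file size.

    Returns the hex digest and the elapsed time in seconds.
    """
    digest = hashlib.sha256()
    start = time.perf_counter()
    with path.open("rb") as handle:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest(), time.perf_counter() - start
```

Logging the elapsed time alongside file size for a sample of files would show whether checksumming an accession takes minutes or days.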

Another set of audio-visual material raised a further issue: the need to open and view every file. This had been debated during the project, i.e. to what extent would we have to interact with files to check their integrity? Issues here included:

Apparently ‘duplicate’ video files where one contained approximately one hour’s footage more than the other;
Files that would not play or convert, even using the software that created them.

Discussions and debates within the project had often centred on what would or would not be practical to do. Manual intervention will be difficult to sustain when large numbers of files are involved. Yet, without manual intervention, I don’t think we would have identified and resolved the issues noted above.

Of course, manual intervention requires staff. On this front, things became tricky when the last member of our archival staff left the team! Answers to that problem on a postcard please!

We like to think that we are always looking ahead and keeping our eyes open for potential pitfalls. We currently have about 15TB of server storage, of which we are using about 1TB. PRONI is now involved in a project that will result in the creation of digitised film and tape material. The first transfer of material is estimated at approximately 200TB, more than 13 times our total current storage capacity. We certainly did not anticipate the need to scale up the digital preservation system at anything like this rate. Heads are still being scratched on this one.
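The scale of the jump is easy to check with the figures above. The post does not say how many copies of each file are kept, so `copies` is a hypothetical parameter; any replication multiplies the total.

```python
def storage_needed_tb(current_used_tb: float, incoming_tb: float, copies: int = 1) -> float:
    """Total storage needed if every byte is held `copies` times (copies is an assumption)."""
    return (current_used_tb + incoming_tb) * copies


capacity_tb = 15.0
needed = storage_needed_tb(1.0, 200.0)
print(f"{needed:.0f}TB needed vs {capacity_tb:.0f}TB available "
      f"({needed / capacity_tb:.1f}x current capacity)")
# prints: 201TB needed vs 15TB available (13.4x current capacity)
```

Keeping even two copies of everything would double the requirement to 402TB, before allowing for any further transfers.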

It’s clear that digital preservation system version 2 has to build upon the lessons and experiences of the last few years. If nothing else, it should be fun!

To find out more about PRONI, please visit our website and follow us on Facebook.
