Blog

Unless otherwise stated, content is shared under CC-BY-NC Licence

PDF/Eh? redux: putting veraPDF into practice. Or how I rediscovered my inner geek

Paul Wheatley

Last updated on 3 February 2017

Ancient history: how we got here

Way back in 2013 the DPC collaborated with the OPF on a project called SPRUCE. Following on from the success of another little project called AQUA, and with some very handy funding from the Jisc, we ran a bunch of mashup events and got hands-on with all sorts of digital preservation challenges. The management of PDF files, and particularly risk assessment, was a recurring theme. In response, the SPRUCE project held a hackathon in Leeds where a host of DP geeks came up with a basic proof of concept for a PDF risk checker. It was based on Apache PDFBox, a PDF library whose Preflight component is a PDF/A validator. With denizens of both Yorkshire and Canada in the room (plus a variety of other nationalities), it seemed entirely appropriate to call it PDF/Eh? For those unfamiliar with Yorkshire dialect, this probably won't help but is recommended nonetheless. A number of important elements (that would surface again in the future) were brought together at this hackathon, but the participants recognised that it would take a much bigger push and a dedicated project to do it "properly".

PDF is a preservation problem almost everyone has. It's certainly not the biggest problem out there, but it needs some work and it's a little surprising that as a community we haven't managed to nail it. PDF/A is beginning to form part of the solution but a standard needs to be adequately supported with tools. That's where veraPDF comes in.

What the world needs now

William Kilbride

Last updated on 3 February 2017

Inaugurations

By coincidence I find myself working from home on the day a new US president is inaugurated. The coincidence: the last time an American president took office I was also working from home. Speeches and punditry burble through the kitchen radio as I hammer out a few more lines of something at the dining table. Same radio, same pundits, same table, very different presidents. A different William too I suppose, mostly the effect of acquiring two children and a dog but also the result of completing 8 years at the DPC. The DPC is very different, too, now. So I find myself in reflective mood, considering the distances travelled and the road to come.

I joined the DPC not long after Barack Obama moved into the White House. I moved my few boxes into that tiny office in HATII on 23rd February 2009. It’s worth recalling the difference between DPC 2009 and DPC 2017. The most important change has been our membership, more-or-less doubling from 32 to 63. It’s a remarkable statistic in the tough fiscal environment, and even more so when you realise how many of our 2009 members simply no longer exist. Agencies like the Research Information Network, Museums Libraries and Archives Council, and the Centre for Digital Library Research didn’t so much leave the Coalition as expire. The numbers don’t tell you about our growing diversity: how our interaction with banks, manufacturers and architects has set a trend for expansion into the commercial world. And the numbers don’t tell you about how global the Coalition has become of late: the UN, NATO, UNHCR, the European Central Bank, the Academic Preservation Trust, The University of Berne and the International Criminal Tribunal have all joined in the last three years.

A breakthrough year for web archiving in 2016?

Jane Winters

Last updated on 3 February 2017

Anyone who works with web archives quickly becomes used to the fact that most people have not even heard of them – even fewer understand what they are and where you might be able to access them. In 2016, however, it seemed as though web archives began to filter into the public consciousness, to move from the technology pages of the more serious newspapers to the political and even cultural sections. In May 2016, for example, the BBC announced plans to close its Food website, removing approximately 11,000 well-used recipes from search engine results. There was an immediate public outcry – that combination of the BBC and food – and also some very welcome publicity for the value of web archives. The Mirror, for example, noted that ‘If all else fails, the Internet Archive has a record of almost all the BBC Food recipes (11,282 to be precise)’. Similarly, The Independent reported that ‘The easiest way to find any specific recipes is to head to the Internet Archive’s Wayback Machine, which keeps a catalogue of almost every website ever published’, with the subheading to the article warning that this ‘close call should remind us to store the websites that we care about’. The British Library got in on the act as well, blogging that ‘We have today instigated a further crawl of the BBC website with the specific aim of ensuring that we save the recipes from the food pages. We can also report that the Internet Archive, Library of Alexandria and the National Library of Iceland have also captured these pages so their future is assured’. Even more encouragingly, it was notable that in many cases below-the-line commenters independently volunteered web archiving, and especially the Internet Archive, as a solution to the problem.

Preservation Planning for Personal Digital Collections by Paul Wilson

Paul Wilson

Last updated on 6 March 2019

A condensed version of this case note also appears in the Technology Watch Report Personal Digital Archiving by Gabriela Redwine.

Paul Wilson’s case note summarizes his attempts to find a suitable preservation planning process and associated documentation to apply to his personal digital collections. Since he could find no preservation planning process appropriate to individuals, he obtained a slide set detailing a simple preservation workflow from the Digital Preservation Coalition, and used that as a foundation on which to establish an approach to the work. This general approach and its accompanying documentation were tested and refined on two of his personal digital collections (one of 800 mementos and the other of 17,000 photos). Template documents were then derived from the results.

This case note describes the solutions best suited to Wilson's collections and resources, but the processes he has developed have a wide applicability to any personal or small collections. In the fuller article (which can be downloaded as a PDF below), Wilson narrates his experiences to provide insights into the practical outcomes of using published guidelines and tools for preservation planning. Individuals and small organisations will be able to replicate those actions described here that are relevant to their own situations. They will also be able to compare their own collections and circumstances with those in this case study in order to assess common conditions and challenges. All of the documents, as well as blank templates, are available below.

Business continuity procedures – UK Data Archive, University of Essex

Sara Day Thomson

Last updated on 13 December 2016

This case note was developed in 2015 as part of the work for the 2nd edition of the Digital Preservation Handbook.

Business continuity planning and practice involve organizations proactively preparing for potential incidents and disruptions, in order to avoid suspension of critical operations and services or, if they are disrupted, to resume them as rapidly as required by those who depend on them. The development and use of a business continuity plan based on sound principles, endorsed by senior management, and activated by trained staff will greatly reduce the likelihood and severity of impact of disasters and incidents. It is an important component of ensuring bit preservation, and through this it makes a significant contribution to digital preservation.

The Data Archive is the UK national data centre for the Social Sciences funded by the Economic and Social Research Council (ESRC). The Archive holds certification to ISO 27001, the international standard for information security, which requires information security continuity to be embedded in an organisation's business continuity management systems. The digital storage system at the Data Archive is based, for security purposes, on segregated and distributed storage and access. Business continuity at the Data Archive is based around the resilience provided by creating multiple copies of the data and specified recovery procedures, alongside pre-emptive failure prevention. Each file from any dataset has at minimum three copies. The Archive also creates a read only archival copy of each study and any update as it is made available on the system.
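The rule of at least three copies per file can be illustrated with a small check. This is only a minimal sketch, not the Data Archive's actual procedure: the function names, the choice of SHA-256 and the `MIN_COPIES` constant are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Assumption for illustration: the case note states "at minimum three copies".
MIN_COPIES = 3


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(copies):
    """Report how many copies exist and whether they are bit-identical.

    `copies` is a list of paths that are all supposed to hold the same file
    (hypothetical locations, e.g. separate storage systems).
    """
    existing = [Path(p) for p in copies if Path(p).is_file()]
    checksums = {sha256_of(p) for p in existing}
    return {
        "copies_found": len(existing),
        "enough_copies": len(existing) >= MIN_COPIES,
        "all_identical": len(checksums) == 1,
    }
```

A report with `enough_copies` or `all_identical` set to `False` would be the trigger for a recovery procedure, which is outside the scope of this sketch.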

Assessing long term access from short term digitization projects

Sara Day Thomson

Last updated on 13 December 2016

Appropriate and timely examination of the digital preservation plans of digitization projects can have a lasting impact. Projects may not know or understand the risks they run. Simple assessment can help them identify and address these risks sooner rather than later.

Digitization projects often - and sensibly - start by establishing and meeting the needs of a modern user community and are mostly funded over a short term. But the outputs from digitization projects are likely to be valuable in the long term, so how can we take steps to make the outputs of digitization robust in the long term? This case note reports some work undertaken by the University of London Computer Centre in assessing the long term plans of 16 digitization projects, providing a basic survey tool to help funders and project managers alike to reflect on their long term preservation plans.


Practical Preservation: West Yorkshire Archive Service accepts a digital collection

Sara Day Thomson

Last updated on 13 December 2016

Nobody has the perfect answer to digital preservation for every case. If we try we may fail; if we don’t try we will certainly fail.

Digital preservation can be intimidating for organizations which have previously been used to managing and collecting paper archives. In this case note, staff from West Yorkshire Archive Service report on their experience in taking their first large digital archive. This made them confront new problems and new ways of working. They conclude that "if we try we may fail; if we don’t try we will certainly fail".


Small Steps - Long View: how a museum service turned an oral history headache into an opportunity

Sara Day Thomson

Last updated on 13 December 2016

The benefits of digital preservation can be expressed in terms of new opportunities they create in the short and long term. Even relatively simple steps can bring early rewards if properly embedded within the mission of an organization.

This case note examines Glasgow Museums' approach to its large and growing digital collections. It describes how some simple steps in addressing digital preservation have created short and long term opportunities for the museums. They used some very traditional, simple and well-known approaches - creating an inventory, assessing significance and promoting access - as the basis for building confidence to manage the wider challenges they face.


ASR2: Using METS to keep data and metadata together for preservation

Sara Day Thomson

Last updated on 13 December 2016

Long-term access is improved when content and metadata are wrapped in a single package. In this way data managers will be able to access technical and administrative information with the content. The METS standard can help achieve this.

This case note examines the 'Archival Sound Recordings 2' project from the British Library, noting that one of the challenges for long term access to digitised content is to ensure that descriptive information and digitised content are not separated from each other. The British Library has used a standard called METS to prevent this.
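To make the idea of wrapping content and metadata together concrete, here is a minimal sketch of a METS document built with Python's standard library. It is not the British Library's actual METS profile: the `build_mets` function, the IDs, the use of Dublin Core for the description and the example file path are all assumptions chosen for illustration. The METS element names (`dmdSec`, `mdWrap`, `fileSec`, `FLocat`, `structMap`, `fptr`) come from the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)


def build_mets(title, file_href, mime_type):
    """Build a tiny METS tree that ties one description to one content file.

    Hypothetical helper: a real package would carry far more metadata.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata, wrapped inside the package itself.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="DC")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # The content file, referenced by location.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="master")
    file_el = ET.SubElement(
        group, f"{{{METS_NS}}}file", ID="FILE1", MIMETYPE=mime_type
    )
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_href},
    )

    # The structural map links the description (DMD1) to the file (FILE1).
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS_NS}}}div", DMDID="DMD1")
    ET.SubElement(div, f"{{{METS_NS}}}fptr", FILEID="FILE1")
    return mets


if __name__ == "__main__":
    tree = build_mets("Oral history interview", "audio/interview1.wav", "audio/x-wav")
    print(ET.tostring(tree, encoding="unicode"))
```

The point of the structure is the `structMap`: because it references both the descriptive section and the file by ID, the description travels with the recording rather than living in a separate catalogue.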


Welsh Journals Online: Effective Leadership for a Common Goal

Sara Day Thomson

Last updated on 13 December 2016

Long-term access often requires co-operation from many staff. There is a risk that responsibilities are unclear. Consequently it is important that a senior member of staff is charged with delivering an organization’s digital preservation strategy.

This case note examines a complex digitisation project at the National Library of Wales from the perspective of the organisation. There are many parties with an interest in digital preservation and many different skills are required. This creates a risk, which can be managed where an organisation is clear about where responsibility lies for preservation actions. The solution in this case was to nominate a single senior member of staff as the lead officer for digital preservation and to allow them to work across different sections of the institution to achieve a shared goal.

