Digital Preservation in Institutional Repositories
Report on the BL/CURL DPC Forum held at the British Library Conference
Centre, Tuesday 19th October 2004.
The 9th DPC Forum was a collaboration between CURL and the
British Library. The theme of institutional repositories was proposed
by CURL as being very timely as the move from theory to practice is likely
to accelerate, requiring more emphasis on sustainability and lessons
learned from the practical experience of early adopters. A quote from Clifford
Lynch in a recent RLG DigiNews, 'An institutional repository needs
to be a service with continuity behind it ... Institutions need to
recognize that they are making commitments for the long term' (Clifford
Lynch, 2004, http://www.rlg.org/en/page.php?Page_ID=19481#article0), was
used in promoting the Forum, and several presenters used other pertinent
Lynch quotes. Themes emerging from the day were that there were many
challenges, but it was important to continue to gain practical experience
and build on experience and expertise. Some speakers also referred to
the current need to provide mediation for depositors of content, but noted that
this was not scaleable. Ways and means of enhancing efficiency included
shared tools and services, such as the PRONOM file format registry, and
automating parts of the ingest processes.
In opening the Forum, Richard Ovenden, Keeper of Special Collections
at the Bodleian Library, set the institutional repository scene, as one
in which there is a gradual progression from theory to practice but uptake
has been slow (Introduction PDF
108KB). The purpose of this Forum would be to hear from the early adopters,
and listen and learn from them. The role and commitment of CURL to institutional
repositories and digital preservation was seen at task force level, in
individual CURL institutions, and through consortial activity. The role
of the DPC in setting the digital preservation agenda was now well known,
and its work in training, information exchange and providing advice
and guidance was a valuable asset.
Delegates were referred to the JISC press releases contained in their
packs, which provided details of the successful proposals from the recent
4/04 Call on Digital Preservation and Institutional Asset management
and also of the forthcoming Repositories programme, which will be the
subject of two further calls in 2005. These indicated a major step forward
and a major investment by JISC.
Session 1 was chaired by Paul Ayris, Director of Library Services, University
College London, who introduced the first speaker, William Nixon,
Deputy Head of IT Services at the University of Glasgow, who presented
a paper 'From ePrints to eSPIDA:
Digital Preservation at the University of Glasgow' (PDF 822KB). A
number of questions had been raised by the Glasgow experience, which
had started as a pilot service in 2001. Digital preservation was not
the primary focus at first, as there was no content to preserve, but it was becoming
more of an issue and providing the greatest challenge. We need rigorous,
robust preservation options if we are to move to the non-print world.
William also suggested that this may well prove to be a selling point
for academics in encouraging them to deposit their papers with the repository.
In reviewing progress to date, Nixon said that there was a need to transition
from project funding to embedding repositories into the bottom line of
institutions so that they can make a stewardship commitment without dependence
on project funding and move towards becoming a trusted digital repository.
John MacColl, Sub-Librarian, Digital Library, University of Edinburgh,
and Jim Downing, Preservation Development Manager, DSpace@Cambridge,
provided two perspectives on DSpace: that of a manager of a repository service,
and that of a developer of the preservation aspects of DSpace. John MacColl
drew attention to services that arise from project funding but
could fall into disrepute unless they are properly managed
over time (DSpace MacColl Presentation PDF
655KB). Digital preservation could be regarded as a high cost for individual
institutions to undertake and it might be necessary to make use of other
facilities. Advice and guidance were needed by the library community
and Edinburgh would be looking to the DCC as a source of that technical
and practical guidance.
Jim Downing described the DSpace at Cambridge repository in which there
are no mandates on type of material or file formats but they do actively
provide advice on good practice (DSpace
Downing Presentation PDF 166KB). Better preservation metadata was
needed to support preservation planning. Tools such as PRONOM, which
are already available, are proving valuable in helping to monitor
technological obsolescence. Cambridge have been advised to retain human
readable action plans and to add automation, wherever feasible/appropriate,
but to retain human validation of automated steps. Currently DSpace at
Cambridge records all item and metadata changes but this would not be
scaleable. It would be necessary to refine policy and implementation.
The final session of the morning was a joint presentation on Storage
Resource Broker (SRB) at the AHDS (SRB
Presentation PDF 1.2MB). Hamish James provided an overview of what
SRB is and its role at AHDS. The SRB software assists in managing digital
objects scattered around multiple locations, a clear benefit for a distributed
service such as AHDS, which was moving from a loose federation of repositories
to a much more centralised preservation service, while still maintaining
its distributed nature. The collection was expected to grow to 10 TB
within the next two years, so any service must be scaleable. Andrew Speakman
then outlined some of the practical issues involved in installing SRB.
Andrew drew attention to a frequently recurring theme in any discussion
of digital preservation, that of collaboration and the need to take advantage
of related effort which has already occurred. He also went on to outline
the pros and cons of SRB. Pros included the ability to handle large networked
data volumes and high user acceptance. On the negative side, technical
support is not well advanced so there is a requirement for significant
in-house expertise as it is quite complex to install. In concluding, Andrew
said that SRB has the potential to simplify day-to-day operations and
also to simplify distributed management of data and indicated that the
AHDS was looking for partners using SRB.
The afternoon session was chaired by Richard Boulderstone, Director
eStrategy, the British Library, and began with a presentation 'Preserving
EPrints: Scaling the Preservation Mountain' (PDF 144KB) on the SHERPA
project presented by Sheila Anderson and co-authored with Stephen Pinfield.
Sheila outlined the SHERPA project objectives and partners: Nottingham
(lead), Edinburgh, Glasgow, Leeds, Oxford, Sheffield, York, the British
Library, and AHDS. SHERPA is primarily concerned with e-prints, i.e.
digital duplicates of academic research papers that are made available
online as a means of improving access to those papers.
Differing views have been expressed on whether it is necessary to preserve
these documents but there is an opportunity here to move beyond saving
and rescuing digital objects to building the infrastructure required
to manage them from the start. A good start has been made in identifying
properties of e-prints, looking at selection and retention criteria,
preferred formats, rights issues etc. but none of these are 'doing' preservation.
Using the OAIS model as a guide, a preservation storage layer and preservation
planning (e.g. policies and procedures, risk assessment) need to be
added, with preservation and administration metadata and preservation
protocols and processes in place.
A new two-year project, known as SHERPA DP, which is being led by AHDS
in partnership with Nottingham and 3-4 SHERPA partners and funded under
the recent JISC 4/04 Call, has recently been announced. The aim of SHERPA
DP will be to develop a persistent preservation environment for SHERPA
partners based on the OAIS model and to explore the use of METS for packaging
and transferring metadata and content. A Digital Preservation User Guide
would be another practical deliverable from this project. The preservation
community would be looking to the DCC for support, particularly in functions
which are most appropriately centralised, such as technology watch.
The final presentation was from David Ryan, Head of Archives Services
and Digital Preservation at the National Archives, 'Delivering digital
records: towards a seamless flow'. David described the development of
the Digital Archive and key points needed for its success (TNA
Presentation Part 1 PDF 96KB), which were a strong business case
linked to core organisational aims, a good team, and the need to sell
the fact that this is not an insuperable problem. It has taken three
years for the Digital Archive to become a comprehensive delivery service,
and all business targets have been met, but it is critical to recognise
that stewardship is a long-term evolving business. In recruiting staff
it was essential to have the right technical skills, combined with the
ability to sell the work to others within the organisation (TNA
Presentation Part 2 PDF 90KB). The reality is that we must collect
e-records. The Digital Archive should be scaleable to 100TB, which is
way beyond current storage requirements though it is rapidly growing
(TNA Presentation Part 3 PDF
1MB). TNA works with government departments but the current procedures,
which tend to be case-by-case and handcrafted, were not scaleable (Editor's
note: a similar point was made by William Nixon about Glasgow's experience
of building their repository). Preservation planning is a key feature
of the Digital Archive, which must be able to accommodate changes in
preservation management over time. The main thing is to ensure that the
bitstream remains unharmed in case a different preservation strategy is
adopted (the current strategy is migration). Other TNA digital preservation
effort includes the PRONOM service (TNA
Presentation Part 4 PDF 613KB), which is now on Version 4 and is
designed to be the primary file format registry. PRONOM can be used to
inform decisions about migration planning because it can indicate when
a file format is likely to become unsupported. The UK Central Government
Web Archive has captured c. 60 web sites to date and is currently held
separately from the Digital Archive, although it was intended to bring the
two together. An issue is the size of the government website domain.
Finally the work of NDAD was described, and their role as contractor
for TNA in preserving data sets. Next steps would include a comparison
of the NDAD data model and the Digital Archive data model. In closing,
David said that trusted digital repository certification was a key issue
and there was a need for a process to allow a federated system of preservation
and access.
A final panel session allowed delegates to put questions to all the
speakers.