LIFE Project Conference Report
The LIFE Project: Bringing Digital Preservation to LIFE
Report on the LIFE conference on 20th April 2006
The report below presents a personal summary of each presentation given
at the LIFE Project conference at the British Library, 20th April 2006.
In addition to these summaries, details of the LIFE Project Conference
can be found on the LIFE
website events page.
Keynote: Eileen Fenton, Executive Director, Portico
‘New Economic Challenges: Lifecycle considerations for Electronic
Resources’
Eileen Fenton outlined the Portico project, its approach to the
preservation of electronic journals, and how the experience gained from
the project may be applicable to other electronic resources.
The presentation focussed on three main elements: the difference between
print and electronic journals in terms of overall lifecycle cost, possible
models for the digital preservation of electronic journals and the lessons
gained from the project and suggestions for future work.
To further investigate the differences between print and e-journals,
an initial study of eleven academic libraries in the US was carried out
in order to track and assess the costs and activities associated with
the management of both print and electronic journals. The study discovered
that, on a title-by-title basis, the lifecycle management costs (not
including subscription and licensing) were lower for e-journals than
those associated with print journals and that these costs lay in different
areas for each type of publication. The study also found that libraries
were able to provide costs and activities for the actual preservation
of print journals but not for those in an electronic format. This was
not to say that libraries were not aware of costs being applicable to
the preservation of e-journals but that the roles, responsibilities and
methodologies involved in such preservation were often not clearly defined.
Two main possible models for digital preservation were highlighted during
the presentation. The first of these models, that of an archival network,
is exemplified by the US National Digital Information Infrastructure
and Preservation Program (NDIIPP) which focuses on a distributed network
of partners containing various centres of expertise with various technological
solutions and economic models. This allows each centre to deal with specialist
data but also to create a relevant and efficient economic model. The
Portico project was used as the second digital preservation model example,
that of a centralised community-based trusted third-party repository.
Although centralised, the Portico repository has been built via collaborative
input from the library community. The Portico repository receives e-journals
in each publisher’s proprietary format but then migrates these to
a neutral archival format based on the open standard Journal Archiving
and Interchange DTD.
A number of early lessons that had emerged from the Portico project
were discussed, the most significant of which was the importance of high
quality ingest tools to the success and efficiency of digital preservation.
The Portico project’s tools have been designed to apply
to a wide range of journal formats in order to save developer time and
reduce costs. Collaboration with content providers was also highlighted
as being key to cutting costs.
Helen Shenton, Head of Collection Care, The British Library
‘The LIFEcycle model, from paper to digital’
Helen Shenton focussed on the definition of lifecycle collection management
and its implementation at the British Library. Lifecycle collection management
was defined as being intrinsic to the long-term approach to the stewardship
of collections ‘in perpetuity’. The twelve stages of the
collection lifecycle were defined, and it was stressed that they are all
interdependent and have national and international implications.
The question ‘Why do it?’ was also raised
and the practical, economic, political and governance benefits were highlighted.
The time span of the lifecycle was also considered: three key dates
of one, ten and one hundred years (or ‘long term’) were discussed
in terms of the changing proportion of costs incurred in each of the
phases of the lifecycle as the age of the collection increased. Finally,
the idea of a lifecycle as a way of thinking and of managing whole collections
was suggested but it was also made explicit that the cost predicted from
the use of such a model should not govern the ingest or selection of
collections.
Neil Beagrie, BL / JISC Partnership Manager, The British Library
‘Lifecycle modelling, the background’
Neil Beagrie presented an historical background to the development and
uptake of lifecycle approaches, from their beginnings
in such documents as the Terotechnology Handbook (1978) which introduced
lifecycle costings and the concept of ‘total cost of ownership’ in
relation to physical assets through to approaches developed by the AHDS
and British Library in the 1990s and present day. A significant section
of the presentation was devoted to the AHDS and JISC approaches of early
intervention at the point of creation in order to reduce costs downstream
(e.g. advice and guidance on resource creation provided at the grant
application stage). This approach is evident in the publication by the
AHDS of a number of ‘Guides to Good Practice’. A number of
key reports and articles were also highlighted including Andy Stephens’ 1988
article on life cycle costing, Peter Hernon’s 1994 article on life
cycles and US government information resources, Greenstein and Beagrie’s
1998 report on defining a framework for managing digital resources and
Hendley’s 1998 development of this framework into a cost model.
Paul Wheatley, Digital Preservation Manager, Architecture and Development,
The British Library
‘Modelling the digital preservation costs’
Paul Wheatley’s presentation aimed to demonstrate the work currently
being carried out by the British Library into actually modelling and
characterising the costs involved in digital preservation (an
element of the overall collection lifecycle) rather than simply promoting
vague strategies to deal with digital preservation. The LIFE Generic
Preservation Model was presented in the form of a ‘preservation
equation’ which aims to estimate the preservation cost of a number
of objects of a specific file format over a period of time. The model
uses a mixture of fixed and user defined components and includes elements
representing various activities and concepts such as a technology watch
(to identify when a format becomes obsolete and therefore incurs preservation
costs). The complexity of the file format to be preserved was identified
as a key element in determining the final cost of preservation through
its implications on activities such as quality assurance and tool development
or acquisition. A case study looking at archiving 225,079 GIF files estimated
a total cost of preservation over ten years of £33,738 though a
number of issues concerning this final amount were raised. Paul Wheatley
also raised a number of other issues with the model such as the high
level of estimation involved in producing a final cost. He also added
that refinements to the model need to be made based on actual costings
and experience and that the overall level of detail needs refining. Issues
such as format complexity are also not fully captured by the linear style
of the present model, nor is the issue of re-ingest. Overall it was concluded
that estimation of cost is possible using such a model but that it is
not easy. Such an estimation does, however, provide a useful perspective
on performing preservation and doing this in a cost-effective manner.
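As a rough illustration of the case study figures above, the reported total can be broken down into an average cost per file. This is a minimal sketch that uses only the three numbers quoted in the presentation; the function names are hypothetical, and the even spread of cost across files and years is an assumption made here for illustration, not a feature of the LIFE preservation equation itself.

```python
# Figures reported for the GIF case study in the presentation.
GIF_FILES = 225_079        # number of GIF files archived
TOTAL_COST_GBP = 33_738    # estimated total preservation cost in pounds
YEARS = 10                 # period covered by the estimate


def cost_per_object(total_cost: float, objects: int) -> float:
    """Average cost per object over the whole period (hypothetical helper)."""
    return total_cost / objects


def cost_per_object_year(total_cost: float, objects: int, years: int) -> float:
    """Average cost per object per year, assuming an even spread over time."""
    return total_cost / (objects * years)


per_file = cost_per_object(TOTAL_COST_GBP, GIF_FILES)
per_file_year = cost_per_object_year(TOTAL_COST_GBP, GIF_FILES, YEARS)

print(f"Average cost per file over {YEARS} years: £{per_file:.4f}")
print(f"Average cost per file per year: £{per_file_year:.5f}")
```

On these figures the average works out at roughly 15 pence per file over the ten years. Averages of this kind hide the fixed components of the model, so they should not be scaled to other collections or formats without the caveats raised in the presentation.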
Rory McLeod, Digital Preservation Manager, Collection Care, The British
Library and LIFE Project Manager
‘Legal deposit of digital materials case study’
As the first of three case studies, Rory McLeod outlined the implementation
at the British Library at Boston Spa of a lifecycle methodology in relation
to material arriving via the VDEP (Voluntary Deposit of Electronic Publications)
scheme. The amount of electronic material held on the servers at Boston
Spa in February 2006 was estimated at 230,000 items covering four categories
of material (handheld monograph, electronic journal, handheld serial,
electronic serial) and twenty-two different file types. The lifecycle
model was applied to this material highlighting the relevant costs (mainly
storage and preservation) and areas where no cost was incurred such as
acquisition (due to voluntary deposit) and access (no access to materials
provided at present). Costs could be estimated for the digital
publications over a one-year lifecycle and these ranged from £15
for an e-monograph to £206 for an e-journal.
Richard Boulderstone, Director of e-strategy and Information systems,
the British Library
‘Web archiving case study’
The second case study looked at the application of the lifecycle collections
management model to the archiving of web sites as carried out by the
British Library. The British Library is required to archive web pages
and has been working on collecting much of the UK domain for the last
three years. The web archiving programme is a collaborative effort and
is carried out via the British Library’s involvement in two separate
consortia: the UK Web Archiving Consortium (UKWAC) and the International
Internet Preservation Consortium (IIPC). The UKWAC approach to web archiving
is a selective site-by-site one which aims to procure a common web archiving
infrastructure. At present, the project only collects material using
the PANDAS software developed by the National Library of Australia and
has no preservation aspect. Again, the lifecycle model was applied step-by-step
to the archive and preservation elements were looked at in detail
using the preservation equation developed by Paul Wheatley. The presentation
concluded by stating that the LIFE model does provide a useful insight
into the costs involved in collection management and that, at present,
web archiving could be seen as a comparatively expensive activity but
that future work should look at reducing this cost.
Paul Ayris, Director of UCL Library Services and UCL Copyright Officer
‘UCL e-journals case study’
Paul Ayris began his presentation by outlining the difference between
the UCL case study and those presented prior to it. He argued that, as
an academic institution and not a national library, the focus of UCL
was more on research, with digital preservation seen as a means to, or an
integral part of, e-journal dissemination rather than an end in itself. The case study
revolved around UCL’s five year strategy (2005-10) in which they
must identify responsibilities for archiving both electronic and paper
journals as well as implement a practical system for doing so. The strategy
initially involved mapping current workflows to the elements identified
in the LIFE model in partnership with Blackwell Publishers, the Public
Library of Science and Ex Libris. This mapping was found to be a challenge
as activity tracking and costing was not seen to be embedded in the Higher
Education sector. Some parts of the LIFE model, such as ingest and storage,
were also found not to be applicable to the way in which Higher Education
institutions work with e-journals. The study concluded that the actual
acquisition of e-journal content was the most expensive aspect of the
lifecycle but that other costs which existed were unknown and could
therefore not be taken into account. A significant point that emerged
was that Higher Education institutions need to start developing workflows
to deal with e-content and to address the issue of local archiving.
Michael Jubb, Director of the Research Information Network
‘HE perspective, Researchers perspectives’
Michael Jubb began his presentation by stating that digital preservation
is not high on the list of priorities for Higher Education researchers
despite the issues involved being interesting, significant and important
to them for a number of reasons. Not least among these reasons is the
fact that researchers in the HE sector are both the producers and users
of information and that, as a result, they place a high value on the
dissemination and transfer of knowledge and the importance of access.
This value can also be seen reflected at a governmental level in the
2004 OECD Science Ministerial’s Declaration on Access to Research
Data from Public Funding. The high number of digital publications and
outputs of research was highlighted in the presentation alongside the
fact that little is known about how this online material is used and how
much of it should be preserved and made accessible. Following on from
this, and relating directly to cost models, Michael Jubb argued that
work should be carried out on the definition of criteria for the selection
of digital research materials for preservation as these directly influence
the costs of preservation. The concluding argument was that any models
for preservation need to be built with the involvement of the research
community so that informed decisions can be made about exactly what to
preserve and for how long. It was made clear that data produced by one
sector of the research community may have long-term value whereas data
produced by another may move from being immediately significant to completely
valueless within a very short time frame and that cost models should
take such changes into consideration.