LIFE Conference Summary
The LIFE Project: Bringing Digital Preservation to LIFE
Report on the LIFE conference on 20th April 2006
The report below presents a personal summary of each presentation given at the LIFE Project conference at the British Library, 20th April 2006. In addition to these summaries, details of the LIFE Project Conference can be found on the LIFE website events page.
Keynote: Eileen Fenton, Executive Director, Portico
‘New Economic Challenges: Lifecycle considerations for Electronic Resources’
Eileen Fenton provided an outline of the Portico project and how it is looking at the preservation of electronic journals and how the experience gained from this project may be applicable to other electronic resources. The presentation focussed on three main elements: the difference between print and electronic journals in terms of overall lifecycle cost, possible models for the digital preservation of electronic journals and the lessons gained from the project and suggestions for future work.
To further investigate the differences between print and e-journals, an initial study of eleven academic libraries in the US was carried out in order to track and assess the costs and activities associated with the management of both print and electronic journals. The study discovered that, on a title-by-title basis, the lifecycle management costs (not including subscription and licensing) were lower for e-journals than those associated with print journals and that these costs lay in different areas for each type of publication. The study also found that libraries were able to provide costs and activities for the actual preservation of print journals but not for those in an electronic format. This was not to say that libraries were unaware of costs being applicable to the preservation of e-journals but that the roles, responsibilities and methodologies involved in such preservation were often not clearly defined.
Two main possible models for digital preservation were highlighted during the presentation. The first of these models, that of an archival network, is exemplified by the US National Digital Information Infrastructure and Preservation Program (NDIIPP) which focuses on a distributed network of partners containing various centres of expertise with various technological solutions and economic models. This allows each centre to deal with specialist data but also to create a relevant and efficient economic model. The Portico project was used as the second digital preservation model example, that of a centralised community-based trusted third-party repository. Although centralised, the Portico repository has been built via collaborative input from the library community. The Portico repository receives e-journals in each publisher’s proprietary format but then migrates these to a neutral archival format based on the open standard Journal Archiving and Interchange DTD.
A number of early lessons that had emerged from the Portico project were discussed, the most significant of which was the importance of high quality ingest tools to the success and efficiency of digital preservation. The tools developed by the Portico project have been designed to apply to a wide range of journal formats in order to save developer time and reduce costs. Collaboration with content providers was also highlighted as being key to cutting costs.
Helen Shenton, Head of Collection Care, The British Library
‘The LIFEcycle model, from paper to digital’
Helen Shenton focussed on the definition of lifecycle collection management and its implementation at the British Library. Lifecycle collection management was defined as being intrinsic to the long-term approach to the stewardship of collections ‘in perpetuity’. The twelve stages of the collection lifecycle were defined, and their interdependence, as well as their national and international implications, was highlighted. The question ‘Why do it?’ was also raised and the practical, economic, political and governance benefits were outlined. The time span of the lifecycle was then considered: three key dates of one, ten and one hundred years (or ‘long term’) were examined in terms of the changing proportion of costs incurred in each phase of the lifecycle as the age of the collection increases. Finally, the idea of a lifecycle as a way of thinking and of managing whole collections was suggested, but it was also made explicit that the cost predicted from the use of such a model should not govern the ingest or selection of collections.
Neil Beagrie, BL / JISC partnership Manager, The British Library
‘Lifecycle modelling, the background’
Neil Beagrie presented an historical background to the development and uptake of lifecycle approaches from its early history and beginnings in such documents as the Terotechnology Handbook (1978) which introduced lifecycle costings and the concept of ‘total cost of ownership’ in relation to physical assets through to approaches developed by the AHDS and British Library in the 1990s and present day. A significant section of the presentation was devoted to the AHDS and JISC approaches of early intervention at the point of creation in order to reduce costs downstream (e.g. advice and guidance on resource creation provided at the grant application stage). This approach is evident in the publication by the AHDS of a number of ‘Guides to Good Practice’. A number of key reports and articles were also highlighted including Andy Stephens’ 1988 article on life cycle costing, Peter Hernon’s 1994 article on life cycles and US government information resources, Greenstein and Beagrie’s 1998 report on defining a framework for managing digital resources and Hendley’s 1998 development of this framework into a cost model.
Paul Wheatley, Digital Preservation Manager, Architecture and Development, The British Library
‘Modelling the digital preservation costs’
Paul Wheatley’s presentation aimed to demonstrate the work currently being carried out by the British Library into actually modelling and characterising the costs involved in digital preservation (an element of the overall collection lifecycle) rather than simply promoting vague strategies to deal with digital preservation. The LIFE Generic Preservation Model was presented in the form of a ‘preservation equation’ which aims to estimate the preservation cost of a number of objects of a specific file format over a period of time. The model uses a mixture of fixed and user-defined components and includes elements representing various activities and concepts such as a technology watch (to identify when a format becomes obsolete and therefore incurs preservation costs). The complexity of the file format to be preserved was identified as a key element in determining the final cost of preservation through its implications for activities such as quality assurance and tool development or acquisition. A case study looking at archiving 225,079 GIF files estimated a total cost of preservation over ten years of £33,738, though a number of issues concerning this final amount were raised. Paul Wheatley also raised a number of other issues with the model, such as the high level of estimation involved in producing a final cost. He added that refinements to the model need to be made based on actual costings and experience and that the overall level of detail needs refining. Issues such as format complexity are also not fully captured by the linear style of the present model, nor is the issue of re-ingest. Overall it was concluded that estimation of cost is possible using such a model but that it is not easy. Such an estimation does, however, provide a useful perspective on performing preservation and doing this in a cost-effective manner.
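To put the GIF case study figures in perspective, the quoted total can be divided by the number of files and the number of years. The short Python sketch below does only that arithmetic. It is not the LIFE preservation equation, which this summary does not reproduce, and the even spread of cost across files and years is a simplifying assumption made here for illustration, not a claim of the model. The names `per_file_cost`, `FILES`, `TOTAL_COST_GBP` and `YEARS` are illustrative.

```python
# Figures quoted in the GIF case study summarised above.
FILES = 225_079          # number of GIF files archived
TOTAL_COST_GBP = 33_738  # estimated preservation cost over the period
YEARS = 10               # length of the period in years


def per_file_cost(total_cost, n_files):
    """Average cost per file, assuming cost is spread evenly (an assumption)."""
    return total_cost / n_files


cost_per_file = per_file_cost(TOTAL_COST_GBP, FILES)
cost_per_file_per_year = cost_per_file / YEARS

print(f"Per file over {YEARS} years: £{cost_per_file:.2f}")
print(f"Per file per year: {cost_per_file_per_year * 100:.1f}p")
```

On these figures the average works out at roughly £0.15 per file over the ten years, or about 1.5 pence per file per year. Given the caveats raised in the presentation, this is best read as an order-of-magnitude indication only.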
Rory McLeod, Digital Preservation Manager, Collection Care, The British Library and LIFE Project Manager
‘Legal deposit of digital materials case study’
As the first of three case studies, Rory McLeod outlined the implementation at the British Library at Boston Spa of a lifecycle methodology in relation to material arriving via the VDEP (Voluntary Deposit of Electronic Publications) scheme. The amount of electronic material held on the servers at Boston Spa in February 2006 was estimated at 230,000 items covering four categories of material (handheld monograph, electronic journal, handheld serial, electronic serial) and twenty-two different file types. The lifecycle model was applied to this material, highlighting the relevant costs (mainly storage and preservation) and areas where no cost was incurred such as acquisition (due to voluntary deposit) and access (no access to materials provided at present). Costs could be estimated for the digital publications over a one year lifecycle and these ranged from £15 for an e-monograph to £206 for an e-journal.
Richard Boulderstone, Director of e-strategy and Information systems, the British Library
‘Web archiving case study’
The second case study looked at the application of the lifecycle collections management model to the archiving of web sites as carried out by the British Library. The British Library is required to archive web pages and has been working on collecting much of the UK domain for the last three years. The web archiving programme is a collaborative effort and is carried out via the British Library’s involvement in two separate consortia: the UK Web Archiving Consortium (UKWAC) and the International Internet Preservation Consortium (IIPC). The UKWAC approach to web archiving is a selective site-by-site one which aims to procure a common web archiving infrastructure. At present, the project only collects material using the PANDAS software developed by the National Library of Australia and has no preservation aspect. Again, the lifecycle model was applied to the archive step by step and preservation elements were looked at in detail using the preservation equation developed by Paul Wheatley. The presentation concluded by stating that the LIFE model does provide a useful insight into the costs involved in collection management and that, at present, web archiving could be seen as a comparatively expensive activity but that future work should look at reducing this.
Paul Ayris, Director of UCL Library Services and UCL Copyright Officer
‘UCL e-journals case study’
Paul Ayris began his presentation by outlining the difference between the UCL case study and those presented prior to it. He argued that, as an academic institution and not a national library, the focus of UCL was more on research, with digital preservation as a means to, or an integral part of, e-journal dissemination but not an end in itself. The case study revolved around UCL’s five year strategy (2005-10) in which they must identify responsibilities for archiving both electronic and paper journals as well as implement a practical system for doing so. The strategy initially involved mapping current workflows to the elements identified in the LIFE model in partnership with Blackwell Publishers, the Public Library of Science and Ex Libris. This mapping was found to be a challenge as activity tracking and costing were not seen to be embedded in the Higher Education sector. Some parts of the LIFE model, such as ingest and storage, were also found not to be applicable to the way in which Higher Education institutions work with e-journals. The study concluded that the actual acquisition of e-journal content was the most expensive aspect of the lifecycle but that other costs existed which were unknown and could therefore not be taken into account. A significant point that emerged was that Higher Education institutions need to start developing workflows to deal with e-content and to address the issue of local archiving.
Michael Jubb, Director of the Research Information Network
‘HE perspective, Researchers’ perspectives’
Michael Jubb began his presentation by stating that digital preservation is not high on the list of priorities for Higher Education researchers despite the issues involved being interesting, significant and important to them for a number of reasons. Not least among these reasons is the fact that researchers in the HE sector are both the producers and users of information and that, as a result, they place a high value on the dissemination and transfer of knowledge and the importance of access. This value can also be seen reflected at a governmental level in the 2004 OECD Science Ministerial’s Declaration on Access to Research Data from Public Funding. The high number of digital publications and outputs of research was highlighted in the presentation alongside the fact that little is known about how this online material is used and how much of it should be preserved and made accessible. Following on from this, and relating directly to cost models, Michael Jubb argued that work should be carried out on the definition of criteria for the selection of digital research materials for preservation as these directly influence the costs of preservation. The concluding argument was that any models for preservation need to be built with the involvement of the research community so that informed decisions can be made about exactly what to preserve and for how long. It was made clear that data produced by one sector of the research community may have long-term value whereas data produced by another may move from being immediately significant to completely valueless within a very short time frame and that cost models should take such changes into consideration.