DPC

Infrastructure and Development

Programme

9.30 - 10.00

Registration and Coffee

   

10.00 - 10.10

Welcome and Introduction
Gerry Slater, Chief Executive PRONI, Member of DPC Board of Directors

   
Session 1

Developing Infrastructure for Digital Preservation in the UK

   

10.10 - 10.40

The DPC Business Plan and the Proposed UK Needs Survey (PDF 105KB)
Duncan Simpson, Consultant, responsible for preparation of the DPC Business Plan

10.40 - 11.10

The Digital Curation Centre (PDF 149KB)
Neil Beagrie, Programme Director, JISC

11.10 - 11.40

File Format Registries (PDF 126KB)
Derek Sergeant, Leeds University

11.40 - 12.10

The Needs of E-Science: Report Recommendations (PDF 407KB)
Philip Lord, Director, and Alison Macdonald, Senior Consultant,
The Digital Archiving Consultancy

12.10 - 12.30

Discussion

   

12.30 - 1.30

Lunch

   
Session 2

Current Research and Developments

   

1.30 - 2.00

Conservation and Management of Digital Works of Art (PDF 1.08MB)
Pip Laurenson, Sculpture Conservator for Electronic Media, Tate

2.00 - 2.30

The AHDS Preservation Review (PDF 143KB)
Sheila Anderson, Director, and Hamish James, Collections Manager, AHDS

2.30 - 3.00

Archiving Subscription E-Journals (PDF 129KB)
Maggie Jones, JISC

   

3.00 - 3.30

Coffee

   

3.30 - 4.00

Report from DPC Web-archiving Special Interest Group (PDF 116KB)
Deb Woodyard, Digital Preservation Co-ordinator, The British Library

   

4.00 - 4.30

Concluding Discussion


Open Source and Dynamic Databases

Programme

In keeping with the Forums' role of keeping participants up to date with the latest developments in digital preservation, the 6th DPC Forum focussed on Open Source Software (OSS) and Dynamic Databases, both of which have been the subject of debate and speculation. A stimulating and thought-provoking day began with four presentations on OSS.

Alan Robiette provided a comprehensive, historical overview of the development of OSS and its pros and cons. A key drawback (lack of support) was echoed in other presentations. The early stages of the MIT-Cambridge DSpace collaboration were described in Julie Walker and Anne Murray's presentation. William Nixon discussed the issues in building a network of institutional repositories as part of the DAEDALUS project at the University of Glasgow.

Jo Pettitt described the National Archives' programme of trials and pilots, which addresses some of the practical issues the organisation is facing. Open source software testing is part of this programme. Demonstrations of OCLC's Digital Archive and DSpace were provided during the lunch break.

The afternoon session had three presentations on archiving dynamic databases. Peter Buneman's presentation on archiving scientific data referred to the tension between preserving scientific data frequently, which consumes space, and preserving it infrequently, which results in delays. Experimental work being undertaken on data structures offers some promise for affordable, persistent scientific archives.

Bryan Lawrence described the role of the British Atmospheric Data Centre, NERC's designated centre for atmospheric data. He referred to the influence of the OAIS model in recognising that data storage is part of a wider picture of consumers and producers, with data repositories acting as facilitators between the two. Finally, Cathy Smith provided a lively update on the evolution of the BBC website and the issues involved in archiving it.

9.30 - 10.00 

Registration and Coffee

10.00 - 10.10

Welcome and Introduction

 

Session 1 - Open Source and Digital Preservation

   

10.10 - 10.40 

Open Source & Commercial Software (PDF 194KB)
Alan Robiette, Programme Director, JISC

10.40 - 11.10

The DSpace at Cambridge Project (PDF 547KB)
Anne Murray, Cambridge University and Julie Walker, MIT

11.10 - 11.40

Experiences with E-prints and DSpace (PDF 943KB)
William Nixon, Glasgow University

11.40 - 12.10

The Open Source Evaluation Project at the National Archives (PDF 1.9MB)
Jo Pettitt, National Archives

   

12.10 - 12.30 

Discussion

   

12.30 - 2.00

Lunch and Demonstrations
(DSpace and OCLC Digital Archive)

 

Session 2 - Approaches to Archiving Dynamic Databases

   

2.00 - 2.30

Archiving Dynamic Databases (PDF 436KB)
Professor Peter Buneman, Edinburgh University

2.30 - 3.00

Experiences with Archiving Databases in BADC (PDF 2.42MB)
Bryan Lawrence, British Atmospheric Data Centre

   

3.00 - 3.30

Coffee

   

3.30 - 4.00

Ten Years on the Web: Archiving BBCi Online (PDF 632KB)
Cathy Smith, BBC

   

4.00 - 4.30

Concluding Discussion


Archives: adapting to the digital age

Report on the DPC Forum, Archives: adapting to the digital age

Held at the National Archives, Kew, Wednesday 24th September 2003

Around 40 participants attended the 7th DPC Forum at the National Archives (TNA), Kew. The Forum was timed to coincide with Archives Awareness Month, so it was appropriately held at TNA and focussed on archives in the digital age. It also coincided with the anticipated autumn internet launch of TNA's PRONOM database (http://www.nationalarchives.gov.uk/preservation/webarchive/default.htm/pronom/documentation.htm [Update 26 September 2007: this link is no longer active; information on PRONOM can be found at http://www.nationalarchives.gov.uk/aboutapps/pronom/]) and of the UK Central Government Web Archive (http://www.nationalarchives.gov.uk/preservation/webarchive [Update 03 October 2007: new location: http://www.nationalarchives.gov.uk/preservation/archivedwebsites.htm]). Demonstrations of both were provided to participants in the afternoon sessions.

David Thomas, Director, Government and Archive Services at TNA, chaired the Forum. In his welcome and introduction, he noted that TNA was going through a process of change but now had "real stuff" to show, as opposed to abstract discussions. The sessions that preceded the demonstrations of PRONOM and the Digital Archive and the tours of TNA were informative and thoughtful, and stimulated lively discussion.

Session 1: Electronic Records Management

Richard Blake, Head of TNA's Records Management Advisory Service, placed electronic records management (ERM) in a strategic framework. He stressed that this issue affects any business, not just archives: if material is to be held for more than five years, preservation issues will inevitably arise. Records must be kept usable, whether they are being kept for twenty years or taken into an archive for permanent retention.

He referred to BS ISO 15489, the first international standard for records management. There were, however, some problems in applying this standard in practice, as it does not define each of its four key characteristics (authenticity, reliability, integrity and usability) in enough detail. The presentation examined the issues associated with each of these characteristics, and the theme running through all of them was the need to ensure that records management systems function within a strong intellectual framework that sets out details such as what additions and annotations are permissible. This needs to go beyond "just buying the technology" if the authenticity, reliability and integrity of electronic records are not to be challenged in future. TNA provides guidance in this area, and most of its guidance is available from the TNA website. Finally, Richard noted that in such a new and rapidly evolving area, we have to accept that mistakes will be made, but we should at least be able to understand why we made them.

Stuart Orr, Assistant Director in the Information and Workplace Strategies Directorate of DTI, provided a useful case study of how ERM was implemented at DTI. The problem was that, under the previous government, there had been a practice of increasing devolvement within government departments. This had made it difficult to share information and had led to information being stored in non-standard ways. The Secretary of State for DTI, Patricia Hewitt, recognised the need for better means of sharing and storing information so that DTI could provide a better service to those seeking information from it. The Matrix project was developed to address this problem and was rolled out across 22 sites within the UK, with c. 5,000 users. The presentation described what Matrix would and would not provide: for example, it was expected to support collaborative working, but it would not introduce the paperless office. Nor would it work without an investment of time and effort: people need to input quality information, and this is difficult to control in a devolved environment. Around 60 people worked on the Matrix project, including a full-time communications manager. A step-by-step approach was taken, beginning in May 2000 with planning and prototyping, moving on to testing and trialling, and culminating in rollout in May 2002. Bringing staff on board and providing training were seen as key elements, and Stuart said that DTI could have invested even more in training. DTI still has many questions about long-term preservation. It is looking to TNA for advice and is conscious of the need for caution before investing in preservation infrastructure.

Session 2: Collecting and preserving digital materials

Kevin Schurer, Director of the UK Data Archive and of the recently established Economic and Social Data Service, provided a fascinating historical overview of the first 35 years of the UK Data Archive. Kevin noted that the UKDA is not a legal repository in the sense that TNA is, but its service goes well beyond data delivery, so there are synergies between the two. Many changes have occurred in the 35 years since the UKDA was established. The material has diversified: collections now include not only survey data but also, for example, sound recordings and pictures. The number of users has increased greatly, doubling over the past few years. Formats have changed, particularly since the mid-1990s; until then, magnetic tape was the dominant input and dissemination medium, whereas CD-ROMs and web delivery are now much more prevalent. Older forms, such as punched cards, remain in the collection, and a new punched card reader was purchased recently (though it had been difficult to locate!). Although preservation was not seen as an issue when the UKDA was set up in the 1960s, it has become one because of the emphasis on providing research material to the academic sector, which inevitably requires preservation issues to be addressed if the material is to remain usable. The Data Exchange Initiative was seen as a potential bridge between the need to push material out in user-friendly formats and the aim of retaining data in XML, which makes preservation simpler to manage. Work on the XML schema will begin in early 2004 as a two-year project. Although the UKDA works with a limited subset of digital information, it must still deal with most of the challenges that legal archives face, because of its remit to provide access to research materials.
Kevin provided copies of the UKDA's preservation policy to participants [Note: this will be available from the members' pages on the DPC website in the near future].

David Ryan, Head of Archive Services at TNA, gave the final presentation of the morning, on TNA's work on web archiving. There are two broad approaches to web archiving, selective and harvesting, and David outlined the pros and cons of each before describing the approach TNA is taking: to evaluate a number of technical approaches, develop a selection policy for websites, work with government departments to develop guidance, and develop long-term preservation strategies. The Modernising Government white paper, with its target of all Government services being available online by 2005, provided the impetus for using the web as a communication mechanism. A spin-off benefit is that preservation and presentation are brought together: people can see both the benefits and the limitations of what current technology can provide. Issues include the size of the domain, estimated at c. 2,500 websites, though this is difficult to track as not all of them use .gov.uk addresses; increasingly dynamic (and therefore more complicated) content; copyright; and legal deposit.


Programme 

9.30  Registration and Coffee
10.00  Introduction and welcome, David Thomas
  Session 1 - Electronic Records Management - Chair - David Thomas
10.15 Electronic Records Management - the role of TNA (PDF 191KB)
Richard Blake
10.45 Introducing ERM at DTI (PDF 1.6MB)
Stuart Orr
11.15 Short break
  Session 2 - Collecting and preserving digital materials - Chair - David Thomas
11.30 The UK Data Archive and the Experience of Digital Preservation (PDF 311KB)
Kevin Schurer
12.00 Collecting Government websites at TNA (PDF 1.3MB)
David Ryan
12.30 Lunch
13.30 Stream 1: Demonstrations of the Digital Archive and PRONOM by Adrian Brown and Jo Pettitt
  Stream 2: Tour of TNA led by Kelvin Smith
14.30 Short break
14.40 Stream 1: Tour of TNA led by Kelvin Smith
  Stream 2: Demonstrations of the Digital Archive and PRONOM by Adrian Brown and Jo Pettitt
15.40 Discussion and final wrap-up
16.00 Close

Library of Congress and DPC sign agreement

Added on 23 June 2004

DPC signs Memorandum of Understanding with the Library of Congress


The National Digital Information Infrastructure and Preservation Program (NDIIPP) of the Library of Congress was established after Congress gave the Library of Congress approval to develop the program in December 2000. In January 2004, Congress approved the Library's plan for NDIIPP, which will enable the Library to launch the first phase of building a national infrastructure for the collection and long-term preservation of digital content. The funds released will allow various technical models for capturing and preserving content to be tested.


Digital Preservation: the global context

Report on the DPC Forum held at the British Library Conference Centre, Wednesday 23 June 2004.

The 8th DPC Forum attracted the largest audience of any DPC Forum to date. Around 100 delegates were kept interested and informed by a very rich programme, with presentations from several experts from the US and Europe. One key theme running throughout the day was the need for active collaboration at every level and across sectoral and geographic boundaries; speaker after speaker illustrated how essential this collaboration was. Other consistent messages were the importance of trust (between partners and stakeholders in the emerging technical infrastructure), the need to find effective mechanisms for distributing responsibilities, the development of standards and tools and, above all, the need to develop and share practical experience.


Speakers from the 8th DPC Forum, Digital Preservation: the global context
L to R: Taylor Surface, OCLC; Robin Dale, RLG; Nancy McGovern, Cornell; Peter Burnhill, DCC; Seamus Ross, HATII;
Eileen Fenton, Ithaka; David Seaman, DLF; Vicky Reich, LOCKSS; Tony Hey, eSCP; Laura Campbell, Library of Congress

Delegates gained a sense of the broad range of activities under way, the progress that has been made, and the increasingly compelling need to accelerate progress. Feedback from the Forum, both formal and informal, has been overwhelmingly positive, reflecting the consistently high quality of the presentations and a stimulating and thought-provoking programme.

On the evening before the Forum, the annual Conservation Awards were presented, including the inaugural DPC Award for Digital Preservation. This was won by the National Archives for its Digital Archive, and the CAMiLEON project received a special commendation. The event was regarded as another stepping-stone towards raising the profile of digital preservation. The Forum was equally important, bringing together people from all over the world in recognition of the need for international collaboration, since no one can do this on their own.

Lynne Brindley, Director of the British Library and Chair of the DPC Board, chaired the day and noted in her welcome the importance the DPC placed on international links and the need to ensure that digital preservation issues are increasingly on the political and policy agendas. The DPC is committed to making practical progress and to sharing best practice through its membership.

Ms Brindley introduced the first speaker, Laura Campbell, Associate Director for Strategic Initiatives at the Library of Congress, who provided an up-to-date picture of the Library's work through its National Digital Information Infrastructure and Preservation Program (NDIIPP) (PDF 772KB). Ms Campbell explained that NDIIPP developed as a result of a report commissioned by the Library of Congress to assess whether it was prepared for the 21st century. Much of the Library's experience in digital technology had come from building its Digital Library, through which it had learned both the power of digital surrogates and their vulnerability to loss.

The $US175m plan consisted of $5m approved by Congress to produce a plan, $20m upon approval of the plan, and a further $75m contingent on obtaining an equal amount in matching funds. Scenario planning helped show how a distributed effort might operate.

Key lessons and messages from NDIIPP to date include: the belief that there will never be a single right way of doing things, so the architecture must be sufficiently modular and flexible to accommodate this; the need for a distributed and decentralised approach; and the need for new tools and technologies. NDIIPP needs to build partnerships and networks and then create a technical infrastructure to support the partners. Partnerships already forged included an alliance with the DPC; helping to establish the International Internet Preservation Consortium (IIPC); business model partnerships, such as subscription and archiving services for e-journals; and technical partnerships, taking full advantage of the skilled technical talent that already exists.

The next stage of NDIIPP would include testing architectures to support archive ingest and handling. In summing up, Ms Campbell indicated that during the next five years, the intention was to form a range of formal partnerships, encourage standards for digital preservation, establish a governing body, and make recommendations to Congress for funding.

Seamus Ross, Director of HATII (Humanities Advanced Technology and Information Institute), provided an introduction to ERPANET (Electronic Resource Preservation and Access Network) (PDF 1087KB), the European Commission-funded project that has brought together partners from Italy, the Netherlands, the UK and Switzerland. ERPANET has created a number of resources, organised seminars on several key topics, carried out an analysis of relevant literature and developed other tools, such as business cases for digital preservation and off-the-shelf policy statements. Dr Ross stressed that a lot of expertise already exists, but there is a pressing need to bring it together and to work together.

Lessons learned were that the digital preservation community needs practical case studies and reports of "real world" experience. Simple tools for costing digital preservation exist, but much more work needs to be done in this area. Guidance on digital repository design is also needed. ERPA E-prints (a repository of digital preservation papers and reports) was growing very slowly and needed to be marketed better. ERPANET has negotiated with the Swiss National Archives to preserve material held in this repository in perpetuity. In summing up, Dr Ross emphasised the great need for knowledge sharing, which made ERPANET events and DPC Forums extremely important in helping to raise the level of awareness and understanding.

Presentations from David Seaman (Director of the Digital Library Federation), Robin Dale (Program Officer for the Research Libraries Group) and Taylor Surface (OCLC) described the work their organisations are doing in developing practical, collaborative tools, all of which will play a role in increasing trust in the developing infrastructure for digital preservation.

David Seaman's presentation, 'Towards a Global File Format Registry' (PDF 67KB) described the developing global file format registry, which is responding to an immediate need. The importance and value of linking to other relevant work, such as the National Archives' PRONOM system and the DCC (Digital Curation Centre) in the UK, was also stressed.

The title of Robin Dale's presentation, 'The Devil's in the Details - working towards global consensus for digital repository certification' (PDF 77KB), aptly summarised the challenge of articulating, and reaching broad consensus on, the elements and process that can be put in place to certify digital repositories against a commonly understood standard.

Taylor Surface described the work of OCLC's Digital Collections and Preservation Services in 'the OCLC Registry of Digital Masters' (PDF 464KB). The registry arose from a DLF Steering Committee recommendation, and Taylor described how it links to OCLC's WorldCat service to enhance discovery, encourage the use of standards and limit duplication of effort across digitisation initiatives.

During the lunch break, Lynne Brindley and Laura Campbell signed an agreement between the DPC and the Library of Congress. A poster session on the Digital Curation Centre gave delegates the opportunity to ask specific questions before the afternoon presentations.

The afternoon session began with Tony Hey, Director of the e-Science Core programme, whose presentation was 'e-Science - preserving the data deluge' (PDF 543KB). The e-Science grid (known in the US as cyberinfrastructure) embodies the vision articulated by Licklider: bringing together material from throughout the world to build a truly global, collaborative environment that enables researchers to work together regardless of geographic location. Describing the impetus for the development of the DCC, Dr Hey said that over the next five years, e-Science will produce more scientific data than has been collected in the whole of human history. The goal is to bring the digital library community and the scientific community together so that each can learn from the other.

Peter Burnhill, Interim Director of the DCC, described 'The Digital Curation Centre' (PDF 146KB), which has received funding of £1.3m p.a. from JISC and the e-Science Core programme. The DCC was not a digital repository, he said, but would provide services and research for the community involved in digital preservation. It was still very early days, as the DCC had been operational for only a few months, but progress had been made. A website had been launched, an e-journal was planned, and focus groups would help to articulate who the DCC's user community is and what its needs are. It was anticipated that a permanent Director would be in post by the official launch, scheduled for early November.

Nancy McGovern spoke on 'The Cornell Digital Preservation Online Tutorial and Workshop' (PDF 419KB), yet another illustration of the pressing need to develop practical support for those already involved in, or about to embark on, digital preservation programmes. It was also another example of the strength of collaboration: the curriculum had been developed collaboratively, and Cornell looked forward to working closely with the DPC, which has been inspired by Cornell's work to develop a similar programme geared towards the UK. Nancy described the five organisational stages of digital preservation: Acknowledge, Act, Consolidate, Institutionalise and Externalise. She noted that none of these stages can be skipped and that it is essential to realise that there is no on/off switch for digital preservation; it is something that needs to build over time. Cornell has now run four workshops, which have received very positive feedback from participants. All have been oversubscribed, which illustrates the need for intensive training that gives participants a toolkit for adopting practical short-term strategies appropriate to their own institutional settings. A fifth workshop is planned for November 2004.

The final session of the day provided an opportunity to hear two very different approaches to preserving e-journals. Vicky Reich described the 'LOCKSS Program approach' (PDF 804KB), which is applicable to any content available over HTTP and enables libraries to collect and preserve content in the same way as they do for print. Vicky stressed that LOCKSS preserves the content, not the services publishers provide (e.g. search buttons). LOCKSS has established contact with several publishers; their cooperation is essential, since publishers must allow LOCKSS crawlers to gather their content. Trust was also an issue here: publishers needed to trust that libraries would gather only content they had purchased under licence. Key advantages of LOCKSS are its built-in redundancy and the fact that it is easy and cheap to install. Vicky stressed that some institutions also needed large, central repositories, but this need not preclude the use of LOCKSS.

The final speaker of the day was Eileen Fenton, on 'Preserving e-journals, the JSTOR model' (PDF 58KB). The Electronic Archiving Initiative has involved working with publishers and focuses on preserving the source files. Archiving e-journals requires significant investment in developing both organisational and technological infrastructure; it is not a case of either/or. Eileen also described Ithaka, a not-for-profit company supported by funding from the Mellon, Hewlett and Niarchos foundations, whose goal is to fill gaps not being met by the free market. Both Eileen and Vicky agreed that at this nascent stage of development, the community needs multiple approaches.

In closing the Forum, Lynne Brindley thanked all of the speakers for the significant contribution they had made to the success of the Forum. The next DPC Forum will be a joint DPC/CURL event and will be held on Tuesday 19 October 2004. Further details will be available in the coming months.


UK Needs Assessment

A UK Needs Assessment was identified as a key priority in the DPC Business Plan for 2003-2006, in order to identify the volume and level of risk and to assign priorities for action. The first stage of this exercise was a DPC Members survey, carried out between August 2003 and March 2004, with a Workshop in November 2003 to discuss preliminary results and determine what further action was required. The survey form is available below, as are the final report of the DPC Members survey and its annexes, by Duncan Simpson, who was commissioned to undertake the survey on behalf of the DPC. The report of the Workshop is also available below, together with a PowerPoint slide presented by Duncan Simpson at the Workshop, which indicates the proposed timeframe for the UK Needs Assessment, assuming funding for key initiatives.

Other deliverables from the survey include the map of DPC members, which gives details of each DPC member, its interest in digital preservation and (where appropriate) the material for which it has undertaken responsibility. The table of DPC Member projects was also derived from the survey and will be updated periodically. A related follow-up task was Scenarios of Risks of Data Loss, a collection of real-life examples in which data was either lost or at risk, provided by some DPC members and collated by Duncan Simpson. This is available below.


Digital Preservation in Institutional Repositories

The 9th DPC Forum was a collaboration between CURL and the British Library. CURL proposed the theme of institutional repositories as very timely, since the move from theory to practice is likely to accelerate, requiring more emphasis on sustainability and on lessons learned from the practical experience of early adopters. A quote from Clifford Lynch in a recent issue of RLG DigiNews, 'An institutional repository needs to be a service with continuity behind it ... Institutions need to recognize that they are making commitments for the long term.' (Clifford Lynch, 2004, http://www.rlg.org/en/page.php?Page_ID=19481#article0), was used in promoting the Forum, and several presenters used other pertinent Lynch quotes.

Themes emerging from the day were that, although there are many challenges, it is important to continue gaining practical experience and to build on existing experience and expertise. Some speakers also noted that depositors of content currently need mediation, but that this is not scalable. Ways of improving efficiency included shared tools and services, such as the PRONOM file format registry, and automating parts of the ingest process.

In opening the Forum, Richard Ovenden, Keeper of Special Collections at the Bodleian Library, set the institutional repository scene as one of gradual progression from theory to practice, but with slow uptake (Introduction PDF 108KB). The purpose of this Forum was to hear from the early adopters, and to listen to and learn from them. CURL's role in, and commitment to, institutional repositories and digital preservation could be seen at task force level, in individual CURL institutions and through consortial activity. The role of the DPC in setting the digital preservation agenda was now well known, and its work in training, information exchange, and advice and guidance was a valuable asset.

Delegates were referred to the JISC press releases in their packs, which gave details of the successful proposals from the recent 4/04 Call on Digital Preservation and Institutional Asset Management and of the forthcoming Repositories programme call. The Repositories programme will be the subject of two further calls in 2005, a major step forward and a major investment by JISC.

Session 1 was chaired by Paul Ayris, Director of Library Services, University College London, who introduced the first presentation by William Nixon, Deputy Head of IT Services at the University of Glasgow, on 'From ePrints to eSPIDA: Digital Preservation at the University of Glasgow' (PDF 822KB). The Glasgow experience, which had started as a pilot service in 2001, had raised a number of questions. Digital preservation had not been the primary focus at first, as there had been no content to preserve, but it was becoming more of an issue and now presented the greatest challenge. We need rigorous, robust preservation options if we are to move to the non-print world. William also suggested that this may well prove to be a selling point in encouraging academics to deposit their papers with the repository. In reviewing progress to date, Nixon said that repositories needed to move from project funding to being embedded in the bottom line of institutions, so that institutions can make a stewardship commitment without depending on project funding and move towards becoming trusted digital repositories.

John MacColl, Sub-Librarian, Digital Library, University of Edinburgh, and Jim Downing, Preservation Development Manager, DSpace@Cambridge, provided two perspectives on DSpace: as a manager of a repository service, and as a developer of the preservation aspects of DSpace. John MacColl drew attention to services that arise from project funding but could fall into disrepute unless they are properly managed over time (DSpace MacColl Presentation PDF 655KB). Digital preservation could be regarded as a high cost for individual institutions to undertake, and it might be necessary to make use of other facilities. The library community needed advice and guidance, and Edinburgh would be looking to the DCC as a source of that technical and practical guidance.

Jim Downing described the DSpace at Cambridge repository, which places no mandates on type of material or file format but actively provides advice on good practice (DSpace Downing Presentation PDF 166KB). Better preservation metadata was needed to support preservation planning. Existing tools such as PRONOM are proving valuable in helping to monitor technological obsolescence. Cambridge have been advised to retain human-readable action plans and to add automation wherever feasible and appropriate, while retaining human validation of automated steps. DSpace at Cambridge currently records all item and metadata changes, but this would not be scaleable, and it would be necessary to refine both policy and implementation.

The final presentation of the morning was a joint presentation on Storage Resource Broker (SRB) at the AHDS (SRB Presentation PDF 1.2MB). Hamish James provided an overview of what SRB is and its role at AHDS. The SRB software helps to manage digital objects scattered across multiple locations. This is a clear benefit for a distributed service such as AHDS, which was moving from a loose federation of repositories to a much more centralised preservation service while still maintaining its distributed nature. The collection was expected to grow to 10TB within the next two years, so any service must be scaleable. Andrew Speakman then outlined some of the practical issues involved in installing SRB. He drew attention to a frequently recurring theme in any discussion of digital preservation: collaboration, and the need to take advantage of related work that has already been done. He went on to outline the pros and cons of SRB. The pros included the ability to handle large volumes of networked data and high user acceptance. On the negative side, technical support is not well developed. SRB is also quite complex to install, so significant in-house expertise is required. In conclusion, Andrew said that SRB has the potential to simplify both day-to-day operations and the distributed management of data. He indicated that the AHDS was looking for partners using SRB.

The afternoon session was chaired by Richard Boulderstone, Director eStrategy, the British Library. It began with a presentation on the SHERPA project, 'Preserving EPrints: Scaling the Preservation Mountain' (PDF 144KB), given by Sheila Anderson and co-authored with Stephen Pinfield. Sheila outlined the objectives of SHERPA and its partners: Nottingham (lead), Edinburgh, Glasgow, Leeds, Oxford, Sheffield, York, the British Library, and AHDS. SHERPA is primarily concerned with e-prints, i.e. digital duplicates of academic research papers made available online to improve access to them.

Differing views have been expressed on whether it is necessary to preserve these documents. There is, however, an opportunity here to move beyond saving and rescuing digital objects to building the infrastructure required to manage them from the start. A good start has been made in identifying the properties of e-prints and in looking at selection and retention criteria, preferred formats, rights issues, etc. None of these, though, amounts to 'doing' preservation. Using the OAIS model as a guide, a preservation storage layer and preservation planning (e.g. policies and procedures, risk assessment) need to be added. Preservation and administrative metadata, together with preservation protocols and processes, must also be in place.

A new two-year project, SHERPA DP, has recently been announced. It is led by AHDS in partnership with Nottingham and three or four other SHERPA partners, and is funded under the recent JISC 4/04 Call. Its aim will be to develop a persistent preservation environment for SHERPA partners based on the OAIS model, and to explore the use of METS for packaging and transferring metadata and content. A Digital Preservation User Guide would be another practical deliverable. The preservation community would be looking to the DCC for support, particularly in functions that are most appropriately centralised, such as technology watch.

The final presentation was from David Ryan, Head of Archives Services and Digital Preservation at the National Archives, 'Delivering digital records: towards a seamless flow'. David described the development of the Digital Archive (TNA Presentation Part 1 PDF 96KB) and the key points needed for its success: a strong business case linked to core organisational aims, a good team, and the need to sell the message that this is not an insuperable problem. It has taken three years for the Digital Archive to deliver a comprehensive service, and all business targets have been met. It is critical, however, to recognise that stewardship is a long-term, evolving business. In recruiting staff it was essential to find the right technical skills combined with the ability to sell the work to others within the organisation (TNA Presentation Part 2 PDF 90KB). The reality is that we must collect e-records. The Digital Archive should be scaleable to 100TB, which is well beyond current storage requirements, though these are growing rapidly (TNA Presentation Part 3 PDF 1MB). TNA works with government departments, but the current procedures, which tend to be case-by-case and handcrafted, are not scaleable. (Editor's note: a similar point was made by William Nixon in describing Glasgow's experience of building their repository.) Preservation planning is a key feature of the Digital Archive, which must be able to accommodate changes in preservation management over time. The main thing is to ensure that the bitstream remains unharmed in case a different preservation strategy is adopted; the current strategy is migration. Other TNA digital preservation work includes the PRONOM service (TNA Presentation Part 4 PDF 613KB), which is now on Version 4 and is designed to be the primary file format registry. PRONOM can help inform migration planning because it can indicate when a file format is likely to become unsupported. The UK Central Government Web Archive has captured c. 60 web sites to date. It is currently held separately from the Digital Archive, but the intention is to bring the two together. One issue is the size of the government web domain. Finally, David described the work of NDAD and its role as contractor to TNA in preserving data sets. Next steps would include a comparison of the NDAD data model with the Digital Archive data model. In closing, David said that trusted digital repository certification was a key issue. A process was also needed to allow a federated system of preservation and access.

A final panel session allowed delegates to put questions to all the speakers.


Report on the DPC Meeting on the large-scale archival storage of digital objects

The DPC Meeting on Mass Storage Systems was held in York on 22nd April. The meeting was open to DPC members only and was intended as an informal discussion of mass storage systems. It was structured around the latest DPC Technology Watch report, Large Scale Archival Storage, authored by four members of the DOM team at the British Library. Richard Masters, Sean Martin, Jim Linden, and Roderic Parker led discussion of the decision-making and planning behind the development of their storage system. The PowerPoint slides for the meeting are available (PDF 433KB).

The presentation on the storage system covered the importance of having a clear mission statement for the DOM Programme, and the pragmatic decision to adopt a generic, cost-effective, and incremental approach. Major drivers for the programme were discussed, including legal and voluntary deposit. Richard Masters referred to the e-journal pilot being undertaken with volunteer publishers to test how legally deposited e-journals will be delivered to the BL. Other categories of material include the BL's digitised collections, the sound archive, web archiving, and Ordnance Survey material. Together these comprise both a large volume of digital material and a wide variety of formats.

The decision was to purchase off-the-shelf products wherever possible, but no available storage system met all of the BL's requirements. Principles to be considered included:
- the need for material to remain invariant over time, which proved to be a fundamental difference from many commercial approaches;
- the need to assign an internal unique identifier;
- the need to ensure that there would be no extended loss of service;
- the need to ensure both integrity and authenticity.

Meeting this last requirement involves more than simply checking that a file hasn't changed, and the team had conducted a key generation ceremony to ensure it was met. This provides a trust model that allows it to be verified that a bit-stream remains unchanged after decades, despite changes of hardware during that timeframe.

Resilience of the system will be provided by having multiple sites (initially there will be one at Boston Spa, one at St Pancras), which can currently hold 12TB of storage, and a third "dark archive" to be held in another location. The multiple site design provides disaster tolerance by enabling the service to continue despite the loss of a storage site. The role of the dark archive is to provide the ability to recreate the DOM store in the extreme case that all sites are destroyed - this would be done by re-ingesting all objects from the dark archive into a new site.

Jim Linden outlined the concept of total cost of ownership and led the meeting through its elements. These included initial purchase, the cost of operations (where staff costs are significant), data centre costs, and application support and enhancement. It was decided that the performance of commodity storage was adequate for preservation storage. It had been necessary to identify features that did not add value for the BL's needs. Several commercial vendors felt that such features would provide benefits, but the BL had to articulate its specific requirements, and many of these added extras were not required. Issues still to be considered included emerging technologies, such as the power-saving MAID (massive array of idle disks) concept. There are also a number of placeholders for future work. For example, the 80/20 rule for access to material holds true in the print world, and the assumption that it also applies in the digital world needs to be tested.

It was a very informative and stimulating session, and I'm grateful to the authors for taking the time to talk through their approach. One theme suggested on the feedback forms for similar meetings was preservation metadata. It may therefore be of interest that the next Technology Watch report, on Preservation Metadata, has recently been commissioned from Brian Lavoie of OCLC and Richard Gartner and Michael Popham of the University of Oxford. This report should be ready for peer review in July 2005.


Report on IS & T Archiving 2005 Conference, Washington, 26 - 29 April 2005

Sarah Middleton


Last updated on 30 September 2016

By Hugh Campbell, PRONI

1. I attended the Imaging Science & Technology (IS&T) Archiving 2005 conference at the Washington Hilton. This is my report on the conference.

2. Washington is quite a long way away: the journey from home to hotel took about 20 hours, and from hotel to home about 18 hours. This needs to be borne in mind when planning travel to such a conference and the return to work, as the body needs time to recover.

3. The conference itself started on Tuesday 26 April with a number of tutorials. I attended the Long-Term Archiving of Digital Images tutorial (see attached summary). The conference proper ran from Wednesday 27 April to Friday 29 April. Each day started at 0830 and finished at 1700 on Wednesday and Thursday and at 1500 on Friday.
- Wednesday featured a 40-minute keynote address and fifteen 20-minute sessions.
- Thursday featured a 40-minute keynote address, ten 20-minute sessions, and approximately twenty 90-second poster previews, followed by the opportunity to visit the poster presentations.
- Friday featured a 40-minute keynote address and ten 20-minute sessions.

I felt that there were too many sessions, cramming too much into a short space of time.



Unless otherwise stated, content is shared under CC-BY-NC Licence

