DPC Featured Project

The aim of this series is to highlight some of the DPC member projects listed in the DPC Members' Projects Table. The fourth interview, carried out in June 2006, sees Steve Hitchcock, Preserv Project Manager at the University of Southampton, talk to Kieron Niven about the Preserv project.

Published 25 July 2006

PRESERV Project

What were the motivating factors for starting the PRESERV project?

One factor motivating Preserv was the increasing number of repositories being set up, partly through programmes such as JISC's FAIR programme. Recognising the growth of content in repositories, there is an opportunity to support long term access without detracting from that growth.

In addition, from the perspective of the team developing EPrints software at Southampton University we wanted to address some perceptions that EPrints did not do preservation or that we did not care about preservation. The existence of the Preserv project makes a strong statement about the latter view, but more importantly we wanted to demonstrate a different approach to repository preservation - a distributed preservation service - that was consistent with the origins of EPrints.

EPrints emerged as the first software to support the Open Archives Initiative (OAI). Note the term 'archives'. In fact what we now commonly refer to as institutional repositories (IRs) were originally called institutional archives. The term has evolved with good reason, and suits Preserv well, as we shall see. The critical feature of OAI is its distinction between data providers and service providers, and the use of the resulting protocol (OAI-PMH) to enable the two sides to interact and share data. This seems like a good basis for repository preservation, where preservation is viewed as a separate service. That is the essential, underlying model that informs Preserv.

Preservation is not easy, and requires specialist skills for the most critical data. This is not exclusive to digital data.
There has long been a relation between content providers, publishers and preservation, which was provided by, well, archives. It is odd, therefore, that in the case of the IR as digital content provider, it has been argued that the IRs, supported by IR software, should be responsible for implementing preservation. This is fine as a starting point, provided it is recognised that preservation skills are not as widely available to IRs as content management skills, and therefore the role of software is to support an interface to a range of preservation services that the repository managers can choose based on their own preservation policies. That is where the responsibilities lie.

We also have to recognise that the preservation services we envisage don't actually exist yet, other than as tests in this and other projects. From being a technical investigation of how we can support preservation interfaces in IR software, particularly in EPrints, Preserv has widened to explore IR policies in order that we can help shape and inform the emergent services of the needs of repositories.

The project obviously follows on from the University of Southampton's development of the EPrints software. How did the other project partners get involved and what do you think they bring to the project?

The partners bring preservation expertise (The British Library and The National Archives), repository content and content management expertise (the IR teams at Oxford University and Southampton University).

From the outset there was great anticipation about the use in this project of TNA's PRONOM file format identification tool, as we were extending the use of this tool to new areas, and it was seen as a very practical development for IRs. In preservation terms, this is perhaps a front-end process in that it helps us understand what type of data we have to preserve.
The harder, and bigger, process at the back-end is the preservation service, and this is where the BL - through Adam Farquhar, Richard Masters and Paul Wheatley - has been a tremendous influence on the models we have identified for investigation. Our project poster (http://preserv.eprints.org/talks/preserv-jiscndiipp-v1-2.ppt) at the recent NDIIPP-JISC meeting in Washington outlined a number of preservation service provider models, but even within these models we now understand how a range of services - from byte preservation through transformation and other services - can be provided on a more interactive basis between content provider and preservation service provider. This contrasts with, and is more flexible than, the simple idea of an archive in which one provides content and the other stores it.

For a project none of this is any use without real content, and users, to work with, and this is what the content partners bring to the project. Ironically, in terms of file format profiling using PRONOM we have broadened this approach to hundreds of repositories and to over 20 repositories in terms of a survey of preservation policies, but these activities were formed and tested initially by working directly with our content partners. The results of the policy survey are currently being collected and analysed for publication. Some initial results were presented at the recent DPC briefing on Policies for Digital Repositories.

A significant element of the PRESERV project focusses on the collection of preservation metadata. Have you created, or do you aim to create, your own metadata specifications?

We all have a remarkable resource in the form of the PREMIS metadata set, and we have based a Preserv metadata subset on PREMIS. Essentially we have mapped the metadata set to our IR preservation service models, to identify what metadata we could generate and where in a typical process for a data object deposited in an IR and harvested by a preservation service provider.
As a result we haven't discarded a lot of the original PREMIS set, and we understand that the resulting metadata set may not be as ruthlessly pared down as other examples, but that might happen with more implementation experience. We will publish the initial Preserv preservation metadata set soon.

The PRESERV project makes many references to the OAIS model. How important has the model been in the development of the project?

At a broad level, conceptually we have tried to relate each of our preservation service models to the OAIS model. This is important in any model that involves more than one party, as in the Preserv case, as it gives us a common basis for understanding and allocating appropriate responsibility between partners, according to the model and service adopted. It is important to recognise too that this extends beyond the technical elements of OAIS to the administrative and management roles.

How easy has it been to incorporate the existing PRONOM registry into your work? Is full automation of the file identification process possible?

PRONOM has worked very well in our application, where it has been used to produce format profiles for over 200, and growing, OAI-compliant repositories. These profiles are all public through Tim Brody's Registry of Open Access Repositories (ROAR), where they are linked as 'Preserv profiles' (http://archives.eprints.org/). ROAR is widely known and used in the IR community and gives the project and PRONOM a high degree of visibility.

It has also been a constructive process as far as PRONOM development is concerned. From an early test with the project partner repositories, there has been ongoing dialogue with Adrian Brown at TNA about the refinement of file formats and versions recognised by PRONOM, to the extent that it can now effectively and automatically handle the wide array of file formats and sources presented by IRs around the world.
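
As a rough illustration of what a repository format profile involves, the Python sketch below identifies each object by its leading bytes and counts the results per format. It rests on several assumptions. The signature table, the names `identify_format` and `format_profile`, and the sample objects are all hypothetical. The signatures are widely published magic numbers, not entries taken from the PRONOM registry, and registry-based identification is likely to be considerably more detailed than a prefix match.

```python
from collections import Counter

# Illustrative leading-byte signatures only (assumption: NOT PRONOM entries).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%!PS", "PostScript"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(data: bytes) -> str:
    """Return the first format whose signature matches the start of data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unidentified"


def format_profile(objects) -> Counter:
    """Count how many objects fall under each identified format."""
    return Counter(identify_format(data) for data in objects)


# Hypothetical stand-ins for the first bytes of deposited files.
sample_objects = [
    b"%PDF-1.4\n...",
    b"%PDF-1.3\n...",
    b"GIF89a\x01\x00",
    b"\xff\xd8\xff\xe0\x00\x10JFIF",
    b"plain text with no signature",
]

profile = format_profile(sample_objects)
for name, count in profile.most_common():
    print(name, count)
```

The "unidentified" bucket is the interesting part of such a profile: objects that match no known signature are the ones that would need follow-up, which is the kind of gap the signature refinement work is meant to close.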
From TNA's perspective, work with DROID, the tool that performs format identification with reference to the PRONOM registry, has demonstrated that effective automation of format identification within the ingest and profiling processes is possible. Refinement of individual format signatures is an ongoing process and, according to Adrian Brown, this work has benefitted greatly from the availability of a large volume of repository content, and feedback from the ROAR service.

As well as being important in designing preservation services, repository format profiles have assisted us in working directly with repository managers. The profiles provide an objective basis for our survey of repository preservation policies, helping us both to frame the questions and to interpret the replies.

Part of the project is also aimed at developing a technology watch service for PRONOM. How will this work?

This will work in several ways. First, ROAR enables us to identify gaps in the current content of PRONOM, and to prioritise research accordingly. Second, TNA is developing a number of preservation planning services which will be made available externally. These will include a risk assessment service for formats, enabling high-risk formats to be identified by repositories, and a preservation plan generation service, which will enable preservation service providers to identify target formats for migrating at-risk content, and to identify and evaluate potential migration pathways. TNA is also developing a range of external Web service interfaces for PRONOM, to enable machine access to these services, and integration with repository systems.

Aside from EPrints, are there any plans to apply your work to other IR software such as DSpace or Fedora?

We can't apply our work directly to IR software other than EPrints as we are not the developers of those.
It is fair to say, however, that two of our models - the institutional model and the repository model in the Washington poster - have been strongly influenced in different ways by Fedora and DSpace, respectively. This is a good way forward, as it recognises that these software packages are different and yet we can learn and adapt from the experiences of others.

Another way forward is for the different IR software packages to support an OAIS DIP (dissemination) format that can then be harvested for profiling/preservation/conversion, etc. We are not aware of repositories that provide OAI access to what we might call a 'DIP'.

How do you see PRESERV fitting in with other projects such as SHERPA DP?

We have regular contact with SHERPA DP and have a couple of joint works planned, including contributions to SHERPA DP's Digital Repository Handbook, and perhaps an end-of-projects event early next year. Our basic models are quite similar: distributed preservation services for IRs. The respective lead partners - Southampton with Preserv, and AHDS with SHERPA DP - see the model from different perspectives, as IR software developer and service provider, and that could lead to some interesting findings. It is clear that informed collaboration is growing as more information emerges from the projects.

More information about the Preserv project can be found at the project website http://preserv.eprints.org/
© Digital Preservation Coalition 2002