DPC Featured Project

The aim of this series is to highlight some of the DPC member projects listed in the DPC Members' Projects Table. A project will be highlighted on the Home Page of the DPC website every 6-8 weeks, and older versions will be retained in the table of DPC Member projects. In the second of the series, Philip Beresford, Web archiving project manager, was interviewed by Maggie Jones about the UKWAC project.

Published 08 March 2006

UK WEB ARCHIVING CONSORTIUM

How did the project come about?

In March 2002, the Wellcome Trust and the Joint Information Systems Committee (JISC) awarded a contract to UKOLN to undertake a feasibility study into web archiving. This study recommended a pilot project in collaboration with the British Library. The Consortium was formed in October 2003, and Magus Research was contracted to provide infrastructure and operational support in May 2004. Live public access to the archive was launched in May 2005. The archive is now over 1000 sites strong (Jan 2006). Most archived sites are revisited to gather an updated snapshot at least 6-monthly, and some more frequently whilst they are in an active phase.

Who are the partners and what responsibilities does each partner have?

The British Library archives sites of research value reflecting national culture, events of historical importance, and a selection demonstrating innovative website design., funded by JISC.

The National Archives will focus on archiving selected materials from six main clusters of government departments:
JISC will preserve websites from leading-edge, innovative ICT projects in UK Higher and Further Education.

The National Library of Scotland will collect material reflecting the knowledge, culture and history of Scotland and the Scots.

The National Library of Wales will collect material of current and future research value reflecting life in contemporary Wales.

The Wellcome Trust will collect health and medical-related UK websites.

How will others benefit from this project?

Researchers already benefit, as access is free. Some of the sites archived by UKWAC have already disappeared from the ‘live’ web. UKWAC will publish a regular bulletin listing these to draw attention to this unique strength. Many other sites have changed significantly several times since UKWAC started taking periodic snapshots of them. All these instances will be preserved in the UKWAC archive. With future retrieval software it should be possible to step back and observe how sites have changed over time. Content providers have also benefited from being able to revisit previous instances of their own websites.

What do you see as the major advantages of a collaborative model?

It is unlikely that any one of the partners could have resourced this project independently. Collaborating on collection policies has made it possible to offer researchers a combined resource of useful breadth, predictable depth, and reliable quality.

What would you see as the major challenges associated with a distributed model such as this?

Longer-term issues, such as how and when to bring in new partners and investment in infrastructure renewal, are difficult within a limited-term agreement.

How is the project funded?

By equal contributions from all partners, aimed at covering costs.

Is it expected to continue indefinitely or is there a timeframe you’re working toward?
The aim of the pilot is to anticipate extension of Legal Deposit to digital materials, to gain practical experience of new techniques, and to build the core of an archive as a basis for a national repository of research-level web material.

Would you want to involve other partners at some stage?

There have been expressions of interest from other potential partners, which could result in expansion at some stage.

What is the link between this project and the International Internet Preservation Consortium (IIPC, http://netpreserve.org/about/index.php)?

The Consortium would wish to base its next-generation infrastructure around the standards and software being developed through IIPC projects, as soon as new tools are available and proven in practical use.

How does UKWAC complement IIPC?

UKWAC is a pilot project building an archive of selected UK websites with first-generation tools, in anticipation of much wider coverage of the UK webspace under legal deposit. IIPC is a forum for research and exchange of ideas on standards, technologies and future tools for web archiving. Consortium partners are mostly national libraries.

How do you define what you will capture in terms of the physical limits of the website and is there any noticeable difference between various subject areas?

UKWAC is a selective pilot and does not attempt a full-scale, domain-level representative capture. It is also constrained by the current PANDAS 2 technology, so a) as a pilot it is growing rather slowly, and b) there are problems with some types of content, notably Flash-enriched presentation. The UKWAC pilot needed significant curatorial effort in QA: fixing broken links or investigating content that didn’t capture correctly first time. A new generation of tools is needed to reduce the amount of QA to be done and to facilitate the work involved.

How are you intending to preserve the websites you have selected?
It is likely that UKWAC will be migrated into the long-term, large-scale national web archive as Legal Deposit comes on-stream. This will be expected to provide for the active preservation of all content types.

Some people say that capturing web content isn’t preserving it. What would be your response to that?

UKWAC is primarily concerned with capturing material that is already starting to disappear from the live web, adding updates as often as necessary, storing website instances securely, and providing a simple means of access. Currently, for the pilot project, no specific pro-active preservation processes are being applied. In future, under Legal Deposit, there will be an obligation on collecting agencies to ensure that obsolescent file formats remain accessible in the long term.

The UKWAC pilot has helped in raising awareness in both information and publishing communities that websites are ephemeral in nature, and that there is much original material here that should be preserved, in its original presentation as far as possible, for future research purposes.

The current archive is available for free public access at:

It contains over 1000 fully-processed websites (and many of these have been updated with 6-monthly or more frequent revisits). Special intensive ‘collections’ are occasionally added, for instance websites relating to, and dating from the time of:
A new collection is being started to archive websites dealing with women’s issues. Further information about the UK Web Archiving Consortium can also be found at the link above.
DPC is a company limited by Guarantee. Our Company Number is 4492292.