Sarah Middleton

Last updated on 10 May 2017

July - December 2004

A joint service of the Digital Preservation Coalition and the PADI (Preserving Access to Digital Information) gateway

Compiled by Gerard Clifton (National Library of Australia) and Michael Day (UKOLN, University of Bath)

22 December 2004

This is an archived issue of What's New.

Also available as a print-friendly PDF (399KB).

Known problem links in online versions and PDFs are disabled (or updated when the issue was current) but it is not always possible to annotate the amendments in PDFs with a date or other information which may appear in the online version.


This is a summary of selected recent activity in the field of digital preservation compiled from the Preserving Access to Digital Information (PADI) Gateway and the digital-preservation and padiforum-l mailing lists. Additional or related items of interest may also be included.

Contents:

  1. News from organisations and initiatives

    1.1 US National Archives and Records Administration (NARA)

    1.2 US Library of Congress and the National Digital Information Infrastructure and Preservation Program (NDIIPP)

    1.3 UK Joint Information Systems Committee (JISC)

    1.4 UK Digital Curation Centre (DCC)

    1.5 UK Digital Preservation Coalition (DPC)

    1.6 ERPANET

    1.7 UK House of Commons Select Committee on Science and Technology

    1.8 Harvard-MIT Data Center
  2. Activities and outcomes

    2.1 Web Archiving

    2.2 Preservation Metadata

    2.3 File Formats and Tools

    2.4 Audiovisual Archiving
  3. Other publications

    3.1 Evolving Issues and Digital Preservation

    3.2 Other recent publications
  4. Events

    4.1 Recent events

    4.2 Forthcoming events

 

1. News from Organisations and Initiatives

 

1.1 US National Archives and Records Administration (NARA)

In August, 2004, NARA announced the award of US$20.1 M to two contractors who would compete in a one-year design competition to develop a technical solution for the preservation of electronic information. Lockheed Martin and Harris Corporation will receive US$9.5 M and US$10.6 M respectively to build the Electronic Records Archives (ERA), which will capture and provide a means of preserving virtually any kind of electronic information independent of specific hardware or software requirements.

The NARA press release of the announcement, Retrieved December 22, 2004, may be found at:
http://www.archives.gov/media_desk/press_releases/nr04-74.html


 

1.2 US Library of Congress and the National Digital Information Infrastructure and Preservation Program (NDIIPP)

In September, 2004, the Library of Congress announced the award of more than US$14.9 M to eight US institutions to "identify, collect and preserve digital materials within a nationwide digital preservation infrastructure." The awards result from a call for submissions issued by NDIIPP in August, 2003, which closed in November, 2003. Submissions were peer reviewed before final selection by the Librarian of Congress.

The institutions and their projects are detailed in the NDIIPP press release, Retrieved December 22, 2004, at:
http://www.digitalpreservation.gov/about/pr_093004.html


 

1.3 UK Joint Information Systems Committee (JISC)

In October, 2004, JISC announced the award of grants totalling £1 M to nine UK educational institutions and their partners to support digital preservation and asset management in UK Higher and Further Education institutions. Eleven projects will be funded under the JISC Circular 4/04 programme and will address the areas of institutional management support, institutional repository infrastructure development, and digital preservation assessment tools. Project funding will run from October 2004 until September 2006. Details of the projects, Retrieved December 22, 2004, are available from:
http://www.jisc.ac.uk/index.cfm?name=programme_404

The ongoing work of JISC and outcomes to date within the context of the Continuing Access and Digital Preservation Strategy for the Joint Information Systems Committee (JISC) 2002-05 and its associated implementation plan is described in a recent article by Beagrie (2004). Among the outcomes is a report by Barker et al. (2004) of a feasibility study into the preservation of e-learning materials. The report provides a list of recommendations, a discussion of long term risks to e-learning digital content and the infrastructure necessary for retention of these materials.

Bailey (2004) reported on JISC's launch of an online training package on the management of electronic records. This was primarily designed for the UK further and higher education sectors, and was created by the University of Northumbria as part of JISC's Supporting Institutional Records Management programme.

Bailey, S. (2004). The Electronic Records Management Training Package. D-Lib Magazine, 10(7/8), July/August 2004. Retrieved December 22, 2004, from
http://www.dlib.org/dlib/july04/07inbrief.html#BAILEY

Barker, E., James, H., Knight, G., Milligan, C., Polfreman, M. and Rist, R. (2004). Long Term Retention and Reuse of E-Learning Objects and Materials. Joint Information Systems Committee, November 2004. Retrieved December 22, 2004, from
http://www.jisc.ac.uk/index.cfm?name=project_elo

Beagrie, N. (2004). Continuing Access and Digital Preservation Strategy for the UK Joint Information Systems Committee (JISC). D-Lib Magazine, 10(7/8), July/August 2004. Retrieved December 22, 2004, from
http://www.dlib.org/dlib/july04/beagrie/07beagrie.html


 

1.4 UK Digital Curation Centre (DCC)

The UK Digital Curation Centre (DCC) was officially launched on 5 November 2004 in a ceremony held at the National e-Science Centre in Edinburgh. The launch marked the transition from the setup phase, which began in March 2004, to operational status. In December, the DCC announced that the director-designate of the centre was Chris Rusbridge (Director of Information Services at the University of Glasgow), who will take up his position in February 2005. The main aim of the DCC is to support UK institutions in actively managing, as well as storing and preserving, digital data for long-term access.

Presentations from the launch event, Retrieved December 22, 2004, are available at: http://www.dcc.ac.uk/launch/

Two papers regarding the need for digital curation, the issues involved and the work of the DCC were presented at the UK e-Science All Hands Meeting.

Lord, P., Macdonald, A., Lyon, L. and Giaretta, D. (2004). From Data Deluge to Data Curation. Proceedings of the UK e-Science All Hands Meeting 2004, 31st August - 3rd September, Nottingham UK. PDF retrieved December 22, 2004, from
http://www.allhands.org.uk/2004/proceedings/papers/150.pdf

Giaretta, D., Lyon, L. and Robinson, B. (2004). Curating for the Future - The work of the Digital Curation Centre. Proceedings of the UK e-Science All Hands Meeting 2004, 31st August - 3rd September, Nottingham UK. PDF Retrieved December 22, 2004, from
http://www.allhands.org.uk/2004/proceedings/papers/167.pdf


 

1.5 UK Digital Preservation Coalition (DPC)

The US Library of Congress and the UK Digital Preservation Coalition have signed an agreement to work together. The agreement was signed by Laura Campbell, Associate Librarian for Strategic Initiatives, Library of Congress, and Lynne Brindley, Director of the British Library and Chair of the DPC, at the DPC Forum Digital Preservation: The Global Context, held at the British Library on 23 June 2004.

Reports and presentations from the DPC Forum Digital Preservation in Institutional Repositories held at the British Library on 19 October 2004 are now available from the DPC Web site, Retrieved December 22, 2004, from:
http://www.dpconline.org/events/past-events/digital-preservation-in-institutional-repositories

The DPC have also recently released a directory of predominantly UK resources that provide services in digital storage and preservation (Simpson, 2004a). The types of resources listed include archives, data services, support services and consultancies. A complementary guidance leaflet is also available (Simpson, 2004b), designed for smaller organisations that lack the infrastructure to undertake digital preservation in-house. The leaflet is intended to provide practical guidance and a checklist of issues to consider before drawing up a service contract.

Simpson, D. (2004a). Directory of Digital Preservation Repositories and Services in the UK. Digital Preservation Coalition, 2004. PDF Retrieved December 22, 2004, from
http://www.dpconline.org/docs/guides/directory.pdf

Simpson, D. (2004b). Contracting Out for Digital Preservation Services : Information Leaflet and Checklist. Digital Preservation Coalition, 2004. PDF Retrieved December 22, 2004, from
http://www.dpconline.org/docs/guides/outsourcing.pdf


 

1.6 ERPANET

ERPANET continues to provide a range of resources, services and events to promote and facilitate preservation of digital materials, and has redesigned its Web site to make resources and information easier to locate and use. ERPANET's range of activities and the resources available on the Web site are described in a recent article by Ross (2004).

An analysis of ERPANET services and products is also part of a recent report which explores viable models for sustaining ERPANET's key functions (Maplehurst Consultants, 2004). The report examines strategic partnerships, funding options, merger models, new services and marketing plans, and concludes that a hybrid funding mix is likely to afford future sustainability.

A new ERPANET guidance tool for evaluating ingest technologies provides an introduction to ingest and its role in the development of a digital repository system. The appendix contains a companion guide and checklist of factors for consideration when defining or selecting an ingest strategy.

Selected papers from the ERPANET Training Seminar on metadata in preservation held in late 2003 have now been published in the series Veröffentlichungen der Archivschule Marburg (Bischoff, Hofman & Ross, 2004).

Bischoff, F. M., Hofman, H., & Ross, S., eds., (2004). Metadata in preservation: selected papers from an ERPANET Seminar at the Archives School Marburg, 3-5 September 2003. Veröffentlichungen der Archivschule Marburg, Institut für Archivwissenschaft, Nr. 40. ISBN 3-923833-77-6. Book details Retrieved December 22, 2004, from http://www.uni-marburg.de/archivschule/fv21.html

ERPANET (2004). ERPA Guidance: Ingest Strategies. ERPANET, September 2004. PDF Retrieved December 22, 2004, from
http://www.erpanet.org/guidance/docs/ERPANETIngestTool.pdf

Maplehurst Consultants SARL (2004). Sustainability and Exploitation Plan for ERPANET. ERPANET, 2004. PDF Retrieved December 22, 2004, from
http://www.erpanet.org/documents/ERPANETSustainabilityFinalDocument.pdf

Ross, S. (2004). The Role of ERPANET in Supporting Digital Curation and Preservation in Europe. D-Lib Magazine, 10(7/8), July/August 2004. Retrieved December 22, 2004, from
http://www.dlib.org/dlib/july04/ross/07ross.html

Recent ERPANET events are noted in section 4.1 Recent Events, below.


 

1.7 UK House of Commons Select Committee on Science and Technology

In July, the UK House of Commons Select Committee on Science and Technology (2004a) published an important report on scientific publishing. The committee had previously taken evidence from researchers, funding bodies, publishers and librarians, and the report strongly supported the concept of open access to the results of publicly-funded research. Amongst its conclusions, for example, was a recommendation that the research councils and other UK funding bodies should make depositing a copy of published output in institutional repositories a condition of being awarded a grant; a similar approach has recently been proposed by the US National Institutes of Health (Retrieved December 22, 2004, from http://www.nih.gov/about/publicaccess/). While much of the report dealt with institutional repositories and open access issues, it also contained a chapter outlining the challenges of preserving digital content. It recommended that the British Library should receive additional funding to deal with the technical challenges of digital preservation, and that the development of regulations for the legal deposit of non-printed publications should be speeded up. The UK Government's response to the report was published in November 2004 (House of Commons Select Committee on Science and Technology, 2004b).

UK, House of Commons, Select Committee on Science and Technology. (2004a). "Tenth Report: Scientific publications: free for all?" HC 399. London: The Stationery Office. Retrieved December 22, 2004, from:
http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/cmsctech.htm

UK, House of Commons, Select Committee on Science and Technology. (2004b). "Fourteenth Report: Responses to the Committee's tenth report, Session 2003-04, scientific publications, free for all?" HC 1200. London: The Stationery Office. Retrieved December 22, 2004, from:
http://www.parliament.the-stationery-office.co.uk/pa/cm200304/cmselect/cmsctech/cmsctech.htm


 

1.8 Harvard-MIT Data Center

The Virtual Data Center (VDC) development team announced the release of version 1.01 of VDC, an open-source system for the management, dissemination, exchange, and citation of quantitative data. VDC Web site Retrieved December 22, 2004, from http://thedata.org/index.php/Main/HomePage


 

2. Activities and Outcomes

 

2.1 Web Archiving

There have been a number of recent reports on the development of technologies and tools to support Web archiving.

Two reports from the Metrics and Testbed Working Group of the International Internet Preservation Consortium (IIPC) were released in July 2004. The Web Harvesting Survey (Marill, et al., 2004) is a survey of general conditions, such as the presence of forms, scripts or media, that influence the harvest process and the quality of archival crawls. A companion document, Test Bed Taxonomy for Crawler (Boyko, 2004), provides a more detailed listing of issues affecting crawlers, such as issues in parsing, link detection and presentation.

Clausen (2004) reports on the reliability of the ETag and Last-Modified fields of HTTP headers as indicators that Web content has changed. If reliable, these fields would let a harvester detect unchanged pages without downloading them, and so avoid archiving duplicate content over repeated harvests. An investigation at Netarkivet.dk examined the front pages of 361,408 websites in the Danish domain, which were harvested every other night over a period of one month. Checksums were used to determine whether pages had changed, and these results were compared with the ETag and Last-Modified fields to determine how reliable and useful the header fields are as indicators.
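The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the method used in the Netarkivet.dk study: the Harvest record, the field names and the choice of SHA-256 as checksum are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Harvest:
    """One harvested copy of a page (hypothetical record layout)."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def checksum(body: bytes) -> str:
    # SHA-256 is an assumption; the report only speaks of "checksums".
    return hashlib.sha256(body).hexdigest()


def header_says_changed(prev: Harvest, curr: Harvest, field: str) -> Optional[bool]:
    """True/False if the header field can be compared, None if it is missing."""
    old, new = getattr(prev, field), getattr(curr, field)
    if old is None or new is None:
        return None
    return old != new


def classify(prev: Harvest, curr: Harvest, field: str) -> str:
    """Compare what the header field predicts with what the checksums show."""
    predicted = header_says_changed(prev, curr, field)
    actual = checksum(prev.body) != checksum(curr.body)
    if predicted is None:
        return "header missing"
    if predicted == actual:
        return "reliable"
    # Header unchanged but content changed: a real change would be skipped.
    # Header changed but content identical: a duplicate would be archived.
    return "missed change" if actual else "false alarm"
```

For example, classify(Harvest(b"a", etag='"1"'), Harvest(b"b", etag='"1"'), "etag") reports a missed change, which is the costly case for an archive because trusting the header would mean skipping a page that had in fact changed.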

Recent presentations at the 4th International Web Archiving Workshop (IWAW'04), held in conjunction with ECDL 2004, in Bath, cover a range of tools and techniques in development. Mohr et al. (2004) provide an introduction to Heritrix, the archival crawler developed by the Internet Archive, and a distributed system for Web crawling, Dominos, is described by Hafri and Djeraba (2004a; 2004b). The WATSON project and the use of language engineering techniques for focusing web archiving and aiding analysis are described by Coch and Masanès (2004), while Christensen (2004) examines the requirements for a format registry for web archives.

The nature and ephemeral life of weblogs (blogs) and the nascent efforts to preserve them are described in a recent article by Entlich (2004).

The challenges of technological obsolescence for archived Web materials are explored in a paper by Rosenthal et al. (2004). Building on the content-negotiating capabilities of HTTP, the LOCKSS program (Lots of Copies Keeps Stuff Safe) has demonstrated a method of on-the-fly format migration on delivery to overcome problems of browser incompatibility with archived formats.
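A rough sketch of how content negotiation can drive migration on delivery is given below. It is not the LOCKSS implementation: the MIGRATIONS table, the function names and the example media types are hypothetical, and the Accept header parsing is deliberately simplified.

```python
from typing import Dict

# Hypothetical table: archived media type -> target types a converter exists for.
MIGRATIONS = {"image/x-xbitmap": ["image/png", "image/gif"]}


def parse_accept(header: str) -> Dict[str, float]:
    """Parse an HTTP Accept header into {media_type: q-value} (simplified)."""
    prefs: Dict[str, float] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs[media_type.lower()] = q
    return prefs


def accepts(prefs: Dict[str, float], media_type: str) -> bool:
    """True if the client accepts media_type, honouring type/* and */*."""
    main = media_type.split("/")[0]
    for candidate in (media_type, f"{main}/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate] > 0
    return False


def choose_delivery_format(stored_type: str, accept_header: str) -> str:
    """Pick the media type to deliver for an archived object."""
    prefs = parse_accept(accept_header)
    if accepts(prefs, stored_type):
        return stored_type          # serve the preserved bytes unchanged
    for target in MIGRATIONS.get(stored_type, []):
        if accepts(prefs, target):
            return target           # migrate on the fly at delivery time
    return stored_type              # no usable converter: serve the original
```

With these assumptions, choose_delivery_format("image/x-xbitmap", "image/png, image/jpeg;q=0.8") selects image/png, while a client that still lists the archived type receives the original. Note that a client sending the */* wildcard appears to accept everything, so a sketch like this only helps when the Accept header is specific.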

Following their influential study published in Science in 2003 (vol. 302, pp. 787-788), Dellavalle and colleagues have produced more evidence of the volatility of Web references in the medical literature. Their recent studies have included references in oncology journals (Hester, et al., 2004) and the Internet citation policies of high-impact STM journals (Schilling, et al., 2004). These support other recent studies of URL references in the biomedical literature (Crichlow, Davies & Winbush, 2004; Wren, 2004); the stability of Web references has also been cited as being a critical issue for the publication of clinical trials (Tumber and Dickersin, 2004, pp. 278-279). Bar-Ilan and Peritz (2004) have provided a similar analysis of Web documents in the informetrics sub-discipline of information science.

Susan Westerberg Prager of UCLA School of Law provides a useful analysis of the importance of digital preservation for the future of academic legal research, concentrating on the stability of Internet-based information (Prager, 2004).

Bar-Ilan, J., & Peritz, B.C. (2004). Evolution, continuity, and disappearance of documents on a specific topic on the web: A longitudinal study of 'informetrics.' Journal of the American Society for Information Science and Technology, 55(11), 980-990.

Boyko, A. (2004). Test Bed Taxonomy for Crawler. Version 1. International Internet Preservation Consortium, July 2004. PDF Retrieved December 22, 2004, from
http://www.netpreserve.org/publications/iipc-r-002.pdf

Christensen, N. H. (2004). Towards format repositories for web archives. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanes and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from
http://www.iwaw.net/04/index.html

Clausen, Lars R. (2004). Concerning Etags and Datestamps. The State and University Library, Arhus, and the Royal Library, Copenhagen, Denmark, July 2004. PDF Retrieved December 22, 2004, from http://www.netarchive.dk/website/publications/Etags-2004.pdf

Coch, J. and Masanes, J. (2004). Language engineering techniques for web archiving. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanes and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from
http://www.iwaw.net/04/index.html

Crichlow, R., Davies, S., & Winbush, N. (2004). Accessibility and accuracy of Web page references in 5 major medical journals. JAMA: the Journal of the American Medical Association, 292(22), 2723-2724.

Entlich, R. (2004). Blog Today, Gone Tomorrow? Preservation of Weblogs. RLG DigiNews, 8(4), August 2004. Retrieved December 22, 2004, from
http://www.rlg.org/en/page.php?Page_ID=19481#article3

Hester, E. J., Heilig, L. F., Drake, A. L., Johnson, K. R., Vu, C. T., Schilling, L. M., & Dellavalle R. P. (2004). Internet citations in oncology journals: A vanishing resource? Journal of the National Cancer Institute, 96(12), 969-971.

Hafri, Y. and Djeraba, C. (2004a). Dominos: A New Web Crawler's Design. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanes and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from
http://www.iwaw.net/04/index.html

Hafri, Y., & Djeraba, C. (2004b). High performance crawling system. In: Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, New York, USA, 15-16 October 2004, New York: ACM Press, pp. 299-306.

Mohr, G., Kimpton, M., Stack, M. and Ranitovic, I. (2004). Introduction to Heritrix, an archival quality web crawler. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanès and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from http://www.iwaw.net/04/index.html

Marill, J., Boyko, A. and Ashenfelder, M. (2004). Web Harvesting Survey. Version 1. International Internet Preservation Consortium, July 2004. PDF Retrieved December 22, 2004, from http://www.netpreserve.org/publications/iipc-r-001.pdf

Prager, S.W. (2004). Law libraries and the scholarly mission. Law Library Journal, 96(3), 513-524. PDF Retrieved December 22, 2004, from: http://www.aallnet.org/products/2004-31.pdf

Rosenthal, D. S. H., Lipkis, T., Robertson, T. and Morabito, S. (2004). Transparent Format Migration of Preserved Web Content. ArXiv.org at Cornell University Library, 22 November 2004. Retrieved December 22, 2004, from http://arxiv.org/abs/cs.DL/0411077

Schilling, L. M., Kelly, D. P., Heilig, L. F., Hester, E. J., & Dellavalle, R. P. (2004). Digital information archiving policies in high-impact medical and scientific periodicals. JAMA: the Journal of the American Medical Association, 292(22), 2724-2726.

Tumber, M. B. & Dickersin, K. (2004). Publication of clinical trials: accountability and accessibility. Journal of Internal Medicine, 256(4), 271-283.

Wren, J. D. (2004). 404 not found: the stability and persistence of URLs published in MEDLINE. Bioinformatics, 20(5), 668-672.

The state of web archiving in various countries has also been the subject of recent papers.

Among papers presented at the 70th IFLA Conference in August 2004, Gatenby (2004) details the National Library of Australia's work on web harvesting and the preservation of digital materials as part of its responsibilities within the IFLA/CDNL (Conference of Directors of National Libraries) Alliance for Bibliographic Standards (ICABS), and its involvement in the International Internet Preservation Consortium's (IIPC) Deep Web Working Group.

Van Nuys et al. (2004) describe the National Library of Norway's Paradigma Project, which is engaged in the harvesting and archiving of the Norwegian domain as part of the Library's legal deposit responsibilities. Exploration of appropriate metadata solutions, the incorporation of the Functional Requirements for Bibliographic Records (FRBR) model into the archive design, and the implementation of a proposed authentication service for document verification are discussed.

Presentations during the afternoon session of IWAW'04 included experiences in archiving the Greek Web (Lampos et al., 2004) and the Slovenian Web (Kavcic-Colic and Grobelnik, 2004). The Chinese Web archiving project, InfoMall, including detailed descriptions of the service model and the Tianwang archival storage format for web pages, is described by Yan et al. (2004). Case studies on the National Library of Australia's PANDORA Archiving System (Koerbin, 2004) and the use of sampling techniques in harvesting the entire University of Michigan domain (umich.edu) (Lyle, 2004) were also presented.

Gatenby, P. (2004). Collecting and Managing Web Resources for Long-Term Access : Web Harvesting and Guidelines to Support Preservation (ICABS Actions 3.3 and 3.4). Proceedings of the 70th IFLA General Conference and Council, August 22 - 27, 2004, Buenos Aires, Argentina. PDF retrieved December 22, 2004, from http://www.ifla.org/IV/ifla70/papers/026e-Gatenby.pdf

Kavcic-Colic, A. and Grobelnik, M. (2004). Archiving the Slovenian Web: Recent Experiences. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanes and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from http://www.iwaw.net/04/index.html

Koerbin, P. (2004). The PANDORA Digital Archiving System (PANDAS) and Managing Web Archiving in Australia: a Case Study. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanes and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from http://www.iwaw.net/04/index.html

Lampos, C., Eirinaki, M., Jevtuchova, D. and Vazirgiannis, M. (2004). Archiving the Greek Web. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanes and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from http://www.iwaw.net/04/index.html

Lyle, J.A. (2004). Sampling the Umich.edu Domain. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanes and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from http://www.iwaw.net/04/index.html

Van Nuys, C., Albertsen, K., Pedersen, L. and Stenstad, A. (2004). The Paradigma Project and its Quest for Metadata Solutions and User Services. Proceedings of the 70th IFLA General Conference and Council, August 22 - 27, 2004, Buenos Aires, Argentina. PDF Retrieved December 22, 2004, from http://www.ifla.org/IV/ifla70/papers/009e-Nuys.pdf

Yan, H., Huang, L., Chen, C. and Xie, Z. (2004). A New Data Storage and Service Model of China Web InfoMall. 4th International Web Archiving Workshop (IWAW'04). Eds. Julien Masanès and Andreas Rauber. Bath (UK), 2004. PDF Retrieved December 22, 2004, from http://www.iwaw.net/04/index.html


 

2.2 Preservation Metadata

PREMIS (Preservation Metadata Implementation Strategies) Working Group

The final report of the Implementation Strategies Subgroup of the joint-sponsored OCLC/RLG PREMIS Working Group (2004) was released in September, 2004, and details the findings of a survey focused on current implementations and preservation metadata management practices of digital preservation repositories. The survey drew respondents from 48 libraries, archives and other institutions from Europe, the United States, Australia and New Zealand, and covered repository services, models and policies, as well as architecture, storage, preservation processes and metadata. The report summarises and analyses the range of responses and notes a number of trends emerging in practice. Caplan (2004) has provided a summary of the main findings of the report in an article published in the October issue of RLG DigiNews.

An update on the work of the PREMIS Core Elements Sub-group is reported in the December issue of RLG DigiNews (Guenther, 2004). The article describes the development of a data model and data dictionary for core preservation metadata elements, drawn from further analysis of the recommendations of the earlier OCLC/RLG Preservation Metadata Framework Working Group (2002). The data model comprises five types of entity: intellectual entities, objects, agents, rights and events, and the majority of the data dictionary relates to objects and events, covering technical characteristics and digital provenance. Three subtypes of object entity are also described: representation, file, and bitstream, which represent different levels of organisation to which preservation metadata may apply. Further outcomes from the working group will be a set of METS-compatible XML schemas to facilitate implementation.
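The entity types named above can be pictured with a small sketch. Only the five entity types and the three object subtypes come from the article; every class name, field name and relationship shown here is an assumption made for illustration and should not be read as the PREMIS data dictionary or its forthcoming XML schemas.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ObjectLevel(Enum):
    """The three subtypes of object entity described in the article."""
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"


@dataclass
class IntellectualEntity:
    identifier: str


@dataclass
class Agent:
    identifier: str


@dataclass
class Rights:
    identifier: str


@dataclass
class Event:
    identifier: str
    event_type: str                     # hypothetical field, e.g. "ingest"
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class PreservationObject:
    identifier: str
    level: ObjectLevel
    # Technical characteristics, e.g. {"format": "TIFF"} (illustrative keys).
    technical: Dict[str, str] = field(default_factory=dict)
    # Digital provenance: the events that have acted on this object.
    events: List[Event] = field(default_factory=list)


def provenance(obj: PreservationObject) -> List[str]:
    """List the event types recorded against an object, in order."""
    return [event.event_type for event in obj.events]
```

In this sketch most of the detail hangs off objects and events, which mirrors the article's remark that the majority of the data dictionary relates to those two entity types.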

Caplan, P. (2004). PREMIS - Preservation Metadata Implementation Strategies Update 1: Implementing preservation repositories for digital materials: current practice and emerging trends in the cultural heritage community. RLG DigiNews, 8(5), October 2004. Retrieved December 22, 2004, from http://www.rlg.org/en/page.php?Page_ID=20462#article2

Guenther, R. (2004). PREMIS - Preservation Metadata Implementation Strategies Update 2: Core Elements for Metadata to Support Digital Preservation. RLG DigiNews, 8(6), December 15 2004. Retrieved December 22, 2004, from http://www.rlg.org/en/page.php?Page_ID=20492#article2

OCLC/RLG PREMIS Working Group (2004). Implementing Preservation Repositories for Digital Materials: Current Practice and Emerging Trends in the Cultural Heritage Community. Report by the joint OCLC/RLG Working Group Preservation Metadata: Implementation Strategies (PREMIS). Dublin, O.: OCLC Online Computer Library Center, Inc. PDF Retrieved December 22, 2004, from http://www.oclc.org/research/projects/pmwg/surveyreport.pdf

National Library of New Zealand Metadata Extraction Tool

As part of its development of a Metadata Standards Framework, the National Library of New Zealand (Te Puna Mātauranga o Aotearoa) has commissioned a software tool to extract preservation metadata from a range of file formats and to output the metadata to XML for use in digital repositories. Using Java and XML, the tool comprises a generic application and a number of adaptors for metadata extraction from specific file formats. Adaptors are currently available for MS Word 2, MS Word 6, Word Perfect, Open Office, MS Works, MS Excel, MS PowerPoint, TIFF, JPEG, WAV, MP3, HTML, PDF, GIF, and BMP.
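The split between a generic application and format-specific adaptors can be illustrated with a short sketch. The actual tool is written in Java; the Python below is not its code or its API, and the registry, the element names and the single GIF adaptor are all hypothetical.

```python
import struct
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical registry mapping a file extension to an adaptor function.
ADAPTERS = {}


def adapter(*extensions):
    """Decorator that registers a function as the adaptor for some extensions."""
    def register(func):
        for ext in extensions:
            ADAPTERS[ext] = func
        return func
    return register


@adapter(".gif")
def gif_adapter(data: bytes) -> dict:
    # A GIF file starts with "GIF87a" or "GIF89a", followed by the logical
    # screen width and height as little-endian unsigned 16-bit integers.
    version = data[3:6].decode("ascii")
    width, height = struct.unpack_from("<HH", data, 6)
    return {"format": "GIF", "version": version,
            "width": str(width), "height": str(height)}


def extract(filename: str, data: bytes) -> str:
    """Generic part: gather metadata and serialise it as XML."""
    root = ET.Element("metadata", {"file": filename})
    fields = {"size_bytes": str(len(data))}
    handler = ADAPTERS.get(Path(filename).suffix.lower())
    if handler is not None:
        fields.update(handler(data))    # format-specific part
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")
```

Supporting a further format in this sketch means registering one more adaptor function, leaving the generic extract routine and the XML output unchanged.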

The tool was shortlisted for the 2004 DPC Digital Preservation Award, which was included in this year's Pilgrim Trust Conservation Awards.

Version 1.0 of the tool and its documentation are now available for download from the National Library of New Zealand's Web site, Retrieved December 22, 2004, from: http://www.natlib.govt.nz/en/whatsnew/4initiatives.html#extraction


 

2.3 File Formats and Tools

Several recent articles focus on the management of file formats with respect to long-term preservation.

The development of the GDFR (Global Digital Format Registry), a central source for format information for the support of long-term preservation, and JHOVE (JSTOR/Harvard Object Validation Environment), a tool to automate the identification and validation of file formats, are described by Abrams (2004).

Results of the Dutch Digital Preservation Testbed research project into long-term preservation of text documents, emails, spreadsheets and databases are discussed in a paper by Verdegem and Slats (2004). The research methodology focused on three preservation approaches - migration, emulation and the use of XML.

Long-term preservation of databases is also the subject of a recent paper by Ashley (2004), covering underlying concepts and issues for database preservation, dynamic and static systems, issues unique to proprietary GIS systems and lack of data storage standards. Two approaches to preservation are discussed: XML and emulation.

The requirements of Denmark's Netarkivet in terms of file formats and metadata are discussed in a paper by Christensen (2004). The paper outlines the Netarkivet's basic requirements for storage formats for long term preservation. Different storage formats are then evaluated in relation to Netarkivet specifications and a recommendation for a suitable long term storage format is made. An earlier related report by Clausen (2004) suggests strategies for dealing with file formats in an archival system, file preservation workflows and general recommendations for handling file formats.

Stanescu (2004) presents a method for measuring the durability of various formats. The INFORM methodology (INvestigation of FOrmats based on Risk Management) describes six classes of risk, including software, hardware, digital object formats, migration processes, the digital archive and associated organisations, and identifies specific risks for evaluation in each of these classes. The methodology of the measuring process is also described.

Preservation functions for managing formats in the e-Depot at the Koninklijke Bibliotheek, the National Library of the Netherlands, are described in several papers (Bellekom, 2004; Oltmans, Van Diessen and Van Wijngaarden, 2004; Van Wijngaarden and Oltmans, 2004). Two components of the preservation subsystem are the Preservation Manager and the Universal Virtual Computer (UVC) for images. The Preservation Manager monitors the technical environment required to render stored digital objects. The UVC offers a method for permanent access to images, with the future aim being to support PDF for the long term.

The National Archives of Australia (2004) has released Version 1.0 of the XENA (XML Electronic Normalising of Archives) tool, which converts a range of file formats to XML representations for longer term access. Released as open source software, the application uses a 'plug-in' architecture to convert a range of formats to XML packages. Currently supported formats include MS Word, Excel and PowerPoint, OpenOffice formats, Rich Text Format, several email formats, comma separated files, relational databases, JPEG, GIF, TIFF, PNG and BMP, HTML and Web sites, and plain text. The software is available for download for comment, Retrieved December 22, 2004, from: http://xena.sourceforge.net

The National Library of New Zealand's Preservation Metadata Extraction tool is also available for download. (See section 2.2 Preservation Metadata, above.)

Abrams, S. L. (2004). The Role of Format in Digital Preservation. VINE 34(2): 49-55. Available online via subscription, Retrieved December 22, 2004, from http://www.emeraldinsight.com/rpsv/cw/mcb/03055728/v34n2/s1/p49

Ashley, Kevin (2004). Preservation of Databases. VINE 34(2): 66-70. Available online via subscription, Retrieved December 22, 2004, from http://www.emeraldinsight.com/rpsv/cw/mcb/03055728/v34n2/s3/p66

Bellekom, C. (2004). Building Preservation Functionality in a Digital Archive: the National Library of the Netherlands. Learned Publishing 17(4), October 2004. ISSN 0953-1513. pp. 275-280. Available online via subscription, Retrieved December 22, 2004, from http://www.ingentaselect.com/rpsv/~885/v17n4/s3/p275

Christensen, S. S. (2004). Archival Data Format Requirements. The Royal Library, Copenhagen, and the State and University Library, Arhus, Denmark, July 2004. PDF Retrieved December 22, 2004, from
http://www.netarchive.dk/website/publications/Archival_format_requirements-2004.pdf

Clausen, L. R. (2004). Handling File Formats. The Royal Library, Copenhagen, and the State and University Library, Arhus, Denmark, May 2004. PDF Retrieved December 22, 2004, from http://www.netarchive.dk/website/publications/FileFormats-2004.pdf

National Archives of Australia (2004). XENA: Open Source Digital Preservation Software from the National Archives of Australia. Retrieved December 22, 2004, from http://xena.sourceforge.net

Oltmans, E., Van Diessen, R. and Van Wijngaarden, H. (2004). Preservation Functionality in a Digital Archive. Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries. June 07 - 11, 2004, Tucson, AZ, USA. Available online via subscription, Retrieved December 22, 2004, from http://csdl.computer.org/comp/proceedings/jcdl/2004/2493/00/24930279abs.htm

Stanescu, A. (2004). Assessing the Durability of Formats in a Digital Preservation Environment: the INFORM Methodology. D-Lib Magazine, 10(11), November 2004. Retrieved December 22, 2004, from
http://www.dlib.org/dlib/november04/stanescu/11stanescu.html

Van Wijngaarden, H. and Oltmans, E. (2004). Digital Preservation and Permanent Access : the UVC for Images. The Hague: Koninklijke Bibliotheek, 2004. PDF Retrieved December 22, 2004, from
http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/uvc-ist.pdf

Verdegem, R. and Slats, J. (2004). Practical Experiences of the Dutch Digital Preservation Test-Bed. VINE 34(2): 56-65. Available online via subscription, Retrieved December 22, 2004, from http://www.emeraldinsight.com/rpsv/cw/mcb/03055728/v34n2/s2/p56


 

2.4 Audiovisual Archiving

The second edition of Audiovisual Archiving : Philosophy and Principles (Edmondson, 2004) has been published by UNESCO. The document provides a broad introduction to the basic concepts and issues in audiovisual archives and archiving, and includes an overview of the technical and management issues in preserving audiovisual material.

The International Association of Sound and Audiovisual Archives (IASA) released its Guidelines on the Production and Preservation of Digital Audio Objects in August, 2004. The guidelines discuss key principles and standards, metadata, signal extraction from originals and target formats in the production and preservation of digital audio objects, and are recommended as best practice for audiovisual archives by the Sub-Committee on Technology of the Memory of the World Programme of UNESCO.

Richard Ranft of the British Library Sound Archive has produced an overview of the curatorial principles underlying the preservation of recordings of the sounds of nature (animal sounds, birdsong, etc.). He provides information on collection policies and advice on the transfer of natural sound to digital forms of storage. An appendix provides separately authored case-studies of natural sound archives in Mexico, Colombia and Brazil.

Edmondson, R. (2004). Audiovisual Archiving : Philosophy and Principles, 2nd edition. CI/2004/WS/2. Paris: UNESCO, April 2004. PDF retrieved December 22, 2004, from http://unesdoc.unesco.org/images/0013/001364/136477e.pdf

International Association of Sound and Audiovisual Archives (IASA) (2004). TC-04 - Guidelines on the Production and Preservation of Digital Audio Objects. Aarhus, Denmark: International Association of Sound and Audiovisual Archives (IASA), August 2004. ISBN 87 990309 1 8.

Ranft, R. (2004). Natural sound archives: past, present and future. Anais da Academia Brasileira de Ciencias, 76(2), 455-465. PDF Retrieved December 22, 2004, from http://www.scielo.br/pdf/aabc/v76n2/a41v76n2.pdf


 

3. Other publications

 

3.1 Evolving Issues and Digital Preservation

A number of recent reports and articles have described the changing landscape of digital information and users, and the evolving issues for digital preservation.

OCLC (2004) has released the report 2004 Information Format Trends: Content, Not Containers, which explores the "unbundling of information content" and the observation that users care little about format containers (books, CDs, files, etc.) and focus instead on content. The report discusses projected shifts in the volume and distribution of digital content and the rise of social publishing such as blogs and wikis. Such shifts have implications for the planning and implementation of digital preservation strategies.

The August 2004 issue of RLG DigiNews features an interview with Clifford Lynch (2004) of the Coalition for Networked Information (CNI) that touches on many aspects of digital preservation including stewardship, technical aspects and current and future research. It also discusses developing issues such as preservation of courseware, weblogs and the concept of deep time preservation.

Lavoie and Dempsey (2004) examine various aspects of digital preservation within wider social and cultural contexts from the perspectives of curation, maintenance, incentives and funding processes, rather than the technical issues which are often the focus of digital preservation.

Geser and Pereira (2004) of Salzburg Research have compiled a report for DigiCULT looking at the next decade of developments in information and communication technologies for the management of digital heritage. The report is intended to be used as a management tool, e.g. to provide an overview of likely technological developments over the next 10 to 15 years and to identify requirements for research and technological development.

Geser, G., & Pereira, J., eds., (2004). The future digital heritage space: an expedition report. DigiCULT Thematic Issue, 7, December. PDF Retrieved December 22, 2004, from: http://www.digicult.info/downloads/dc_thematic_issue7.pdf

Lavoie, B. and Dempsey, L. (2004). Thirteen Ways of Looking at...Digital Preservation. D-Lib Magazine 10(7/8), July/August 2004. Retrieved December 22, 2004, from http://www.dlib.org/dlib/july04/lavoie/07lavoie.html

Lynch, C.A. (2004). Editor's Interview with Clifford A. Lynch. RLG DigiNews 8(4), August 2004. Retrieved December 22, 2004, from http://www.rlg.org/en/page.php?Page_ID=19481#article0

OCLC (2004). 2004 Information Format Trends : Content, Not Containers. Dublin, Ohio: OCLC Online Computer Library Center, 2004. Retrieved December 22, 2004, from http://www.oclc.org/reports/2004format.htm


 

3.2 Other recent publications

Barata, K. (2004). Archives in the digital age. Journal of the Society of Archivists, 25(1), 63-70.

Kimberley Barata of Missenden Consulting LLP summarises the state of the art in digital recordkeeping in the UK, based on a report undertaken for Resource (now the Museums, Libraries and Archives Council).

CLIR. (2004). Access in the Future Tense. Washington, D.C.: Council on Library and Information Resources. ISBN 1-932326-09-X. Retrieved December 22, 2004, from: http://www.clir.org/pubs/abstract/pub126abst.html

Invited papers by Daniel Greenstein (University of California), Anne Kenney (Cornell University), Bill Ivey (Vanderbilt University) and Brian Lavoie (OCLC Office of Research) on the key features of the changing information environment. Abby Smith provides a general overview and a look at the implications of their findings.

Duff, W., Craig, B., & Cherry, J. (2004). Historians' use of archival sources: promises and pitfalls of the digital age. Public Historian, 26(2), 7-22. PDF Retrieved December 22, 2004, from: http://www.slis.indiana.edu/faculty/meho/L625/duff01.pdf

Reports on a survey of history department faculty in Canadian degree-awarding institutions, highlighting the need for complete and full finding aids and good-quality digitisation of historical documents.

Kristine Fallon Associates, Inc. (2004). Collecting, archiving and exhibiting digital design data. Chicago, Ill.: Art Institute of Chicago, Department of Architecture. Individual chapters in PDF and video-clips Retrieved December 22, 2004, from:
http://www.artic.edu/aic/collections/dept_architecture/ddd.html

The product of a major study of the collection, archiving and use of digital design data undertaken for the Art Institute of Chicago in a project jointly funded by the Schiff Foundation and the Graham Foundation for Advanced Studies in the Fine Arts. The report includes a description of the current state-of-the-art for digital design tools and data (including case studies) and an OAIS-based workflow for moving data from producers to repository, and making it accessible.

Rechtsanwälte Goebel und Scheller (2004). Digitale Langzeitarchivierung und Recht. nestor - Kompetenznetzwerk Langzeitarchivierung und Langzeitverfügbarkeit Digitaler Ressourcen für Deutschland. PDF Retrieved December 22, 2004, from:
http://www.langzeitarchivierung.de/downloads/nestor_mat_01.pdf
Also available at: http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:0008-20040916022

A German-language study by Goebel and Scheller of the legal context of long-term preservation in Germany.

Lekkas, D., & Gritzalis, D. (2004). Cumulative notarization for long-term preservation of digital signatures. Computers & Security, 23(5), 413-424. PDF Retrieved December 22, 2004, from: http://www.syros.aegean.gr/users/lekkas/pubs/j/2004COMPSEC.pdf

A technical paper on the long-term validation of digital signatures. Some preservation approaches will depend on trust in digital signatures; however, these will often have a shorter life expectancy than the documents they are associated with. The authors propose a means of basing signature verification on only those trust relationships, data and technologies available at the time of verification.

Slattery, O., Lu, R., Zheng, J., Byers, F. and Tang, X. (2004). Stability Comparison of Recordable Optical Discs : a Study of Error Rates in Harsh Conditions. Journal of Research of the National Institute of Standards and Technology, 109(5):517-524, September-October 2004. PDF Retrieved December 22, 2004, from: http://nvl.nist.gov/pub/nistpubs/jres/109/5/j95sla.pdf

This NIST report documents CD and DVD recordable disc tests which were conducted in collaboration with the Library of Congress. The basis of the test was accelerated aging using humidity and temperature and prolonged exposure to light. The report discusses dye types, reflective surfaces and variations in storage conditions as influences on reliability and longevity, and concludes that archive grade parameters should be based on quality measures rather than brand or manufacturer.

Van de Sompel, H., Nelson, M. L., Lagoze, C., & Warner, S. (2004). Resource harvesting within the OAI-PMH framework. D-Lib Magazine, 10(12), December. http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html

Investigates the use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) for the harvesting of digital resources, not just metadata.

Van Drimmelen, W. (2004). Universal access through time: archiving strategies for digital publications. Libri, 54(2), 98-103.

The director of the National Library of the Netherlands (Koninklijke Bibliotheek) provides his analysis of the digital preservation issue, based on a presentation made at the STM Conference in Amsterdam, May 2003.


 

4. Events

 

4.1 Recent events

Chinese-European Workshop on Digital Preservation, Beijing

The program and presentations are available from this workshop, which was held at the Library of Chinese Academy of Sciences, Beijing, China, on July 14-16, 2004, and which brought together both Chinese and European experts in digital preservation. Retrieved December 22, 2004, from
http://www.csdl.ac.cn/meeting/cedp/index_en.html

8th European Conference on Research and Advanced Technologies for Digital Libraries (ECDL 2004), Bath

The 8th European Digital Library Conference was held at the University of Bath on 12-17 September 2004, and was organised by UKOLN.

All peer-reviewed papers accepted for the conference were published by Springer, in the series Lecture Notes in Computer Science (Vol. 3232, ISBN 3-540-23013-0, book details Retrieved December 22, 2004, from:
http://www.springeronline.com/sgw/cda/frontpage/0,11855,3-40109-22-34393454-0,00.html).
The programme and selected presentations are available from the ECDL 2004 Web site, Retrieved December 22, 2004, from
http://www.ecdl2004.org/

Reports of the conference have been provided by Hey (2004) and Holmstrom (2004).

Hey, J. (2004). ECDL 2004: A Digital Librarian's Report. Ariadne, Issue 41, October 2004. Retrieved December 22, 2004, from:
http://www.ariadne.ac.uk/issue41/ecdl2004-rpt/

Holmstrom, J. (2004). Report on the 8th European Conference on Digital Libraries (ECDL 2004), 12-16 September 2004, Bath, United Kingdom. D-Lib Magazine, 10(10), October 2004. Retrieved December 22, 2004, from:
http://www.dlib.org/dlib/october04/holmstrom/10holmstrom.html

4th International Web Archiving Workshop (IWAW'04), Bath

The 4th International Web Archiving Workshop took place on 16 September 2004 at the University of Bath, in conjunction with ECDL 2004. Full papers from the Workshop are available from the IWAW'04 Web site, Retrieved December 22, 2004, from
http://www.iwaw.net/04/.

A range of tools and techniques were presented in the morning session, with the afternoon session dedicated to experiences and progress in a number of Web archiving initiatives from around the world. (See section 2.1 Web Archiving, above, for a brief description of individual papers.)

Short reports of the Workshop are also provided by Day (2004) and Masanes and Rauber (2004).

Day, M. (2004). The 4th International Web Archiving Workshop. Ariadne, Issue 41, October 2004. Retrieved December 22, 2004, from http://www.ariadne.ac.uk/issue41/ecdl-web-archiving-rpt/

Masanes, J. and Rauber, A. (2004). Report on the 4th International Web Archiving Workshop (IWAW): 16 September 2004, Bath, United Kingdom. D-Lib Magazine, 10(11), November 2004. Retrieved December 22, 2004, from
http://www.dlib.org/dlib/november04/masanes/11masanes.html

Digital Preservation in Institutional Repositories, London

The challenges of institutional repositories and the need to build on experience and expertise were the emerging themes of the 9th DPC forum, jointly organised by CURL and the British Library, which was held at the British Library Conference Centre on 19 October 2004.

A report on the 9th DPC forum and its associated presentations are available from the Digital Preservation Coalition Web site. Retrieved December 22, 2004, from: http://www.dpconline.org/graphics/events/041019forum.html

DC-2004: International Conference on Dublin Core and Metadata Applications, Shanghai

The program and papers for the DC-2004 conference, held in Shanghai, China on 11-14 October 2004, are now available from the conference Web site. Retrieved December 22, 2004, from: http://dc2004.library.sh.cn/

Of particular interest may be the following papers:

Kebbell, A., & Campbell, D. (2004). Managing digital objects and their metadata: challenges and responses. Retrieved December 22, 2004, from http://students.washington.edu/jtennis/dcconf/Paper_09.pdf

Aschenbrenner, A. (2004). A methodology for metadata modelling - depth for a flat world. Retrieved December 22, 2004, from http://students.washington.edu/jtennis/dcconf/Paper_22.pdf

19th International CODATA Conference - The Information Society: New Horizons for Science, Berlin

The program and abstracts for the 19th International CODATA Conference, which was held in Berlin on 7-10 November 2004, are now available from the Conference Web site. Full papers from the proceedings are expected to be available online soon.

Of particular interest may be papers from the Data Archiving stream, Retrieved December 22, 2004, from:
http://www.codata.org/04conf/abstracts/DataArchiving/index.html

Archiving Web Resources: Issues for Cultural Heritage Institutions, Canberra

The program and abstracts are available for the international conference Archiving Web Resources: Issues for Cultural Heritage Institutions, which was held at the National Library of Australia on 9-11 November 2004, along with those from the associated Information Day, held on 12 November 2004. Retrieved December 22, 2004, from http://www.nla.gov.au/webarchiving/

Among the papers presented is one by Cathro (2004), which examines key issues and challenges in preserving both the primary and secondary outputs of research.

Cathro, W. (2004). Preserving the Outputs of Research. Paper presented at Archiving Web Resources: Issues for Cultural Heritage Institutions, National Library of Australia, Canberra, 9-11 November 2004. Retrieved December 22, 2004, from:
http://www.nla.gov.au/nla/staffpaper/2004/cathro1.html

ERPANET Events

Notes and presentations are available from three recent ERPANET events:

Seminar on Business Models, Amsterdam (September 20-22, 2004). Retrieved December 22, 2004, from
http://www.erpanet.org/events/2004/amsterdam/

Workshop on Workflow, Budapest (October 13-15, 2004). Retrieved December 22, 2004, from
http://www.erpanet.org/events/2004/budapest/

Workshop on the Preservation of Digital Art, Glasgow (October 8, 2004). Retrieved December 22, 2004, from
http://www.erpanet.org/events/2004/glasgowart/


 

4.2 Forthcoming events

2005

February

Information Online 2005 : 12th Conference and Exhibition, 1 - 3 February 2005, Sydney, Australia.
Retrieved December 22, 2004, from: http://conferences.alia.org.au/online2005/

International Conference on Information Management in a Knowledge Society. 21 - 25 February 2005, Mumbai, India.
Retrieved December 22, 2004, from: http://www.icim2005.org/index.htm

ECURE : Preservation and Access for Digital College and University Resources, 28 February - 2 March 2005, Tempe, Arizona, USA.
Retrieved December 22, 2004, from: http://www.asu.edu/ecure/

March

Seventh Asia Pacific Web Conference 2005, 29 March - 1 April 2005, Shanghai, China.
Retrieved December 22, 2004, from:
http://apweb05.csm.vu.edu.au/index.asp

April

Museums and the Web 2005 : the International Conference for Culture and Heritage Online, 13 - 16 April 2005, Vancouver, Canada.
Retrieved December 22, 2004, from:
http://www.archimuse.com/mw2005/index.html

IS & T (Society for Imaging Science & Technology) Archiving Conference 2005, 26 - 29 April 2005, Washington, D.C., USA.
Retrieved December 22, 2004, from:
http://www.imaging.org/conferences/archiving2005/

May

14th International World Wide Web Conference : WWW 2005, 10 - 15 May 2005, Chiba City, Japan.
Retrieved December 22, 2004, from: http://www2005.org/

June

Joint Conference on Digital Libraries 2005 : Digital Libraries Cyberinfrastructure for Research and Education, 7 - 11 June 2005, Denver, Colorado, USA.
Retrieved December 22, 2004, from: http://www.jcdl2005.org/

ELPUB 2005 : From Author to Reader : Challenges for the Digital Content Chain, 8 - 10 June 2005, Leuven-Heverlee, Belgium.
Retrieved December 22, 2004, from: http://canada.esat.kuleuven.ac.be/elpub2005/home.htm

September

DC-2005: International Conference on Dublin Core and Metadata Applications 2005, 12-15 September 2005, Madrid, Spain.
Retrieved December 22, 2004, from: http://dublincore.org/

ECDL 2005: 9th European Conference on Research and Advanced Technology for Digital Libraries, 18-23 September 2005, Vienna, Austria.
Retrieved December 22, 2004, from:
http://www.ecdl2005.org/

ETD 2005 : 8th International Symposium on Electronic Theses and Dissertations, 27 - 30 September 2005, Sydney, Australia.
Retrieved December 22, 2004, from:
http://adt.caul.edu.au/etd2005/

Refresh : First International Conference on the Histories of Media Art, Science and Technology, 28 September - 3 October 2005, Banff, Canada.
Retrieved December 22, 2004, from:
http://www.mediaarthistory.org/

November

Sommet Mondial sur la Societe de l'Information 2005 : World Summit on the Information Society Conference 2005, 16 - 18 November 2005, Tunis, Tunisia.
Retrieved December 22, 2004, from:
http://www.smsitunis2005.org/plateforme/home.htm

A comprehensive and frequently updated list of forthcoming events is available from the PADI Web site:
http://www.nla.gov.au/padi/format/event.html


Problem links last disabled or updated: 30 September 2009

Warning! Web site links tend to have very short lifetimes, as documents are frequently updated or deleted, Web sites are restructured, domains are renamed or moved, etc. The compilers of this bulletin, therefore, cannot guarantee that all of the URLs in this document will successfully resolve to the resources described here. However, in these cases, try searching for the same resource on the PADI gateway (http://www.nla.gov.au/padi/), which will provide updated URLs wherever possible.

