Sarah Middleton

Last updated on 28 September 2016

In this issue:

  • What's on, and What's new
  • Editorial: Digital curation skills: Who needs them and how do you get them? Joy Davidson, Associate Director, Digital Curation Centre (DCC).
  • Who's who: Sixty second interview with Jen Mitcham of Archaeology Data Service
  • One world: Maggie Jones, Australia (previous Executive Secretary of the DPC)
  • Your view: commentary, questions and debate from readers
Compiled by Najla Rettberg. What's new is a joint publication of DPC and DCC. Also available as a print friendly PDF

What's on:

Sustainable Economics for a Digital Planet: Ensuring Long-term Access to Digital Information.
6 May 2010
JISC and The Blue Ribbon Task Force on sustainable digital preservation and access are sponsoring a one day free symposium on Thursday 6th May 2010 at the Wellcome Trust Conference Centre in London.
Open Knowledge Scotland
13 May 2010
This event brings together interested parties from across the open knowledge spectrum based in Scottish educational institutions, Scottish research organisations, Scottish local and national government, and members of the public for the purposes of teaching, learning and discussion.
JISC MRD Progress Workshop
17-18 May 2010
The purpose of the workshop is to consider how best to meet the challenge of managing research data in UK Universities. Participants will be informed of progress made under the JISC MRD Programme to examine researchers’ data management requirements and improve planning in a number of academic and institutional contexts. The objective will be to discuss the detail of the projects’ discoveries and to consider the more general landscape and initiatives required.
DCC Tools of the Trade Workshops
19 May 2010
The DCC is delighted to announce two new Tools of the Trade workshops. The first will introduce users to the newly refined Data Asset Framework (DAF) toolkit and the second, held in cooperation with the University of London Computing Centre (ULCC), will introduce users to the Assessing Institutional Digital Assets (AIDA) toolkit.
7th International Conference on Preservation of Digital Objects (IPRES 2010)
19-24 September 2010
The Austrian National Library and the Vienna University of Technology are pleased to host the International Conference on Preservation of Digital Objects (iPRES2010) in Vienna in September 2010. iPRES2010 will be the seventh in the series of annual international conferences that bring together researchers and practitioners from around the world to explore the latest trends, innovations, and practices in preserving our digital heritage.
Storing information in the Cloud Unconference
21 May 2010
This workshop-based unconference is part of a project currently run by the Department of Information Studies at Aberystwyth University and funded by the Society of Archivists looking into security, operational and governance issues of storing information in the cloud.
DL.org Summer School on “Digital Libraries & Digital Repositories: Interoperability Perspectives”
6-10 June 2010
The EU-funded DL.org project is delivering the DL.org Summer School on “Digital Libraries & Digital Repositories: Interoperability Perspectives” from the 6th to the 10th of June 2010 in Tirrenia (Pisa), Italy. Participation in the DL.org Summer School will assist attendees in understanding how to address interoperability challenges, approaches and techniques within the context of digital libraries and digital repositories.
Managing Data in Difficult Times: policies, strategies, technologies and infrastructure to manage research and teaching data in a fast changing technological and economic environment
1-2 July 2010
Following the success of previous conferences held in venues such as York and Belfast, JISC and the Coalition for Networked Information (CNI) are proud to announce the 8th International Meeting that will be held at the Barcelo Carlton Hotel, Edinburgh. The meeting will bring together experts from the United States, Europe and the United Kingdom. Parallel sessions will explore and contrast major developments that are happening on both sides of the Atlantic. It should be of interest to all senior management in information systems in the education community and those responsible for delivering digital services and resources for learning, teaching and research.
1st Karlsruhe Summer School on Service Research
18-22 July 2010
Through lectures, tutorials and social events, the Summer School will provide a forum for participants to discuss and learn about Service Research. Participants can benefit from the expertise of the lecturers and share experience with fellow attendees. Furthermore, the Summer School will foster interdisciplinary research and collaboration opportunities among international students and researchers interested in the disparate fields within Service Research.
RDMF5: Economics of Applying and Sustaining Digital Curation
27-28 October 2010
This will be the fifth meeting of the Research Data Management Forum. Aimed at researchers, digital repository managers, staff from library, information and research organisations, data curators, data centre managers, data scientists, research funding organisations and research networks, the event will address the topic "Economics of Applying and Sustaining Digital Curation."
6th International Digital Curation Conference 2010: “Participation and Practice: Growing the Curation Community through the Data Decade”
6 - 8 December 2010
Digital curation manages, maintains, preserves, and adds value to digital data throughout the lifecycle, reducing threats to long-term value, mitigating the risk of digital obsolescence and enhancing usefulness for research and scholarship.

What's New:

PREMIS in METS Toolbox
The Library of Congress and the Florida Center for Library Automation (FCLA) are pleased to announce the availability of the PREMIS-In-METS Toolbox. The PREMIS-in-METS Toolbox is a set of open-source tools developed to support the implementation of PREMIS preservation metadata in the METS container format.
Ensuring Perpetual Access - German National Hosting Strategy - Study available
The Ensuring Perpetual Access: establishing a federated strategy on perpetual access and hosting of electronic resources for Germany study is now available. The study was commissioned by the Alliance of German Science Organisations to help develop a strategy to address the challenges of perpetual access and hosting of electronic resources. It was requested to focus primarily on commercial e-journals and retro-digitised material.
New JHOVE2 alpha release v. 0.6.0
A new alpha release of JHOVE2 is now available for download and evaluation (v. 0.6.0, 2010-03-17). Distribution packages (in zip and tar.gz form) are available on the JHOVE2 public wiki.
InChI Trust Website is now live
The InChI Trust develops and supports the non-proprietary IUPAC InChI standard and promotes its uses to the scientific community. The Trust's goal is to enable the interlinking and combining of chemical, biological and related information, using unique machine-readable chemical structure representations to facilitate and expedite new scientific discoveries.
Memento Updates
The Memento project – partly funded by the Library of Congress – aims to make accessing the Web of the past as easy as accessing the current Web. The MementoFox add-on for Firefox browsers and a Memento plug-in for the MediaWiki platform have been released.
KEEP Newsletter now available
The second issue of the EC-funded KEEP Project newsletter features a summary of the KEEP project objectives, covers emulation portability and examines the Case for Local Software Preservation.

Editorial: Joy Davidson, Associate Director, Digital Curation Centre (DCC)

Digital curation skills: Who needs them and how do you get them?

In the words of William Kilbride in last month's What's New editorial,

'Information management is not trivial and it’s not new: but dependence on digital sources for research and a massive increase in data volumes and complexity means that researchers face new challenges'.

Couldn't have said it better myself! Over the last decade we've seen a vast increase in the general awareness about the need for digital curation and preservation activity. Indeed, many funding bodies are now seeking assurances from institutions and the researchers they employ at the bid stage that they are ready and able to manage access to their digital information over time. The DCC's handy policy overview shows just what UK funders currently expect. But just who is responsible for digital curation and how do those charged with responsibility get the skills they need to do the job?

First things first - who is responsible for digital curation? Well, there really isn't any single role within an institution that can take on the effective management of digital information from creation through to re-use in isolation. Whether you're a researcher, a librarian, an IT specialist, or a senior manager you've all got a role to play. What is less clear at this point is when and how the various roles should interact to best effect. Obviously, this will vary from institution to institution depending on infrastructures and support systems. The ongoing JISC Research Data Management Infrastructure (RDMI) Programme projects should go some way towards identifying current and good practice in this respect and will produce some exemplars that other UK HEIs can follow. What we do know for certain is that clear and regular communication between the range of stakeholders is essential from the outset of any digital curation activity. It is easy to under-estimate just how vital communication is as you focus on seemingly more complex technical and operational challenges. But, without clear communication about just what it is you are aiming to achieve you may find yourself facing unnecessary roadblocks and costly delays. It may seem petty, but do spend some time early on agreeing and communicating key terms - like just what it is you mean by the term digital preservation - as different stakeholders can have very different interpretations!

Ok - we know who is responsible for digital curation but now we need to determine how the various stakeholders can get the skills they need to undertake their specific roles in the digital curation lifecycle. There has been a lot of work in recent years to develop intensive training courses for data custodians and digital preservation practitioners such as the Digital Preservation Training Programme (DPTP), Digital Curation 101, and Digital Futures. These courses all aim to attract participants from a range of professional backgrounds to ensure that a wide variety of perspectives are shared and that viable curation approaches can be jointly developed and implemented at the institutional level. These courses have proved very successful to date and have led to some real changes in working practice within institutions. There are also numerous postgraduate courses emerging that aim to produce professional data curators such as the MA in Digital Asset Management (MADAM) at King's College London and the MSc with Data Curation specialism at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.

So data custodians and preservation practitioners have some formal and informal training options, but how about reaching those who are generating research data in various disciplines? In their 2008 report to JISC, Swan and Brown recommended the development of short, postgraduate training courses aimed at researchers to help ensure that basic data management and curation skills are embedded into professional research practice. Taking this recommendation forward, JISC has recently issued a call for bids to develop disciplinary-specific research data management training programmes. In addition, several of the current JISC RDMI projects are also seeking to develop and implement postgraduate level training as part of their sustainability plans. Researchers also have increasing access to a number of high quality resources and dedicated support like those being produced by the UK Data Archive to assist them with their data management and curation activity. So researchers from a range of disciplines should have access to a number of postgraduate training options for honing their curation skills in the coming years.

But how can we start to ensure that formal and informal educational programmes for professional data curators and researchers complement each other and allow for portability of skills across both institutions and countries? At the moment, there are several international working groups trying to establish basic skill-sets for emerging professional data curators. The International Data Curation Education Action (IDEA) and the European Commission MSc in Digital Curation and Preservation working groups are both aiming to pin down minimum requirements to allow for greater comparability of skills across current educational programmes. The RIN Information Handling Working Group are also active in this area and are using the draft Vitae Researcher Development Framework (RDF) - which is intended for use as a professional and career development planning tool for researchers at all stages of their career - as a means of benchmarking emerging training resources and providing pathways through heterogeneous professional development training offerings.

As you can see, there is a lot going on at the moment to help build capacity and hone curation skills. However, it seems the more we do the more we realise needs to be done so watch this space to keep up to date with what’s new and what’s on the horizon in the field of digital curation training.


Who's Who: sixty second interview with Jen Mitcham, Archaeology Data Service

Where do you work and what's your job title?

I am a Curatorial Officer at the Archaeology Data Service (ADS) based at the University of York. Our offices are in King’s Manor which is a beautiful Grade I listed building just outside York city walls.

Tell us a bit about your organisation

We are a digital archive for archaeological data from all sectors. We were originally set up in 1996 and hosted one of the 5 subject centres of the Arts and Humanities Data Service (we were also known as AHDS Archaeology), but in 2008 funding for AHDS was discontinued, and AHDS Archaeology was no more. The AHRC agreed to continue to support archaeology though and the ADS has been directly funded by them for the last couple of years.

How did you end up in digital preservation?

By accident really. I started off with a degree in archaeology and worked as a field archaeologist for a few years. Then I did an MSc course in archaeological computing and got more involved in the computer side of archaeology (databases, web sites, Geographic Information Systems, that sort of thing). On the basis of my computer skills, I managed to get a job at the Archaeology Data Service and have been here for 7 years now. Digital preservation has been something I have learnt ‘on the job’ while I have been here.

What projects are you working on at the moment?

We are really excited to be building up to the big launch of our brand new website. This is something that we have been working on behind the scenes for many months now but it is great to see it all finally coming together. I have been concentrating on fine tuning the web delivery of all of our digital archives. Whereas in the past we have hard coded details of each archive into the HTML pages, we now have a more dynamic web site which allows many of the details of each of our 300+ collections to be pulled out of an underlying database. This will enable us to display each of our archives in a much more consistent way, which makes me very happy indeed!

I am also busy working on lots of other bits and pieces. As well as individual archives such as a searchable database of fieldwork summaries for Medieval and Post-Medieval Britain and Ireland, I am also interested in the ‘bigger picture’ of digital archiving. Certification of archives is something I need to spend more time looking at when I get the chance.

What are the challenges of digital preservation for data services such as yours?

A lot of the challenges are the same as any other repository would have to deal with, though some headaches are caused by the wide range of file types that we have to handle. A lot of the projects we are asked to archive feature cutting edge research using new and innovative technologies. As well as the more common documents, databases and images that can be found in the majority of archives, we also have to deal with the outputs from maritime and terrestrial geophysics, photogrammetry, Lidar, virtual reality and anything else that our depositors come up with. Finding ways of trying to preserve these sorts of data can be a big challenge. They are invariably large in size and come in a huge variety of proprietary and binary data formats …sigh!

We are also continuing to work with data creators to think about the costs of digital archiving during the earliest stages of a project. Digital archiving is often perceived as expensive and it is a challenge to convey to potential depositors the difference between digital archiving (and the time and costs involved in that) and disseminating something on line with no archival backup.

Another more specific issue for archaeological data is that a lot of these born-digital data sets come from fieldwork or excavations that are non-repeatable. In excavating an archaeological site you are in effect destroying the remains that are in the ground. Once it has been excavated and recorded, there is no way you can go back and repeat the exercise. The digital records that are created on site during these excavations are therefore a pretty valuable resource and that is why it is important that we do a good job of preserving them.

What projects would you like to work on in the future?

We are just starting a project to implement Fedora Commons as our digital object management system. This is one of those things we have been talking about for a long time but never had the time/money to implement. We currently have 300+ collections with an estimated one million files so we didn’t think it was going to be a trivial task to get this all set up and all of our current data recorded in the new system! We are very lucky to have received funding from DEDEFI to make this dream a reality and this is something I am hoping to get involved with as the project gets underway.

As I said earlier I am interested in the certification of digital archives and want to continue following closely any developments in this area. The TRAC checklist and the Data Seal of Approval have so far been useful benchmarks for us in making sure the ADS is moving in the right direction though we haven’t formalized any of this as yet. This is definitely something for the future.

What sort of partnerships would you like to develop?

We are always keen to work with other archaeological digital archives in order to avoid duplication of effort in archiving material deposited in other locations, whilst still ensuring that methodologies are employed to ensure users can find the archives regardless of their home (through the use of metadata harvesting, web services etc). We currently work closely with the London Archaeological Archive and Research Centre (LAARC), the Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) and the British Geological Survey (BGS) for example.

If just one tool or standard could be brought into existence that would make your job easier, what would it be?

Two things actually (am I being greedy?)

  1. A metadata extraction tool (like JHOVE or with the combined power of FITS) that can recognize and process all of the many file formats that we deal with. The existing tools are great, but we want to be able to collect that level of information for all the weird and wonderful types of file that our depositors give us.
  2. A tool that reliably and seamlessly batch processes PDF files into PDF/A ... now wouldn’t that save a whole lot of time?

If you could save for perpetuity just one digital file, what would it be?

On the basis of what I said earlier about archaeological excavation being non-repeatable, I think it would have to be a large excavation database such as the one created by Framework Archaeology during the excavations in advance of improvement works at Stansted airport. See: http://ads.ahds.ac.uk/catalogue/archive/stansted_framework_2009/

The stratigraphy, finds and features of this highly complex multi-period site were recorded in an extensive database which we have archived as a series of delimited text files. The database was used to drive a Geographic Information System of the site which we have replicated on-line as far as possible.
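
As an illustration of what archiving a database as delimited text files can involve, here is a minimal sketch. It is not the ADS workflow: the interview does not say which database system or tooling was used, so the example assumes a SQLite file as a stand-in, and the function name export_tables_to_csv is hypothetical.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write every user table in a SQLite database to its own CSV file.

    Hypothetical helper for illustration; returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        for table in tables:
            # Quote the table name so unusual names do not break the query.
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            headers = [col[0] for col in cursor.description]
            target = out / f"{table}.csv"
            with target.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow(headers)   # column names as the first row
                writer.writerows(cursor)   # then one line per record
            written.append(target)
    finally:
        conn.close()
    return written
```

Writing one plain-text file per table, with the column names in the first row, keeps the data readable without the original database software, which is the general appeal of delimited text as an archival format.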

As the interpretation of archaeological data can be quite a subjective thing, the same dataset could be made to tell several different stories. Preserving a database such as this is crucially important as it will allow future archaeologists to return to the primary data in order to test new hypotheses and carry out further analysis, leading to fresh interpretations of the site. Keeping datasets such as this one alive for further research is really what we are all about!

Finally, where can we contact you or find out about your work?

You can have a look at our website at http://ads.ahds.ac.uk (though the new website won’t be available publicly for a few months yet).


One World

In this section we invite a partner or colleague to update us about a major digital preservation activity in their country. This month, former DPC Director Maggie Jones highlights some digital preservation activities in Australia.

Maggie Jones

With thanks to Cornel Platzer and Michael Carden [NAA]; Maxine Davis, Douglas Elford, David Pearson, and Colin Webb [NLA] who provided insights into their current work. Any conclusions drawn, particularly with regard to the efficiency dividend, are mine and do not necessarily reflect those of either institution.

This brief report was written from the perspective of two cultural institutions; the National Archives of Australia and the National Library of Australia. Both organisations are digital preservation pioneers and have been practising what they preach for several years. This article provides a brief overview of some of the issues they are thinking about as they move well beyond project phase and into fully integrating digital preservation within their respective organisations.

The National Archives of Australia has a dual role, articulated on its website as to:

  • promote good records management in Australian Government agencies
  • manage the valuable records of our nation, and make them accessible now and for future generations

In terms of digital preservation, the NAA sought to fulfil these roles by developing tools to efficiently manage the ingest of digital records into their archival store. One of these tools, Xena [XML electronic normalising for archives], detects the file formats of ingested digital objects and then converts digital records into open formats. The second tool, the Digital Preservation Recorder, manages the integrity and authenticity of records by recording and maintaining metadata related to preservation actions. Version 5 of Xena was recently released and at each update, NAA staff have been careful to take on board comments and feedback received from other organisations who have used Xena, as well as their own ongoing requirements. Xena is freely available, released under the GNU General Public License and NAA staff welcome feedback on it. The rationale behind Xena is that by converting [or normalising] at ingest into open formats, digital records will have greater prospects for longevity and ease of management over the long term. The latest version also incorporates an OCR feature employing Google’s Tesseract software, which enables extraction of plain text from scanned documents. This feature was designed with longer term enhanced access for researchers in mind but is likely to prove valuable in the short term for greater efficiency in access examination by NAA staff.
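
The two ideas described above, detecting a file's format at ingest and keeping metadata that lets integrity be checked later, can be sketched in a few lines. This is only an illustration under stated assumptions: it is not the code or data model of Xena or the Digital Preservation Recorder, the signature table covers just a handful of well-known formats, and the names identify_format, ingest_record and verify are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

# A few well-known file signatures ("magic numbers"); real tools know many more.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def identify_format(data):
    """Return a MIME type guessed from the leading bytes, or a generic fallback."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return "application/octet-stream"


def ingest_record(name, data):
    """Build a hypothetical ingest record: format, size, checksum and timestamp."""
    return {
        "name": name,
        "format": identify_format(data),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "event": "ingest",
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(record, data):
    """Check that the bytes still match the checksum stored at ingest."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]
```

In a workflow of this shape, the detected format would decide which converter to run, and the stored checksum would let a later audit confirm that neither the original nor the normalised copy has changed.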

The NAA is currently investing further effort in preparing for an increasing volume of digital records from Australian government agencies by undertaking projects to test the efficient ingest and management of anticipated categories of records, such as database backed business systems. NAA know they will need to be able to deal with legacy systems and their own practical experience is now enabling them to better prepare for future challenges. The need to be able to scale operations to meet demand is a major source of effort and research on NAA’s part but it appears that transformation on ingest is becoming an increasingly efficient strategy for managing and preserving digital records.

Under its Act, the National Library of Australia is required to develop and maintain a national collection of library materials and to make this material available. Translating this requirement to the digital world has been occupying the NLA for close to twenty years so they have now built up considerable experience and expertise in all things digital. PANDORA (Preserving and Accessing Networked Documentary Resources of Australia), one of the first web archives in the world, began in 1996 and several other digital projects preceded that. Improving efficiency is also on the minds of NLA digital preservation staff as their digital materials collection has just reached 0.5PB. This is gathered from a number of different sources, including manuscript material, harvested websites, negotiated websites, a significant Oral History collection, and their own extensive digitisation programmes.

As well as volume, the diversity of digital material collected is posing some significant practical challenges. Like the NAA, the NLA has needed to develop its own tools to support its programmes. Recent tools include Prometheus, a tool developed specifically for NLA acquisitions and cataloguing staff to efficiently transfer data on unstable physical carriers to mass storage. Another NLA specific tool has also been developed to provide information on carriers for collection areas to facilitate informed decisions on acquisition of content on various carriers. Media Pedia is a tool designed to provide a knowledge base of structured descriptions of a wide range of data carriers. The latter is likely to have a more generic utility and indeed collaborative community development of the tool is actively encouraged. These tools have been developed by NLA in response to a detailed risk assessment exercise, in this case focussing on the specific risks associated with content held on discrete physical carriers.

A further tool currently under development is ‘Configulator’, a means of documenting the various hardware/software permutations needed to provide access to content held in a range of file and format carriers. The intention is to encourage community sharing of such information to enable indicators of effective obsolescence. NLA is also maintaining viewpath components in the pragmatic belief that it makes no sense to discard working instances of them before alternative means of access can be assured. They stress however that this is not a commitment to a technology museum strategy, simply a further risk management approach.

Collaboration is seen to be key though this in itself is quite resource intensive and sometimes different funding regimes can inhibit effective collaboration globally. It is also often necessary to invest significant time and effort into building effective partnerships during which little tangible result may be seen short term. The One World article contributed by Inge Angevaare in April’s instalment of What’s New? has some interesting observations on levels of collaboration, based on four sectors.

In terms of web archiving, NLA’s membership of the International Internet Preservation Consortium (IIPC) has been an important source of pooling experience and expertise. The IIPC was formed in 2003 by 12 organisations, including the NLA. It currently has 37 members. A current IIPC Preservation Working Group project the NLA is participating in aims to document the tools which provide effective access to live web content. An earlier IIPC project tested available tools for their usefulness in preserving meaningful access to web content and concluded that there are no currently available tools capable of satisfactory transformation of web archives or for emulation of the full technical environment required to represent web archives in a different environment. Moving beyond capturing dynamic web content to ensuring continued access to it is a challenge facing all organisations involved in preserving this category of content so it makes sense to build on collaboration and partnership to test viable strategies.

The NLA feels they are now in a strong position to understand and identify their specific requirements. This is based in part on their increasing practical experience but also thanks to exercises such as a recent major proposal for additional funding, which provided a catalyst for refining requirements. It has become clear that moving the NLA into the next phase will require a new repository capable of streamlining inconsistencies, enabling sound preservation planning, and paving the way for a fully integrated and sustainable system.

In addition to forging a way forward, the NLA has retained a commitment to maintaining the PADI current awareness service, which originated the What’s New in Digital Preservation? updates.

Both NAA and NLA spoke of the difficulties in attracting sufficient, ongoing funding. No surprises there but digital preservation remains a tough act to sell when competing with other priorities.

An additional burden faced by Australian commonwealth government agencies is the annual application of an onerous cost saving measure known as the efficiency dividend. Originally marketed as a short term means to achieve a more streamlined public service, the combination of ingenuity and staff dedication in successfully coping with these cuts without obvious degradation to the level of service has made it an irresistible cost cutting measure for successive governments for the past two decades. This year however, cultural organisations are making a concerted effort to urge the Rudd government to revisit the efficiency dividend and are drawing attention to the negative impact of the cumulative effect of the cuts. Well argued submissions have been made but it remains to be seen whether these will be effective in a federal election year. It is to be hoped they will for cultural organisations generally and digital preservation specifically. Australia made a rapid and early response to the challenges of digital preservation, resulting in one of the world’s first web archives and innovative development of tools and software when there was almost nothing available off the shelf. Whether they can maintain such leadership will inevitably depend on the ability to attract adequate funding to build robust, scalable systems.


Credits
What's new is a joint publication of DPC and DCC compiled by Najla Rettberg. Thank you to contributors. If you would like to feature in What's New or have a contribution to the news or events, then please contact us: info (at) dpconline.org
