19 February 2015 | 10:00 - 16:30 / 10:30 - 13:30 Portsmouth | Eldon Building, University of Portsmouth


The E-ARK Project, the Digital Preservation Coalition and the UK Data Archive are delighted to invite you to a workshop on the preservation of relational databases.


Introduction

Relational database management systems are one of the essential building blocks of information technology. Ubiquitous but often obscured behind layers of scripting, processing, forms and queries, they are arguably the most important invention of the Twentieth Century. It’s hard to think of a software application or service that does not have some fundamental dependency on database technologies.

So it’s surprising that the digital preservation community seems to have spent so little time explicitly considering their preservation. More accurately, it’s surprising how little awareness and integration there is between the digital preservation community and the various tools and approaches, known as ‘data warehousing’, that are used in commercial settings to manage the long-term accessibility of records in database systems. There’s no question that databases present a complex challenge to preservation. They can be difficult to document and difficult to understand even when they are documented. The complex interdependencies of data, query and scripting make migration problematic and highly specialised. Database migration is often seen as a purely technical operation, replacing one legacy system with another soon-to-be-legacy system.

Relational databases and, to some extent, data warehousing approaches, which favour structure and homogeneity, are sometimes contrasted with ‘big data’ approaches that tend to favour heterogeneity and de-normalisation. It could be suggested that a concern with relational databases is outmoded and that the preservation community could simply adopt big data approaches. But the contrast can be overstated, especially when preservation issues are discussed. In practice ‘big data’ tools seem to offer improved workflows that complement rather than replace existing data warehouse tools. And even if ‘big data’ tools are the solution for access, they still need to integrate with fundamental preservation processes and standards.

Better technical guidance and organisational know-how are needed if the digital preservation community is to offer confident and consistent solutions to long-term access for relational databases.

This two-part workshop, made possible by the E-ARK project and sponsored by the DPC, will review the state of the art in the preservation of databases and explore emerging themes around the preservation of ‘big data’.

The workshop will be split over two days:

Day one will:

  • Start out by clarifying where databases, data warehouses and big data complement / overlap each other
  • Review the state of the art in the preservation of databases
  • Present case studies of current tools and practices around the preservation of relational databases
  • Introduce commercial approaches to ‘data warehousing’ and explore the relationship with preservation
  • Introduce big data approaches for database preservation

Day two will:

  • Review the state of the art in the use of ‘big data’ and its implications for preservation
  • Examine and debate the use cases for archived databases / big data
  • Identify recommendations for further research and guidance in the preservation of ‘big data’

Interested parties are welcome to attend either or both days.

Who Should Come?

This workshop will interest:

  • Collections managers, librarians, curators, archivists in memory institutions
  • CIOs and CTOs in organisations with commercial intellectual property
  • Records managers and business analysts with requirements for long-lived data or legacy systems
  • Vendors and developers with digital preservation solutions
  • Researchers with interests in e-infrastructure and digital preservation

Draft Programme

Day One – Thursday 19th Feb 2015

1000 – Registration opens, tea and coffee

1030 – Welcome and Introduction

1035 – Why preserving databases matters and why it is harder than it sounds (Matthew Woollard, UK Data Archive (UKDA))

1055 – The E-ARK project and database preservation (Kuldar Aas, National Archives of Estonia)

1135 – What do we mean by big data, data warehousing and OAIS? (Janet Delve, University of Portsmouth; Karin Bredenberg, National Archives of Sweden)

1205 – Q&A

1215 – Lunch

State of the art: Case studies in Preserving Databases

1315 – Case study one: Anders Bo Nielsen, Danish National Archives

1335 – Case study two: Andreas Voss, Swiss National Archives

1355 – Case study three: Hélder de Jesus Almeida da Silva, KEEP Solutions

1415 – Case study four: Tarvo Kärberg, National Archives of Estonia

1435 – Q&A

1445 – Tea and coffee

State of the art: Case studies in Preserving Databases contd.

1510 – Case study five: Jože Škofljanec, Boris Domajnko, Slovenian National Archives

1530 – Introducing big data solutions: E-ARK big data techniques at AIT (Rainer Schmidt, Austrian Institute of Technology)

1550 – Round table

By 1630 close  

Day Two – Friday 20th Feb 2015

0930 – Registration opens, tea and coffee

1000 – Welcome back and synopsis of day one

1020 – Preserving databases: practical lessons from Archaeology (Jo Gilham, ADS)

1040 – Big data and relational data: the same but different (Nathan Cunningham, UKDA)

1110 – De-normalising data for archival preservation (Jan Rörden, University of Cologne)

1140 – Data mining for accessing archived databases (Richard Healey, University of Portsmouth)

1210 – Q&A

1220 – Panel session: Preserving big data and relational databases: what is to be done?

1315 – Next steps and future directions

By 1330 close

About E-ARK

E-ARK is a 3-year multinational research project co-funded by the European Commission under its ICT Policy Support Programme (PSP) within its Competitiveness and Innovation Framework Programme (CIP). It is creating and piloting a pan-European methodology for electronic document archiving, synthesising existing national and international best practices that will keep records and databases authentic and usable over time. Our objective is to provide a single, scalable, robust approach capable of meeting the needs of diverse organisations, public and private, large and small, and able to support complex data types. E-ARK will demonstrate the potential benefits for public administrations, public agencies, public services, citizens and business by providing simple, efficient access to the workflows for the three main activities of an archive - acquiring, preserving and enabling re-use of information. E-ARK will run from 1st February 2014 to 31st January 2017. For more see: http://www.eark-project.com/

