By now you’ll have heard of The Cloud: the big, amorphous space out there that is the answer to anything digital. You want more storage? You need the cloud. You want a back-up copy of all your treasured photos? You need the cloud. You want to undertake large-scale, high-performance number crunching? You guessed it… you need the cloud. So it’s no surprise that the cloud is featuring more and more in the cultural heritage sector too. Tate Gallery, the Parliamentary Archives and the Bodleian Library have all dipped their toes, or their heads, into cloud technology. The National Library of Scotland has also been thinking about the role of the cloud, essentially a service that stores and manages digital information, as part of its continuing mission to preserve the nation’s digital culture. Is the cloud the answer to all our digital problems? And if it is, surely there’s a price tag attached. To find out, the National Library of Scotland is about to embark on a journey of discovery with the Edinburgh Parallel Computing Centre, the National Galleries of Scotland and the Digital Preservation Coalition. It doesn’t matter if you haven’t heard of these organisations; just be assured that we are all interested in preserving digital culture for current and future generations. Our journey starts at a project called EUDAT…
EUDAT? What is that?
In 2012 a group of research centres across Europe came together under the name EUDAT to make scientific and humanities research data easier, cheaper and more collaborative to share and make accessible to European citizens. Why should each of them have to invent its own best way to store, find, use, share and preserve this publicly funded information for future use? By 2015 EUDAT had designed a series of data tools and services that it could offer to newcomers to test, but would they be relevant in the wider world? To find out, it invited other organisations to suggest ways to test its work. The National Library of Scotland and the Edinburgh Parallel Computing Centre, with the support of the National Galleries of Scotland and the Digital Preservation Coalition, put in a bid for a project called Cloudy Culture.
Howdy! Cloudy Culture
As you’ve probably guessed, the winning idea from the Cloudy Culture team was to explore the potential of cloud services, essentially what EUDAT was offering, to help preserve digital cultural heritage. What you may not guess is that the National Library of Scotland has been collecting and creating a growing amount of digital culture over the last 20 years, culture that it has to preserve and make accessible. This includes millions of maps, images, websites, books and articles, not to mention a growing number of digital videos and archives from authors who may have written your favourite book in Word, or in its predecessors back in the 1970s and 80s. The same applies to the National Galleries of Scotland, which, as well as creating high-quality digital images of the paintings you see on the walls, also has a growing amount of digital art that has never been near a paintbrush.
The Cloudy Culture project will allow us to add lots of data (between 50 and 100 terabytes, roughly the same as all of the information you can fit on 25 laptops) to the EUDAT cloud as an additional back-up copy. If you’ve ever lost a valuable computer document, a treasured digital photo or your phone’s contact list, you will appreciate the value of having a second copy. We won’t be posting EUDAT a pile of USB sticks; instead we will transfer the data automatically over the internet, using a connection that encrypts the data in transit and checks that it arrives at EUDAT without being changed or corrupted. Whether the internet connection is fast enough and robust enough to handle all of the files we send is one part of our investigation.
Once the data is in the EUDAT cloud, we want to take advantage of the massive computing power also on offer to carry out, at great speed, some typical tasks that help us preserve it. Imagine that you are throwing a party and need to bake 200 fairy cakes in your very small oven, which can only take 20 fairy cakes at a time. You will be hovering around the kitchen while each of the 10 batches bakes. Yawn! If you had 10 ovens in your kitchen, you could bake all of those cakes in one go and still have time to relax before the party starts. Using lots of parallel computing power in the cloud is like using as many ovens as you want to get jobs done quicker. We’ll be using it to check automatically that the files we put into the cloud haven’t changed over time due to human error, vandalism or the wear and tear of computer storage. We’ll also be trying to find out more about the files once they’re in the cloud: how wide and tall the pictures or videos are, and what digital format they are in. If we can automate these tasks, then we could use the cloud for even more tasks in the future.
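To give a flavour of the “many ovens” idea, here is a minimal Python sketch that inspects several files at once using a pool of worker threads. For each file it guesses the format from the first few bytes and, for PNG images only, reads the width and height from the file header. The signature table, the function names (`identify`, `describe`, `describe_all`) and the worker count are illustrative assumptions. This is not the tooling the project runs on EUDAT, and dedicated format-identification tools recognise far more formats than this toy does.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

# A few well-known "magic number" signatures that appear at the start of files
# (illustrative subset only).
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"%PDF": "PDF",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def identify(head):
    """Guess the format from a file's first bytes; for PNG, also read width and height."""
    fmt = next((name for sig, name in MAGIC.items() if head.startswith(sig)), "unknown")
    width = height = None
    # PNG layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then width and height as big-endian 32-bit unsigned integers.
    if fmt == "PNG" and len(head) >= 24 and head[12:16] == b"IHDR":
        width, height = struct.unpack(">II", head[16:24])
    return fmt, width, height


def describe(path):
    """Read just the start of one file and return (path, format, width, height)."""
    with open(path, "rb") as f:
        head = f.read(32)
    return (str(path),) + identify(head)


def describe_all(paths, workers=8):
    """Inspect many files concurrently: each worker thread handles one file at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(describe, paths))
```

For example, `describe_all(["map_001.png", "report.pdf"])` (hypothetical file names) returns one `(path, format, width, height)` tuple per file, in the same order as the input. Threads suit this job because it mostly waits on disk reads; on a cluster the same pattern would spread the files across many machines instead.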
Finally, we’ll be doing plenty of measuring. How long does it take to transfer 50 terabytes of data over the internet? How much computing power do we use, and how much would this cost with commercial cloud service providers? Is the cloud cheaper than traditional approaches to storage?
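The first of those questions starts with simple arithmetic. Below is a minimal Python sketch. It assumes a hypothetical 1 Gbit/s link, which is not the project’s actual connection, and a made-up efficiency factor standing in for overheads such as encryption, retries and competing traffic. Both numbers are placeholders for the real measurements.

```python
def transfer_time_seconds(n_bytes, link_bits_per_second, efficiency=0.7):
    """Estimate wall-clock seconds to send n_bytes over a network link.

    efficiency is the fraction of the nominal link speed actually achieved;
    the default of 0.7 is an assumption, not a measurement.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # 8 bits per byte, divided by the effective bits per second.
    return n_bytes * 8 / (link_bits_per_second * efficiency)


def describe_duration(seconds):
    """Format a duration in seconds as whole days plus hours to one decimal place."""
    days, rem = divmod(seconds, 86400)
    return f"{int(days)} days {rem / 3600:.1f} hours"
```

With a perfect 1 Gbit/s link, 50 terabytes (taken as 50 × 10^12 bytes) gives `transfer_time_seconds(50e12, 1e9, efficiency=1.0)`, which is 400,000 seconds or about 4 days 15 hours. At the assumed 70% efficiency this stretches to roughly 6.6 days.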
The start of the start and the end of the end
We are now at the very start of the project, just about to transfer digital collections from the National Library of Scotland to the EUDAT cloud, and the project will finish at the end of June 2017. By then we hope to have shared four updates on the project, of which this is the first, with anyone who would like to read them. We hope that others can learn from the experiences of the Cloudy Culture team and better understand the potential of the cloud to help preserve digital cultural heritage for current and future generations.
A sneaky peek for the closet geek
This is the first of four reports that the Cloudy Culture team will be writing, and we’re keen not to scare people away with tech speak. For the brave-hearted, read on…
The National Library of Scotland works in a Windows environment, but to connect to the EUDAT services and tools we have to use Linux. To do this we have installed a virtual machine running Ubuntu 14 on a Windows PC. We have set up a folder that both the Windows PC and the virtual Ubuntu machine can see, so that we can move files (the digital culture) from our Windows environment into Ubuntu and use some of the tools created or enhanced by EUDAT. The first of these tools is iCommands, which interacts with a EUDAT service called B2SAFE; B2SAFE essentially uses iRODS to control how the files are managed. iCommands is part of iRODS, so the two applications speak to each other: the iCommands at the National Library of Scotland speak to the iRODS server at the Edinburgh Parallel Computing Centre (EPCC), which is providing the EUDAT cloud. Using iCommands and iRODS, files and reports are transferred between the Library and EPCC with an option that encrypts the files as they are transferred. We chose encryption to help demonstrate that sensitive data can be protected during transfer.
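As an illustration of driving iCommands from a script, here is a minimal Python sketch. It builds an `iput` command that uploads a folder recursively (`-r`) and asks iRODS to calculate and verify checksums on both the client and the server (`-K`). The function names, folder path and collection path are hypothetical. The script assumes the iCommands are installed and `iinit` has already been run, and it does not show how the encryption option described above was switched on.

```python
import shlex
import subprocess


def iput_command(local_dir, irods_collection):
    """Build an iput command: -r uploads the folder recursively,
    -K calculates and verifies checksums client-side and server-side."""
    return ["iput", "-K", "-r", local_dir, irods_collection]


def upload(local_dir, irods_collection, dry_run=True):
    """Print the command (dry run) or execute it, raising an error if iput fails."""
    cmd = iput_command(local_dir, irods_collection)
    if dry_run:
        print(shlex.join(cmd))  # show what would be run, without running it
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths: the folder shared with the Ubuntu VM and a collection on the iRODS server.
    upload("/media/shared/batch_001", "/exampleZone/home/nls/batch_001")
```

Keeping the command in one function makes it easy to log exactly what was sent for each batch, which helps when timing transfers later.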
To check that the files don’t change during transfer or over time, we use a technique widely known in the digital preservation community: fixity checking. This relies on a computer-generated fingerprint, a checksum value, that can almost uniquely identify the contents of a file and represents them as a short(ish) string of letters and numbers. By taking fresh fingerprints (fresh checksum values) for the files in the cloud, you can check whether they match the original ones held in another location. If they don’t match, the file has changed and you will need to replace it with a good copy.
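A minimal fixity check in Python might look like the sketch below. It assumes SHA-256 as the checksum algorithm and a simple manifest that maps each file’s relative path to the checksum recorded when the file was first stored. The post does not say which algorithm or manifest format the project uses, so both are assumptions, as are the function names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum (hex) of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fixity_check(manifest, root):
    """Compare fresh checksums against a manifest {relative_path: original_checksum}.

    Returns the relative paths that are missing or whose checksum no longer matches.
    """
    failures = []
    for rel_path, expected in manifest.items():
        path = Path(root) / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(rel_path)
    return failures
```

Any path returned by `fixity_check` would then be replaced from a known good copy held elsewhere.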
By Lee Hibberd, Digital Preservation Officer, National Library of Scotland - April 2016