Matthew Addis

Last updated on 3 October 2023

Matthew Addis is the Chief Technology Officer at Arkivum.

It was great to be at iPRES 2023 in person again this year.  I was privileged to be invited onto a panel called ‘Tipping Point’ that was run by Paul Stokes and Karen Colbron from Jisc.  The panel questioned the premise that there is so much data being generated each year that we are at the point where we no longer have the ability to process it in any meaningful way, let alone curate and preserve it.  Panellists included Helen Hockx-Yu, Kate Murray, Nancy McGovern, Stephen Abrams, Tim Gollins and William Kilbride.   As you can imagine, the discussion was varied, insightful and thought provoking!  It was perhaps my favourite session at iPRES this year (other than the ever inspiring keynotes). 

For my very small part on the panel, I raised the issue of environmental sustainability and climate change, as did some of the other panellists. 

As an aside, environmental sustainability was a recurring theme of iPRES 2023 and built upon a similar thread that ran through last year’s conference.   A shout out goes to a great paper by Mikko Tiainen and colleagues from CSC on Calculating the Carbon Footprint of Digital Preservation – A Case Study and likewise a great panel from a team at the University of Illinois on The Curricular Asset Warehouse At The University Of Illinois: A Digital Archive’s Sustainability Case Study.

On the Tipping Point panel, I raised the interplay between doing digital preservation at ever larger scales and the environmental impact that this can have.  On the one hand, climate change is a threat to digital preservation.  There will be inevitable disruption and content loss from climate-change induced fires, floods, energy shortages, civil unrest, diversion of budgets away from preservation, and about every other disaster scenario you can imagine!  It’s not an if, it’s a when.   On the other hand, digital preservation is also a contributor to that climate change.  Conventional digital preservation uses compute and storage servers.  These servers consume energy, they have an embodied carbon footprint, and they need buildings and infrastructure to house them, which also have a climate cost.  I’ve blogged about this extensively in the past.

William Kilbride made the very important point that the digital preservation community has much lobbying work to do to raise the issue of the impact that climate change will have on preservation of the world’s digital memory.  Climate change is a major threat.  It will take time, effort and money to mitigate this – and this has to come from somewhere.  However, it also struck me that it could be harder to lobby effectively if the preservation community doesn’t know the flip-side of the coin of what its own carbon footprint is and how that footprint can be managed and reduced. 

No one, at least not to my knowledge, has ever quantified the possible carbon footprint of doing digital preservation at a global scale for massive volumes of data. 

What would happen if we tried to preserve even a small fraction of the 181 ZettaBytes (ZB) [1] of digital data that, as Paul Stokes quoted from IDC, will be created in 2025? 

I stuck my head above the parapet on the panel.  I stated that, by my estimates, if we were to preserve just 1% of the data created by humanity each year then the carbon footprint of doing so would be 10 MegaTonnes (MT) of CO2 equivalent.  That’s about the same footprint as produced each year by a city of 2 million people, which coincidentally is about the size of Chicago, which is close to where iPRES was hosted in Urbana-Champaign. 

On the panel I promised that I would explain where those numbers came from – and that’s what this blog post is about.  

I’m going to take two approaches, one top down that looks at global data volumes and ICT carbon emissions, and the other bottom up, which looks at actual measurements of carbon footprint when doing real world digital preservation at scale. 

Top down.  A round number for the total volume of global data created each year is 100 Zettabytes.  Global carbon emissions are approx. 40 GigaTonnes (GT) per year.  ICT contributes more to this total than the airline industry and, depending on which estimates you read, is between 2% and 4% of the total.   Much of this comes from video content, which I guess is no surprise considering the carbon footprint of streaming services.  This means that ICT produces approx. 1 GT CO2eq per year to support humanity’s production, processing, distribution and consumption of 100 ZB of data.  If we make a big leap that processing, storing, accessing and using data when doing digital preservation results in the same level of emissions as processing data in general, then preserving 1% of that 100 ZB each year would produce 10 MT of CO2eq per year.   There’s plenty of assumptions and rounded figures in this approach, but it gets us to a ballpark number.
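The top-down arithmetic can be sketched in a few lines of Python.  All the figures are the rounded values above, and the ICT share is an assumed midpoint of the 2–4% range.

```python
# Top-down estimate: scale global ICT emissions by the fraction of
# the world's data we'd preserve.  All figures are rounded values.
GLOBAL_EMISSIONS_GT = 40       # GT CO2eq emitted globally per year
ICT_SHARE = 0.025              # ICT share of emissions (2-4%; midpoint assumed)
PRESERVED_FRACTION = 0.01      # preserve 1% of the 100 ZB created each year

ict_emissions_gt = GLOBAL_EMISSIONS_GT * ICT_SHARE   # approx. 1 GT CO2eq/year

# Big assumption: preserving data emits at the same per-byte rate as
# ICT's handling of data in general.
preservation_mt = ict_emissions_gt * PRESERVED_FRACTION * 1000  # GT -> MT

print(f"ICT emissions: {ict_emissions_gt} GT CO2eq/year")
print(f"Preserving 1% of global data: {preservation_mt} MT CO2eq/year")
```

Nudging the ICT share between 2% and 4% moves the answer between 8 and 16 MT, which is well within the error bars discussed below.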

Bottom up.  I’ve blogged before on how Arkivum has estimated the actual carbon footprint of doing long-term digital preservation in the cloud, both resulting from energy consumption and also from the elephant in the room (or should that be data centre) that is the embodied footprint of all the ICT equipment involved.  If we want to preserve data at cloud scale, then the cloud is the only infrastructure that currently has the capacity to do it.  Carbon emissions vary depending on data type, data volume, processing applied, how the data is stored, and how often it is accessed and used.  Using the numbers from my previous blog posts, an average of 10 kg CO2eq per TB per year to ingest, store and access data that has a mix of data types and preservation use cases is in the right ballpark.  That also comes out as 10 MT CO2eq per year for 1 ZB of data. 
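The bottom-up route is an even shorter calculation: multiply the per-TB figure by the number of TB in a ZB and convert to MegaTonnes.

```python
# Bottom-up estimate: a blended per-TB footprint for ingest, storage and
# access (rounded figure from measured cloud preservation), scaled to 1 ZB.
KG_CO2EQ_PER_TB_YEAR = 10      # blended ingest/store/access figure
TB_PER_ZB = 1e9                # 1 ZB = one billion TB
KG_PER_MT = 1e9                # 1 MegaTonne = 1e6 tonnes = 1e9 kg

kg_per_year = KG_CO2EQ_PER_TB_YEAR * TB_PER_ZB
mt_per_year = kg_per_year / KG_PER_MT

print(f"Preserving 1 ZB: {mt_per_year} MT CO2eq/year")
```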

Remarkably, the top-down and bottom-up approaches come out with the same number!  I was somewhat surprised that they were even close, to be honest.  That doesn’t mean it’s an accurate number – it should be treated as a finger in the air.  The number has huge error bars.  It could easily be an order of magnitude larger or smaller.  The carbon footprint will also grow over time – not just because of all the new data preserved each year, but also because of the carbon footprint of ongoing storage and access of previously ingested data.   There are technological ways that could be used in the longer term to reduce the footprint of long-term digital preservation, with DNA storage being one example.  But these simply don’t scale yet. 

It’s hard to visualise 10 MT of CO2eq carbon emissions. 

For comparison, the average carbon footprint per person on the planet is approx. 5 tonnes CO2eq per year.  That means 10 MT CO2eq each year for preserving 1 ZB of data is equivalent to the carbon emissions of approx. 2M people.  The population of Chicago is 2.7M, which is why I said that global digital preservation of 1% of the world’s data would have the same carbon footprint as a large city.   But we could look at it a different way.  Driving a car produces 280g CO2eq per mile on average.  The world population is 8 billion.  Therefore, the footprint of preserving 1% of the world’s data is the same as everyone on the planet each driving 4.5 miles in a car.  That doesn’t seem too bad? Maybe, but everything that we can do to reduce carbon emissions, we should do.  It is important.  It does matter.   This is also an illustration of the fun and games you can have with numbers and how to bend them to deliver different messages!
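For the sceptical reader, here is how those two comparisons fall out of the same 10 MT figure, using the rounded per-person and per-mile numbers above.

```python
# Putting 10 MT CO2eq/year into human terms, two ways.
FOOTPRINT_MT = 10
TONNES_PER_PERSON = 5          # average global footprint, tonnes CO2eq/year
G_PER_CAR_MILE = 280           # average car emissions, g CO2eq per mile
WORLD_POPULATION = 8e9

# 1 MT = 1e6 tonnes, so 10 MT spread over 5 t/person -> people equivalent
people_equivalent = FOOTPRINT_MT * 1e6 / TONNES_PER_PERSON

# 1 MT = 1e12 g, so 10 MT split across everyone on Earth -> miles each
miles_each = FOOTPRINT_MT * 1e12 / (WORLD_POPULATION * G_PER_CAR_MILE)

print(f"Equivalent to the annual footprint of {people_equivalent/1e6:.0f}M people")
print(f"Or everyone on the planet driving {miles_each:.1f} miles")
```

Same number, two very different framings: a whole city's emissions sounds alarming, while four and a half miles of driving sounds trivial.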

It should be said that the 1% number I used as the percentage of data to preserve is entirely arbitrary.  I have no basis for this number other than it simultaneously sounds like a small fraction yet is also an extraordinarily large amount of data.  My gut feeling is that the number could, and probably should, be smaller.  But that would require some form of analysis of what data is being produced, what should be kept, why, who for, and whether the ends justify the means.  This in turn brings in return on investment and business cases that cover cultural, social, economic and operational dimensions.  This blog post is not the place to go into what is a huge area.  So for now I’ll stick with 1%.  If you think it should be more, or less, then just scale the numbers up or down.

What can we do to make our carbon footprint smaller?

The panel discussed various ways to tackle the ever-growing amount of data that humanity generates each year.   The discussion was centred on how to cope with the perennial problem of not having the staff, skills, funding or technology to cope with ever larger volumes of data.  It has always been this way, and probably always will.  But the things we can do to address the deluge of data all have the potential to reduce the carbon footprint too.   Some of the approaches discussed on the panel included:

  • We should keep less, which means developing better ways to do selection so that we better understand what we really need to preserve and what we don’t.

  • We should accept that not everything will be preserved even if we wanted to. The panel referred to this in various ways, such as accepting ‘benign neglect’ and not getting trapped in a ‘moral panic’ of feeling that we are not doing enough.

  • We need to do less with content, which means being parsimonious, as Tim Gollins has been saying for over a decade now.

  • We need to change expectations on access. We do not need to access everything, everywhere, all at once - and certainly not instantly!

  • We need to increase efficiencies, for example shared platforms for community archives, and reducing unnecessary duplication across institutions.

None of these are new.  However, the panel was a timely reminder that continued work is needed in these areas and that environmental sustainability adds further impetus.   As the panel and members of the audience pointed out, environmental impact is more than just carbon footprint – it includes water consumption, noise pollution, displacement of people and other adverse effects.  These need quantifying and addressing too and make the above strategies doubly important.

The threat of climate change to digital preservation

But out of all of this, and shining through all the talk about tipping points, data deluges, and carbon footprints, is that climate change is surely an existential threat to our current digital memory.  This was by far and away the most significant thing for me that came out of the panel.

Reducing the footprint of digital preservation is important for sure, but the real challenge is how digital content in our archives will survive the consequences of climate change.   It is less about the effect of digital preservation on climate change, and much more about the effect of climate change on digital preservation.

Climate adaptation and resiliency for our digital memory is surely the priority and it would be great to see that as a topic for iPRES 2024.

[1] A ZettaByte (ZB) is one billion TeraBytes (TB), otherwise known as ‘a lot of data’.
