Blog

Unless otherwise stated, content is shared under CC-BY-NC Licence

Nothing About Us Without Us

William Kilbride

Last updated on 20 January 2020

I was asked recently to sketch out some thoughts about archives and artificial intelligence. I am drawn to the topic as usual but with little real clue of where to start, so my point of departure is a comment on ethics. I have no real mandate to frame the ethical tone for what should be a very important debate, but if we don’t start here – if we put technology first – then there’s every possibility that we will end in the wrong place, either through sterile solutionizing, or worse by selling the whole farm to obscure, unaccountable and deeply unattractive corporate interests.

So, my challenge is this: can the methods and approaches of digital preservation help build an artificial intelligence that works for the people, with the people and perhaps even by the people?

It seems like an awfully remote prospect. The large technology companies have mostly escaped the social contract of an earlier generation of capitalists, with incredibly large client bases and incredibly large net worth based on remarkably small workforces. There are good reasons why the rest of us might feel estranged and apprehensive about the ways artificial intelligence could (and already does) constrain, shape and control our daily lives. Nor, for that matter, have the tech giants covered themselves in glory. Instead they have engineered dramatic and secretive incursions that all too often disregard the security and well-being of many thousands – at times many millions – of their users. Artificial intelligence seems to be applied in business and corporate systems to us, and about us, but seldom with us. That leads me to adopt the old anti-colonial slogan ‘Nothing about us without us’. It’s an old adage, but it’s hard to think of a more appropriate counterproposition to how AI is perceived.

So far, so ranty. This is a digital preservation blog, so a digital preservation theme has to emerge sooner or later.  It seems to me that there are four ways in which we intersect with artificial intelligence:

  • We can use artificial intelligence to do digital preservation better. That’s coming into focus slowly but noticeably.
  • We can become better at capturing, receiving and arranging the inputs and outcomes of artificial intelligence, knowing and documenting the variables in play and the dependencies that exist between them.
  • We can take steps to preserve artificial intelligence at a systemic level. That’s something we’ve barely even begun to think about. We’re used to worrying about data and our usual complaint is that as the amount of data grows, so the job gets larger. We’ve largely forgotten that the quantities of data mean that ranking algorithms and personalization of view are more important to the shaping of public discourse than the content that they purvey (DPC 2019, 86).
  • And we can disrupt or take over artificial intelligence - creating, monitoring and maintaining AI and developing the kinds of services on which it depends.

All of these are possible but it’s the last one that has my attention. Instead of fearing change, can we be the change?

I am going to make some lazy assumptions about artificial intelligence. For example, I am going to focus on an important developmental subset of AI which is increasingly ubiquitous: big data analytics (for a readable introduction see Mayer-Schönberger and Cukier 2013). This, for better or worse, is how most of us have used, and have been used by, artificial intelligence. What does our experience of big data analytics indicate about the operation of artificial intelligence and the role of digital preservation there?

There’s a clue in the name: big data analytics depends on the existence and maintenance of big data. How should we conceptualize this big data? How does it arise? It comes from the digitization of just about every aspect of life. This is one of the critical developments of the last few years and it’s easy to become desensitized to the repeated and ever-more-frenzied statements about data growth. Let me be clear that I am not talking about the 2D digitization that replicates paper records as digital surrogates. The creation of digital surrogates is slow and expensive, and my point is to underline how little digitized content there is compared to how much born-digital content exists and is created every day.

You can’t describe this digitization without acknowledging the incursion of technology into our daily lives and how that gets turned into data points of one form or another. This data is most often about us, but mostly it’s not for us: it mostly benefits those with corporate interests (Zuboff 2019). I am thinking about internet-enabled beds which will report how well you are sleeping; or vacuum cleaners that report the cleanliness of your house. I am thinking about phones that know how far you walked today (every day); and how that compares to the calorie count from your Uber-eats order; and which might be selling that data to your insurance broker so he or she can sell you a pension you won’t live long enough to earn. Privacy is the only luxury that most of us can no longer afford (Lovink 2019). It’s almost impossible to avoid being someone else’s behavioural surplus.

Underpinning this digitization is a process of recombinant innovation. The deep story of the digital age is not really about innovation: it’s about infinite recombinations of existing innovations with infinitely expanding data points: one small innovation is combined with another to create a third innovation, and so on ad infinitum (Brynjolfsson and McAfee 2014). The opportunities are vast and depend on being able to crunch ever greater quantities of data. The more data, the more combinations are possible. That’s possible because, alongside the growth of data and if anything even more astonishing, there is the seemingly limitless acceleration of processors.

So, big data analytics tells us that more data and faster processors are the building blocks for AI; that we cannot really escape being bought and sold in the marketplace of surveillance capitalism; and that because innovation is fundamentally combinative there is no practical limit to how it is likely to expand.

This is where I start to part company with the promise of big data analytics and by extension AI.  I spy four problems.

Firstly, if your digital preservation klaxon hasn’t gone off already then you need to change its batteries. Data frequently goes dark faster than it can be used. We can barely get PDFs to work through two generations. We know that, but if someone could tell the Financial Times or the Wall Street Journal then I’d be very grateful. How then can we generate meaningful time series data at anything resembling the real world where change is slow and incremental? If AI is built on data, then it’s built on sand.

Secondly, and relatedly, David Rosenthal has been telling us for many years now to beware the rising costs of data storage (Rosenthal 2012). We have grown up with an inter-generational price crash of data storage and may now be blind to the possibility that it will not be the historic norm. A point will come, in fact it has arrived already, where the price crash will end. If storage costs level out, then our consumption habits will have to change rapidly too. Otherwise the blunt trauma of economics will change them for us. One can already see chaotic processes of data loss in which the haphazard exigencies of business failure decide what digital legacies our children can enjoy.

Also, remember that miniaturization is finite. All the computing that’s ever impressed us is about electrons passing through transistors. By all means you can increase computing power by cramming ever larger numbers of ever smaller transistors onto silicon wafers. But there’s an end point to this and it’s already in sight. The transistor was invented in 1947; and just because we’ve only ever known computing power to increase ever since, it doesn’t follow that this will be the historical norm. You cannot change the laws of physics: you cannot miniaturize the electron.

I promised I would relate this to archives. Here we go.

The proponents of big data analytics argue for more-better data and more-better computing: keep everything because everything has potential, everything will be useful, storage is cheap, and superabundance is an opportunity. Don’t bother cataloguing because we’ll get around to that with super-duper new technologies. Metadata shmetadata. Archivists demur: less is more, appraisal empowers use, storage is expensive, overabundance is a risk. Cataloguing is critical. Metadata matters. It seems to me that, for the reasons given a moment ago, archivists have history on their side. Don’t even get me started on global warming.

That would be a nice conclusion when speaking to a digital preservation audience. But it also leads me to direct two further challenges which may be altogether more uncomfortable.

Firstly, for digital preservation: where are the journal articles and conference papers, not to mention the techniques and tools which enable archival appraisal of digital materials at scale? Maybe I missed them. Reading the digital preservation literature, it’s hard to know what we can afford to keep and what we cannot afford to lose.

If my hunches about technology are right, then we need to get a move on. Currently preservation seems to be a binary state: either something is preserved, or it is not; a repository is trusted, or it is not. There is an urgent need for the digital preservation community to face up to the suffocating proliferation of digital materials and respond with much more subtlety. That means better capability to identify the digital objects that matter, and more nuanced intentions for everything else. If we can identify and prioritize high value collections, we can set equally high expectations about how much effort it’s going to take to preserve them. By extension that means we are also identifying those things which are not needed and would do well to delete. If we do not dispose of something, we will not be able to preserve anything. And somewhere between these two extremes, perhaps we can begin to envisage a middle ground, where less intervention and best endeavours enable planned, perhaps even graceful decay. ‘Just in case’ is not a bad argument; but we can’t let it consume all our resources.

Secondly, and this is where the heat-ray falls especially on archivists: appraisal in the digital age isn’t easy but it’s vital. In an era of post-truth obfuscation and sinister deletion, the ability to collect, retain and authenticate is suddenly a super-power. In an era of relentless proliferation, the confidence to select and consolidate, with implied permission to relegate and de-duplicate, is ubiquitously essential. In an era where data is the ‘new oil’ of the ‘information society’, we hold the keys not only to the past, but now also to the future.

One would have thought that this generation more than any other would be the age of the archivist, a continuing proof of common cause for the common good. I know it doesn’t feel that way. I wish it did, but if archival appraisal practice were more cleverly embedded in digital preservation tools; if together we can codify and express our expectations and assumptions about archival significance; if in our generation we can remake the tools and techniques that have served the cause of truth and authenticity for centuries, then perhaps there’s a chance.

On one hand, if you want to know how to control and influence artificial intelligence, then you could do a lot worse than use the tools of artificial intelligence. That will both secure and transform the archival profession. Let’s use the weight of the tree to ensure it falls in our favour.

And there’s no appraisal without values (Caswell 2019). We might pretend that our role is the humble and neutral functionary of an objective record, but I don’t buy that. If we get this right, then we’ll have an artificial intelligence that embeds the value judgements and the ethics that we bring to the table through our appraisal processes.

So, let’s get appraisal and selection to the top of our shared research agenda in digital preservation. In our sights, and in our hands: artificial intelligence by the people, for the people, with the people.

Nothing about us without us.


References

Brynjolfsson E and McAfee A 2014 The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies, Norton, New York

Caswell M 2019 Whose Digital Preservation? Locating our standpoints to reallocate resources, in iPRES2019, 16th International Conference on Digital Preservation, M. Ras and B. Sierman, Eds., Amsterdam, 2019. [Online]. Available: https://vimeo.com/362491244/b934a7afad

DPC 2019, The BitList 2019: The Global List of Digitally Endangered Species, Second Edition, online at: http://doi.org/10.7207/DPCBitList19-01

Lovink G 2019 Sad By Design: On Platform Nihilism, Pluto Press, London

Mayer-Schönberger V and Cukier K 2013 Big Data: A Revolution That Will Transform How We Live, Work and Think, John Murray, London

Rosenthal D 2012 Storage Will Be A Lot Less Free Than It Used To Be, in DSHR Blog, online at: https://blog.dshr.org/2012/10/storage-will-be-lot-less-free-than-it.html

Zuboff S 2019 The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power, Profile, London

 

Acknowledgements


I am grateful to Lise Jaillant who provoked me to consider the issues of archives and artificial intelligence and to Sarah Middleton, Jen Mitcham, Sharon McMeekin and Paul Wheatley who commented on an earlier draft of this paper.

 

Making a list, checking it twice…Migrating a digital national archive to a new storage infrastructure

Garth Stewart

Last updated on 10 January 2020

Garth Stewart is Head of Digital Records Unit at National Records of Scotland


Anyone who has ever moved home can probably agree that it is at once a very exciting, yet stressful experience. Fitting your personal belongings into cardboard boxes can be a real mission; delivery vans can sometimes turn up at the wrong address, or not at all; and once you do manage to transport everything across town and country to your new gaff and unpack everything, inevitably something goes missing in transit. In short, moving big collections of stuff significantly increases the risk of loss.

How are we meant to do it?

Helen Shalders

Last updated on 19 December 2019

Helen Shalders is Digital Archives and Cataloguing Manager at Historic England


The Historic England digital archive, which forms part of the Historic England Archive (HEA), holds 60TB of data, predominantly images in TIFF format but also PDF/A, shape, WAV and MP3 files and some more obscure specialist formats. We ingest around 100,000 files per year, which is around 5TB. What we hold represents a national data set, and the content has usage potential well beyond the heritage sector. We have recently moved our Archive to the cloud, with mixed results, and we use Extensis Portfolio as our platform of choice as well as a plethora of spreadsheets to manage our holdings. Digital material for which appropriate rights are held is available to view via our website (archive.historicengland.org.uk). We have just commissioned Golant Innovation to work with us on developing a DAM proposal and business model.

Felis ADSus: herding CATS and improving workflows. The Archaeology Data Service’s CATS week

Ray Moore

Last updated on 20 December 2019

Ray Moore is Digital Archivist at the Archaeology Data Service


Felis ADSus, a breed rarely seen beyond their natural habitat in the King’s Manor (York), were enticed from their lair into the wider world for their annual CATS (Curatorial And Technical Staff) week in September. With the continued support of the ADS director and management team, CATS week has become a feature in the ADS calendar in recent years allowing digital archivists to take time away from their daily activities to work on focused tasks and have those in-depth conversations about process, metadata and formats. The ‘catnip’ for any discerning digital archivist.

WDPD: Reflections and Ripples

Sarah Middleton

Last updated on 6 December 2019

Now that the dust has settled after World Digital Preservation Day (WDPD) on 7th November and I have finished travelling around the country for the year (I think), I have had a chance to pause and reflect on what was - quite frankly - another stupendous outpouring of digital preservation community goodness!

Unlike last year when we were in Amsterdam for the Memory Makers Conference and Digital Preservation Awards, I was on home turf in York primed and looking forward to remaining glued to my tweetdeck for a good 36 hours. I was relishing the fact that I could quite literally binge on whatever WDPD was going to throw my way, with no distractions!

And my word, did WDPD throw us digital preservation delights by the bucketload!

Ready, steady, sprint….or how to write a policy toolkit in 3 days

Jenny Mitcham

Last updated on 2 December 2019

Many years ago I ran a half marathon in Bristol but running a book sprint there was an entirely different proposition.

It could be argued however that both were exhausting and rewarding in equal measure!

Last week, DPC staff joined with colleagues from the University of Bristol and a small group of invited experts to work on a new resource for DPC Members - a Digital Preservation Policy Toolkit.

iPres 2019: Preserving the people in digital preservation

Elisabeth Thurlow

Last updated on 27 November 2019

Elisabeth Thurlow is Digital Archives and Collections Implementation Manager at the University of the Arts London. She attended iPres2019 with support from the DPC's Career Development Fund which is generously funded by DPC supporters.


A recurrent theme across many of the papers presented at this year’s iPres conference was the important role of people in digital preservation. Technology tends to dominate conversations around digital preservation, but for digital preservation to ultimately work, we need people too.

Capturing Cultural Transformation: an update on the Hull 2017 City of Culture digital archive

Laura Giles

Last updated on 22 November 2019

Laura Giles is City of Culture Digital Archivist at The University of Hull


Back in October 2017 we at the University of Hull blogged about the early stages of our plan to archive the Hull 2017 City of Culture. The project was in its infancy then so we’re keen now to share an update of where this journey has taken us since.

The idea for the Hull City of Culture Digital Archive was conceived shortly after the announcement in November 2013 that Hull was to be the holder of the 2017 title of UK City of Culture. Knowing that 2017 had the potential to be a complete game-changer for Hull, it was seen as crucial to capture a historical record of the year. There was a strong desire to document this time to guarantee that decisions made, works created, residents engaged, visitors attracted and money spent were chronicled and accessible to researchers in the future.

Introducing the new NDSA Levels of Preservation

Jenny Mitcham

Last updated on 14 November 2019

Since August 2018 I have been involved in an ambitious international effort to revise the NDSA Levels of Preservation.

When I first joined the revision group I was working as a digital archivist at the Borthwick Institute for Archives.

I was a digital archivist who very much appreciated the NDSA Levels and had used them frequently to measure progress and to communicate with colleagues. The rumours that I had them printed out and pinned up above my desk are indeed true. I believe I was what you might call an NDSA Levels of Preservation ‘Super Fan’.

I joined the group because I had highlighted (for example in this blog post) some areas where I was unsure how to apply them or felt they could be subject to slightly different interpretation.

Finding the Cutting Edge in Common Formats

Elizabeth Kata

Last updated on 11 November 2019

Elizabeth Kata is Digital Archives Assistant at the International Atomic Energy Agency (IAEA). She attended iPres2019 with support from the DPC's Career Development Fund which is generously funded by DPC supporters.


Placing a session with the title “Common Formats” under the theme “Cutting Edge” seemed at first contradictory as I looked over the iPRES 2019 program, but the four papers presented in this session demonstrated cutting edge work being done with and to preserve common formats, from data tape recovery to PDF/A analysis. And read more to see what upcoming actions this session inspired! 
