Paul Gooding

Last updated on 26 April 2021

Paul Gooding is Senior Lecturer in Information Studies at the University of Glasgow. 

The AEOLIAN (Artificial Intelligence for Cultural Organisations) Network, funded under the “AHRC and NEH New Directions for Digital Scholarship in Cultural Institutions” scheme, aims to bring together scholars, computer scientists, archivists and digital preservationists to investigate the role that Artificial Intelligence can play in making digital cultural records more accessible to users (and if you’re wondering about the awful pun in the title, the Aeolian harp is a musical instrument that is played by the wind. No, I’m not sorry…). As we prepare for our first workshop (the call for participation is out now), I hope that many of you will consider joining us over the two-year project duration.

The Santa Barbara Statement on Collections as Data has been highly influential in shaping an ethics of curation in heritage organisations (plug alert: the Collections as Data lead, Thomas Padilla, will be on keynote duties at the first AEOLIAN workshop…). In my own research, questions around the application of AI and machine learning are a recurring theme. As the following examples show, though, those questions just as often end up being ethical and political in nature as technical and technological.

First, what happens when the politics of access define the extent of usage? From 2017 to 2019, I led the Digital Library Futures project, where we attempted to explore the impact of the UK non-print legal deposit regulations upon academic deposit libraries and their users. These regulations were developed around five years before the UK copyright exception was introduced to allow non-commercial text and data mining, and include access protocols that restrict usage to a narrow range of scenarios outlined in the regulations. Through interviews with library practitioners, we found there was concern that “for as long as we can’t download a large dataset derived from the legal deposit collection, only certain kinds of research will be possible using this material, and it will be small-scale qualitative research.”

Legal deposit regulations reflect a complex landscape of often conflicting rights and priorities, balanced between publishers, libraries, and researchers. But what does it mean for each of these groups when access to materials is shaped not by what is allowable under copyright, but what is acceptable to the most dominant voice in the room? Where should we draw the line in allowing computational access, and who should be responsible for providing such services? And how might we understand the politics of access that have led to the divergence of copyright law and legal deposit regulations in the area of text and data mining?

Second, what does it mean for something to be truly global in nature? From 2019 to 2020, I was part of the Global Digitisation Dataset network, which set out to create a global dataset of digitised texts. One of our key tasks was to match and aggregate bibliographic records from different sources, namely HathiTrust, the National Library of Scotland, the British Library, and the National Library of Wales. We applied machine learning techniques to experiment with methods for matching bibliographic records, to see how feasible it was to identify identical items across different cataloguing standards. The resulting aggregated dataset, comprising approximately 17.5 million items, was made available via the National Library of Scotland Digital Foundry. Since then, it has formed the basis of the Open Texts initiative, which aims to provide a discovery platform for digitised collections from around the world.

As Natalie Fulkerson’s blogpost on the data matching explains, the difficulties of duplicate detection certainly demand further research, but for me the standout questions are cultural and ethical as well as technical. What does it mean, for instance, for a discovery platform to be truly global? How do we overcome the challenge of diversification (of standards, of languages, and of cataloguing practices) that a truly representative resource would require? And what does it mean for a global resource to be both comprehensive and representative? As with the example of legal deposit, these are research questions about the ethical and philosophical underpinnings of AI-enhanced access to cultural collections.

Before I mention a little bit more about the project, I’d like to wrap up by noting that as the scale of our digital cultural collections increases, and as computational methods permeate curatorial and research practices, it is at the intersection of the technological, the ethical, and the political that further work is required. These questions require input from researchers, information professionals, computer scientists, and digital preservationists. It’s my hope that AEOLIAN will provide a cross-sectoral forum for defining the application of AI in relation to our professional values, rather than the other way round.

Finally, we’d love to involve as many people as possible in the network – to that end, please do consider joining us at one of our events. The first AEOLIAN workshop will take place online on 7th July 2021 from 12:00 to 17:30 GMT. Please note that the number of guests able to attend is limited, and there is an application process. Full details are available at the AEOLIAN website. You can also follow us on Twitter (@AeolianNetwork) for all the latest news from the project!
