This section highlights some useful resources recommended by our expert workshop participants.

Case studies

The case studies listed in this section provide helpful examples of how computational access techniques have been applied in practice.

  • Discovering topics and trends in the UK Government Web Archive by the Data Study Group at the Alan Turing Institute – ‘A detailed case study demonstrating a series of experiments to unlock the UK Government Web Archive for research and experimentation by approaching it as a dataset.’ Jenny Mitcham, Digital Preservation Coalition

  • Providing computational access to the Polytechnic Magazine (1879 to 1960) by Jacob Bickford – ‘A practical and accessible case study describing a project at the University of Westminster which used NLP to provide computational access to a collection of digitized material. It includes links to useful resources that helped with getting started, and the code developed is also shared so that others can make use of it.’ Jenny Mitcham, Digital Preservation Coalition

  • Datathon slides highlighting possibilities with the material by the Archives Unleashed Project – ‘A collection of slides from datathons run as part of the Archives Unleashed project, these highlight a number of computational methods that could be applied.’ Leontien Talboom, University College London

  • Using AI for digital selection in government by The National Archives (UK) – ‘Outputs from an experiment which tested and evaluated a series of tools to undertake automated classification of a dataset to predict retention labels.’ Jenny Bunn, The National Archives (UK)

  • Reflecting On a Year of Selected Datasets by Predo Gonzalez-Fernandez – ‘A useful write up of how datasets have been made available at the Library of Congress.’ Jacob Bickford, The National Archives (UK)

  • Exploring National Library of Scotland datasets with Jupyter Notebooks by Sarah Ames and Lucy Havens – ‘An interesting example using Jupyter Notebooks to explore National Library of Scotland collections in a different way.’ Leontien Talboom, University College London

  • Web Data Research blogs by the Internet Archive – ‘Multiple examples of types of use and various tools, platforms, and partnerships.’ Jefferson Bailey, Internet Archive

  • Machine Learning with Archive Collections by Jane Stevenson – ‘This blog post shows the possibilities and potential of machine learning when applying it to archival collections.’ Leontien Talboom, University College London

The case studies below were shared as part of a series of events to promote the launch of this guide and represent a range of different approaches to computational access.

Jacob Bickford, The National Archives UK -  ‘DIY’ Computational Access: the Polytechnic Magazine (1879 to 1960)

Sarah Ames, National Library of Scotland -  Collections as data at the National Library of Scotland: access, engagement, outcomes

Ian Milligan, University of Waterloo -  Providing Computational Access to Web Archives: The Archives Unleashed Project

Ryan Dubnicek and Glen Layne-Worthey, HathiTrust Research Center, University of Illinois Urbana-Champaign -  How to Read Millions of Books: The HathiTrust Digital Library and Research Center

Jefferson Bailey, Internet Archive -  Tales from the Trenches: Building Petabyte-scale Computational Research Services

Tim Sherratt, University of Canberra Living archaeologies of online collections

Examples of approaches and infrastructures

The list below provides some helpful examples for you to explore, showing how different organizations have opened up their collections for computational access using a variety of techniques:

  • SHINE – ‘This is a prototype search engine for UK Web Archive with trend analysis. It is a good example of what is possible but is also very user friendly – a great way of introducing people to looking at a digital resource in a different way.’ Jacob Bickford, The National Archives (UK)

  • GLAM Workbench – ‘A set of Jupyter notebooks, demonstrating different computational techniques. This is great for its broad range of subject matter and exposing the underlying code.’ Jacob Bickford, The National Archives (UK)

  • SolrWayback – ‘Search tools for web archives, again a great demonstration of what is possible.’ Jacob Bickford, The National Archives (UK)

  • DigHumLab – ‘A digital ecosystem that highlights a number of collections open for computational access, also includes some inspirational examples of how researchers have used these collections.’ Leontien Talboom, University College London

  • Data Foundry – ‘This is a good example of access to National Library of Scotland data collections.’ Jane Winters, University of London

  • Web Archive Datasets – ‘Datasets made available by the Library of Congress Labs including Jupyter notebooks on working with the meme collection.’ Jacob Bickford, The National Archives (UK)

  • The Cybernetics Thought Collective – ‘A digitization project that also provides a web portal and analysis engine and highlights the potential of computational methods for this type of material.’ Leontien Talboom, University College London

  • BigLAM (Libraries, Archives and Museums) - ‘This collaboration showcases how there is an interest in GLAM material in the form of data. The provided datasets also highlight what attributes may be of importance to people wanting to use this material for different computational methods.’ Leontien Talboom, University College London

Further reading

This guide has provided an introduction to computational access for beginners but of course there is no shortage of further reading on the topic. The following articles, papers and blogs have been recommended by the contributors to this guide and can be used to explore the topic in more depth:

  • The Web as History by Niels Brügger and Ralph Schroeder – ‘Highlights a number of examples that could be used to explore web archives, also discusses the broader constraints and benefits of using this type of material.’ Leontien Talboom, University College London

  • Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning by Eun Seo Jo and Tinnit Gebru – ‘A nice example of cross-domain exploration of what archival science/practice can offer to machine learning development – it is rare to see the relationship framed this way.’ Thomas Padilla, Center for Research Libraries

  • From archive to analysis: accessing web archives at scale through a cloud-based interface by Nick Ruest, Samantha Fritz, Ryan Deschamps al. – ‘This paper looks at how first-hand research and analysis of how researchers actually use web archives has led to the development of the Archives Unleashed Cloud – an online interface for working with web archives at scale.’ Ian Milligan, University of Waterloo

  • Responsible Operations: Data Science, Machine Learning, and AI in Libraries by Thomas Padilla – ‘This report discusses the technical, organizational, and social challenges that need to be addressed in order to embed data science, machine learning, and artificial intelligence into libraries. The recommendations within it align closely with community aspirations toward equity and ethics.’ Jenny Mitcham, Digital Preservation Coalition

  • Ensuring Scholarly Access to Government Archives and Records by William Ingram and Sylvester Johnson - "This project provides a number of examples and recommendations when applying machine learning methods for the automatic creation of metadata. This may not be the main focus of this guide, but it does give some interesting concepts and ideas around applying these methods at scale." Leontien Talboom, University College London


Scroll to top