Unlocking the UK Government Web Archive for Research Users

Natasha Kitcher

Last updated on 30 June 2026

Dr Natasha Kitcher is Digital Researcher at The National Archives

I started at The National Archives a little over a year ago with the brief to investigate how to unlock the UK Government Web Archive’s (UKGWA) data for computational researchers. I was asked if there was a way we could enable computational access to the web archive, and what might be the best solution both for us as an institution and for researchers in general.

While my aim was to work out how to unlock the data of web archives, I felt there were more audiences we could explore – beyond those who are comfortable with WARC files, CDX indexes, and other web archive quirks. Although ‘big data’ was the starting point, I was also interested in smaller projects, and especially in finding ways to make the web archive appealing to non-web-archive people.

In summer I received a grant from the Strategic Research Fund, The National Archives’ internal funding stream, for a project called ‘Unlocking the UK Government Web Archive’. Using this grant, I hosted two in-person workshops: one aimed at web archive users, and one aimed at those interested in learning more.

The aim of the workshops was to teach people about the UKGWA and get them to complete three tasks to help us understand them and their needs better. The tasks were designed in collaboration with the User Research team at the archive to help us create a workshop that offered attendees the chance to learn something and enabled us to get the information we needed.

Firstly, attendees completed an observed search exercise in pairs. One participant would search the web archive, starting at a point specified by the team (this could be the web archive home page, an A-Z list of archived sites, or the main search page) and the other would make notes about how the searching participant found the experience. They would record things like the participant’s approach to dead-ends, filtering, and whether or not the ‘searcher’ found what they were looking for.

Participants reported feeling overwhelmed by the size of the archive, most especially when facing the A-Z list, but they often found ways to quickly navigate the page themselves and make the system ‘work’ for them. While we expected users to be disappointed not to find a ‘Google’-like experience when entering the search environment, they generally seemed to appreciate that a web archive was different and therefore understand that the search environment itself was also different.

Secondly, participants were put in teams of four to five people to collaboratively imagine a research project they would like to do with the web archive. They had to imagine the project’s aims, key research questions, and what they would need from the web archive in order to achieve their goals. A common theme across these projects was change over time: participants wanted to use the web archive to investigate both our changing relationship with the internet and the way different attitudes and themes have been reflected over the years. Participants often expressed a desire to read specific pages closely to do this, although a minority were interested in a full-text export of the entire web archive – a tall ask!

Finally, having been introduced in more detail to the challenges we face as a web archive when engaging users, participants were presented with ten access scenario cards. These challenges specifically included the Takedown policy, a reclosure policy which can lead to specific assets being removed from the public side of the archive while remaining in the background data.

Image 1: Takedown policy, courtesy of Dr Natasha Kitcher

The access scenario cards were co-designed with the UKGWA team to reflect realistic activities we could undertake to increase research use of the collection. The challenge was that we cannot, of course, prioritise everything. We therefore invited participants to join us in tackling a very real question: how to decide our next steps.

Access scenario 1

Image 2: Scenario card, courtesy of Dr Natasha Kitcher

Access scenario 2

Image 3: Scenario card, courtesy of Dr Natasha Kitcher

Access scenario 3

Image 4: Scenario card, courtesy of Dr Natasha Kitcher

The top choices among all participants were changes to our search infrastructure and a proposal for new special collections in the web archive, which would showcase potential research routes and act as a jumping-off point for researchers who are new to the collection. Surprisingly for us, researchers were keen to collaborate with us on these and proposed a series of guest curators for the special collections. This is an idea we are putting into practice now, with two placement students joining us at The National Archives later this year to work on their own special collections for the web archive.

The workshops have led to some real change for us as a web archive. We now know better who some of our research users could be and what they want from us in order to do more research. As a result we are shaping new outreach plans, including a new training offering and student placements. We are also doing further work internally to understand how to unlock our data, whether as raw data, text extracts, or derived datasets.

Another takeaway has been a reminder that we do not need to tackle challenges alone. Challenges like our search infrastructure, or the sensitive issue of Takedown, can easily lead to an institutional anxiety around engaging researchers in our processes – but it is far easier to develop meaningful, lasting relationships that lead to innovative and impactful research if we bring the researchers on the journey with us.

Web archives are unusual in being such ‘live’ archive environments – we collect and grow while trying to encourage the use of our collection, something traditional paper archives that have existed for centuries do not have to contend with in the same way (technically our records are created for the first time when they enter the archive). It often feels more personal if there are things missing from the archive, or questions to be asked about our collection that might tie back to living people and decisions that they have made. Loss, for instance, can feel like a dirty word when the implication in web archives is that a system, or a person within that system, somehow ‘failed’ and lost an artefact.

But the liveness of the web archive is a major asset to research – especially historical study. It is a place where the researcher must face the fragility and humanity of the collection and its collector head-on. Workshops like this can enable learning for both the archive and the researcher, and we hope to take this approach forward for more user engagement with the web archive in the future.
