Sara Day Thomson

Last updated on 30 June 2017


From the 12th to the 16th June, the member institutions of IIPC and researchers from a surprising array of backgrounds gathered at the Senate House in London for Web Archiving Week – an amalgamation of the Archives Unleashed hackathon, the annual IIPC conference, and the ReSAW Conference. The week’s activities conveyed a healthy and vibrant picture of current capture and curation practices as well as the use of archived web content for research.


Talks and demonstrations highlighted the advances made by institutional web archiving programmes and services, from integrated access for NYARC to better metadata for discovery from the OCLC Research Library Partnership Web Archiving Metadata Working Group.



The programme also included updates on the development of tools for curators and researchers as well as the creative use of existing tools. The Library Innovation Lab at Harvard demo-ed a shiny new tool for comparing an archived version of a website to its live counterpart. And Steven M. Schneider from SUNY Polytechnic Institute walked us through the use of TiddlyWiki for scholarly analysis of archived web objects.



The diversity of research at the conference revealed an impressive and compelling range of ways that the study of the web provides a critical approach to almost every discipline. And, correspondingly, institutions have shown a new commitment to understanding user needs and building web archive collections researchers can use. A notable example is the user needs project at the Parliamentary Archives, carried out with the help of Peter Webster.



Anat Ben-David from The Open University of Israel presented on the DNS leaks from North Korea in 2016. As part of her research, Ben-David showed the disparity of access to North Korean websites from different nations. The United States (from where the Internet Archive crawls its collections) is only able to return a fraction of the North Korean webpages accessible in Russia. Even Europe has greater access to the highly secretive .kp domain. This finding makes a persuasive case for national web archives to share resources.



Nicholas Taylor from Stanford University Libraries gave an illuminating overview of the court cases where web content (from the Internet Archive) has been introduced as evidence. Taylor reported that since the first uses in 2004, courts have come to generally accept Wayback Machine records as evidence. This research goes to show that questions about the authenticity and reliability of archived web content reach far beyond the archive and library professions.



These research projects and many others reveal a growing interest in the study of web archives as an entire corpus rather than as individual web objects. The volume of content accessible through web archives (especially national collections) has created a new resource for data analysis and longitudinal studies of trends over time. Institutions have begun to re-think their role as custodians and to address the need for tools and services that enable users to work with web archives as datasets, rather than ‘one at a time’. The UK Web Archive’s Jason Webber led a workshop on SHINE, a great example of a free, easy-to-use tool using the JISC UK Web Domain Dataset (1996-2010) that anyone can play with online. Even if you’re not a web researcher (yet!) and are just looking for a party trick this weekend, have a look at the Trends function. Some of us may or may not have analysed the use of our own name, place of work, hometown, alma mater, favourite band, well you get the idea…



If you want to learn more about my talk on applying principles of digital preservation to social media archiving, head over to the SAS blog to see my (and others’) full conference papers. Or if you want the version with pictures, here are my slides.


To sum up my experience, over the two days I was able to attend, I detected a very inspiring and encouraging theme surfacing across many of the talks and discussions. Archiving institutions and researchers alike expressed an escalating need to approach the web as it comes, rather than trying to flatten it to fit established research models and collecting practices. By trying to mould archived web resources into something that can be inserted into our current systems and ways of working, we distort meanings inherent to web technologies, platforms, and content. By doing so, we restrict the full breadth of knowledge that might be gleaned from the study of the web. If I take forward one lesson in my approach to archiving web content, especially social media and user-generated content, it is that we preserve to innovate. By ‘preserve to innovate’ I mean developing approaches to archiving and curating that reflect user needs and don’t interfere with the full range of possible uses. Our aim as archivists, librarians, curators, and tool developers is to maximise the potential investigations and experiments undertaken by researchers and journalists and policy-makers and all future users.

Web Archiving Week 2017 was a roller coaster of inspiring ideas and glimpses into the progress of web collecting and web research. Thank you to the organisers and all the wonderful speakers and participants. Until next time!



Web Archiving Week was a big, sprawling conference – which goes to show the growth and popularity of web archiving and web research, but it also means I didn’t make it to every session. To learn more about the diverse range of talks and workshops, have a look at these other summaries:


If you have a conference summary not listed above, please add it to the comments below!
