Sara Day Thomson

Last updated on 26 April 2023

Sara Day Thomson is the Digital Archivist at the University of Edinburgh Library as well as a member of the DPC's Web Archiving & Preservation Work Group (WAPWG) Steering Committee. This blog was first posted on the University of Edinburgh's Website and Communications blog.


The University of Edinburgh has appointed its first Web Archivist to capture and preserve important University web pages. The University has in fact appointed the very first dedicated Web Archivist of any non-legal deposit university in the UK!

Since early 2022, the University Library has been archiving health-related websites as part of the Wellcome-funded Archive of Tomorrow project. Along with the National Library of Scotland, Cambridge University Library, and Bodleian Libraries, we’ve been building a ‘research-ready’ collection of materials to support current and future research into how information (and misinformation) about health is shared online. As the project draws to a close this month, the team reflects on the achievements of the 12-month pilot as well as the daunting challenges it helped to articulate and bring to light.

First things first – archiving websites? Yes!

Information on the Web disappears at a rapid pace, even more rapidly than other fragile born-digital formats. In a paper from 2015, the UK Web Archive reported that in just two years, 40% of websites collected in the national web archive had disappeared. A 2014 study by Harvard Law School discovered that more than 50% of links in US Supreme Court opinions don’t work.

Websites related to health have an even shorter lifespan, as government and other authorities update their guidance and delete or overwrite old information. Content shared on social media platforms evolves even more rapidly as trends rise and fall.

Building on local initiatives to ‘capture COVID’, the Archive of Tomorrow project has helped to preserve vital web and social media content related to health from 2019 to the present. As a priority, the project has also focussed on making archived websites more accessible and usable for researchers. In such a short time, however, the project encountered far more challenges than it could solve. In particular, it could not overcome the general lack of awareness of web archives among researchers and the wider public. This low awareness could be attributed to the tight access restrictions on some web archives. It could also be a result of relatively low engagement from institutions themselves in taking action to preserve their own important websites, the University of Edinburgh included.

While the slightly better known Wayback Machine (of the Internet Archive) has been archiving ed.ac.uk since 1997, many pages deeper in the website (below the homepage) have not been captured.

Wayback Error Message for Missing Web Page

In more recent years, the web technology used to support the University’s website has outstripped the tools to archive it, leaving many vital resources broken or simply missing from the University’s historical record. Many University web pages for research projects, publications, and community activities have vanished.

The University Library’s involvement in the Archive of Tomorrow project has facilitated a close collaboration with the UK’s most experienced web archivists and technicians. It also provided resource for a part-time Project Web Archivist based here at the University of Edinburgh in Heritage Collections. Perhaps most importantly, this influx of resource and support from Wellcome and project partners has provided the capacity to better engage our local research communities with a stake in web archiving.

This experience made it evident that the University required its own programme of web archiving to support the many communities who share information on its website.

Website Communications and Heritage Collections have been working closely together in recent months to build a Web Lifecycle Management programme. Together, these teams aim to ensure that important web pages are identified, archived to a high quality, and removed from the live web when they are no longer current or supportable on new technology.

To deliver this new programme of work, we appointed Alice Austin as the Web Archivist for the University’s web estate. As the Web team migrates the website and services to a new platform, this work is more important than ever to make sure we don’t lose important web pages with long-term value.

Alice, whose contract currently ends at the end of July, has been working with key content creators across the University to identify web pages that need to be preserved for longer than the typical life span of a web page (which may be shorter than you think!). The primary approach for archiving the University’s web pages relies on a partnership with the UK Web Archive, the UK’s legal deposit web archive. As a UK institution, the University of Edinburgh’s web pages fit squarely into the UKWA’s legal mandate (more about that here).

With Alice’s help, archived copies of the University’s web pages have gone from this…

UK Web Archive Capture with Missing Formatting

To this…

UK Web Archive Capture with Fixed Formatting

However, the University publishes many web pages that cannot be archived by the UK Web Archive for legal or technical reasons, which leaves a gap to be filled by exploring mixed approaches.

So there’s still a lot of exciting work to do!

Bitmoji Avatar of Web Archivist and Globe

To support communities around the University to capture their web pages and social media, we need your help. Have a look at the guidance on Web Archiving at the University of Edinburgh and keep your eyes peeled for workshops and training! Please get in touch about web pages that need to be archived. It is especially important to let us know about:

  • Web pages not on the University’s web platform (not on ed.ac.uk or subdomain)

  • Web pages that contain unique information not published or stored anywhere else

  • Web pages about to be taken down or significantly updated or changed

  • Web pages with a significant amount of dynamic or interactive content, like embedded video

Through this initiative, Website Communications and Heritage Collections are taking the first steps towards a sustainable, robust web estate. Web archiving will support better compliance and information governance across the University. Ultimately, these archives will document the decisions and actions taken in the 20th and 21st centuries, allowing future generations to understand the history and transformation of the University of Edinburgh over time.


Links

For more information about Web Archiving at the University of Edinburgh, read our guidance: https://www.ed.ac.uk/information-services/library-museum-gallery/crc/collections/archives/digital-archives-and-preservation/what-is-web-archiving

For more about the Archive of Tomorrow Project, see the project website: https://www.webarchive.org.uk/wayback/archive/20230311105420/https://www.nls.uk/about-us/working-with-others/archive-of-tomorrow/

