Phil Clegg

Last updated on 25 July 2018

Phil Clegg is Co-founder and Chief Technology Officer of MirrorWeb

We were recently invited by the Digital Preservation Coalition (DPC) to join the first episode of its Digital Preservation Futures webinar series.

The webinar provided a valuable opportunity for us to learn more about the digital archiving community. We were also able to share what we have learned from projects we’ve worked on across the public and private sectors, as well as what our customers look for in a digital archiving and preservation solution.

In our experience, there are four key capabilities our customers seek from a digital archiving solution. In collaboration with our partners at The National Archives (TNA), we developed a set of procedures we call C.A.P.S:

  • Capture
  • Access
  • Preserve
  • Search


Capture refers to collecting digital information from varied online sources, including websites and social media channels, into one central portal.

Within the community, we all understand what digital preservation is and isn’t, but we often forget that it also enables institutions to maintain much more than discrete online documents or objects. Collecting contextual information around a document or object helps organisations capture the associated social, political, and other relevant circumstances at a specific moment in time. Web and social media archives can provide this context.

This is equally useful for:

  • local and national government
  • brands seeking to store their digital legacy
  • the finance industry
  • educational institutions
  • museums and libraries

Capturing public engagement with a social media post, campaign or hashtag can help organisations understand how the public perceived and responded to a particular initiative or event and inform future decision-making or brand positioning.

During events such as festivals, for example, councils can gather intelligence to understand how people travel into the area, where pressure points on the local transport network occur, and where people eat and drink. That insight can aid resource and infrastructure planning in the future.


Having access to contextual digital information allows users to replay that content with greater confidence that the archived version represents the object as closely as possible to its original form. For social media, this means interpreting accompanying metadata to regenerate interactions in a form that resembles the original. For websites, this means replaying the archived site so users can browse content much as it was originally hosted.

Imagine the average company website - it won’t just be words on a page, it will include content such as:

  • interactive quizzes and polls
  • video
  • links to documents and external sites
  • infographics

This online content is as valuable to a company in the future as its privacy policy or product information page, and is likely more expensive to reproduce, so it should be just as accessible.


The average life of a webpage is 90 days. Digital archiving ensures not only that it remains accessible for as long as needed, but also that it remains usable. Possessing a reliable, usable long-term record of an organisation’s online history provides evidence of its actions. This record is useful for research and process review, and can also assist with compliance issues or legal disputes.

This protection may be more important for some sectors than others, but most institutions have a requirement to archive corporate web content. Now that universities are subject to Advertising Standards Authority rules, for example, they are increasingly aware of the importance of being able to prove what was said in a previous prospectus. In response, MirrorWeb has been able to extend its expertise in archiving web content for regulatory purposes to institutions new to these rules and requirements. Each year a course will change slightly, a webpage will be updated, and the exact version of a digital prospectus that existed for one cohort of students will be lost. Preserving the webpages that contain this information allows universities to access them at any point in the future, and the preserved pages can act as proof of delivery of the promised course.


Creating a full-text index of digital content makes it possible to present that content through a multi-faceted search interface. This Google-like search functionality allows for instant retrieval of data.

This function is particularly important for larger public memory institutions, which hold vast amounts of data and provide public-facing portals with search. Indexing the data and making it searchable means that relevant digital information can be identified quickly and filtered to show only results between certain dates or within a specific website.
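To make the idea concrete, here is a minimal sketch of a full-text index with date and website filters. It is purely illustrative and is not a description of how MirrorWeb or any particular archive implements search: the `Capture` and `ArchiveIndex` names, their fields, and the example sites are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
import re


@dataclass(frozen=True)
class Capture:
    """One archived page. Field names are hypothetical, chosen for illustration."""
    site: str
    url: str
    captured: date
    text: str


class ArchiveIndex:
    """A toy inverted index: each term maps to the captures that contain it."""

    def __init__(self):
        self._docs = []
        self._postings = defaultdict(set)

    def add(self, capture):
        doc_id = len(self._docs)
        self._docs.append(capture)
        # Record every distinct lowercase word of the page text.
        for term in set(re.findall(r"\w+", capture.text.lower())):
            self._postings[term].add(doc_id)

    def search(self, query, site=None, start=None, end=None):
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        # A capture matches only if it contains every query term.
        ids = set.intersection(*(self._postings.get(t, set()) for t in terms))
        hits = [self._docs[i] for i in sorted(ids)]
        # Facets: restrict to one website and/or an inclusive date range.
        return [
            c for c in hits
            if (site is None or c.site == site)
            and (start is None or c.captured >= start)
            and (end is None or c.captured <= end)
        ]


index = ArchiveIndex()
index.add(Capture("example.ac.uk", "https://example.ac.uk/prospectus",
                  date(2017, 9, 1), "Undergraduate prospectus 2017: course fees and modules"))
index.add(Capture("example.ac.uk", "https://example.ac.uk/prospectus",
                  date(2018, 9, 1), "Undergraduate prospectus 2018: updated course modules"))
index.add(Capture("example.gov.uk", "https://example.gov.uk/transport",
                  date(2018, 7, 1), "Festival transport plan and course of the parade"))

results = index.search("course modules", site="example.ac.uk", start=date(2018, 1, 1))
for capture in results:
    print(capture.captured, capture.url)
```

Here the query matches both prospectus captures on text alone, and the date filter then narrows the results to the 2018 version. A production archive would rely on a dedicated search engine rather than an in-memory structure like this one.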

C.A.P.S is a simple way of summarising a very complex process, but it provides a useful framework for preserving web content for a broad range of institution types. We are happy to talk to any institutions that would like to discuss further how these processes might work for them.

You can watch our DPC webinar here, and for more information on how we apply C.A.P.S to web and social media archiving, visit our website:
