Critically Endangered

Web archiving is a recognized specialism within digital preservation, able to capture large quantities of material with routine and standards-based tools. But there are significant issues of intellectual property rights associated with website capture and republication. In many jurisdictions, but by no means all, those obstacles are overcome by regulations that enable a national library or other ‘legal deposit’ agency to copy and preserve content. Where no such permission exists, there is a significant risk of loss.

Group: Web

Trend in 2021: Towards Greater Risk

Consensus Decision

Added to List: 2019

Previous classification: Critically Endangered

Imminence of Action

Action is recommended within twelve months, detailed assessment is a priority.

Significance of Loss

The loss of tools, data or services within this group would impact on many people and sectors.

Effort to Preserve

Loss seems likely: by the time tools or techniques have been developed the material will likely have been lost.


Domains registered without a country code; domains with a country code but weak or unenforceable legal deposit permission to harvest.

‘Practically Extinct’ in the Presence of Aggravating Conditions

Rapid churn of websites; lack of access to Internet Archive harvest; contentious content; encryption; digital rights management; non-standard content management

‘Endangered’ in the Presence of Good Practice

Permissive approach to legal deposit

2021 Review

This entry was added in 2019. It is characterised by regulatory barriers rather than technical ones, though the pace of change in web technologies and the growth of web content mean that significant technical challenges remain. The 2019 Jury noted that local conditions are also a significant factor. For example, websites often fall under public records legislation or are important elements of corporate records, so important parts of the web are harvested even where there is no explicit legal deposit legislation. Moreover, the Jury particularly recognizes the work of the Internet Archive to capture and preserve content. Even so, there are significant gaps in web archiving, and in too many cases it is regulation that is the barrier. The 2021 Jury agreed with this description and classification but added that, in some limited instances, pywb tools (as opposed to automated web crawlers like Heritrix) can effectively capture the look and feel of a platform interface, preserving legacy versions for users to interact with in the future. However, pywb tools are manual and therefore cannot address the scale of this issue. They also do not capture interfaces in a way that makes it possible to recreate them in the future; they only allow interaction with a defined set of web pages. Because of this growing issue of scale, the 2021 trend is towards greater risk.

Additional Jury Comments

Unless the Internet Archive is picking these up, or permission regimes are in place, early instances of the web are gone forever, and content will continue to be lost.
