||
Web archiving is a recognized specialism within digital preservation, able to capture large quantities of material with routine and standards-based tools. But there are significant issues of intellectual property rights associated with website capture and republication. In many jurisdictions, but by no means all, those obstacles are overcome by regulations that enable a national library or other ‘legal deposit’ agency to copy and preserve content. Where no such permission exists, there is a significant risk of loss. |
||
Group: Web |
Trend in 2021: Towards greater risk |
Consensus Decision |
Added to List: 2019 |
|
Previous classification: Critically Endangered |
Trend in 2022: |
||
|
||
Imminence of Action: Action is recommended within twelve months; detailed assessment is a priority. |
Significance of Loss: The loss of tools, data or services within this group would impact on many people and sectors. |
Effort to Preserve: Loss seems likely; by the time tools or techniques have been developed, the material will likely have been lost. |
Examples: Domains registered without a country code; domains with a country code but weak or unenforceable legal deposit permission to harvest. |
||
‘Practically Extinct’ in the Presence of Aggravating Conditions: Rapid churn of websites; lack of access to Internet Archive harvest; contentious content; encryption; digital rights management; non-standard content management. |
||
‘Endangered’ in the Presence of Good Practice: Permissive approach to legal deposit. |
||
2021 Jury Review: This entry was added in 2019. It is characterized by regulatory barriers rather than technical ones, though the pace of change in web technologies and the growth of web content mean that significant technical challenges still exist. The 2019 Jury noted that local conditions are also a significant factor. For example, websites often fall under public records legislation or are important elements of corporate records, so important parts of the web are harvested even where there is no explicit legal deposit legislation. Moreover, the Jury particularly recognizes the work of the Internet Archive to capture and preserve content. Even so, there are significant gaps in web archiving, and in too many cases it is regulation that is the barrier. The 2021 Jury agreed with this description and classification but added that, in some limited instances, pywb tools (as opposed to automated web crawlers like Heritrix) could effectively capture the look and feel of a platform interface, preserving legacy versions for users to interact with in the future. However, pywb tools are manual and therefore cannot address the scale of this issue. They also do not capture interfaces in a way that makes it possible to recreate them in the future, only to interact with a defined set of web pages. Because of this growing issue of scale, the 2021 Trend was towards greater risk. |
||
Additional Comments: Unless the Internet Archive is picking these up or permission regimes are in place, these early instances of the web are gone forever, and more will continue to be lost. |