Content Published on the Web Which Cannot Easily Be Captured Through Conventional Web Archiving Practices

   Endangered

Material that cannot be captured through conventional web-archiving practices (i.e. remote capture with a non-browser-based crawler). What this material has in common is not its type or context but the preservation risk it faces: website creators have chosen technologies and designs that do not support capture of the content, and current web archiving processes are limited in what they can collect.

Digital Species: Web

New Entry

Consensus Decision

Imminence of Action

Immediate action necessary. Where detected, material should be stabilized and reported as a matter of urgency.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability

Loss seems likely: by the time tools or techniques have been developed the material will likely have been lost.

Examples

Examples are wide-ranging but include:

1. Interactive content such as maps, charts, 3D models, multi-page forms etc.;

2. Content that is only accessible through search (and does not support a blank search with pagination URLs);

3. Content that is only accessible via POST or Ajax requests (i.e. 'Load More' issues);

4. Content hosted on sites which aggressively block crawlers;

5. Content hosted on proprietary platforms whose technical implementation makes web archiving difficult (Wix, Medium etc.).
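The 'Load More' case can be illustrated with a minimal sketch of how a non-browser crawler discovers content. It assumes a simplified crawler that only follows `href` links found in static HTML. The two sample pages, the `/api/posts` endpoint and the names `LinkExtractor` and `discover_links` are hypothetical and exist only for illustration; real crawlers are considerably more sophisticated.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, as a simple non-browser crawler might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_links(html):
    """Return the URLs a link-following crawler would find in static HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page with an ordinary pagination link.
PAGINATED_PAGE = """
<ul><li>Post 1</li><li>Post 2</li></ul>
<a href="/blog?page=2">Next</a>
"""

# Hypothetical page where further items arrive only after a script
# sends a POST request; no script is executed by this crawler.
LOAD_MORE_PAGE = """
<ul id="posts"><li>Post 1</li><li>Post 2</li></ul>
<button id="load-more" data-endpoint="/api/posts">Load more</button>
<script>/* on click: POST to data-endpoint and append results */</script>
"""

print(discover_links(PAGINATED_PAGE))  # ['/blog?page=2']
print(discover_links(LOAD_MORE_PAGE))  # []
```

In this sketch the paginated page exposes a URL the crawler can queue, while the 'Load more' page exposes nothing to follow, so everything beyond the first two posts would be missed unless a browser-based tool runs the script.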

‘Critically Endangered’ in the Presence of Aggravating Conditions

Creation and design decisions that do not support the capture of the content; limitations of current web archiving processes.

‘Vulnerable’ in the Presence of Good Practice

Continued development of tools that can capture some types of this material; continued resourcing of the web archiving community and developers working in this space;

Longer-term work towards a cultural change that prioritizes accessibility and archivability.

2023 Review

This is a new Bit List entry added in 2023 to draw attention to the particular challenges of web content that cannot be captured through conventional web-archiving practices. The entry is very wide-ranging and covers kinds of content that also appear in other Bit List entries, but it was added to highlight risks and issues those entries do not fully address: namely, those relating to material created and designed with technologies that do not support capture of the content, combined with the limitations of current web archiving processes. Given the ephemeral nature of the web, any material that is not preserved in a web archive is at risk. However, this entry focuses on material that is not accessible to conventional crawlers, which is considerably less likely to be preserved because: 1. although alternative capture tools (i.e. forms of browser-based capture) offer potential ways to mitigate some of these issues, they remain imperfect and require significant time and resources to use, so they are unlikely to be applied to very large-scale crawls; and 2. although website creators can be influenced to avoid such problematic technologies, this can be difficult, even in the context of current work (e.g. within the UK Government).

Additional Comments

The online experience is becoming increasingly ‘mediated’ and, due to the prevalence of personalized experiences, there is little that can be seen as ‘generic’ and meaningfully captured. Add-ons such as ad, tracker and script blockers also fundamentally change how one user experiences the web compared with another.

There is no specific time constraint but given the relatively ephemeral nature of the web, it is likely that there is an ongoing and constant loss of material.

Given the broad scope of the entry, it can be difficult to assign an overall significance level, as some examples are trivial and others highly important. Quantifying the impact of loss is also difficult, but it is fair to say that it would significantly affect the ability of citizens to hold their governments to account and the completeness of the historical record. Because these issues are common across the web archiving community, this is a global problem.

Important examples from this entry include highly significant content of national interest that is currently difficult to capture, for instance maps showing proposed changes to electoral boundaries, government blogs and published datasets which can only be accessed through search or via 'Load more', and whole sites of national importance that aggressively block crawlers. Other pertinent examples include PowerBI and Tableau, both increasingly widely adopted visualization tools that are very difficult to capture and to replay. They are used to disseminate data on many subjects, particularly government transparency information. The mitigating action of publishing the underlying data (for example as CSV or XLS(X)) is not often observed on the web.

See also:

  • Beking, A. (2021) ‘Fixity When Nothing is “Fixed” - Reflections on Upstream Engagement and Digital Preservation’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/wdpd/abeking-wdpd21 [accessed 24 October 2023]

  • Monks-Leeson, E., White, R. and Palendat, K. (2021) ‘Web Archiving the 2021 Federal Election at Library & Archives Canada – A Look Behind the Curtain’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/wdpd/lac-wdpd21 [accessed 24 October 2023]

  • Wilson, T. (2022) ‘Adaptability in the face of adversity - Archiving the Web to Help Persons Forced to Flee’, Digital Preservation Coalition Blog. Available at: https://www.dpconline.org/blog/blog-tom-wilson-22 [accessed 24 October 2023]

  • Hawes, A. (2020) ‘Archiving the architecture of Saydnaya: Using Webrecorder to capture 3D-spaces and ambient audio’, Documenting the Interactive Documentary Webinar. Available at: https://youtu.be/mXLSwrEHyNU [accessed 24 October 2023]

