Added on 14 June 2021


 

With traditional web crawling strategies, heritage institutions could easily harvest large numbers of static web pages. But the new generation of dynamic sites requires a different approach.

In the report 'Server-Side Web Archiving', Eoin O'Donohoe of the Netherlands Institute of Sound and Vision explores the possibilities.

Static or dynamic

Websites are no longer just the static portals of text and images that we have known since the early days of the internet. Increasingly complex web technologies have produced a wider range of websites that deliver a richer, more dynamic experience: interactive documentaries in which viewers choose their own path through a story, online artworks, and websites with game elements. We call the first type of site static and the second dynamic.

Web crawling no longer sufficient

These developments do not make it any easier for heritage institutions to preserve websites. Traditional web crawling strategies have worked well for mass harvesting of static web pages, including text and images. But with this approach, a heritage institution only preserves the outermost layer of a website. 

Focus on server side

On dynamic websites, all kinds of operations run at the back end (the server side) to retrieve, present and store the data that the user sees at the front end (the client side), and together they provide a unique user experience. Think of the recommendations on YouTube, which present each user with a unique selection. The content of a server can include scripts, media and databases that are controlled by the creators of the website.
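
The difference between what a crawler stores and what lives on the server can be illustrated with a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the names WATCH_HISTORY, CATALOGUE, render_page and crawl do not come from the report or from any real website. It assumes a server that builds each page from a small database and a crawler that visits as an anonymous user.

```python
# Hypothetical server-side data: the kind of database a crawler never sees.
WATCH_HISTORY = {
    "alice": ["documentary", "history"],
    "bob": ["games", "music"],
}

CATALOGUE = {
    "documentary": "Interactive documentary",
    "history": "Archive footage",
    "games": "Browser game",
    "music": "Concert recording",
}


def render_page(user):
    """Server-side script: build a page from the database for one visitor."""
    topics = WATCH_HISTORY.get(user, [])
    items = "".join(f"<li>{CATALOGUE[t]}</li>" for t in topics)
    return f"<html><body><ul>{items}</ul></body></html>"


def crawl():
    """A crawler visits as an anonymous user, so it stores a single rendering."""
    return render_page("anonymous")


# What the crawler captures versus what individual users would see.
snapshot = crawl()
pages = {user: render_page(user) for user in WATCH_HISTORY}
```

In this sketch the crawler's snapshot is one rendered page with an empty list, while the script and the data that produce each user's personal page stay on the server. Those files are what a server-side approach would aim to collect.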

First exploration

To preserve the dynamic websites that are now so common on the web, heritage institutions must focus on the server side instead of relying on the current 'crawling' methodology. This means that an archive approaches a website's builder or administrator to obtain the files from the server. This new field of activity has been explored as part of the NDE Software Archiving project.

The report Server-Side Web Archiving by Eoin O'Donohoe (Netherlands Institute of Sound and Vision) discusses the important properties of the dynamic web. In addition, it provides an overview of the tools with which institutions can record these websites. The report concludes with some examples and use cases that show why server-side archiving can provide an additional method for preserving dynamic web content.

