Supporting Evidence

Key motivators

BBC Domesday Project: digital obsolescence caused by proprietary storage formats. The data was rescued only partially, and through enormous effort, after becoming obsolete in fewer than 20 years.

Technology

Reputation

The NASA Viking Mars program had brittle tapes and could not decode the data formats from its 1976 mission. Eventually enough paper documentation was found that everything could be re-keyed.

Technology

Hackers Who Recovered NASA's Lost Lunar Photos

Technology

Hurricane Sandy caused flooding in data centers resulting in potential loss of business data 

Business Continuity

High-quality footage of Apollo 11 moonwalk lost to history because backup tapes were lost/recorded over.

Corporate/Cultural Memory

The BBC (and American networks) regularly reused tapes. Many important programs, such as Doctor Who, have many missing episodes.

Corporate/Cultural Memory

With the change in presidential administrations, web pages and data sets were removed from government agency websites.

Corporate/Cultural Memory

Reputation

Accountability

This is the oldest tech still used by the US government 

Technology

Five of the most outdated IT systems in the government 

Technology

Retail data, patient data, etc. might be manipulated without owners noticing. Good digital preservation is essential for discovering hacks and recovering data.

Security

Accountability

U.S. Nuclear System Runs on Early Computers and 8-Inch Floppy Disks 

Technology

Security

Business Continuity

As much as 80% of scientific data from the 1990s is irretrievable 

Corporate/Cultural Memory

Enabling Research

Authenticity

Accountability

Reputation

Precedent-setting Supreme Court opinions contain links to online sources that are disappearing

Corporate/Cultural Memory

Authenticity

Accountability

Reputation

Meet the digital historians on a mission to preserve data for future generations 

Corporate/Cultural Memory

How Archivists Could Stop Deepfakes From Rewriting History 

Authenticity

Accountability

Reputation

Fake news

Authenticity

Accountability

Reputation

Future-Proofing Critical Digital Data in an Increasingly Complex Global Regulatory Environment, extract from a report undertaken by the IGI (https://iginitiative.com), supported by Preservica.

Full report available here 

Accountability

Corporate/Cultural Memory

Preserving history and ensuring citizen access to digital government records using the cloud, extract from a report undertaken by the IGI (https://iginitiative.com), supported by Preservica.

Full report available here 

Accountability

Corporate/Cultural Memory

A Practical Approach to Governing 170 Years of Critical Corporate Records, extract from a report undertaken by the IGI (https://iginitiative.com), supported by Preservica.

Full report available here 

Accountability

Corporate/Cultural Memory

In regulated industries such as financial services, digital archiving can help firms meet specific compliance needs. MiFID II, for example, requires that all firms keep unalterable records of all electronic communications intended to result in or confirm a transaction. The unalterable, date- and time-stamped format of digital archives can also provide organisations with legally admissible records of all online activity, enabling disputes to be resolved more easily.

Information provided by MirrorWeb

Accountability

Authenticity

Compliance

Digital archiving allows brands and public sector organisations to capture a permanent record of web and social media content, protecting it from alteration and unauthorised use. It also ensures that content continues to deliver value long into the future. Big data techniques such as sentiment analysis, for example, could be applied to the archive to understand customer engagement and brand perception over time and so inform future marketing strategy.

Information provided by MirrorWeb

Authenticity

Reputation

Enabling Research

2.5 quintillion bytes of data are created online every single day. To conceptualise that: if you laid out 2.5 quintillion one pence coins, they would cover the surface area of Earth five times over.

Information provided by MirrorWeb

Corporate/Cultural Memory

Technology

Over 90% of all the data in the history of the world was generated in the last two years (although that window is shortening!).

Information provided by MirrorWeb

Corporate/Cultural Memory

52% of links to web pages of government departments quoted in Hansard between 1997 and 2006 were broken by 2007

Corporate/Cultural Memory

Every single minute:

  • 456,000 tweets are posted
  • 510,000 comments are posted on Facebook
  • Wikipedia undergoes 600 page edits
  • 154,200 Skype calls are made

Information provided by MirrorWeb

Corporate/Cultural Memory

95 million photos and videos are shared on Instagram every day.

Information provided by MirrorWeb

Corporate/Cultural Memory

The average size of a web page is approximately 3MB, and the average website is about 50-60MB. The time taken to crawl a website depends on a number of factors, most notably the makeup of the URIs, i.e., how many media files, pages, images, PDFs etc. there are. The other major factor is the structure of the site in terms of links and the CMS used, as this determines how much the site is affected by the current limitations of crawl technology such as Heritrix.

Information provided by MirrorWeb

Technology

MirrorWeb recently worked with The National Archives to migrate the UK Government Web Archive, including Twitter and YouTube content, to the MirrorWeb cloud platform. It took two weeks to capture and transfer 120TB of data from 72 hard drives at The National Archives to internal hard drives, before transferring and hosting that archive in the cloud. To put that in perspective, 120TB of data is five and a half times the complete film catalogue on Netflix.

Information provided by MirrorWeb
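
The migration figures above imply an average sustained transfer rate, which can be worked out directly. The following is a minimal sketch, assuming decimal terabytes (1TB = 10^12 bytes) and that "two weeks" means 14 days of continuous transfer; neither assumption is stated in the source, and the function name is illustrative only.

```python
def average_throughput_mb_per_s(total_bytes: float, duration_seconds: float) -> float:
    """Return the average transfer rate in megabytes per second (1 MB = 10**6 bytes)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return total_bytes / duration_seconds / 1e6


TB = 10**12  # assumption: decimal terabytes

archive_bytes = 120 * TB                # 120TB quoted in the text
two_weeks_seconds = 14 * 24 * 60 * 60   # assumption: 14 days, around the clock
drive_count = 72                        # 72 hard drives quoted in the text

migration_rate = average_throughput_mb_per_s(archive_bytes, two_weeks_seconds)
average_tb_per_drive = 120 / drive_count

print(f"Average rate: {migration_rate:.1f} MB/s")
print(f"Average data per drive: {average_tb_per_drive:.2f} TB")
```

Under these assumptions the average works out to a little under 100 MB/s, with about 1.7TB held on each source drive. The real rate would have varied, since the two weeks also covered capture and handling of the drives.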

Technology

The average university website might be around 30-60GB, and would take anywhere between 6 and 20 hours on average to crawl, depending on the platform and the makeup of the content and links within it.

Information provided by MirrorWeb
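
The size and duration ranges quoted for a university website bound the effective crawl rate. As a rough sketch, assuming decimal gigabytes (1GB = 10^9 bytes) and that any size in the range may pair with any duration in the range (the source does not say which sizes go with which times), the slowest and fastest implied rates are:

```python
GB = 10**9  # assumption: decimal gigabytes


def crawl_rate_mb_per_s(size_gb: float, hours: float) -> float:
    """Return the average crawl rate in MB/s for a site of size_gb crawled in the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (size_gb * GB) / (hours * 3600) / 1e6


# Figures quoted in the text: 30-60GB, 6-20 hours.
slowest_rate = crawl_rate_mb_per_s(size_gb=30, hours=20)  # smallest site, longest crawl
fastest_rate = crawl_rate_mb_per_s(size_gb=60, hours=6)   # largest site, shortest crawl

print(f"Implied crawl rate: {slowest_rate:.2f} to {fastest_rate:.2f} MB/s")
```

The wide spread between the two bounds is consistent with the point made in the text: crawl time is driven by platform, content makeup and link structure rather than by raw size alone.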

Technology

Enabling Research

With cloud hosting, no additional on-premises infrastructure is needed to accommodate growth, which cuts storage overheads and lowers costs for customers. For MirrorWeb to archive 30GB of website data would cost just £650 for the year, and archiving a social media account would cost £300 annually.

Information provided by MirrorWeb

Technology

Costs

