
PRONOM refresh

Sam Palmer, Steve Daly, Andrew Hosgood


PRONOM is an important registry of file-format information and format identification signatures which is used heavily across the Digital Preservation community and relied upon to underpin the operation of tools such as DROID, Siegfried, FIDO and many commercial products. It also provides a user interface to allow archivists to research the formats in their collection and consider how to maintain their long-term accessibility.

PRONOM was developed by The National Archives (UK), who support it and administer the contribution, testing, and release of format identification signatures developed by experts across the wider community, as well as researching new format information and signatures in-house.

The PRONOM database, process and website have not had any significant development in over 20 years, and all these elements are becoming increasingly difficult to support and keep secure, accessible, sustainable, and available, putting the continued existence of this valuable resource at risk. Support for the application is spread across various legacy components owned by multiple teams within The National Archives (TNA), and each release requires complex coordination between them.

Although there has been a desire for many years to give attention to PRONOM, with limited budgets there are no staff within TNA devoted purely to developing tools such as DROID and PRONOM, nor to undertaking file format research and signature release activities. This work is delivered alongside other responsibilities, which does give a significant positive benefit of keeping the maintainers of these tools connected to real-world challenges arising from business-as-usual operations.

It was not possible to secure funding or resource for any significant work on PRONOM during 2025/26 because of critical business priorities, such as migrating the records of The Parliamentary Archives to TNA and replacing core digital repository systems.

However, a small group of people felt strongly about this challenge and found spare time alongside other commitments to deliver a new version of PRONOM during 2025/26.

Rather than simply undertake a like-for-like monolithic refresh of the application, we wanted to bring both the architecture and principles up to date. The new PRONOM needed to be trivial to support and administer; have a minimal environmental footprint; be 100% open source; be intrinsically secure; support easy third-party access to the data; operate at minimal cost; enable community contributions to both the file-format data and the application; be deliverable with minimal resource; and facilitate agile development and continuous deployment to keep pace with rapidly changing requirements, all whilst remaining backwards-compatible with every version of DROID still in service!

Key aspects of PRONOM workflows include managing third-party contributions, complex regression testing, managing releases of signature sets, controlling user access for data modification, and maintaining audit histories. These functions are also critical to modern software development, so we decided to use existing software development tooling for the core functions of the new PRONOM.

Therefore, all these aspects of PRONOM are now handled on GitHub, the source code hosting platform. The PRONOM ‘database’ is no longer a complicated relational database, but simply a set of JSON files stored in GitHub at https://github.com/nationalarchives/pronom
User management is handled by GitHub; testing is fully automated; and releases of file format signature sets are managed by GitHub’s release process, which uses GitHub Actions for automated testing. It will be significantly easier for archivists to maintain the data within the new PRONOM and to deliver regular releases.
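
To give a flavour of the kind of automated check that can run against a repository of JSON files, here is a minimal sketch in Python. It is not PRONOM's actual test suite, and it assumes nothing about the layout or schema of the PRONOM data: the function name `find_invalid_json` is hypothetical, and the check only confirms that every `.json` file under a directory parses.

```python
import json
from pathlib import Path


def find_invalid_json(root):
    """Return (path, error message) pairs for .json files under root that fail to parse.

    Illustrative only: a real pipeline would also validate each record
    against the project's own schema and run signature regression tests.
    """
    failures = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            failures.append((str(path), str(exc)))
    return failures
```

A continuous integration job could call a function like this on a checkout of the repository and fail the build whenever the returned list is not empty.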

The new PRONOM website runs very efficiently using serverless cloud technologies. When no-one is accessing the site, it consumes zero energy at zero cost. There are no servers running to maintain PRONOM; resources are used only on demand. Due to heavy caching and pre-rendering of pages, the incremental cost from user access is minimal too. The site is built using Infrastructure-as-Code (IaC) technologies, which assist with business continuity and disaster recovery. Although data is hosted in GitHub, TNA do hold additional copies.

The new PRONOM is delivered initially as a Minimum Viable Product and intentionally does not replicate every feature of the previous system. However, there has been a heavy focus on user engagement and, because we use modern software development and deployment practices, features can be added and deployed quickly. Recent requests for feature enhancements were all delivered and deployed the same day. We have some enhancements designed but not yet deployed, for example around self-service user submission of signature information, but have launched intentionally with a minimum viable solution.

Currently the site is dual-running with the old PRONOM, but under the new address of https://pronom.nationalarchives.gov.uk, which promotes the system to a more prominent domain name within TNA. The previous address will be kept live for access by existing DROID applications, but will soon point to the new PRONOM once we are comfortable that all essential features are delivered by the new product. DROID users won’t notice any change.
The new PRONOM makes use of TNA’s highly-accessible and responsive web templates and we are pleased with the usability and accessibility improvements this brings over the previous system. A user survey is offered to all users to gather feedback for continuous improvement to the product.

Using serverless technologies brings a natural improvement in security posture because there is no need for operating-system patching and updates. In addition, the new application benefits from a range of security tools such as continuous code analysis, cloud security posture management, and cloud detection and response. Independent penetration testing was undertaken on the system before launch.

A significant issue with the previous PRONOM was that the underlying data was not programmatically accessible outside of TNA without scraping our website. As all data underpinning the new PRONOM is now publicly available via GitHub, anyone wanting to make use of this data can access it directly in whatever way they require.
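
As an illustration of direct access, the sketch below reads one JSON file from the public repository over HTTPS using only the Python standard library. The repository name comes from the address given above. The branch name `main`, the helper names `raw_url` and `load_json`, and any file path passed in are assumptions for illustration; browse the repository to find the real file layout.

```python
import json
from urllib.request import urlopen

REPO = "nationalarchives/pronom"  # repository named in this article


def raw_url(path, ref="main", repo=REPO):
    """Build a raw-content URL for a file in the repository.

    The default branch name "main" is an assumption; adjust as needed.
    """
    return f"https://raw.githubusercontent.com/{repo}/{ref}/{path.lstrip('/')}"


def load_json(path, ref="main", opener=urlopen):
    """Fetch a JSON file from the repository and parse it.

    `opener` defaults to urllib's urlopen and can be swapped for testing.
    """
    with opener(raw_url(path, ref)) as response:
        return json.loads(response.read().decode("utf-8"))
```

For bulk use, cloning the repository with Git is likely to be simpler than fetching files one at a time.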

We are all very pleased with the results of this work and feel that this sets PRONOM up for many further decades of giving value to the Digital Preservation Community and introduces a design pattern for how such registries can be sustainably maintained.

 



Votes must be cast online by 1200 (BST/UTC+1) on Monday 6th July.