File Formats

File formats define how information is encoded in a digital file. File formats can be standardised, open, well documented and possibly associated with a reference implementation showing how software should interact with files of that format. But file formats are not always so clearly defined, and format specifications are not always closely followed by the software that implements them. Understanding file formats, and how we interact with them in practice, can therefore be critical to ensuring effective digital preservation. This page provides guidance on the best sources of further information on file formats. For a broad introduction to file formats and digital preservation, see the DPC Handbook:

See also, the DPC Technology Watch Reports:

Understanding the broader challenges associated with file formats

A number of pieces of work have sought to develop methods of assessing the appropriateness of particular file formats for preservation, typically based on high-level criteria. This includes the now somewhat dated DPC Tech Watch report. More recent thinking has begun to move away from this approach, due to the need to base decisions on practical experience of working with file formats and software:

Precision and completeness are not qualities that can always be associated with file format specifications, and this problem lies at the root of many preservation challenges:

Examples from the Information Security community, while not typical of the preservation challenges we are likely to experience, illustrate the flexibility in many file format specifications:

File format identification

Applying a specialist software tool to identify the formats of files to be preserved is typically one of the first steps in a digital preservation workflow. Read more about File format identification here...
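As a rough illustration of the idea behind signature-based identification, the sketch below compares the first bytes of a file against a handful of well-known leading byte sequences. It is a minimal sketch only: the signature table, the function names and the 16-byte read length are assumptions made for this example, and it is not how any particular identification tool is implemented. Specialist tools rely on much larger, curated signature sets and more sophisticated matching, and because specifications are not always closely followed, a simple leading-bytes check like this one can fail to recognise files that software will still open.

```python
# Illustrative only: a tiny, hand-picked table of leading byte signatures.
# Real identification tools use far larger, curated signature registries.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify_bytes(header: bytes) -> str:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file and try to identify its format."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))
```

Calling identify_file on a path returns one of the names in the table, or "unknown" when no signature matches. A result of "unknown" says nothing about whether the file is damaged; it only means this small table does not cover it.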

Seeking reference information and guidance on specific formats

There are a number of excellent sources of information to assist digital preservationists. Wikipedia remains a good place to start for high-level information about a particular file format. The associated Wikidata is also the focus of the latest effort to build a collaborative registry of file format information.

A small number of libraries and archives have been developing their own preservation-focused assessments of particular file formats. These provide useful guidance on the risks associated with common file formats, and approaches for addressing them. They are located in different places on the web, but are linked from the home of a loose collaboration between these organisations on the DPC Wiki:

The Just Solve wiki provides a community-driven site for gathering information about different file formats and is particularly good for discovering information on more obscure file formats:

Child Tags

PDF, PDF/A, JPEG2000

Articles

DPC Reading Club: How the concept of AI technology impacts digital archival expertise

Today’s Reading Club session was a thought-provoking discussion inspired by an article from Amber Cushing and Giulia Osti in the Journal of Documentation - “So how do we balance all of these needs?”: how the concept of AI technology impacts digital archival expertise (https://doi.org/10.1108/JD-08-2022-0170). The article summarized the thoughts and expectations of a focus group of archival practitioners around Artificial Intelligence (AI) and the impact on expertise within the sector. After a...


File format recommendations - I wouldn’t say they are unacceptable, but I wouldn’t recommend them either

Last week I joined a webinar entitled “A Comparison of Recommended File Formats and the New Dutch Method for File Format Assessment”. It’s great to see the outcomes of this collaborative work, and it’s clear that it has already played an important role in bringing out some key themes in the preservation approaches of various organizations. But I felt that a number of aspects give cause for concern. The collation of file format policies has highlighted some approaches that I believe should be...


Title: Preservation Digitisation Project – Digitising the Tasmanian Archives audio visual collection

Karin Haveman is Acting Manager Government Archives and Preservation at the Tasmanian Archives and Digitisation Services Coordinator. In February 2021, Libraries Tasmania launched the Preservation Digitisation Project – a major collaborative project that brings together Digitisation Services, System Support and Delivery, Government Archives, and the Community Archives teams. The aim of this project is to digitise our Tasmanian film, sound, and video collections for long-term...


Digital preservation at the National Library of Australia

Libor Coufal is Assistant Director for Digital Preservation at the National Library of Australia. We are very mindful that it has been (not quite all, but mostly) quiet on the NLA communication front in the last several years, while we have busily worked on implementing our digital preservation program. Our attendance at this year’s iPres (our first since 2014) was a great opportunity to pause and reflect on the progress we have made. We would like to update the community on what we have...


How is DPC RAM being used?

Here are some examples of how DPC RAM has been used by members of the community to help benchmark their progress in digital preservation. If you have an example of DPC RAM in action that you would like to share, please contact us. Assessing where we are with digital preservation - a blog post from Fabiana Barticioti, Digital Assets Manager at LSE Library. From 'starting digital preservation' to 'business as usual' - a blog post from Anna McNally, Senior...


Digital Preservation of Community Archives: Breaking down barriers to digital preservation through training

Dr Deborah Thorpe is Education and Outreach Manager for the Digital Repository of Ireland. This autumn, the Digital Repository of Ireland (DRI) held an online introductory training programme in digital preservation for our members, with a particular focus on the training and community-building needs of community archivists. This course has been helping with breaking down barriers to digital preservation, by making topics such as appraising your digital collections for preservation,...


Understanding User Needs: Technology Watch Guidance Note on Access to digital collections available on general release

The DPC has released the next in its series of Technology Watch Guidance Notes on Access to digital collections. The new Guidance Note entitled Understanding User Needs by Sharon McMeekin is available to the digital preservation community from today. Understanding User Needs provides a pragmatic approach to conducting and interpreting a user needs analysis, whilst highlighting the importance and significance of the results.


Level up with DPC RAM

The DPC’s Rapid Assessment Model is a helpful tool for assessing an organization’s maturity with digital preservation. It allows you to consider both where you are currently and where you would like to be, and highlight gaps in your current digital preservation capacity. This resource is designed to help you work out how to address those gaps and move up the levels of RAM. For each of the 11 sections of RAM there are helpful tips, links to useful resources and case studies...

