File Formats

File formats define how information is encoded in a digital file. File formats can be standardised, open, well documented and possibly associated with a reference implementation showing how software should interact with files of that format. But file formats are not always so clearly defined, and format specifications are not always closely followed by the software that implements them. Understanding file formats and how we interact with them in practice can therefore be critical to ensuring effective digital preservation. This page provides some guidance on the best sources of further information on file formats. For a broad introduction to file formats and digital preservation, see the DPC Handbook:

See also, the DPC Technology Watch Reports:

Understanding the broader challenges associated with file formats

A number of pieces of work have sought to develop methods of assessing the appropriateness of particular file formats for preservation, typically based on high-level criteria. This includes the now somewhat dated DPC Tech Watch report. More recent thinking has begun to move away from this approach, due to the need to base decisions on practical experience of working with file formats and software:

Precision and completeness are not qualities that can always be associated with file format specifications, and this problem lies at the root of many preservation challenges:

Examples from the Information Security community, while not typical of the preservation challenges we are likely to experience, illustrate the flexibility in many file format specifications:

File format identification

Applying a specialist software tool to identify the formats of files to be preserved is typically one of the first steps in a digital preservation workflow. Read more about file format identification here...
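As a rough illustration of what such a tool does, the sketch below matches the first bytes of a file against a handful of well-known "magic number" signatures. It is a minimal sketch only: the signature table and the function names `identify_bytes` and `identify_file` are hypothetical and chosen for this example. Established identification tools such as DROID and siegfried rely on far richer signature registries and matching rules than a simple prefix check.

```python
# Hypothetical, heavily abbreviated signature table: (leading bytes, label).
# Real identification tools use much larger registries and also match
# patterns at offsets other than the start of the file.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"PK\x03\x04", "ZIP container (also used by DOCX, ODT, EPUB)"),
]


def identify_bytes(data: bytes) -> str:
    """Return a label for the first signature that prefixes `data`."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))
```

A prefix match like this says nothing about whether the rest of the file is valid, and plain text files have no signature at all, which is part of why identification is harder in practice than the sketch suggests.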

Seeking reference information and guidance on specific formats

There are a number of excellent sources of information to assist digital preservationists. Wikipedia remains a good place to start for high-level information about a particular file format. The associated Wikidata is also the focus of the latest effort to build a collaborative registry of file format information.

A small number of libraries and archives have been developing their own preservation-focused assessments of particular file formats. These provide useful guidance on the risks associated with common file formats, and approaches for addressing them. They are located in different places on the web, but are linked from the home of a loose collaboration between these organisations on the DPC Wiki:

The Just Solve wiki provides a community driven site for gathering information about different file formats and is particularly good for discovering information on more obscure file formats:

What’s Up, (with Google) Docs? – The Challenge of Native Cloud Formats

Paul Young is Digital Preservation Specialist/Researcher at The National Archives UK. The Challenge: The National Archives has recently been looking at the issue of transferring material from departments with Google Workspace Environments (previously GSuite). The rise in cloud document management has brought new challenges which require changes to existing processes and methods. One of the biggest issues is dealing with the Google native cloud formats produced by the suite of...

Preserving the Mystery (Box)

Eleanor Dumbill is the Library Support Officer (Research Repository) at the University of Loughborough. Shortly after joining the repository team at Loughborough, I was presented with a mysterious cardboard box. My colleagues were only able to tell me that the contents were generally related to past doctoral theses and that the box had been sitting under a desk in the office for at least ten years. I’ve been able to use some of my experiences investigating this box in training as an...

Finding the Cutting Edge in Common Formats

Elizabeth Kata is Digital Archives Assistant at the International Atomic Energy Agency (IAEA). She attended iPres2019 with support from the DPC's Career Development Fund which is generously funded by DPC supporters. Placing a session with the title “Common Formats” under the theme “Cutting Edge” seemed at first contradictory as I looked over the iPRES 2019 program, but the four papers presented in this session demonstrated cutting edge work being done with and to...

Hyperreal Intangible Cultural Heritage: Digital Preservation of Dance

Anna Oates is Scholarly Communication and Discovery Services Librarian at Federal Reserve Bank of St. Louis in the USA, and former graduate student of the Illinois School of Information Sciences where these studies originated. A Roundabout Introduction to Digital Preservation of Dance: Navigating the PDF/A Standard. Four months after its initial submission, my master's thesis [1] appeared on the University of Illinois at Urbana-Champaign institutional repository. Since this...

The Pre-Digital Preservation Black Hole

Eng Sengsavang is Reference Archivist for United Nations Educational, Scientific and Cultural Organization (UNESCO) in Paris, France. My experience of technology is inevitably historical: as a child of the 80s, an adolescent of the 90s, and an adult into the 2000s, I experienced first-hand the paradigm shift from analogue to digital technologies,...

Extracting Information from 5.25” Floppy Disks – Historic Environment Scotland

Frederick Alexander is Digital Archivist at Historic Environment Scotland. The digital archive at Historic Environment Scotland comprises 42 terabytes of digital materials. This archive, alongside its physical counterpart, contains information relating to the historic environment of Scotland. Scotland’s historic environment is the physical evidence of past human activity, from a prehistoric fort, to a Victorian garden, to a drawing of a cityscape. In this blog post I want to give an...

Sharing format preservation information and how this will benefit us all

Jon Tilbury is CTO of Preservica, and is based in the UK. World Digital Preservation Day is all about the global community coming together to share ideas and collaborate. So how can we all work more closely on sharing format preservation information and what is the value of doing this?

PDF: you know she’s a little bit dangerous

Yvonne Tunnat is Digital Preservation Project Manager at Leibniz Information Centre for Economics in Germany. When you think about risky file formats, the PDF format is not the first one that springs to mind, is it? Instead, you might think of old word processing software for C64 or Amiga 500, when 'windows' were just some glass to look through. Or, a more recent typical risky file format scenario: the dozen flavours of exotic file formats your institutional scientists...

Creating SIPs without Breaking a Sweat: The Pre-ingest Tool and File Scraper

Heikki Helin is Senior Technology Coordinator for Digital Preservation Services at CSC - IT Center for Science Ltd in Espoo, Finland. The Finnish national digital preservation service, based on the OAIS reference model, has been in production since 2015. Providing services for preserving the cultural heritage and research data sectors, it is a service funded by the Ministry of Education and Culture of Finland. Currently, we have more than 1.3 million Archival Information Packages...

Ubiquity and the Floppy Disk: Challenges with Obsolete Carriers

Kevin Molloy is Manuscripts Collection Manager at State Library Victoria in Melbourne, Australia. Floppy disks have a remarkable technological provenance that dates from the late 1960s. Developing through many iterations, the standard 3½ inch disk, produced from 1981, had become largely ubiquitous by the 1990s as the go-to format for business, personal storage and transfer systems. Use of the 3½ floppy lasted until the mid-2000s, and, as a storage device, is found in many physical collections...

Motivation to Undertake File Format Identification Research for Plain Text Files

Dr Santhilata Kuppili Venkata is Digital Preservation Specialist / Researcher at The National Archives, UK. The file format identification problem has been of interest for quite some time in the areas of digital archiving and digital forensics. Many researchers are working to find a solution to this problem. While most of the work is done to identify files with binary file formats, not much work is found to identify the file type of plain text files. In this digital era, files are...

Undateables – methods for determining date ranges for born-digital documents when file system dates go bad

Paul Young is Digital Preservation Specialist/Researcher at the National Archives UK. What’s the problem? Determining reliable dates for digital records can be a source of frustration, especially when confronted with a large volume of digital files with dates that are obviously incorrect, such as why your Microsoft Word Document 1997 version dates from 1st January 1970. Dates are very important for The National Archives in particular as we look to transfer records from departments under...

DPA2018 Winners Webinars: EPISODE 3 - Navigating the PDF/A Standard

The Digital Preservation Awards 2018 (DPA2018) Winners Webinar Series provides an opportunity to learn more about some of the latest and best digital preservation initiatives, recently celebrated by the Digital Preservation Awards on World Digital Preservation Day 2018 in Amsterdam. Each episode explores the winning entry for each category of the Digital Preservation Awards, providing an overview of each initiative, investigating how their work might be used within the community, and...

DPA2018 Winners Webinars: EPISODE 2 - Archivists Guide to Kryoflux

The Digital Preservation Awards 2018 (DPA2018) Winners Webinar Series provides an opportunity to learn more about some of the latest and best digital preservation initiatives, recently celebrated by the Digital Preservation Awards on World Digital Preservation Day 2018 in Amsterdam. Each episode explores the winning entry for each category of the Digital Preservation Awards, providing an overview of each initiative, investigating how their work might be used within the community, and...

Cinderella's Stick - A Fairy Tale for Digital Preservation

Yannis Tzitzikas is Associate Professor of Information Systems in the Computer Science Department of the University of Crete and Affiliated Researcher in the Information Systems Lab at FORTH-ICS, Greece and Yannis Marketakis works as an R&D Engineer in the Information System Laboratory at FORTH-ICS. They are authors of Cinderella's Stick: A Fairytale for Digital Preservation. Once upon a time, a life changing opportunity is offered to Daphne (our modern-day Cinderella). An...

PDFS: When a Standard isn’t Standard in your Collections

Leslie Johnston is Director of Digital Preservation at the U.S. National Archives and Records Administration (NARA) in Washington DC, USA. In 2017 the DPC announced its “Bit List” of Digitally Endangered Species as a crowd-sourcing exercise to discover which digital materials our community thinks are most at risk, as well as those which are relatively safe thanks to digital preservation. The “Items of Concern” portion of the list included PDF, a format bearing some discussion...

The Archivist’s Guide to KryoFlux

Dorothy Waugh is Digital Archivist at Emory University in the USA. On this World Digital Preservation Day, we’re here to remind you of the humble floppy disk, last century’s save icon. Though limited in terms of capacity, these inexpensive and lightweight disks were the dominant storage device for three decades, as is evidenced by the boxes of floppy disks now found among the stacks of most archives. Among the disks at Emory University are files created by novelist and Pulitzer Prize...

Against the clock: videotape digitisation and preservation now!

Stephen McConnachie is Head of Data and Digital Preservation and Charles Fairall is Clifford Shaw Head of Conservation, at the BFI. In the late 1950s magnetic videotape recording transformed the way television programmes were made, edited and broadcast. For a generation, the 2” Quadruplex format dominated the UK broadcast industry – the machinery was manufactured to military specifications and some 60 years on, it is still just possible to replay the...

Preserving the past: the challenge of digital archiving within a Scottish Local Authority

Lorraine Murray is Archivist at Inverclyde Council in Scotland. During my masters course, where I studied Information Management, Digital Preservation and Archives at the Department of Information Studies (or as I knew it at the time; HATII!) at The University of Glasgow, I came to realise how useful it was to create and use digital content with the aim of making historical information and original source material more widely accessible. However, the successful curation, management...

Automation in digital preservation

Richard Lehane is an archives and recordkeeping consultant with Recordkeeping Innovation, Sydney, Australia. Next year he'll be joining the IAEA's Archives team in Vienna, Austria. When I find spare moments, I work on siegfried, a file format identification tool like DROID and fido. I've been tinkering on it now for over five years. Automation has been critical for me to sustain the project; otherwise I just wouldn’t be able to attend to all the things that need doing, besides improving the...
