File Formats

File formats define how information is encoded in a digital file. File formats can be standardised, open, well documented and possibly associated with a reference implementation showing how software should interact with files of that format. But file formats are not always so clearly defined, and format specifications are not always closely followed by the software that implements them. Understanding file formats and how we interact with them in practice can therefore be critical to ensuring effective digital preservation. This page points to the best sources of further information on file formats. For a broad introduction to file formats and digital preservation, see the DPC Handbook:

See also, the DPC Technology Watch Reports:

Understanding the broader challenges associated with file formats

A number of pieces of work have sought to develop methods of assessing the appropriateness of particular file formats for preservation, typically based on high-level criteria. This includes the now somewhat dated DPC Tech Watch report. More recent thinking has begun to move away from this approach, recognising the need to base decisions on practical experience of working with file formats and software:

Precision and completeness are not qualities that can always be associated with file format specifications, and this problem lies at the root of many preservation challenges:

Examples from the Information Security community, while not typical of the preservation challenges we are likely to experience, illustrate the flexibility in many file format specifications:

File format identification

Applying a specialist software tool to identify the formats of files to be preserved is typically one of the first steps in a digital preservation workflow. Read more about File format identification here...
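To give a feel for what such a tool does at its simplest, the sketch below matches the opening bytes of a file against a handful of well-known "magic number" signatures. It is only an illustration under stated assumptions: the `SIGNATURES` table and the `identify_bytes` and `identify_file` names are hypothetical and written for this example, and the table covers a few formats only. Specialist identification tools such as DROID, siegfried and fido rely on far richer signature sets and matching rules, so this should not be read as how any of them is implemented.

```python
from typing import Optional

# Illustrative signature table (an assumption for this sketch, not a registry):
# each entry is (format label, byte offset, magic bytes expected at that offset).
SIGNATURES = [
    ("PDF", 0, b"%PDF-"),
    ("PNG", 0, b"\x89PNG\r\n\x1a\n"),
    ("JPEG", 0, b"\xff\xd8\xff"),
    ("JPEG 2000 (JP2)", 0, b"\x00\x00\x00\x0cjP  \r\n\x87\n"),
    ("GIF", 0, b"GIF8"),
    ("ZIP container", 0, b"PK\x03\x04"),
]


def identify_bytes(header: bytes) -> Optional[str]:
    """Return the label of the first signature that matches, or None."""
    for label, offset, magic in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None


def identify_file(path: str, read_size: int = 64) -> Optional[str]:
    """Read the first few bytes of a file and try to identify its format."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(read_size))
```

A match on the opening bytes only suggests a format. It does not show that the file follows the format's specification, and a ZIP signature alone cannot distinguish the many formats that are packaged as ZIP containers, which is one reason specialist tools go further than this.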

Seeking reference information and guidance on specific formats

There are a number of excellent sources of information to assist digital preservationists. Wikipedia remains a good place to start for high-level information about a particular file format. The associated Wikidata is also the focus of the latest effort to build a collaborative registry of file format information.

A small number of libraries and archives have been developing their own preservation-focused assessments of particular file formats. These provide useful guidance on the risks associated with common file formats, and approaches for addressing them. They are located in different places on the web, but are linked from the home of a loose collaboration between these organisations on the DPC Wiki:

The Just Solve wiki is a community-driven site for gathering information about different file formats and is particularly good for discovering information on more obscure file formats:

Child Tags

PDF, PDF/A, JPEG2000

Parent Tags

Issues

Articles

Undateables – methods for determining date ranges for born-digital documents when file system dates go bad

Paul Young is Digital Preservation Specialist/Researcher at the National Archives UK. What’s the problem? Determining reliable dates for digital records can be a source of frustration, especially when confronted with a large volume of digital files with dates that are obviously incorrect, such as why your Microsoft Word Document 1997 version dates from 1st January 1970. Dates are very important for The National Archives in particular as we look to transfer records from departments under...


DPA2018 Winners Webinars: EPISODE 3 - Navigating the PDF/A Standard

The Digital Preservation Awards 2018 (DPA2018) Winners Webinar Series provides an opportunity to learn more about some of the latest and best digital preservation initiatives, recently celebrated by the Digital Preservation Awards on World Digital Preservation Day 2018 in Amsterdam. Each episode explores the winning entry for each category of the Digital Preservation Awards, providing an overview of each initiative, investigating how their work might be used within the community, and...


DPA2018 Winners Webinars: EPISODE 2 - Archivists Guide to Kryoflux

The Digital Preservation Awards 2018 (DPA2018) Winners Webinar Series provides an opportunity to learn more about some of the latest and best digital preservation initiatives, recently celebrated by the Digital Preservation Awards on World Digital Preservation Day 2018 in Amsterdam. Each episode explores the winning entry for each category of the Digital Preservation Awards, providing an overview of each initiative, investigating how their work might be used within the community, and...


Cinderella's Stick - A Fairy Tale for Digital Preservation

Yannis Tzitzikas is Associate Professor of Information Systems in the Computer Science Department of the University of Crete and Affiliated Researcher in the Information Systems Lab at FORTH-ICS, Greece, and Yannis Marketakis works as an R&D Engineer in the Information System Laboratory at FORTH-ICS. They are authors of Cinderella's Stick: A Fairytale for Digital Preservation. Once upon a time, a life changing opportunity is offered to Daphne (our modern-day Cinderella). An...


PDFS: When a Standard isn’t Standard in your Collections

Leslie Johnston is Director of Digital Preservation at the U.S. National Archives and Records Administration (NARA) in Washington DC, USA. In 2017 the DPC announced its “Bit List” of Digitally Endangered Species as a crowd-sourcing exercise to discover which digital materials our community thinks are most at risk, as well as those which are relatively safe thanks to digital preservation. The “Items of Concern” portion of the list included PDF, a format bearing some discussion...


The Archivist’s Guide to KryoFlux

Dorothy Waugh is Digital Archivist at Emory University in the USA. On this World Digital Preservation Day, we’re here to remind you of the humble floppy disk, last century’s save icon. Though limited in terms of capacity, these inexpensive and lightweight disks were the dominant storage device for three decades, as is evidenced by the boxes of floppy disks now found among the stacks of most archives. Among the disks at Emory University are files created by novelist and Pulitzer Prize...


Against the clock: videotape digitisation and preservation now!

Stephen McConnachie is Head of Data and Digital Preservation and Charles Fairall is Clifford Shaw Head of Conservation, at the BFI. In the late 1950s magnetic videotape recording transformed the way television programmes were made, edited and broadcast. For a generation, the 2” Quadruplex format dominated the UK broadcast industry – the machinery was manufactured to military specifications and some 60 years on, it is still just possible to replay the...


Preserving the past: the challenge of digital archiving within a Scottish Local Authority

Lorraine Murray is Archivist at Inverclyde Council in Scotland. During my masters course where I studied Information Management, Digital Preservation and Archives at the Department of Information Studies (or as I knew it at the time; HATII!) at The University of Glasgow, I came to realise how useful it was to create and use digital content with the aim of making historical information and original source material more widely accessible. However, the successful curation, management...


Automation in digital preservation

Richard Lehane is an archives and recordkeeping consultant with Recordkeeping Innovation, Sydney, Australia. Next year he'll be joining the IAEA's Archives team in Vienna, Austria. When I find spare moments, I work on siegfried, a file format identification tool like DROID and fido. I've been tinkering on it now for over five years. Automation has been critical for me to sustain the project; otherwise I just wouldn’t be able to attend to all the things that need doing, besides improving the...


Digital Preservation Topical Notes Series

This series of topical notes aims to address key issues of digital preservation for a non-specialist audience. Starting with "What is Digital Preservation?", the notes examine why digital preservation is important while giving an overview of the steps needed to maintain access to digital information. Written specifically with record creators in mind, the notes also provide simple guidance on how they can ensure their digital records are preservation ready. The development of this series of...


Plans are my reality

Yvonne Tunnat is Preservation Manager at the ZBW Leibniz Information Centre for Economics. I was fresh from university when I started my job as a preservation manager in October 2011 at the ZBW. Having taken a module named “Digital Preservation” during my studies of library and information science and after a 9-week-internship at the Digital Preservation Department of the University of Utah, I obviously was the best they could find for the job, although I knew next...


DAM and LAM - towards convergence

Helen Hockx-Yu is Program Manager, Digital Product Access and Dissemination in the Office of Information Technologies for University of Notre Dame, Indiana, USA. INTRODUCTION: Digital media are frequently produced and widely used at the University of Notre Dame (UND) to support education and research, and to document campus activities and athletic competitions. UND’s media products range from photographs and simple sound or video capture to sophisticated footage appropriate for...


The Data Vanishes

It’s time to come clean: I no longer know what data is. I am looking pretty hard but I just can’t see it any more. It’s a troubling realisation for someone who has spent twenty years or so trying to preserve the stuff. But the most unsettling part is this: I don’t think it’s me who is lost. Don’t get me wrong, this is not some delayed attack of post-modern angst. I am just trying to get to the end of the day. Is it possible that, just as it was reaching a crescendo of profile, polemic and...


New 'Preservation with PDF/A’ Technology Watch Report released to DPC members

The Digital Preservation Coalition (DPC) and Charles Beagrie Ltd are pleased to provide DPC members with a preview of the latest in the series of Technology Watch Reports. Preservation with PDF/A by Betsy A. Fanning (AIIM) provides a comprehensive review of the standard and its use, in order to help readers best ensure the integrity of their digital information. The member preview provides an opportunity for readers from member organisations to provide peer feedback and commentary. Please...


JP2 and colour profile limitations: a positive conclusion and findings

At the National Library of Scotland we were finding that some colour profiles in tif files were causing errors when we tried to convert them to JP2. So a couple of weeks ago I posted a question to the DPC-Discussion list asking members for their advice on JP2 files and colour profile support. In the end we had some great help from Johan van der Knijff, from the Koninklijke Bibliotheek (KB) in the Netherlands, who has given me permission to use a distillation of our email exchange...


Quantitative File Formats for Preservation

Last month I emailed William Kilbride at DPC with a query about file formats for quantitative data for long term preservation and, as a result of that email and the ensuing conversation, I appear to have agreed to write a blog post about the topic. Here is that blog post.


PDF/Eh? redux: putting veraPDF into practice. Or how I rediscovered my inner geek

Ancient history: how we got here. Way back in 2013 the DPC collaborated with the OPF on a project called SPRUCE. Following on from the success of another little project called AQUA, and with some very handy funding from the Jisc, we ran a bunch of mashup events and got hands on with all sorts of digital preservation challenges. The management of PDF files, and particularly risk assessment, was a recurring theme. In response, the SPRUCE project held a hackathon in Leeds where...


VeraPDF

The veraPDF consortium will deliver a definitive validator for PDF/A: an authoritative corpus of test files establishing the objective frame of reference for validation of all parts and conformance levels of PDF/A, and an open-source, purpose-built validator and policy checker to implement the collecting policies of memory institutions. We expect a vibrant community will develop to sustain these efforts. The veraPDF consortium brings together a unique network of stakeholders with...


Re:Format - What is file format obsolescence and does it really exist?

Digital preservation literature identifies file format obsolescence as one of the main threats, if not *the* threat, to the longevity of our digital data. Files must be migrated or emulated as they become obsolete, to ensure that they can still be rendered and used in the future. As Jeff Rothenberg famously put it at the end of the 1990s: "digital information lasts forever—or five years, whichever comes first". More recently however, the community has grown more sceptical. Luminaries such as...


Preserving Documents Forever: When is a PDF not a PDF?

Presentations: An introduction to PDF, Sarah Higgins, Aberystwyth University; Understanding PDF risks in preservation, Johan van der Knijff, National Library of the Netherlands; PDF: Myths vs facts, Ange Albertini, Corkami; Preserving PDF at the coalface, Tim Evans, Archaeology Data Service; Introducing veraPDF, Carl Wilson, Open Preservation Foundation. The Digital Preservation Coalition and the Open Preservation Foundation, with support from the European Commission and the...
