File Formats

File formats define how information is encoded in a digital file. File formats can be standardised, open, well documented and possibly associated with a reference implementation showing how software should interact with files of that format. But file formats are not always so clearly defined, and format specifications are not always closely followed by the software that implements them. Understanding file formats, and how we interact with them in practice, can therefore be critical to ensuring effective digital preservation. This page provides some guidance on the best sources of further information on file formats. For a broad introduction to file formats and digital preservation, see the DPC Handbook:

See also the DPC Technology Watch Reports:

Understanding the broader challenges associated with file formats

A number of pieces of work have sought to develop methods of assessing the appropriateness of particular file formats for preservation, typically based on high-level criteria. This includes the now somewhat dated DPC Tech Watch report. More recent thinking has begun to move away from this approach, due to the need to base decisions on practical experience of working with file formats and software:

Precision and completeness are not qualities that can always be associated with file format specifications, and this problem lies at the root of many preservation challenges:

Examples from the Information Security community, while not typical of the preservation challenges we are likely to experience, illustrate the flexibility in many file format specifications:

File format identification

Applying a specialist software tool to identify the formats of files to be preserved is typically one of the first steps in a digital preservation workflow. Read more about File format identification here...
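As a rough illustration of what identification involves at its simplest, the sketch below compares the first bytes of a file against a handful of well-known signatures ("magic numbers"). It is a minimal example under stated assumptions, not a substitute for a specialist identification tool: the `SIGNATURES` table and the function names `identify_bytes` and `identify_file` are hypothetical and chosen for this page, and the table covers only a few formats. A match only suggests a format; it says nothing about whether the file is valid, and it cannot, for example, tell a PDF/A file from any other PDF, since both begin with the same header.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table for illustration only:
# (format label, byte offset, expected magic bytes).
SIGNATURES = [
    ("JPEG 2000 (JP2)", 0, b"\x00\x00\x00\x0cjP  \r\n\x87\n"),
    ("PDF", 0, b"%PDF-"),
    ("PNG", 0, b"\x89PNG\r\n\x1a\n"),
    ("TIFF (little-endian)", 0, b"II*\x00"),
    ("TIFF (big-endian)", 0, b"MM\x00*"),
    ("JPEG", 0, b"\xff\xd8\xff"),
]


def identify_bytes(data: bytes) -> Optional[str]:
    """Return the label of the first signature found in data, or None."""
    for label, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first 32 bytes of a file and try to identify its format."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(32))
```

Calling `identify_file` on a path returns one of the labels above, or `None` when no signature matches. Specialist tools go much further, using large signature registries, matching at several offsets and handling container formats.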

Seeking reference information and guidance on specific formats

There are a number of excellent sources of information to assist digital preservationists. Wikipedia remains a good place to start for high-level information about a particular file format. The associated Wikidata is also the focus of the latest effort to build a collaborative registry of file format information.

A small number of libraries and archives have been developing their own preservation-focused assessments of particular file formats. These provide useful guidance on the risks associated with common file formats, and approaches for addressing them. They are located in different places on the web, but are linked from the home of a loose collaboration between these organisations on the DPC Wiki:

The Just Solve wiki provides a community-driven site for gathering information about different file formats, and is particularly good for discovering information on more obscure file formats:

Child Tags

PDF, PDF/A, JPEG2000

Parent Tags

Issues

Articles

Current Trends and Future Directions for Digital Imaging in Libraries and Archives

Introduction Issues of validation, compression and preservation become more important in image management as collections grow in size and complexity. On the one hand, compression is seen as a necessary requirement to deal with the scale of the collection in order to make preservation a practical reality, but preservation advice generally discourages compression, which is seen as a preservation risk. Validation is essential for quality assurance in the development of large collections and is a...


Digital Preservation with Portable Documents: a workshop to introduce and discuss the PDF/A version

Introduction The portable document format (PDF) is ubiquitous, easily-produced and is widely used in a diverse range of environments.  A variant of the standard – PDF/A in which ‘A’ stands for archive – was published in 2005.  This version of the standard, also published as ISO 19005, minimises the dependencies between the contents of a file and the system on which it is rendered.  This self-contained characteristic of PDF/A makes it particularly attractive for those...


JPEG 2000 for the Practitioner

Introduction A free seminar to explore and examine the use of JPEG 2000 in the cultural heritage industry was held at the Wellcome Trust. The seminar included specific case studies of JPEG 2000 use. It examined technical issues that have an impact on practical implementation of the format, and explored the context of how and why organisations have chosen to use JPEG 2000. Although the seminar had an emphasis on digitisation and digital libraries, the papers are relevant to a range...


DPC/BL Joint JPEG 2000 Workshop

Introduction The JPEG2000 image compression technique has been cited by experts as a new archiving format for digital images. It is both a preservation and delivery format, and has been seen as a possible alternative to the TIFF format which most institutions use as a long-term archiving standard. Produced by both imaging experts and the Joint Photographic Experts Group, it is now a recognised ISO standard. The standard JPEG file format which is so widely in use is not yet an ISO...
