PDF

Vulnerable

Documents in Portable Document Format (PDF; ISO 32000-1 and ISO 32000-2) and other data wrapped inside them, including all variants and versions, such as PDF/A.

Digital Species: Formats

Trend in 2022:

No change

Consensus Decision

Added to List: 2017

Trend in 2023:

No change

Previously: Vulnerable/Endangered

Imminence of Action

Action is recommended as required, with periodic review every five years.

Significance of Loss

The loss of tools, data or services within this group would impact on people and sectors around the world.

Effort to Preserve | Inevitability 

It would require a small effort to preserve materials in this group, requiring the application of proven tools and techniques.

Examples

Documents stored offline, or online in repositories or EDRMS, including reports, agendas, minutes, correspondence, contracts, essays, articles, or research papers; PDF versions 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 and 2.0; PDF/A, PDF/X and PDF/E.

‘Endangered’ in the Presence of Aggravating Conditions

Loss of context; loss of authenticity or integrity; external dependencies; poor storage or replication; lack of understanding; significant diversity of data; poorly developed digitization specifications; lack of integrity checking; poorly developed migration or normalization specifications; lack of virus control; lack of validation at the point of creation; encryption.
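One of the conditions listed above, lack of integrity checking, is commonly addressed by recording a checksum for each file and re-checking it periodically. The following is a minimal sketch of that idea in Python, not a prescribed workflow: the function names `sha256_of` and `verify_fixity`, the manifest layout (relative path mapped to a hex digest) and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare stored digests with freshly computed ones.

    `manifest` is a hypothetical mapping of relative path -> expected digest.
    Returns a mapping of relative path -> "ok", "changed" or "missing".
    """
    results = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            results[relative_path] = "missing"
        elif sha256_of(target) == expected:
            results[relative_path] = "ok"
        else:
            results[relative_path] = "changed"
    return results
```

A check like this only shows that the bytes have not changed since the digest was recorded. It says nothing about whether the file was a valid PDF in the first place.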

‘Lower Risk’ in the Presence of Good Practice

Well-managed data infrastructure; preservation planning; authenticity managed; use of persistent identifiers; reduction of dependencies; application of records management standards; recognition of preservation requirements beyond formats; strategic investment in digital preservation; preservation roadmap; participation in digital preservation community; format validation; version control.

2023 Review

The PDF entry was added in 2017 and was split into two entries, ‘PDF/A’ and ‘PDF other than PDF/A’, in 2019 to emphasize the different threats faced by different types of PDF.

The 2021 Jury agreed with this decision and noted that the trends for both the ‘PDF other than PDF/A’ entry and the ‘PDF/A’ entry were towards a reduced risk.

The 2023 Council recommended merging the two previously split entries (‘PDF/A’ and ‘PDF other than PDF/A’). After reviewing the two entries separately, the Council found more similarities than differences between them, and indeed across all types of PDF (not just PDF/A). Because of the range of commercial and open-source tools available to assist preservation, the risk of loss is less persistent than previously suggested. A Vulnerable classification is therefore appropriate for all PDF formats as a whole.

Additional Comments

There is a lot of material produced and kept in PDF. Some of it is authoritative, in other words, the only available copy, while some of it is not. If the only copy is lost, the loss can have an impact on a lot of people.

The challenge in evaluating the significance and impact of the loss of PDFs is that they are quite often a surrogate of something else, whether a digitized record, a Word document, or similar. Whether or not that original record is retained may be a factor. We should also be considering PDF Portfolios, which are an extension of PDF 1.7. Portfolios contain embedded files and can include text documents, spreadsheets, PowerPoints, emails, and Computer Aided Design (CAD) drawings.

Vulnerability also depends on whether the PDF file actually conforms to the PDF/A standard it claims. The risk arises from a combination of 1) files not conforming to the standard and 2) collection managers assuming that a file is resilient simply because it purports to be a PDF/A. This risk lies less with the format and more with understanding and experience in data management. Moreover, materials embedded in or attached to PDF/A-2 and PDF/A-3 files may be at risk.
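The gap between a file that claims to be PDF/A and one that conforms can be illustrated with a small triage sketch. It scans the raw bytes for the header version, for a `pdfaid:part` declaration in the XMP metadata, and for an `/Encrypt` entry. This is a heuristic under stated assumptions: the function name `inspect_pdf_bytes` and its result keys are invented here, and the byte scan can miss or misreport features in unusual files. It reports only what a file claims about itself; confirming conformance requires a dedicated PDF/A validator.

```python
import re

# The header normally appears at the very start of the file, e.g. "%PDF-1.7".
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")
# The XMP claim appears as <pdfaid:part>1</pdfaid:part> or pdfaid:part="1".
PDFA_PART_RE = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")


def inspect_pdf_bytes(data: bytes) -> dict:
    """Return what the file *claims*, based on a heuristic byte scan.

    This does not validate PDF/A conformance.
    """
    header = HEADER_RE.search(data[:1024])
    pdfa = PDFA_PART_RE.search(data)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "claims_pdfa_part": pdfa.group(1).decode("ascii") if pdfa else None,
        # Encryption is listed above as an aggravating condition.
        "mentions_encrypt": b"/Encrypt" in data,
    }
```

For example, `inspect_pdf_bytes(open("report.pdf", "rb").read())` (with `report.pdf` as a placeholder path) returns a dictionary whose `claims_pdfa_part` value is a prompt to run full validation, not evidence of resilience.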
