About

veraPDF is an open source conformance checker that validates all current parts and levels of the ISO 19005 (PDF/A) specifications. PDF/A is the archival document standard based on original PDF specifications. veraPDF is one of three conformance checkers developed with funding from the PREFORMA (PREservation FORMAts for culture information/e-archives) project.

The challenge

Organisations that have responsibilities for long-term preservation of, and access to, digital content face several challenges:

  • developing canonical, unambiguous interpretations of complex format specifications;
  • obtaining software that can validate files of particular formats according to the canonical specification; and
  • understanding the technical properties of files to inform long term preservation and access decisions.

veraPDF helps organisations to address these challenges. The conformance checker can be used to validate PDF/A files in different scenarios in digital preservation workflows: creation, ingest, digitisation, migration. The purpose of the software is to:

  • verify that a file has been produced according to the specifications of a standard file format;
  • verify that a file matches the acceptance criteria for long-term preservation;
  • report properties that deviate from the standard specification and acceptance criteria in human and machine readable formats; and
  • perform simple automated fixes for deviations in the metadata of the preservation file, leaving the original bitstream untouched.

Open source licensing

The PREFORMA project specified that the software must be made available under a dual licence (MPL v2+ / GPL v3+). The test datasets and documentation are licensed under CC BY 4.0. Using an open source approach means that anyone can download and use the software, modify it to meet their requirements, and provide feedback by reporting bugs or requesting new features. It also means that the code and documentation are publicly available and will remain sustainable beyond the life of the project or the involvement of any individual developer.

User community

One of the core goals was to establish an active user community. The draft functional and technical specifications for veraPDF were published openly for review by the community during the design phase. We make regular software releases and respond to user feedback. We also run webinars and host mailing lists to help encourage use of the software and answer questions.

Development

The veraPDF consortium developed several software components:

  • a general purpose software library for format validation;
  • a PDF/A validation model;
  • a PDF/A parser and conformance checker; and
  • command line, GUI and REST interfaces.

The validation software library is intended to provide a starting point for anyone who wants to create a validator for any file format. The veraPDF software takes a set of XML validation rules and applies them to a validation model. Both of these concepts, i.e. the validation rules and model, are format agnostic.
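The idea of applying format-agnostic rules to a format-agnostic model can be illustrated with a minimal sketch. This is not veraPDF's actual API: the names RuleEngineSketch, ModelObject and Rule are hypothetical, and Java lambdas stand in for the XML-encoded rule tests that the real software reads. It assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these types are part of the veraPDF code base.
public class RuleEngineSketch {

    /** A node in a format-agnostic model: a type name plus named properties. */
    record ModelObject(String type, Map<String, Object> properties) {}

    /** A rule targets one object type and carries a test that conforming objects pass. */
    record Rule(String id, String objectType, String description, Predicate<ModelObject> test) {}

    /** Returns one rule id for every (object, rule) pair where the rule applies and its test fails. */
    static List<String> validate(List<ModelObject> model, List<Rule> rules) {
        List<String> failures = new ArrayList<>();
        for (ModelObject obj : model) {
            for (Rule rule : rules) {
                boolean applies = rule.objectType().equals(obj.type());
                if (applies && !rule.test().test(obj)) {
                    failures.add(rule.id());
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Illustrative rule only; real PDF/A rules are derived from clauses of ISO 19005.
        Rule fontsEmbedded = new Rule("font-embedded", "Font", "Fonts must be embedded",
                o -> Boolean.TRUE.equals(o.properties().get("embedded")));

        List<ModelObject> model = List.of(
                new ModelObject("Font", Map.of("name", "Helvetica", "embedded", false)),
                new ModelObject("Font", Map.of("name", "DejaVuSans", "embedded", true)));

        System.out.println(validate(model, List.of(fontsEmbedded))); // prints [font-embedded]
    }
}
```

Nothing in validate refers to PDF: a validator for another format would supply its own model objects and rules and reuse the same loop, which is the sense in which both concepts are format agnostic.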

All veraPDF software is developed in Java so that it is cross-platform, meaning it can be installed on Windows, Mac, and Linux machines.

veraPDF offers full support for all PDF/A versions (1, 2, 3) and levels (A, B, U). It can also extract and report technical details from PDFs to support custom policy checks beyond the PDF/A specifications:

  • It produces an XML report on all metadata, resources, embedded files, pages, annotations, document security, etc.
  • It has an embedded facility to create and check against policies that can be developed using an XML schema.

One use case might be an archive that does not allow attachments in its PDF/A files. veraPDF can detect and analyse any embedded files, which are allowed in the PDF/A-3 specification.
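A "no attachments" policy of this kind amounts to a simple query over the XML report. The sketch below assumes a hypothetical report layout in which each attachment appears as an embeddedFile element. The element names and the class NoAttachmentsPolicySketch are illustrative only and do not reflect veraPDF's real report schema or its built-in policy facility.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: the report structure used here is an assumption.
public class NoAttachmentsPolicySketch {

    /** Counts embeddedFile elements anywhere in the (assumed) report XML. */
    static int countEmbeddedFiles(String reportXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(reportXml)));
            Double count = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(//embeddedFile)", doc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read report XML", e);
        }
    }

    /** The policy holds only when the report lists no embedded files. */
    static boolean satisfiesNoAttachmentsPolicy(String reportXml) {
        return countEmbeddedFiles(reportXml) == 0;
    }

    public static void main(String[] args) {
        String report = "<report><embeddedFiles>"
                + "<embeddedFile name=\"data.csv\"/>"
                + "</embeddedFiles></report>";
        System.out.println(satisfiesNoAttachmentsPolicy(report)); // prints false
    }
}
```

A file can therefore be valid PDF/A-3 and still be rejected by the archive, because the policy check is separate from conformance to the standard.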


Deliverables

veraPDF test corpus

To fully understand the PDF/A format specification we developed the veraPDF test corpus, a substantial body of open test data for the PDF/A specifications (Versions 1B, 1A, 2B, 2U, 2A, 3B, 3U, 3A) as well as a number of additional test files for ISO 32000-1. The test corpus complements the Isartor and Bavaria test suites and contains over 1,500 files.

Testing approach

We carefully examined each clause in the standards, and developed a formal grammar to describe the requirements in a machine-readable fashion. We then produced validation rules with an accompanying programmatic test for each requirement. PASS and FAIL corpus files were created to test the validator’s functionality. This process highlighted any misunderstandings on the veraPDF consortium’s part or revealed ambiguities in the standards. When there was an issue, we worked with the PDF Association’s PDF Validation Technical Working Group (TWG), analysing PDF validation issues as part of a transparent process.
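The PASS/FAIL approach can be sketched as a small harness that compares a validator's verdict with the outcome each corpus file is meant to produce. The sketch assumes the expected outcome is encoded in the file name as "-pass-" or "-fail-". The file names, the class CorpusCheckSketch and the stand-in validator are all hypothetical and are not taken from the veraPDF code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical sketch of checking a validator against PASS and FAIL corpus files.
public class CorpusCheckSketch {

    /** Reads the expected outcome from the file name (assumed naming convention). */
    static boolean expectedToPass(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.contains("-pass-")) {
            return true;
        }
        if (lower.contains("-fail-")) {
            return false;
        }
        throw new IllegalArgumentException("No expected outcome in file name: " + fileName);
    }

    /** Returns the files for which the validator's verdict differs from the expected outcome. */
    static List<String> findDisagreements(List<String> fileNames, Predicate<String> validator) {
        List<String> disagreements = new ArrayList<>();
        for (String name : fileNames) {
            if (validator.test(name) != expectedToPass(name)) {
                disagreements.add(name);
            }
        }
        return disagreements;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "6-1-2-t01-pass-a.pdf",
                "6-1-2-t01-fail-a.pdf",
                "6-1-3-t02-fail-a.pdf");

        // Stand-in validator: rejects one known-bad file and accepts everything else.
        Predicate<String> validator = name -> !name.startsWith("6-1-2-t01-fail");

        System.out.println(findDisagreements(corpus, validator)); // prints [6-1-3-t02-fail-a.pdf]
    }
}
```

Each disagreement is a prompt for investigation: either the validator misreads the clause, or the corpus file reflects an ambiguity in the standard.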

Although the ISO specifications could not be revised, the ambiguities resolved during the development of veraPDF were recorded as a PDF Association Technical Note to aid consistency in interpretation. PDF Association Technical Notes have a good track record of adoption by the industry. The development of veraPDF has also directly influenced the standardisation process, with several of the issues raised leading to enhancements in a forthcoming new part of PDF/A.

Consortium

Members of the veraPDF consortium are:

Open Preservation Foundation, PDF Association, Dual Lab, KEEP SOLUTIONS, Digital Preservation Coalition.

The veraPDF consortium’s unique partnership helps bridge a gap between cultural heritage organisations and industry with specialist expertise to create a product that meets the needs of both communities. veraPDF has been adopted by a wide range of memory institutions, digital preservation vendors, PDF industry vendors and beyond.

Post project

Funding from the PREFORMA project ended in August 2017. veraPDF is now sustained and maintained by the Open Preservation Foundation. Dual Lab provides active user support and carries out maintenance and bug fixes. The PDF Association’s PDF Validation Technical Working Group continues in its role of resolving ambiguities that arise and helping industry to adopt veraPDF standards.

