Monday, August 10, 2015

Why PDF/A validation matters, even if you don’t have PDF/A

Why PDF/A validation matters, even if you don’t have PDF/A. Johan van der Knijff. KB Research, National Library of the Netherlands. July 7, 2015.
     The PDF format has a number of features that don’t fit with the aims of long-term preservation and accessibility, such as encryption, password protection, external fonts and reliance on external software. One example is PDFs with QuickTime content: Acrobat cannot render this format natively and relies on an external player. Other examples are files that use Linux fonts and files with 3D content.
Institutions may want to check their PDF files for features like these. Reasons for doing this include:
  • Check compliance with institutional policy (e.g. do not accept PDFs with passwords)
  • Check collections for preservation risks (e.g. embedded multimedia content)
Several useful software tools are available, such as:
  • qpdf, which gives detailed information about encryption and password protection
  • The pdffonts tool, part of Xpdf, which is useful for checking whether the fonts in a PDF are embedded
  • The professional version of Adobe Acrobat, which has a PDF/A validator built into its Preflight tool
  • The PDF/A validator that is part of the open-source Apache PDFBox library
  • veraPDF, which has the potential to develop into a full-fledged PDF validator
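As an illustration of how one of these tools can be scripted, here is a minimal Python sketch that runs pdffonts and lists the fonts that are not embedded. It assumes pdffonts (from Xpdf or Poppler) is on the PATH, and that each data row of its output ends with the emb, sub and uni columns followed by the object and generation numbers, as in recent versions; check the output of your own version. The function names are hypothetical. Reading the emb column from the right keeps the parser working when font names are long or type names contain spaces (such as "Type 1").

```python
import subprocess
import sys
from typing import List, Tuple


def parse_pdffonts(output: str) -> List[Tuple[str, bool]]:
    """Return (font name, is_embedded) for each data row of pdffonts output.

    Assumption: every data row ends with the columns emb, sub, uni,
    object number and generation number, so 'emb' is the fifth token
    from the right.
    """
    lines = output.splitlines()
    # Data rows start right after the separator line made of dashes.
    sep = next((i for i, line in enumerate(lines) if line.startswith("---")), None)
    if sep is None:
        return []
    fonts = []
    for line in lines[sep + 1:]:
        tokens = line.split()
        if len(tokens) < 6:
            continue  # skip blank or unexpected lines
        fonts.append((tokens[0], tokens[-5] == "yes"))
    return fonts


def non_embedded_fonts(pdf_path: str) -> List[str]:
    """Run pdffonts on one file and return the names of non-embedded fonts."""
    result = subprocess.run(["pdffonts", pdf_path],
                            capture_output=True, text=True, check=True)
    return [name for name, embedded in parse_pdffonts(result.stdout)
            if not embedded]


if __name__ == "__main__":
    # Usage: python check_fonts.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        missing = non_embedded_fonts(path)
        status = "OK" if not missing else "not embedded: " + ", ".join(missing)
        print(f"{path}: {status}")
```

Keeping the parser separate from the subprocess call makes it easy to test against saved pdffonts output without the tool installed.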
"The PDF/A standards are nothing more than a set of profiles that impose some restrictions on a PDF, ruling out features that are not well-suited to long-term accessibility." Examples of such features are encryption, non-embedded fonts and multimedia content. Several tools exist that compare a PDF against PDF/A and report any deviations. These PDF/A validators are typically used to verify PDF/A files, but they can also detect user-specified risky features in regular PDFs, which makes it possible to evaluate PDFs automatically against a user-defined set of features. It is still important to check files in other ways, because a PDF may satisfy all requirements of PDF/A and still be broken.
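To make the idea of a user-defined feature set concrete, here is a minimal Python sketch. It assumes the validator's deviations have already been mapped to simple feature categories; the Deviation class, the category names and the PROFILE dictionary are hypothetical and do not reflect the actual output format of PDFBox, veraPDF or Acrobat. Deviations listed in the profile are sorted into "reject" (institutional policy) and "flag" (preservation risk); everything else, such as a hypothetical metadata deviation that an ordinary non-PDF/A file would trigger, is ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Deviation:
    """One PDF/A deviation reported by a validator (hypothetical structure)."""
    feature: str   # hypothetical category, e.g. "encryption"
    message: str


# Hypothetical user-defined profile: which deviations matter and what to do.
# Deviations whose feature is not listed here are ignored.
PROFILE: Dict[str, str] = {
    "encryption": "reject",          # institutional policy: no passwords
    "font-not-embedded": "flag",     # preservation risk
    "multimedia": "flag",            # preservation risk
}


def assess(deviations: Iterable[Deviation],
           profile: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort validator deviations into 'reject' and 'flag' buckets."""
    result: Dict[str, List[str]] = {"reject": [], "flag": []}
    for dev in deviations:
        action = profile.get(dev.feature)
        if action in result:
            result[action].append(dev.message)
    return result


if __name__ == "__main__":
    report = [
        Deviation("encryption", "Document is encrypted"),
        Deviation("metadata", "XMP metadata missing"),
        Deviation("font-not-embedded", "Font Helvetica is not embedded"),
    ]
    print(assess(report, PROFILE))
    # {'reject': ['Document is encrypted'], 'flag': ['Font Helvetica is not embedded']}
```

Keeping the profile as plain data means an institution can change what counts as a violation or a risk without touching the checking logic.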
