Beyond TIFF and JPEG2000: PDF/A as an OAIS Submission Information Package Container

PDF/A can be used as a file format, but it can also serve as a container for an OAIS Submission Information Package (SIP). According to the article, the PDF/A open standards can "simplify digitization process, reduce digitization cost, improve production substantially and build more confidence for preservation and access." PDF/A can also be used as an Archival Information Package (AIP) container.

The three main goals of PDF/A are to:
  • provide a way to present the appearance of documents independent of the tools and systems used
  • provide a framework for recording the context and history of electronic documents in the metadata
  • define a framework for representing the logical structure of electronic documents within conforming files

A typical SIP may consist of a directory containing the following information:
  • Content: 
    • Preservation master files (such as TIFF image files).
    • Access files (such as PDF, JPEG, or JPEG2000 files).
    • Other content (such as OCR data).
  • Preservation description: 
    • Preservation metadata in the TIFF header.
    • Other structural and technical metadata.
    • Checksum files.
  • Packaging information: 
    • Directory and File naming, structural metadata.
  • Descriptive information: 
    • Descriptive metadata saved in digital management system, catalog, or textual/XML files.
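
As an illustration only, the SIP layout above can be sketched in Python. The folder names (content/master, content/access, content/other, metadata), the function names, and the manifest file name are hypothetical choices, not taken from the article; the manifest format, one SHA-256 digest and relative path per line, loosely follows the BagIt convention for checksum files.

```python
import hashlib
from pathlib import Path

# Hypothetical folder names for the SIP parts listed above.
SIP_LAYOUT = {
    "content/master": "Preservation master files (e.g. TIFF)",
    "content/access": "Access files (e.g. PDF, JPEG, JPEG2000)",
    "content/other": "Other content (e.g. OCR data)",
    "metadata": "Structural, technical and descriptive metadata",
}


def create_sip_skeleton(root: Path) -> None:
    """Create the empty directory tree for one SIP."""
    for sub in SIP_LAYOUT:
        (root / sub).mkdir(parents=True, exist_ok=True)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large master files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, name: str = "manifest-sha256.txt") -> Path:
    """Write '<hex digest>  <relative path>' lines for every file in the SIP."""
    manifest = root / name
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself.
        if path.is_file() and path != manifest:
            rel = path.relative_to(root).as_posix()
            lines.append(f"{sha256_of(path)}  {rel}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

In use, one would call create_sip_skeleton(Path("sip_0001")), copy the master, access, OCR and metadata files into the matching folders, and then call write_manifest on the same root so that the checksums cover everything in the package.
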
"The key requirement of PDF/A is that it is self-described and self-contained so that it can be reproduced exactly the same way with different software in various platforms." A PDF/A file therefore embeds all the information needed to render its content (text, images, fonts, and color profiles).

Master file formats should be non-proprietary, open, documented international standards that are commonly used. Master files should be unencrypted and either uncompressed or losslessly compressed. The author recommends PDF/A as the preferred format for text and image files, possibly also serving as an OAIS SIP container, and argues that it is a better choice than the currently preferred TIFF and JPEG2000 formats.
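
Part of these master-file criteria can be screened in code. The sketch below is a heuristic under stated assumptions, not a format validator: it identifies TIFF, JPEG2000 (JP2) and PDF files by their leading signature bytes, and flags a PDF as possibly encrypted if the byte string /Encrypt appears anywhere in it. It does not check compression, and the function names are hypothetical.

```python
from pathlib import Path

# Leading "magic" bytes of the master/access formats discussed above.
# BigTIFF and raw JPEG2000 codestreams (.j2k) are not covered.
SIGNATURES = {
    b"II*\x00": "TIFF",  # little-endian TIFF
    b"MM\x00*": "TIFF",  # big-endian TIFF
    b"\x00\x00\x00\x0cjP  \r\n\x87\n": "JPEG2000",  # JP2 signature box
    b"%PDF-": "PDF",
}


def identify(data: bytes) -> str:
    """Return a format name based on the file's first bytes, or 'unknown'."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def looks_encrypted_pdf(data: bytes) -> bool:
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in the trailer.

    May give false positives if the literal text appears in uncompressed content.
    """
    return data.startswith(b"%PDF-") and b"/Encrypt" in data


def screen_master(path: Path) -> dict:
    """Hypothetical pre-ingest screen; reads the whole file into memory."""
    data = path.read_bytes()
    return {"format": identify(data), "possibly_encrypted": looks_encrypted_pdf(data)}
```

A real ingest workflow would pair a quick check like this with a proper format identification and validation tool rather than rely on signature bytes alone.
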

The article also notes several open issues with PDF/A naming and implementation. The most critical need is reliable open-source software for producing and validating PDF/A files.
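
Related to naming, a PDF/A file declares the part and conformance level it claims (for example PDF/A-1b) through the pdfaid:part and pdfaid:conformance properties in its XMP metadata. The sketch below only reads that claim with a regular expression over the raw bytes; it does not validate fonts, color profiles, or any other requirement, and it will find nothing if the metadata stream is compressed. The function name claimed_pdfa_level is hypothetical.

```python
import re
from typing import Optional

# Matches either attribute form (pdfaid:part="1") or element form
# (<pdfaid:part>1</pdfaid:part>); assumes the conventional "pdfaid" prefix.
_PART = re.compile(
    rb"pdfaid:part\s*=\s*[\"'](\d+)[\"']|<pdfaid:part>\s*(\d+)\s*</pdfaid:part>"
)
_CONFORMANCE = re.compile(
    rb"pdfaid:conformance\s*=\s*[\"']([A-Za-z])[\"']"
    rb"|<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>"
)


def _value(match: Optional[re.Match]) -> Optional[str]:
    """Return whichever alternative matched, decoded as ASCII."""
    if match is None:
        return None
    return (match.group(1) or match.group(2)).decode("ascii")


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the claimed level, e.g. 'PDF/A-1b', or None if no claim is found.

    This reads only the declaration in the XMP metadata; it does not validate.
    """
    part = _value(_PART.search(data))
    if part is None:
        return None
    conformance = _value(_CONFORMANCE.search(data))
    # PDF/A names write the conformance level in lower case (1b, 2u, 3a).
    return f"PDF/A-{part}{conformance.lower() if conformance else ''}"
```

For a file on disk, one would pass the raw bytes, for example claimed_pdfa_level(Path("scan.pdf").read_bytes()); a matching result only shows what the file claims to be, which is exactly why dedicated validation software matters.
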

