Friday, September 04, 2015

Preserving Documents Forever: When is a PDF not a PDF?

Preserving Documents Forever: When is a PDF not a PDF?  Digital Preservation Coalition. July 15, 2015.
     This was a briefing day on preserving PDF at Oxford University. Presentations include:
  • An introduction to PDF, Sarah Higgins, Aberystwyth University
    • Portable Document Format (PDF)
    • Developed to enable document sharing across platforms while retaining “look and feel”
    • Originally a proprietary format - Adobe Systems
    • Specification available free of charge from 1993
    • Became an open standard in 2008 as ISO 32000-1:2008 (PDF 1.7)
    • Many flavors: PDF/A, PDF/X, PDF/E, PDF/VT, PDF/UA
    • PDF/A is a subset for the long-term preservation of multimedia page documents that may contain a mixture of text, raster images and vector graphics. It is self-contained, robust and predictable, with no encryption, no interactivity and a limited color space
    • Flavors of PDF/A: PDF/A-1, PDF/A-2, PDF/A-3, each with different levels of conformance
    • A document is not the same as a record. A record:
      • Is authentic
      • Is reliable
      • Has integrity
      • Is usable
  • Understanding PDF risks in preservation, Johan van der Knijff, National Library of the Netherlands 
  • PDF: Myths vs facts, Ange Albertini, Corkami
    • Graphical fact sheet about PDF. Shows the structure
    • Many myths about PDF
    • Many possible malformations, each handled differently by each reader
    • It’s a complex patchwork!
    • PDF is very useful, but it has issues of many kinds to deal with. It is far from perfect.
    • What if Adobe stopped supporting PDF (like Flash) and we were just left with the specs?
  • Preserving PDF at the coalface, Tim Evans, Archaeology Data Service
    • PDF to PDF/A-1b conversions were problematic
    • A lot of the PDF problems can be fixed with manual intervention
    • Used PDF/A Manager, created by PDFTron, for batch processing and automated fix-ups, with an 80% success rate.
    • Still using a mixture of PDFTron and Preflight
    • Concern over incoming PDFs checked by DROID showing false positives
    • Third-party tools: CutePDF, OmniPage Capture SDK, Nitro PDF, PDFCreator, Acrobat PDFMaker for Word (v.8)
    • Current practice: use of PDF/A-1 and PDF/A-2; adopt a best fit of the two.
    • Use callas pdfToolbox
    • Now tied to a mixed economy of software and tools (some free, some commercial) to ensure consistent and accurate creation and validation.
  • Introducing veraPDF, Carl Wilson, Open Preservation Foundation
    • veraPDF is a project, a consortium, and a software product
    • Plan to produce a conformance checker 
    • Keep up with developments on GitHub: https://github.com/verapdf
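
As a rough illustration of the title question, here is a minimal Python sketch of the kind of cheap sniff test that can flag obviously suspect files before a real validator is run. It is not taken from any of the presentations. The function name `quick_pdf_check` and the returned keys are hypothetical. It assumes that lenient readers accept the `%PDF-` header anywhere in the first 1024 bytes, and that a PDF/A file declares its flavor in uncompressed XMP metadata through `pdfaid:part` and `pdfaid:conformance`. A declared flavor is only a claim and says nothing about actual conformance.

```python
import re


def quick_pdf_check(data: bytes) -> dict:
    """Cheap structural sniff test (hypothetical helper, not a validator)."""
    # Assumption: lenient readers tolerate junk before the header, as long as
    # "%PDF-x.y" appears somewhere in the first 1024 bytes.
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])

    # The end-of-file marker should appear close to the end of the file.
    has_eof = b"%%EOF" in data[-1024:]

    # PDF/A identification lives in XMP metadata, written either as an XML
    # attribute (pdfaid:part="1") or as an element (<pdfaid:part>1</...>).
    part = re.search(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)", data)
    level = re.search(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([ABUabu])", data)

    claim = None
    if part:
        claim = part.group(1).decode()
        if level:
            claim += level.group(1).decode().upper()

    return {
        "version": header.group(1).decode() if header else None,
        "has_eof": has_eof,
        "pdfa_claim": claim,  # what the file says about itself, unverified
    }


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"<pdfaid:part>1</pdfaid:part>"
        b"<pdfaid:conformance>B</pdfaid:conformance>\n"
        b"%%EOF\n"
    )
    print(quick_pdf_check(sample))
```

A file that passes this check may still be malformed or non-conformant, which is why a dedicated conformance checker is needed.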