This report is an updated edition of the original Technology Watch Report 08-02, Preserving the Data Explosion: Using PDF (Fanning, 2008). It looks at PDF/Archive (PDF/A) as a digital document file format for long-term preservation. The PDF/A versions of the PDF format have been developed as a family of open ISO Standards to address preservation of PDF files by removing features that pose preservation risks. It is important for preservation purposes to know how closely a file conforms to the requirements defined in the standard. There are preservation risks that may exist in the standard PDF file format:
- any file type can be embedded;
- the primary document can be conformant as a static document, but the embedded files may not be static;
- embedded files may be infected by computer viruses;
- embedded files may have extended metadata requirements, may introduce unexpected dependencies or be subject to format obsolescence;
- embedded files may complicate matters relating to information security, data protection or the management of intellectual property rights.
PDF/A is built around three core characteristics:
- device independence;
- self-containment;
- self-documentation.
The advantages of PDF/A include:
- it is a standardized format for storing digital documents for long periods of time;
- it allows for digitally signed documents using the very latest digital signature software;
- it reliably displays special characters for mathematics and languages since all are embedded within the file;
- it displays correctly on any device as the author intended, including the reading order;
- it is platform independent;
- it provides fully searchable documents through Optical Character Recognition.
Metadata helps to manage a file effectively throughout its life cycle and assists in document discovery searches. "Establishing a long-term digital document preservation system requires careful consideration of the metadata that will be needed to locate and render documents years from now." Collecting metadata for PDF/A documents is optional in the standard, except for the identifier, which is generated when the PDF/A file is created. Preservation metadata should:
- be appropriate to the materials;
- support interoperability;
- use standardized controlled vocabulary;
- include clear statements on the conditions and terms of use;
- be authoritative and verifiable;
- support the long-term management of the document.
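As a rough illustration of how some of the criteria above could be checked automatically, here is a minimal Python sketch. It is not taken from the report: the `PreservationRecord` class, its field names and the `SUBJECT_VOCABULARY` set are all hypothetical, and a real archive would substitute its own schema and controlled vocabulary.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical controlled vocabulary; a real archive would maintain its own.
SUBJECT_VOCABULARY = {"report", "correspondence", "minutes", "dataset"}


@dataclass
class PreservationRecord:
    """Illustrative metadata record; field names are assumptions."""
    identifier: str
    title: str
    terms_of_use: str
    source: str = ""
    subjects: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty means none found."""
        issues = []
        if not self.identifier.strip():
            issues.append("missing identifier")
        if not self.terms_of_use.strip():
            issues.append("missing terms of use")
        if not self.source.strip():
            issues.append("missing source, so the record cannot be verified")
        for subject in self.subjects:
            if subject not in SUBJECT_VOCABULARY:
                issues.append(f"subject not in controlled vocabulary: {subject}")
        return issues
```

Calling `problems()` on a record with an empty `terms_of_use` or an unlisted subject returns one message per issue, which could be logged alongside the document at ingest.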
The report lists some recommendations, which are directed at groups that use the standard. They include:
- For those evaluating PDF/A as a digital preservation solution:
  - Before adopting PDF/A as a preservation solution it is "essential to understand the organizational requirements and how PDF/A will support" the organization's needs.
  - PDF/A is not a preservation solution on its own. It is part of a wider preservation strategy and must be consistent with other components of the preservation infrastructure, such as backups, integrity checks and documentation.
  - Different versions of PDF/A have different purposes, with different capabilities as well as different preservation risks. These should be understood, and decisions should be documented and explained.
  - Different vendors offer different tools to manage PDF/A, and these should be compared against your requirements.
- For organizations collecting and preserving digital data:
  - While it may not be possible to control or restrict how documents are produced, it may be useful to give document creators guidance on what is desired.
  - Embed PDF/A validation tools into preservation workflows and record the results to help manage the digital preservation risks associated with PDF/A files received.
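The last recommendation, recording results as files are received, might look something like the following Python sketch. It is an illustration under stated assumptions, not a validator: it only reads the conformance level a file claims in its XMP metadata (the `pdfaid:part` and `pdfaid:conformance` properties) and assumes that metadata is stored uncompressed. Whether the file actually conforms still has to be established by a dedicated PDF/A validation tool. The function names and the record layout are hypothetical.

```python
import hashlib
import re
from datetime import datetime, timezone
from typing import Optional

# Match the XMP PDF/A identification in element form (<pdfaid:part>1)
# or attribute form (pdfaid:part="1").
PART_RE = re.compile(rb'pdfaid:part(?:>|=["\'])\s*(\d)')
CONF_RE = re.compile(rb'pdfaid:conformance(?:>|=["\'])\s*([A-Za-z])')


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the claimed level, e.g. 'PDF/A-1b', or None if no claim is found."""
    part = PART_RE.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = CONF_RE.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return "PDF/A-" + level


def intake_record(name: str, data: bytes) -> dict:
    """Build a record to store alongside the file when it is received."""
    return {
        "file": name,
        "sha256": hashlib.sha256(data).hexdigest(),  # supports later integrity checks
        "claimed_conformance": claimed_pdfa_level(data),  # a claim, not a validation result
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
```

In a workflow, the output of a real validator would be added to the same record, so that the claimed level, the validated level and the checksum are kept together for each file.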