Friday, September 22, 2017

Electronic Records Task Force Phase 2 Final Report

Electronic Records Task Force Phase 2 Final Report. John Butler, et al. University of Minnesota. August 23, 2017. [PDF, 68 pp.]
     The University of Minnesota Libraries sponsored an Electronic Records Task Force to monitor established workflows and to develop new workflows, policies, procedures and mechanisms for processing and providing access to electronic records. The task force focused on developing processing activities, best practices and guidelines. Creating finding aids, which are published online through ArchivesSpace, is the first step in providing access to electronic records. The long-term preservation of electronic records is a concern, and this effort continues to be a work in progress. To keep up with the influx of electronic records, the Electronic Records Task Force provides the following recommendations:
  1. Staffing: Hire a permanent full time employee to work exclusively with electronic records
  2. Long-term Management: Create an Electronic Records Management Group to address ongoing electronic records needs
  3. Preservation: Review current workflows and long-term management requirements to address immediate and long-term solutions for file backup, recovery, and preservation according to policies and standards
  4. Security: Conduct a thorough review of security requirements
  5. Equipment: Establish initial and ongoing financial support for hardware, software and collections
  6. Access to Materials: Explore options for providing access to electronic records, including both access and preservation of these materials.
Project Tasks and Deliverables
  1. Develop Workflows for Processing Ingested Collections
  2. Define Processing Levels (minimal, intermediate, full)
  3. Develop Access Methods that Address End-user Needs, Copyright, Data Privacy and other Information Security Requirements
  4. Monitor Ingest Workflows and adjust as necessary
Additional notes:
  • "In the long-term, a full-time dedicated staff person is the most responsible approach to working effectively and efficiently, to achieve quality work, and to maintain our leadership role in the field of electronic records management. This is arguably the only way to address the ingest and processing activities that assist with long-term access to and preservation of electronic materials. Without a dedicated person who has an in-depth understanding of evolving workflows and protocols and who can provide a consistent approach with curatorial staff, any headway in addressing the records being collected will be made slowly."
  • The goal of processing unique electronic archival material is to make it available to end users, whether they be skilled researchers or a high school student working on a project.
  • Given divergent requirements, a singular asset management, backup, and preservation solution may not be a feasible goal in either the near or long term. However, efforts can be made to establish a limited number of processes to manage the vast majority of preservation use cases.

Friday, September 15, 2017

Preservation with PDF/A

Preservation with PDF/A (2nd Edition). Betsy A Fanning. DPC Technology Watch Report 17-01. July 2017. [PDF 34pp.]  [Link updated]
     This report is an updated edition of the original Technology Watch Report 08-02, Preserving the Data Explosion: Using PDF (Fanning, 2008). It looks at PDF/Archive as a digital document file format for long-term preservation. The PDF/A versions of the PDF format have been developed as a family of open ISO Standards to address preservation of PDF files by removing features that pose preservation risks. It is important for preservation purposes to know how closely a file conforms to the requirements defined in the standard. There are preservation risks that may exist in the standard PDF file format:
  • any file type can be embedded;
  • the primary document can be conformant as a static document, but the embedded files may not be static;
  • embedded files may be infected by computer viruses;
  • embedded files may have extended metadata requirements, may introduce unexpected dependencies or be subject to format obsolescence;
  • embedded files may complicate matters relating to information security, data protection or the management of intellectual property rights.
By restricting some risk features and thus reducing preservation risks, the PDF/A format seeks to maximize:
  • device independence;
  • self-containment;
  • self-documentation.
Some reasons why an organization might use PDF/A to preserve its digital documents include:
  • its standardized format for storing digital documents for long periods of time;
  • it allows for digitally signed documents using the very latest digital signature software;
  • it reliably displays special characters for mathematics and languages since all are embedded within the file;
  • it displays correctly on any device as the author intended, including the reading order;
  • platform independence;
  • provision of fully searchable documents through Optical Character Recognition.
History and Features of PDF and PDF/A. The Standard was drafted in multiple parts in order to make it easier to implement. "Unfortunately, the committee’s philosophy of multiple parts resulted in confusion in the market place, making it more difficult for users to select the optimum file format." Users may need to do a file format assessment based on their requirements to help them decide which PDF/A Standard to implement.

Metadata helps to manage a file effectively throughout its life cycle, as well as assisting in document discovery searches. "Establishing a long-term digital document preservation system requires careful consideration of the metadata that will be needed to locate and render documents years from now." Collecting metadata for PDF/A documents is optional in the standard, except for the identifier, which is generated when the PDF/A file is created. Preservation metadata should:
  • be appropriate to the materials;
  • support interoperability;
  • use standardized controlled vocabulary;
  • include clear statements on the conditions and terms of use;
  • be authoritative and verifiable;
  • support the long-term management of the document.
Just because a file purports to be a PDF/A does not necessarily mean that it is. Format validation of a file can increase confidence that a viewer will be able to render the file correctly. A number of PDF/A validators are available. The development work on the PDF Standards is a continuing effort. There are additional preservation challenges in the format that are in the process of being addressed.
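
To illustrate the difference between what a file claims and what it is, here is a minimal Python sketch that only reads the PDF/A level a file declares in its XMP metadata (the `pdfaid:part` and `pdfaid:conformance` properties). It assumes the XMP packet is stored uncompressed in the file, and the function name `declared_pdfa_level` is made up for this example. It is not a validator: a file can carry this declaration and still violate the standard, which is why a real validation tool is still needed.

```python
import re
from typing import Optional

# XMP can serialize a property either as an attribute (pdfaid:part="1")
# or as an element (<pdfaid:part>1</pdfaid:part>); match both forms.
_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\']|>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance\s*(?:=\s*["\']|>)\s*([A-Za-z])')


def declared_pdfa_level(data: bytes) -> Optional[str]:
    """Return the PDF/A level the file claims (e.g. "1b"), or None.

    This reports the claim only; it does not check conformance.
    """
    part = _PART.search(data)
    if part is None:
        return None  # no PDF/A declaration found in the raw bytes
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return level
```

A workflow might call `declared_pdfa_level(open(path, "rb").read())` to decide which validation profile to request, and then record both the declared level and the validator's verdict.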

The report lists some recommendations, which are directed at groups that use the standard. They include:
  • For those evaluating PDF/A as a digital preservation solution:
    • Before adopting PDF/A as a preservation solution it is "essential to understand the organizational requirements and how PDF/A will support" the organization's needs.
    • PDF/A is not a preservation solution on its own but a part of a wider preservation strategy, and it must be consistent with other components of the preservation infrastructure, such as backups, integrity checks and documentation.
    • Different versions of PDF/A have different purposes, with different capabilities as well as different preservation risks. These should be understood and decisions should be documented and explained.
    • Different vendors offer different tools to manage PDF/A that should be compared against your requirements.
  • For organizations collecting and preserving digital data:
    • While it may not be possible to control or restrict how documents are produced, it may be useful to give document creators guidance on what is desired.
    • Embed PDF/A validation tools into preservation workflows and record the results to help manage the digital preservation risks associated with PDF/A files received.

Wednesday, September 13, 2017

Self-preservation: The Gibraltar National Archives uses cloud to safeguard its history

Self-preservation: The Gibraltar National Archives uses cloud to safeguard its history. Caroline Donnelly. ComputerWeekly. 13 September 2017.
     Many enterprises are familiar with the concept of retaining corporate data as part of their regulatory and compliance obligations. But some fail to understand that the data must be kept accessible. "While regulatory compliance is the key reason why many enterprises embark on this process in the corporate world, for the Gibraltar National Archives (GNA), digital preservation is an essential part of ensuring the annals of its cultural heritage and democratic history are safeguarded forever." After a long process of digitizing historical content, the archives realized that digitizing content is not the same as preserving it. "The risk was we could have spent all this time and money doing digitisation only to lose [this information] a few years down the line because it is not preserved correctly." Digital preservation is about:
  • actively managing the file formats
  • ensuring they remain readable in future
  • being proactive and managing the content
Just as it is important to be able to prove the provenance of physical records, the fixity of the digital documents needs to be maintained. "People often ask me when our digital preservation project will be finished. I tell them never, because every day we are collecting records. Every day we are archiving unique material from newspapers to government records all for generations to come."
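
The article does not say how the GNA checks fixity. As a general illustration, a common approach is to record a checksum when a file is ingested and recompute it later to confirm the file has not changed. The Python sketch below uses SHA-256; the function names `sha256_of` and `fixity_ok` are hypothetical and chosen only for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded: str) -> bool:
    """Return True if the file still matches the checksum recorded at ingest."""
    return sha256_of(path) == recorded.strip().lower()
```

In practice the recorded checksum would be stored alongside the file's other preservation metadata, and the check would be rerun on a schedule and after every transfer or migration.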