Monday, June 29, 2015

SIRF: Self-contained Information Retention Format

SIRF: Self-contained Information Retention Format. Sam Fineberg, et al. SNIA Tutorial. 2015. [PDF]
Generating and collecting very large data sets that must be kept for long periods is a necessity for many organizations, including the sciences, archives, and commerce. The presentation describes the challenges of keeping data for the long term using Linear Tape File System (LTFS) technology and a Self-contained Information Retention Format (SIRF). The top external factors driving long-term retention requirements are legal risk, compliance regulations, business risk, and security risk.

What does long-term mean? Retention of 20 years or more was required by 70% of the respondents in a poll.
  • 100 years: 38.8%
  • 50-100 years: 18.3%
  • 21-50 years: 31.1%
  • 11-20 years: 15.7%
  • 7-10 years: 12.3%
  • 3-5 years: 1.9%
The need for digital preservation:
  • Regulatory compliance and legal issues
  • Emerging web services and applications
  • Many other fixed-content repositories (Scientific data, libraries, movies, music, etc.)
Stored data should remain accessible, undamaged, and usable for as long as desired, and at an affordable cost. What counts as affordable depends on the "perceived future value of information". There are also problems with verifying the correctness and authenticity of semantic information over time.
SIRF is the digital equivalent of a self-contained archival box. It contains:
  • a set of preservation objects and a catalog (logical or physical)
  • metadata about the contents and individual objects
  • self-describing, standard catalog information so that it can all be maintained
  • a "magic object" that identifies the container and version
The metadata contains basic information that can vary depending on the preservation needs. It allows a deeper description of the objects, along with the meaning of the content and the relationships between the objects.
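The container layout described above can be pictured with a small Python sketch. This is only an illustration, not the SIRF specification's actual schema: the class names, the field names, and the use of a SHA-256 digest as a fixity check are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MagicObject:
    # Identifies the container and the format version (hypothetical fields).
    container_id: str
    format_version: str


@dataclass
class CatalogEntry:
    # Per-object metadata. The digest is an assumption added for illustration,
    # so that an object can later be checked against what was stored.
    object_id: str
    description: str
    sha256: str
    related_to: List[str] = field(default_factory=list)


@dataclass
class Container:
    magic: MagicObject
    container_metadata: Dict[str, str]
    catalog: Dict[str, CatalogEntry] = field(default_factory=dict)
    objects: Dict[str, bytes] = field(default_factory=dict)

    def add(self, object_id, data, description, related_to=()):
        """Store a preservation object and record its catalog entry."""
        digest = hashlib.sha256(data).hexdigest()
        self.objects[object_id] = data
        self.catalog[object_id] = CatalogEntry(
            object_id, description, digest, list(related_to)
        )

    def verify(self, object_id):
        """Return True if the stored bytes still match the cataloged digest."""
        digest = hashlib.sha256(self.objects[object_id]).hexdigest()
        return digest == self.catalog[object_id].sha256


# Usage: build a container, then confirm that an object is unchanged.
box = Container(MagicObject("example-box", "1.0"), {"owner": "example archive"})
box.add("scan-001", b"raw image bytes", "Scanned page 1")
box.add("scan-002", b"more image bytes", "Scanned page 2", related_to=["scan-001"])
unchanged = box.verify("scan-001")
```

Keeping the catalog inside the container, next to the objects it describes, is what makes the box self-contained in this sketch: everything needed to list, relate, and check the objects travels with them.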

When preserving objects, we need to keep all the information to make them fully usable in the future. No single technology will be "usable over the time-spans mandated by current digital preservation needs". LTFS technologies are "good for perhaps 10-20 years".
