Digital Preservation Matters - 14 March 2008

The Diverse and Exploding Digital Universe. An Updated Forecast of Worldwide Information Growth Through 2011. John F. Gantz. IDC. Whitepaper. March 2008. [PDF]

The paper, produced by IDC and sponsored by EMC, estimates the size of the world's "digital universe". It puts the digital information created in 2007 at 281 exabytes and expects that figure to grow tenfold by 2011. In 2007 the amount of information created exceeded the total storage capacity available. Not everything is stored, and not everything needs to be kept. To deal with this volume of information, institutions must:

  • Involve more than just IT in the processes; it is not just a technical problem.
  • Develop policies for the creation, storage and retention of the material.
  • Develop new tools and standards to handle large volumes of digital materials.
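Taking the summary's figures at face value, and assuming the tenfold increase is measured from the 2007 estimate (the baseline year is an assumption here), the implied compound annual growth rate can be worked out with a short Python sketch. The function name is illustrative only.

```python
def implied_annual_growth(start: float, multiple: float, years: int) -> tuple[float, float]:
    """Return (end_value, compound annual growth rate) for growth by `multiple` over `years`."""
    end_value = start * multiple
    # Compound rate r satisfies (1 + r) ** years == multiple.
    rate = multiple ** (1.0 / years) - 1.0
    return end_value, rate

# Figures from the summary: 281 EB in 2007, roughly tenfold by 2011 (assumed 4-year span).
end_eb, cagr = implied_annual_growth(281.0, 10.0, 2011 - 2007)
print(f"{end_eb:.0f} EB by 2011, about {cagr:.1%} per year")
```

Under that assumption, the forecast works out to roughly 2,810 EB by 2011, which means growth of close to 78% a year, sustained for four years.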

A simple email with a 1MB attachment may take up as much as 50MB of storage once all the replications and backups are counted. The ‘digital shadow’ of a person, meaning the information about them, is larger than the data they create themselves. Most of the growth in information is visual content such as images and video.
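The whitepaper's 50x figure is not broken down here, so the Python sketch below only shows one hypothetical way such a multiplier can arise. The mailbox count, replica count and number of retained backup generations are invented for illustration and are not taken from the paper.

```python
def stored_footprint_mb(attachment_mb: float, mailboxes: int,
                        live_copies: int, backup_generations: int) -> float:
    """Total storage for one attachment kept in every mailbox, its replicas and its backups."""
    # Each mailbox holds the attachment in its live copies plus every retained backup.
    copies_per_mailbox = live_copies + backup_generations
    return attachment_mb * mailboxes * copies_per_mailbox

# Hypothetical numbers: sender plus four recipients, each mailbox held on a
# primary and one replica, with eight backup generations retained.
print(stored_footprint_mb(1.0, mailboxes=5, live_copies=2, backup_generations=8))  # 50.0
```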

Long-term preservation costs - some figures. Available Online. Website. March 10, 2008.

The Archaeology Data Service has published revised costs for various digital preservation tasks. The service charges depositors a one-time preservation fee when items are ingested into the system, because “asking researchers who work on fixed term projects to pay annual costs for storage is just not feasible.” The costs increase with the size and complexity of the data. In the example given, a deposit of 2000 images, the preservation charge would be about 3.5% of the total project funds. The full charging policy is available at the Archaeology Data Service website.
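The ADS charging formula itself is not given here. However, the reasoning behind replacing annual fees with a single up-front charge can be illustrated with a standard present-value (annuity) calculation. The Python sketch below is a generic technique and is not presented as the ADS method. It computes the single sum equivalent to paying a fixed annual storage cost at the end of each year for a given number of years. The cost, period and discount rate are hypothetical.

```python
def upfront_equivalent(annual_cost: float, years: int, discount_rate: float) -> float:
    """Present value of paying `annual_cost` at the end of each year for `years` years."""
    if discount_rate == 0:
        # No discounting: the up-front sum is simply the total of the payments.
        return annual_cost * years
    # Ordinary annuity formula: C * (1 - (1 + r) ** -n) / r
    return annual_cost * (1 - (1 + discount_rate) ** -years) / discount_rate

# Hypothetical figures: 100 per year for 10 years, discounted at 5% a year.
print(round(upfront_equivalent(100.0, 10, 0.05), 2))  # 772.17
```

A fee computed this way can be paid from a fixed-term project's budget before the project ends, which is the practical point the ADS quote makes.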

Digital Library Federation Panel discusses Moving Image Preservation. Library of Congress. Website. March 2008.

The Library has made available a presentation by Carl Fleischhauer on preserving moving images. It covers encoding, wrappers, formats, metadata and more. Some projects save the video signal without compression, which can take 70–100 GB per hour. Others use lossless or lossy compression. The most common lossy compression is MPEG-2, at about 28 GB per hour. The Material Exchange Format (MXF) is one wrapper being developed, and the PBCore specification is one example of the video metadata work under way. The presentation concludes: “work is under way but there is still plenty to do!” The slides and notes are available in PDF format at Video Formatting and Preservation DLF Presentation.
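The 70–100 GB per hour figure for uncompressed video follows from frame size, sampling and frame rate. The Python sketch below assumes standard-definition NTSC video (720 × 486 pixels at 30000/1001 frames per second) with 4:2:2 chroma sampling at 8 or 10 bits per sample, picture only. These parameters are assumptions and do not come from the presentation. The sketch also converts the 28 GB per hour quoted for MPEG-2 back into an average bit rate. Sizes are in decimal gigabytes.

```python
NTSC_FPS = 30000 / 1001  # about 29.97 frames per second (assumed SD NTSC)

def gb_per_hour_uncompressed(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Decimal gigabytes for one hour of uncompressed video (picture only, no audio)."""
    bits_per_second = width * height * bits_per_pixel * fps
    return bits_per_second / 8 * 3600 / 1e9

def implied_mbit_per_second(gb_per_hour: float) -> float:
    """Average bit rate, in Mbit/s, of a stream that fills `gb_per_hour` decimal GB per hour."""
    return gb_per_hour * 1e9 * 8 / 3600 / 1e6

# 4:2:2 sampling averages 2 samples per pixel: 16 bits at 8-bit depth, 20 bits at 10-bit.
for depth, bpp in (("8-bit", 16), ("10-bit", 20)):
    print(depth, round(gb_per_hour_uncompressed(720, 486, bpp, NTSC_FPS), 1), "GB/hour")
print("28 GB/hour is roughly", round(implied_mbit_per_second(28)), "Mbit/s")
```

Under these assumptions, 8-bit video comes to about 75.5 GB per hour and 10-bit to about 94.4 GB per hour. Both figures fall inside the quoted range. The MPEG-2 figure corresponds to an average of roughly 62 Mbit/s.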

Sun and Fedora Introduce a Petabyte-scale Object Store. Fedora Commons News. February 20, 2008. [PDF]

A six-page brochure describes a system that combines Fedora with Sun storage. The integration, developed with the help of researchers at Johns Hopkins University, connects the Fedora repository software to the Sun storage system. Oxford University Library Services is deploying Fedora and Sun as part of a repository framework for all the digital content in its libraries. “…Sun is virtually guaranteeing that there will be no format obsolescence.”

