Friday, March 28, 2008

Digital Preservation Matters - 28 March 2008

Standards and Requirements for Digital Continuity in UK Government. Digital Continuity Project. UK National Archives. 14 March 2008. [PDF]

This is a draft of standards developed by the National Archives to help assess commercially available digital preservation solutions. It describes, in checklist form, what a digital preservation system should do. The principal standards for digital continuity are defined in the OAIS model and the Trustworthy Repositories Audit & Certification (TRAC) Checklist. Further information on their Digital Continuity project, including a brochure, is also worth reading. It is estimated that 10 per cent of the Canadian Government’s electronic records are already unreadable. Some of their areas of work are:


A Possible Way Forward For Developing Cornell’s OAIS Infrastructure. Adam Smith. Blog. March 25, 2008.

A programmer looks at Cornell’s long-term digital preservation project. In building the system, the team originally used an object-oriented approach, but encountered scaling issues with both processing speed and memory usage. The post addresses topics such as:

  • preserving “virtual” objects, which represent virtual relationships to other objects.
  • the two broad sets of preservation-processing tasks before ingest:
    1. normalizing the data
    2. gathering information to make a METS XML file
  • adopting a functional paradigm instead of an object-oriented (OOP) paradigm
  • making the specification of collection-specific tasks as declarative, or configuration-oriented, as possible.
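The post contains no code, but the idea of declaring collection-specific processing as configuration and running it through small, side-effect-free functions can be sketched in Python. This is a minimal illustration under stated assumptions, not Cornell’s design: the step names, the dictionary-based “package”, and the COLLECTION_CONFIG table are all hypothetical, and the METS step only gathers fields rather than writing METS XML.

```python
from functools import reduce
from typing import Callable, Dict, List

# A "package" is a plain dict passed from step to step (hypothetical structure).
Package = Dict[str, object]
Step = Callable[[Package], Package]


def normalize_filenames(pkg: Package) -> Package:
    """Hypothetical normalization step: lower-case file names, returning a new dict."""
    files = [str(name).lower() for name in pkg["files"]]
    return {**pkg, "files": files}


def collect_mets_fields(pkg: Package) -> Package:
    """Hypothetical metadata step: gather fields destined for a METS file (no XML here)."""
    mets = {"objid": pkg["id"], "file_count": len(pkg["files"])}
    return {**pkg, "mets": mets}


# Registry of reusable steps; none of them mutates its input.
STEPS: Dict[str, Step] = {
    "normalize_filenames": normalize_filenames,
    "collect_mets_fields": collect_mets_fields,
}

# Collection-specific behaviour is declared as data, not as new classes.
COLLECTION_CONFIG: Dict[str, List[str]] = {
    "example_collection": ["normalize_filenames", "collect_mets_fields"],
}


def run_pipeline(collection: str, pkg: Package) -> Package:
    """Look up the declared steps for a collection and apply them in order."""
    steps = [STEPS[name] for name in COLLECTION_CONFIG[collection]]
    return reduce(lambda acc, step: step(acc), steps, pkg)
```

With this layout, supporting a new collection would mean adding a line to the configuration table, for example run_pipeline("example_collection", {"id": "obj-1", "files": ["A.TIF", "B.TIF"]}), rather than writing a new class hierarchy.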


FACET: The Field Audio Collection Evaluation Tool. Mike Casey. Indiana University. 21 March 2008.

The Field Audio Collection Evaluation Tool (FACET) is an open-source tool for ranking field audio collections by preservation condition, including the level of deterioration they exhibit and the degree of risk they carry. It also helps assess the characteristics, preservation problems, and deterioration of various tape formats. The package includes the software, a manual, format information, and worksheets.


On the Road With Fedora and Atos Origin in Paris. Carol Minton Morris. Fedora HatCheck Newsletter. March 12, 2008.

The Bibliothèque nationale de France has contracted with an information technology services company to create a Fedora-based repository system. The library has chosen to base the repository on the OAIS model and the Fedora architecture.


Kofax® Wins $2.1 Million Contract with National Archives and Records Administration. Press Release. Business Wire. March 19, 2008.

Kofax will provide NARA’s Federal Records Centers with an enterprise-level solution for capturing and processing documents. This is part of an initiative to “create and provide electronic records for preservation and use by the government and citizens”.


Web Curator Tool Project: 1.3.0 Released. Sourceforge. March 3, 2008.

The Web Curator Tool manages the web harvesting process. It was designed by the National Library of New Zealand and the British Library. The tool supports the selection, harvesting and quality assessment of online information, whether entire websites or portions of them. Its workflow helps with the various tasks involved in the process, such as permissions, description, scope, and deposit. The latest version is now available for download.


Evaluating File Formats for Long Term Preservation. Judith Rog, Caroline van Wijk. National Library of the Netherlands. February 2008. [PDF]

Most documents deposited in the Koninklijke Bibliotheek have been in PDF format. Because other formats need to be handled as well, the library has developed a quantifiable file format risk assessment method that can be used to define strategies for specific file formats. The file format chosen when an object is created can influence its long-term accessibility. The method uses seven weighted criteria for file formats: Openness, Adoption, Complexity, Technical Protection Mechanism, Self-documentation, Robustness, and Dependencies. The library gives recommendations but does not restrict deposits to specific file formats. One partner does not consider PDF/A suitable for archiving. Web archiving, with its wide variety of format types, presents the biggest challenge.
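To make the idea of weighted criteria concrete, here is a minimal Python sketch of a weighted scoring function. The seven criteria come from the report; the weights, the 0 to 2 scoring scale, and the function name format_score are hypothetical placeholders, not the values the Koninklijke Bibliotheek actually uses.

```python
from typing import Dict

# The seven criteria named in the KB report.
CRITERIA = (
    "openness",
    "adoption",
    "complexity",
    "technical_protection_mechanism",
    "self_documentation",
    "robustness",
    "dependencies",
)

# HYPOTHETICAL weights for illustration; the KB method defines its own weighting.
WEIGHTS: Dict[str, int] = {
    "openness": 7,
    "adoption": 6,
    "complexity": 5,
    "technical_protection_mechanism": 4,
    "self_documentation": 3,
    "robustness": 2,
    "dependencies": 1,
}

# HYPOTHETICAL scale: 0 = high risk, 2 = low risk on that criterion.
MAX_SCORE = 2


def format_score(scores: Dict[str, int]) -> float:
    """Return a weighted score in [0, 1]; higher means lower preservation risk."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for criterion in CRITERIA:
        if not 0 <= scores[criterion] <= MAX_SCORE:
            raise ValueError(f"{criterion} score out of range")
    total = sum(WEIGHTS[c] * scores[c] for c in CRITERIA)
    best_possible = MAX_SCORE * sum(WEIGHTS.values())
    return total / best_possible
```

For example, a format scoring 2 on openness and 0 on everything else gets 14 / 56 = 0.25 under these made-up weights. Normalizing to the best possible total keeps scores comparable if the weights are later changed.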


Friday, March 21, 2008

Digital Preservation Matters - 21 March 2008

The Fifth Blackbird: Some Thoughts on Economically Sustainable Digital Preservation. Brian F. Lavoie. D-Lib Magazine. March 2008.

The article looks at digital preservation as an economically sustainable activity, an area in which there has been little progress. There has been little discussion or systematic analysis of how to sustain preservation once the current funding ends. A task force and website have been created to examine these issues. While some contend that the answer is a Simple Matter Of Resources, the author feels the principles are not yet known: “we have not yet established a systematic mapping between general economic models of resource provision and particular digital preservation contexts.” The task force will study the issue for two years and hopes to make the choice between different economic models a little clearer. It is unlikely that most institutions will be able to develop their own preservation capability, but it is likely that a network of preservation repositories will emerge. “The ease with which we create information in digital form tends to obscure the true cost of maintaining it over long periods of time. It has almost become a truism to say that our capacity to produce digital materials far exceeds our capacity to maintain them over time.”

Rethinking Personal Digital Archiving, Part 1: Four Challenges from the Field. Catherine C. Marshall. D-Lib Magazine. March 2008.

Technical discussions about digital archiving are usually based on two assumptions:

1- preservation will rely on the ability to render digital objects in the future

2- trusted repositories will be used to store and exchange these digital objects

Will this really address the needs of consumers? Individuals rarely think that their own stuff needs preserving, or they may use various methods to put the objects in a ‘safe’ place. Benign neglect is the most common attitude. “Digital belongings are ultimately stored according to what people are planning to do with them….” The same characteristics that make digital assets attractive also make digital stewardship more difficult.

Rethinking Personal Digital Archiving, Part 2: Implications for Services, Applications, and Institutions. Catherine C. Marshall. D-Lib Magazine. March 2008.

Archiving services and applications must be able to assess value in a way that makes intuitive sense to individuals in the future. Because digital assets are often scattered across different locations, creating a union catalog of them is an urgent first need. Many assume that digital stewardship simply means storing the data once and viewing it sometime in the future. In fact, regular maintenance is needed, such as checking and refreshing media, migrating files to better formats, and checking for viruses and malware. “It is more important to know what we have and where we've put it than it is to centralize all of our stuff into a single repository.”
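One of the maintenance tasks mentioned, checking that stored files have not silently changed, is commonly done with checksums (a “fixity check”). The Python sketch below is an illustration, not something described in the article: it records a SHA-256 digest for every file under a directory and later reports files that are missing or whose contents differ. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Report files that have disappeared or whose contents no longer match."""
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(path) != expected:
            problems.append(f"changed: {rel}")
    return problems
```

A real workflow would keep the manifest apart from the files and run the check on a schedule. Note that this sketch does not flag files added after the manifest was built.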

IT is Not Responsible for Records Retention. Brian D. Jaffe. eWeek Midmarket. March 10, 2008.

Deciding which records to keep on regulatory grounds and working out how to save everything in case of disaster call for different skill sets. A backup strategy is not the same as a record retention policy, and a retention policy is not a recovery mechanism, so both are needed. IT should be responsible for backup but not for the content; taking on the content would mean knowing what the documents contain, how long they should be kept, and what to do with them at the end of that period. Saving everything is not an option, either technologically or legally. IT is responsible for keeping the data safe, while the users own the data and should decide what to do with it. A written policy is needed to define the different roles and requirements. “Data is the most valuable asset that IT is responsible for, but it is a responsibility that can't be borne by IT alone.”

Do you really know where your e-mail slept last night? E-Mail Compliance – What does it mean? Andy Whitaker. IT Security. March 4, 2008.

It is important to verify the integrity of the organization’s communications and to provide auditability, especially given data protection laws. With the increased emphasis on electronic records, organizations need to know what information is captured, whether it has been altered, whether it can be retrieved, and whether they can show who has access to the archives.
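One common way to make an archive tamper-evident, so that alterations can be detected, is a hash chain, in which each log entry’s digest also covers the previous entry’s digest. The article does not prescribe this technique; the sketch below is a hypothetical Python illustration with made-up record fields, showing how editing an earlier entry causes verification to fail.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: Dict[str, str]) -> str:
    """Digest covering the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log: List[dict], record: Dict[str, str]) -> None:
    """Add a record, linking it to the hash of the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify_log(log: List[dict]) -> bool:
    """Recompute every link; an edited or reordered entry, or one removed
    from the middle, breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

A hash chain only shows that something changed. Anyone able to rewrite the whole log can also recompute the chain, and dropping the newest entries is not detected on its own, so in practice the latest hash would be stored or published somewhere the archive’s administrators cannot alter.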

Friday, March 14, 2008

Digital Preservation Matters - 14 March 2008

The Diverse and Exploding Digital Universe. An Updated Forecast of Worldwide Information Growth Through 2011. John F. Gantz. IDG. Whitepaper. March 2008. [PDF]

The paper, produced by IDG and sponsored by EMC, estimates the amount of digital information in the world. It puts the 2007 total at 281 exabytes, a figure expected to increase tenfold by 2011. The amount of information created exceeded the total storage capacity available, although not everything is stored, and not everything needs to be kept. To deal with this volume of information, institutions must:

  • Include more than just IT in the processes; it is not just a technical problem
  • Develop policies for the creation, storage and retention of the material
  • Develop new tools and standards to handle large volumes of digital materials

A simple email with a 1MB attachment may take up as much as 50MB of storage once all the replications and backups are counted. The ‘digital shadow’ of a person, the information about them, is larger than the data they create themselves. Most of the growth in information is visual in nature.

Long-term preservation costs - some figures. Available Online. Website. March 10, 2008.

The Archaeology Data Service has published revised costs for various digital preservation tasks. The service charges depositors a one-time preservation fee when items are ingested into the system, because “asking researchers who work on fixed term projects to pay annual costs for storage is just not feasible.” Costs increase with the size and complexity of the data. In the example given, preserving 2,000 images would cost about 3.5% of the project’s total funds. The full charging policy is available on the Archaeology Data Service website.

Digital Library Federation Panel discusses Moving Image Preservation. Library of Congress. Website. March 2008.

The Library has made available a presentation by Carl Fleischhauer on preserving moving images, covering encoding, wrappers, formats, metadata, and more. Some projects save the video signal without compression, which can take 70 to 100 GB per hour. Others use lossless or lossy compression; the most frequently used lossy compression is MPEG-2, at about 28 GB per hour. The Material Exchange Format (MXF) is one wrapper being developed, and the PBCore specification is one example of the video metadata work under way. The presentation ends with: “work is under way but there is still plenty to do!” The slides and notes are available in PDF format at Video Formatting and Preservation DLF Presentation.
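The storage figures quoted are easier to compare as bit rates. A small Python sketch, assuming decimal units (1 GB = 10^9 bytes) and treating each figure as an average over the whole hour; the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def gb_per_hour_to_mbit_per_s(gb_per_hour: float) -> float:
    """Convert a storage rate in decimal GB/hour to an average bit rate in Mbit/s."""
    bits_per_hour = gb_per_hour * 1e9 * 8
    return bits_per_hour / SECONDS_PER_HOUR / 1e6


def storage_tb(hours: float, gb_per_hour: float) -> float:
    """Total storage in decimal terabytes for a given number of hours of video."""
    return hours * gb_per_hour / 1000


if __name__ == "__main__":
    rates = [("uncompressed, low", 70), ("uncompressed, high", 100), ("MPEG-2", 28)]
    for label, rate in rates:
        print(f"{label}: {gb_per_hour_to_mbit_per_s(rate):.1f} Mbit/s, "
              f"{storage_tb(1000, rate):.0f} TB per 1,000 hours")
```

On these assumptions, MPEG-2 at 28 GB per hour averages roughly 62 Mbit/s, while 1,000 hours of uncompressed video would need about 70 to 100 TB.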

Sun and Fedora Introduce a Petabyte-scale Object Store. Fedora Commons News. February 20, 2008. [PDF]

A six-page brochure describes a system that integrates the Fedora software with the Sun storage system; the integration was done with the help of researchers at Johns Hopkins University. Oxford University Services are in the process of deploying Fedora and Sun as part of a repository framework for all digital content in their libraries. “…Sun is virtually guaranteeing that there will be no format obsolescence.”

Friday, March 07, 2008

Digital Preservation Matters - 07 March 2008

National Archives Chooses Digital Vision to Automate Film and Video Restoration. Press Release. March 5, 2008.

NARA has selected Digital Vision to digitize and restore some of their 700,000 titles, in order to preserve them and make them available for public access. The solution, which requires minimal operator intervention, includes high-speed 4K and 2K scanning and telecine systems. Film and video footage will be color corrected, and the restoration will also include sound normalization and scratch and dust removal.

Scoping study for a registry of electronic journals that indicates where they are archived. JISC. 14 January 2008. [PDF]

This study examines the scope and feasibility of a registry for archived e-journals. There is no single view of what constitutes a registry; many feel it should be a place where information is gathered, audited, and then made available to local databases. Some ask why they should pay for the services when others benefit without paying. A registry should record where an item is archived and how to access it. The study identified ten basic characteristics of digital preservation repositories:

  1. The repository commits to ongoing maintenance of digital objects for their communities.
  2. Demonstrates organizational ability to fulfill its commitment.
  3. Acquires and maintains contractual and legal rights and fulfills responsibilities.
  4. Has an effective and efficient policy framework.
  5. Acquires / ingests digital objects based on criteria that fit its commitments and capabilities.
  6. Maintains/ensures the integrity, authenticity and usability of digital objects over time.
  7. Creates and maintains preservation metadata about creation, maintenance, and actions taken.
  8. Fulfills dissemination requirements.
  9. Has a strategic program for preservation planning and action.
  10. Has technical infrastructure to adequately maintain its digital objects.
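The core of the registry discussed in the study, a record of where each journal’s content is archived and how it can be accessed, can be sketched as a simple data structure. This Python sketch is hypothetical: the field names, the placeholder ISSN, and the archive names are invented for illustration and are not taken from the JISC study.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ArchiveHolding:
    archive: str   # name of the archiving service (hypothetical)
    years: range   # publication years held, e.g. range(2000, 2008) = 2000-2007
    access: str    # how the content can be accessed (free-text, hypothetical)


@dataclass
class Registry:
    # Keyed by ISSN; one journal may be held by several archives.
    holdings: Dict[str, List[ArchiveHolding]] = field(default_factory=dict)

    def register(self, issn: str, holding: ArchiveHolding) -> None:
        """Record that an archive holds some years of a journal."""
        self.holdings.setdefault(issn, []).append(holding)

    def where_archived(self, issn: str, year: int) -> List[ArchiveHolding]:
        """Answer the registry's basic question: who holds this year, and how to get it."""
        return [h for h in self.holdings.get(issn, []) if year in h.years]
```

For example, after registering ArchiveHolding("Archive A", range(2000, 2008), "on trigger event") under the placeholder ISSN "0000-0000", a query for 2006 would return that holding along with its access terms.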

New Digital Preservation Newsletter from the Library of Congress. Press Release. March 3, 2008.

The Library of Congress has started a digital preservation newsletter. The March edition includes information on state partnerships, digital video reformatting, preservation tips, and NDIIPP information.

Publishers Phase Out Piracy Protection on Audio Books. Brad Stone. The New York Times. March 3, 2008.

Some of the largest publishers are removing Digital Rights Management (DRM) protection from audio books. This will allow the files to be copied to different devices and allow retailers to sell content that works on all digital devices. Random House was the first to move away from DRM, and others appear to be following.