Friday, May 16, 2008

Digital Preservation Matters - May 16, 2008

Keeping Research Data Safe: a cost model and guidance for UK Universities. Neil Beagrie, et al. JISC. 12 May 2008. [169 p. PDF]. Executive Summary.

Digital research raises issues relating to access, curation and preservation. Funding institutions are now requiring researchers to submit plans for data management or preservation. The extremely detailed study includes a framework for determining cost variables, a cost model, and case studies. The service requirements for data collections will be more complex than many had previously thought. Accessioning and ingest costs were higher than ongoing long-term preservation and archiving costs:

1. Acquisition and Ingest .................... ca. 42%
2. Archival Storage & Preservation ...... ca. 23%
3. Access ............................................ ca. 35%

Ten years of data from the Archaeology Data Service show relatively high costs in the early years after acquisition, but costs decline to a minimal level over 20 years. Declining data storage costs, the costs of ongoing actions such as file format migrations, and other factors provide economies of scale.

Some significant issues for archives and preservation costs include:

  • Timing: Costs vary depending on when actions are taken. The cost of initially creating metadata for 1000 records is about 300 euros. Fixing bad metadata after 10 years may cost 10,000 euros.
  • Efficiency: The start-up costs can be substantial. The operational phases are more productive and efficient as procedures become established and refined and the volume increases.
  • Economy of scale: Increased volume has an impact on the unit costs for digital preservation. One example is that a 600% increase in accessions only increases costs by 325%.
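The figures in the list above can be turned into per-unit numbers with a little arithmetic. The following is a minimal sketch, not a calculation from the study itself. It assumes that "a 600% increase" means volume grows to 7 times the baseline and that "increases costs by 325%" means costs grow to 4.25 times the baseline. The variable names are illustrative only.

```python
# Figures quoted in the summary above.
records = 1000
initial_metadata_cost_eur = 300      # creating metadata for 1000 records
late_fix_cost_eur = 10_000           # fixing bad metadata after 10 years

cost_per_record_initial = initial_metadata_cost_eur / records
cost_per_record_late = late_fix_cost_eur / records
late_penalty = late_fix_cost_eur / initial_metadata_cost_eur

# Assumption: a 600% increase means 7x the baseline volume,
# and a 325% increase means 4.25x the baseline cost.
volume_factor = 1 + 600 / 100
cost_factor = 1 + 325 / 100
unit_cost_factor = cost_factor / volume_factor

print(f"Metadata: {cost_per_record_initial:.2f} vs {cost_per_record_late:.2f} "
      f"euros per record (about {late_penalty:.0f}x more when deferred)")
print(f"Unit cost after scaling: {unit_cost_factor:.2f} of baseline "
      f"(about {(1 - unit_cost_factor) * 100:.0f}% lower)")
```

Under that reading, the cost per accession falls to roughly three fifths of its baseline value. A different reading of the percentages would give a different ratio.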

“While the costs of maintaining digital preservation capacity are not insignificant, the costs of the alternative are often greater.” They consider three staff essential to establish a repository:

  1. Archive Manager: co-ordinate activities;
  2. System Administrator: (half time) to install and manage hardware and software;
  3. Collections Officer: develop and implement appropriate workflow and standards

Tasks for the digital preservation planner include: Implementing a lifecycle management approach to digital materials, continuously assessing collections, their long-term value and formats, and making recommendations for action needed to ensure long-term usability. Also:

  • audit the Library’s digital assets, evaluating their volume, formats, and state of risk.
  • research into preservation methodologies.
  • ensure that preservation actions are carried out on digital assets at risk of loss by
  • formulate and publicize advice to data creators

“A data audit exercise is needed at the outset of scoping a digital archive. This will identify collections and their relative importance to the institution and wider community.”

Also, a library should consider federated structures for local data storage, “comprising data stores at the departmental level and additional storage and services at the institutional level. These should be mixed with external shared services or national provision as required.” The hierarchy should reflect the content, the services required, and the importance of the data.

The real cost of archiving results data drops by roughly 25% as new methods and media become available. The cost of migrations is extremely high. Raw data preservation costs per sample:

1970-1990 Paper records £30.00
1989-1996 Magnetic tapes £21.95
1990-2000 Floppy disks £7.25
1997-2003 Compact Discs £6.00
2000-present Computer disks £2.15
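To see how the per-sample cost changes from one medium to the next, the listed figures can be compared pairwise. This is a small sketch using only the numbers above. The function name `percent_drops` and the data layout are illustrative, not taken from the report.

```python
# Cost per sample by period and medium, as listed above (GBP).
costs = [
    ("1970-1990", "Paper records", 30.00),
    ("1989-1996", "Magnetic tapes", 21.95),
    ("1990-2000", "Floppy disks", 7.25),
    ("1997-2003", "Compact Discs", 6.00),
    ("2000-present", "Computer disks", 2.15),
]

def percent_drops(rows):
    """Return (from_medium, to_medium, percent_drop) for each successive pair."""
    drops = []
    for (_, prev_name, prev_cost), (_, name, cost) in zip(rows, rows[1:]):
        drops.append((prev_name, name, (prev_cost - cost) / prev_cost * 100))
    return drops

drops = percent_drops(costs)
for prev_name, name, pct in drops:
    print(f"{prev_name} -> {name}: {pct:.1f}% lower")

# Overall change from the first listed medium to the last.
overall = (costs[0][2] - costs[-1][2]) / costs[0][2] * 100
print(f"Overall: {overall:.1f}% lower")
```

The step-to-step drops computed this way are uneven, so the "roughly 25%" figure is best read as a general tendency rather than a fixed rate per change of medium.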

“A data preservation strategy is expected to form part of the university’s overall information strategy.” Start-up costs are higher for the early phases, especially for developing the first tools, standards and best practices.

Library of Congress Digital Preservation Newsletter. Library of Congress. May 2008. [PDF]

There are a number of items in the newsletter of interest, including:

  • LC creates and supports the development of some key open standards for digital content, such as
  1. Office Open XML. It is estimated that over 400 million people use the different versions of the Microsoft Office programs. This new standard supports all the features of the various versions of Microsoft Office since 1997. Microsoft has released the specifications of its earlier binary formats and asked the Library of Congress to hold copies.
  2. PDF/A, which is a subset of the PDF format, suitable for preservation.
  • The Data Preservation Alliance for the Social Sciences is a partnership to identify, acquire and preserve data that is at risk of being lost to social science research.
  • The MetaArchive Cooperative is participating in the NDIIPP digital preservation network. They have added an international member to the participants. The site provides documentation and information for private LOCKSS networks and a “Guide to Distributed Digital Preservation.”

The 29 fakes behind a rewriting of history. Paul Lewis. The Guardian. May 5, 2008.

The article emphasizes the importance of archive security and of authenticating and verifying objects. It is not just a problem for digital objects. Several books had been written based on forged documents planted in the UK National Archives. The author of the books used 29 documents in 12 separate files to write books on historical events; he is the only person to have checked out the files. An investigation uncovered the fake documents; the Archives takes a serious view of anything that compromises the integrity of the information and the archive.
