Friday, February 05, 2010

Digital Preservation Matters - January 29, 2010

Preserving Born-Digital Legal Materials…Where to Start? Sarah Rhodes. VoxPopuLII. Cornell Law School. January 10, 2010.

Article well worth reading. We have all heard the arguments for investing in digital preservation: the digital world is ubiquitous, and digital data is ephemeral. “There is no denying the urgent need for libraries to take on the task of preserving our digital heritage.” Law libraries have a critically important role to play in preservation. The field will always be in flux because technology keeps changing, but there has been real progress in creating tools, services, and best practices.

In the law libraries surveyed, less than 7% of digital preservation projects involved born-digital materials. The remaining 93% involved preserving digital files created by digitizing physical originals. Yet survey respondents said, by a margin of 2 to 1, that born-digital materials were in more urgent need of preservation than print materials. The gap may come from not knowing where to start. When selecting materials for the digital archive, each library should establish priorities based on its own institutional mandate and the research needs of its users. There is a strong case for preserving materials even when they may be redundant. In addition, academic law libraries should take responsibility for preserving the digital content cited in their institutions’ law reviews, so that future researchers will be able to consult the source materials, and possibly for preserving the law reviews themselves.

The article looks at standards and systems. “Digital preservation represents an opportunity in the digital age for law libraries to reclaim their traditional roles as stewards of information, and to ensure that our digital legal heritage will be available to legal scholars and the public well into the future.”


File Formats for Preservation. Malcolm Todd. DPC Technology Watch Report. December 2009.

Preserving intellectual content requires a firm grasp of the file formats used to create, store and disseminate it. The report sets out five main criteria for selecting file formats:

  1. adoption: the extent to which use of a format is widespread
  2. technological dependencies: whether a format depends on other technologies
  3. disclosure: whether file format specifications are in the public domain
  4. transparency: how readily a file can be identified and its contents checked
  5. metadata support: whether metadata is provided within the format

These criteria should be used as a tool for building a clear preservation strategy suited to the repository, one that manages the risk of format obsolescence for its materials.
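To make the "transparency" criterion concrete: many formats can be identified from a short signature ("magic number") at the start of the file, regardless of its extension. The sketch below is a minimal, hypothetical illustration with a hand-picked table of a few well-known signatures; production identification tools such as DROID rely on the much larger PRONOM signature registry instead.

```python
# Hypothetical, hand-picked signature table: byte prefixes found at offset 0.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"PK\x03\x04": "ZIP container (e.g. DOCX, ODT, EPUB)",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"\xff\xd8\xff": "JPEG",
}


def identify(header: bytes) -> str:
    """Return a format name if the header starts with a known signature."""
    # Try longer signatures first so a short prefix never shadows a longer one.
    for sig in sorted(SIGNATURES, key=len, reverse=True):
        if header.startswith(sig):
            return SIGNATURES[sig]
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough leading bytes to match every signature in the table."""
    with open(path, "rb") as f:
        return identify(f.read(8))
```

For example, `identify(b"%PDF-1.4\n")` returns `"PDF"`, while a plain-text header falls through to `"unknown"`. A format whose files cannot be recognised this way scores lower on transparency.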


Google Docs to allow storage of any type of file. Juan Carlos Perez. Computerworld. January 12, 2010.

Google is opening up its hosted Docs office suite so that users can store any type of file in it. This doesn’t mean that every file can be viewed or edited online. The file size limit has been increased to 250 MB. This is in addition to the G-drive online storage service (see “The G-drive is alive!”): 1 GB free, $0.25/GB/year after that.
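As a quick check on what that pricing means for a collection, the sketch below computes a yearly bill under the simple model quoted above (1 GB free, then a flat $0.25 per GB per year). It is an assumption-laden illustration: the function and its defaults are hypothetical and ignore how storage was actually packaged and billed.

```python
def annual_cost(gb_used: float, free_gb: float = 1.0, price_per_gb: float = 0.25) -> float:
    """Yearly cost in USD: free allowance first, then a flat per-GB rate (assumed model)."""
    # Only storage beyond the free allowance is billable; never go negative.
    billable = max(0.0, gb_used - free_gb)
    return billable * price_per_gb


# Example: 11 GB stored means 10 billable GB at $0.25 each.
print(annual_cost(11))  # 2.5
```

Under this model, even a modest 100 GB archive would cost about $24.75 per year, which is small next to the cost of managing and verifying the files.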


Release of Web Curator Tool (WCT) version 1.5. Sourceforge. 2009.

The Web Curator Tool is an open-source workflow management application for archiving web sites. It is designed for use in libraries; it supports:

  • Harvest Authorization: obtaining permission to harvest and make publicly accessible;
  • Selection, scoping and scheduling: deciding what to harvest, how, and when;
  • Description: adding basic Dublin Core metadata;
  • Harvesting: downloading the selected material from the internet;
  • Quality Review: ensuring the harvested material is of sufficient quality for archival purposes;
  • Archiving: submitting the harvest results to a digital archive.

It was designed by the National Library of New Zealand, the British Library, and others. It is integrated with the Heritrix web crawler and supports job scheduling and the collection of descriptive metadata.
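The six areas listed above form an ordered workflow. The sketch below models them as stages of a simple state machine. It is not the Web Curator Tool's actual data model or API; the stage names come from the list above, and the rule that a harvest failing quality review goes back to be re-harvested is a hypothetical simplification.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The six workflow areas listed for WCT, in the order given (hypothetical model)."""
    AUTHORIZATION = 1
    SELECTION = 2
    DESCRIPTION = 3
    HARVESTING = 4
    QUALITY_REVIEW = 5
    ARCHIVING = 6


def advance(stage: Stage, review_passed: bool = True) -> Optional[Stage]:
    """Return the next stage, or None once the harvest has been archived."""
    # Hypothetical rule: a harvest that fails quality review is re-harvested.
    if stage is Stage.QUALITY_REVIEW and not review_passed:
        return Stage.HARVESTING
    if stage is Stage.ARCHIVING:
        return None
    return Stage(stage.value + 1)


# Walk one target through the whole workflow, assuming it passes review.
stage: Optional[Stage] = Stage.AUTHORIZATION
while stage is not None:
    print(stage.name)
    stage = advance(stage)
```

Keeping quality review as an explicit gate before archiving reflects the list's emphasis on checking that harvested material is good enough for archival purposes.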
