Friday, February 24, 2006

Weekly readings - 24 February 2006

The AIHT at Stanford University: Automated Preservation Assessment of Heterogeneous Digital Collections. Richard Anderson, et al. D-Lib Magazine. December 2005.

The Stanford Digital Repository is a set of services intended eventually to preserve any digital content deemed institutionally significant or valuable. The early work focuses on preserving a subset of the potential content. The group concentrated on tiered levels of service for digital preservation, which they hoped would provide the flexibility to adapt to changing technologies for managing and preserving digital materials and their descriptions. Technical metadata seemed easier to create automatically, while contextual metadata would be difficult. JHOVE was used for technical analysis and metadata creation. Sustainability, quality, and functionality are the primary forces that make a format “preferred”. Their preferred formats by type are:

Plain text: ASCII, UTF-8;
Marked-up text: XML 1.0;
Image: TIFF 5.0 and above (uncompressed);
Page-Viewer: PDF (any version);
Audio: WAVE (linear pulse code modulation); and
Video: yet to be determined.

“An automated assessment process is clearly the only efficient means to collect technical information about large numbers of files.” This knowledge will help to measure risk, to negotiate with depositors, and to make decisions about long-term preservation. In general, their experience showed that practical tests are valuable for understanding the needs and making informed decisions.
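To make the idea of automated assessment concrete, here is a minimal sketch that tallies a list of file names against the preferred formats listed above. It is only an illustration: the names `PREFERRED_EXTENSIONS`, `assess` and `preferred_share` are hypothetical, and matching on file extension is a simplifying assumption. A tool such as JHOVE examines file contents rather than trusting extensions, and this sketch does not check format versions or compression.

```python
from collections import Counter
from pathlib import PurePath

# Preferred content types from the list above. The extension mapping is an
# assumption made for this sketch, not something stated in the article.
PREFERRED_EXTENSIONS = {
    ".txt": "Plain text",
    ".xml": "Marked-up text",
    ".tif": "Image",
    ".tiff": "Image",
    ".pdf": "Page-Viewer",
    ".wav": "Audio",
}


def assess(filenames):
    """Count files per preferred content type; everything else is 'not preferred'."""
    tally = Counter()
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        tally[PREFERRED_EXTENSIONS.get(ext, "not preferred")] += 1
    return tally


def preferred_share(tally):
    """Fraction of assessed files that fall into a preferred content type."""
    total = sum(tally.values())
    if total == 0:
        return 0.0
    return 1 - tally["not preferred"] / total


if __name__ == "__main__":
    report = assess(["notes.txt", "scan.TIF", "memo.doc", "talk.wav"])
    for content_type, count in sorted(report.items()):
        print(f"{content_type}: {count}")
    print(f"preferred share: {preferred_share(report):.0%}")
```

A tally like this is the kind of summary that could feed the risk measurement and depositor negotiations mentioned above, though a real assessment would need content-based format identification.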


UKWAC: Building the UK's First Public Web Archive. Steve Bailey, Dave Thompson. D-Lib Magazine. December 2005.

We depend on the internet in many ways, but we pay little attention to the long-term preservation of websites. Invaluable scholarly, cultural and scientific resources are being lost to future generations. Six leading UK institutions are working on a project to test selective archiving of UK websites. The group has chosen a modified version of the PANDAS software, developed by the National Library of Australia. Their goals include:

· To work collaboratively in the achievement of a common searchable archive of selected web sites, investigating solutions to issues such as selection, rights management and digital preservation

· To evaluate the development of the collaborative infrastructure for web archiving with regard to assessing the permanence and long-term feasibility of such a collaborative enterprise

Archiving web sites follows the basic archival principles of Selection, Acquisition, Description and Access. Individual partners in the group select web sites to be archived, but there are some additional steps to this process. The partners first check that no one else has already selected a particular web site for archiving. When a new site is selected, basic metadata is entered into the central database; a group member then becomes responsible for that site's life cycle management. The partners seek explicit written permission from site owners before archiving a site. There are difficulties and challenges in archiving web sites. “Web archiving is not an exact science.” In spite of the difficulties, this has been an important project for digital preservation. It has shown that selective web archiving can be done through a consortium, and it has highlighted the fragility of web-based materials while offering a workable solution.
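The selection steps described above (check for an existing selection, record basic metadata, assign a responsible partner, wait for written permission) can be sketched as a small registry. This is an assumption-laden illustration and not how PANDAS or the UKWAC central database actually works: `SiteRecord` and `SelectionRegistry` are hypothetical names, and the URL normalization is deliberately crude.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    """Basic metadata for a selected site (fields are illustrative)."""
    url: str
    title: str
    partner: str  # partner responsible for the site's life cycle
    permission_granted: bool = False


class SelectionRegistry:
    """Hypothetical stand-in for the shared central database."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def _key(url):
        # Crude normalization for the sketch: ignore case and trailing slash.
        return url.strip().lower().rstrip("/")

    def select(self, url, title, partner):
        """Register a site unless another partner has already selected it."""
        key = self._key(url)
        if key in self._records:
            owner = self._records[key].partner
            raise ValueError(f"{url} already selected by {owner}")
        record = SiteRecord(url=url, title=title, partner=partner)
        self._records[key] = record
        return record

    def grant_permission(self, url):
        """Record that the site owner gave explicit written permission."""
        self._records[self._key(url)].permission_granted = True

    def ready_to_archive(self, url):
        """A site may be gathered only once it is selected and permitted."""
        record = self._records.get(self._key(url))
        return record is not None and record.permission_granted


if __name__ == "__main__":
    registry = SelectionRegistry()
    registry.select("http://example.org/", "Example site", "Partner A")
    print(registry.ready_to_archive("http://example.org/"))
    registry.grant_permission("http://example.org/")
    print(registry.ready_to_archive("http://example.org/"))
```

The point of the sketch is the ordering: duplicate checking happens before metadata entry, and nothing is archived until permission is recorded.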


Mind the gap: assessing digital preservation needs in the UK. Martin Waller, Robert Sharpe. Digital Preservation Coalition, 2006.

This 'state of the nation' report shows that fewer than 20% of UK organizations surveyed have a strategy in place to deal with the risk of loss or degradation to their digital resources, even though there is a very high level of awareness of the risks and potential economic penalties. The survey shows that digital data loss is commonplace and seen by some as inevitable.

· Over 70% of respondents said data had been lost in their organization

· 87% recognized that key material could be lost

· 60% said that their organization would lose financially

· 52% of the organizations said there was management commitment to digital preservation

· Only 18% had a digital preservation strategy in place

Some high-profile instances include the decision that Morgan Stanley must pay over $1 billion for failing to preserve and hand over documents required by the courts, and the 1975 Viking Lander mission, whose data tapes have deteriorated despite careful storage and whose formats scientists are unable to decode. The principal risks to digital material are:

· the deterioration of the storage medium;

· obsolescence of hardware, software or storage format;

· failure to save crucial document format information, such as preserving tables of numbers without preserving an explanation of their meaning.

The report identifies 18 core needs and recommendations to address those needs. The needs include:

· Increase awareness of digital preservation issues especially among data creators;

· Take stock of digital materials (55% of those surveyed did not know what digital material they had);

· Fund digital preservation aspects of projects from the beginning;

· Increase funding for digital archives.

“Gone are the days when archives were dusty places that could be forgotten until they were needed. The digital revolution means all of us – organisations and individuals – must regularly review and update resources to ensure they remain accessible. Updating need not be expensive, but the report is a wake-up call to each one of us to ensure proper and continuing attention to our digital records.”

It is important to create long-term, proactive preservation plans and to allocate adequate budget and resources to implementing practical solutions. “Organisations that create large volumes of digital information need to recognise the benefits of retaining long-term information in digital form so that these can be balanced against the costs of active preservation.”
