Friday, May 16, 2008

Digital Preservation Matters - May 16, 2008

Keeping Research Data Safe: a cost model and guidance for UK Universities. Neil Beagrie, et al. JISC. 12 May 2008. [169 p. PDF]. Executive Summary.

Digital research raises issues relating to access, curation and preservation. Funding institutions are now requiring researchers to submit plans for data management or preservation. The extremely detailed study includes a framework for determining cost variables, a cost model, and case studies. The service requirements for data collections will be more complex than many have previously thought. Accessioning and ingest costs were higher than ongoing long-term preservation and archiving costs:

1. Acquisition and Ingest .................... ca. 42%
2. Archival Storage & Preservation ...... ca. 23%
3. Access ............................................ ca. 35%

Ten years of data from the Archaeology Data Service show relatively high costs in the early years after acquisition but costs decline to a minimal level over 20 years. Decline of data storage costs, costs for ongoing actions such as file format migrations, and others, provide economies of scale.

Some significant issues for archives and preservation costs include:

  • Timing: Costs vary depending on when actions are taken. The cost of initially creating metadata for 1000 records is about 300 euros. Fixing bad metadata after 10 years may cost 10,000 euros.
  • Efficiency: The start-up costs can be substantial. The operational phases are more productive and efficient as procedures become established, refined, and the volume increases.
  • Economy of scale: Increased volume has an impact on the unit costs for digital preservation. One example is that a 600% increase in accessions only increases costs by 325%.
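The timing and economy-of-scale figures above can be checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that "a 600% increase" means the new volume is seven times the old one (and likewise that costs become 4.25 times the old costs).

```python
def unit_cost_ratio(volume_increase_pct: float, cost_increase_pct: float) -> float:
    """Return new unit cost divided by old unit cost.

    Assumes an "X% increase" means the new value is (1 + X/100) times the old one.
    """
    new_volume = 1 + volume_increase_pct / 100
    new_cost = 1 + cost_increase_pct / 100
    return new_cost / new_volume


def delay_multiplier(cost_now: float, cost_later: float) -> float:
    """Return how many times more expensive the delayed action is."""
    return cost_later / cost_now


# Economy of scale: 600% more accessions, 325% more cost.
print(f"Unit cost falls to {unit_cost_ratio(600, 325):.2f} of its former level")

# Timing: 300 euros to create metadata now, 10,000 euros to fix it after 10 years.
print(f"Fixing later costs about {delay_multiplier(300, 10_000):.1f} times as much")
```

Under these assumptions the cost per accession falls to roughly 61% of its former level, and repairing metadata after ten years costs about 33 times as much as creating it properly at the outset.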

“While the costs of maintaining digital preservation capacity are not insignificant, the costs of the alternative are often greater.” They consider three staff essential to establish a repository:

  1. Archive Manager: co-ordinate activities;
  2. System Administrator: (half time) to install and manage hardware and software;
  3. Collections Officer: develop and implement appropriate workflow and standards

Tasks for the digital preservation planner include: Implementing a lifecycle management approach to digital materials, continuously assessing collections, their long-term value and formats, and making recommendations for action needed to ensure long-term usability. Also:

  • audit the Library’s digital assets, evaluating their volume, formats, and state of risk.
  • research into preservation methodologies.
  • ensure that preservation actions are carried out on digital assets at risk of loss by
  • formulate and publicize advice to data creators

“A data audit exercise is needed at the outset of scoping a digital archive. This will identify collections and their relative importance to the institution and wider community.”

Also, a library should consider federated structures for local data storage, “comprising data stores at the departmental level and additional storage and services at the institutional level. These should be mixed with external shared services or national provision as required.” The hierarchy should reflect the content, the services required, and the importance of the data.

The real cost of archiving results data drops by roughly 25% as new methods and media become available. The cost of migrations is extremely high. Raw data preservation costs per sample:

1970-1990      Paper records     £30.00
1989-1996      Magnetic tapes    £21.95
1990-2000      Floppy disks      £7.25
1997-2003      Compact Discs     £6.00
2000-present   Computer disks    £2.15
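As a rough check on the figures above, the percentage drop from one medium to the next can be computed directly from the per-sample costs. This is a minimal sketch using only the numbers in the table; the function names are hypothetical.

```python
# Per-sample preservation costs in pounds, in the order given in the table.
COSTS = [
    ("Paper records", 30.00),
    ("Magnetic tapes", 21.95),
    ("Floppy disks", 7.25),
    ("Compact Discs", 6.00),
    ("Computer disks", 2.15),
]


def step_drops(costs):
    """Return (old medium, new medium, percentage drop) for each consecutive pair."""
    return [
        (old_name, new_name, (old_cost - new_cost) / old_cost * 100)
        for (old_name, old_cost), (new_name, new_cost) in zip(costs, costs[1:])
    ]


def total_drop(costs):
    """Return the percentage drop from the first medium to the last."""
    first, last = costs[0][1], costs[-1][1]
    return (first - last) / first * 100


for old_name, new_name, drop in step_drops(COSTS):
    print(f"{old_name} -> {new_name}: {drop:.1f}% lower")
print(f"Overall: {total_drop(COSTS):.1f}% lower")
```

The individual steps vary widely, from about 17% (floppy disks to compact discs) to about 67% (magnetic tapes to floppy disks), so the 25% figure is best read as a rough characterization rather than a constant rate.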

“A data preservation strategy is expected to form part of the university’s overall information strategy.” Start-up costs are higher for the early phases, especially for developing the first tools, standards and best practices.

Library of Congress Digital Preservation Newsletter. Library of Congress. May 2008. [PDF]

There are a number of items in the newsletter of interest, including:

  • LC creates and supports the development of some key open standards for digital content, such as
  1. Office Open XML. It is estimated that over 400 million people use the different versions of the Microsoft Office programs. This new standard supports all the features of the various versions of Microsoft Office since 1997. Microsoft has released the specifications of its earlier binary formats and asked the Library of Congress to hold copies.
  2. PDF/A, which is a subset of the PDF format, suitable for preservation.
  • The Data Preservation Alliance for the Social Sciences website is a partnership to identify, acquire and preserve data which is at risk of being lost to social science research.
  • The MetaArchive Cooperative is participating in the NDIIPP digital preservation network. They have added an international member to the participants. The site provides documentation and information for private LOCKSS networks and a “Guide to Distributed Digital Preservation.”

The 29 fakes behind a rewriting of history. Paul Lewis. The Guardian. May 5, 2008.

The article emphasizes the importance of archive security and of authenticating and verifying objects. It is not just a problem for digital objects. Several books had been written based on forged documents planted in the UK National Archives. The author of the books used 29 documents in 12 separate files to write books on historical events; he is the only person to have checked out the files. An investigation uncovered the fake documents; the Archives takes a serious view of anything that compromises the integrity of the information and the archive.

Friday, May 09, 2008

Digital Preservation Matters - May 9, 2008

Bill targets messy e-records. Ben Bain. Federal Computer Week. May 5, 2008.

The proposed Electronic Communications Preservation Act will require federal agencies to preserve electronic communications in an electronic format. This should preserve e-mail and other messages whose loss would otherwise leave gaps in the historical record. NARA would be in charge of this. The existing policies do not require that the records be preserved in their native format, nor do they provide the authority to enforce the policies during an administration. “The loss of documents through indifference should be viewed with as much alarm as their loss through a system breach.” Having preservation standards to capture, manage, preserve and retrieve electronic communications “could save money in the long run.”

Innovations in Digital Asset Management, Circa 2008. Joseph Bachana. CMS Watch. May 7, 2008.

An overview of the development of the digital asset management market. Digital asset management vendors have also been working to add web services and improve their products. The market wants to integrate digital asset management systems into other management systems, such as web content, customer relationship, editorial workflow, e-learning platforms, marketing resource systems, and others. Most of the vendors in this area are small and have limited resources, or are divisions of larger companies, and can’t address all these areas. Some main products have included eXtensible Metadata Platform, Version Cue, and SharePoint. A few of the current trends include:

  • single asset management repository
  • XML content that can be republished, reformatted, or changed into different assets
  • Open source, but the digital asset management development isn’t significant yet

“What is new, from my vantage point: no single DAM vendor has the resources to address all of these varied needs in a timely fashion. In the end, it may take the harnessed fervor of the open source community to bring all these threads and more together in the marketplace over the next 24 to 36 months.”

Demand for Collaboration Driving $330 Million Digital Asset Management Markets. Nicole Fabris. Reuters. May 8, 2008.

The market for Digital Asset Management solutions was more than $330 million in 2007 and is growing. The sheer number of digital assets helps drive this, but so does “the need for different departments of an organization to have a seamless workflow in handling the same library of digital content.” The digital environments are no longer a separate silo. The report by ABI Research shows that the vendor landscape is fragmented, with different types of products in different sectors.

EThOS: a national OAI and digitisation service for e-theses in the United Kingdom. Chris Awre, et al. Open Repositories. April 2008.

The EThOS project in the UK is an effort to promote a national repository and the digitization and sharing of theses and dissertations. Just putting items in the repository doesn’t mean that anyone uses it or even harvests the metadata. Promoting a repository increases its use. A toolkit is being created; it helps to outline the culture change needed, the business requirements for the institutional repository, technical options, and training for staff.

Open access does not mean the same thing as free access. There are still costs. The libraries that contribute are charged for digitization, and researchers are charged for the added value of the digital file. They now have theses that are moving beyond PDF to multi-media formats. The sustainability of these files requires the efforts of the community. The British Library provides e-theses, and there are some concerns about whether or not it has the right to put them on the internet. It has decided that it is more important to make them available, and will take them down for copyright problems if there is an actual take-down request from the owner.

DLESE: A Case Study in Sustainability Planning. Mary Marlino, Tamara Sumner, Karon Kelly and Michael Wright. Open Repositories. April 2008.

There are currently no models for sustainability planning. There is a need to define the core library components and what needs to be sustained. What happens when the project ends? How should one prepare for it? This is important in a partnership. Determine beforehand the system administration, the application support, the content processes, workflow, and maintenance. Sustainability planning should be started at the beginning of the project. Distributed infrastructure is often not built with sustainability in mind, and in this structure it is critical to develop a disciplined model for cost. Sustaining the community is the most important and difficult aspect of library sustainability: user base, culture, and embedded expertise. A definition of sustainability is: meeting the needs of the present without compromising the ability of future generations to meet their own needs (Wikipedia).

Building Personal Collections and Networks of Digital Objects in a Fedora Repository Using VUE. Anoop Kumar. Open Repositories. April 2008.

Today’s technology makes it feasible to set up repositories for large institutions and also for smaller groups and individuals. This utility was created to be a publishing module to create content maps in a Fedora repository and build a knowledge base of objects, the metadata, and the relationships between them. Anyone can create the objects and upload resources; owners can modify their objects and anyone can add or modify the relationships. VUE (Visual Understanding Environment) can be used with other repositories and allows search, browse and deposit.

Friday, May 02, 2008

Digital Preservation Matters - May 2, 2008

Alternative File Formats for Storing Master Images of Digitisation Projects. Robèrt Gillesse, et al. National Library of the Netherlands. March 7, 2008. [PDF]

This is the result of a study of alternative formats for storing master files of digitization projects. There are so many digitization projects occurring that the current format strategy needs to be revised. Currently, most institutions worldwide store master image files in uncompressed TIFF file format. If there are 40 million image files in the current projects, 650 TB of storage space would be needed to store them in this format. The objective of the study was to describe alternative file formats in order to reduce the necessary storage space; these were JPEG 2000, PNG, JFIF, and TIFF with LZW compression. The desired image quality, long-term sustainability and functionality had to be taken into account during the study. They looked at:

  1. The required storage capacity
  2. The image quality
  3. The long-term sustainability
  4. The functionality
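The storage figure quoted above implies an average master file size, which can be worked out as follows. This is a minimal sketch: it assumes decimal units (1 TB = 10^12 bytes), the function names are hypothetical, and the 0.5 size ratio in the projection is purely a placeholder, not a result from the study.

```python
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def bytes_per_image(total_tb: float, image_count: int) -> float:
    """Return the average size in bytes of one master image."""
    return total_tb * BYTES_PER_TB / image_count


def projected_tb(total_tb: float, size_ratio: float) -> float:
    """Return storage needed if files shrink to size_ratio of their original size."""
    return total_tb * size_ratio


average = bytes_per_image(650, 40_000_000)
print(f"Average uncompressed TIFF master: {average / 10**6:.2f} MB")

# Hypothetical ratio for illustration only; real savings depend on the format
# and on the images themselves.
print(f"Storage at half the size: {projected_tb(650, 0.5):.0f} TB")
```

Under the decimal-unit assumption the average uncompressed master works out to 16.25 MB, so even a modest reduction per file adds up to hundreds of terabytes across 40 million images.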

The main conclusion of the study is that JPEG 2000 lossless is overall the best alternative to the uncompressed TIFF file format from the perspective of long-term sustainability. In their summary table they note “JPEG 2000 comes out on top in both the lossless as well as the lossy versions”. The PNG format is also mentioned. Under alternate formats, they also list a number of institutions that are using Motion JPEG 2000 as a standard for digital cinema.

InPhase finally to phase in holographic disk. Paul Roberts. The Register. 26th April 2008.

A version of the holographic disk was demonstrated. The disks will store up to 300GB and are supposed to have a 50-year life. The Tapestry drive is priced at about $18,000, and the disks will cost $180 each. The disks have advantages and disadvantages compared with LTO3, Blu-ray, and Plasmon: while they hold more, they are also much more expensive. More information at InPhase.
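The prices quoted above translate into a media cost per gigabyte and a total cost for a given archive size. The sketch below uses only the figures in the article; the function names and the 10 TB example archive are hypothetical, and it assumes a single drive and fully filled disks.

```python
import math

DRIVE_PRICE_USD = 18_000   # Tapestry drive, as quoted in the article
DISK_PRICE_USD = 180       # per holographic disk
DISK_CAPACITY_GB = 300     # per holographic disk


def media_cost_per_gb() -> float:
    """Return the cost of the disks alone, per gigabyte stored."""
    return DISK_PRICE_USD / DISK_CAPACITY_GB


def total_cost(archive_gb: float) -> int:
    """Return the cost of one drive plus enough disks to hold archive_gb."""
    disks_needed = math.ceil(archive_gb / DISK_CAPACITY_GB)
    return DRIVE_PRICE_USD + disks_needed * DISK_PRICE_USD


print(f"Media cost: ${media_cost_per_gb():.2f} per GB")

# Hypothetical 10 TB archive (10,000 GB) for illustration.
print(f"10 TB archive: ${total_cost(10_000):,}")
```

For small archives the drive price dominates: in the 10 TB example, 34 disks cost $6,120 while the drive alone costs $18,000.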

NBA Expands Relationship With SGI on Groundbreaking Digital Media Management System. Marla Robinson. NewsBlaze. April 14, 2008.

The two organizations have created a new digital workflow and media management system, called the NBA Digital Media Management System. It allows the NBA to “simultaneously ingest and archive footage from up to 14 NBA games, edit the archival content on the fly, provide full game broadcasts, clips and other NBA content to 214 countries worldwide.” The system will add 60,000 hours of video each year, and the NBA archive, with more than 400,000 hours dating from 1946, will eventually be digitized. More frequently requested content will be stored on spinning disk arrays, while rarely needed content will be on a more economical storage method.

Time's running out to preserve our treasures. Deborah Holder. Guardian. 22 April 2008.

Putting documents, pictures and other items on a computer makes them more accessible, but the question now is whether we can save them. Creating an online archive takes a lot of work. First, there is the question of what to archive. This is not just an online archive but an online community. There is an urgency to saving these items. "With sound recordings especially, it's imperative to digitise because old formats are disappearing and, more importantly, the players to play them back on are disappearing." But it is about access as well as preservation.