Friday, March 24, 2006

Weekly readings - 24 March 2006

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade.  Philip Pothen.  Ariadne.  February 2006.

    It is clear that accessing and preserving digital data is increasingly important across a wide range of scientific, artistic and cultural activities.  We need more information about deciding ‘how’ to preserve than ‘if’ we preserve.  Fewer than one software package in ten lasts beyond 10 years.  Overcoming protectiveness about data is one of the highest priorities in this area.  We need to see spending decisions more as investments with a clear view of the costs and benefits.  It is important to examine the social and organizational benefits of preservation.  While the barriers between librarians, archivists and technical specialists are breaking down slowly, we must address the broader question of training and education.  We need to keep the knowledge to be preserved independent of the underlying systems.  We need to develop certification criteria, checklists to determine complexity and cost, and new research.


Capturing Analog Sound for Digital Preservation: Report of a Roundtable Discussion of Best Practices for Transferring Analog Discs and Tapes.  National Recording Preservation Board.  March 2006.

      The National Recording Preservation Board was created to sustain sound records for future generations.  “Authoritative manuals on how to create preservation copies of analog audio recordings do not yet exist.”  This report will investigate procedures to reformat analog sound to digital media. It summarizes discussions and recommendations from leading audio preservation engineers concerning the present standards and best practices for migrating analog recordings. It gives an overview of the problems encountered and the needs.  There are some recommendations for actions, competencies that should be developed, and a call to share expertise to help in this area.  Some of these include:

·       For discs: clean the disc when possible; choose the correct stylus size and playback speed
·       For tapes: identify and clean the tape; address splices and damage
·       Know the medium
·       Note all metadata with the original
·       Identify the core competencies needed
·       Develop a web-based clearinghouse for information
·       Identify experts to consult
·       Develop project guidelines and best practices within the organization

      The second half of the document outlines recommended practices, competencies, and commentary from the meeting participants for transferring audio to digital media.  They also list resource documents that we should create, especially suggested equipment to perform digital audio archiving tasks, sources of equipment and supplies. 


Fed up with tape, hospital moves to storage jukebox.  Lucas Mearian.  Computerworld.  March 24, 2006.

    This article gives an example of a hospital which installed a new image and records archiving system late last year.  It chose an optical disk jukebox with spinning disk arrays over magnetic tape because it had stopped trusting magnetic tape.  “If you can’t access the data, then whatever you spent on the tape was a waste.” They had tapes go bad after only 50 uses.  They chose a “near-line” storage system, a 13TB optical jukebox  model containing 30GB platters.  It has a two-tier storage infrastructure, where all data is stored in an array for the first two years and then migrated to optical disk, where it’s copied to two platters; one off-site for disaster recovery and the other on-site for near-line storage.
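    The two-tier arrangement described above can be sketched as a small placement rule.  This is only an illustration: the function name, the tier labels, and the use of 730 days to stand for "two years" are assumptions, not details from the article.

```python
from datetime import date, timedelta

# Assumption: "the first two years" is approximated as 730 days.
ARRAY_RETENTION = timedelta(days=730)


def storage_locations(stored_on: date, today: date) -> list[str]:
    """Return where a record should live under the two-tier policy."""
    if today - stored_on < ARRAY_RETENTION:
        # Tier 1: recent data stays on the spinning disk array.
        return ["disk-array"]
    # Tier 2: older data is written to two optical platters,
    # one kept on-site (near-line) and one off-site (disaster recovery).
    return ["optical-onsite", "optical-offsite"]
```

    A record stored in early 2003 and checked in March 2006 would come back with both optical locations, while one stored in January 2006 would still be on the array.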


Archaic Sounds Reach Modern Ears.  Rachel Metz.  Wired.  20 March 2006.

    Curators at the UC Santa Barbara Library have digitized 6,000 19th- and 20th-century wax and plastic cylinder recordings of music, vaudeville routines and presidential speeches.  Preserving the sounds is vital because the cylinders are deteriorating.  This area has been neglected for many years.  Until recently it was not possible to create quality digital copies of cylinder recordings because cylinders running at different speeds each required different equipment.  Now a system has been created that can play cylinders of various sizes and speeds and transfer the sound to a computer through a patch bay.  It encodes cylinder music as original-sounding WAV files or cleaned-up MP3 versions.  Since November when the site started, over 700,000 recordings have been downloaded.  The recordings on the site are in the public domain and cleaned-up MP3 versions hold a Creative Commons license.

Saturday, March 18, 2006

Weekly readings - 17 March 2006

Excuse Me... Some Digital Preservation Fallacies?  Chris Rusbridge.  Ariadne.  February 2006.

The article looks at a number of issues with digital preservation that the author feels are fallacies.  They are:

   1. Digital preservation is very expensive [because]

   2. File formats become obsolete very rapidly [which means that]

   3. Interventions must occur frequently, ensuring that continuing costs remain high.

   4. Digital preservation repositories should have very long timescale aspirations,

   5. 'Internet-age' expectations require the preserved object must be easily and instantly accessible in a useable format, and

   6. the preserved object must be faithful to the original in all respects.


All preservation, including paper and book preservation, is expensive.  The cost of digital preservation as a whole may actually be lower than that of paper and book preservation.  While consumer formats may go out of fashion, very few formats ever become completely obsolete.  Recovery of information from old files can be incomplete.  Mass access to the internet has stabilized the formats.  Part of the key to this is to share the information.  This may be more of a problem with extended time frames.  “Investment in digital preservation is important for cultural, scientific, government and commercial bodies. Investments are justified by balancing cost against risk; they are about taking bets on the future. The priorities in those bets should be: first, to make sure that important digital objects are retained with integrity, second to ensure that there is adequate metadata to know what these objects are, and how they must be accessed, and only third to undertake digital preservation interventions.”


It may not be necessary to look at digital preservation in terms of hundreds or thousands of years.  What institutions have this timescale?  It may be more useful to look at digital preservation as a series of events or a relay.  Make your decisions on the timescale that you can see and that you have the funding for.  Preserve your objects to the best of your ability and hand them on intact to your successor.  The right approach may be to keep the original bits and then produce access copies as you can.  The high cost of accessing the original may best be passed on to the user who asks for it.


A restatement of the original issues would be:

   1. Digital preservation is comparatively inexpensive, compared to preservation in the print world,

   2. File formats become obsolete rather more slowly than we thought

   3. Interventions can occur rather infrequently, keeping costs down.

   4. Digital preservation repositories should adjust their timescale to meet their funding and business case, but should be prepared for their succession,

   5. "Internet-age" expectations cannot be met by most digital repositories; and,

   6. Only access versions of the preserved object need be easily and instantly accessible, although the original file and good preservation metadata should be available


The lack of money is the biggest obstacle to effective preservation.  Poor decisions will reduce the amount of material that can be preserved. The right choice may be “fewer and better” or “cheaper and more”. 





Future-Proofing Web Sites.  Maureen Pennock.  Ariadne.  February 2006.

The goal of the DCC workshop was to provide insight into ensuring ongoing access to web sites over time. This is not just a matter of archiving, but also of how to design and manage a web site so that it is suitable for long-term preservation with minimum intervention.  One presentation described the key to this as the three R's: Reduce, Replicate and Redirect. Reduce the items to make them easier to preserve, replicate them in multiple formats, and redirect links to the new locations.  It is more ‘future-improving’ than ‘future-proofing’.  There need to be selection criteria and guidelines to collect and preserve web sites as part of an organization’s wider preservation strategy.  Standards should preferably be applied at the point of creation rather than at a later time.  Persistent identifiers are important, but we should be looking at 15 – 20 years, not longer.  Metadata should document the technical dependencies and tools; this is more useful than just descriptive metadata. The method of selecting web sites must also be documented.


Some record management principles require the documents to be saved but not necessarily the web site itself.  An organization can therefore determine what needs to be saved, but it may not have to be the entire site.  There should be a clear delineation of tasks and responsibilities.  The National Library of Australia introduced PANDAS 3, a software tool for managing the process of gathering, archiving, and publishing web site resources.  Authenticity is a key issue for web sites.  Preservation management  must include three key aspects: passive preservation; active preservation; and managing multiple manifestations.  Permission should be obtained before archiving web sites.  The main issues were:


·         think about the records perspective;

·         reduce, replicate and redirect;

·         protect your domain;

·         be archive-friendly;

·         carry out 'not-bad practice';

·         experiment, and;

·         identify unhelpful practice.




Decision Tree for Selection of Digital Materials for Long-term Retention.  Deborah Woodyard-Robinson.  Digital Preservation Coalition.  March 8, 2006.

This is an updated version of a decision tree, a tool for constructing or testing an organization's policy on selecting digital materials for long-term retention.  The questions and choices in the tree will assist with the decision to accept or reject long-term preservation responsibility.  An effective policy must also be:

·         Endorsed by senior management

·         Actively circulated throughout the organization

·         Reviewed regularly

·         Accompanied by an appropriate resource commitment

        PDF of the Decision Tree (47KB)


Friday, March 10, 2006

Weekly readings - 10 March 2006

University researchers develop new digital rights technology.  Jaikumar Vijayan.  Computerworld.  March 10, 2006.

Researchers at the University of Maryland have developed a new digital rights management technology to better protect multimedia content from unauthorized copying and distribution.  The technology embeds a unique ID or fingerprint on individual copies of multimedia content. It is designed to allow owners to trace the content, even if it is pieced together from multiple copies.  This can be applied to images, video, audio and other types of documents. 




Editors' Interview with Victoria Reich, Director, LOCKSS Program.  RLG DigiNews.  February 15, 2006.

The LOCKSS (Lots of Copies Keep Stuff Safe) Program offers libraries a cost effective and easy way to build digital collections of Web-based content. Digital information is extremely fragile and preservation must start from the moment it is put into circulation.  Components of LOCKSS include:

·   Replicate the content in independent repositories.

·   Audit the content. Digital content is fragile. If files are continuously compared and damage automatically repaired, off-line backup can be eliminated.

·   A hands-off approach. Minimal processing is needed.

·   Open source software is critical.

·   Allow no single points of failure. Strive for diversity in administration, funding, and technology.

·   Have extremely cost-effective processes.
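The "compare continuously and repair automatically" idea in the list above can be illustrated with a very small sketch.  This is not the LOCKSS polling protocol; it is a simplified majority vote over SHA-256 digests, and the function name and the in-memory dictionary of replicas are assumptions made for the example.

```python
import hashlib
from collections import Counter


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Compare replicas by SHA-256 digest and overwrite any copy that
    disagrees with the majority. Returns the names of repaired replicas."""
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in replicas.items()}
    # The digest held by the most replicas is treated as the intact version.
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority; cannot decide which copy is intact")
    good = next(data for name, data in replicas.items()
                if digests[name] == winner)
    repaired = [name for name in replicas if digests[name] != winner]
    for name in repaired:
        replicas[name] = good  # repair the damaged copy from an intact one
    return repaired


# Hypothetical example: three repositories, one holding a damaged copy.
copies = {"library-a": b"article", "library-b": b"article", "library-c": b"artXcle"}
damaged = audit_and_repair(copies)
```

With three copies and one damaged, the damaged copy is replaced from an intact one; with no clear majority the sketch refuses to guess, which is one reason more independent copies make the audit more reliable.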

The LOCKSS system can preserve content in any format available over the Web as long as it has a stable URL structure, and changes at a moderate pace. The single greatest threat to materials being preserved over the long term is money.  Preservation must be accomplished at marginal expense to avoid the threat of scarce economic times.   The CLOCKSS (Controlled Lots of Copies Keep Stuff Safe) Initiative is designed to test the feasibility of a large, community-managed dark archive.   Members are working towards a production system. 


“The fundamental goal of a digital preservation system is that the content stored in the system remains accessible to future readers for a time much longer than the lifetime of any individual component of the system.”  Digital preservation is concerned with long timeframes.  All information system components are unreliable in the long run. A fundamental design principle of a digital preservation system is that it must tolerate component failures.  LOCKSS has planned for format migration, obsolescence, scalability, and its own possible demise.  While future access can never be proven, LOCKSS can work to increase the odds that the content will be available in the future.




Hitting the ground running: building New Zealand’s first publicly available institutional repository.  Nigel Stanger, Graham McGregor.  University of Otago.  09 March 2006.  

Institutional repositories are becoming more important.  This low-cost, fully functional repository was started and went live in 10 days.  It has received a very large number of hits.  The repository was built with ePrints in three phases: technical implementation, content collection and administration.  They decided to restrict the pilot to voluntary contributions in PDF format from their business school.  The effort was publicized to the department heads to get early acceptance, and they quickly received materials, mostly departmental working or discussion papers which already had permission to be published online.  Items with uncertain copyright status were restricted until the status was confirmed. They found the SHERPA website valuable for copyright information.  They decided to follow Dublin Core metadata to make the repository compatible with other projects.  They were able to establish the repository quickly because it was a proof of concept and not a large-scale project that involved many disciplines or other people. Policy and procedural issues which needed institutional decisions were noted rather than addressed.  They used a minimalist approach to the effort, especially in gathering content. The site, which is publicly available on the internet, has had over 18,000 downloads.  The repository, which contains about 220 items, shows what can be done by a dedicated team.
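To give a sense of what a simple Dublin Core description looks like, here is a minimal sketch that writes one record as XML.  The namespace is the standard Dublin Core element set; the wrapper element name, the function name, and the choice of fields are assumptions for illustration and do not reflect how the Otago repository actually stores its metadata.  The field values are taken from the citation above.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize simple Dublin Core fields as an XML string."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for element, values in fields.items():
        for value in values:  # Dublin Core elements are repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": ["Hitting the ground running: building New Zealand's first "
              "publicly available institutional repository"],
    "creator": ["Stanger, Nigel", "McGregor, Graham"],
    "date": ["2006-03-09"],
    "format": ["application/pdf"],
})
print(record)
```

Because every repository using Dublin Core shares the same small set of element names, records like this can be exchanged or harvested without custom mapping, which is the compatibility benefit the authors were after.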




OpenDoc Prescription a Bitter Pill for Microsoft in Massachusetts. Richard Entlich.  RLG DigiNews.  February 15, 2006.

An in-depth discussion of Massachusetts requiring the OpenDocument format for new office applications.  The document specified that “executive branch agencies would be required to migrate office document software to applications able ‘to save office documents by default in the OpenDocument format’ by January 1, 2007 and that ‘any acquisition of new office applications must support the OpenDocument format natively.’ The only other acceptable format mentioned in the document was PDF.”  This informative article also provides a chronology of the events.  It is unknown where this effort will lead, but it has already had an impact as it shows others the need for open formats.  Digital preservation is a process of risk management.  To date, many preservation efforts have been reactive, but they need to become more proactive.  These actions may have unforeseen consequences, but without taking a chance we may never know what degree of change is possible.




Friday, March 03, 2006

Weekly readings - 03 March 2006

NEDCC Survey and Colloquium Explore Digitization and Digital Preservation Policies and Practices.  Tom Clareson.  RLG DigiNews.  Feb 15, 2006.

The Northeast Document Conservation Center conducted an online survey to develop a way to assess institutions’ digital preservation readiness. While digitization efforts are increasing, there is a lack of policies to deal with these materials once they are created.  While a majority of institutions had policies dealing with goals, collection development, and emergency preparedness, few of these policies address digital holdings specifically.  IT staff are key to the success of digitization projects.  A majority of institutions are involved in digital imaging projects, and over half provide online searching to the public.

·        29% of respondents had a policy on the creation of digital resources

·        63% said that 5% of their budget or less was devoted to any type of preservation activities.

·        9% had no funds whatsoever allocated for preservation activities.

·        31.1% did not have an IT department

·        92% had created digital assets from physical source materials, mostly from flat paper or images, also books, and AV

·        39% said the majority of the items they consider to belong to digital collections are unique, single-copy works

·        66.9% provide access to digital collections through an institutional website

·        83% had created descriptive metadata for the digital assets to help find and use digital collections

·        At least 50% also created technical and administrative metadata

·        25% do not assign any portion of their budget to create digital collections

·        42% do not have budget lines for acquiring digital collections

·        60%  do not have a specific person assigned responsibility/primary activity for digital preservation

·        84% supported staff development and professional education/training for digital preservation, but it does not seem to translate into policy development

·        30% of collections are not adequately protected by a backup strategy.

·        52.8% said they do not insure their digital holdings, while 36.5% did not know

·        29% of the institutions responding to the survey have policy, planning, or procedure documents on the creation of digital resources


The means of digital preservation reported by respondents included regular data backup, migration, and refreshing the data; maintaining legacy equipment and disks; outsourcing to an externally-managed repository; and emulation. Storage media include network hard drives (78%) and removable magnetic media (65%). Digital collections are most often stored in-house in systems managed by the institution.  A preservation study concluded that “small and medium-sized institutions will need the assistance of experts to assess the preservation status and needs of their expanding digital collections.”




ILM isn’t even clear to it’s OWN user community.  Larry Medina.  Computerworld.  February 17 2006.

Information Lifecycle Management (ILM) is still causing major confusion for many people because there is a question as to what it is. It is being promoted as something new.  Some vendors who push their version of ILM may use scare tactics and misinformation to say that the best way to accomplish this is to save everything.  While saving everything may be easier right now, it is not the cheapest or best way.  The best way is to evaluate exactly what you need to keep and how long you need to retain it.  Use this knowledge to create a policy and train your organization to manage the information effectively.  It also requires a classification system, and the ability to assign record series to the information.  If the information has a long retention period, you will also need to choose appropriate formats and media.  This method will minimize the information being kept, which will mean shorter indexing times, faster searches, smaller repositories, quicker backups, lower risks during e-discovery, and a uniform method of managing the information.  Until the practices and responsibilities are defined, “throwing a ‘canned solution’ at an improperly analyzed problem is foolish.”
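The approach described above, classify the information, assign it to a record series, and keep it only as long as the series requires, can be sketched in a few lines.  Everything specific here is hypothetical: the series names, the retention periods, and the function names are made up for illustration, and a real schedule would come from the organization's own policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecordSeries:
    name: str
    retention_years: int


# Hypothetical retention schedule mapping a document type to its record series.
SCHEDULE = {
    "invoice": RecordSeries("Financial records", 7),
    "meeting-notes": RecordSeries("Administrative records", 2),
}


def disposal_date(doc_type: str, created: date) -> date:
    """Date on which a document's retention period ends."""
    series = SCHEDULE[doc_type]
    target_year = created.year + series.retention_years
    try:
        return created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return created.replace(year=target_year, day=28)


def is_due_for_disposal(doc_type: str, created: date, today: date) -> bool:
    """True once the retention period for the document has passed."""
    return today >= disposal_date(doc_type, created)
```

A document type that is missing from the schedule raises a KeyError here, which fits the article's point: information that has not been classified cannot be managed.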




Microsoft hit with fresh charges over Office, future products.  Simon Taylor.  Computerworld.  February 2, 2006.

Some of Microsoft’s rivals have charged that it is shutting out competitors.  One of the issues raised is Microsoft's refusal to disclose interoperability information for its Office suite. It is refusing to provide data such as the file formats for .doc, .xls, and .ppt documents, which prevents rival application suites from achieving full compatibility.  This has crucial implications for Linux systems.




Open Document Format (ODF)... let the discussion begin!  Larry Medina.  Computerworld.   March 3, 2006.

In commenting on the Open Document Format discussion, the author cites the Wall Street Journal article: “The data belongs to the people, not to the software vendor that created the file format.”  The most important observation is that a standard controlled by a single company is not a standard. 




A Microsoft Document Fight Brews; IBM, Sun Join New Group Promoting Common Format For Government Records.  Wall Street Journal.  March 3, 2006.

An alliance of software vendors plans to promote use of the OpenDocument Format, a set of software technologies for storing and creating documents.  The Open Document Alliance includes IBM, Sun, Oracle, ALA, and 30 others.  This heightens the debate over whether governments should adopt software that supports the OpenDocument Format.  Microsoft doesn’t support the format.  Backers argue that the format is more trustworthy for storing documents because it isn't owned by a single company.  Microsoft is using a new format called OpenXML in its Office 12 software, which is also supported by Apple and Intel.  The debate is a result of the actions in Massachusetts last year, when the state's information-technology division decided to standardize its programs on the OpenDocument Format.