Friday, August 29, 2008

Digital Preservation Matters - August 2008

Battle of the Buzzwords: Flexibility vs. Interoperability When Implementing PREMIS in METS. Rebecca Guenther. D-Lib Magazine. July/August 2008.

PREMIS specifies the information needed to maintain digital objects over the long term. Many institutions look to METS (Metadata Encoding and Transmission Standard) to implement it. Ambiguities between the two standards need to be clarified, and the article walks through the relevant structures, ambiguities, and redundancies. A working group has been established to develop guidelines for using PREMIS with METS and to resolve the differences. The PREMIS in METS guidelines are a work in progress, and as institutions experiment with them there will be further revisions.


A Format for Digital Preservation of Images: A Study on JPEG 2000 File Robustness. Paolo Buonora and Franco Liberati. D-Lib Magazine. July/August 2008.

Many have talked about JPEG 2000 not only as a "better" JPEG delivery format, but also as a new "master" format for high-quality images and a replacement for TIFF. The authors look at JPEG 2000 from a technical viewpoint. The JPEG 2000 file structure is not only robust in itself, but some enhancements can make it better to use. One is the utility FixIt! JPEG 2000, which can extract the file header, test and fix corrupted images, and save the header in XML format. They conclude the format is a good solution for digital repositories.


New record keeping standards announced. Judith Tizard. Press Release: New Zealand Government. 27 August 2008.

The New Zealand Archives announced two new recordkeeping standards:

1. Create and Maintain Recordkeeping Standard: identifies the key requirements for successful information management for recordkeeping.

2. Electronic Recordkeeping Metadata Standard: a systematic approach to managing information. "Information management is an essential and important legacy." These standards ensure that information has meaning; that it can be found when needed; that it can be relied on to be what it claims to be; and that it can be moved safely from one system to another. Archives need to be able to answer who created a record, for what purpose, and whether it has been altered.


Dead Sea Scrolls go from parchment to the Internet. CNN. August 27, 2008.

The Dead Sea Scrolls are going digital as part of an effort to better preserve the ancient texts and let more people view them. The initiative, announced Wednesday, will also reveal text that was not otherwise visible. Over the next two years, the Israel Antiquities Authority will digitally photograph and scan every bit of crumbling parchment and papyrus that makes up the scrolls. The images eventually will be posted on the Internet. Israel has assembled an international team of technical people for the project.


Very Long-Term Backup. Kevin Kelly. Weblog. August 20, 2008.

Paper, while destructible and limited, can be a stable medium over the long term. Digital storage is not stable over long periods. A project has been underway to create a stable medium. This page provides information (and pictures) on the Rosetta project, which used technology commercialized by Norsam to etch 13,500 pages of information onto a titanium disk. The disk is not digital and requires a microscope to read.


OCLC Crosswalk Web Service Demo. OCLC. August 2008.

The purpose of the Crosswalk Web Service is to translate a group of metadata records from one format into another. For this service, a metadata format is defined as:

· The metadata standard of the record (e.g. MARC, DC, etc.)

· The structure of the metadata (e.g. XML, RDF, etc.)

· The character encoding of the metadata (e.g. MARC8, etc.)

It requires a client software component. In the demo, only a limited number of records can be translated at a time.
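The idea of a crosswalk (translating records between metadata standards) can be sketched as a simple field mapping. The mapping below is a hypothetical, simplified subset for illustration, not OCLC's actual service logic:

```python
# Illustrative metadata crosswalk: Dublin Core field names to MARC-style
# tags. The mapping is a hypothetical subset chosen for the example.
DC_TO_MARC = {
    "title": "245",    # Title statement
    "creator": "100",  # Main entry, personal name
    "date": "260",     # Publication information
    "subject": "650",  # Topical subject heading
}

def crosswalk(record: dict) -> dict:
    """Translate a Dublin Core record (field -> value) into MARC tags,
    dropping fields the mapping does not cover."""
    return {DC_TO_MARC[k]: v for k, v in record.items() if k in DC_TO_MARC}

dc_record = {
    "title": "Digital Preservation Matters",
    "creator": "Example, Author",
    "language": "eng",  # no mapping entry, so it is dropped
}
print(crosswalk(dc_record))
```

A real crosswalk must also handle the structure (XML, RDF) and character encoding (MARC8, UTF-8) dimensions the service description mentions; this sketch covers only the standard-to-standard field mapping.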


OCLC's new Web Harvester captures Web content to add to digital collections. Press Release. July 29, 2008.

OCLC is now offering Web Harvester, a new optional product that allows libraries to capture Web content and add it to their CONTENTdm digital collections. It captures content ranging from single Web-based documents to entire Web sites. Once retrieved, users can review the captured Web content and add it to a collection. Master files of the captured content can also be ingested into the OCLC Digital Archive, the service for long-term storage of originals and master files from libraries' digital collections. The Web Harvester is integrated into library workflows, allowing staff to capture content as part of the cataloging process; the content is then sent to the digital collections, where it can be managed with other CONTENTdm content. OCLC is committed to providing solutions for the entire digital life cycle.

Friday, July 25, 2008

ArcMail unveils email archiving appliance with Blu-Ray disks. July 23, 2008.

ArcMail Technology, a provider of email archiving and management technology, announced it will include Blu-ray disks as part of its product offering. The product can store up to 16 TB; pricing starts at $3,000.

Wednesday, July 02, 2008

Digital content management: the search for a content management system. Yan Han. Library Hi Tech. Volume 22 · Number 4 · 2004. [PDF]

A digital content management system is a software system that provides preservation, organization, and dissemination services for digital collections. This article analyzes Greenstone, Fedora, and DSpace in the key areas of digital content management. A content management system should provide tools and support for preservation, control, and dissemination of both local documents and external content, and be cost-effective as well. DSpace received the highest marks in the operational, schedule, and economic analyses, while Fedora received the highest score in the technical analysis. Overall, DSpace was ranked first among these systems, then Fedora. The Appendix contains the functional requirements.

Friday, June 20, 2008

Librarians Confer in a Midwinter Meeting of Some Discontents. Andrea Foster. The Chronicle of Higher Education. January 25, 2008.

Part of this article discusses some of the challenges in building an institutional repository. An Ohio university has more than 21,000 items in its archive, including conference papers, teaching materials, photographs, and multimedia works.

Faculty members often submit research papers to the repository unaware that they have signed away the rights to their work to a journal publisher, Ms. Davis said. "They are stunned that they have not retained the copyrights," she said. "They're vehemently adamant" that they still have rights to the work.

Some people add other scholars' material to the repository, incorrectly assuming that this is allowed by fair use.

Friday, May 16, 2008

Digital Preservation Matters - May 16, 2008

Keeping Research Data Safe: a cost model and guidance for UK Universities. Neil Beagrie, et al. JISC. 12 May 2008. [169 p. PDF]. Executive Summary.

Digital research raises issues relating to access, curation, and preservation. Funding institutions now require researchers to submit plans for data management or preservation. This extremely detailed study includes a framework for determining cost variables, a cost model, and case studies. The service requirements for data collections will be more complex than many have previously thought. Accessioning and ingest costs were higher than ongoing long-term preservation and archiving costs:

1. Acquisition and Ingest: ca. 42%
2. Archival Storage & Preservation: ca. 23%
3. Access: ca. 35%

Ten years of data from the Archaeology Data Service show relatively high costs in the early years after acquisition but costs decline to a minimal level over 20 years. Decline of data storage costs, costs for ongoing actions such as file format migrations, and others, provide economies of scale.

Some significant issues for archives and preservation costs include:

  • Timing: Costs vary depending on when actions are taken. Initially creating metadata for 1,000 records costs about 300 euros; fixing bad metadata after 10 years may cost 10,000 euros.
  • Efficiency: The start-up costs can be substantial. The operational phases are more productive and efficient as procedures become established, refined, and the volume increases.
  • Economy of scale: Increased volume has an impact on the unit costs for digital preservation. One example is that a 600% increase in accessions only increases costs by 325%.
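The economy-of-scale example can be checked with simple arithmetic. This sketch assumes "a 600% increase in accessions" means volume grows to 7x the baseline and "increases costs by 325%" means total cost grows to 4.25x the baseline:

```python
# Back-of-the-envelope unit-cost calculation for the economy-of-scale
# example. Interpretation of the percentages is an assumption:
# +600% accessions -> 7x volume; +325% cost -> 4.25x total cost.
volume_factor = 1 + 6.00
cost_factor = 1 + 3.25

unit_cost_ratio = cost_factor / volume_factor
print(f"unit cost falls to {unit_cost_ratio:.0%} of baseline")  # ~61%
```

Under that reading, the per-item cost of preservation drops by roughly 40% as volume grows sevenfold.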

“While the costs of maintaining digital preservation capacity are not insignificant, the costs of the alternative are often greater.” They consider three staff essential to establish a repository:

  1. Archive Manager: co-ordinate activities;
  2. System Administrator: (half time) to install and manage hardware and software;
  3. Collections Officer: develop and implement appropriate workflow and standards

Tasks for the digital preservation planner include: Implementing a lifecycle management approach to digital materials, continuously assessing collections, their long-term value and formats, and making recommendations for action needed to ensure long-term usability. Also:

  • audit the Library’s digital assets, evaluating their volume, formats, and state of risk.
  • research into preservation methodologies.
  • ensure that preservation actions are carried out on digital assets at risk of loss
  • formulate and publicize advice to data creators

“A data audit exercise is needed at the outset of scoping a digital archive. This will identify collections and their relative importance to the institution and wider community.”

Also, a library should consider federated structures for local data storage, “comprising data stores at the departmental level and additional storage and services at the institutional level. These should be mixed with external shared services or national provision as required.” The hierarchy should reflect the content, the services required, and the importance of the data.

The real cost of archiving research data drops by roughly 25% as new methods and media become available, though the cost of migrations is extremely high. Raw data preservation costs per sample:

1970-1990 Paper records: £30.00
1989-1996 Magnetic tapes: £21.95
1990-2000 Floppy disks: £7.25
1997-2003 Compact Discs: £6.00
2000-present Computer disks: £2.15
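The per-sample figures above can be used to check the "roughly 25% per generation" claim; this short sketch computes the decline between successive media generations (figures in GBP, taken from the table):

```python
# Per-generation decline in raw data preservation cost per sample,
# using the figures given in the source's table.
costs = {
    "Paper records": 30.00,
    "Magnetic tapes": 21.95,
    "Floppy disks": 7.25,
    "Compact Discs": 6.00,
    "Computer disks": 2.15,
}

media = list(costs)
for prev, cur in zip(media, media[1:]):
    drop = 1 - costs[cur] / costs[prev]
    print(f"{prev} -> {cur}: {drop:.0%} cheaper")
```

The first transition (paper to magnetic tape) is close to the cited 25% figure; later transitions, such as tape to floppy disk, show much steeper drops.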

“A data preservation strategy is expected to form part of the university’s overall information strategy.” Start-up costs are higher for the early phases, especially for developing the first tools, standards and best practices.


Library of Congress Digital Preservation Newsletter. Library of Congress. May 2008. [PDF]

There are a number of items in the newsletter of interest, including:

  • LC creates and supports the development of some key open standards for digital content, such as
  1. Office Open XML. An estimated 400 million people use the various versions of the Microsoft Office programs, and the new standard supports all the features of the versions of Microsoft Office since 1997. Microsoft has released the specifications of its earlier binary formats and asked the Library of Congress to hold copies.
  2. PDF/A, a subset of the PDF format suitable for preservation.
  • The Data Preservation Alliance for the Social Sciences website is a partnership to identify, acquire and preserve data which is at risk of being lost to social science research.
  • The MetaArchive Cooperative is participating in the NDIIPP digital preservation network. They have added an international member to the participants. The site provides documentation and information for private LOCKSS networks and a “Guide to Distributed Digital Preservation.”


The 29 fakes behind a rewriting of history. Paul Lewis. The Guardian. May 5, 2008.

The article emphasizes the importance of archive security and object authentication and verification; this is not just a problem for digital objects. Several books had been written based on forged documents planted in the UK National Archives. The author of the books used 29 documents in 12 separate files to write books on historical events; he is the only person to have checked out the files. An investigation uncovered the fake documents. The Archives takes a serious view of anything that compromises the integrity of the information and the archive.

Friday, May 09, 2008

Digital Preservation Matters - May 9, 2008

Bill targets messy e-records. Ben Bain. Federal Computer Week. May 5, 2008.

The proposed Electronic Communications Preservation Act would require federal agencies to preserve electronic communications in an electronic format, preserving e-mail and other messages that would otherwise leave gaps in the historical record. NARA would be in charge of this. Existing policies neither require that records be preserved in their native format nor provide the authority to enforce the policies during an administration. “The loss of documents through indifference should be viewed with as much alarm as their loss through a system breach.” Having preservation standards to capture, manage, preserve, and retrieve electronic communications “could save money in the long run.”


Innovations in Digital Asset Management, Circa 2008. Joseph Bachana. CMS Watch. May 7, 2008.

Overview of the development of the digital asset management market. Vendors have been working to add web services and improve their products. The market wants to integrate digital asset management systems with other systems, such as web content management, customer relationship management, editorial workflow, e-learning platforms, marketing resource systems, and others. Most vendors in this area are small with limited resources, or are divisions of larger companies, and cannot address all these areas. Notable products have included the eXtensible Metadata Platform, Version Cue, and SharePoint. A few current trends include:

  • single asset management repository
  • XML content that can be republished, reformatted, or changed into different assets
  • Open source, but the digital asset management development isn’t significant yet

“What is new, from my vantage point: no single DAM vendor has the resources to address all of these varied needs in a timely fashion. In the end, it may take the harnessed fervor of the open source community to bring all these threads and more together in the marketplace over the next 24 to 36 months.”


Demand for Collaboration Driving $330 Million Digital Asset Management Markets. Nicole Fabris. Reuters. May 8, 2008.

The market for Digital Asset Management solutions was more than $330 million in 2007 and is growing. The sheer number of digital assets helps drive this, but so does “the need for different departments of an organization to have a seamless workflow in handling the same library of digital content.” Digital environments are no longer separate silos. The ABI Research report shows that the vendor landscape is fragmented, with different types of products in different sectors.


EThOS: a national OAI and digitisation service for e-theses in the United Kingdom. Chris Awre, et al. Open Repositories. April 2008.

The EThOS project in the UK is an effort to promote a national repository and the digitization and sharing of theses and dissertations. Just putting items in a repository doesn’t mean that anyone uses it or even harvests the metadata; promoting a repository increases its use. A toolkit being created at http://ethostoolkit.rgu.ac.uk/ helps outline the culture change needed, the business requirements for the institutional repository, technical options, and training for staff.

Open access does not mean the same thing as free access; there are still costs. Libraries that contribute are charged for digitization, and researchers are charged for the added value of the digital file. Theses are now moving beyond PDF to multimedia formats, and the sustainability of these files requires the efforts of the community. The British Library provides e-theses, and there are some concerns about whether it has the right to put them on the internet. It has decided that making them available is more important, and will take them down if the owner submits an actual take-down request over a copyright problem.


DLESE: A Case Study in Sustainability Planning. Mary Marlino, Tamara Sumner, Karon Kelly and Michael Wright. Open Repositories. April 2008.

There are currently no models for sustainability planning. One must define the core library components and what needs to be sustained. What happens when the project ends? How does one prepare for it? This is important in a partnership: determine beforehand the system administration, application support, content processes, workflow, and maintenance. Sustainability planning should start at the beginning of the project. Distributed infrastructure is not often designed with sustainability in mind, and in that structure it is critical to develop a disciplined cost model. Sustaining the community (user base, culture, embedded expertise) is the most important and difficult aspect of library sustainability. One definition of sustainability: meeting the needs of the present without compromising the ability of future generations to meet their own needs (Wikipedia).


Building Personal Collections and Networks of Digital Objects in a Fedora Repository Using VUE. Anoop Kumar. Open Repositories. April 2008.

Today’s technology makes it feasible to set up repositories for large institutions as well as smaller groups and individuals. This utility was created as a publishing module to create content maps in a Fedora repository and build a knowledge base of objects, their metadata, and the relationships between them. Anyone can create objects and upload resources; owners can modify their objects, and anyone can add or modify relationships. VUE (Visual Understanding Environment) can be used with other repositories and supports search, browse, and deposit.

Friday, May 02, 2008

Digital Preservation Matters - May 2, 2008

Alternative File Formats for Storing Master Images of Digitisation Projects. Robèrt Gillesse, et al. National Library of the Netherlands. March 7, 2008. [PDF]

This is the result of a study of alternative formats for storing master files of digitization projects. So many digitization projects are under way that the current format strategy needs revision. Currently, most institutions worldwide store master image files in the uncompressed TIFF format. If there are 40 million image files in the current projects, storing them in this format would require about 650 TB of storage space. The objective of the study was to describe alternative file formats that could reduce the necessary storage space: JPEG 2000, PNG, JFIF, and TIFF. Desired image quality, long-term sustainability, and functionality had to be taken into account during the study. They looked at:

  1. The required storage capacity
  2. The image quality
  3. The long-term sustainability
  4. The functionality

The main conclusion of the study is that lossless JPEG 2000 is overall the best alternative to the uncompressed TIFF format from the perspective of long-term sustainability. The summary table notes that “JPEG 2000 comes out on top in both the lossless as well as the lossy versions”. The PNG format is also mentioned. Under alternate formats, they also list a number of institutions that are using Motion JPEG 2000 as a standard for digital cinema.
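The study's storage figures imply a per-file size that makes the scale of the problem concrete. The 2:1 lossless compression ratio for JPEG 2000 used below is an illustrative assumption, not a figure from the report:

```python
# Storage arithmetic from the study: 40 million uncompressed TIFF
# masters needing ~650 TB. Decimal units (1 TB = 1,000,000 MB) assumed.
n_files = 40_000_000
tiff_total_tb = 650

mb_per_file = tiff_total_tb * 1_000_000 / n_files
print(f"~{mb_per_file:.2f} MB per TIFF master")  # ~16.25 MB

# Hypothetical 2:1 lossless JPEG 2000 compression (assumption for
# illustration; actual ratios vary with image content).
assumed_ratio = 2.0
jp2_total_tb = tiff_total_tb / assumed_ratio
print(f"~{jp2_total_tb:.0f} TB if stored as lossless JPEG 2000")
```

Even a modest lossless compression ratio would save hundreds of terabytes at this scale, which is the practical motivation behind the study.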


InPhase finally to phase in holographic disk. Paul Roberts. The Register. 26th April 2008.

A version of the holographic disk was demonstrated. The disks will store up to 300 GB and are supposed to have a 50-year life. The Tapestry drive is priced at about $18,000, and the disks will cost $180 each. Compared with LTO3, Blu-ray, and Plasmon, the disks have advantages and disadvantages: they hold more, but are also much more expensive. More information is available from InPhase.


NBA Expands Relationship With SGI on Groundbreaking Digital Media Management System. Marla Robinson. NewsBlaze. April 14, 2008.

The two organizations have created a new digital workflow and media management system, called the NBA Digital Media Management System. It allows the NBA to “simultaneously ingest and archive footage from up to 14 NBA games, edit the archival content on the fly, provide full game broadcasts, clips and other NBA content to 214 countries worldwide.” The system will add 60,000 hours of video each year, and the NBA archive, more than 400,000 hours dating from 1946, will eventually be digitized. More frequently requested content will be stored on spinning disk arrays, while rarely needed content will be kept on more economical storage.


Time's running out to preserve our treasures. Deborah Holder. Guardian. 22 April 2008.

Putting documents, pictures, and other items on a computer makes them more accessible, but the question now is whether we can save them. Creating an online archive takes a lot of work. First there is the question of what to archive; this is not just an online archive but an online community. There is urgency in saving these items: "With sound recordings especially, it's imperative to digitise because old formats are disappearing and, more importantly, the players to play them back on are disappearing." But it is about access as well as preservation.

Friday, April 25, 2008

Digital Preservation Matters - 25 April 2008

Preserving the Data Explosion: Using PDF. Betsy Fanning. Digital Preservation Coalition & AIIM. February 2008. [PDF]

This report looks at PDF standards activities and their relevance to digital preservation. The PDF Reference is an open specification made freely available by Adobe. The various versions are listed; starting in 2000, subsets were created, including PDF/A for archiving, which is being developed by AIIM and an ISO group. They looked at a variety of formats for long-term preservation, and "PDF was chosen as the file format best suited for long-term preservation due to its wide adoption in numerous applications and ease of creating PDF files from digitally born documents." Long term is defined as "the period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing user community, on the information being held in a repository, which may extend into the indefinite future."

PDF is an open file format but is considered proprietary because Adobe Systems owns patents on the format; however, Adobe allows developers to use the specification royalty-free. The objectives are to find a format that:

  • is device-independent
  • self-contained for rendering and description
  • does not have restrictive elements to render the document long term
  • is in widespread use

PDF does not meet all these criteria and has issues that need to be resolved. PDF/A limits some functions, and there are two conformance levels:

  • PDF/A-1a (Level A): full conformance, including requirements for document structure and tagging
  • PDF/A-1b (Level B): minimal conformance, guaranteeing only that the visual appearance can be reliably reproduced

Adobe products will conform to the ISO PDF standard when it is approved. But the PDF format alone is not enough to ensure accurate preservation: organizations must have appropriate policies, procedures, and records management in place. It is important to know that files actually conform to PDF/A, so validation tools are needed. "It is safe to say that correctly implementing the PDF/A file format should result in reliable, predictable, and unambiguous access to the full information content of electronic documents long-term." Education and training on PDF/A are needed. "Due to the specific nature of long-term preservation of electronic documents, the field of available file formats that can be used for preservation purposes is very small." Other formats often considered are TIFF, XML, ODF, OOXML, and XPS.
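A first, very shallow check of whether a file claims PDF/A conformance can be done by looking for the pdfaid:part and pdfaid:conformance properties that PDF/A files must declare in their XMP metadata. This is a minimal sketch, not a substitute for a real conformance validator, which must check far more than the declared identification:

```python
import re

# Look for the PDF/A identification properties in a PDF's raw bytes.
# XMP may express them as elements (<pdfaid:part>1</pdfaid:part>) or
# attributes (pdfaid:part="1"); the patterns accept both forms.
PDFA_PART = re.compile(rb'pdfaid:part(?:="|>)(\d+)')
PDFA_CONF = re.compile(rb'pdfaid:conformance(?:="|>)([AB])')

def pdfa_level(pdf_bytes: bytes):
    """Return e.g. 'PDF/A-1b' if the XMP declares PDF/A, else None."""
    part = PDFA_PART.search(pdf_bytes)
    conf = PDFA_CONF.search(pdf_bytes)
    if part and conf:
        return f"PDF/A-{part.group(1).decode()}{conf.group(1).decode().lower()}"
    return None

# Fabricated XMP fragment for demonstration only:
sample = b'<pdfaid:part>1</pdfaid:part><pdfaid:conformance>B</pdfaid:conformance>'
print(pdfa_level(sample))  # PDF/A-1b
```

A declared level is only a claim; the report's point stands that dedicated tools are needed to verify that a file genuinely meets the specification.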


Significant Properties of Digital Objects. Andrew Wilson. JISC Workshop. 7 April 2008.

The fundamental challenge is to preserve the accessibility and authenticity of digital objects over time and across changing technical environments. We must accept the separation of the logical information of an object from its physical environment. Different models of digital preservation focus on the technology, the data, the processes, or on restoring objects later (digital archaeology). Authenticity comes from integrity and accuracy (no unauthorized changes), being able to trust that the item is what it purports to be, and the ability to use and view it later. That does not mean the item has not been changed, but that the message it was meant to communicate is unaltered. The model needs to ensure that the essence, or significant properties, are preserved.


Investigating the significant properties of electronic content over time. Stephen Grace. JISC Workshop. 7 April 2008.

The project looks at the properties of digital content. The framework is to catalog the significant properties of a digital object, determine the relative value of each property for re-creating the object, designate its level of significance, and determine the user community and restrictions. Some properties are more important than others, and a judgment must be made on their value. A numbered scale measures significance, from essential to not important.


The Significant Properties of Vector Images. David A. Duce. JISC Workshop. 7 April 2008.

They use a data-centric approach, which focuses on maintaining digital objects in current formats, rather than a process-centric approach that keeps objects in their original form and attempts to emulate the original environment. The strategy is to transform the original object, together with related information, into a transformed source that retains the essence of the original. The challenge is to identify the significant properties and keep them through the transformation process. We need to document why something is being preserved and why particular methods were used. Some possible formats for vector images are WebCGM (mostly engineering), SVG (an XML application with font and animation capability), and PDF/A. More research is needed.

Friday, April 18, 2008

Digital Preservation Matters - 18 April 2008

Definitions of Digital Preservation (updated link). American Library Association. April 15, 2008.
A working group within the Preservation and Reformatting Section has drafted a definition of ‘digital preservation’ to promote an understanding of digital preservation within the library community. They created a short, medium, and long version to accommodate a variety of needs. They express “the need for a declared intention to preserve, a plan for doing so, and engagement in measurable activities to realize that plan.”
Short Definition: Digital preservation combines policies, strategies and actions that ensure access to digital content over time.
Medium Definition: Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
Long Definition: Digital preservation combines policies, strategies and actions to ensure the accurate rendering of authenticated content over time, regardless of the challenges of media failure and technological change. Digital preservation applies to both born digital and reformatted content.
Digital preservation policies document an organization’s commitment to preserve digital content for future use; specify file formats to be preserved and the level of preservation to be provided; and ensure compliance with standards and best practices for responsible stewardship of digital information.
Digital preservation strategies and actions address content creation, integrity and maintenance, which are listed in the definition.

The PREMIS editorial committee has updated the data dictionary, a resource for preservation metadata in digital archiving systems. Preservation metadata is defined as “the information a repository uses to support the digital preservation process” and includes administrative (including rights and permissions), technical, and structural metadata. Core metadata is defined as “things that most working preservation repositories are likely to need to know in order to support digital preservation.”
PREMIS schemas are available from the Library of Congress website.
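For a sense of what PREMIS metadata looks like in practice, a minimal object entry might resemble the fragment below. This is a hand-written illustration, not an excerpt from the data dictionary; the identifier value and checksum are invented:

```xml
<premis:object xmlns:premis="info:lc/xmlns/premis-v2"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="premis:file">
  <premis:objectIdentifier>
    <premis:objectIdentifierType>local</premis:objectIdentifierType>
    <premis:objectIdentifierValue>example-0001</premis:objectIdentifierValue>
  </premis:objectIdentifier>
  <premis:objectCharacteristics>
    <premis:compositionLevel>0</premis:compositionLevel>
    <premis:fixity>
      <premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
      <premis:messageDigest>d41d8cd98f00b204e9800998ecf8427e</premis:messageDigest>
    </premis:fixity>
    <premis:format>
      <premis:formatDesignation>
        <premis:formatName>image/tiff</premis:formatName>
      </premis:formatDesignation>
    </premis:format>
  </premis:objectCharacteristics>
</premis:object>
```

The fixity block illustrates the "support the digital preservation process" idea: a repository can recompute the digest later and confirm the file is unchanged.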

The newsletter includes three items:
A report of the Section 108 copyright study group. Some highlights from that:
  • Museums should be eligible for section 108
  • A new exception should permit qualified libraries and archives to make preservation copies of at-risk published works prior to any damage or loss. Access to these “preservation-only” copies will be limited.
  • A new exception should permit libraries and archives to capture and reproduce publicly available online content for preservation purposes and to make those copies accessible to users for private study, research or scholarship.
  • Libraries and archives should be permitted to make a limited number of copies as reasonably necessary to create and maintain a single replacement or preservation copy.
The Chronopolis project is a datagrid framework being developed by the San Diego Supercomputer Center and others, for preserving content and developing best practices.
The Washington State Digital Archives is leading a multi-state government project for archiving local state government data.

Windows Life-Cycle Policy for XP. Microsoft. Updated: April 3, 2008.
Microsoft has updated the end-of-life-cycle information for XP: license availability and support end June 30, 2008. The end date for XP Home on ultra-low-cost PCs is extended to June 30, 2010, or one year after the general availability of the next version of Windows. A final service pack, Windows XP SP3, is expected by the end of April.

Thursday, April 17, 2008

Digital Preservation Matters - 11 April 2008

Section 108 Study Group Releases Report. George H. Pike. Information Today. April 10, 2008.

An advisory study group was created to make recommendations about copyright issues and the role of libraries and archives in preserving information. Section 108, part of the Copyright Act of 1976, does not adequately address archiving of web content, preservation of analog and digital works, or digital copies; it currently recognizes only "published" and "unpublished" works. The study group identified a new category of "publicly disseminated" works, which includes copyrighted works transmitted by broadcast, online streaming, etc. The group recommended changes to Section 108 to allow libraries and archives to make "a preservation copy of any at-risk" publicly disseminated work.

This new exemption would be limited only to non-commercial unique or rare "at-risk" works that may be lost due to an unstable or ephemeral format or medium. Only libraries and archives that have comprehensive preservation programs would be allowed to make and preserve these copies. Access to these preservation copies would be restricted and not part of a library’s general collection. Only publicly accessible content could be captured. [The full report is available here; it is a 212 page PDF.]


In Storing 1’s and 0’s, the Question Is $. John Schwartz. The New York Times. April 9, 2008.

The amount of digital material is increasing, but much of the data is ephemeral and very fragile; “there’s no one-size-fits-all model for preserving data in the digital age,” and the biggest problem is how to pay for it. The National Science Foundation has started a $100 million program (DataNet) to help develop methods and technologies for preserving data that make economic sense. Choices have to be made about what to keep; it is just as important to keep the right information.


Sun fixes Java SE for a fee. Gavin Clarke. Register Developer. April 7, 2008.

Sun is extending the support program for Java Standard Edition 1.4, which officially retires this summer. The paid support program will extend to 2017. Otherwise, users must upgrade to the latest edition of Java SE; free support for the software now lasts three years instead of six.


Agency under fire for decision not to save federal Web content. Heather Havenstein. Computerworld. April 11, 2008.

NARA has discontinued its policy of taking a "digital snapshot" of all federal agency and congressional public Web sites at the end of congressional and presidential terms, since it believes the content is already saved by each agency as permanent records. Critics argue that "the fact that digital preservation is done by others outside NARA isn't an excuse for NARA to abdicate their responsibility, but an argument that they should be capable of fulfilling it," and that "as members of Congress and federal agencies increasingly move their work online, robust digital archiving will only become more important."


Seagate Delivers World's First 1TB Drive with SAS Interface. Press Release. April 7, 2008.

Seagate announced it is now shipping a 1 terabyte enterprise-class hard drive with a Serial Attached SCSI (SAS) interface. It includes a five-year limited warranty.


Library of Congress Groans Under Data Strain. James Rogers. Byte and Switch. April 9, 2008.

The Library of Congress has to find a way of dealing with an enormous amount of information. The library currently has more than 500 TB of digital data split across three data centers and many different storage technologies; most is online or nearline, and some is on tape. They also need help deciding which digital data should be preserved. “This is all about preservation and future-proofing.” They estimate that the equivalent of all the information currently in the Library of Congress is produced every 15 minutes.

Friday, April 04, 2008

Digital Preservation Matters - 04 April 2008

Audio and Video Carriers: Recording Principles, Storage and Handling, Maintenance of Equipment, Format and Equipment Obsolescence. Dietrich Schuller. TAPE. February 2008.
This is an introduction for those working with sound and video collections. It outlines the history of various types of audio recordings, including CDs and DVDs, how they were made, and how stable they are. It also gives an overview of passive preservation factors, particularly environment, handling, and storage. Humidity and oxidation affect the physical surfaces; other factors are dust, pollution, light, and magnetic fields. It includes a section on the maintenance of equipment and the obsolescence of formats.

IMLS Will Sponsor Second Conservation Forum for Collecting Institutions. Jill Collins. IMLS. Press Release. March 20, 2008.
This forum, “Collaboration in the Digital Age,” is intended to help museums and libraries think strategically about digital preservation. It will be held June 24-25 in Denver and will cover the fundamentals of digital content creation and preservation, emphasizing practical approaches to planning digital projects, increasing access to collections, enabling digital resources to serve multiple purposes, and protecting digital investments. In 2006, online visits accounted for 310 million of the 1.2 billion adult visits to museums and 560 million of the 1.3 billion adult visits to libraries. Yet 60% of collecting institutions do not include digital preservation in their mission.

Audio Tape Digitisation Workflow: Digitisation Workflow for Analogue Open Reel Tapes. Juha Henriksson, Nadja Wallaszkovits. TAPE. March 2008.
A practical web-based workflow for audio tape digitization. It looks at physical factors such as tape problems, equipment, and conversion. The standard CD sampling rate of 44.1 kHz is outdated and may be inadequate for many types of material; 96 kHz is now regarded as a widely accepted standard. IASA recommends a minimum sampling rate of 48 kHz, though some types of material may need 192 kHz. They also recommend an encoding depth of at least 24 bits when capturing analog items. Other topics are metadata, recording level, format, and archival masters. After digitization, the digital file becomes the preservation format. For preservation purposes an asset register should be kept and updated, and should record the checksum for each file.
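The asset register with per-file checksums described above could be sketched as follows. This is a minimal illustration only: the directory layout, CSV columns, function names, and the choice of SHA-256 are assumptions for the example, not part of the TAPE workflow itself.

```python
import csv
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large audio masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_asset_register(master_dir: Path, register_path: Path) -> None:
    """Write a CSV asset register recording each preservation master's
    name, size in bytes, and checksum."""
    with register_path.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "bytes", "sha256"])
        for wav in sorted(master_dir.glob("*.wav")):
            writer.writerow([wav.name, wav.stat().st_size, sha256_of(wav)])
```

Re-running the register build after any transfer and comparing checksums is one simple way to confirm the files arrived intact.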

White Paper: Representation Information Registries. Adrian Brown. PLANETS. 29 January 2008.
A report on Representation Information Registries. These are a critical component of digital preservation architecture, containing the technical knowledge necessary to support access to digital objects. “Any meaningful digital preservation activity requires some form of knowledge base regarding the technical environments necessary to support access to digital objects.” This is expressed in the OAIS model. Key reasons for the registries are: efficiency of description; knowledge sharing; sustainability. “Preservation planning encompasses all activities which identify the need to perform preservation actions, and the most appropriate actions to perform in order to meet specified objectives.”

Developing Practical Approaches to Active Preservation. Adrian Brown. National Archives, UK. June 2007.
The active preservation methodology comprises three main functions:
  1. characterization: measures the properties of digital objects needed for long-term preservation;
  2. preservation planning: determines the appropriate preservation actions to be undertaken; and
  3. preservation action: carries out the results of preservation planning, transforming the objects.
The PRONOM technical registry supports these functions and is the core of the preservation system. The preservation planning framework determines what preservation actions should be applied to which objects, and the appropriate time to apply them.

Friday, March 28, 2008

Digital Preservation Matters - 28 March 2008


Standards and Requirements for Digital Continuity in UK Government. Digital Continuity Project. UK National Archives. 14 March 2008. [PDF]

This is a draft of standards developed by the National Archives to help assess commercially available digital preservation solutions: their description and checklist of what a digital preservation system should do. The principal standards for digital continuity are defined in the OAIS model and the Trustworthy Repositories Audit & Certification checklist. There is more information on their Digital Continuity project that is worth reading, as well as a brochure. It is estimated that 10 per cent of the Canadian Government’s electronic records are already unreadable. The site also outlines their current areas of work.

-:-

A Possible Way Forward For Developing Cornell’s OAIS Infrastructure. Adam Smith. Blog. March 25, 2008.

A programmer looks at their long-term digital preservation project. In trying to create a system at Cornell, they originally used an object-oriented approach but encountered scaling issues with both processing speed and memory usage. The post addresses topics such as:

  • preserving “virtual” objects, which serve to represent virtual relationships to other objects;
  • two broad sets of tasks in pre-ingest preservation processing:
    1. normalizing the data
    2. gathering information to make a METS XML file
  • looking at a functional paradigm instead of an object-oriented (OOP) paradigm;
  • making collection-specific task specifications as declarative or configuration-oriented as possible.
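The second pre-ingest task, gathering information into a METS XML file, might look roughly like the sketch below. This is a bare-bones illustration, not Cornell's actual code: the function name, the `USE="preservation"` file group, and the checksum attributes are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def minimal_mets(file_entries):
    """Build a bare-bones METS document listing ingested files.

    file_entries: iterable of (file_id, sha256_checksum) pairs.
    Returns the serialized XML as a string."""
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="preservation")
    for file_id, checksum in file_entries:
        # METS file elements carry fixity data in CHECKSUM/CHECKSUMTYPE attributes.
        ET.SubElement(grp, f"{{{METS_NS}}}file", ID=file_id,
                      CHECKSUM=checksum, CHECKSUMTYPE="SHA-256")
    return ET.tostring(mets, encoding="unicode")
```

A real METS document would also carry descriptive and structural metadata sections; this fragment only shows the file inventory that normalization would feed into.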

-:-

FACET: The Field Audio Collection Evaluation Tool. Mike Casey. Indiana University. 21 March 2008.

The Field Audio Collection Evaluation Tool (FACET) is an open-source tool that ranks audio field collections by preservation condition, including the level of deterioration they exhibit and the degree of risk they carry. It assesses the characteristics, preservation problems, and deterioration of various tape formats. The release includes the software, a manual, format information, and worksheets.

-:-

On the Road With Fedora and Atos Origin in Paris. Carol Minton Morris. Fedora HatCheck Newsletter. March 12, 2008.

The Bibliotheque Nationale de France has contracted with an information technology services company to create a Fedora-based repository system. The library has chosen to use the OAIS model for the repository and the Fedora architecture.

-:-

Kofax® Wins $2.1 Million Contract with National Archives and Records Administration. Press Release. Business Wire. March 19, 2008.

Kofax will provide NARA’s Federal Records Centers with an enterprise level solution for capturing and processing documents. This is part of an initiative to “create and provide electronic records for preservation and use by the government and citizens”.

-:-

Web Curator Tool Project: 1.3.0 Released. Sourceforge. March 3, 2008.

The Web Curator is a tool to manage the web harvesting process. It was designed by the National Library of New Zealand and the British Library. The tool supports the selection, harvesting, and quality assessment of online information, whether entire web sites or a portion. The workflow helps with the various tasks involved in the process: permissions, description, scope, and deposit. The latest version is now available for download.

-:-

Evaluating File Formats for Long Term Preservation. Judith Rog, Caroline van Wijk. National Library of the Netherlands. February 2008. [PDF]

Most documents deposited in the Koninklijke Bibliotheek have been in PDF format. Because other formats must also be handled, the library has developed a quantifiable file format risk assessment method that can define strategies for specific file formats. The file format chosen at the time of an object’s creation influences long-term access. The method weighs seven criteria: openness, adoption, complexity, technical protection mechanism, self-documentation, robustness, and dependencies. They give recommendations but do not restrict deposits to specific file formats. One partner does not consider PDF/A suitable for archiving. Web archiving, with its many format types, presents the biggest challenge.
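A weighted-criteria assessment of this kind could be sketched as below. The weights and the 0–100 scoring scale are invented for illustration; the paper defines the seven criteria but the KB's actual weights and scales are not reproduced here.

```python
# Hypothetical weights summing to 1.0; the KB's real values differ.
CRITERIA_WEIGHTS = {
    "openness": 0.20,
    "adoption": 0.20,
    "complexity": 0.10,
    "technical_protection_mechanism": 0.15,
    "self_documentation": 0.10,
    "robustness": 0.15,
    "dependencies": 0.10,
}

def format_risk_score(scores: dict) -> float:
    """Combine per-criterion scores (0 = poor, 100 = good) into a single
    weighted suitability score for a file format."""
    missing = set(CRITERIA_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(CRITERIA_WEIGHTS[c] * scores[c] for c in CRITERIA_WEIGHTS)
```

Scoring candidate formats on the same scale makes the comparison quantifiable, which is the point of the KB's method: the resulting number can drive a per-format preservation strategy rather than a blanket format restriction.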

-:-

Friday, March 21, 2008

Digital Preservation Matters - 21 March 2008

The Fifth Blackbird: Some Thoughts on Economically Sustainable Digital Preservation. Brian F. Lavoie. D-Lib Magazine. March 2008.
The article looks at digital preservation as an economically sustainable activity. This is an area that has seen little progress: there has been little discussion or systematic analysis of how to make digital preservation last after current funding ends. A task force and website have been created to examine these issues. While some contend that the answer is a Simple Matter Of Resources, he feels the principles are not known: “we have not yet established a systematic mapping between general economic models of resource provision and particular digital preservation contexts.” The task force will look at the issue for two years and hopes to make the choice between different economic models a little clearer. It is unlikely that most institutions will be able to develop a preservation capability of their own, but it is likely that a network of preservation repositories will emerge. “The ease with which we create information in digital form tends to obscure the true cost of maintaining it over long periods of time. It has almost become a truism to say that our capacity to produce digital materials far exceeds our capacity to maintain them over time.”


Rethinking Personal Digital Archiving, Part 1: Four Challenges from the Field. Catherine C. Marshall. D-Lib Magazine. March 2008.

Technical discussions about digital archiving are usually based on two assumptions:

1- preservation will rely on the ability to render digital objects in the future

2- trusted repositories will be used to store and exchange these digital objects

Will this really address the needs of consumers? Individuals rarely think that their own stuff needs preserving, or they may use various methods to put the objects in a ‘safe’ place; benign neglect is the most common attitude. “Digital belongings are ultimately stored according to what people are planning to do with them….” The same characteristics that make digital assets attractive make digital stewardship more difficult.


Rethinking Personal Digital Archiving, Part 2: Implications for Services, Applications, and Institutions. Catherine C. Marshall. D-Lib Magazine. March 2008.

Archiving services and applications must be able to assess value in a way that makes intuitive sense to individuals in the future. Digital assets are often in different locations, so creating a union catalog of the objects is an urgent first need. Many assume that digital stewardship is simply storing the data once and viewing it sometime in the future. Regular maintenance is needed, such as checking and refreshing media, migrating files to better formats, and virus and malware checks, etc. “It is more important to know what we have and where we've put it than it is to centralize all of our stuff into a single repository.”
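The "regular maintenance" Marshall calls for, such as checking files over time, could be sketched as a simple fixity check against a stored checksum manifest. This is a hypothetical illustration of the practice, not any particular service's API; the manifest format and status labels are assumptions.

```python
import hashlib
from pathlib import Path

def verify_manifest(manifest: dict, base_dir: Path) -> dict:
    """Compare files on disk against a stored {relative_path: sha256} manifest.

    Returns each path mapped to 'ok', 'changed', or 'missing', which tells
    the owner what needs repair, restoration from backup, or investigation."""
    results = {}
    for rel_path, expected in manifest.items():
        path = base_dir / rel_path
        if not path.exists():
            results[rel_path] = "missing"
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        results[rel_path] = "ok" if actual == expected else "changed"
    return results
```

Run against each storage location in turn, a check like this also supports the article's "know what we have and where we've put it" point: the manifest doubles as a minimal union catalog of dispersed digital belongings.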


IT is Not Responsible for Records Retention. Brian D. Jaffe. eWeek Midmarket. March 10, 2008.

Deciding which records to keep on regulatory grounds and saving everything in case of disaster are not compatible skill sets: a backup strategy is not the same as a records retention policy. IT should be responsible for backup but not for content; owning the content would require knowing what the documents contain, how long they should be kept, and what to do with them at the end of that period. A retention policy is also not a recovery mechanism, so both are needed. Saving everything is not an option, technologically or legally. IT is responsible for keeping the data safe; the users own the data and should decide what to do with it. A written policy is needed to define the different roles and requirements. “Data is the most valuable asset that IT is responsible for, but it is a responsibility that can't be borne by IT alone.”


Do you really know where your e-mail slept last night? E-Mail Compliance – What does it mean? Andy Whitaker. IT Security. March 4, 2008.

It is important to verify the integrity of the organization’s communication and provide auditability, especially with data protection laws. With the increased emphasis on electronic records, it is important to know what information is captured, if there have been alterations, if the data can be retrieved, and if you can show who has access to the archives.