Monday, February 22, 2010

Digital Preservation Matters - February 22, 2010

Appraisal Actions and Decisions. Chris Prom. Practical E-Records. February 15, 2010.

Most development work on digital repositories focuses on the requirements of the OAIS reference model. But OAIS doesn’t say how records should be selected for deposit. While each archive has a different focus, selecting records for inclusion in an archive is heavily debated. The appraisal process requires careful and intelligent decision making by a person. When appraising electronic records, several tools are needed to:

  • examine, identify, compare, delete, rename, and reorganize records
  • manage information concerning records surveys/assessments.
  • manage submission agreements
  • ensure that appraisal actions are documented.

A set of tools is needed to examine, characterize, delete, and possibly reorder records quickly. This would make it easier to decide whether the records are within the scope of the archive's policy and then to take appropriate action on them.


E-Library Economics. Steve Kolowich. Inside Higher Ed. February 10, 2010.

Two studies from the Council on Library and Information Resources examine the implications of libraries changing to digital collections. Libraries seem to be headed in the direction of primarily digital infrastructures but the journey is slow going. Digital standards, such as those for eBooks, are still changing. “While they enjoy the searchability of electronic documents and databases, academics still prefer holding a book in their hands to read it.” The studies point to an average of $4.26 per book per year to keep the book on the shelf. The cost for digital is much less; the digital media repository Hathi Trust stores five million copies at between $0.15 and $0.40 per volume, per year. Books in high-density storage facilities cost only $0.86 per year to keep in usable condition. “The administrators who provide library budgets may be reluctant to fund new facilities to house print collections and may question large expenditures to support both print and electronic formats. Library directors must consider not only the immediate expectations of faculty, but also the long-term goals for the library.”
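
To put the per-volume figures side by side, here is a minimal sketch that multiplies each quoted rate by a collection size. The rates are the ones cited above; the 100,000-volume collection and the names `COST_PER_VOLUME_YEAR` and `annual_cost` are assumptions made only for illustration.

```python
# Per-volume annual costs quoted in the summary above (US dollars).
COST_PER_VOLUME_YEAR = {
    "open_stacks": 4.26,    # book kept on the shelf
    "high_density": 0.86,   # high-density storage facility
    "digital_low": 0.15,    # lower bound quoted for digital storage
    "digital_high": 0.40,   # upper bound quoted for digital storage
}


def annual_cost(volumes):
    """Return the yearly cost of keeping `volumes` items under each model."""
    return {
        model: round(volumes * rate, 2)
        for model, rate in COST_PER_VOLUME_YEAR.items()
    }


if __name__ == "__main__":
    # 100,000 volumes is a hypothetical collection size, not a figure
    # taken from the studies.
    for model, cost in annual_cost(100_000).items():
        print(f"{model:>12}: ${cost:,.2f}")
```

The comparison covers storage only; it says nothing about digitization, licensing, or access costs, which the summary does not quantify.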


Studies Cite Argument for, Resistance to Increased Digital Library Collections. Library Journal. February 11, 2010.

A reaction to the E-Library Economics article. The keys to success are to communicate with students and faculty and educate them about why the changes are important; to emphasize the preservation of resources, security, and the benefits; and to make the electronic resources available without barriers. One concern is that the “move to electronic collections requires certainty about access to digital collections and their persistence.” Also, removing books would not change the fixed costs of the building. The report authors also acknowledge “that the business model for ebooks remains unsettled and that print plays an important role for resources that don't yet work so well in digital format.”


Using DROID for Appraisal. Chris Prom. Practical E-Records. February 17, 2010.

DROID is a tool to help archivists identify file formats. But it may be valuable in the appraisal process to help an archivist understand the components of a records series. By running DROID and analyzing the reports, it is possible to identify particular file formats outside of the proposed collection scope, especially useful if they are deep in a directory structure. Specific examples and processes used are outlined.
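
The kind of report analysis described above can be sketched roughly as follows. This is not Prom's procedure, only an illustration: it assumes the report has been exported to CSV with `FILE_PATH`, `TYPE`, and `PUID` columns, and the sample rows, the allowed-format set, and the function name `out_of_scope` are all hypothetical.

```python
import csv
import io
from collections import Counter


def out_of_scope(report, allowed_puids):
    """Tally formats in a DROID-style CSV report and list out-of-scope files.

    `report` is any iterable of CSV lines (an open file works).
    The column names used here are assumptions about the export layout.
    """
    counts = Counter()
    flagged = []
    for row in csv.DictReader(report):
        if row.get("TYPE") == "Folder":
            continue  # directory rows carry no format identification
        puid = row.get("PUID") or "unidentified"
        counts[puid] += 1
        if puid not in allowed_puids:
            flagged.append(row.get("FILE_PATH", ""))
    return counts, flagged


# Made-up sample rows standing in for a real export.
SAMPLE = """\
FILE_PATH,TYPE,PUID
/series1,Folder,
/series1/minutes.doc,File,fmt/40
/series1/deep/nested/game.exe,File,x-fmt/411
/series1/deep/report.pdf,File,fmt/18
/series1/deep/unknown.bin,File,
"""

counts, flagged = out_of_scope(io.StringIO(SAMPLE), {"fmt/40", "fmt/18"})
for path in flagged:
    print("outside proposed scope:", path)
```

Because the check walks every row of the report, files buried deep in the directory structure are flagged the same way as those at the top level.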


Film Institute launches first digital archive in Wales. BBC News. 9 February 2010.

The British Film Institute has launched its first "digital jukebox" in Wales, allowing people to access its archive. The Mediatheque is already available in England. The system allows people to watch films and TV programs, currently 1,500 titles, from the national archive free of charge; 85% of the titles had not been released on DVD or online.


Innovation: We can't look after our data – what can? Tom Simonite. New Scientist. 11 February 2010.

Anyone worried about the fragility of digital data and civilization’s chances to survive would do well to look to their own data stores first. “Most of us today are blithely heading for our own personal data disasters” because of benign neglect. Data is often lost more from disorganization than from a technological catastrophe, though that happens too. Two possible approaches are mentioned: the Self Archiving Legacy Toolkit (SALT); and the Pergamum project. We are in need of tools to help with diverse, disorganized digital archives which are becoming the norm.


Court Finds E-Mails Stored on Old Archiving System Reasonably Accessible; Costs Exaggerated. Kroll Ontrack. Recent ESI Court Decisions. February 2010.

In this court case, the defendant argued that e-mails archived on the company's "cumbersome" old system were not reasonably accessible. “The court found that the plaintiff should not be disadvantaged since the defendant, a "sophisticated" company, chose not to migrate the e-mails to the now-functional archival system and thus determined that the e-mails were reasonably accessible.”

Tuesday, February 16, 2010

Digital Preservation Matters - February 16, 2010

Library of Congress Digital Preservation Newsletter. Library of Congress. February 2010.
The Library of Congress has:
The International Internet Preservation Consortium has released a web archives registry. The registry provides an overview of member web archiving efforts as well as access. It currently includes 21 archives from around the world.

ALA is launching its first Preservation Week on May 9 - 15, 2010.


Results of Digital Preservation Costs Survey now available. Neil Beagrie. 03 February 2010.
The Keeping Research Data Safe 2 survey of digital preservation cost information is now available. The project was to identify institutions with cost information for preservation of digital research data and to conduct a survey of them. The collections will then be the basis of further study. The Summary Analysis of Data Survey Responses can be downloaded as a Word file, as well as each survey response. Survey questions included:
  • Principal data file formats included
  • Size of collection
  • Identification of which types of costs they were tracking

The Online Guide to Open Access Journals Publishing. Directory of Open Access Journals. February 5, 2010.
The online guide is now available and updated. It provides practical information and tools for those producing independent Open Access journals. The guide sections discuss: Planning, Setup, Launch, Publish, and Manage. It refers to several other guides and lets others contribute their own experiences.


British Library to offer free ebook downloads. Richard Brooks. Times Online. February 7, 2010.
Over 65,000 19th-century works of fiction from the British Library will be available for free downloads this spring. The library, in partnership with Microsoft, began digitizing items several years ago. They will be available online for free, but printed copies will also be available from Amazon. The online and printed versions will look like the rare 19th-century editions. “Altogether, 35%-40% of the library’s 19th-century printed books — now all digitised — are inaccessible in other public libraries and are difficult to find in second-hand or internet bookshops.” They hope to extend this effort to books out of copyright dating from the early 20th century.


Doc or Docx? Which Office Format to Use. Minda Zetlin. Inc. Feb 15, 2010.
The new Office formats have irritated users who have trouble reading the documents. Microsoft has free conversion programs but many refuse to use them. The newer file formats have an x at the end of the file extension, meaning they are based on Extensible Markup Language or XML. With .docx, .pptx, and .xlsx, Microsoft made a fundamental change in how the files are created. The files also use file compression to reduce the file size and, it is hoped, to reduce the possibility of the full document becoming corrupted. Some suggest keeping the .doc format as the default. Some “save every document in three formats: .doc, .docx, and .pdf.”
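
The compressed, XML-based structure can be seen directly, because the newer formats are ZIP packages holding XML parts. The sketch below is only an illustration: the function name `looks_like_ooxml` is hypothetical, and the in-memory archive is a stand-in for a real document, not a valid Word file.

```python
import io
import zipfile


def looks_like_ooxml(data):
    """Return True if `data` is a ZIP archive containing [Content_Types].xml."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    with zipfile.ZipFile(buf) as zf:
        return "[Content_Types].xml" in zf.namelist()


# Build a tiny stand-in package in memory (not a valid Word document).
package = io.BytesIO()
with zipfile.ZipFile(package, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<document/>")

# Legacy .doc files are binary compound files, not ZIP archives; these
# bytes imitate only the start of such a file.
legacy = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 64

print(looks_like_ooxml(package.getvalue()))
print(looks_like_ooxml(legacy))
```

A check like this only confirms the container layout; it does not distinguish .docx from .pptx or .xlsx, and it does not validate the XML inside.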


The National Geospatial Digital Archive: A Collaborative Project to Archive Geospatial Data. Tracey Erwin; Julie Sweetkind-Singer. GIS and Science. February 8, 2010.
This is a collaborative project to collect, preserve, and provide long-term access to at-risk geospatial data. “The project partners created preservation environments at both universities, created and populated a format registry, collected more than ten terabytes of geospatial data and imagery, wrote collection development policies governing acquisitions, and created legal documents designed to manage the content and the relationship between the two nodes.” The article was published in the Journal of Map & Geography Libraries. The difference with geospatial data is that it may reside in complex, multi-file objects, and that it “can remain dynamic indefinitely due to the lifetime of the generating program and the need to be periodically reprocessed.” One of the preservation strategies is to attempt to create multiple copies, with varying capabilities. Preserving context is difficult because the data is voluminous. “It is now understood that access is inextricably linked to preservation.” “The results of the NGDA experience are multifaceted. In practical terms, the successful ingestion of data into working repositories is the most significant outcome.”


Digital doomsday: the end of knowledge. Tom Simonite, Michael Le Page. 02 February 2010.
Recent doomsday article about the loss of digital data. “The current strategy for preserving important data is to store several copies in different places, sometimes in different digital formats. This can protect against localised disasters such as hurricanes or earthquakes, but it will not work in the long run.” “There really is no digital standard that could be counted on in the very long term….”

Tuesday, February 09, 2010

Digital Preservation Matters - February 8, 2010

Online Recordkeeping: It's All in a Name. Mimi Dionne. Internet Evolution. February 2, 2010.

The born-digital record lifecycle has five stages, in chronological order: creation; distribution and use; storage and maintenance; retention; and disposition or archival preservation. All five stages are important. One of the best practices for born-digital records is uniform file naming protocols, including location, to encourage strong content management. These should align with the records retention policies. Organizations are better off if they select the information they need to retain and destroy what they don’t need. “The benefits of implementing a records program that includes regular records destruction have far-reaching influence not only on compliance issues and maintenance of a company’s IT environment but also the health of its budget.”


SPIE to Preserve E-Books in Portico. Press Release. Portico. 2 February 2010.

Portico has agreed with SPIE (the international society for optics and photonics) to preserve its collection of e-books, currently 93 items. SPIE already works with Portico to preserve its e-journals. Portico now holds over 34,000 e-books and over 10,000 e-journals. SPIE has also announced the launch of its digital library, which includes 120 SPIE Press titles from the Field Guides, Monographs, and Tutorial Texts series.


Long-Term Preservation Of Web Archives – Experimenting With Emulation And Migration Methodologies. Andrew Stawowczyk Long. IIPC. December 2009. [54 p. PDF]

The decision to emulate or to migrate is largely based on personal beliefs rather than on any particular evidence. We do not know which of these is more useful in the long term. All objects change over time, so ensuring long-term, useful access to collections requires that we first define the most important aspects of an object that need to be preserved. The “Preservation Intent”, which is what the institution intends to preserve for any given digital object and for how long, may be useful for this. Also needed are the creator’s intent, the contextual information, and the technical information.

Two possible approaches for institutions may be:

  1. preserve digital objects over the next twenty years;
  2. find means of preserving objects for longer.

Or an approach may include both: preserve items for 20 years while the search for longer preservation mechanisms continues. “Significant properties” means the properties of a digital object that are essential to the representation of the intended meaning of that object.

The author does not recommend either emulation or migration as a perfect solution to the problem at this time. The findings and recommendations include:

  1. There are no tools suitable for long-term preservation of very large web archives.
  2. All preservation actions need to be based on a clearly defined “Preservation Intent”.
  3. Migration and emulation offer some time extension for short-term access to digital objects.
  4. Emulation seems to present higher risks as a long-term preservation methodology.

It is not possible to preserve it all. Priorities need to be established for practical, long-term preservation solutions. The best hope for adequate long-term preservation lies in continuous and systematic work, researching various preservation methodologies, and improving our understanding of the future use of web archives.


Is NAND flash about to hit a dead end? Lucas Mearian. Computerworld. February 4, 2010.

IM Flash Technologies has said that shrinking the technology much further may not be possible because of problems with bit errors and reliability. The number of electrons that can be stored in the memory cell decreases with each generation of flash memory, making it more difficult for the cells to reliably retain data.


CNRI Digital Object Repository™. Corporation for National Research Initiatives. 19 January 2010.

The Corporation for National Research Initiatives (CNRI) has developed a new version of its Digital Object Repository Software. It is open source, flexible, scalable, secure, and has a suite that provides a common interface for accessing all types of digital objects. Redundancy is supported by a mirroring system with software to ensure that replicated objects are kept in sync.
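
As a generic illustration of what keeping replicas in sync involves, the sketch below compares SHA-256 digests of objects held in two stores. It is not CNRI's software or its interface; the in-memory dictionaries and the names `digest` and `out_of_sync` are assumptions made for the example.

```python
import hashlib


def digest(data):
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def out_of_sync(primary, mirror):
    """Return sorted object IDs that are missing from, or differ between, replicas.

    `primary` and `mirror` map object IDs to the bytes stored for each object.
    """
    ids = set(primary) | set(mirror)
    return sorted(
        obj_id
        for obj_id in ids
        if obj_id not in primary
        or obj_id not in mirror
        or digest(primary[obj_id]) != digest(mirror[obj_id])
    )


# Hypothetical replicas: "b" has diverged and "c" was never copied.
primary = {"a": b"report", "b": b"minutes v2", "c": b"photo"}
mirror = {"a": b"report", "b": b"minutes v1"}

print(out_of_sync(primary, mirror))
```

A real repository would compare stored checksums rather than rehashing every object on each pass, but the comparison itself is the same.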

Friday, February 05, 2010

Digital Preservation Matters - January 29, 2010

Preserving Born-Digital Legal Materials…Where to Start? Sarah Rhodes. VoxPopuLII. Cornell Law School. January 10, 2010.

Article well worth reading. We have all already heard the arguments for investing in digital preservation: how ubiquitous the digital world is and how ephemeral digital data is. “There is no denying the urgent need for libraries to take on the task of preserving our digital heritage.” Law libraries have a critically important role to play in preservation. The digital preservation field will always be in a constant state of change because technology is always changing. But there has been progress in creating tools, services, and best practices.

In the law libraries that they looked at, less than 7% of the digital preservation projects involved preserving born-digital materials. The remaining 93% involved preserving digital files created by digitizing physical originals. But those who responded to the survey said, by a margin of 2 to 1, that born-digital materials were in more urgent need of preservation than print materials. This may be a problem of not knowing where to start. In selecting materials for the digital archive, each library should establish digital archive selection priorities based on its unique institutional mandates and the research needs of its users. There is a strong case for preserving materials that may be redundant. In addition, academic law libraries should take responsibility for preserving digital content cited within their institutions’ law reviews to ensure that future researchers will be able to reference source materials, and possibly the law reviews themselves.

The article looks at standards and systems. “Digital preservation represents an opportunity in the digital age for law libraries to reclaim their traditional roles as stewards of information, and to ensure that our digital legal heritage will be available to legal scholars and the public well into the future.”


File Formats for Preservation. Malcolm Todd. DPC Technology Watch Report. December 2009.

Preserving intellectual content requires a firm grasp of the file formats used to create, store and disseminate it. Five main criteria for file format selection:

  1. adoption: the extent to which use of a format is widespread
  2. technological dependencies: whether a format depends on other technologies
  3. disclosure: whether file format specifications are in the public domain
  4. transparency: how readily a file can be identified and its contents checked
  5. metadata support: whether metadata is provided within the format

These criteria should be used as a tool to create a clear preservation strategy appropriate to the repository. This must be done to manage the risk of obsolescence for the materials.


Google Docs to allow storage of any type of file. Juan Carlos Perez. Computerworld. January 12, 2010.

Google is opening up its Docs hosted office suite so that users can store any type of file in it. This doesn’t mean that the files can be worked on or edited online. The file size limit has been increased to 250 MB. This is in addition to the G-drive online storage service (“The G-drive is alive!”): 1 GB free, $0.25/GB/year after that.


Release of Web Curator Tool (WCT) version 1.5. Sourceforge. 2009.

The Web Curator Tool is an open-source workflow management application for archiving web sites. It is designed for use in libraries; it supports:

  • Harvest Authorization: obtaining permission to harvest and make publicly accessible;
  • Selection, scoping and scheduling: deciding what to harvest, how, and when;
  • Description: adding basic Dublin Core metadata;
  • Harvesting: downloading the selected material from the internet;
  • Quality Review: ensuring the harvested material is of sufficient quality for archival purposes;
  • Archiving: submitting the harvest results to a digital archive.

It was designed by the National Library of New Zealand, the British Library, and others. It is integrated with the Heritrix web crawler and supports job scheduling and the collection of descriptive metadata.