Friday, October 30, 2009

Digital Preservation Matters - 30 October 2009

Copyright and Cultural Institutions: Guidelines for U.S. Libraries, Archives, and Museums. Peter B. Hirtle, et al. October 2009. [Book and 275p. PDF]

This book can help libraries learn how they can use the internet to provide access to their collections and comply with copyright laws. The book is also available in PDF form on the website. It “addresses the basics of copyright law and the exclusive rights of the copyright owner, the major exemptions used by cultural heritage institutions, and stresses the importance of ‘risk assessment’ when conducting any digitization project.” The section on ‘Digital preservation and replacement copies’ is worth reading to learn more about what we can and can’t do.


Universities offering new perks to broke students. Carol Warner. HigherEdMorning. October 25, 2009.

An initiative in Florida makes more than 120 textbooks available to students to download for free. It also sells the books at a discount because findings show:

  • 22% of students are “uncomfortable” reading from a computer screen
  • 75% of students prefer to read print copy
  • 60% of students would buy a discounted book even if the textbook was available for free online.


Microsoft opens Outlook format, gives programs access to mail, calendar, contacts. John Fontana. HigherEdMorning. October 27, 2009.

Microsoft announced it will provide patent- and license-free use rights to the format behind its Outlook Personal Folders. It will document and publish the .pst format, which is used for the email, calendar, and contact functions. This will explain how to parse the contents of the file and how to access that data from other software applications.


iTunes for college courses? It’s true. Carin Ford. HigherEdMorning. October 24, 2009.

The University of Virginia now offers over 1,000 lectures, videos, etc. as free digital downloads from iTunes U. Others have done this for a long time, but Virginia has combined it with other features. An interesting one is that if a student subscribes to a specific course, new material will be downloaded automatically to the student's iTunes library.


Obama Drupal-ing around; goes open source. Richi Jennings. Computerworld. October 26, 2009.

The White House has chosen Drupal, an open-source content management system, to run its Web site.


High Volume Document Storage Creates New Headaches for Content Managers. Steve Jones. CMS Wire. Oct 26, 2009.

Digital storage centers are filling rapidly. Some look at ‘single-instance storage’ as a viable option for organizations trying to reduce digital storage requirements. Single instancing is based on the principle of keeping one copy of a digital file that multiple users share and eliminating duplication.
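
The single-instancing principle can be illustrated with a minimal Python sketch. It is not taken from the article: the `SingleInstanceStore` class, its method names, and the sample file names are all hypothetical, and using a SHA-256 digest to recognize identical content is one common approach assumed here for illustration.

```python
import hashlib


class SingleInstanceStore:
    """Toy in-memory store that keeps one copy of each distinct file content."""

    def __init__(self):
        self.blobs = {}   # content digest -> bytes (each content stored once)
        self.names = {}   # user-visible name -> content digest

    def put(self, name, data):
        # Identical content yields the same digest, so it is stored only once.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.names[name] = digest
        return digest

    def get(self, name):
        # Every name that shares a digest reads the same single stored copy.
        return self.blobs[self.names[name]]

    def bytes_stored(self):
        return sum(len(blob) for blob in self.blobs.values())


# Hypothetical example: two users save the same report, a third saves notes.
store = SingleInstanceStore()
report = b"quarterly report" * 1000
store.put("alice/report.doc", report)
store.put("bob/report-copy.doc", report)
store.put("carol/notes.txt", b"meeting notes")
```

Three names are registered, but only two distinct contents are kept, so the storage used is that of one report plus the notes.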


Open Access to Research Is Inevitable, Libraries Are Told. Jennifer Howard. The Chronicle of Higher Education. October 15, 2009.

A panel told ARL libraries that public access to research is "inevitable," but it will take work to get there. Faculty are starting to understand that open access to research has to happen in order to have the most scholarly opportunities. The US is far behind other countries regarding access to research. Researchers who don’t have the latest research can't fully participate in the academic discussions. The National Science Foundation plans to build an international, large-scale data-curation network.

Monday, October 26, 2009

Digital Preservation Matters - 23 October 2009

Sidekick Data Restoration Has Started, Microsoft Says. Barry Levine. NewsFactor. October 20, 2009.

Danger, a Microsoft subsidiary using ‘cloud computing’, experienced a system problem that erased all the users' contacts, calendar entries, to-do lists, and photos for those using the Sidekick smart-phone. Much of the data may eventually be recovered, but effective data backup and protection measures were not being followed. It shows the importance of using reliable vendors and having data backups. [This is the first major loss of ‘cloud data’ that I know of.]


Millennial disc guarantees data preservation. Logan Bradford. Daily Universe. September 15, 2009.

Barry Lunt, a BYU information technologies professor, will launch a product with the company, Millenniata, that produces a disc just like a CD or DVD that will last up to 1,000 years. He learned, through his seven years working for IBM in computer data, that data on CDs and DVDs would decay and be lost over just a few years because of optical discs’ ephemeral qualities, such as when they are exposed to sunlight and humidity. [We have been testing these discs and writers.]


Wellcome Library to use JPEG2000 image format. Library blog. September 18, 2009

The Wellcome Library in London has been using TIFF images as its archival storage format. But, anticipating adding over 30 million images, the Library wanted to find a way to store the digital content efficiently while still maintaining the high levels of quality and the open standards required for long-term preservation. To do this it has chosen to use the JPEG2000 format in its digitization program. The difficulty is that the JPEG2000 format has multiple versions. The Library wanted to know which version is best for long-term storage and access, so it commissioned a study by King’s College London: JPEG 2000 as a Preservation and Access Format for the Wellcome Trust Digital Library. Robert Buckley, Simon Tanner.

Based on the study, the Library will adopt a "visually lossless" lossy compression to gain at least 75% storage savings in comparison to a TIFF version. “The recommended compression parameters will produce an image with no visible difference in image quality, but the compression is irreversible - i.e. the original bit stream will not be possible to reconstruct. As the Library will be digitising physical items that can (if necessary) be re-digitised, it was considered an acceptable compromise.” Some materials may be candidates for JPEG2000 lossless compression. The study also recommends that “JPEG 2000 be used with multiple resolution levels.”


The Swedish Research Council requires free access to research results. Press release. October 8, 2009.

Researchers granted funds by the Research Council should publish their scientific research in publications that are available according to Open Access guidelines within a maximum period of six months. "We consider that publication of research which has been paid for out of public funds should be made freely accessible to all." The Open-Access rules apply so far only to scientifically assessed texts in journals and conference reports, and not to monographs and chapters of books.


Sound archive of the British Library goes online, free of charge. Mark Brown. The Guardian News. 3 September 2009.

The British Library has made its archive of world and traditional music freely available on the internet. The Archival Sound Recordings archive contains about 28,000 recordings, estimated at 2,000 hours of sound. These recordings are from around the world and the oldest are from wax cylinders made in 1898. The Library wants to change the perception that “things are given to libraries and then are never seen again – we want these recordings to be accessible."


Keeping Research Data Safe2: Data Survey added to project website. Neil Beagrie. Blog. 26 Sep 2009.

Information about the project and a link to the website. The project, which aims to identify long-lived datasets for the purpose of cost analysis, will be ending soon. It refers to the previous project. The activity model mentions that it will look at the development of an archive’s selection policy, as well as staff training and development. One area of concern was that OAIS terminology could be a barrier to understanding for some user groups.

Friday, October 23, 2009

Hardware updates

Micron boosts NAND flash endurance six-fold. Lucas Mearian. Computerworld. October 19, 2009.

Often techniques are used to increase flash memory performance, but they also cut the capacity by 90%. Micron says its technique to increase density also increases the number of times data can be written to the device.


Engineers create material that could hold 1TB of data on fingernail-sized chip. Lucas Mearian. Computerworld. October 21, 2009.

Engineers have created a material that could hold a terabyte of data in a chip the size of a fingernail.

Friday, October 16, 2009

Digital Preservation Matters - 15 October 2009

PREMIS Implementation Fair 2009. Reports. October 7, 2009.

This fair concerning PREMIS (digital preservation metadata) was presented following the iPres conference in San Francisco. The online agenda for the PREMIS fair now includes the PowerPoint and PDF presentations. They include:

  • Status of PREMIS (Brian Lavoie)
  • Implementation in METS (Rebecca Guenther)
  • PREMIS Rights implementation at University of California San Diego (Bradley Westbrook)
  • PREMIS for geospatial data (Nancy Hoebelheinrich)
  • Towards Interoperable Preservation Repositories (TIPR) project (Priscilla Caplan)


New E-Book Company to Focus on Older Titles. Motoko Rich. The New York Times. October 13, 2009.

A new company has been formed that will republish old titles, and it seeks new authors willing to be published in electronic format. They look to an aggressive marketing campaign, and are looking at both the backlist and the electronic format.


The ARL Preservation Statistics 2006-07. David Green. Association of Research Libraries. September 25, 2009.

The latest edition of the preservation statistics. It looks mainly at physical preservation, and includes information on personnel, expenditures, conservation treatment, preservation treatment, and preservation microfilming. The section dealing with digital preservation states:

“Digitizing for preservation purposes is the reproduction of bound volumes, pamphlets, unbound sheets, manuscripts, maps, posters, works of art on paper, and other paper-based materials for the purpose of:

a) making duplicate copies that replace deteriorated originals (e.g., by digitizing texts and storing them permanently in electronic form and/or printing them on alkaline paper);

b) making preservation master copies to guard against irretrievable loss of unique originals (e.g., by making high-resolution electronic copies of photographs and storing them permanently and/or printing them); or

c) making surrogate copies that can be retrieved and distributed easily, thereby improving access to information resources without exposing original materials to excessive handling; or some combination of these factors.”


Alfresco Achieves DoD 5015.02 Records Management Certification. Marisa Peacock. CMS Wire. Oct 5, 2009.

Alfresco is certified as DoD 5015.2 compliant, the first open source content management system to achieve this certification. The Department of Defense standard outlines mandatory requirements for Records Management programs and is recognized as a base standard by many organizations.

Monday, October 12, 2009

netarchive Newsletter - August 2009

The two institutions behind the netarchive, the Royal Library and the State and University Library, have developed a system for web archiving called NetarchiveSuite, which has been released as open source.

An update on web archiving activity. The Act on Legal Deposit of Published Material of 22 December 2004 provides their legal foundation. The major issue for web archiving is access, which is currently limited to researchers. These come under their data protection act.

Their archive contains about 112 terabytes with about 3.5 billion objects. The Top Level Domain DK now has more than 1.3 million domain names, of which about 1 million are active. In addition they harvest about 44,000 Danish sites on other domains.

The system provides technical information on harvests but it is important to document the decisions made on what to collect and not collect so that future researchers may know the content of the archive.

Friday, October 09, 2009

Digital Preservation Matters - 9 October 2009

Preservation: Evolving Roles and Responsibilities of Research Libraries. ARL. September 15, 2009. Webcast, PDF.

This site has the webcast and a pdf of the original report. Some notes from the webcast:

There isn’t enough funding to meet all preservation needs.

  • Need to align preservation activities with institutional concerns.
  • Can’t preserve without metadata. It is what connects preservation to the rest of the library.
  • Must maintain the context and address rights issues.
  • Research libraries must determine what researchers will need in the future, then preserve the content so that it can be used.
  • Need to balance print and digital collections
  • Our special collections are what differentiate us from other research libraries, and we need to make broad access available to these collections

Safeguarding Collections at the Dawn of the 21st Century: Describing Roles & Measuring Contemporary Preservation Activities in ARL Libraries. Lars Meyer. ARL. May 2009.

Notes: Preservation is a core function of the research library and a key element of the stewardship and access missions of research organizations. It is not just the responsibility of one department. Three perspectives for libraries:

  1. Reallocate priorities and resources in response to changing trends in publishing, research, and teaching activities.
  2. Expand practices related to preserving digital content through Web archiving, digital repositories, and efforts to preserve e-journals and other born digital content.
  3. Build collaborative activities to effectively address digital preservation challenges.

“Digital Preservation is a subset of library preservation.” A definition of data curation: it “is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation activities enable data discovery and retrieval, maintain its quality, add value, and provide for re-use over time. [It] includes authentication, archiving, management, preservation, retrieval, and representation.” Preservation includes managing the relationship between content, context, and access. Rather than discussing analog vs. digital, or access vs. preservation, it is more important to ask whether relevant preservation work is being done throughout the life of digital content, and how or when it is done.

Preservation of born digital content begins with:

  1. decisions about the form in which a library should acquire digital content;
  2. a clear understanding of the library’s rights to preserve such content;
  3. policies, technical infrastructure, and staff to realize the ongoing work;
  4. possibly joining or developing cooperative digital preservation networks.

Digital Repositories: It is “vitally important for all research libraries to be engaging with digital repository development projects in some fashion.” These are not the same as preservation repositories; both are needed. One example given is the Illinois Digital Environment for Access to Learning and Scholarship (IDEALS), the institutional repository for the University of Illinois at Urbana-Champaign, which has very well defined policies. Digitization, to be considered a preservation process, must include an institutional plan to preserve the digital content.

Web resources: many government documents which used to be printed are now available online. Research libraries have a responsibility to determine how the community can best share the effort of identifying, describing, and preserving Web-based publications.

It is unlikely that all the needed preservation expertise will exist in one library so ARL members need to develop partnerships. Preservation is a continual learning process.

Some recommendations:

Institutions need well-developed policies, strategies, and practices.

  1. Preservation decisions need to be made strategically throughout the life-cycle of the resources.
  2. Community agreed-upon practices are needed for preserving digital surrogates.
  3. Digital curation: an active set of activities that requires active partnerships
  4. Digital preservation requires an understanding of rights, technical infrastructure, staffing
  5. Repositories: include preservation activities, work with faculty on continuing access.

Google lets you custom-print millions of books. Ryan Singel. Wired. September 18, 2009.

Out-of-print books can be printed individually with the Espresso Book Machine through a venture between Google Book Search and On Demand Books. The $100,000 printer can print a 300-page gray-scale book with a color cover in about 4 minutes; it then trims and binds the book. The materials cost about $3. The machine can print about 60,000 books a year. The books can be printed from PDF files. “Some 80 percent of the public-domain books are looked at in a given month.” “Neller said he'd love to see the day when Google Book Searchers can press a button next to a search result and find the closest local printer, but Google says that's a long way off.”

Friday, October 02, 2009

Digital Preservation Matters - 01 October 2009

Media Preservation Survey: A Report. Mike Casey. Indiana University Bloomington. 1 October 2009. [132 p. pdf]

An excellent and very detailed report from Indiana University Bloomington concerning the 560,000 audio and video recordings and reels of film on the campus. The report looks at the characteristics and condition of only one of the many groups of materials and the preservation challenges. This was a ten-month study by a team of archivists. The next step is developing a campuswide preservation plan. These historical "jewels" will be lost if not preserved soon. A few preservation activities exist on campus but they are too small to be effective or are not sustainable. They have 51 different media formats. They have over 180,000 digital files in the collection. "These formats require active preservation services from the moment of creation if their content is to survive."

Redundancy is a key strategy in saving the materials, but only 11% have a copy. "One copy is no copy." Preservation of audio and video objects requires transferring them to digital. Storing AV materials at the correct temperature and relative humidity is the "single most important factor in slowing the physical degradation of audiovisual media." At the current rate it will take 120 years to digitize the AV holdings. There is a very useful chart of Selected High and Medium Risk Formats in their collection.

Among their recommendations:

  • Appoint a campuswide taskforce to advise on preservation
  • Create a centralized media preservation and digitization center for the entire campus
  • Develop special funding to digitize the materials quickly
  • Create an appropriate and centralized physical storage space for the materials
  • Provide archival appraisal and control across campus
  • Develop cataloging services to accelerate research opportunities and improve access
  • Complete a digital preservation repository

Bagit: Transferring Content for Digital Preservation. Library of Congress. September 29, 2009.

Short video on YouTube about BagIt, a tool from The Library of Congress, California Digital Library and Stanford University. They have developed guidelines for creating, moving and verifying standardized digital containers, called "bags." BagIt requires a bag declaration, list of contents, and the actual content.
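
The three parts named above (bag declaration, list of contents, actual content) can be sketched in a few lines of Python. This is only an illustration and not the Library of Congress tool: the `make_bag` and `verify_bag` helpers are hypothetical, the file layout (`bagit.txt`, `manifest-sha256.txt`, a `data/` directory) follows the later published BagIt specification (RFC 8493) rather than the 2009 draft, and payload file names are assumed to be flat, with no subdirectories.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, files):
    """Write a minimal bag: declaration, payload under data/, and a manifest."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)
    # Bag declaration: marks the directory as a bag.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Actual content goes under data/; the manifest lists a checksum per file.
    lines = []
    for name, content in sorted(files.items()):
        (data / name).write_bytes(content)
        lines.append(f"{hashlib.sha256(content).hexdigest()}  data/{name}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag


def verify_bag(bag_dir):
    """Return True if every file listed in the manifest matches its checksum."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


# Hypothetical usage: create a bag, verify it, then corrupt the payload.
with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(Path(tmp) / "example_bag", {"letter.txt": b"Dear archive"})
    intact = verify_bag(bag)
    (bag / "data" / "letter.txt").write_bytes(b"tampered")
    damage_detected = not verify_bag(bag)
```

Because the manifest travels with the content, the receiving institution can re-run the verification step after a transfer to confirm that nothing was lost or altered.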

Archiving Is For E-discovery; Backup Is For Recovery. Mathew Lodge. The Metropolitan Corporate Counsel. September 01, 2009.

There are challenges with court requests for the discovery of information from backup tapes. If backup tapes are used for information retrieval then they are accessible for e-discovery, but they were never designed for this. Many are doing archiving, but the term means different things to different people. "Active archiving is different: it's a way of centrally managing the storage, retention and hold of information while ensuring 'live' (or active) access to any item." It means moving objects to a central repository and providing access to users.

Purple Cows and Fringy Propositions. Carol Minton Morris. D-Lib Magazine. September/October 2009.

Notes from the Fringe Festival. "At every stage of the Bodleian Library's development, Oxford changed practices and policies, and improved – first analog and later digital – technologies in response to changes in the world beyond the Library. Realization is still a catalyst for change." The most useful metaphor for the repository is the internet. Institutions like Oxford create institutional repositories as components of larger library service platforms, not stand-alone silos. Clifford Lynch said we may be better off with incremental, structured assessments rather than open-ended preservation commitments. We should aim at preserving digital objects "for the next 20 years with subsequent assessments instead of aiming to preserve them forever." Repositories should look at collecting works from scholars at the end of their careers to create a legacy. Repositories will move beyond educational organizations, so we should look at being involved in that.