Digital Preservation Matters - 14 December 2007

CNI in DC: Integrated Digital Library on the Fedora Platform. David Kennedy. December 12, 2007.
This is one item in a blog report on the CNI conference and the Digital Curation Conference: National Perspectives. The other items are also worth reading. The University of Maryland uses Fedora for its digital collections, not for the IR (for which they use DSpace). They chose it to build in sustainability and ease future transitions. Some of their organizational issues were institutional support, development time, the choice between an off-the-shelf and a Fedora-type system, and others. It took almost 18 months of development. They found working with Fedora similar to Java, and "programmer friendly." They use a hybrid metadata schema with METS wrappers. What have they learned?
  • metadata - they use a complex schema, but users should not be forced to understand the underlying schema
  • authentication - not dealt with yet; more work is needed
  • archival storage - more space is needed
  • Quality Control standards are needed when modifying objects and creating metadata

They have at least three or four developers working on the project, as well as a number of other team members. Since they use their own metadata schema, it may not be possible to offer their work to others, so if they were to do it again, they might use a standard metadata schema.

New 1 day AIIM PDF/Archive Training Program. Atle Skjekkeland. AIIM Knowledge Center Blog. December 12, 2007.
AIIM intends to introduce a new PDF/A training program next year. It will focus on PDF/A as a file format for archiving data. The concept of PDF/Archive began in an AIIM standards committee in 2002 and has been accepted as an ISO standard.

Digital Preservation Pioneers: Margaret Hedstrom. Resource Shelf. December 13, 2007.
A brief bio about Margaret Hedstrom who has done a great deal for digital preservation. Her works include several articles that are definitely worth reading: Digital Preservation: A Time Bomb for Digital Libraries, It’s About Time, Invest to Save, and Incentives for Data Producers to Create Archive-Ready Data Sets.

Pooling Scholars’ Digital Resources. Andy Guess. Inside Higher Ed. December 12, 2007.
Access to documents and copyright issues have been two factors slowing the development of online scholarly repositories. George Mason University seeks to bypass libraries entirely and go directly to scholars by creating an open archive of scholarly resources in the public domain. They are creating a way for scholars to upload existing documents, make them text-searchable, and put them in a database available to the public. It will use the Zotero plug-in for Firefox, which stores web pages, collects citations and lets scholars annotate and organize online documents. It is funded by a two-year Mellon grant.

Manakin: A New Face for DSpace. Scott Phillips et al. D-Lib Magazine. November/December 2007.
The growth of online scholarly communication makes digital repositories more important for preserving and managing information. This article looks at Manakin, which was designed to help create individual, customized repository interfaces separate from the underlying repository, which is currently DSpace. It helps a library ‘brand’ its content, gives a better understanding of the metadata, and provides tools to create extensions of the repository. It uses schema, aspects and themes as the basic components. There is a movement to adopt Manakin as the default DSpace user interface.

SOA. IT Strategy Guide. Dave Linthicum. InfoWorld. December 10, 2007. [pdf]
The essence of an organization must be identified so all activities influencing that can be identified and improved. This is the first step in realizing the benefits of a service-oriented architecture (SOA). This requires not only technology, but also a shift in the way business and IT work together. Organizations need to adopt clearly defined roles within an organization, allowing the stakeholders to understand each other’s goals and tasks. This includes understanding both the human aspects and the lifecycle management of the services. Management support for the strategy is crucial. This requires an investment in people and technology to establish the appropriate context for the strategy. “the hardest part isn’t the technology; it’s redrawing the business processes that provide the basis for the architecture — and the often contentious reshuffling of roles and responsibilities that ensues. It is important to define the value, get investment and commitment from the top, and concentrate on the long term.”

Census of Institutional Repositories in the U.S. Soo Young Rieh, et al. D-Lib Magazine. November/December 2007.
There are great uncertainties underlying institutional repositories regarding practices, policies, content, systems, and other infrastructure issues. This article looks at IRs in five areas: leaders, funding, content, contributors, and systems, and how they are perceived. Some notes:

  • college and university libraries are the driving force behind most IRs
  • the vast majority of survey respondents have done no planning of IRs to date
  • only 10.8% of respondents have actually implemented an IR
    • 52.1% have been operational less than one year
    • 27.1% have been operational between one and two years
  • respondents agree that the funding comes or will come from the library, typically by absorbing costs into routine library operating expenses
  • the majority of existing IRs contain fewer than 1000 items
  • DSpace is the most prevalent system for pilot-testing and use; Fedora and CONTENTdm are regularly pilot-tested but rarely implemented

“Once each academic institution has a clear vision and definition of what the IR will be for its own community, subsequent decisions such as content recruitment, software redesigning, file formats guaranteed in perpetuity, metadata, and policies can flow from that vision.”

Digital Preservation Matters - 07 December 2007

Ten years after. Priscilla Caplan. Library Hi Tech. Editorial. Vol. 25 N. 4 2007.

This editorial from Priscilla Caplan reflects on the progress made in digital preservation in the past 10 years. Digital preservation is no longer a little-known concept, but a problem to be solved. It is part of the mainstream. Much has been accomplished, though there is still a lot of progress to be made. Europe has a different approach; it sees this as “part of a set of curation activities.” Their approach would “help reduce our apparent confusion between institutional repositories and preservation repositories.” Few institutions will have the resources to run a true preservation repository. “Digital curation may be departmental, and archiving institutional, but I believe preservation will have to be consortial.” The US approach has been to focus on short-term projects rather than long-term infrastructure. There are still some basic infrastructure needs: schema, conversion utilities, and registries. We also need to develop centers to promote and assist digital preservation. We need to provide more education for both data creators and data curators.

Standards Group Accepts PDF. Sumner Lemon. IDG News Service. December 05, 2007.

Adobe PDF 1.7 has been approved as an ISO standard. The ballot for approval of PDF 1.7 to become the ISO 32000 Standard was passed by a vote of 13-1. Specialized subsets of PDF (PDF/Archive etc.) had already been proposed or approved as standards by ISO. The approved PDF 1.7 standard now serves as an "umbrella" standard to unify these different subsets. Adobe gives up some control over the development of future versions.

Project SPECTRa: JISC Final Report. March 2007.

The principal aim of the SPECTRa project (Submission, Preservation and Exposure of Chemistry Teaching and Research Data) was to provide the high-volume ingest and reuse of experimental data through institutional repositories. It used the DSpace platform because of existing infrastructure and previous experience. They developed Open Source software tools and customizations which could easily be incorporated within chemists' workflows. Metadata was based on Dublin Core. They felt that serious preservation work must be at the institutional, rather than departmental, level. The metadata, identifiers, and normalizing data in open formats would make long-term preservation more possible. Preservation of chemistry data file formats is a difficult area. Their approach was to capture essential metadata at submission or extract it automatically from the data files if possible. All files should be validated against specifications. Depositing files in an institutional repository should guarantee against the loss or corruption of the raw data, but this is insufficient to ensure future usability. A policy of format migration will be necessary for much of the data.
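
The point that deposit should guard against loss or corruption of the raw data can be illustrated with a simple fixity check. The following is only a minimal sketch, assuming SHA-256 checksums are recorded at deposit time and re-checked later; the function names are hypothetical and are not taken from the SPECTRa report or from DSpace.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Build a manifest mapping each file path to its checksum at deposit time."""
    return {str(p): sha256_of(p) for p in paths}


def verify_fixity(manifest):
    """Return the paths that are missing or whose checksum no longer matches."""
    damaged = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            damaged.append(path)
    return damaged
```

A check like this only detects loss or corruption of the bits. As the report notes, it does not by itself ensure that the file format will remain usable, which is why format migration is still needed.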

Other project findings included:

• it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organizational capability of digital repositories;

• scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials;

• the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;

• institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;

• IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.

Google Plans Service to Store Users' Data. Kevin J. Delaney. Wall Street Journal. November 27, 2007.

Google is developing a service to let users store contents of their computers, such as word-processing documents, digital music, video clips and images. It would let users access their files via the Internet from different computers and share them online with friends. The service would face questions on issues such as data privacy, copyright, cost, and technical challenges of offering service without interruption.

Iron Mountain Acquires Xepa Digital, LLP. Press Release. November 19, 2007.

Iron Mountain acquired Xepa, a company that converts analog and out-of-date digital audio and video to high-resolution digital file formats. They will offer on-site digital conversion for the items being stored.

IT Disasters

The top 10 IT disasters of all time. Colin Barker. 22 Nov 2007.

A list of some of the worst IT-related disasters and failures caused by faulty hardware and software or human error.

  1. Faulty Soviet early warning system nearly causes WWIII (1983)
  2. The AT&T network collapse (1990)
  3. The explosion of the Ariane 5 (1996)
  4. Airbus A380 suffers from incompatible software issues (2006)
  5. Mars Climate Orbiter metric problem (1998)
  6. EDS and the Child Support Agency (2004)
  7. The two-digit year-2000 problem (1999/2000)
  8. When the laptops exploded (2006)
  9. Siemens and the passport system (1999)
  10. LA Airport flights grounded (2007)
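
Item 5 is the familiar case of one piece of software supplying thruster impulse in pound-force seconds while another expected newton-seconds. The following is a rough sketch of one way to make such a mismatch hard to overlook, by refusing bare numbers and storing every value in a single unit. The class and function names are hypothetical and have nothing to do with the actual mission software.

```python
from dataclasses import dataclass

# Newton-seconds per pound-force second (standard conversion factor).
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value, always stored internally in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value):
        return cls(float(value))

    @classmethod
    def from_pound_force_seconds(cls, value):
        # Convert at the boundary so downstream code sees only one unit.
        return cls(float(value) * LBF_S_TO_N_S)


def total_impulse(impulses):
    """Sum impulses, rejecting bare numbers whose unit is unknown."""
    total = 0.0
    for item in impulses:
        if not isinstance(item, Impulse):
            raise TypeError("expected Impulse, got a value with no unit attached")
        total += item.newton_seconds
    return Impulse(total)
```

Passing a plain float to `total_impulse` raises a `TypeError`, so a caller has to state which unit a number is in before it can be combined with other values.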