Friday, November 28, 2008

Digital Preservation Matters - 28 November 2008

The Future of Repositories? Patterns for (Cross-) Repository Architectures. Andreas Aschenbrenner, et al. D-Lib Magazine. November/December 2008.

Repositories have been created mostly by academic institutions to share scholarly works, for the most part using Fedora, DSpace and EPrints. While it is important to look at manageability, cost efficiency, and functionalities, we need to keep our focus on the real end user (the Scholar). The OpenDOAR directory lists over 1200 repositories. The repository adoption curve shows cycles, trends, and developments. “It is the social and political issues that have the most significant effect on the scholarly user and whether or not that user decides to use a repository.” The repository's primary mission is to disseminate the university's primary output. Researchers, not institutions, are the most important users of repositories. The benefits of repositories may not be clear to researchers, and the repository needs to “become a natural part of the user's daily work environment.” To do this we should focus on features such as:

  • Preserve the user's intellectual assets in a long-term trusted digital repository
  • Allow scientific collaboration through reuse of publications as well as primary data
  • Embed repositories into the user's scientific workflows and technology (workbench)
  • Customize the repository to the local user needs and technology
  • Manage intellectual property rights and security

Individual repositories may not be able to address all these issues. Preservation is one of the main motivators for people to use a repository. “Trust in a stable and secure repository service is established through the repository's policies, status among peers, and added-value services.” Users want someone to take responsibility for the servers and tools. Trust depends on:

  • The impact a service has on users' daily lives
  • How the service blends into their routine
  • Whether the repository's policies and benefits work for the users


Managing the Collective Collection. Richard Ovenden. OCLC. 6 November 2008. [pdf]

A PowerPoint presentation on managing a collection in the future. Looks at Uniformity vs. Uniqueness, and the sameness of e-resources. The collective collection is now an aggregated digital collection rather than a distributed print collection. Access to the core aggregated collection is no longer a factor of time and craft but one of money. With this new sense of uniformity, uniqueness has a new value.

Local unique: sensible stewardship of locally-generated assets:

  • Institutional repositories
  • University archives
  • Research data

Global unique: selected and curated content that has been actively acquired through competition:

  • “Traditional” special collections
  • Personal digital collections
  • Copy-specific printed books

Personal digital collections: new phenomenon, new problem:

  • Acquisition from older media
  • New management issues

Implications of Google:

  • Google is not curated!
  • Preservation of the unique is more important than ever.
  • Who will bear the cost of keeping print?
  • New models of collaboration


Expectations of the Screenager Generation. Lynn Silipigni Connaway. OCLC. 6 November 2008. [pdf]

Lots of information here. Some notes on the attitudes of the newer generation: information is information; media formats don't matter; they are visual learners with different research skills. They meet information needs mostly through the Internet or other people. They are attracted to resources offering convenience, immediate answers, and no cost, and they prefer to do their own research. They don't use libraries because they don't know about them, they are satisfied with other sources, or the library takes too long or is too difficult to use. The image of libraries is books; they do not think of a library as an information resource, and they trust search engines about as much as a library. What can we do? Encourage, promote, use creative marketing, build relationships, and understand their needs better.


Digital preservation of e-journals in 2008: Urgent Action revisited. Portico. January 2008. [pdf]

The document has been out for a while, but I found it interesting in light of current efforts. It presents the results of a survey concerning eJournals. The survey was designed to:

  1. Analyze attitudes and priorities that can be used to guide Portico
  2. Assist library directors in prioritizing and allocating limited resources.

Here are some of the findings:

  • 76% said they do not yet participate in an e-journal preservation initiative.
  • 71% felt it would be unacceptable to lose access to e-journal materials permanently
  • 82% agreed that “libraries need to support community preservation initiatives because it’s the right thing to do.”
  • 73% agreed that “our library should ensure that e-journals are preserved somewhere”
  • 4% believed preservation could be achieved by publishers holding redundant copies of eJournals

Libraries are unsure about how urgent the issue is and whether they need to take any action in the next two years. This appears to follow the interest of the faculty in the issue. Where the library was interested in eJournal preservation, 74% had been approached by faculty on the issue. When the library was not interested, only 34% had ever been approached by faculty, and less than 10% had ever been approached by faculty more than twice. Many libraries feel the issue is complicated and are not sure who should preserve the eJournals. They are uncertain about the best approach, and there are competing priorities. “Research institutions are far more likely than teaching institutions to have taken action on e-journal preservation.” Most libraries do not have an established digital preservation budget, and the money is borrowed from other areas, such as the collections budget.

Friday, November 21, 2008

Digital Preservation Matters - 21 November 2008

Archives: Challenges and Responses. Jim Michalko. OCLC. 6 November 2008. [pdf]

Interesting view of ‘The Collective Collection’. A framework for representing content.

  • Published Content: books, journals, newspapers, scores, maps, etc.
  • Special Collections: Rare books, local histories, photos, archives, theses, objects, etc.
  • Open Web Content: Web resources, open source software, newspaper archives, images, etc.
  • Institutional Content: ePrints, reports, learning objects, courseware, manuals, research, data, etc.
Describes an End-to-End Archival Processing Flow:
Select > Deliver > Describe > Acquire > Appraise > Survey > Disclose > Discover


Managing the Collective Collection: Shared Print. Constance Malpas. OCLC. 6 November 2008. [pdf]

Concern that many print holdings will be ‘de-duped’ and that too few copies will remain to maintain a title. Some approaches are offsite storage, digitization, and distributed print archives. “Without system-wide frameworks in place, libraries will be unable to make decisions that effectively balance risk and opportunity with regard to de-accessioning of print materials.” The average institutional holdings in WorldCat are 13 for serials and 9 for books, and up to 40% of book titles are held by a single institution. There is a need for a progressive preservation strategy.


Ancient IBM drive rescues Apollo moon data. Tom Jowitt. Computerworld. November 12, 2008.

Data gathered by the Apollo missions to the moon 40 years ago may be recovered after all, thanks to the donation of an “ancient” IBM tape drive. The mission data had been recorded onto 173 data tapes, which were then 'misplaced' before they could be archived. The tapes have since been found, but there was no drive to read them; one has now been located at the Australian Computer Museum Society. It will require some maintenance to restore it to working condition. "It's going to have to be a custom job to get it working again," which may take several months.


Google to archive 10 million Life magazine photos. Heather Havenstein. Computerworld. November 18, 2008.

Google plans to archive as many as 10 million images from the Life magazine archives; about 20% are already online. Some of the images date back to the 1750s; many have never been published.


PREMIS With a Fresh Coat of Paint. Brian F. Lavoie. D-Lib Magazine. May/June 2008.

Highlights from the Revision of the PREMIS Data Dictionary for Preservation Metadata. This looks at PREMIS 2.0 and the changes made:

  • Update to the data model clarifying relation between Rights and Agents, and Events and Agents
  • Completely revised and expanded Rights entity: a more complete description of rights statements
  • A detailed, structured set of semantic units to record information about significant properties
  • Added the ability to accommodate metadata from non-PREMIS specifications
  • A proposed registry of suggested values for semantic units
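The core of the PREMIS 2.0 data-model clarification is that both Rights statements and Events can be linked to Agents (people, organizations, or software). A minimal sketch of that relationship, with simplified field names that are illustrative rather than the official PREMIS semantic units:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the PREMIS 2.0 entity relationships:
# both Rights statements and Events link to the Agents associated
# with them. Field names are simplified, not official semantic units.

@dataclass
class Agent:
    identifier: str
    name: str
    agent_type: str  # e.g. "person", "organization", "software"

@dataclass
class Event:
    identifier: str
    event_type: str  # e.g. "ingestion", "migration"
    date_time: str
    linked_agents: list = field(default_factory=list)

@dataclass
class RightsStatement:
    identifier: str
    basis: str  # e.g. "copyright", "license", "statute"
    granted_acts: list = field(default_factory=list)
    linked_agents: list = field(default_factory=list)

archivist = Agent("agent-1", "University Archive", "organization")
ingest = Event("event-1", "ingestion", "2008-11-21",
               linked_agents=[archivist])
rights = RightsStatement("rights-1", "license",
                         granted_acts=["replicate", "disseminate"],
                         linked_agents=[archivist])
```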

Thursday, November 20, 2008

Top five IT spending priorities for hard times

Tom Sullivan. ComputerWorld. November 19, 2008.

In the current economic climate, organizations are busy looking at what costs they can cut. Analysts agree these areas still need to be funded.
  1. Storage: disks and management software. For many, the largest expenditure is storage; data doubles yearly.
  2. Business intelligence: niche analytics. Information and resources to help accomplish key goals.
  3. Optimizing resources: get the most out of what you already have.
  4. Security: keeping resources secure.
  5. Cloud computing: business solutions.

PC Magazine will be online only

Stephanie Clifford. International Herald Tribune. November 19, 2008.

Ziff Davis Media announced it is ending print publication of its 27-year-old flagship PC Magazine; following the January 2009 issue, it will be online only. "The viability for us to continue to publish in print just isn't there anymore." PC Magazine derives most of its profits from its Web site: more than 80 percent of the profit and about 70 percent of the revenue come from the digital business. The adjustment should be small, since all content already goes online first and the print edition then selects what it wants to print.

A number of other magazines have ended their print publications. The magazines that have gone to online only have been those that are declining. "Magazines in general are going to be dependent on print advertising for a long time into the future."

Massive EU online library looks to compete with Google

PhysOrg.com. November 19, 2008.

The European Union is launching the Europeana digital library, an online digest of Europe's cultural heritage, consisting of millions of digital objects, including books, film, photographs, paintings, sound files, maps, manuscripts, newspapers, and documents.

The prototype will contain about two million digital items already in the public domain. By 2010, when Europeana is due to be fully operational, the aim is to have 10 million works available, out of an estimated 2.5 billion books in Europe's libraries. The project plans to be available in 21 languages, though English, French, and German will be most prevalent early on.

Thursday, November 13, 2008

Digital Preservation Matters - 14 November 2008

Library of Congress Digital Preservation Newsletter. Library of Congress. November 2008.

There are three interesting items in the November newsletter:

1. The NDIIPP Preserving Digital Public Television Project is building infrastructure, creating standards, and obtaining resources. The project is trying to create a consistent approach to digital curation among those who produce PBS programs. Its metadata schema draws on four standards: PBCore (a standard developed by and for public media organizations), METSRights, MODS, and PREMIS. The goal is to put the content in the Library’s National Audio-Visual Conservation Center, where it will be preserved on servers and data tapes. This will support digital archiving and access for public television and radio programs in the US. Many stations are unsure what to do with their programs for the long term, and the American Archive is seen as a solution.

2. Digitization Guidelines: An audiovisual working group will set standards and guidelines for digitizing audiovisual materials. The guidelines will cover criteria such as evaluating image characteristics and establishing metadata elements. The recommendations will be posted on two Web sites:

www.digitizationguidelines.gov/stillimages/

www.digitizationguidelines.gov/audio-visual/

3. Data Archive Technology Alliance: A meeting was held to establish a network of data archives to help develop shared technologies for the future. They hope to set standards for shared, open-source, community-developed technologies for data curation, preservation, and data sharing. It is critical to clearly define the purpose and outcome of the effort. Those involved will develop a shared inventory of their tools and services, and will also list new developments that enhance data stewardship.


JHOVE2 project underway. Stephen Abrams. Email. November 6, 2008.

The JHOVE tool has been an important part of digital repository and preservation workflows. However, it has a number of limitations, and a group is starting a two-year project to develop a next-generation JHOVE2 architecture for format-aware characterization. Among the enhancements planned for JHOVE2 are:

  • Support for signature-based identification, feature extraction, validation, and rules-based assessment
  • A data model supporting complex multi-file objects and arbitrarily-nested container objects
  • Streamlined APIs for integrating JHOVE2 into systems, services, and workflows
  • Increased performance
  • Standardized error handling
  • A generic plug-in mechanism supporting stateful multi-module processing
  • Availability under the BSD open source license
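Signature-based identification, the first of the planned characterization aspects, means matching a file's leading "magic" bytes against a table of known format signatures. A minimal illustration of the technique (not JHOVE2 code; the table covers just two well-known formats):

```python
# Minimal illustration of signature-based format identification:
# compare a file's leading bytes against known format signatures.
# This is the general technique, not JHOVE2's implementation.

SIGNATURES = {
    b"%PDF-": "PDF",                 # PDF files begin with "%PDF-"
    b"\x89PNG\r\n\x1a\n": "PNG",     # the 8-byte PNG signature
}

def identify(data: bytes) -> str:
    """Return a format name if the data starts with a known signature."""
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    return "unknown"

print(identify(b"%PDF-1.4 sample content"))   # PDF
print(identify(b"\x89PNG\r\n\x1a\n\x00\x00")) # PNG
print(identify(b"plain text"))                # unknown
```

A real characterization tool layers validation and feature extraction on top of this first identification step.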


Planetarium - Planets Newsletter Issue 5. 22 October 2008 [PDF]

The newsletter includes several items about Planets (Preservation and Long-term Access through Networked Services), a European project to address digital preservation challenges. Here are a few items from the newsletter: the Planets project will provide the technology component of the British Library's digital preservation solution.

The preservation planning tool Plato implements the PLANETS Preservation Planning approach. It guides users through four steps:

  1. define context and requirements;
  2. select potential actions and evaluate them on sample content;
  3. analyze outcomes; and
  4. define a preservation plan based on this empirical evidence.
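The four steps above amount to scoring candidate preservation actions against weighted requirements and adopting the winner. A hedged sketch of that loop (the requirement weights, candidate actions, and scores are invented for illustration; Plato itself uses a richer utility analysis of measured outcomes):

```python
# Hypothetical sketch of a Plato-style planning loop.
# All names and numbers below are invented for illustration.

# 1. Define context and requirements (weights per criterion).
requirements = {"fidelity": 0.6, "cost": 0.4}

# 2. Candidate actions evaluated on sample content
#    (scores 0-1, higher is better).
experiments = {
    "migrate-to-PDF/A": {"fidelity": 0.9, "cost": 0.5},
    "emulate":          {"fidelity": 0.7, "cost": 0.6},
}

# 3. Analyze outcomes: weighted utility per candidate action.
def utility(scores):
    return sum(requirements[c] * scores[c] for c in requirements)

# 4. The preservation plan adopts the best-scoring action.
plan = max(experiments, key=lambda a: utility(experiments[a]))
print(plan)  # migrate-to-PDF/A
```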

Digital preservation activities can only succeed if they consider the wider strategy, policy, goals, and constraints of the institution that undertakes them. For digital preservation solutions to succeed, it is essential to go beyond the technical properties of the digital objects to be preserved and to understand the institutional framework in which data, documents, and records are preserved. The biggest barriers to preservation are:

  1. lack of expertise;
  2. funding; and
  3. buy-in at senior level.


Cisco unveils a router for the 'Zettabyte Era'. Matt Hamblen. Computerworld. November 11, 2008.

Cisco heralded the "Zettabyte Era" as it announced the Aggregation Services Router (ASR) 9000, its next generation of extreme networking. The company believes service providers need to prepare for petabytes or even exabytes of data from video applications, which demand faster routing; soon the zettabyte (10^21 bytes, a thousand exabytes) will be the preferred unit.
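For scale: in decimal (SI) prefixes a petabyte is 10^15 bytes, an exabyte 10^18, and a zettabyte 10^21, each step a factor of a thousand. A quick check:

```python
# Decimal (SI) byte units: each step is a factor of 1000.
units = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB"]
for i, u in enumerate(units, start=1):
    print(f"1 {u} = 10^{3 * i} bytes")

# A zettabyte is a thousand exabytes:
assert 10**21 == 1000 * 10**18
```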


In praise of ... preserving digital memories. Editorial. The Guardian. September 30, 2008.

Some people are thinking centuries ahead. The British Library hosted the iPres conference to work out ways to preserve data for future generations. Since almost everything is in digital form now, this is a difficult thing to do. By 2011 “it is expected that half of all content created online will fall by the wayside.” There is no Rosetta Stone for digital data, but progress is being made.


Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs. Alma Swan, Sheridan Brown. JISC. 31 July 2008.

The report of a study that looks at those who work with data. It identifies four roles, which may overlap:

  • Data Creator: Researchers who produce and are experts in handling, manipulating and using data
  • Data Scientist: Those who work where the research is carried out and may be involved in creative enquiry and analysis
  • Data Manager: Those who take responsibility for computing facilities, storage, continuing access and preservation of data
  • Data Librarian: Librarians trained and specializing in the curation, preservation and archiving of data

There is a continuing challenge to make sure people have the skills needed. Three main potential roles for the library:

  1. Training researchers to be more data-aware
  2. Adopting a data archiving and preservation role, providing services through institutional repositories
  3. Training data librarians

Caring for the data frees data scientists from the task and allows them to focus on other priorities. Data issues are moving so fast that periodic updating is much more effective than an early, intensive training with no follow-up. Some institutions offer training courses and workshops on data-related topics.

Tuesday, November 11, 2008

JHOVE2 project underway

From: stephen.abrams@ucop.edu
Sent: Thursday, November 06, 2008 3:43 PM

The open source JHOVE characterization tool has proven to be an important
component of many digital repository and preservation workflows. However, its
widespread use over the past four years has revealed a number of limitations
imposed by idiosyncrasies of design and implementation. The California Digital
Library (CDL), Portico, and Stanford University have received funding from the
Library of Congress, under its National Digital Information Infrastructure
Preservation Program (NDIIPP) initiative, to collaborate on a two-year project
to develop a next-generation JHOVE2 architecture for format-aware
characterization.

Among the enhancements planned for JHOVE2 are:

* Support for four specific aspects of characterization: signature-based
identification, feature extraction, validation, and rules-based assessment
* A more sophisticated data model supporting complex multi-file objects and
arbitrarily-nested container objects
* Streamlined APIs to facilitate the integration of JHOVE2 technology in
systems, services, and workflows
* Increased performance
* Standardized error handling
* A generic plug-in mechanism supporting stateful multi-module processing;
* Availability under the BSD open source license

To help focus project activities we have recruited a distinguished advisory
board to represent the interests of the larger stakeholder community. The board
includes participants from the following international memory institutions,
projects, and vendors:

* Deutsche Nationalbibliothek (DNB)
* Ex Libris
* Fedora Commons
* Florida Center for Library Automation (FCLA)
* Harvard University / GDFR
* Koninklijke Bibliotheek (KB)
* MIT / DSpace
* National Archives (TNA)
* National Archives and Records Administration (NARA)
* National Library of Australia (NLA)
* National Library of New Zealand (NLNZ)
* Planets project

The project partners are currently engaged in a public needs assessment and
requirements gathering phase. A provisional set of use cases and functional
requirements has already been reviewed by the JHOVE2 advisory board.

The JHOVE2 team welcomes input from the preservation community, and would
appreciate feedback on the functional requirements and any interesting test
data that have emerged from experience with the current JHOVE tool.

The functional requirements, along with other project information, are available
on the JHOVE2 project wiki
<http://confluence.ucop.edu/display/JHOVE2Info/Home>. Feedback on project goals
and deliverables can be submitted through the JHOVE2 public mailing lists.

To subscribe to the JHOVE2-TechTalk-L mailing list, intended for in-depth
discussion of substantive issues, please send an email to <listserv at ucop dot
edu> with an empty subject line and a message stating:

SUB JHOVE2-TECHTALK-L Your Name

Likewise, to subscribe to the JHOVE2-Announce-L mailing list, intended for
announcements of general interest to the JHOVE2 community, please send an email
to <listserv at ucop dot edu> with an empty subject line and a message stating:


SUB JHOVE2-ANNOUNCE-L Your Name

To begin our public outreach, team members recently presented a summary of
project activities at the iPRES 2008 conference in London, entitled "What? So
What? The Next-Generation JHOVE2 Architecture for Format-Aware
Characterization," reflecting our view of characterization as encompassing both
intrinsic properties and extrinsic assessments of digital objects.

Through the sponsorship of the Koninklijke Bibliotheek and the British Library,
we also held an invitational meeting on JHOVE2 following the iPRES conference
as an opportunity for a substantive discussion of the project with European
stakeholders.

A similar event, focused on a North American audience, will be held as a
Birds-of-a-Feather session at the upcoming DLF Fall Forum in Providence, Rhode
Island, on November 13. Participants at this event are asked to review closely
the functional requirements and other relevant materials available on the
project wiki at <http://confluence.ucop.edu/display/JHOVE2Info/Home> prior to
the session.

Future project progress will be documented periodically on the wiki.

Stephen Abrams, CDL
Evan Owens, Portico
Tom Cramer, Stanford University

on behalf of the JHOVE2 project team

Friday, November 07, 2008

Digital Preservation Matters - 07 November 2008

Digital Preservation Policies Study. Neil Beagrie, et al. JISC. 30 October 2008. [pdf]

This study will become one of the foundation documents for digital preservation. It provides a model for digital preservation policies and looks at the role of digital preservation in supporting and delivering strategies for educational institutions. The study also includes 1) a model/framework for digital preservation policies, and 2) a series of mappings of digital preservation to other key institutional strategies in universities, libraries, and records management. This is intended to help institutions develop appropriate digital preservation policies. Some notes:

Long-term access relies heavily on digital preservation strategies being in place, and we should focus on making sure they are. Developing a preservation policy is only worthwhile if it is linked to core institutional strategies: it cannot be effective in isolation. One section gives a good outline of the steps that must be taken to implement a digital preservation solution. Policies should state what is preserved and what is excluded. Digital preservation is a means, not an end in itself, and any digital preservation policy must be seen in terms of the strategies of the institution. An appendix summarizes the strategic aims and objectives of selected institutions and the implications for digital preservation activities within the organization. Definitely worth studying the approximately 120 pages.


Predicting the Longevity of DVD-R Media by Periodic Analysis of Parity, Jitter, and ECC Performance Parameters. Daniel Wells. BYU Thesis. July 14, 2008.

The summarizing statement for me was: “there is currently extreme reluctance to use DVD-R’s for future digital archives as well as justifiable concern that existing DVD archives are at risk.” We have certainly found this in our own experience, having very high failure rates with some collections.

The abstract: For the last ten years, DVD-R media have played an important role in the storage of large amounts of digital data throughout the world. During this time it was assumed that the DVD-R was as long-lasting and stable as its predecessor, the CD-R. Several reports have surfaced over the last few years questioning the DVD-R's ability to maintain many of its claims regarding archival quality life spans. These reports have shown a wide range of longevity between the different brands. While some DVD-Rs may last a while, others may result in an early and unexpected failure. Compounding this problem is the lack of information available for consumers to know the quality of the media they own. While the industry works on devising a standard for labeling the quality of future media, it is currently up to the consumer to pay close attention to their own DVD-R archives and work diligently to prevent data loss. This research shows that through accelerated aging and the use of logistic regression analysis on data collected through periodic monitoring of disc read-back errors it is possible to accurately predict unrecoverable failures in the test discs. This study analyzed various measurements of PIE errors, PIE8 Sum errors, POF errors and jitter data from three areas of the disc: the whole disc, the region of the disc where it first failed as well as the last half of the disc. From this data five unique predictive equations were produced, each with the ability to predict disc failure. In conclusion, the relative value of these equations for end-of-life predictions is discussed.
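The thesis's central idea, a logistic model mapping periodic read-back measurements to a probability of unrecoverable failure, can be sketched briefly. The coefficients and inputs below are invented for illustration; they are not the thesis's fitted values, and a real model would be fit by logistic regression on accelerated-aging data:

```python
import math

# Hedged sketch: a logistic model mapping periodic disc read-back
# measurements (e.g. a PIE error sum and jitter percentage) to a
# probability of unrecoverable failure. Coefficients are invented
# for illustration, not the thesis's fitted values.

B0, B_PIE, B_JITTER = -6.0, 0.004, 0.25

def failure_probability(pie_sum: float, jitter_pct: float) -> float:
    """Logistic function of a linear combination of the measurements."""
    z = B0 + B_PIE * pie_sum + B_JITTER * jitter_pct
    return 1.0 / (1.0 + math.exp(-z))

# A healthy disc vs. one with degrading error rates:
print(round(failure_probability(200.0, 8.0), 3))    # low risk
print(round(failure_probability(1500.0, 12.0), 3))  # high risk
```

Fitting such coefficients to monitoring data is what lets the study predict end-of-life before a disc actually becomes unreadable.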


DCC Curation Lifecycle Model. Chris Rusbridge. Digital Curation Centre Blog. 8 October 2008.

The model they have put together is available in graphical form. Like all models, it is a compromise between succinctness and completeness. They plan to use it to structure information on standards and as an entry point to the DCC web site; it is explained in an article in the International Journal of Digital Curation. The model is a high-level overview of the stages required for successful curation, and complements OAIS and other standards. The actions for digital objects or databases are:

  • Full Lifecycle Actions: Description and Representation Information; Preservation Planning; Community Watch and Participation; Curate and Preserve
  • Sequential Actions: Conceptualise; Create or Receive; Appraise and Select; Ingest; Preservation Action; Store; Access, Use and Reuse; Transform
  • Occasional Actions: Dispose; Reappraise; Migrate

The model is part of a larger plan to take a detailed look at processes, costs, governance and implementation.


WVU Libraries Selected for Digital Pilot Project. September 15, 2008.

The West Virginia University Libraries are among 14 institutions picked to participate in a book digitization pilot project led by PALINET. Each institution will submit five to ten books to be digitized during a pilot project. After that, the initial target will be to digitize 60,000 books and put them in the Internet Archive. “Another benefit of the project is preservation.” The Rare Books Curator, said a dilemma is allowing access and yet providing for the maximum amount of preservation. “These books are old and they’re fragile, and there is always the difficulty of preserving a book that is used a lot. Maintaining that balance is essential. It’s a fine line that we’re always on. Book digitization is a way of providing access and assuring preservation of the original.”