Friday, March 27, 2009

Digital Preservation Matters - 27 March 2009

Farewell to the Printed Monograph. Scott Jaschik. Inside Higher Ed. March 23, 2009.

The University of Michigan Press announced it will shift its scholarly publishing from a traditional print operation to primarily digital. They expect most of their monographs to be released only in digital editions. Readers will still be able to use print-on-demand systems, but the press will consider the digital monograph the norm. They say it's time to stop trying to make the old economics of scholarly publishing work. The press expects to publish more books, and to distribute them electronically to a much broader audience. "We will certainly be able to publish books that would not have survived economic tests and we'll be able to give all of our books much broader distribution." Michigan plans to develop site licenses so that libraries could gain access to all of the press's books over the course of a year for a flat rate.

Other presses are also experimenting with the digital format. Pennsylvania State University Press publishes a few books a year in digital, open access format. All chapters are provided in PDF format, half in a format to download and print, and half in read only. Readers may pay for print-on-demand versions.


PREMIS Data Dictionary for Preservation Metadata. Sarah Higgins. DCC Watch Report. 25 March 2009.

This is a three-page overview of the PREMIS data dictionary, “the current authoritative metadata standard for digital preservation,” and a brief look at its use in an Institutional Repository.


Thomson Introduces mp3HD File Format. Press Release. March 19, 2009.

The company has introduced the new mp3HD format which “allows mathematically lossless compression of audio material while preserving backward compatibility to the mp3 standard.” The mp3HD files carry additional information that, when combined with the mp3 portion of the file, can be played on an mp3HD-capable player. Standard mp3 players play only the mp3 portion of the file. A program can create mp3HD files from stereo material in 16-bit, 44.1 kHz WAV files. It is available on Linux and Windows.


Internet Archive to unveil massive Wayback Machine data center. Lucas Mearian. Computerworld. March 19, 2009.

The Internet Archive has a new data center that fits in a 20-foot-long outdoor metal cargo container filled with 63 server clusters holding 4.5 petabytes of storage and 1TB of memory. They have 151 billion archived web pages in addition to software, books, and a moving image collection with 150,000 items and 200,000 audio clips. The Internet Archive also works with curators in about 100 libraries to help guide the Internet crawls.


Tuesday, March 24, 2009

Challenges facing Church history

R. Scott Lloyd. Church News. March 14, 2009.
Lecture presented at the Church History Symposium at BYU on "Preserving the History of the Latter-day Saints."
Mark L. Grover, a subject librarian at BYU who has spent 30 years gathering the history of the Church in Latin America, lamented that original records and documents are often in jeopardy of being destroyed by those who don't understand or appreciate their significance. There are several approaches being taken. "Some significant historical material surely has vanished, but much of it is still intact in private possession, and there is an increasingly greater probability that digital technology will improve the preservation odds."

Friday, March 20, 2009

Digital Preservation Matters - 20 March 2009

International Data curation Education Action (IDEA) Working Group: A Report from the Second Workshop of the IDEA. Carolyn Hank, Joy Davidson. D-Lib Magazine. March 2009.
This is a report of the workshops held in December, with links to programs and resources. In general the article acknowledges that curation of digital assets is a central challenge and opportunity for libraries and other data organizations. In order to meet this challenge, skilled professionals are needed who are trained “to perform, manage, and respond to a range of procedures, processes and challenges across the life-cycle of digital objects.” The presentations discuss developing a graduate-level curriculum to prepare master's students to work in the field of digital curation. Topics in the curricula at the institutions include preparing faculty to research and teach in the field, data collection and management, knowledge representation, digital preservation and archiving, data standards, and policy. Collaboration between schools is important since they all recognize that no school can do it all. One item in particular: The skills, role and career structure of data scientists and curators: An assessment of current practice and future needs.


Report on the 2nd Ibero-American Conference on Electronic Publishing in the Context of Scholarly Communication (CIPECC 2008). Ana Alice Baptista. D-Lib Magazine. March 2009.
Some notes from this article:
  • IR (institutional repository) initiatives occur mostly in public universities
  • the main motivation for implementing an IR: answer specific demands and needs to digitally store the institution's scientific memory, rather than support for Open Access principles.
  • 40% of the analyzed IRs are maintained and coordinated by two or more sectors within each university
  • the databases with more than 3,000 documents are, in practice, OPACs with links to the full text versions.
  • the next step forward: provide new metrics on the impact factor (Scientometrics)

Items in this newsletter include:
  • CBS program on “Bye, Tech: Dealing with Data Rot.” Looks at obsolescence of computer hardware, software, and formats. “So the basic lesson is: Look after your own data and make sure that you take steps to keep it moving onto new formats about once every ten years." There are links where you can both read and watch the program. Their conclusions:
1. You should convert whatever you can afford to digital.
2. Store your tapes and films in a cool, dry place.
3. And above all, remain vigilant. As you now know, every ten years or so, you're going to have to transfer all your important memories to whatever format is current at the time, because there never has been, and there never will be, a recording format that lasts forever.
  • Federal Agencies Collaborate on Digitization Guidelines. A working group is developing best practices for digitizing recorded sound and moving images.


Got Data? A Guide to Data Preservation in the Information Age. (Updated link-August 2015.)  Francine Berman. Communications of the ACM. December 2008.
Digital data is fragile, even though we all assume it will be there when we want it. “The management, organization, access, and preservation of digital data is arguably a "grand challenge" of the information age.” This article looks at the key trends and issues with preservation:
  1. More digital data is being created than there is storage to host it.
  2. Increasingly more policies and regulations require the access, stewardship, and/or preservation of digital data.
  3. Storage costs for digital data are decreasing (but other areas are increasing).
  4. Increasing commercialization of digital data storage and services.
These four trends point to the need to take a comprehensive and coordinated approach to data cyberinfrastructure. The greatest challenge is to develop an economically sustainable model. One approach is to create a data pyramid that lays out the stewardship options; this shows that multiple solutions for sustainable digital preservation must be devised. There is also a need for ongoing research into and development of solutions that address the technical challenges as well as the economic and social aspects of digital preservation. They add 10 guidelines; a brief code sketch illustrating a few of them follows the list:
Top 10 Guidelines for Data Stewardship
1. Make a plan.
2. Be aware of data costs and include them in your overall IT budget.
3. Associate metadata with your data.
4. Make multiple copies of valuable data. Store some off-site and in different systems.
5. Plan for the transition and cost of digital data to new storage media ahead of time.
6. Plan for transitions in data stewardship.
7. Determine the level of "trust" required when choosing how to archive data.
8. Tailor plans for preservation and access to the expected use.
9. Pay attention to security and the integrity of your data.
10. Know the regulations.
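As a purely illustrative sketch of guidelines 3, 4, and 9, the following Python fragment records a checksum and minimal metadata for a data file and then verifies copies held in other locations. The file names and manifest format are hypothetical, not from the article.

```python
import hashlib
import json
from pathlib import Path

def sha256(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

def record_metadata(path, manifest="manifest.json"):
    """Associate basic metadata (name, size, checksum) with a data file."""
    entry = {
        "file": str(path),
        "size": Path(path).stat().st_size,
        "sha256": sha256(path),
    }
    manifest_path = Path(manifest)
    entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    entries.append(entry)
    manifest_path.write_text(json.dumps(entries, indent=2))
    return entry

def verify_copies(entry, copies):
    """Check that off-site or second-system copies still match the recorded checksum."""
    return {copy: sha256(copy) == entry["sha256"] for copy in copies}

# Hypothetical usage: one local master file and one off-site copy.
# entry = record_metadata("survey_2008.csv")
# print(verify_copies(entry, ["/mnt/offsite/survey_2008.csv"]))
```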

The Library of Congress has been moving into the digital world, one way being a scanning project with the Internet Archive that has put 25,000 books online to date. "To preserve book knowledge and book culture means preserving every word of every sentence in the right sequence of pages in the right edition, within the appropriate historical, scholarly and bibliographical context. You must respect what you scan and treat it as an organic whole, not just raw bits of slapdash data." A lot of items that have literally not seen the light of day are now being downloaded. The cost is just 10 cents a page.

Friday, March 13, 2009

Digital Preservation Matters - 13 March 2009

Update from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access. Sayeed Choudhury. Preservation and Archiving Special Interest Group (PASIG). November 19, 2008. [11p. pdf]

This is a presentation about the task force. The next item is the report from the task force.

There is a focus in the task force on the economic dimensions of digital preservation. Some notes from the update: “Definition of Economic Sustainability: The set of business, social, technological, and policy mechanisms that encourage the gathering of important information assets into digital preservation systems, and support the indefinite persistence of digital preservation systems, enabling access to and use of the information assets into the long-term future.”

Economically sustainable digital preservation requires:

  • Recognition of the benefits of preservation
  • A process for selecting long-term digital materials
  • Ongoing, efficient allocation of resources to digital preservation activities
  • Organization and governance of digital preservation activities

The task force website: http://brtf.sdsc.edu/


Blue Ribbon Task Force Interim Report: Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation. December 2008. [78p. pdf]

Digital information is fundamental to modern society, but there is no agreement on who is responsible or who should pay for access to and preservation of the information. Creating sustainable economic models for digital access and preservation is a focus of this group. They are examining current and best practices and looking to find or create useful models. This is an urgent task: access to data in the future requires actions today. “Institutional, enterprise, and community decision makers must be part of the access and preservation solution.” Preserving data now is an investment in the future, and digital information has value far into the future. Sometimes we make best guesses or hedge against the future. Decisions not made now often cost far more in the future. Without ongoing maintenance digital assets will fall into disrepair. Maintaining the assets is a problem with many sides: technical, legal, financial, and policy. This crosses all industries. There is also an opportunity cost.

Preservation is not a one-time cost; it is an ongoing commitment to a series of costs, requiring ongoing and sustained resource allocations. Economic sustainability requires:

  • Recognition of the benefits of preservation on the part of key decision-makers;
  • Incentives for decision-makers to act in the public interest;
  • A process for selecting digital materials for long-term retention;
  • Mechanisms to secure an ongoing, efficient allocation of resources;
  • Appropriate organization and governance of digital preservation activities.

Decision-makers need to be aware of the value-creating opportunities from preservation. Understanding the scope of digital preservation is important. The report examines the various economic models now being used and includes a graph of the types of information retained the longest. It is difficult to separate digital preservation costs from other costs. There is no substitute for a flexible, committed organization dedicated to preserving a collection of digital material. The Final Report of the task force is to be published at the end of 2009.

“Too often, digital preservation is perceived as an activity that is separable from the interests of today’s stakeholders, aimed instead at the needs of future generations. But in practice, digital preservation is very much part of the day-to-day process of managing digital assets in responsible ways; it is much more about ensuring that valuable digital assets can be handed off in good condition to the next succession of managers or stewards five, ten, or fifteen years down the road than it is about taking actions to benefit generations of users a hundred years hence.”


Fusion-io unveils SSD drives with 1.5GB throughput, 1.2TB capacity. Lucas Mearian. Computerworld. March 13, 2009.

Fusion-io, a Salt Lake based company, has announced a server-based solid-state drive with 1.5GB/sec. throughput. “Currently, the cards come in 160GB, 320GB and 640GB capacities. A 1.28TB card is expected out in the second half of this year.”


Preservation as a Process of a Repository. Tarrant, D. and Hitchcock, S. Sun Preservation and Archiving Special Interest Group. 18 - 21 November 2008. [pdf, ppt, pptx]

The presentation begins with different definitions of repository and Institutional Repository. Lynch defines an IR as a set of services and processes, and a commitment to the digital materials created by an organization and its members. It includes diagrams of processes and of the OAIS, DCC, and other models, an analysis of the preservation process, and a look at EPrints, digital preservation, and repositories.

Friday, March 06, 2009

Digital Preservation Matters - 6 March 2009

DPE Digital preservation video training course. Digital Preservation Europe. February 2009.

The Digital Preservation Europe group has posted their Digital Preservation Video Training Course on the internet. These videos, from October 2008, cover topics such as:

  • Introduction to Digital Preservation
  • OAIS Model and Representation Information
  • Preservation Analysis Workflow and Preservation Descriptive Information
  • Digital Preservation Preparation and Requirements
  • File Formats, Significant Properties
  • Metadata
  • Planning, infrastructure
  • Trusted Repositories

The workshop was intended to give participants an understanding of digital preservation issues and challenges, as well as of the relevant roles, models, and file elements.



Can We Outsource the Preservation of Digital Bits? Peter Murray. DLTJ Blog. March 5, 2009.

With the increasing need for large-scale digital preservation storage, the Iron Mountain storage service may be worth considering. It is cloud-based, and some of the preservation files do not need to sit on expensive SAN storage. A diagram of the architecture is included.



digiGO! — VIdeo Content Support From Front Porch Digital To AFN. Satnews Daily. March 02, 2009.

Front Porch Digital, which recently acquired SAMMA Systems, will install the DIVArchive product for the American Forces Network (AFN) Broadcast Center. The product line includes a semi-automated system for the migration and preservation of videotape to digital files.



iPRES 2008: Proceedings of The Fifth International Conference on Preservation of Digital Objects. British Library. March 2, 2009. [pdf]

This is the compilation of the iPres 2008 proceedings, all 319 PDF pages, which looks at tools and methods for digital preservation. It is the first full collection of the conference papers, in addition to the conference presentations. Some of these have been reviewed earlier; some will be included later.



Samsung stuffs 1.5TB onto three-platter hard drive. Lucas Mearian. Computerworld. March 5, 2009.

Samsung Electronics announced its first 500GB-per-platter hard drive. The hard disk has 1.5TB on three platters. With fewer platters and fewer moving parts, the drive should be more reliable.

Western Digital announced its 500GB per platter, 2TB capacity drive in January. The drive is 40% lower in power consumption in idle mode and 45% lower in reading/writing mode, and has a retail price of $149.



Preservation and Archiving Special Interest Group (PASIG) Fall Meeting. Paul Walk. Ariadne. January 2009.

This is a report on the Sun-PASIG November 2008 meeting. There are quite a number of presentations. Some items of interest:

Martha Anderson emphasized the need to preserve 'practice' as well as data, so that even if the technology we use changes, our decisions and thinking behind the processes are preserved. Chris Wood talked about storage: there will be an increase in the use of solid-state storage, but tape and disk will remain viable for some time to come. Tape is a viable storage medium and is still relatively cheap. Blu-Ray optical storage is also a good bet for the medium term. In the next few years, the cost of buying equipment will be less than the cost of running the equipment.

Friday, February 27, 2009

Digital Preservation Matters - 27 February 2009

Understanding PREMIS. Priscilla Caplan. Library of Congress. February 1, 2009. [pdf]

This is an overview of PREMIS, the preservation metadata standard. There are different types of metadata: descriptive, administrative, structural, and preservation metadata, which supports activities intended to ensure the long-term usability of a digital resource. Preservation metadata is "the information a repository uses to support the digital preservation process." PREMIS is not concerned with discovery and access, nor does it try to define detailed format-specific metadata. It defines only the metadata commonly needed to perform preservation functions on all materials; the focus is on the repository system and its management. It can also serve as a checklist for evaluating possible software purchases. It deals with pieces of information (semantic units) rather than metadata elements, which are particular ways of representing that information in a system.

“One of the main principles behind PREMIS is that you need to be very clear about what you are describing.” PREMIS defines five kinds of entities: Intellectual Entities, Objects, Agents, Events and Rights. It also talks about file objects, representation objects, and bitstream objects. It expects that information will be transferred in XML formats.
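To make the XML point concrete, here is a deliberately simplified, PREMIS-like record for a file object and an ingestion event, built with Python's standard library. The element names are abbreviated for illustration only; the authoritative names and structure are defined in the PREMIS Data Dictionary and its official XML schema.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative record only; consult the PREMIS Data Dictionary
# and its official XML schema for the real element names and structure.
obj = ET.Element("object", {"type": "file"})
ident = ET.SubElement(obj, "objectIdentifier")
ET.SubElement(ident, "identifierType").text = "local"
ET.SubElement(ident, "identifierValue").text = "repo:0001"

chars = ET.SubElement(obj, "objectCharacteristics")
fixity = ET.SubElement(chars, "fixity")
ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-256"
ET.SubElement(fixity, "messageDigest").text = "9f2c..."  # truncated for the example
ET.SubElement(chars, "size").text = "52817"
ET.SubElement(chars, "formatName").text = "PDF/A-1b"

# An event describing what happened to the object, linked by identifier.
event = ET.Element("event")
ET.SubElement(event, "eventType").text = "ingestion"
ET.SubElement(event, "eventDateTime").text = "2009-02-01T10:15:00Z"
ET.SubElement(event, "linkedObjectIdentifier").text = "repo:0001"

print(ET.tostring(obj, encoding="unicode"))
print(ET.tostring(event, encoding="unicode"))
```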


Preservation at the Network Level: Challenges, Opportunities. Constance Malpas. OCLC. ALA Presentation. January 29, 2009.

Institutional value of print collections is being reassessed as scholarly workflows move to the Web. Digital preservation infrastructure addresses the survival of the content, but doesn’t change the value of print as a distribution medium. Large institutions are shifting resources to digital preservation, while smaller institutions rely on agreements for print preservation. The MARC 583 tag (Preservation Action Note) is under consideration as a disclosure mechanism. There are 400,000 MARC 583 tags in WorldCat, representing 1 million library holdings. The tags indicate titles for which some type of physical or digital preservation action is scheduled or has already been performed (microfilming, digitization, web archiving, assessment, repair, re-housing, de-acidification, etc.). The greatest challenge may be workflow integration between preservation and technical services. Preservation & Digitization Actions: MARC 583
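For readers who have not worked with the field, here is a rough sketch of what a 583 Preservation Action Note can carry, rendered from a simple Python structure. The subfield meanings shown ($a action, $c date of action, $z public note, $5 institution) are the commonly used ones, but the authoritative definitions and indicator values are in the MARC 21 bibliographic documentation; the institution code below is a placeholder.

```python
# Illustrative only: a MARC 583 (Preservation Action Note) as a plain structure.
field_583 = {
    "tag": "583",
    "indicators": ("1", " "),  # first indicator "1" commonly marks a not-private note
    "subfields": [
        ("a", "digitized"),                                   # action taken
        ("c", "2009"),                                        # date of action
        ("z", "Master images stored in institutional repository"),  # public note
        ("5", "XXX"),                                         # placeholder institution code
    ],
}

def to_mnemonic(field):
    """Render the field in a conventional mnemonic style, e.g. =583  1\\$a..."""
    subs = "".join(f"${code}{value}" for code, value in field["subfields"])
    ind = "".join(i if i != " " else "\\" for i in field["indicators"])
    return f"={field['tag']}  {ind}{subs}"

print(to_mnemonic(field_583))
```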


Fresh start for lost file formats. BBC News. 13 February 2009.

A European project intends to create a universal emulator that can open and play obsolete formats. With this, they hope to ensure that digital materials such as games, websites and multimedia documents are not lost. This will require constant updating to make sure that formats are supported in the future. "Every digital file risks being either lost by degrading or by the technology used to 'read' it disappearing altogether." Without this we risk a “blank spot” in history. “Britain's National Archive estimates that it holds enough information to fill about 580,000 encyclopaedias in formats that are no longer widely available.” They believe that in the long term, emulation is a more workable solution than migration to new formats, which also runs the risk of data corruption and loss.



Probing Question: Can we save today's documents for tomorrow? Adam Eshleman. PhysOrg.com. February 12, 2009.

A general article about the difficulty of saving digital files. At Penn State, preserving e-mail and text messages is one of the University’s greatest priorities, especially electronic communications from high-profile people like the university president. “These files will one day become important historical documents.” Previous presidential papers are on paper, but in the future they will be on a server. “It’s a different research paradigm.” Preserving digital files is not a one-time event; it requires ongoing decisions to keep them readable. There are steps we can take to preserve electronic documents, but there are currently no final answers.


Digital Archivists, Now in Demand. Conrad De Aenlle. The New York Times. February 7, 2009.

Pre-digital information and records need to be adapted to computers and current information needs. “The people entrusted to find a place for this wealth of information are known as digital asset managers, or sometimes as digital archivists and digital preservation officers. Whatever they are called, demand for them is expanding.” Much of their effort is devoted to organizing and protecting material in digital form. Familiarity with information technology is necessary, but there is more to it than that. The need for people in these fields is expected to triple over the next decade, in the public and private sectors.

Thursday, December 04, 2008

Hard drive sounds

Datacent Data Recovery has recorded the sounds of failing hard drives.
Read more about each product and its common problems, or listen to the sounds. They advise that if you hear these sounds and your drive is still working, you should back it up immediately. An interesting site.

PDF/A Competence Center.

The PDF/A Competence Center is a cooperation between companies and experts in PDF technology. The aim is to promote the exchange of information and experience in the area of long-term archiving in accordance with ISO 19005: PDF/A. The site contains links and information about the standard.

Transmitting data from the middle of nowhere.

Lucas Mearian. Computerworld. December 2, 2008.
Interesting look at a marine survey company and the ways it has developed to transmit large amounts of data from remote locations. They have set up servers that reduce the amount of information transmitted by removing duplicate information and then replicating what remains. The data is backed up using IBM's Tivoli software and a StorageTek tape library with LTO-3 and LTO-4 tape drives for archiving data. "So data movement is the number one problem for anyone in the survey or field scientific world." They use Data Domain storage appliances for data backup and disaster recovery; a compression algorithm is supposed to make sure that transmitted data does not already exist on the system at the San Diego data center. It can also stream the data for maximum performance.
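The article does not describe the vendors' algorithms, but the general idea of avoiding re-transmission of duplicate data can be sketched as content-addressed chunking: split a file into chunks, hash each chunk, and send only the chunks the receiving site does not already hold. Everything below (chunk size, function names, the send callback) is hypothetical.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB chunks; real appliances often use variable-size chunking

def chunk_hashes(path):
    """Yield (offset, sha1 digest, chunk bytes) triples for fixed-size chunks of a file."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield offset, hashlib.sha1(chunk).hexdigest(), chunk
            offset += len(chunk)

def send_new_chunks(path, remote_index, send):
    """Transmit only chunks whose hash the remote site does not already hold."""
    sent = skipped = 0
    for offset, digest, chunk in chunk_hashes(path):
        if digest in remote_index:
            skipped += 1            # duplicate: only a reference needs to travel
        else:
            send(digest, chunk)     # 'send' is a placeholder for the actual transfer call
            remote_index.add(digest)
            sent += 1
    return sent, skipped
```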

Friday, November 28, 2008

Digital Preservation Matters - 28 November 2008

The Future of Repositories? Patterns for (Cross-) Repository Architectures. Andreas Aschenbrenner, et al. D-Lib Magazine. November/December 2008.

Repositories have been created mostly by academic institutions to share scholarly works, for the most part using Fedora, DSpace and EPrints. While it is important to look at manageability, cost efficiency, and functionalities, we need to keep our focus on the real end user (the Scholar). The OpenDOAR directory lists over 1200 repositories. The repository adoption curve shows cycles, trends, and developments. “It is the social and political issues that have the most significant effect on the scholarly user and whether or not that user decides to use a repository.” The repository's primary mission is to disseminate the university's primary output. Researchers, not institutions, are the most important users of repositories. The benefits of repositories may not be clear to researchers, and the repository needs to “become a natural part of the user's daily work environment.” To do this we should focus on features such as:

  • Preserve the user's intellectual assets in a long-term trusted digital repository
  • Allow scientific collaboration through reuse of publications as well as primary data
  • Embed repositories into the user's scientific workflows and technology (workbench)
  • Customize the repository to the local user needs and technology
  • Manage intellectual property rights and security

Individual repositories may not be able to address all these issues. Preservation is one of the main motivators for people to use a repository. “Trust in a stable and secure repository service is established through the repository's policies, status among peers, and added-value services.” Users want someone to take responsibility for the servers and tools. Trust depends on:

  • The impact a service has on users' daily lives;
  • How the service blends into their routine;
  • Whether the repository's policies and benefits work for the users.


Managing the Collective Collection. Richard Ovenden. OCLC. 6 November 2008. [pdf]

A PowerPoint presentation on managing a collection in the future. Looks at Uniformity vs. Uniqueness, and the sameness of e-resources. The collective collection is now an aggregated digital collection rather than a distributed print collection. Access to the core aggregated collection is no longer a factor of time and craft but one of money. With this new sense of uniformity, uniqueness has a new value.

Local unique: Sensible stewardship of locally-generated assets:
  • Institutional repositories
  • University archives
  • Research data

Global unique: Selected and curated content that has been actively acquired through competition:
  • “Traditional” special collections
  • Personal digital collections
  • Copy-specific printed books

Personal digital collections: new phenomenon, new problem:
  • Acquisition from older media
  • New management issues

Implications of Google:
  • Google is not curated!
  • Preservation of the unique is more important than ever.
  • Who will bear the cost of keeping print?
  • New models of collaboration


Expectations of the Screenager Generation. Lynn Silipigni Connaway. OCLC. 6 November 2008. [pdf]

Lots of information here. Some notes on attitudes of the newer generation: information is information; media formats don’t matter; they are visual learners; they have different research skills. They meet information needs mostly through the Internet or other people. They are attracted to resources based on convenience, immediate answers, and no cost. They prefer to do their own research. They don’t use libraries because they don’t know about them, they are satisfied with other sources, or the library takes too long or is too difficult to use. The image of libraries is books; they do not think of a library as an information resource. Search engines are trusted about the same as a library. What can we do? Encourage, promote, use creative marketing, build relationships, and understand their needs better.


Digital preservation of e-journals in 2008: Urgent Action revisited. Portico. January 2008. [pdf]

The document has been out for a while, but I found it interesting in light of current efforts. It presents the results of a survey concerning eJournals. The survey was designed to:

  1. Analyze attitudes and priorities that can be used to guide Portico
  2. Assist library directors in prioritizing and allocating limited resources.

Here are some of the findings:

  • 76% said they do not yet participate in an e-journal preservation initiative.
  • 71% felt it would be unacceptable to lose access to e-journal materials permanently.
  • 82% agreed that “libraries need to support community preservation initiatives because it’s the right thing to do.”
  • 73% agreed that “our library should ensure that e-journals are preserved somewhere.”
  • 4% believed preservation could be achieved by publishers holding redundant copies of eJournals.

Libraries are unsure about how urgent the issue is and whether they need to take any action in the next two years. This appears to follow the interest of the faculty in the issue. Where the library was interested in eJournal preservation, 74% had been approached by faculty on the issue. When the library was not interested, only 34% had ever been approached by faculty, and less than 10% had ever been approached by faculty more than twice. Many libraries feel the issue is complicated and are not sure who should preserve the eJournals. They are uncertain about the best approach, and there are competing priorities. “Research institutions are far more likely than teaching institutions to have taken action on e-journal preservation.” Most libraries do not have an established digital preservation budget, and the money is borrowed from other areas, such as the collections budget.

Friday, November 21, 2008

Digital Preservation Matters - 21 November 2008

Archives: Challenges and Responses. Jim Michalko. OCLC. 6 November 2008. [pdf]

Interesting view of ‘The Collective Collection’. A framework for representing content.

  • Published Content: books, journals, newspapers, scores, maps, etc.
  • Special Collections: Rare books, local histories, photos, archives, theses, objects, etc.
  • Open Web Content: Web resources, open source software, newspaper archives, images, etc.
  • Institutional Content: ePrints, reports, learning objects, courseware, manuals, research, data, etc.
Describes an End-to-End Archival Processing Flow:
Select > Deliver > Describe > Acquire > Appraise > Survey > Disclose > Discover


Managing the Collective Collection: Shared Print. Constance Malpas. OCLC. 6 November 2008. [pdf]

Concern that many print holdings will be ‘de-duped’ and that there will not be enough copies to maintain the title. Some approaches are offsite storage, digitization, and distributed print archives. “Without system-wide frameworks in place, libraries will be unable to make decisions that effectively balance risk and opportunity with regard to de-accessioning of print materials.” The average number of institutions holding a title in WorldCat is 13 for serials and 9 for books. Up to 40% of book titles are held by a single institution. There is a need for a progressive preservation strategy.


Ancient IBM drive rescues Apollo moon data. Tom Jowitt. Computerworld. November 12, 2008.

Data gathered by the Apollo missions to the moon 40 years ago looks like it may be recovered after all, thanks to the donation of an “ancient” IBM tape drive. The mission data had been recorded onto 173 data tapes, which had then been 'misplaced' before they could be archived. The tapes have since been found, but there was no drive left to read them; one has now been located at the Australian Computer Museum Society, though it will require some maintenance to restore it to working condition. "It's going to have to be a custom job to get it working again," which may take several months.


Google to archive 10 million Life magazine photos. Heather Havenstein. Computerworld. November 18, 2008.

Google plans to archive as many as 10 million images from the Life magazine archives, and about 20% are already online. Some of the images date back to the 1750s; many have never been published. The search archive is here.


PREMIS With a Fresh Coat of Paint. Brian F. Lavoie. D-Lib Magazine. May/June 2008.

Highlights from the Revision of the PREMIS Data Dictionary for Preservation Metadata. This looks at PREMIS 2.0 and the changes made:

  • An update to the data model clarifying the relationships between Rights and Agents, and between Events and Agents
  • A completely revised and expanded Rights entity: a more complete description of rights statements
  • A detailed, structured set of semantic units to record information about significant properties
  • The ability to accommodate metadata from non-PREMIS specifications
  • A proposed registry of suggested values for semantic units

Thursday, November 20, 2008

Top five IT spending priorities for hard times

Tom Sullivan. ComputerWorld. November 19, 2008.

In the current economic climate, organizations are busy looking at which costs they can cut. Analysts agree these areas need to be funded.
  1. Storage: Disks and management software. For many, the largest expenditure is storage; data doubles yearly.
  2. Business intelligence: Niche analytics. Information and resources to help accomplish key goals.
  3. Optimizing resources: Get the most out of what you already have.
  4. Security: Keeping the resources secure.
  5. Cloud computing: Business solutions.

PC Magazine will be online only

Stephanie Clifford. International Herald Tribune. November 19, 2008.

Ziff Davis Media announced it was ending print publication of its 27-year-old flagship PC Magazine; following the January 2009 issue, it will be online only. "The viability for us to continue to publish in print just isn't there anymore." PC Magazine derives most of its profits from its Web site. More than 80 percent of the profit and about 70 percent of the revenue come from the digital business. This is not too much of an adjustment since all content goes online first, and then the print version has been choosing what it wants to print.

A number of other magazines have ended their print publications. The magazines that have gone to online only have been those that are declining. "Magazines in general are going to be dependent on print advertising for a long time into the future."

Massive EU online library looks to compete with Google

PhysOrg.com. November 19, 2008.

The European Union is launching the Europeana digital library, an online digest of Europe's cultural heritage, consisting of millions of digital objects, including books, film, photographs, paintings, sound files, maps, manuscripts, newspapers, and documents.

The prototype will contain about two million digital items already in the public domain. By 2010, the date when Europeana is due to be fully operational, the aim is to have 10 million works available of the estimated 2.5 billion books in Europe's more common libraries. The project plans to be available in 21 languages, though English, French and German will be most prevalent early on.

Thursday, November 13, 2008

Digital Preservation Matters - 14 November 2008

Library of Congress Digital Preservation Newsletter. Library of Congress. November 2008.

There are three interesting items in the November newsletter:

1. The NDIIPP Preserving Digital Public Television Project is building infrastructure, creating standards and obtaining resources. The project is trying to create a consistent approach to digital curation among those who produce PBS programs. Their metadata schema draws on four standards: PBCore (a standard developed by and for public media organizations), METSRights, MODS and PREMIS. The goal is to put the content in the Library’s National Audio-Visual Conservation Center, where it will be preserved on servers and data tapes. This will support digital archiving and access for public television and radio programs in the US. Many stations are unsure about what to do with their programs for the long term, and the American Archive is seen as a solution.

2. Digitization Guidelines: An audiovisual working group will set standards and guidelines for digitizing audiovisual materials. The guidelines will cover criteria such as evaluating image characteristics and establishing metadata elements. The recommendations will be posted on two Web sites:

www.digitizationguidelines.gov/stillimages/

www.digitizationguidelines.gov/audio-visual/

3. Data Archive Technology Alliance: A meeting was held to establish a network of data archives to help develop shared technologies for the future. They hope to set standards for shared, open-source, community-developed technologies for data curation, preservation, and data sharing. It is critical to clearly define the purpose and outcome of the effort. Those involved will develop a shared inventory of their tools and services, and will also list new developments to enhance data stewardship.


JHOVE2 project underway. Stephen Abrams. Email. November 6, 2008.

The JHOVE tool has been an important part of digital repository and preservation workflows. It has a number of limitations and a group is starting a two-year project to develop a next-generation JHOVE2 architecture for format-aware characterization. Among the enhancements planned for JHOVE2 are:

  • Support for signature-based identification, feature extraction, validation, and rules-based assessment (a brief sketch of signature-based identification follows this list)
  • A data model supporting complex multi-file objects and arbitrarily-nested container objects
  • Streamlined APIs for integrating JHOVE2 into systems, services, and workflows
  • Increased performance
  • Standardized error handling
  • A generic plug-in mechanism supporting stateful multi-module processing
  • Availability under the BSD open source license
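JHOVE2 itself is a Java project and none of its code is shown here, but the idea behind the first planned enhancement, signature-based identification, can be illustrated with a few widely documented magic numbers. This sketch is mine, not JHOVE2's.

```python
# A few widely documented file signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF",              "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff",      "JPEG"),
    (b"GIF87a",            "GIF"),
    (b"GIF89a",            "GIF"),
    (b"PK\x03\x04",        "ZIP-based (ZIP, ODF, OOXML, ...)"),
]

def identify(path):
    """Return a format guess based on the file's leading bytes, or None if unknown."""
    with open(path, "rb") as f:
        header = f.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None

# print(identify("report.pdf"))  # hypothetical file
```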


Planetarium - Planets Newsletter Issue 5. 22 October 2008 [PDF]

The newsletter includes several items about Planets (Preservation and Long-term Access through Networked Services), a European project to address digital preservation challenges. Here are a few items from the newsletter: Project Planets will provide the technology component of the British Library's digital preservation solution.

The preservation planning tool Plato implements the PLANETS Preservation Planning approach. It guides users through four steps:

  1. define context and requirements;
  2. select potential actions and evaluate them on sample content;
  3. analyze outcomes and;
  4. define a preservation plan based on this empirical evidence.

Digital preservation activities can only succeed if they consider the wider strategy, policy, goals, and constraints of the institution that undertakes them. For digital preservation solutions to succeed, it is essential to go beyond the technical properties of the digital objects to be preserved and to understand the organizational and institutional framework in which data, documents and records are preserved. The biggest barriers to preservation are:

  1. lack of expertise
  2. funding and
  3. buy-in at senior level.


Cisco unveils a router for the 'Zettabyte Era'. Matt Hamblen. Computerworld. November 11, 2008.

Cisco introduced the "Zettabyte Era," and announced the Aggregation Services Router (ASR) 9000, the next generation of extreme networking. They believe service providers need to prepare for petabytes or even exabytes data from video applications which need faster routing. “Instead of needing switching for petabytes or even exabytes of data, the zettabyte will soon be the preferred term, equal to 10 to the power of 18”.


In praise of ... preserving digital memories. Editorial. The Guardian. September 30, 2008.

Some people are thinking centuries ahead. The British Library hosted the iPres conference to work out ways to preserve data for future generations. Since almost everything is in digital form now, this is a difficult thing to do. By 2011 “it is expected that half of all content created online will fall by the wayside.” There is no Rosetta Stone for digital materials, but progress is being made.


Skills, Role & Career Structure of Data Scientists & Curators: Assessment of Current Practice & Future Needs. Alma Swan, Sheridan Brown. JISC. 31 July 2008.

The report of a study that looks at those who work with data. It identifies four roles, which may overlap:

  • Data Creator: Researchers who produce and are experts in handling, manipulating and using data
  • Data Scientist: Those who work where the research is carried out and may be involved in creative enquiry and analysis
  • Data Manager: Those who take responsibility for computing facilities, storage, continuing access and preservation of data
  • Data Librarian: Librarians trained and specializing in the curation, preservation and archiving of data

There is a continuing challenge to make sure people have the skills needed. Three main potential roles for the library:

  1. Training researchers to be more data-aware
  2. Adopt a data archiving and preservation role; provide services through institutional repositories
  3. Training of data librarians

Caring for the data frees data scientists from the task and allows them to focus on other priorities. Data issues are moving so fast that periodic updating is much more effective than an early, intensive training with no follow-up. Some institutions offer training courses and workshops on data-related topics.

Tuesday, November 11, 2008

JHOVE2 project underway

From: stephen.abrams@ucop.edu [mailto:stephen.abrams@ucop.edu]
Sent: Thursday, November 06, 2008 3:43 PM
JHOVE2 project underway

The open source JHOVE characterization tool has proven to be an important
component of many digital repository and preservation workflows. However, its
widespread use over the past four years has revealed a number of limitations
imposed by idiosyncrasies of design and implementation. The California Digital
Library (CDL), Portico, and Stanford University have received funding from the
Library of Congress, under its National Digital Information Infrastructure
Preservation Program (NDIIPP) initiative, to collaborate on a two-year project
to develop a next-generation JHOVE2 architecture for format-aware
characterization.

Among the enhancements planned for JHOVE2 are:

* Support for four specific aspects of characterization: signature-based
identification, feature extraction, validation, and rules-based assessment
* A more sophisticated data model supporting complex multi-file objects and
arbitrarily-nested container objects
* Streamlined APIs to facilitate the integration of JHOVE2 technology in
systems, services, and workflows
* Increased performance
* Standardized error handling
* A generic plug-in mechanism supporting stateful multi-module processing;
* Availability under the BSD open source license

To help focus project activities we have recruited a distinguished advisory
board to represent the interests of the larger stakeholder community. The board
includes participants from the following international memory institutions,
projects, and vendors:

* Deutsche Nationalbibliothek (DNB)
* Ex Libris
* Fedora Commons
* Florida Center for Library Automation (FCLA)
* Harvard University / GDFR
* Koninklijke Bibliotheek (KB)
* MIT / DSpace
* National Archives (TNA)
* National Archives and Records Administration (NARA)
* National Library of Australia (NLA)
* National Library of New Zealand (NLNZ)
* Planets project

The project partners are currently engaged in a public needs assessment and
requirements gathering phase. A provisional set of use cases and functional
requirements has already been reviewed by the JHOVE2 advisory board.

The JHOVE2 team welcomes input from the preservation community, and would
appreciate feedback on the functional requirements and any interesting test
data that have emerged from experience with the current JHOVE tool.

The functional requirements, along with other project information, are available
on the JHOVE2 project wiki
<http://confluence.ucop.edu/display/JHOVE2Info/Home>. Feedback on project goals
and deliverables can be submitted through the JHOVE2 public mailing lists.

To subscribe to the JHOVE2-TechTalk-L mailing list, intended for in-depth
discussion of substantive issues, please send an email to <listserv at ucop dot
edu> with an empty subject line and a message stating:

SUB JHOVE2-TECHTALK-L Your Name

Likewise, to subscribe to the JHOVE2-Announce-L mailing list, intended for
announcements of general interest to the JHOVE2 community, please send an email
to <listserv at ucop dot edu> with an empty subject line and a message stating:


SUB JHOVE2-ANNOUNCE-L Your Name

To begin our public outreach, team members recently presented a summary of
project activities at the iPRES 2008 conference in London, entitled "What? So
What? The Next-Generation JHOVE2 Architecture for Format-Aware
Characterization," reflecting our view of characterization as encompassing both
intrinsic properties and extrinsic assessments of digital objects.

Through the sponsorship of the Koninklijke Bibliotheek and the British Library,
we also held an invitational meeting on JHOVE2 following the iPRES conference
as an opportunity for a substantive discussion of the project with European
stakeholders.

A similar event, focused on a North American audience, will be held as a
Birds-of-a-Feather session at the upcoming DLF Fall Forum in Providence, Rhode
Island, on November 13. Participants at this event are asked to review closely
the functional requirements and other relevant materials available on the
project wiki at <http://confluence.ucop.edu/display/JHOVE2Info/Home> prior to
the session.

Future project progress will be documented periodically on the wiki.

Stephen Abrams, CDL
Evan Owens, Portico
Tom Cramer, Stanford University

on behalf of the JHOVE2 project team

Friday, November 07, 2008

Digital Preservation Matters - 07 November 2008

Digital Preservation Policies Study. Neil Beagrie, et al. JISC. 30 October 2008. [pdf]

This study will become part of the foundation documents for digital preservation. It provides a model for digital preservation policies and looks at the role of digital preservation in supporting and delivering strategies for educational institutions. The study also includes 1) a model/framework for digital preservation policies; 2) a series of mappings of digital preservation to other key institutional strategies in universities, libraries, and Records Management. This is intended to help institutions develop appropriate digital preservation policies. Some notes:

Long-term access relies heavily on digital preservation strategies being in place, and we should focus on making sure they are. Developing a preservation policy will only be worthwhile if it is linked to core institutional strategies: it cannot be effective in isolation. One section gives a good outline of the steps that must be taken to implement a digital preservation solution. Policies should outline what is preserved and what is excluded. Digital preservation is a means, not an end in itself. Any digital preservation policy must be seen in terms of the strategies of the institution. An appendix summarizes the strategy aims and objectives of certain institutions and the implications for digital preservation activities within the organization. Definitely worth studying the approximately 120 pages.


Predicting the Longevity of DVDR Media by Periodic Analysis of Parity, Jitter, and ECC Performance Parameters. Daniel Wells. BYU Thesis. July 14, 2008.

The summarizing statement for me was: “there is currently extreme reluctance to use DVD-R’s for future digital archives as well as justifiable concern that existing DVD archives are at risk.” We have certainly found this in our own experience, having very high failure rates with some collections.

The abstract: For the last ten years, DVD-R media have played an important role in the storage of large amounts of digital data throughout the world. During this time it was assumed that the DVD-R was as long-lasting and stable as its predecessor, the CD-R. Several reports have surfaced over the last few years questioning the DVD-R's ability to maintain many of its claims regarding archival quality life spans. These reports have shown a wide range of longevity between the different brands. While some DVD-Rs may last a while, others may result in an early and unexpected failure. Compounding this problem is the lack of information available for consumers to know the quality of the media they own. While the industry works on devising a standard for labeling the quality of future media, it is currently up to the consumer to pay close attention to their own DVD-R archives and work diligently to prevent data loss. This research shows that through accelerated aging and the use of logistic regression analysis on data collected through periodic monitoring of disc read-back errors it is possible to accurately predict unrecoverable failures in the test discs. This study analyzed various measurements of PIE errors, PIE8 Sum errors, POF errors and jitter data from three areas of the disc: the whole disc, the region of the disc where it first failed as well as the last half of the disc. From this data five unique predictive equations were produced, each with the ability to predict disc failure. In conclusion, the relative value of these equations for end-of-life predictions is discussed.
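The thesis's own predictive equations are not reproduced here, but the general approach (fitting a logistic regression to periodic disc error measurements and using it to estimate failure probability) can be sketched as follows. The feature columns and numbers are invented for illustration; the actual study used accelerated aging and region-specific measurements.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical measurements from periodic scans of a set of test discs:
# each row is [mean PIE, PIE8 sum, jitter %]; label 1 = disc later suffered an
# unrecoverable failure (POF), label 0 = disc remained readable.
X = np.array([
    [12.0,   150.0,  8.1],
    [35.0,   620.0,  9.4],
    [280.0, 4100.0, 11.8],
    [9.0,    120.0,  7.9],
    [410.0, 6300.0, 12.5],
    [22.0,   300.0,  8.6],
])
y = np.array([0, 0, 1, 0, 1, 0])

model = LogisticRegression(max_iter=1000).fit(X, y)

# Estimated probability that a newly scanned disc will eventually fail.
new_disc = np.array([[180.0, 2500.0, 10.9]])
print(model.predict_proba(new_disc)[0, 1])
```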


DCC Curation Lifecycle Model. Chris Rusbridge. Digital Curation Centre Blog. 8 October 2008.

The model they have put together is available in graphical form. Like all models it is a compromise between succinctness and completeness. They plan to use it to structure information on standards and as an entry point to the DCC web site; it is explained in an article in the International Journal of Digital Curation. The model is a high-level overview of the stages required for successful curation, and complements OAIS and other standards. The actions for Digital Objects or Databases are as follows; a small illustrative encoding appears after the list:

  • Full Lifecycle Actions: Description and Representation Information; Preservation Planning; Community Watch and Participation; Curate and Preserve
  • Sequential Actions: Conceptualise; Create or Receive; Appraise and Select; Ingest; Preservation Action; Store; Access, Use and Reuse; Transform
  • Occasional Actions: Dispose; Reappraise; Migrate
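A minimal sketch of how these action categories might be encoded for a local planning or audit tool; the structure and function name are my own, not part of the DCC model.

```python
# The DCC Curation Lifecycle Model's action categories, as listed above.
LIFECYCLE = {
    "full_lifecycle": [
        "Description and Representation Information",
        "Preservation Planning",
        "Community Watch and Participation",
        "Curate and Preserve",
    ],
    "sequential": [
        "Conceptualise", "Create or Receive", "Appraise and Select", "Ingest",
        "Preservation Action", "Store", "Access, Use and Reuse", "Transform",
    ],
    "occasional": ["Dispose", "Reappraise", "Migrate"],
}

def missing_sequential_actions(workflow_steps):
    """Return the sequential lifecycle actions not yet covered by a local workflow."""
    covered = {step.lower() for step in workflow_steps}
    return [a for a in LIFECYCLE["sequential"] if a.lower() not in covered]

# Hypothetical check of a repository's documented workflow against the model.
print(missing_sequential_actions(["Create or Receive", "Ingest", "Store"]))
```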

The model is part of a larger plan to take a detailed look at processes, costs, governance and implementation.


WVU Libraries Selected for Digital Pilot Project. September 15, 2008.

The West Virginia University Libraries are among 14 institutions picked to participate in a book digitization pilot project led by PALINET. Each institution will submit five to ten books to be digitized during the pilot. After that, the initial target will be to digitize 60,000 books and put them in the Internet Archive. “Another benefit of the project is preservation.” The Rare Books Curator said a dilemma is allowing access while still providing for the maximum amount of preservation. “These books are old and they’re fragile, and there is always the difficulty of preserving a book that is used a lot. Maintaining that balance is essential. It’s a fine line that we’re always on. Book digitization is a way of providing access and assuring preservation of the original.”

Friday, October 31, 2008

Digital Preservation Matters - 31 October 2008

Google Settles Book-Scan Lawsuit, Everybody Wins. Chris Snyder. Wired. October 28, 2008.

Google settled a lawsuit by agreeing to pay $125 million to authors and publishers. In addition, out-of-print, copyright-protected books will still be scanned, and publishers have the option to activate a “Buy Now” button so readers can download a copy of the book. Google will take a 37 percent share of the profits, plus an administrative fee of 10 to 20 percent, and the remainder goes to authors and publishers. This creates a market for out-of-print works that were not likely to get back into "print" any other way, and it establishes a new non-profit Book Rights Registry to manage royalties.

Universities and institutions can buy a subscription service to view the entire collection, and U.S public libraries will have terminals for students and researchers to view the catalog for free.


Christian Science Monitor Goes All in on the Web. Meghan Keane. Wired. October 28, 2008.

The Christian Science Monitor plans to halt publication of its Monday through Friday newspaper in favor of daily web content. They are also creating a weekly Sunday magazine. This will cut The Monitor's subscription revenue in half, but it will also cut overhead in half as well. "Maybe the reason newspapers could go out of business is because they think they're in the newspaper business instead of the news gathering and dissemination business. To hang on to a two century old technology just because that’s the way we’ve always done it, that’s a recipe for failure."


Transition or Transform? Repositioning the Library for the Petabyte Era. Liz Lyon. UKOLN. ARL / CNI Forum. October 2008. [PowerPoint]

A recent study shows that data is continually re-analysed and new analytic techniques add value to older data. Data-sharing is seen as a form of trade or gift exchange: “give to get” rather than “give away”.

Preservation & sustainability Recommendations:

  • Use DRAMBORA for self-assessment of data repositories
  • Add PREMIS preservation metadata
  • Collect representation information
  • Examine whether the repository conforms to the OAIS Model
  • Survey partner preservation policies
There is a need to develop a Data Audit Framework for departmental data collections, awareness, policies and practice for data curation and preservation. Steps include: plan; identify and classify assets; assess management of data assets; report and recommendations. There is also a need to formalize the role of data librarians.

Some challenges:
  • Understand and manage risks
  • Building a consensus in the community
  • Appraisal and selection criteria
  • Document the data; add metadata; validate
  • Data provenance, authenticity


Mourning Old Media’s Decline. David Carr. The New York Times. October 28, 2008.

There have been a number of newspapers having difficulties, not just the Christian Science Monitor. “The paradox of all these announcements is that newspapers and magazines do not have an audience problem … but they do have a consumer problem.” People get their information on the internet more than paper, but why does it matter? “The answer is that paper is not just how news is delivered; it is how it is paid for.” Part of the difficulty is that the move to digital media means that there are fewer people now employed in the industry who provide or report the information. The Google CEO said if the trusted brands of journalism vanish, the internet becomes a “cesspool” of useless information.
