Friday, April 30, 2010

Digital Preservation Matters - April 30, 2010

Digital Preservation: An Unsolved Problem. Jonathan Shaw. Harvard Magazine. April 27, 2010.

With the advantages of digital, why do libraries not embrace the digital future now? One of the main obstacles is the issue of preservation. For books: "the greatest risks to printed material are the environment, wear and tear, security, and custodial neglect." For digital: using data is one of the best ways to preserve it because you know it is usable; digital data must be read and checked constantly to ensure integrity. Another concern about digital is that current formats may not be readable in the future (reference to June 2009 New Yorker cover). Born digital materials are not as easy to save since they have many different formats. This is difficult for librarians keeping records of the university's intellectual life, because of both the legal and digital challenges. "We are in a period of unprecedented lack of documentation of academic output."


Gutenberg 2.0. Harvard's libraries deal with disruptive change. Jonathan Shaw. Harvard Magazine. April 27, 2010.

In the scientific disciplines, information, from online journals to databases, must be recent to be relevant. To some, libraries full of books seem more like museums. Some think that massive digital projects will make research libraries irrelevant. The future of libraries is clearly digital. "Yet if the format of the future is digital, the content remains data. And at its simplest, scholarship in any discipline is about gaining access to information and knowledge." Access to the information will mean different things and be done in different ways. In the meantime, "Who has the most scientific knowledge of large-scale organization, collection, and access to information? Librarians."

How do we deal with large scale collections and the access to the information? "We ought to be leveraging that expertise to deal with this new digital environment. That's a vision of librarians as specialists in organizing and accessing and preserving information in multiple media forms, rather than as curators of collections of books, maps, or posters." The role of libraries isn't going away, but it is changing.

The idea that libraries will be stewards of vast data collections raises very serious concerns about the long-term preservation of digital materials. The worry is that the longevity of the resources has not been tested. There are 3 copies of the 109 TB Harvard repository. It is in a constant process of checking and refreshing to make sure everything is readable.
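The constant checking described here is often called a fixity check. The following is only a minimal sketch of the idea, not a description of Harvard's actual system: it assumes a SHA-256 digest was recorded when the file was first stored, and the function names (`sha256_of`, `audit_copies`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(copies: list[Path], expected_digest: str) -> dict[Path, bool]:
    """Map each copy to True if it still matches the recorded digest.

    A missing file is reported as False, the same as a damaged one.
    """
    results = {}
    for copy in copies:
        results[copy] = copy.is_file() and sha256_of(copy) == expected_digest
    return results
```

Any copy reported as False could then be refreshed from one that still matches, which is the reason for keeping more than one copy in the first place.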


The Floppy is Dead: Time to Move Memories to the Cloud. Lance Ulanoff. PC Magazine. Apr 26, 2010.

The decision by Sony to stop producing 3.5-inch disks marks an end to that format. The end of any popular format can have a ripple effect on the technology world. If the data is not migrated to later formats it could be "trapped on its obsolete format". All media will become obsolete sometime; it is the natural progression of technology. Since change is inevitable, the article suggests everyone consider cloud-based backup storage options. It suggests that this is better than storing data on eventually-to-be-obsolete media.


Google is not the last word in information. Lia Timson. Sydney Morning Herald. April 29, 2010.

Interesting article concerning primary and secondary sources, what is on the internet and how it gets there, special collections, etc.

  • "Better still is the lesson and the realisation that information and history don't just appear on Google. Someone has to publish it onto the web, put it there in the first place."
  • "As educators we must ask that assignment bibliographies include more than just "three websites". We must insist on a variety of media as sources, including interviews with real people, be they witnesses, historians or surviving relatives, and even insist on trips to the local library."
  • … researching is much wider and deeper than searching online.


A Gentle Reminder to Special-Collections Curators. Todd Gilman. The Chronicle of Higher Education. April 29, 2010.

An article about a librarian's experience trying to use special collections. The "job is not to keep readers from your books but just the opposite: to facilitate readers' use of the collections."


Friday, April 23, 2010

Digital Preservation Matters - April 23, 2010

National Archives Reports on Federal Agency Records Management Programs. NARA Press Release. April 19, 2010.

NARA issued a mandatory records management self-assessment to 245 Federal cabinet-level agencies and related groups, and 91% responded. The goal was to determine how effective Federal agencies are in meeting the statutory and regulatory requirements for records management. The study showed that 79% of agencies are falling short in their responsibilities. The long-term success of the Open Government initiative, and the ability to ensure access to the records of our government, hinge on the ability of each Federal agency to effectively manage its records.

View the 93 page report.


Library of Congress Digital Preservation Newsletter. Library of Congress. April 2010.

The newsletter includes information about a number of digital preservation initiatives. Some of them are:

  • A new video "Why Digital Preservation is Important for Everyone" which also includes a transcript. The main theme is that digital materials, which can fail or be lost, require active management. The three minute video is worth watching.
  • The Federal Agencies Digitization Guidelines Initiative is helping government agencies preserve audio-visual information.
  • Links to The Blue Ribbon Task Force on Sustainable Digital Preservation and Access and their recent report, Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information.
  • Link to a podcast "Conversations about Digital Preservation" about the Library's challenges in building an efficient, scalable digital repository, how the Library's repository works, and future plans for the repository.
  • A group of institutions have developed an automated way to preserve official e-mail records produced by Microsoft Outlook and capture the necessary long-term preservation metadata. This is part of the Persistent Digital Archives and Library System project (PeDALS) to develop a shared curatorial framework for preserving digital public records across multiple states.
  • May 10th will be the Personal Archiving Day at the Library of Congress.


NEW Blog from the DuraSpace Preservation & Archiving Solution Community. Carol Minton Morris. DuraSpace Preservation & Archiving. April 21, 2010.

A new blog has been set up by the Preservation and Archiving Solution Community. The blog is a vehicle for an open exchange of ideas and initiatives around preservation & archiving solutions. All are welcome to participate. It had started as a group using Fedora Commons, but is actually looking at all preservation issues, not just those for Fedora or DSpace.


Digital Preservation and the Challenge. Ron Jantz. DuraSpace Preservation & Archiving. April 21, 2010.

Institutions around the world are grappling with the technology, processes, and organizational structures that will result in digital preservation becoming a reality. The challenge of preserving information goes back centuries to those trying to preserve books in the past. The article mentions an example from the Reformation, when the monasteries were dissolved and their books were not preserved. Can we demonstrate that we are preserving what we have now? We should be looking at self-assessment tools to see how we are doing with preservation.


Crowdsourcing: How and Why Should Libraries Do It? Rose Holley. D-Lib Magazine. March/April 2010.

Crowdsourcing is a new term referring to undefined groups of people in a community "taking tasks traditionally performed by an employee or contractor and outsourcing it to a group (crowd) of people or community in the form of an open call." It may be the "most useful tool a library can have in the future." The work can be done as a group or as an individual. Libraries already know about the first step of crowdsourcing: social engagement with individuals, but need to improve in the second step: defining and working towards group goals. This can bring benefits to libraries and users, especially by adding value to data by adding comments, tags, ratings, reviews. Some successful examples include collections at the National Library of Australia, FamilySearchIndexing and Latter Day Saints: Text transcription of records, Wikipedia, etc. These released their services 'quietly' with little or no advertising, but clear group goals. The article looks at the volunteer profile, motivational factors, types of acknowledgement and rewards, managing volunteers, and tips for successful crowdsourcing. "Freedom is actually a bigger game than power. Power is about what you can control. Freedom is about what you can unleash".

Some of the tips:

  1. Have a transparent and clear goal on your home page
  2. Have a transparent and visible chart of progress towards your goal.
  3. Make the overall environment easy to use, intuitive, quick and reliable.
  4. Make the activity easy and fun; it must be interesting.
  5. Keep the site active by addition of new content/work.
  6. Give volunteers options and choices
  7. Make the results/outcome of your work transparent and visible.
  8. Let volunteers identify and make themselves visible if they want acknowledgement.
  9. Reward high achievers by having ranking tables and encourage competition.
  10. Give the volunteers an online team/communication environment to build a dynamic, supportive team environment.
  11. Treat your 'super' volunteers with respect and listen to them carefully.
  12. Assume volunteers will do it right rather than wrong.


Friday, April 16, 2010

Digital Preservation Matters - April 16, 2010

State Of America's Libraries Report 2010. American Library Association. April 11, 2010.

Interesting report about libraries. As the recession continues, Americans turn to libraries in ever larger numbers for access to resources for employment, continuing education, and government services. The local library has become a lifeline of resources, training and workshops. Even in the age of Google, academic libraries are being used more than ever. During a typical week in fiscal 2008, academic libraries in the United States had more than 20.3 million visits, answered more than 1.1 million reference questions, and made more than 498,000 presentations to groups attended by more than 8.9 million students and faculty, increases over the previous years. Over 43% of libraries provide access to locally produced digitized collections.


A National Conversation on the Economic Sustainability of Digital Information. Blue Ribbon Task Force on Sustainable Digital Preservation and Access. April 1, 2010. [Silverlight video.]

This page has the agenda and video presentations from A National Conversation on the Economic Sustainability of Digital Information, a recent meeting hosted by the Blue Ribbon Task Force on Sustainable Digital Preservation and Access.

BRTF's Featured Agenda and Presentations:

  • Research Data, Daniel E. Atkins, Wayne Clough
  • Scholarly Discourse, Derek Law, Brian Schottlaender
  • Economics of Collectively-Created Content, George Oates, Timo Hannay
  • Commercially-owned Cultural Content, Chris Lacinak, Jon Landau
  • Economics of Digital Information, William G. Bowen, Hal R. Varian, Dan Rubinfeld
  • Summary by Clifford Lynch.


How Tweet It Is!: Library Acquires Entire Twitter Archive. Matt Raymond. Blog. Library of Congress. April 14, 2010.

The Library of Congress is digitally archiving every public tweet made since Twitter started in 2006. "Expect to see an emphasis on the scholarly and research implications of the acquisition." Amazing to think what we can "learn about ourselves and the world around us from this wealth of data. And I'm certain we'll learn things that none of us now can even possibly conceive." The Library of Congress has been archiving information from the web since 2000. It now has more than 167 terabytes of web-based information, including legal blogs and political websites.


Library of Congress: We're archiving every tweet ever made. Nate Anderson. Ars Technica. April 16, 2010.

Comments about the Library of Congress archiving tweets:

  • "There's been a turn toward historicism in academic circles over the last few decades, a turn that emphasizes not just official histories and novels but the diaries of women who never wrote for publication, or the oral histories of soldiers from the Civil War, or the letters written by a sawmill owner. The idea is to better understand the context of a time and place, to understand the way that all kinds of people thought and lived, and to get away from an older scholarship that privileged the productions of (usually) elite males."
  • "Digital technologies pose a problem for the Library and other archival institutions, though. By making data so easy to generate and then record, they push archives to think hard about their missions and adapt to new technical challenges."


Aligning Investments with the Digital Evolution: Results of 2009 Faculty Survey Released. Roger C. Schonfeld, Ross Housewright. Ithaka. April 07, 2010. [37p. PDF]

An excellent report for academic libraries especially, Faculty Survey 2009: Strategic Insights for Librarians, Publishers, and Societies, that looks at faculty attitudes towards the academic library, information resources, and the scholarly communications system. A few quotes from the report:

  • Faculty most often turn to network-level services, including both general purpose search engines and services targeted specifically to academia.
  • Of all disciplines, scientists remain the least likely to utilize library-specific starting points;
  • Network-level services are increasingly important for discovery, not only of monographs and journals but archival resources and other primary source collections.
  • The library must evolve to meet these changing needs.
  • 90% of faculty members view the library buyer role as very important; 71% and 59% now view the archive and gateway roles as very important, respectively. Archiving is the 2nd highest role.
  • Despite the reported declines in importance of all the library's roles other than as a buyer, the 2009 study saw a slight rise in perceived dependence on the library
  • The declining visibility and importance of traditional roles for the library and the librarian may lead to faculty primarily perceiving the library as a budget line, rather than as an active intellectual partner.
  • Faculty members most strongly support and appreciate the library's infrastructural roles, in which it acquires and maintains collections of materials on their behalf.
  • Faculty members' sense of the significance of long-term preservation of electronic journals has steadily increased over time
  • Effective and sustainable models for the preservation of electronic journals must be developed
  • Scholars, regardless of field, indicate a general preference that digital materials be preserved.
  • Less than 30% of faculty members have deposited any scholarly material into a repository; nearly 50% have not deposited but hope to do so in the future
  • Faculty attitudes and practices are at the strategic core. Greater engagement with and support of trailblazing faculty disciplines may help develop the roles and services to serve faculty needs into the future. The institutions that serve faculty must also anticipate them, both to ensure that the 21st century information needs of faculty are met and to secure their own relevance for the future.

Friday, April 09, 2010

Digital Preservation Matters - April 9, 2010

Blu-ray Disc Association Announces Additional Format Enhancements. Press Release. April 3, 2010.

The Blu-ray Disc Association announced two new media specifications:

  • The BDXL specification, targeted at broadcasting, medical and document imaging needs, has write-once discs of 100GB and 128GB capacity, and rewritable capability on 100GB discs. The discs use three to four recordable layers. A consumer version of BDXL is also expected sometime.
  • The Intra-Hybrid Blu-ray Disc combines a single 25GB read-only (BD-ROM) layer with a single 25GB rewritable (BD-RE) layer, so both needs can be met with one disc.

The two new types of discs require newly-designed hardware to record and play back.


Effort Will Help Libraries Put Academic Papers in Data 'Cloud'. Jeff Young. April 5, 2010

Some librarians are hoping that cloud computing will help their efforts to build institutional repositories, university-wide collections of research papers. A new project sponsored by DuraSpace (a merger of DSpace and Fedora Commons) is called DuraCloud. This project plans to make it easier for librarians to put their repositories in off-site data storage. "A key design feature of DuraCloud is to leave the basics of pure storage to those who do it best (storage providers)." The project is now in the pilot phase, but should be available by the fall of 2010. "The biggest draw of the approach: It can be much cheaper than building new data centers to run on campuses."


Submission Policy Recommendations. Chris Prom. Practical E-Records. March 24, 2010.

Here are some great policy documents that are an essential first step toward creating an active digital preservation plan. There are links on this page to several documents:

  • E-Records Deposit Policy
  • Preservation/Access Plan
  • Transfer Guidelines
  • E-record Survey Form
  • Submission Agreement Form

There is also a link to the do-it-yourself TDR (Trusted Digital Repository). The Preservation/Access Plan is especially helpful because it looks at supported formats, both access and preservation formats, access tools for the formats, and migration paths.


iPRES 2009: the Sixth International Conference on Preservation of Digital Objects. University of California. March 30, 2010.

The proceedings and videos from iPRES 2009 (held in San Francisco on Oct 5-6 2009) are now available online. The proceedings are available through the California Digital Library’s eScholarship site. The conference program, presentations, and videos are available at this link. There are many excellent resources here.


Friday, April 02, 2010

Digital Preservation Matters - April 2, 2010

Avoiding a Digital Dark Age. Kurt D. Bollacker. American Scientist. March-April 2010.

Data longevity depends on both the storage medium and the ability to decipher the information

The general problem of data preservation is twofold. The first matter is preservation of the data itself: The physical media on which data are written must be preserved, and this media must continue to accurately hold the data that are entrusted to it. This problem is the same for analog and digital media, but unless we are careful, digital media can be more fragile.

The second part of the equation is the comprehensibility of the data. Even if the storage medium survives perfectly, it will be of no use unless we can read and understand the data on it. Unlike in the analog world, digital data representations do not inherently degrade gracefully, because digital encoding methods represent data as a string of binary digits (“bits”). Because any single piece of digital media tends to have a relatively short lifetime, we will have to make copies far more often than has been historically required of analog media. Like species in nature, a copy of data that is more easily “reproduced” before it dies makes the data more likely to survive.
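The point about graceful degradation can be illustrated with a small experiment. This is only a sketch and is not taken from the article: it uses zlib compression as a stand-in for any tightly encoded digital format, and the helper names (`flip_bit`, `decodes`, `fraction_unreadable`) and the sample text are made up for the illustration. It flips a single bit in plain text and counts the damaged characters, then flips each bit of the compressed version in turn and reports how often the whole stream becomes undecodable.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def decodes(blob: bytes) -> bool:
    """Report whether a zlib stream can still be decompressed."""
    try:
        zlib.decompress(blob)
    except zlib.error:
        return False
    return True


def fraction_unreadable(blob: bytes) -> float:
    """Fraction of single-bit flips that make the stream undecodable."""
    total_bits = len(blob) * 8
    broken = sum(not decodes(flip_bit(blob, i)) for i in range(total_bits))
    return broken / total_bits


# Made-up sample data for the illustration.
text = "\n".join(f"record {i}: value {i * i}" for i in range(200)).encode("ascii")
packed = zlib.compress(text)

# In plain text, one flipped bit damages one character and nothing else.
damaged_text = flip_bit(text, 80)
changed_bytes = sum(a != b for a, b in zip(text, damaged_text))

print(f"plain text: {changed_bytes} of {len(text)} bytes changed")
print(f"compressed: {fraction_unreadable(packed):.0%} of single-bit flips make it unreadable")
```

The plain text behaves like a smudged page, with one character wrong and the rest legible, while the compressed stream is rejected outright for most single-bit errors. That is one reason frequent copying and checking matter more for digital media.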

In order to survive, digital data must be understandable by both the machine reading them and the software interpreting them. There are at least two effective approaches: choosing data representation technologies wisely and creating mechanisms to reach backward in time from the future.


A Survey of the Scholarly Journals Using Open Journal Systems. Brian D. Edgar, John Willinsky. Educause Resources. March 4, 2010. [40 p. PDF]

Open Journal Systems (OJS) is an open source, online journal management and publishing platform. This study looks at scholarly communication using the open source software, based on a survey to which 998 editors or staff members responded. The results point to how these journals – largely independent, scholar-published titles with roughly half originating in the developing world – are not otherwise represented. Of the journals surveyed, 40 percent published research in the sciences, technology and medicine, 30 percent were social science journals, and 11 percent were in the humanities. 19 percent of the journals in the study were interdisciplinary.

The number of journals using OJS has been growing at an average rate of 81% per year, and among newly started journals OJS is being used at a rate of 47%. About half the journals using OJS are born digital. The study looks at the effect that open source tools can have on journal publishing, and adds to the case for rethinking scholarly communication.


Ensuring Perpetual Access: establishing a federated strategy on perpetual access and hosting of electronic resources for Germany. The Alliance of German Science Organisations. Final Report in English. March 30, 2010. [177p. PDF.]

Increasing digital content is a challenge for scientific institutions. This study is a basis for a national hosting strategy to “establish and finance sustainable structures for perpetual access as well as long-term preservation for electronic resources.” Research is critical to the economy. Large investments into the research need to be safeguarded and maintained. Any loss can impair research, and ensuring future access is an important challenge. One of the largest gaps is the “provision for perpetual access for e-journals.” Library access via hosting on publishers’ servers is not “sufficiently robust as a single perpetual access solution long-term,” though it may be the immediate approach. Independent perpetual access with partners is needed, such as Portico. There needs to be a “strategy to create an infrastructure for the storage and long-term preservation of digital documents, and which can guarantee perpetual access to licensed commercial publications and retro-digitised library materials.” PDF and XML with the NLM-DTD are becoming a metadata standard for published material.


Jhove2-0.6.0 Download. Website. March 19, 2010.

A new alpha release of JHOVE2 is now available for download and evaluation. Some features include:

  • Format identification, validation, feature extraction, and message digest.
  • Recursive processing of directories, file sets, etc.
  • Integration with DROID for file identification.
  • Results formatted as text and XML
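
To give a feel for what this kind of characterization involves, here is a rough sketch of three of the listed features: recursive processing, format identification, and message digests, with results as text and XML. It does not use JHOVE2 or DROID and does not reflect their APIs. The tiny signature table and all function names are assumptions made for the illustration, and validation is not attempted.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

# Illustrative signature table only; real identification tools rely on
# much larger signature registries.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"GIF8": "GIF",
    b"PK\x03\x04": "ZIP",
}


def identify(header: bytes) -> str:
    """Guess a format name from the leading bytes of a file."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def characterize(root: Path) -> list[dict[str, str]]:
    """Walk root recursively and describe every regular file found.

    Each file is read fully into memory, which is fine for a sketch
    but not for very large files.
    """
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        records.append({
            "path": path.relative_to(root).as_posix(),
            "format": identify(data[:16]),
            "size": str(len(data)),
            "sha256": hashlib.sha256(data).hexdigest(),
        })
    return records


def as_text(records: list[dict[str, str]]) -> str:
    """Format the records as tab-separated lines."""
    return "\n".join(
        f"{r['path']}\t{r['format']}\t{r['size']}\t{r['sha256']}" for r in records
    )


def as_xml(records: list[dict[str, str]]) -> str:
    """Format the records as a small XML document."""
    root = ET.Element("files")
    for record in records:
        ET.SubElement(root, "file", record)
    return ET.tostring(root, encoding="unicode")
```

Calling `print(as_text(characterize(Path("some_directory"))))` would list one line per file; the directory name is a placeholder.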