Friday, July 22, 2011

Digital Preservation Outreach and Education

The Library of Congress has created the Digital Preservation Outreach and Education (DPOE) initiative to encourage individuals and organizations to preserve their digital materials.  This site contains a calendar of courses and workshops, many free and online.  There is a wealth of information here.

They have outlined a number of activities (needs assessment, core principles, education needs, etc.) and have created six DPOE regions:
  • Midwest: OH, IN, MI, IL, WI, MO, IA, MN, ND, SD, NE, KS
  • Northeast: ME, VT, NH, RI, MA, CT, NY, NJ, PA
  • Northwest: WA, OR, MT, WY, ID, AK
  • Southeast: MS, AL, GA, FL, SC, NC, TN, KY, WV, VA, DC, MD, DE
  • Southcentral: TX, OK, AR, LA
  • Southwest: CA, NV, UT, CO, AZ, HI

Puzzling over digital preservation – Identifying traditional and new skills needed for digital preservation

Puzzling over digital preservation – Identifying traditional and new skills needed for digital preservation.  Thomas Bähr, et al.  IFLA Conference paper, submitted: June 1, 2011. 
This is an excellent paper. The topic is important for all libraries: “Preservation is a core competency of librarians, however, gaps still exist when it comes to being able to accept that role fully in the digital realm.” The topic is also complex: “Digital preservation is not a single task but encompasses all strategies and measures taken to recover and maintain access to digital information.” Maintaining digital cultural heritage requires new skills from both content experts and technology experts. In 2010, a Digital Preservation Outreach and Education survey of 481 libraries found that:
  • 27% of surveyed libraries had digital preservation staff;
  • 21% relied on a vendor;
  • 14% stated that no one was responsible for digital preservation at their institution;
  • 38% said various staff members were assigned to digital preservation tasks as needed.
The paper examines the skills needed for digital preservation and identifies the gaps that exist:
  • The first step in digital preservation is knowing what your holdings are. Group items meaningfully at the collection level.
  • Survey the collection for preservation needs. Define approaches for surveying offline and online materials, and pay attention to capturing carrier type details. The survey of offline data should include a readability test and a rough overview of the formats.
  • Know how to handle digital materials, such as optical or magnetic media, for proper storage and processing; this requires understanding how the media work. Good metadata (technical, bibliographic, etc.) is required for proper handling and preservation of data, which raises the skill level needed.
  • Sustainability of access requires an awareness of what the future will bring. Content knowledge in the digital age is more than knowing the intellectual information of an item; it includes knowledge of the presentation and usability of a digital object, as well as the user’s expectations.
  • Preservation strategies and actions are always based on risk assessment and planning. These require a profound knowledge of library and technology processes as well as digital preservation practices.
  • Institutions with stewardship of digital cultural heritage need “appropriate staff to support activities related to the long-term preservation of the data,” and the digital preservation specialist must be committed to “lifelong learning”.
  • “As digital preservation is a global problem, active involvement in the national and international community is of immense importance.”
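As a rough illustration of the survey step (a format overview plus a readability test for offline media), here is a minimal Python sketch. It assumes the carrier (disc, drive, disk image) is already mounted as a directory, and it uses file extensions as a crude stand-in for format identification; a dedicated identification tool such as DROID would be more reliable. The function name `survey_collection` is hypothetical and not taken from the paper.

```python
import os
from collections import Counter
from pathlib import Path


def survey_collection(root):
    """Walk a directory tree and return (format overview, unreadable files).

    The format overview counts files per lower-cased extension, which is
    only a rough proxy for real format identification.
    """
    formats = Counter()
    unreadable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            ext = path.suffix.lower() or "(no extension)"
            formats[ext] += 1
            try:
                # Readability test: read every byte, 1 MiB at a time.
                with open(path, "rb") as fh:
                    while fh.read(1 << 20):
                        pass
            except OSError as exc:
                unreadable.append((str(path), str(exc)))
    return formats, unreadable
```

For example, `formats, unreadable = survey_collection("/mnt/disc_001")` (a hypothetical mount point) yields a per-extension count and a list of files that could not be read end to end, which can be recorded alongside the carrier details.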

Wednesday, July 20, 2011

UK National Archives Digital Preservation Policy

The UK National Archives has published a document for archives on the need for a digital preservation policy. The guide explains the key characteristics of a digital preservation policy, why there is a need for a policy, how it supports digital preservation, what the criteria are and how it relates to other documentation and policies.  It contains links to digital preservation policies of other institutions, as well as links to other sites that would help someone create a digital preservation policy.
  • Active preservation is any proactive action taken in the preservation of digital records (e.g. migration of records). 
  • Passive preservation is any action that supports digital preservation but does not engage with the digital records directly (e.g. management and backing up of servers storing digital records).
  Archives may need to revert to an earlier version of a digital record if a chosen migration path is unsuccessful. To do this, the previous instance of the record must be retained.
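A minimal Python sketch of that retention rule, assuming the archive supplies its own `migrate` and `validate` functions (both hypothetical; the guidance does not prescribe any implementation). The original instance is never overwritten, a SHA-256 checksum confirms it was not altered during migration, and if validation fails the migrated copy is discarded so the earlier instance remains the current version.

```python
import hashlib
from pathlib import Path


def sha256(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate_record(original, migrate, validate):
    """Return the path that should be treated as the current version.

    `migrate(src) -> dst` and `validate(src, dst) -> bool` are hypothetical
    callables supplied by the archive.
    """
    original = Path(original)
    fixity_before = sha256(original)
    migrated = Path(migrate(original))
    # The earlier instance must be unchanged so we can fall back to it.
    if sha256(original) != fixity_before:
        raise RuntimeError("original instance was modified during migration")
    if validate(original, migrated):
        return migrated  # success; the original is still retained, not deleted
    migrated.unlink(missing_ok=True)  # discard the failed migration
    return original  # revert: the earlier instance stays current
```

Keeping both instances after a successful migration costs storage, but it is what makes a later reversal possible if problems with the new format surface only afterwards.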
The document also outlines criteria and measures for a successful digital preservation policy.
 Digital preservation policies: guidance for archives (PDF).

CERN pushes storage limits as it probes secrets of universe

Loek Essers. Computerworld.  July 11, 2011.
Sensors at the Large Hadron Collider at CERN generate around 1 petabyte of data per second. It is not possible to store that much data, nor is all of it needed, so CERN selects the 'must-have' data: "The goal is to try not to drop anything interesting." After filtering, CERN has up to 25 petabytes of data to store each year, most of which is stored on tape. CERN has a capacity of about 34 PB on tape and 45 PB on disk. The data is analyzed on their computing grid.
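A back-of-the-envelope check of why filtering is unavoidable, using the article's two figures plus one loud assumption: that the detectors produce 1 PB/s continuously for a whole year, which the LHC does not actually do.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

raw_pb_per_second = 1.0    # figure quoted in the article
stored_pb_per_year = 25.0  # upper bound quoted in the article

# Assumption: continuous operation all year (the real LHC runs far less).
raw_pb_per_year = raw_pb_per_second * SECONDS_PER_YEAR
kept_fraction = stored_pb_per_year / raw_pb_per_year

print(f"raw data per year (continuous): {raw_pb_per_year:,.0f} PB")
print(f"fraction kept: {kept_fraction:.1e}")
```

Under that assumption CERN would keep just under one byte in a million (roughly 8 × 10⁻⁷). Actual running time is much shorter than a full year, so the true fraction kept is larger, but the order of magnitude shows why most of the raw stream has to be dropped.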

The death of backup and the rapid rise of the cloud

Data backups have always been expensive, complicated, and prone to errors. As data volumes have grown, backups have become difficult to complete in the time available. Increasingly, organizations are outsourcing their backup and data recovery services to cloud vendors.

This may be a great benefit for system administrators, but it does not eliminate the need to understand your archiving requirements and your data security and protection issues, or the need for a plan to migrate to another vendor.
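One way to keep some independence from a cloud backup vendor is to record your own fixity information before the data leaves your hands. The Python sketch below (function names are hypothetical) builds a SHA-256 manifest of a directory and later checks a restored copy against it, reporting missing or altered files. It assumes the backup can be restored to a local directory and does not use any particular vendor's API.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest, restored_root):
    """Return (missing, altered) file lists for a restored copy."""
    current = build_manifest(restored_root)
    missing = sorted(set(manifest) - set(current))
    altered = sorted(
        k for k in manifest if k in current and current[k] != manifest[k]
    )
    return missing, altered
```

The manifest can be saved as JSON next to the data (for example with `json.dumps(manifest, indent=2)`) and reused when moving to another provider, since it does not depend on any vendor's own checksums.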

Tuesday, July 19, 2011

University Offers Help to Researchers Wrestling with Digital Data Management

Data management is becoming more complicated as data sets grow larger, and more disciplines are becoming data-intensive; some projects can generate up to a petabyte a day. Managing, analyzing, and preserving large data sets is a modern data management problem that all research institutions now face to some degree. "Research data haphazardly saved on a hard drive – or worse, a disk stored in a desk drawer – might be recoverable now, but there's no guarantee it will be decades down the line."
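To make "a petabyte a day" concrete, here is a quick calculation of the sustained rate it implies, assuming decimal units (1 PB = 10^15 bytes) and data arriving evenly over 24 hours:

```python
PB = 10**15                      # decimal petabyte, in bytes
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

bytes_per_second = PB / SECONDS_PER_DAY
print(f"{bytes_per_second / 10**9:.1f} GB/s sustained")
```

That works out to roughly 11.6 GB every second, around the clock, for the primary copy alone.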

Data has a lifecycle. "You begin and collect data, and then you have to go in and process it and manage it, and then you analyze and publish results, and then ideally you archive it. That's as true for digital data as it is for other forms." National foundations have adopted data management requirements, though those requirements are somewhat open-ended and still evolving.
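The lifecycle in that quote can be modeled, very simply, as an ordered sequence of stages. This Python sketch (the `Stage` names and the `advance` helper are hypothetical) enforces the order collect, process, manage, analyze, publish, archive. Real research data often loops back, for example when archived data is reused for new analysis, so the strict ordering here is a simplifying assumption.

```python
from enum import Enum


class Stage(Enum):
    COLLECT = 1
    PROCESS = 2
    MANAGE = 3
    ANALYZE = 4
    PUBLISH = 5
    ARCHIVE = 6


def next_stage(stage):
    """Return the stage that follows `stage`, or None once data is archived."""
    members = list(Stage)
    idx = members.index(stage)
    return members[idx + 1] if idx + 1 < len(members) else None


def advance(history, stage):
    """Append `stage` to `history`, refusing to skip or reorder stages."""
    expected = Stage.COLLECT if not history else next_stage(history[-1])
    if stage is not expected:
        raise ValueError(f"expected {expected}, got {stage}")
    history.append(stage)
    return history
```

Recording the stage history per data set makes it easy to spot data that was published but never archived, which is the gap the quote warns about.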

The Scientific Data Consulting Group is also involved in the creation of DMPTool, a new online tool that helps institutions create their own data management plans. "Even if we didn't have compliance requirements set by the federal government, the right thing to do would be to assist faculty members and graduate students with dealing with these data they are collecting."