Wednesday, September 28, 2011

ADS and the Data Seal of Approval – case study for the DCC.

The ADS and the Data Seal of Approval – case study for the DCC.  Jenny Mitcham and Catherine Hardman. Digital Curation Centre website. 2010.  
This page describes the experience of the Archaeology Data Service (ADS) in applying for the Data Seal of Approval (DSA). It provides practical information about the DSA application process and outlines the issues the ADS faced in undertaking it, along with several potential benefits they see from the self-certification.

“When undertaking to curate data for the foreseeable future (and beyond) the concept of ‘trust’ is of paramount importance. Yet in a young discipline such as digital archiving, it is very difficult to demonstrate the potential for longevity of curation.”

The Assessment Manual can be downloaded from the DSA website; it includes details of the 16 guidelines, the minimum requirements, and some guidance notes.  In the spirit of openness, the DSA recommends that the main policy and procedure documents be accessible to the world at large.  One of the benefits mentioned is that certification shows users and depositors that the archive has a set of standards and is meeting them.

Tuesday, September 27, 2011

Fujitsu CTO: Flash is just a stopgap

Fujitsu CTO: Flash is just a stopgap. Chris Mellor. The Register. 8 August 2011.
Flash memory, according to the CTO, is "beset with problems that will become unsolvable". The increases in flash density come at the expense of the ability to read and write data. "Each shrink in process geometry, from 3X to 2X and onto 1X, shortens flash's endurance", and brings additional problems with access speed and endurance.

Tomorrow will be too late – (born) digital in library special collections.

Tomorrow will be too late – (born) digital in library special collections. 28 June 2011.
Report from the annual conference of LIBER, the Association of European Research Libraries: According to recent studies, research libraries are still functioning within an analogue paradigm. Many libraries have digitized collections and provide online access, but their digitization efforts “mostly lack strategic planning, access is still mostly provided in a controlled way (for a limited group of users), preservation issues are still not being addressed adequately, and born-digital material (including audiovisual content) is blatantly missing from collections and collection plans.”  If libraries stay in their comfort zone of digital access in a controlled network, their information role will become insignificant.  They need:
  • A knowledge of the user groups and their behaviours
  • Content, access, and preservation strategies
  • Strategic alliances
  • Permanent innovation
  • An open mind
 There are opportunities for libraries but they need to transition their methods.

Google helps put Dead Sea Scrolls online

Google helps put Dead Sea Scrolls online. BBC News.  26 September 2011.
Ultra-high resolution images of several Dead Sea Scrolls are now available online.  Five scrolls have been digitized (1,200 megapixel images); they are The Temple Scroll, The War Scroll, The Community Rule Scroll, The Great Isaiah Scroll, and The Commentary of Habakkuk Scroll.  

Friday, September 23, 2011

Practical Approaches to Electronic Records: What Works Now (ppt)

Practical Approaches to Electronic Records: What Works Now.  Chris Prom, et al. August 30, 2011.
This is a PowerPoint of a presentation given by several people at SAA.  A few notes from the slides:
Adjectives that are undesirable when describing an archive: Undercounted, undermanaged, inaccessible.
Basic requirements for the digital archive:
  • Perform a virus check
  • Capture descriptive metadata about the folders and files
  • Document the file formats
  • Record checksums for the files
  • Document the actions taken over time
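
Two of the requirements above, recording checksums and documenting file formats, can be sketched in a few lines of Python. This is only an illustrative sketch, not something taken from the presentation: the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical, SHA-256 is an assumed choice of algorithm, and the file extension is used as a crude stand-in for real format identification.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, dict]:
    """Map each file's relative path to its size, extension, and checksum."""
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        manifest[path.relative_to(root).as_posix()] = {
            "bytes": path.stat().st_size,
            # Assumption: the extension stands in for proper format identification.
            "extension": path.suffix.lower(),
            "sha256": sha256_of(path),
        }
    return manifest


def changed_files(root: Path, manifest: dict[str, dict]) -> list[str]:
    """List manifest entries whose file is missing or no longer matches its checksum."""
    changed = []
    for name, entry in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            changed.append(name)
    return changed
```

Building the manifest at ingest and re-running `changed_files` at intervals would also give a simple record of one kind of "action taken over time", namely periodic fixity checks.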

    Meeting the Challenge of Media Preservation: Strategies and Solutions.

    Meeting the Challenge of Media Preservation: Strategies and Solutions. Indiana University Bloomington. September 2011. (128 page pdf.)
    This excellent study is the result of a year of research and planning to address the problems identified in an earlier report in 2009.  It looks at the preservation and conservation of audio, video, and film, including: guiding preservation principles, facility planning, prioritization, digitization methodologies, strategies for film, principles for access, technological infrastructure needs, and engagement with campus units and priorities. It is written specifically for the university, but the information and recommendations are of interest to others.  Their mission is to preserve the time-based media holdings of Indiana University so that they may be accessible. They estimate their media holdings at more than 560,000 audio, video, and film objects, nearly all on obsolete formats. They also estimate that they have only a fifteen- to twenty-year window of opportunity to digitally preserve audio and video holdings. They propose to collect rich descriptive and technical metadata to support digitization and future interpretation and management of digital content.
    • The media preservation crisis impacts every institution with media collections.
    • Because campus holdings are very large and time pressures great, even high-efficiency workflows may not preserve everything in time. 
    • Not every recording is an appropriate candidate for long-term preservation.
    • Research, instruction, curation, and public availability are core university missions supported by media preservation efforts.
    • Access is the end goal of any preservation work, and it must be developed in tandem with media preservation efforts.
    • Access to preserved holdings is critical to the success of the project and to the realization of its value to the campus.
    • The vision is an era characterized by a wealth of media content preserved long term and made accessible and integrated into campus research and instruction.
    • We live in a watershed moment in which acute challenges demand a coordinated effort to address dramatic technological and cultural changes in the way users access time-based media.
    • Target: high resolution audio preservation and production masters—24 bit, 96 kHz sample rate.
    One recommendation is to build a 10,000-square-foot Indiana Media Preservation and Access Center employing 25 staff: administrators, audio and video engineers, film specialists, processing technicians, and IT support. Output is projected at 2-3 PB of data per year, with a total fifteen-year target of 39 PB of data storage.  The first year of work will be focused on developing solutions to the challenges posed by legacy media. The second year will begin developing management strategies and workflows for file-based born digital recordings.
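
To put the audio target and the storage figures in proportion, here is a rough back-of-the-envelope sketch. It is not taken from the study: it assumes uncompressed PCM audio, two channels (stereo), and decimal units, and it ignores container overhead; the function names are hypothetical.

```python
BITS_PER_SAMPLE = 24      # from the stated audio target
SAMPLE_RATE_HZ = 96_000   # from the stated audio target
CHANNELS = 2              # assumption: stereo recordings


def pcm_bytes_per_hour(bits: int = BITS_PER_SAMPLE,
                       rate_hz: int = SAMPLE_RATE_HZ,
                       channels: int = CHANNELS) -> int:
    """Bytes needed for one hour of uncompressed PCM audio."""
    return (bits // 8) * rate_hz * channels * 3600


def cumulative_range_pb(low_per_year: int, high_per_year: int, years: int) -> tuple[int, int]:
    """Lowest and highest cumulative storage (PB) for a constant yearly output range."""
    return low_per_year * years, high_per_year * years


gb_per_hour = pcm_bytes_per_hour() / 1e9                # about 2.07 GB per hour of stereo audio
low_pb, high_pb = cumulative_range_pb(2, 3, years=15)   # 30 to 45 PB over fifteen years
```

Under these assumptions, an hour of preservation-master audio takes roughly 2 GB, and fifteen years at 2-3 PB per year gives 30 to 45 PB, a range that contains the 39 PB target.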

    Their guiding principles include: Curatorial Responsibility; Standards and Best Practices; Online Accessibility; Description Services; usability of metadata; copyright strategies; Access Digitizing and Preservation. Their efforts need to be combined into a trusted digital repository.  "Preservation metadata requirements need to be defined, and tools need to be developed to support audio and video preservation package validation, technical metadata capture, and repository ingest."

    This is a long study but is well worth the time to read all the way through it.   This university link also includes other related materials, such as the media preservation survey, and a brochure "Our History is at Risk".

    Friday, September 16, 2011

    Library of Congress To Launch New Corps of Digital Preservation Trainers. Library of Congress.

    Library of Congress To Launch New Corps of Digital Preservation Trainers. Library of Congress.  The Signal.  September 16, 2011. Bill LeFurgy.
    The Digital Preservation Outreach and Education program at the Library of Congress will hold its first national train-the-trainer workshop on September 20-23, 2011, in Washington, DC.

    The DPOE Baseline Workshop will produce a corps of trainers who are equipped to teach others, in their home regions across the U.S., the basic principles and practices of preserving digital materials. Examples of such materials include websites; emails; digital photos, music, and videos; and official records. The intent of the workshop is to share high-quality training in digital preservation, based upon a standardized set of core principles, across the nation.

    Long-term Preservation for Spatial Data Infrastructures: a Metadata Framework and Geo-portal Implementation.

    Long-term Preservation for Spatial Data Infrastructures: a Metadata Framework and Geo-portal Implementation. Arif Shaon, Andrew Woolf. D-Lib Magazine. September/October 2011.
    Geospatial data is increasing, particularly with diverse environmental datasets. Long-term preservation of the data is not typically addressed, but it is very important for current and future use.  Sustained access to environmental data is becoming more important and more difficult because it is increasing so dramatically.
    Without effective long-term preservation, the data face the risk of becoming unusable over time. This article looks at the requirements, particularly metadata, for preserving this data.  The authors have implemented a web-based portal prototype that demonstrates some functions of a preservation interface, such as data discovery using geospatial metadata, data downloading, metadata creation and validation.  There is more to be done in this area.

    Tuesday, September 13, 2011

    Preserving Your Personal Digital Memories

    This is a free online course on preserving your personal digital materials: photos, documents, and other media.  These are fragile and require special care to keep them useable. But preserving digital information is a new concept that most people have little experience with. As new technologies appear for creating and saving our personal digital information, older ones become obsolete, making it difficult to access older content. Learn about the problem with digital preservation of materials and hear about some simple, practical tips and tools to help you keep digital files safe.

    Evaluating Open Source Digital Preservation Systems: A Case Study.

    Evaluating Open Source Digital Preservation Systems: A Case Study.  Angela Jordan. Practical E-Records. August 18, 2011.
    The University of Illinois Archives has implemented the “Practical E-Records Method,” a project that provides recommendations to help  make digital curation and digital preservation systematic institutional functions.  They tested Archivematica, which is essentially "an Ubuntu (Linux) distribution with extensions to support digital preservation actions using a web-based preservation dashboard."  The test "started with elementary electronic records such as Microsoft Office documents and PDFs, then moved to complicated, larger file types, such as audio-visual objects."  Some parts worked well, but there were a number of errors.  "Given the immediate needs of the University Archives, the developing state of Archivematica, and other digital preservation development work taking place within the University Library, we chose not to incorporate the current version into our electronic records work flow."

     There were three remaining concerns:
    1. smaller institutions may lack the hardware or the technological capability to support the system
    2. the installation process is not user friendly
    3. the software is best run from a dedicated virtual server, to which many institutions may not have access.  Running Archivematica on a dedicated virtual machine requires significant help from IT
    The technological ability needed to successfully install and run this system is currently beyond the people who might benefit most. Once some of the issues are worked out in upcoming versions, Archivematica will be useful for smaller institutions that have less IT support than a large research library.

    KRDS Digital Preservation Benefits Analysis Toolkit and KRDS Updates now available.

    KRDS Digital Preservation Benefits Analysis Toolkit and KRDS Updates now available. Neil Beagrie. Website.  05 Aug 2011.
    The Keeping Research Data Safe (KRDS) project  was set up to show the benefits of digital preservation.  The Digital Preservation Benefits Analysis toolkit, now available, has two tools which consist of a detailed guide and worksheet(s):
    1. KRDS Benefits Framework:  an “entry-level” tool requiring less experience and effort to implement and which can also be used as a stand-alone tool for many tasks
    2. Value-chain and Benefits Impact tool: a more advanced tool requiring more experience and effort to implement. It is likely to be most useful in activities such as evaluation and strategic planning.
    The site also has worksheets, guidance documentation and exemplar test cases.

    Thursday, September 08, 2011

    Research Archive Widens Its Public Access—a Bit

    Research Archive Widens Its Public Access—a Bit. Editorial. Technology Review.  7 September 2011.
    JStor, an organization which maintains links to 1,400 journals for subscribing institutions, is providing free public access to articles published prior to 1923 in the United States or before 1870 in other countries, about 6 percent of its content. In a letter to publishers and libraries, JStor refers to plans for "further access to individuals in the future."

    Atempo Digital Archive Helps Simplify and Streamline Broadcast Workflows With Scalable Storage Integration.

    Atempo Digital Archive Helps Simplify and Streamline Broadcast Workflows With Scalable Storage Integration. Press release. Sept. 8, 2011.
    Announcement that Atempo Digital Archive (ADA) has been integrated with MediaGrid, to simplify and streamline broadcast workflows. The software addresses long-term data retention requirements and digital preservation. "Atempo enables organizations to preserve and protect digital information simply and effectively, across any infrastructure, on any platform, over long periods of time. Atempo's comprehensive archiving solutions deliver policy-based and workflow-driven management of rich media files, email and other high-value digital assets to maximize the efficiency and performance of storage systems and reduce long-term storage costs."

    Paper on JPEG 2000 for preservation

    Paper on JPEG 2000 for preservation. Johan van der Knijff.  National Library of the Netherlands (KB). Johan's blog. Open Planets Foundation.  June 2011.  This paper, published in D-Lib Magazine, looks at the suitability of the JP2 format for long-term digital preservation.  It identifies issues that will be addressed in an amendment to the standard, and it provides some practical recommendations that may help in mitigating the risks for existing collections.

    Wednesday, September 07, 2011

    A simple JP2 file structure checker

    A simple JP2 file structure checker. Johan van der Knijff.  National Library of the Netherlands (KB). Johan's blog. Open Planets Foundation.  1 September 2011.
    The KB is planning to migrate a collection of TIFF images to JP2. One major risk of such a migration is that hardware failures during the migration process may result in corrupted images. Others have found that corrupted, incomplete JP2 files can still be seen as "well-formed and valid" by JHOVE.  This tool was written to detect incomplete code streams in JP2 files.  There are links to the source code of jp2StructCheck, some documentation, and a small data set with some test images. This will be an important tool for digital preservation.
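
The general idea of a structural check can be sketched as follows. This is not the source code of jp2StructCheck, only a minimal illustration under stated assumptions: it walks the top-level boxes of a JP2 file, confirms that every box fits inside the file, and confirms that the last code stream box (`jp2c`) starts with the start-of-codestream marker and ends with the end-of-codestream marker. The function names are hypothetical, and a real validator checks far more than this.

```python
import struct

JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")  # JP2 signature box
SOC_MARKER = b"\xff\x4f"  # start of codestream
EOC_MARKER = b"\xff\xd9"  # end of codestream


def top_level_boxes(data: bytes):
    """Yield (type, payload_start, payload_end) per top-level box; raise ValueError if truncated."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header")
        length, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if length == 1:  # a 64-bit extended length follows the type field
            if offset + 16 > len(data):
                raise ValueError("truncated extended box header")
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif length == 0:  # box runs to the end of the file
            length = len(data) - offset
        if length < header or offset + length > len(data):
            raise ValueError("box length does not fit inside the file")
        yield box_type, offset + header, offset + length
        offset += length


def looks_complete(data: bytes) -> bool:
    """Return True if the box structure is intact and the code stream has both end markers."""
    if not data.startswith(JP2_SIGNATURE):
        return False
    try:
        boxes = list(top_level_boxes(data))
    except ValueError:
        return False
    codestreams = [(start, end) for kind, start, end in boxes if kind == b"jp2c"]
    if not codestreams:
        return False
    start, end = codestreams[-1]
    payload = data[start:end]
    return payload.startswith(SOC_MARKER) and payload.endswith(EOC_MARKER)
```

A file cut short during migration would typically fail either the box-length test or the end-of-codestream test, which is the kind of damage the post describes.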

    Sunday, September 04, 2011

    JISC Legal Cloud Computing and the Law Toolkit.

    JISC Legal Cloud Computing and the Law Toolkit. Website. 31 August 2011.
    Documents to help make informed decisions about implementing cloud computing solutions in an institution. Not specifically for digital preservation, but helpful to think about the policies that will affect data and the life-cycle, such as Data Protection, Possession of Data on Termination, etc.  Written for UK educational institutions.

    • Report on Cloud Computing and the Law for UK Further and Higher Education
    • User Guide: Cloud Computing and the Law for IT
    • User Guide: Cloud Computing and the Law for Senior Management and Policy Makers
    • User Guide: Cloud Computing and the Law for Users 
    • User Guide: Cloud Computing Contracts, SLAs and Terms & Conditions of Use 

    Institutional Repository and ETD Bibliography 2011

    Institutional Repository and ETD Bibliography 2011. Charles W. Bailey, Jr.  September 2011.
    This bibliography has over 600 English-language articles, books, and other works about institutional repositories and electronic theses and dissertations (ETDs).  Among other things, it includes digital preservation issues, IR library issues, IR metadata strategies, and institutional open access mandates and policies. Most sources have been published from 2000 through June 30, 2011.  The bibliography includes links to freely available versions of included works.  It is available as a PDF file.

    Saturday, September 03, 2011

    Memory failure detected.

    Memory failure detected. THE: Times Higher Education. 1 September 2011.
    In the future, researchers will have an incomplete record about events taking place in our day: much of the material was never stored or has been only partially archived.  The extent to which content disappears without trace from the web is worrying. Not enough academics are engaging with the topic. "We are taking it for granted that such material will be there, but we need to be attentive. We have a responsibility to future generations of researchers."  "These issues are long term and worthy of investment." The Internet Archive is the most comprehensive of the web archives with more than 150 billion pages from more than 100 million sites [but these are often only partial pages].

    There are also smaller-scale selective archives. Websites are collected around topics, themes or events chosen by library curators, with sites harvested only when the copyright holder's permission has been obtained. The approach lacks breadth, but as the operation is smaller, individual websites can be captured more comprehensively. Most news content that is published only online is simply falling through the cracks. But the web archiving community's current practices, the report continues, are producing something that is in danger of ending up as a "dusty archive". In this scenario, archiving technology keeps pace with the latest developments and archives are well curated and maintained, but they sit largely unused, gathering "digital dust". "As is too often the case with those who build resources, they are preserving websites without giving any real thought to how they might be used in the future."