Friday, May 29, 2015

Release of Archivematica 1.4 and Storage Service 0.7

Release of Archivematica 1.4 and Storage Service 0.7. Artefactual Website. May 27, 2015.
This version includes a number of bug fixes, new and updated tools, and new features. Some of the highlights are:

  • CONTENTdm integration for hosted or non-hosted environments
  • DSpace integration, including capture of parent-child relationships
  • Siegfried file identification tool
  • Islandora/Archidora plugin
  • Recovery of a backed up AIP

Thursday, May 28, 2015

File identification tools

File identification tools. Gary McGath. File Formats Blog.
Knowing the version and subtype of a file can be important, since the subtype can indicate whether or not a file is suitable for archiving.

In Linux or Unix, the file command can be used to check a file's type identifiers, character encoding, and possibly the language of a text file. The basic syntax is file --mime filename.
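
As a rough illustration of how that command could be used in a script, the sketch below calls file --mime from Python and splits its output into a MIME type and a character set. The function names parse_file_mime and identify are made up for this example, and the parser assumes the usual one-line output of the form "name: type/subtype; charset=...".

```python
import shutil
import subprocess


def parse_file_mime(output):
    """Parse one line of `file --mime` output (assumed form:
    'notes.txt: text/plain; charset=us-ascii')."""
    # The filename comes first, separated from the description by ': '.
    name, _, description = output.strip().rpartition(": ")
    mime_type, _, rest = description.partition(";")
    charset = None
    for field in rest.split(";"):
        key, _, value = field.strip().partition("=")
        if key == "charset":
            charset = value
    return {"name": name, "mime_type": mime_type.strip(), "charset": charset}


def identify(path):
    """Run `file --mime` on path; return None if the command is not installed."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--mime", path], capture_output=True, text=True, check=True
    )
    return parse_file_mime(result.stdout)
```

Calling identify("report.pdf") would return a small dictionary describing the file, or None on systems where the file command is not available.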

Tuesday, May 26, 2015

The race to preserve disappearing data

The race to preserve disappearing data. Bina Venkataraman. Boston Globe. May 17, 2015.
"American feature films made before 1950 faced about 50/50 odds of surviving into this century. Many independent and documentary films from the first half of the 20th century, and about 80 percent of the mostly silent movies made in the 1910s and 1920s have been lost." The cloud isn’t yet robust enough for long-term archival of complex datasets and gigantic master movie files. Nor can it keep up with predicted demand.

Three out of four feature movies screened in American cinemas are independent films, yet most independent filmmakers have neither a plan nor a budget to preserve their films. At risk is the archival footage of the future. More than 60 percent of movie makers do not migrate their digital files to new formats. Rather than dealing with this, people want to pass the responsibility to others. This problem is not unique to the film industry but extends to all areas: photos, music, documents, and scientific research data. A crucial part of science is that researchers must be able to reproduce findings or correct them over time by re-evaluating the original data. Some of the datasets require records that span decades or longer.

A 2013 study found that nearly half of all Supreme Court decisions up to that date and more than 70 percent of law journals from 1999 to 2012 referred to Web pages that no longer existed. While the Library of Congress can preserve digital films if filmmakers share their unencrypted files, fewer than a dozen filmmakers and studios have done so, and the library has yet to preserve a single born-digital feature-length film. Some groups, however, are working on this, and digital technology can offer more automated ways to save the digital materials.

Monday, May 25, 2015

Helping Members of the Community Manage Their Digital Lives: Developing a Personal Digital Archiving Workshop

It is estimated that 93 percent of all new information is born digital. Because of the amount of digital material and the ease with which it can be deleted or lost, digital materials are very much at risk. Because libraries and librarians have the means and knowledge to care for digital materials, the responsibility to educate library users about personal digital archiving should fall on library shoulders. One way this can be done is by hosting a personal digital archiving workshop. Some questions to ask in creating such a workshop would be:
  • Who is the audience?
  • What kinds of digital materials do people have?
  • How do people store their digital belongings now?
  • What motivates them to maintain personal digital information?
  • What resources are available?
Some discussion points for the workshop:
  1. Identify what you have and where it is located.
  2. Decide what you want to keep.
  3. Organize what you have and how it can be identified.
  4. Save the digital materials and manage what you have saved.
It may also be worthwhile to discuss digital estate planning and designating a "digital executor", who should be identified in a legal will.

Friday, May 22, 2015

Saving the digital record

Saving the digital record. Kate Kondayen. Harvard Gazette. May 8, 2015.
Ensuring that digital content lives on past its initial platform is one of the most pressing issues in preservation science, and the task is increasingly urgent. Collections are coming to libraries with digital material that is already on the brink of decline. Unlike the decay of physical items, digital degradation doesn’t follow a steady curve; it may happen suddenly, and no one is sure just when that point is.

To retrieve content from an obsolete format, three components are needed: hardware, software, and a skilled technician. Many are turning to digital forensic tools. One open-source tool is XENA (Xml Electronic Normalising for Archives) which recognizes hundreds of old and unusual file formats and migrates them to current standard formats. "Even when the content is retrieved, the original media may need to be retained. Advancements now allow retrieval of content on formats that previously were written off as lost causes."

Thursday, May 21, 2015

Learning from failure: The case of the disappearing Web site

Learning from failure: The case of the disappearing Web site. Francine Barone, David Zeitlyn, Viktor Mayer-Schönberger. First Monday. 4 May 2015.
This paper presents the findings of the Gone Dark Project in 2014, which looked at web sites that disappear and are no longer accessible online. A study in 2012 showed that historically significant social media content decays at a rate of 11 percent within one year, and nearly 30 percent in two years (a rate of .02 percent of resources lost per day). That is compounded by link rot: a study of academic references showed that over 70 percent of URLs in academic journals and 50 percent of those in U.S. Supreme Court opinions are broken or no longer link to the original citation information. The principal concern of the study was with sites which contain substantial or significant content. To understand the wider landscape, the findings were categorized:
Main types of sites:
  1. Scientific and Academic: Databases, research tools and repositories ranging from the natural to social sciences and humanities. Losses of this type are commonly the result of the end of funding or institutional neglect.
  2. Political: Personal homepages of politicians, campaign pages, political speeches and/or repositories of once-public government files.
  3. Historical and Cultural: Includes collated collections of historical documents, genealogies or research portals, as well as more professionally run film, video or music archives.
  4. Specialized project pages or information aggregation sites
  5. Social media:
Main reasons for sites disappearing:
  1. Neglect: Intentional or unintentional neglect is probably the most common reason that a site disappears.
  2. Technical: Technical issues are usually bundled with some form of neglect or insufficient financial resources.
  3. Financial: Cost of site maintenance, staff costs, etc.
  4. Natural disaster: Computer hardware is susceptible to fires, floods, rioting and neglect (just as are paper files). 
  5. High-risk situations: Tumultuous political climates are a nightmare for data loss.  
  6. Competition between top Web companies leads to popular services that are abandoned, shut down or absorbed into a larger platform.
Some sites are going dark across the Web without being archived. "Restoring a single site is a challenging enough task, but when a collection of related Web sites goes dark, prevention strategies are much more difficult to specify (and quantify) as losses can potentially include an entire digital ecosystem of information". Selective archiving is best undertaken by those with firsthand knowledge of essential sites. Foresight and intuition for Web preservation are not always coupled with institutional or financial stability. Solutions to the problems of sites going dark will require more awareness from all parties involved. Keeping these disappearing resources requires working with content owners to find permanent homes for at-risk data.

Tuesday, May 19, 2015

Why You Really Need a Digital Asset Management Workflow

Why You Really Need a Digital Asset Management Workflow. Diane Haddad. Blog. May 11, 2015.
This post discusses the importance of managing a growing digital image collection. Set up a system that fits your needs and accomplishes the basic steps of digital photo management. The digital asset management workflow:
1. Capture images
2. Import images to a single working area
3. Rename image files from the device-generated names to follow a standard naming convention
4. Backup images
5. Add appropriate metadata
6. Archive content in a permanent, off-site location
Follow the established workflow. The goal is long-term archiving and access.
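
The renaming step in the workflow above could be sketched as follows. The naming convention used here (date taken, a short subject slug, and a four-digit sequence number) is an assumption made for illustration and is not taken from the article; the function name standard_name is also hypothetical.

```python
from datetime import date
from pathlib import PurePath


def standard_name(original, taken, subject, sequence):
    """Build a descriptive file name from a device-generated one.

    Assumed convention: YYYYMMDD_subject-words_NNNN.ext
    """
    # Keep the original extension, lowercased for consistency.
    ext = PurePath(original).suffix.lower()
    # Turn "Family Reunion" into "family-reunion".
    slug = "-".join(subject.lower().split())
    return f"{taken:%Y%m%d}_{slug}_{sequence:04d}{ext}"


new_name = standard_name("DSC_0042.JPG", date(2015, 5, 11), "Family Reunion", 7)
print(new_name)
```

The function only computes the new name; the actual rename is best done on the working copy after import, so the device original stays untouched until a backup exists.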

Monday, May 18, 2015

Are You Doing Enough to Prevent Link Rot?

Are You Doing Enough to Prevent Link Rot? Ernie Smith. Associations Now. May 12, 2015.
Internet web links disappear and make web access and archiving difficult. There are tools that can help, but what are owners of content doing to keep it in place? Research and academic institutions should be mindful of the technical needs of their underlying content, both old and new, as well as the effects that a redesign can have on a URL structure. Content preservation requires resources, time commitment, and prioritization. However, "the sooner you create a structure for your information that is flexible and lasting, the easier these changes will be to make down the road." Some tips:
  • Create a well-organized permalink structure that cites the basic subject matter and is self-explanatory.
  • Ensure that old links go to the right place.
  • Make old content web-friendly by converting it to a format that is easier to manage on the internet, such as converting documents to HTML or PDF formats.
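
One way to keep old links pointing at the right place after a redesign is a redirect table that maps retired URLs to their current permalinks. The sketch below is only an illustration: the paths are invented, and the function name resolve is hypothetical. It follows chains of redirects and refuses to loop forever.

```python
def resolve(path, redirects, max_hops=10):
    """Follow a redirect table until a path with no further redirect is reached."""
    seen = set()
    while path in redirects:
        # Guard against loops (a -> b -> a) and overly long chains.
        if path in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {path!r}")
        seen.add(path)
        path = redirects[path]
    return path


# Hypothetical example: two site redesigns, each of which moved the page.
redirects = {
    "/news.php?id=12": "/2014/annual-report",
    "/2014/annual-report": "/reports/2014-annual-report",
}
print(resolve("/news.php?id=12", redirects))
```

Checking every entry of such a table with resolve before a redesign goes live is a cheap way to catch links that would otherwise break.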

Friday, May 15, 2015

What Do We Mean by ‘Preserving Digital Information’? Towards Sound Conceptual Foundations for Digital Stewardship

What Do We Mean by ‘Preserving Digital Information’? Towards Sound Conceptual Foundations for Digital Stewardship. Simone Sacchi. Dissertation. University of Illinois at Urbana-Champaign. 2015. [PDF]
Preserving digital information is a fundamental concept in digital and data stewardship. This dissertation explains what successfully ‘preserving information’ really is, and provides a framework for understanding when and why failures might happen and how to avoid them. The lack of a formal analysis of digital preservation is problematic. Some notes and quotes from the dissertation:
  • "At a high level of generality, bit preservation means enabling the possibility for the same (set of) bit sequence(s) to be discriminated at different points in time, and, potentially, across changes in the underlying storage technology."
  • Bit level preservation is a means, not the goal, in digital stewardship.
  • As suggested by the OAIS definition of digital preservation, successful digital preservation is about “maintaining” or “preserving” information.
  • Preserving information appears to be a metaphorical expression where a complex set of requirements needs to be satisfied in order for an agent to be presented with intended information.
  • The best contemporary theories of digital preservation do not focus on the preservation of any sort of object, but rather on preserving access.
  • It is impossible to preserve a digital document as a physical object. One can only preserve the ability to reproduce the document.
  • “You cannot prove that you have preserved the object until you have re–created it in some form that is appropriate for human use or for computer system applications.”
  • “digital records are not stable artefacts”; they last only when certain circumstances are met.
  • Bit preservation is only the first required step for successful digital stewardship. Interpreting the bits such that an intended digital material obtains through appropriate performances is essential as well.
  • Successful digital preservation of information can be conceived as sustained and reliable communication mediated by digital technology and agents involved in the communication process.
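
In practice, checking that the same bit sequence is still present at a later point in time is commonly done with checksums. The sketch below is not from the dissertation; it is a minimal illustration using SHA-256, with hypothetical function names, of recording a digest once and comparing against it later.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bits_unchanged(path, recorded_digest):
    """True if the file's current digest matches the one recorded earlier."""
    return sha256_of(path) == recorded_digest
```

A matching digest only shows that the bits survived; as the notes above stress, whether those bits can still be interpreted and rendered is a separate question.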

Thursday, May 14, 2015

Binder

Binder. Artefactual Systems and the Museum of Modern Art. Github. May 2015.
Binder is an open source digital repository management application created to meet the digital preservation requirements of museum collections. Binder aims to facilitate digital collections care, management, and preservation for time-based media and born-digital artworks and is built by integrating functionality from the Archivematica and AtoM projects. The vision is that Binder will help cultural institutions meet the challenges of long-term preservation by providing access to the metadata needed to inform digital preservation policy and practice.

Currently, Binder is not ready for use in a production environment, and still requires further development for the code to function in a development environment.

Wednesday, May 13, 2015

Robert Darnton closes the book

Robert Darnton closes the book. Corydon Ireland. Harvard Gazette. May 11, 2015.
Article about Robert Darnton's retirement. Notes:
  • He and others discussed how to harness the Internet to create a digital library that would “get our cultural heritage available to everyone” for free, leading to the DPLA.
  • The goal of the free digital library “was a dream of philosophers of the Enlightenment. We can do what Jefferson only dreamed of. We have the Internet, and he only had the printing press.”
  • Of digital and print: "Both are complementary means of knowledge dispersal and both are thriving."
  • For libraries to prosper requires advancing on two fronts, analog and digital. “We must acquire everything important in all fields of scholarship” along with “electronic outputs of all kinds, partly in cooperation with other libraries.”
  • The future of libraries will require “being connected, and cooperating on a very large scale” regarding acquisition, preservation, and storage.
  • The library still pumps intellectual energy into every corner of campus.

Dataliths vs. the digital dark age

Dataliths vs. the digital dark age. Gary McGath. File Formats Blog. May 4, 2015.
Digital technology has allowed us to store more information at less cost than ever before, but in return this information is very fragile in the long term. The chances that your computer’s disk will be readable in a hundred years are poor. Information needs to be stored in a form that can survive long periods of neglect. We need dataliths; this strategy requires "a storage medium which is highly durable and relatively simple to read. It doesn’t have to push the highest edges of storage density. It should be the modern equivalent of the stone tablet, a datalith."

There are devices which tend in this direction: Millenniata, quartz glass data storage, and others. "Hopefully datalith writers will be available before too long, and after a few years they won’t be outrageously expensive. The records they create will be an important part of the long-term preservation of knowledge."

[M-Discs really are the only solution along these lines at present. They are long lived, inexpensive, easy to create and read, and created according to standards. -cle]

Tuesday, May 12, 2015

A Digital Preservation Environment Maturity Matrix for NSLA Libraries

A Digital Preservation Environment Maturity Matrix for NSLA Libraries. Sarah Slade, David Pearson, Libor Coufal. iPres Proceedings. October 2014.
The National and State Libraries of Australasia (NSLA) established a Digital Preservation Group to understand the state of digital preservation in the various libraries and to determine the core requirements for managing the preservation of digital collections. They listed and described the functional components of an ideal digital preservation environment and created a matrix of the current stage of development against each component for each NSLA library. Related projects by others include the National Digital Stewardship Alliance and BenchmarkDP.

NSLA Digital Preservation Environment Maturity Matrix
  • Underlying Assumptions
    • actively collecting digital material
    • committed to preserving its digital materials for the long term.
    • staff (or vendor) dedicated to the project
    • intends to comply with OAIS
  • Functional Component
An ideal digital preservation environment should contain a mix of policies, processes and resources (including staff and technologies). The OAIS model calls for organizations to:
  • Negotiate for and accept information from information producers.
  • Obtain sufficient control of the information for long-term preservation.
  • Determine the designated user community.
  • Ensure the information is independently understandable to the designated community without the need of special resources.
  • Follow documented preservation policies and procedures.
  • Make the information available to the designated community.
Instead of just listing the functions, they created a set of questions about the OAIS functionality in order to help respondents describe their organization's level of digital preservation maturity, and included the OAIS functions:
  • Pre-ingest Activities
  • Ingest
  • Archival Storage
  • Data Management
  • Administration
  • Digital Preservation Planning
  • Access
  • Maturity Model
The Group modified the Capability Maturity Model, using 5 levels:
  1. Initial. Processes are usually ad hoc. Achievement depends on the competence of the people in the organization and not on the use of proven processes. Products and services usually exceed budget and schedule.
  2. Repeatable. Basic digital preservation processes are established. Digital preservation achievements are repeatable, though not for all activities.
  3. Defined. Digital preservation activities are performed and managed according to documented plans. Processes for digital preservation are established and improved over time.
  4. Managed. Management can effectively control the digital preservation effort, using precise measurements. Quantitative quality goals are set for digital preservation processes.
  5. Optimising. Organisation focuses on continually improving process performance. The effects of deployed digital preservation process improvements are measured and evaluated against the quantitative process-improvement objectives.
"NSLA has identified digital preservation as an area of priority. The importance of this area to NSLA libraries is reflected in the creation of the Digital Preservation Group and its support of the Group’s work to date. The results from the Digital Preservation Environment Maturity Matrix reveal that NSLA libraries are on the right path but have some way to go before digital preservation processes are mature, sustainable and fit for purpose. Collaboration on policies, products and infrastructure will continue to address these needs."

Monday, May 11, 2015

Lessons learned in developing digital preservation tools the right way (and the wrong way).

Lessons learned in developing digital preservation tools the right way (and the wrong way). Paul Wheatley.  iPRES 2014 - Proceedings. October 2014.
There have been difficulties with creating digital preservation tools in the past, including poor technology choices, half measures in adopting open source applications, limited project funding, gaps in capabilities, poor support, and others. The wish list of tools from several years ago hasn't changed much. "Why can't we have digital preservation tools that just work?" But there have been new approaches which focus more on practical applications, sharing ideas with others and engaging with existing projects.

Lessons learned in developing digital preservation tools:
  • Be agile in the development. Develop, demo, and get feedback from users. Get crude results first, then perfect and polish later.
  • Re-use, don't re-invent. There are often tools outside of the preservation community. Try existing solutions first.
  • Keep it small and simple. Modularize in the face of growing requirements. Make it so the tool can be integrated into others' workflows.
  • Make it easy to use, build on, re-purpose, and maintain. Share the source and package for easy install.
  • Share results and knowledge; learn from others. Let others know what you are doing.

The poster includes:
  • Engage with the community
  • Build on existing work
  • Design for longevity
  • Ally with a longer-lived organization as a custodian

Saturday, May 09, 2015

Digital Preservation: Importance of Digitization

Digital Preservation: Importance of Digitization. Heisnam Haridas Mangangcha, Thanga Khunjem Leikai. Manipur Times. May 6, 2015.
Preservation is an umbrella term being applied to a wide variety of collection management responsibilities intended to preserve collections of print and non-print materials for future generations. The preservation of materials ensures that people's thoughts will be accessible to future generations. Digital preservation is the method of keeping digital materials ‘alive’ so that they remain usable as technological advances render hardware and software obsolete.

Friday, May 08, 2015

Digital Curation Tools & Techniques

Digital Curation Tools & Techniques. Nancy McGovern. Webinar: Metropolitan New York Library Council. April 2015.
This is a recording of Nancy's excellent webinar and it goes along with the management tools website. As organizations build sustainable digital preservation programs, they work toward being a trusted digital repository by following emerging digital preservation standards. There are now a number of tools to help with technical issues and some management tools to help with organizational issues. The Digital Preservation Management workshop launched a set of management tools and techniques.

This webinar looks at using a set of digital curation and preservation principles to describe a framework for developing a digital preservation program. The organizational aspects of digital preservation are more challenging than the technological issues. The framework helps people to work with tools and techniques, develop good practices, emphasize the organizational aspects and apply the 5 stages:
  1. Acknowledge: understanding that digital preservation is a local concern
  2. Act: initiating digital preservation projects
  3. Consolidate: segueing from project to basic program
  4. Institutionalize: rationalizing local efforts to establish a comprehensive program
  5. Externalize: embracing inter-institutional collaboration and dependency
Each tool and technique is built on the DPM model, the three-legged stool (Organizational Leg, Technological Leg, and Resources Framework) with the five stages of development for a sustainable digital preservation program. There is a tool or technique to help organizations in each of these areas:
  1. Principles: Adopt standards-based principles 
  2. Policy: Develop a high-level policy framework 
  3. Scope: Complete a digital content review to define program scope
  4. Workflow: Document workflows to improve and automate 
  5. Preparedness: Extend disaster preparedness to include digital
  6. Self-assessment: Engage in self-assessment to gauge progress

Thursday, May 07, 2015

Top 50 file formats in the KB e-Depot

Top 50 file formats in the KB e-Depot. Johan van der Knijff. KB Research. April 29, 2015.
Koninklijke Bibliotheek’s digital repository system doesn’t include any tools for automated file format identification yet, but an analysis of the “top 50” most prevalent file extensions shows:
  • The total number of unique extensions is no less than 1163
  • The most prevalent extension is .gif
  • A long tail of rare extensions exists (each with fewer than 10 file objects), which accounts for over half of all unique extensions.
The top 10 extensions are:
  1. gif
  2. xml
  3. jpg
  4. sml
  5. pdf
  6. raw
  7. tif
  8. oa3
  9. doc
  10. htm
They grouped all extensions into 12 format categories and checked that the PCs in the reading rooms could render the objects. The conclusion was that most formats in the “Top 50” were sufficiently accessible, though there were some improvements needed.
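
An extension survey of this kind can be approximated with a few lines of Python. The sketch below is not the KB's actual analysis; the function names are hypothetical, and it simply counts extensions from a list of file names and reports what share of the unique extensions falls in the long tail (fewer than 10 objects, the cut-off mentioned above).

```python
from collections import Counter
from pathlib import PurePath


def extension_counts(filenames):
    """Count file objects per lowercased extension ('' means no extension)."""
    return Counter(PurePath(name).suffix.lower().lstrip(".") for name in filenames)


def long_tail_share(counts, threshold=10):
    """Fraction of unique extensions that occur fewer than `threshold` times."""
    if not counts:
        return 0.0
    rare = sum(1 for n in counts.values() if n < threshold)
    return rare / len(counts)


# Tiny made-up sample; a real survey would walk the repository's file listing.
sample = ["a.gif", "b.GIF", "c.xml", "d.pdf"]
counts = extension_counts(sample)
print(counts.most_common(2))
print(long_tail_share(counts, threshold=2))
```

Extensions are only a rough proxy for format, which is why the post notes that automated format identification tools are still missing from the repository system.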

Quattro Pro for DOS: an obsolete format at last?

Quattro Pro for DOS: an obsolete format at last? Johan van der Knijff. Johan's Blog. October 29, 2014.
An account of a researcher at the KB looking at old Quattro Pro for DOS spreadsheets and trying to access them using modern-day software. The Quattro Pro software was released in 1988. Support for Quattro Pro spreadsheets was removed altogether from more recent versions of MS Excel. LibreOffice / OpenOffice support older versions, but not the older Quattro Pro for DOS. Apart from Quattro Pro, which seemed to have some difficulties with the spreadsheets, modern spreadsheet programs offer no support for Quattro Pro for DOS spreadsheets. "The most recent version of Quattro Pro still reads both DOS era formats, although there are some problems. Some of these are formatting-related."

There were some additional tests with the Quattro Pro files. In addition, it appears that Lotus 1-2-3 spreadsheets may also be more problematic than I initially thought.

Wednesday, May 06, 2015

Preparing the Workforce for Digital Curation

Preparing the Workforce for Digital Curation. The National Academies Press. 2015.
This 105-page report focuses on the need for digital curation education and training in order to provide meaningful use of digital information, now and in the future. [PDF version] This study defines digital curation as: “The active management and enhancement of digital information assets for current and future use.” Digital curation is more than preserving the digital information in secure storage because curation may add value to digital information and increase its utility.

Digital curation is similar to traditional curation. "Regardless of whether a collection is physical or digital, a curator must appraise its value and relevance to the community of potential users; determine the need for preservation; document provenance and authenticity; describe, register, and catalog its content; arrange for long-term storage and preservation; and provide a means for access and use." But it also has many new challenges: the quantities of material to be curated, the need for active and ongoing management, continually changing uses and technology, and the diversity of organizational contexts in which curation occurs. It is more than simply collecting and storing data and information. Active management denotes planned, systematic, coordinated, purposeful, and directed actions that make digital information fit for a purpose and ensure that it will remain discoverable, accessible, and useable for as long as users have a need and a right to use it.

A new pattern of data usage puts a greater emphasis on the standardization of digital curation practices so that the data can be shared more easily. Archiving digital data requires a more active management approach, and a more collaborative partnership between producers, archivists and users.

The Loss of Cultural Heritage Through Deterioration of Records and Technological Change: Sound recordings are a striking example of cultural heritage data at high risk of loss. These include music, oral histories, and radio broadcasts preserved in a wide variety of formats and media.

Some benefits of digital curation include:
  • Increased collaboration and cost sharing;
  • Greater use of data in teaching and research training;
  • New opportunities and uses for data, including data mining;
  • Creation of a more complete record of research;
  • Creation of new areas of research, new industries, or new support services.
Some principal conclusions:
  1. Significant opportunities exist to embed digital curation deeply into an organization’s practices to reduce costs and increase benefits. Digital curation will be increasingly in demand across many sectors of society.
  2. Digital curation can be advanced by various organizations that can serve as leaders, models, and sources of good curation practices, and build trust by preserving assets.
  3. Some barriers to digital curation include: lack of sharing of resources and insufficient resources.
  4. There is a need to identify, segregate, and measure the costs of curation tasks in scientific research and business processes.
  5. Standards and existing practices vary greatly, which can lead to a lack of coordination across different sectors. This in turn can lead to limited adoption of consistent standards for digital curation and fragmented dissemination of good practices.
  6. Automation of at least some digital curation tasks is desirable.
  7. The knowledge and skills required of those engaged in digital curation are dynamic and highly interdisciplinary.
Some recommendations include:
  1. Research communities, educational institutions, and others should work together to develop and adopt digital curation standards and good practices.
  2. Work to identify and predict the costs associated with digital curation.
  3. Organizations should identify, explain, and measure the benefits derived from digital curation.

Monday, May 04, 2015

A Biological Perspective on Digital Preservation

A Biological Perspective on Digital Preservation. Michael J. Pocklington, et al. iPres Conference Proceedings. October, 2014.
Successful preservation of digital objects requires a solid theoretical framework, which treats the objects as containers of information, exactly as in the genomes of organisms. This poster looks at the similarities of biological and digital ecosystems. In both cases, functional information is identifiable in principle by the consequence of actions. Interaction maps are dependency networks of objects within their environment. The significant environment information in the digital ecosystem relates to the object, resource usage, software dependencies, and other digital objects, which all relate to the usability and survivability of the objects. The poster looks at an application of the theoretical background and includes first results from a case-study on a software-based art preservation scenario.

Friday, May 01, 2015

Digitization and the Preservation of Knowledge

Digitization and the Preservation of Knowledge. The Media Preservation Initiative at Indiana University Bloomington. October 10, 2013.
Indiana University President Michael McRobbie, in announcing the Media Digitization and Preservation Initiative, said in an address: “For over 25 centuries, the great universities of the world have always had three fundamental missions:
  • the creation of knowledge (that is, research and innovation),
  • the dissemination of knowledge (that is, education and learning), and
  • the preservation of knowledge.
We tend, these days, to mainly associate the first two of these missions with a university.”

The advent of the digital age is giving new importance to the third mission, "the preservation of knowledge". Previously, the preservation of knowledge was the almost exclusive mission of libraries and museums. Digital material is vital to fully realizing the promise of online education. The digital collections are a large investment over many years. The digital collections will continue to grow, as will the scholarly dialog concerning them, which "defines the character, values and heritage of an institution." Preserving the collections in perpetuity maximizes the value of all these collections to the university and the community. The transformation of the third mission of universities from the physical to the virtual world of digitization is both essential and irreversible.

Also announced was an initiative that would draw all digitization efforts together into a true university-wide strategy. The goal of this Digitization Master Plan is to digitize and store in some form all the existing collections of lasting importance to research and scholarship, and to ensure the preservation of all new research and scholarship that is born digital. This will support research, education and the preservation of knowledge.