Thursday, May 28, 2015

File identification tools

File identification tools. Gary McGath. File Formats Blog.
Knowing the version and subtype of a file can be important. And knowing the subtypes can indicate whether or not a file is suitable for archiving.

In Linux or Unix the file command can be used to check the file, its identifiers, character encoding, and possibly the language of a text file.  The basic syntax is file --mime filename.
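
As a rough illustration of how that output might be used in a script, the sketch below wraps the command from Python. The helper names parse_mime_line and identify are invented for this example, and the output layout it assumes ("name: type/subtype; charset=...") is what common versions of file print; other versions may differ.

```python
import subprocess


def parse_mime_line(line):
    """Split one line of `file --mime` output into name, MIME type and charset."""
    name, _, rest = line.strip().rpartition(": ")
    mime_type, _, params = rest.partition(";")
    charset = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "charset":
            charset = value
    return {"name": name, "mime_type": mime_type.strip(), "charset": charset}


def identify(path):
    """Run `file --mime` on one path (assumes the file utility is installed)."""
    result = subprocess.run(
        ["file", "--mime", path], capture_output=True, text=True, check=True
    )
    return parse_mime_line(result.stdout)
```

Calling identify("report.pdf") on a system with the file utility would return a small dictionary that a script could log or compare against a list of formats accepted for archiving.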

Tuesday, May 26, 2015

The race to preserve disappearing data

The race to preserve disappearing data. Bina Venkataraman. Boston Globe. May 17, 2015.
"American feature films made before 1950 faced about 50/50 odds of surviving into this century. Many independent and documentary films from the first half of the 20th century, and about 80 percent of the mostly silent movies made in the 1910s and 1920s have been lost." The cloud isn’t yet robust enough for long-term archival of complex datasets and gigantic master movie files. Nor can it keep up with predicted demand.

Three out of four feature movies screened in American cinemas are independent films, yet most independent filmmakers have neither a plan nor a budget to preserve their films. At risk is the archival footage of the future. More than 60 percent of movie makers do not migrate their digital files to new formats. Rather than dealing with this, people want to pass the responsibility to others. This problem is not unique to the film industry; it extends to photos, music, documents, and scientific research data. A crucial part of science is that researchers must be able to reproduce findings or correct them over time by re-evaluating the original data. Some of the datasets require records that span decades or longer.

A 2013 study found that nearly half of all Supreme Court decisions up to that date and more than 70 percent of law journals from 1999 to 2012 referred to Web pages that no longer existed. While the Library of Congress can preserve digital films if filmmakers share their unencrypted files, fewer than a dozen filmmakers and studios have done so, and the library has yet to preserve a single born-digital feature-length film. Some groups, however, are working on this, and digital technology can offer more automated ways to save the digital materials.

Monday, May 25, 2015

Helping Members of the Community Manage Their Digital Lives: Developing a Personal Digital Archiving Workshop

It is estimated that  93 percent of all new information is born digital. Because of the amount of digital material and the ease with which it can be deleted or lost, digital materials are very much at risk. Because libraries and librarians have the means and knowledge to care for digital materials, the responsibility to educate library users about personal digital archiving should fall on library shoulders. One way this can be done is by hosting a personal digital archiving workshop. Some questions to ask in creating such a workshop would be:
  • Who is the audience?
  • What kinds of digital materials do people have? 
  • How do people store their digital belongings now?
  • What motivates them to maintain personal digital information?
  • What resources are available?
Some discussion points for the workshop:
  1. Identify what you have and where it is located.
  2. Decide what you want to keep.
  3. Organize what you have and how it can be identified.
  4. Save the digital materials and manage what you have saved.
It may also be worthwhile to discuss digital estate planning and the designation of a "digital executor," who should be identified in a legal will.
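
For workshop participants who are comfortable with a little scripting, the first discussion point (identify what you have and where it is located) could be demonstrated with a short inventory script. This is only a sketch: the function names are invented for this example, and it assumes the materials sit in ordinary folders on a local disk.

```python
import os
from collections import Counter


def inventory(root):
    """Return a sorted list of (path, size_in_bytes) for every file under root."""
    items = []
    for folder, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(folder, filename)
            items.append((path, os.path.getsize(path)))
    return sorted(items)


def summarize_by_extension(items):
    """Count how many files of each extension the inventory contains."""
    counts = Counter()
    for path, _size in items:
        extension = os.path.splitext(path)[1].lower() or "(none)"
        counts[extension] += 1
    return counts
```

Running inventory on a photos or documents folder gives a list that can be saved as a starting point for deciding what to keep.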

Friday, May 22, 2015

Saving the digital record

Saving the digital record. Kate Kondayen. Harvard Gazette. May 8, 2015.
Ensuring that digital content lives on past its initial platform is one of the most pressing issues in preservation science. This is an increasingly urgent task. Collections are increasingly coming to libraries with digital material that is already on the brink of decline. Digital degradation doesn’t follow a steady curve like physical items; it may happen suddenly and no one is sure just when that point is.

To retrieve content from an obsolete format, three components are needed: hardware, software, and a skilled technician. Many are turning to digital forensic tools. One open-source tool is XENA (XML Electronic Normalising for Archives), which recognizes hundreds of old and unusual file formats and migrates them to current standard formats. "Even when the content is retrieved, the original media may need to be retained. Advancements now allow retrieval of content on formats that previously were written off as lost causes."

Thursday, May 21, 2015

Learning from failure: The case of the disappearing Web site

Learning from failure: The case of the disappearing Web site. Francine Barone, David Zeitlyn, Viktor Mayer-Schönberger. First Monday. 4 May 2015.
This paper presents the findings of the Gone Dark Project in 2014, which looks at Web sites that disappear and are no longer accessible online. A study in 2012 showed that historically significant social media content decays at a rate of 11 percent within one year, and nearly 30 percent in two years (a rate of .02 percent of resources lost per day). That is compounded by link rot: studies of references showed that over 70 percent of URLs in academic journals and 50 percent of those found in U.S. Supreme Court opinions are broken or no longer link to the originally cited information. The principal concern of the study was with sites that contain substantial or significant content. To understand the wider landscape, the findings were categorized:
Main types of sites:
  1. Scientific and Academic: Databases, research tools and repositories ranging from the natural to social sciences and humanities. Losses of this type are commonly the result of the end of funding or institutional neglect.
  2. Political: Personal homepages of politicians, campaign pages, political speeches and/or repositories of once-public government files.
  3. Historical and Cultural: Includes collated collections of historical documents, genealogies or research portals, as well as more professionally run film, video or music archives.
  4. Specialized project pages or information aggregation sites
  5. Social media:
Main reasons for sites disappearing:
  1. Neglect: Intentional or unintentional neglect is probably the most common reason that a site disappears.
  2. Technical: Technical issues are usually bundled with some form of neglect or insufficient financial resources.
  3. Financial: Cost of site maintenance, staff costs, etc.
  4. Natural disaster: Computer hardware is susceptible to fires, floods, rioting and neglect (just as are paper files). 
  5. High-risk situations: Tumultuous political climates are a nightmare for data loss.  
  6. Competition between top Web companies leads to popular services that are abandoned, shut down or absorbed into a larger platform.
Some sites are going dark across the Web without being archived. "Restoring a single site is a challenging enough task, but when a collection of related Web sites goes dark, prevention strategies are much more difficult to specify (and quantify) as losses can potentially include an entire digital ecosystem of information". Selective archiving is best undertaken by those with firsthand knowledge of essential sites. Foresight and intuition for Web preservation are not always coupled with institutional or financial stability. Solutions to the problems of sites going dark will require more awareness from all parties involved. Keeping these disappearing resources requires working with content owners to find permanent homes for at-risk data.

Tuesday, May 19, 2015

Why You Really Need a Digital Asset Management Workflow

Why You Really Need a Digital Asset Management Workflow. Diane Haddad. Blog. May 11, 2015.
The post explains the importance of managing your growing digital image collection. Set up a system that fits your needs and accomplishes the basic steps of Digital Photo Management. Digital Asset Management Workflow:
1. Capture images
2. Import images to a single working area
3. Rename image files from the device-generated names to follow a standard naming convention
4. Backup images
5. Add appropriate metadata
6. Archive content in a permanent, off-site location
Follow the established workflow. The goal is long-term archiving and access.
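
The renaming step is the easiest one to automate. The sketch below plans new names without touching any files. The convention it uses (a prefix, the capture date, and a running number) is an assumption made for this example, not one recommended in the post, and the function names are invented.

```python
from datetime import date


def archival_name(capture_date, sequence, extension, prefix="img"):
    """Build a name such as img_20150511_0001.jpg from its parts."""
    ext = extension.lower().lstrip(".")
    return f"{prefix}_{capture_date:%Y%m%d}_{sequence:04d}.{ext}"


def plan_renames(files):
    """Map device-generated names to new names.

    files is a list of (device_name, capture_date) pairs. Names are numbered
    in order of capture date, then device name.
    """
    plan = {}
    ordered = sorted(files, key=lambda item: (item[1], item[0]))
    for sequence, (name, capture_date) in enumerate(ordered, start=1):
        extension = name.rsplit(".", 1)[-1]
        plan[name] = archival_name(capture_date, sequence, extension)
    return plan
```

Reviewing the planned mapping before renaming anything keeps the step reversible, which matters because the backup step comes later in the workflow.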

Monday, May 18, 2015

Are You Doing Enough to Prevent Link Rot?

Are You Doing Enough to Prevent Link Rot?  Ernie Smith.  Associations Now. May 12, 2015.
Internet web links disappear and make web access and archiving difficult.  There are tools that can help, such as, but what are owners of content doing to keep it in place? Research and academic institutions should be mindful of the technical needs of their underlying content, both old and new, as well as the effects that a redesign can have on a URL structure.  Content preservation requires resources, time commitment, and prioritization. However, "the sooner you create a structure for your information that is flexible and lasting, the easier these changes will be to make down the road." Some tips:
  • Create a well-organized permalink structure that cites the basic subject matter and is self-explanatory
  • Ensure that old links go to the right place.
  • Make old content web-friendly by converting it to a format that is easier to manage on the internet, such as converting documents to HTML or PDF formats.
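
The second tip, ensuring that old links go to the right place, usually comes down to keeping a table of old and new addresses. The sketch below is one minimal way to look up such a table; the function name and the example paths in the usage note are hypothetical and are not taken from the article.

```python
def resolve(path, redirects, max_hops=10):
    """Follow old-to-new mappings until a path with no redirect is reached."""
    seen = {path}
    hops = 0
    while path in redirects:
        if hops == max_hops:
            raise ValueError("too many redirects")
        path = redirects[path]
        hops += 1
        if path in seen:
            # A loop would leave visitors bouncing between dead addresses.
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
    return path
```

With a table such as {"/news.php?id=12": "/articles/12", "/articles/12": "/articles/link-rot"}, an old address from before two redesigns still resolves to the current page, and a loop introduced by mistake is reported rather than followed forever.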

Friday, May 15, 2015

What Do We Mean by ‘Preserving Digital Information’? Towards Sound Conceptual Foundations for Digital Stewardship

What Do We Mean by ‘Preserving Digital Information’? Towards Sound Conceptual Foundations for Digital Stewardship. Simone Sacchi. Dissertation, University of Illinois at Urbana-Champaign. 2015. [PDF]
Preserving digital information is a fundamental concept in digital and data stewardship. This dissertation explains what successfully ‘preserving information’ really is, and provides a framework for understanding when and why failures might happen and how to avoid them. The lack of a formal analysis of digital preservation is problematic. Some notes and quotes from the dissertation:
  • "At a high level of generality, bit preservation means enabling the possibility for the same (set of) bit sequence(s) to be discriminated at different points in time, and, potentially, across changes in the underlying storage technology."
  • Bit level preservation is a means, not the goal, in digital stewardship.
  • As suggested by the OAIS definition of digital preservation, successful digital preservation is about “maintaining” or “preserving” information.
  • Preserving information appears to be a metaphorical expression where a complex set of requirements needs to be satisfied in order for an agent to be presented with intended information
  • The best contemporary theories of digital preservation do not focus on the preservation of any sort of object, but rather on preserving access.
  • It is impossible to preserve a digital document as a physical object. One can only preserve the ability to reproduce the document.
  • "You cannot prove that you have preserved the object until you have re–created it in some form that is appropriate for human use or for computer system applications.”
  • “digital records are not stable artefacts”; they last only when certain circumstances are met
  • Bit preservation is only the first required step for successful digital stewardship. Interpreting the bits such that an intended digital material obtains through appropriate performances is essential as well.
  • Successful digital preservation of information can be conceived as sustained and reliable communication mediated by digital technology and agents involved in the communication process.
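
In practice, checking that the same bit sequence is still present at a later point in time is commonly done by comparing checksums. The dissertation does not prescribe a method, so the sketch below is only an illustration of that common practice; the function names are invented, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bits(path, recorded_digest):
    """True if the file's current digest matches the one recorded earlier."""
    return sha256_of(path) == recorded_digest
```

A matching digest only shows that the bits survived. As the notes above stress, the bits must still be interpreted correctly for the information to be preserved.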

Thursday, May 14, 2015

Binder

Binder. Artefactual Systems and the Museum of Modern Art. Github. May 2015.
Binder is an open source digital repository management application created to meet the digital preservation requirements of museum collections. Binder aims to facilitate digital collections care, management, and preservation for time-based media and born-digital artworks and is built by integrating functionality from the Archivematica and AtoM projects. The vision is that Binder will help cultural institutions meet the challenges of long-term preservation by providing access to the metadata needed to inform digital preservation policy and practice.

Currently, Binder is not ready for use in a production environment, and still requires further development for the code to function in a development environment.

Wednesday, May 13, 2015

Robert Darnton closes the book

Robert Darnton closes the book. Corydon Ireland. Harvard Gazette. May 11, 2015.
Article about his retirement. Notes:
  • He and others discussed how to harness the Internet to create a digital library that would “get our cultural heritage available to everyone” for free, leading to the DPLA
  • The goal of the free digital library “was a dream of philosophers of the Enlightenment. We can do what Jefferson only dreamed of. We have the Internet, and he only had the printing press.”
  • Of digital and print: "Both are complementary means of knowledge dispersal and both are thriving."
  • For libraries to prosper requires advancing on two fronts, analog and digital. “We must acquire everything important in all fields of scholarship" along with “electronic outputs of all kinds, partly in cooperation with other libraries.”
  • The future of libraries will require “being connected, and cooperating on a very large scale” regarding acquisition, preservation, and storage.
  • The library still pumps intellectual energy into every corner of campus.

Dataliths vs. the digital dark age

Dataliths vs. the digital dark age. Gary McGath. File Formats Blog. May 4, 2015.
Digital technology has allowed us to store more information at less cost than ever before, but in return this information is very fragile in the long term. The chances that your computer’s disk will be readable in a hundred years are poor. Information needs to be stored in a form that can survive long periods of neglect. We need dataliths; this strategy requires "a storage medium which is highly durable and relatively simple to read. It doesn’t have to push the highest edges of storage density. It should be the modern equivalent of the stone tablet, a datalith."

There are devices which tend in this direction: Millenniata, quartz glass data storage, and others. "Hopefully datalith writers will be available before too long, and after a few years they won’t be outrageously expensive. The records they create will be an important part of the long-term preservation of knowledge."

[M-Discs really are the only solution along these lines at present. They are long lived, inexpensive, easy to create and read, and created according to standards. -cle]

Tuesday, May 12, 2015

A Digital Preservation Environment Maturity Matrix for NSLA Libraries

A Digital Preservation Environment Maturity Matrix for NSLA Libraries. Sarah Slade, David Pearson, Libor Coufal. iPres Proceedings. October 2014.
The National and State Libraries of Australasia (NSLA) established a Digital Preservation Group to understand the state of digital preservation in the various libraries and to determine the core requirements for managing the preservation of digital collections. They listed and described the functional components of an ideal digital preservation environment and created a matrix of the current stage of development against each component for each NSLA library. Related projects by others include the National Digital Stewardship Alliance and BenchmarkDP.

NSLA Digital Preservation Environment Maturity Matrix
  • Underlying Assumptions
    • actively collecting digital material
    • committed to preserving its digital materials for the long term.
    • staff (or vendor) dedicated to the project
    • intends to comply with OAIS
  • Functional Component
An ideal digital preservation environment should contain a mix of policies, processes and resources (including staff and technologies). The OAIS model calls for organizations to:
  • Negotiate for and accept information from information producers.
  • Obtain sufficient control of the information for long-term preservation.
  • Determine the designated user community.
  • Ensure the information is independently understandable to the designated community without the need of special resources.
  • Follow documented preservation policies and procedures.
  • Make the information available to the designated community.
Instead of just listing the functions, they created a set of questions about the OAIS functionality in order to help responders describe their organization's level of digital preservation maturity, and included the OAIS functions:
  • Pre-ingest Activities
  • Ingest
  • Archival Storage
  • Data Management
  • Administration
  • Digital Preservation Planning
  • Access
  • Maturity Model
The Group modified the Capability Maturity Model, using 5 levels:
  1. Initial. Processes are usually ad hoc. Achievement depends on the competence of the people in the organization and not on the use of proven processes. Products and services usually exceed budget and schedule.
  2. Repeatable. Basic digital preservation processes are established. Digital preservation achievements are repeatable, though not for all activities.
  3. Defined. Digital preservation activities are performed and managed according to documented plans. Processes for digital preservation are established and improved over time.
  4. Managed. Management can effectively control the digital preservation effort, using precise measurements. Quantitative quality goals are set for digital preservation processes.
  5. Optimising. Organisation focuses on continually improving process performance. The effects of deployed digital preservation process improvements are measured and evaluated against the quantitative process-improvement objectives.   
"NSLA has identified digital preservation as an area of priority. The importance of this area to NSLA libraries is reflected in the creation of the Digital Preservation Group and its support of the Group’s work to date. The results from the Digital Preservation Environment Maturity Matrix reveal that NSLA libraries are on the right path but have some way to go before digital preservation processes are mature, sustainable and fit for purpose. Collaboration on policies, products and infrastructure will continue to address these needs."
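
One row of such a matrix could be represented as a simple mapping from functional component to maturity level. The sketch below uses the component and level names listed above, but the scores are invented for illustration and the function name is hypothetical; the paper's actual matrix is not reproduced here.

```python
LEVELS = {1: "Initial", 2: "Repeatable", 3: "Defined", 4: "Managed", 5: "Optimising"}


def weakest_components(scores):
    """Return (level name, components) for the lowest level in one library's row."""
    lowest = min(scores.values())
    names = sorted(name for name, level in scores.items() if level == lowest)
    return LEVELS[lowest], names


# Hypothetical self-assessment for one library; the numbers are invented.
example_library = {
    "Pre-ingest Activities": 2,
    "Ingest": 3,
    "Archival Storage": 3,
    "Data Management": 2,
    "Administration": 3,
    "Digital Preservation Planning": 1,
    "Access": 4,
}
```

For this invented row, weakest_components(example_library) points to Digital Preservation Planning at the Initial level, which is the kind of gap the matrix is meant to surface.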

Monday, May 11, 2015

Lessons learned in developing digital preservation tools the right way (and the wrong way).

Lessons learned in developing digital preservation tools the right way (and the wrong way). Paul Wheatley.  iPRES 2014 - Proceedings. October 2014.
There have been difficulties with creating digital preservation tools in the past, including poor technology choices, half measures in adopting open source applications, limited project funding, gaps in capabilities, poor support, and others. The wish list of tools from several years ago hasn't changed much. "Why can't we have digital preservation tools that just work?" But there have been new approaches which focus more on practical applications, sharing ideas with others and engaging with existing projects.

Lessons learned in developing digital preservation tools:
  • Be agile in the development. Develop, demo, and get feedback from users. Get crude results first, then perfect and polish later.
  • Re-use, don't re-invent. There are often tools outside of the preservation community. Try existing solutions first.
  • Keep it small and simple. Modularize in the face of growing requirements. Make it so the tool can be integrated into others' workflows.
  • Make it easy to use, build on, re-purpose, and maintain. Share the source and package for easy install.
  • Share results and knowledge; learn from others. Let others know what you are doing.

The poster includes:
  • Engage with the community
  • Build on existing work
  • Design for longevity
  • Ally with a longer-lived organization as a custodian

Saturday, May 09, 2015

Digital Preservation: Importance of Digitization

Digital Preservation: Importance of Digitization. Heisnam Haridas Mangangcha, Thanga Khunjem Leikai. Manipur Times. May 6, 2015.
Preservation is an umbrella term being applied to a wide variety of collection management responsibilities intended to preserve collections of print and non-print materials for future generations. The preservation of materials ensures that people's thoughts will be accessible to future generations. Digital preservation is the method of keeping digital materials ‘alive’ so that they remain usable as technological advances render hardware and software obsolete.

Friday, May 08, 2015

Digital Curation Tools & Techniques

Digital Curation Tools & Techniques. Nancy McGovern.  Webinar: Metropolitan New York Library Council. April 2015.
This is a recording of Nancy's excellent webinar and it goes along with the management tools website.  As organizations build sustainable digital preservation programs, they work toward being a trusted digital repository by following emerging digital preservation standards. There are now a number of tools to help with technical issues and now some management tools to help with organizational issues. The Digital Preservation Management workshop launched a set of management tools and techniques.

This webinar looks at using a set of digital curation and preservation principles to describe a framework for developing a digital preservation program. The organizational aspects of digital preservation are more challenging than the technological issues. This helps people to work with tools and techniques, develop good practices, emphasize the organizational aspects and apply the 5 stages:
  1. Acknowledge: understanding that digital preservation is a local concern
  2. Act: initiating digital preservation projects
  3. Consolidate: segueing from project to basic program
  4. Institutionalize: rationalizing local efforts to establish a comprehensive program
  5. Externalize: embracing inter-institutional collaboration and dependency
Each tool and technique is built on the DPM model, the three-legged stool (Organizational Leg, Technological Leg, and Resources Framework) with the five stages of development for a sustainable digital preservation program. There is a tool or technique to help organizations in each of these areas:
  1. Principles: Adopt standards-based principles 
  2. Policy: Develop a high-level policy framework 
  3. Scope: Complete a digital content review to define program scope
  4. Workflow: Document workflows to improve and automate 
  5. Preparedness: Extend disaster preparedness to include digital
  6. Self-assessment: Engage in self-assessment to gauge progress

Thursday, May 07, 2015

Top 50 file formats in the KB e-Depot

Top 50 file formats in the KB e-Depot. Johan van der Knijff. KB Research. April 29, 2015.
Koninklijke Bibliotheek’s digital repository system doesn’t include any tools for automated file format identification yet, but an analysis of the “top 50” most prevalent file extensions shows:
  • The total number of unique extensions is no less than 1163
  • The most prevalent extension is .gif
  • A long tail of extensions exists (each with fewer than 10 file objects), which accounts for over half of all unique extensions.
The top 10 extensions are:
  1. gif
  2. xml
  3. jpg
  4. sml
  5. pdf
  6. raw
  7. tif
  8. oa3
  9. doc
  10. htm
They grouped all extensions into 12 format categories and checked that the PCs in the reading rooms could render the objects. The conclusion was that most formats in the “Top 50” were sufficiently accessible, though there were some improvements needed.
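
An extension count of this kind is straightforward to reproduce on any list of file names. The sketch below is not the KB's own analysis code; the function names are invented, and the threshold of 10 simply mirrors the long-tail cutoff mentioned above.

```python
import os
from collections import Counter


def extension_counts(filenames):
    """Count files per lower-cased extension; files without one go under '(none)'."""
    counts = Counter()
    for name in filenames:
        extension = os.path.splitext(name)[1].lower().lstrip(".")
        counts[extension or "(none)"] += 1
    return counts


def long_tail(counts, threshold=10):
    """Return the extensions that occur fewer than `threshold` times."""
    return sorted(ext for ext, n in counts.items() if n < threshold)
```

counts.most_common(50) then gives a "top 50" list. Extensions are only a hint at the real format, which is why the post notes the absence of automated format identification.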

Quattro Pro for DOS: an obsolete format at last?

Quattro Pro for DOS: an obsolete format at last? Johan van der Knijff. Johan's Blog. October 29, 2014.
An account of a researcher at the KB looking at old Quattro Pro for DOS spreadsheets and trying to access them using modern-day software. The Quattro Pro software was released in 1988. Support for Quattro Pro spreadsheets was removed altogether from more recent versions of MS Excel. LibreOffice / OpenOffice support older versions, but not the older Quattro Pro for DOS. Apart from Quattro Pro, which seemed to have some difficulties with the spreadsheets, modern spreadsheet programs offer no support for Quattro Pro for DOS spreadsheets. "The most recent version of Quattro Pro still reads both DOS era formats, although there are some problems. Some of these are formatting-related."

There were some additional tests with the Quattro Pro files. In addition, it appears that Lotus 1-2-3 spreadsheets may also be more problematic than I initially thought.

Wednesday, May 06, 2015

Preparing the Workforce for Digital Curation

Preparing the Workforce for Digital Curation. The National Academies Press. 2015.  
This 105-page report focuses on the need for digital curation education and training in order to provide meaningful use of digital information, now and in the future. [PDF version] This study defines digital curation as: “The active management and enhancement of digital information assets for current and future use.” Digital curation is more than preserving the digital information in secure storage because curation may add value to digital information and increase its utility.

Digital curation is similar to traditional curation. "Regardless of whether a collection is physical or digital, a curator must appraise its value and relevance to the community of potential users; determine the need for preservation; document provenance and authenticity; describe, register, and catalog its content; arrange for long-term storage and preservation; and provide a means for access and use." But it also has many new challenges: the quantities of material to be curated, the need for active and ongoing management, continually changing uses and technology, and the diversity of organizational contexts in which curation occurs. It is more than simply collecting and storing data and information. Active management denotes planned, systematic, coordinated, purposeful, and directed actions that make digital information fit for a purpose and ensure that it will remain discoverable, accessible, and useable for as long as users have a need and a right to use it.

A new pattern of data usage puts a greater emphasis on the standardization of digital curation practices so that the data can be shared more easily.  Archiving digital data requires a more active management approach, and a more collaborative partnership between producers, archivists and users.

The Loss of Cultural Heritage Through Deterioration of Records and Technological Change: Sound recordings are a striking example of cultural heritage data at high risk of loss. These include music, oral histories, and radio broadcasts preserved in a wide variety of formats and media.

Some benefits of digital curation include:
  • Increased collaboration and cost sharing;
  • Greater use of data in teaching and research training;
  • New opportunities and uses for data, including data mining;
  • Creation of a more complete record of research;
  • Creation of new areas of research, new industries, or new support services.
Some principal conclusions:
  1. Significant opportunities exist to embed digital curation deeply into an organization’s practices to reduce costs and increase benefits. Digital curation will be increasingly in demand across many sectors of society.
  2. Digital curation can be advanced by various organizations that can serve as leaders, models, and sources of good curation practices, and build trust by preserving assets.
  3. Some barriers to digital curation include: lack of sharing of resources and insufficient resources.
  4. There is a need to identify, segregate, and measure the costs of curation tasks in scientific research and business processes.
  5. Standards and existing practices vary greatly, which can lead to a lack of coordination across different sectors. This in turn can lead to limited adoption of consistent standards for digital curation and fragmented dissemination of good practices.
  6. Automation of at least some digital curation tasks is desirable
  7. The knowledge and skills required of those engaged in digital curation are dynamic and highly interdisciplinary.
Some recommendations include:
  1. Research communities, educational institutions, and others should work together to develop and adopt digital curation standards and good practices.
  2. Work to identify and predict the costs associated with digital curation.
  3. Organizations should identify, explain, and measure the benefits derived from digital curation

Monday, May 04, 2015

A Biological Perspective on Digital Preservation

A Biological Perspective on Digital Preservation. Michael J. Pocklington, et al. iPres Conference Proceedings. October, 2014.
Successful preservation of digital objects requires a solid theoretical framework, which treats the objects as containers of information, exactly as in the genomes of organisms. This looks at the similarities of biological and digital ecosystems. In both cases, functional information is identifiable in principle by the consequence of actions. Interaction maps are dependency networks of objects within their environment. The significant environment information in the digital ecosystem relates to the object, resource usage, software dependencies, and other digital objects, which all relate to the usability and survivability of the objects. The poster looks at an application of the theoretical background and includes first results from a case-study on a software-based art preservation scenario.

Friday, May 01, 2015

Digitization and the Preservation of Knowledge

Digitization and the Preservation of Knowledge. The Media Preservation Initiative at Indiana University Bloomington. October 10, 2013.
Indiana University President Michael McRobbie, in announcing the Media Digitization and Preservation Initiative, said in an address: “For over 25 centuries, the great universities of the world have always had three fundamental missions:
  • the creation of knowledge (that is, research and innovation),
  • the dissemination of knowledge (that is, education and learning), and
  • the preservation of knowledge.
We tend, these days, to mainly associate the first two of these missions with a university."

The advent of the digital age is giving importance to the third mission, "the preservation of knowledge". Previously, the preservation of knowledge was the almost exclusive mission of libraries and museums. Digital material is vital to fully realizing the promise of online education. The digital collections are a large investment over many years. The digital collections will continue to grow, as will the scholarly dialog concerning them, which "defines the character, values and heritage of an institution." Preserving the collections in perpetuity maximizes the value of all these collections to the university and the community. The transformation of the third mission of universities from the physical to the virtual world of digitization is both essential and irreversible.

Also announced was an initiative that would draw all digitization efforts together into a true university-wide strategy. The goal of this Digitization Master Plan is to digitize and store in some form all the existing collections of lasting importance to research and scholarship, and to ensure the preservation of all new research and scholarship that is born digital. This will support research, education and the preservation of knowledge.

Thursday, April 30, 2015

Why Media Preservation Can’t Wait: the Gathering Storm

Why Media Preservation Can’t Wait: the Gathering Storm. Mike Casey. International Association of Sound and Audiovisual Archives, Journal. January 2015. [Slide presentation]

Media preservation has reached a crisis point for content on physical audio and video formats. Archival media collections could soon be considered highly endangered. The US National Recording Preservation Board: “It is alarming to realize that nearly all recorded sound is in peril of disappearing or becoming inaccessible within a few generations.” There is a major risk that obsolescence will defeat the efforts of archivists. What is the problem?
  • Large numbers of analog and physical digital recordings
  • Recordings are degrading, some catastrophically
    • For some formats degradation issues are critical
    • Degradation of physical recordings must be addressed before digitization
  • Obsolete audio and video formats
    • All analog and physical digital recordings are now obsolete
    • Playback systems are failing; parts are lacking, and repairs are becoming more difficult.
    • Without functioning systems, digitizing existing recordings is not possible
    • Evolution of obsolescence:
      • End of manufacturing
      • End of availability in the commercial marketplace
      • End of bench technician expertise
      • End of bench technician tools
      • End of calibration and alignment tapes
      • End of parts and supplies
      • End of availability in the used marketplace
      • End of playback expertise
  • There is a relatively short time window to save these recordings
  • The recordings contain content with high research value
The combination of degradation and obsolescence severely undermines preservation efforts. It may still be possible in a few years to digitize audio and video, but digitizing large holdings then may not be affordable. While not every recording is an appropriate candidate for long-term preservation, many recordings and collections do carry significant value. If these items are to survive they must be digitally preserved “within the next 15 to 20 years - before sound carrier degradation and the challenges of acquiring and maintaining playback equipment make the success of these efforts too expensive or unattainable.” Some institutions are digitizing their recordings now, realizing that they cannot afford to wait until planning is completed or everything is perfectly in place to begin work.

Wednesday, April 29, 2015

Legal Aspects for Digital Preservation Domain

Legal Aspects for Digital Preservation Domain. Barbara Kolany-Raiser, Marzieh Bakhshandeh, José Borbinha, Silviya Yankova. iPres Proceedings. 2014. 
This paper proposes a legal model for the digital preservation domain. This is intended to include different perspectives and facilitate the translation and mapping of legal information in the digital preservation area. A legal perspective is important for technology developments, and when copyright protected data has to be preserved digitally, care must be taken so that the digital preservation system processes do not violate this right. The rights holder for the data must explicitly grant use and preservation rights. "Every digital preservation activity must ensure the authenticity and legitimacy of the performed actions and processes." The paper recommends integrating legal perspectives into the digital preservation process, and it includes a conceptual map of this legal perspective describing the concepts and the relationships.

Tuesday, April 28, 2015

Database Preservation Toolkit

Database Preservation Toolkit. Website. April 2015.
The Database Preservation Toolkit uses input and output modules and allows conversion between database formats, including connection to live systems. It allows conversion of live or backed-up databases into preservation formats such as DBML, SIARD, or XML-based formats created for the purpose of database preservation.

This toolkit was part of the RODA project and now has been released as a separate project. The site includes download links and related publications and presentations.
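
As a rough illustration of the input-module and output-module idea, the sketch below reads every table from a SQLite database and writes it into a generic XML tree, using only the Python standard library. It does not use the Database Preservation Toolkit itself and does not produce DBML or SIARD; the function name `export_sqlite_to_xml` and the element names (`database`, `table`, `row`, `column`) are hypothetical and chosen only for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_sqlite_to_xml(conn: sqlite3.Connection) -> ET.Element:
    """Read all tables (input side) and build an XML tree (output side).

    Element names are illustrative only; this is not DBML or SIARD.
    """
    root = ET.Element("database")
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        table_el = ET.SubElement(root, "table", name=table_name)
        # Quote the identifier so unusual table names still work.
        quoted = '"' + table_name.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            row_el = ET.SubElement(table_el, "row")
            for column_name, value in zip(columns, row):
                col_el = ET.SubElement(row_el, "column", name=column_name)
                if value is None:
                    # Record NULL explicitly so it is not confused with "".
                    col_el.set("null", "true")
                else:
                    col_el.text = str(value)
    return root


if __name__ == "__main__":
    # Invented demo data, held in memory only.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    demo.execute("INSERT INTO items (title) VALUES ('Reel 1')")
    demo.execute("INSERT INTO items (title) VALUES (NULL)")
    print(ET.tostring(export_sqlite_to_xml(demo), encoding="unicode"))
```

Keeping the reading and writing steps separate is what lets a real tool swap in a different source database or a different preservation format without touching the other side.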

Preserving digital records and databases

Preserving digital records and databases. Luis Faria. PASIG Presentation. March 13, 2015.
Presentation on tools and models for database preservation. Diagram of import and export flow using db-preservation-toolkit, as well as their model and the OAIS model. Throughput of row-intensive databases is 10,000 rows/s. Use the SIARD format for preservation. The SIARD-E version is under development.

Monday, April 27, 2015

Finnish Digital Preservation Service for Cultural Heritage

Finnish Digital Preservation Service for Cultural Heritage. Mikko Tiainen. PASIG Presentation. March 12, 2015. [PDF]
Preservation aspects and focus of the Digital Preservation Service:
  • Semantic preservation:
    • Content knowledge and semantics
    • Descriptive metadata
  • Logical preservation
    • Preservation planning
    • Administrative & technical metadata
    • File formats
    • Preservation actions
  •  Bit-level preservation
    • Materials & replication management
    • Storage device
    • Storage media
They estimate the life cycle of the various components as:
  • Hardware
    • Hard disk storage: 5 years
    • Tape drives & media types: 5 years
    • Tape libraries: 10 years
  • Software
    • Commercial support for at least 5 years
    • Open source maintained and developed until replaced 
Development of a high quality digital preservation system is a continuous process. 
  • Bit-level preservation
  • Preservation planning and development of preservation actions
  • Preserving the intelligibility
  • Distributed locations
This requires support services, maintenance of specifications, and ongoing management.

Chronopolis and DuraCloud: Doing integration right

Chronopolis and DuraCloud: Doing integration right. Bill Branan, David Minor. PASIG Presentation. March 12, 2015. [PDF]
DuraCloud is a hosted digital preservation service. Chronopolis is a digital preservation storage network spanning multiple institutions and geographic regions, based on active preservation (constant checking of items). The reasons for integrating the services and becoming a DPN node:
  • Digital content preservation is important to the future of society
  • All preserved digital content should be handled equally
  • Need an economically viable option to support the preservation needs of all institutions, regardless of size or technical capability
  • Need to simplify the preservation process as much as possible
These are two very independent existing systems with different workflows and processes. DuraCloud works with real-time data, and Chronopolis works with well defined data collections. Sometimes the best way to integrate two systems is to not require either system to know anything about the other.

Saturday, April 25, 2015

MoMA’s Digital Art Vault

MoMA’s Digital Art Vault. Ben Fino-Radin. The Museum of Modern Art. April 14, 2015.
The museum is working to preserve and digitize its collection of 4,000 videotapes of analog video art. It has created a digital vault which consists of:
  1. the packager: analyzes all digital collections materials as they arrive; records the results in an obsolescence-proof text AIP stored with the materials themselves, and generates a checksum.
  2. the warehouse: a digital storage RAID system maintained by their IT department. This type of disk-based storage becomes "an untenable expense" with very large amounts of data. They project 1.2PB. It would be irresponsibly expensive to continue using hard drive storage, as it was not quite intended for this scale of data.
  3. the indexer, which is not discussed at present.
There is the issue of how to ensure that our successors will understand what a given stream of bits is supposed to represent, and there is also the problem of authenticity.  These archival packages contain the digital content as well as the information needed in the future to understand what the materials are and to confirm their authenticity.
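
The packager step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MoMA's actual packager: the function names `sha256_of` and `package`, the sidecar file name suffix `.metadata.txt`, and the recorded fields are all hypothetical. It examines a file, generates a SHA-256 checksum, and writes the results as plain text stored next to the file.

```python
import hashlib
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large video masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package(path: Path) -> Path:
    """Write a plain-text description next to the file and return its path."""
    # The type is guessed from the file name only; a real packager would
    # inspect the content itself.
    guessed_type, _ = mimetypes.guess_type(path.name)
    record = "\n".join([
        f"filename: {path.name}",
        f"size_bytes: {path.stat().st_size}",
        f"guessed_type: {guessed_type or 'unknown'}",
        f"sha256: {sha256_of(path)}",
        f"packaged_utc: {datetime.now(timezone.utc).isoformat()}",
    ]) + "\n"
    sidecar = path.with_name(path.name + ".metadata.txt")
    sidecar.write_text(record, encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Invented demo file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "tape001.txt"
        sample.write_bytes(b"example content")
        print(package(sample).read_text(encoding="utf-8"))
```

Plain text is used for the record because it can be read without any special software, which is the point of an "obsolescence-proof" description.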

Friday, April 24, 2015

PREFORMA Starts Prototyping Phase

PREFORMA Starts Prototyping Phase. OPF Blog. 22 April 2015.
The PREFORMA prototyping phase has started with three groups that will work on:
  1. the compliance checker for the PDF/A standard for documents; 
  2. the TIFF standard for digital still images; and
  3. a set of open source standards for moving images
This phase will last until December 2016. It is important that libraries and archives understand what is in the digital objects they are preserving.  These tools will increase the knowledge about these formats.

Format Migrations at Harvard Library: An NDSR Project Update

Format Migrations at Harvard Library: An NDSR Project Update. Joey Heinen. The Signal. April 17, 2015.
Digital material is just as susceptible to obsolescence as analog formats. There are different strategies that can be implemented, and they are developing a format migration framework for migration projects at Harvard. The viability of this framework will be tested by migrating three obsolete formats within the Digital Repository Service: Kodak PhotoCD, SMIL playlists and RealAudio. A first step is to determine the stakeholders and responsible parties, since a digital preservation project cannot begin without knowing the stakeholders.

A diagram of the Migration Workflow shows each step of the process from gathering documentation for initial analysis to ingest of the migrated content into the repository. A Migration Pathway diagram shows how content will be transformed by a migration. They hope that analyzing the technical and infrastructural challenges of each format and putting this into a template that can be adapted will help the digital preservation field.

Thursday, April 23, 2015

WebPreserver - Collect Web & Social Media as Legally Admissible Evidence

WebPreserver - Collect Web & Social Media as Legally Admissible Evidence. Website. Apr 08, 2015.
The software package, WebPreserver, uses a Chrome web browser plugin and a web-based platform to make it easy to collect authenticated snapshots of websites, blogs and social media accounts such as Facebook, Twitter and Google+. The screen captures, source code and metadata are authenticated with a 256-bit digital signature and time stamp to comply with the Federal Rules of Evidence and other regulatory requirements. Users can organize, tag, collaborate, search, print on demand or download the files as PDFs or WARC files.

Wednesday, April 22, 2015

Because digital preservation won’t just go away

Jisc Archivematica project update ...because digital preservation won’t just go away. Jenny Mitcham. Digital Archiving at the University of York blog. 17 April 2015.
The project is investigating research data management, recognizing that the tools used by digital archivists could have much to offer those who are charged with managing research data. The research will be based around the following questions:
  1. Why are we bothering to 'preserve' research data? What are the drivers here and what are the risks if we don't?
  2. What are the characteristics of research data?
  3. How might research data differ from other born digital data that institutions are archiving and preserving?
  4. What types of files are researchers producing?
  5. How would we incorporate the system into a wider technical infrastructure for research data management and what workflows would we put in place?
Previously they had conducted a survey that looked specifically at software packages used by researchers. They are investigating how existing digital preservation tools would handle these types of data and whether these formats appear in PRONOM.

University of Sheffield Selects Ex Libris Rosetta

University of Sheffield Selects Ex Libris Rosetta. Press Release. April 21, 2015.
The University has selected the Rosetta digital asset management and preservation solution to "ensure the sustainable preservation of the digitised objects created by the University Library’s Special Collections Department and the National Fairground Archive—a unique collection of videos, texts, audio files, and pictures about the culture and history of travelling fairs and entertainment—as well as the large collection of University’s born-digital material which includes scholarly research data, administrative records, and departmental publications."

“In line with the University’s strategy to establish an enduring digital archive, Rosetta will enable us to develop a sustainable digital preservation programme, underpinned by a full lifecycle infrastructure for the management and preservation of digital objects.”

Saturday, April 18, 2015

Digital Curation and Doctoral Research: Current Practice

Digital Curation and Doctoral Research: Current Practice. Daisy Abbott. International Journal of Digital Curation. 10 February 2015.[PDF]
More doctoral students are engaging in research data creation, processing, use, management, and preservation activities (digital curation) than ever before. Digital curation is an intrinsic part of the skills that students are expected to acquire.

Training in research skills and techniques is the key element in the development of a research student. The integration of digital curation into expected research skills is essential. Doctoral supervisors "should discuss and review research data management annually, addressing issues of the capture, management, integrity, confidentiality, security, selection, preservation and disposal, commercialization, costs, sharing and publication of research data and the production of descriptive metadata to aid discovery and re-use when relevant." Those supervisors may not necessarily have those skills themselves. And there is a gap in the literature about why and how to manage, curate, and preserve digital data as part of a PhD program.

While both doctoral students and supervisors can benefit from traditional resources on the topic, the majority of guidance on digital curation takes the form of online resources and training programs. In a survey,
  • over 50% of PhD holders consider long-term preservation to be extremely important. 
  • under 40% of students consider long-term preservation to be extremely important.
  • 90% of doctoral students and supervisors consider digital curation to be moderately to extremely important. 
  • Yet 74% of respondents stated that they had limited or no skills in digital curation and only 10% stated that they were “fairly skilled” or “expert”. 
And generally researchers were not aware of the digital curation support services that are available. The relatively recent emphasis on digital curation in research, both its nature and its processes, presents problems for supervisors. Developing the appropriate skills and knowledge to create, access, use, manage, store and preserve data should therefore be considered an important part of any researcher’s development. Efforts should be taken to
  • Ensure practical digital curation is understood
  • Encourage responsibility for digital curation activities in institutional support structures
  • Increase the discoverability and availability of digital curation support services

Tech talk in the archives: how can we redefine our processes & priorities in the digital age?

Tech talk in the archives: how can we redefine our processes and priorities in the digital age? Erin O’Meara. PASIG Presentation. March 12, 2015. [PDF]
Tech talk for archives usually revolves around workflow, data clean up and translation, infrastructure needs, and digital content pathways. Traditional archives have a physical focus, work is long term and ongoing, and priorities are established based on perceived use. For digital archives, the focus is on the digital objects, which tends to be more of a project based approach that needs dependency analysis before work is done. Prioritization is based on a grid of preservation needs, use and access, and an impact on the larger repository.

Get to know your complete digital holdings (servers as well as boxes of disks); the infrastructure, your capabilities for acquiring, processing and preserving digital holdings; staff and their training needs; and the gaps in all of this. Your processes need to change from "as-is" to "to-be".
  • Gain momentum and resources "building the ship while flying"
  • Get a business analysis and outsider perspective
  • Manage processes and know when to change
  • Continue building, and allow the archive to grow and mature
  • Allow time to review and reflect
  • Clarify and integrate roles between archive and tech staff

In determining your directions:
  • State the larger goal, then document it
  • Break it down into executable chunks
  • Engage and educate stakeholders and leadership
  • Talk to colleagues outside your workplace
  • Encourage curiosity and tinkering
  • Give your team a break and recognition

Friday, April 17, 2015

Trustworthiness of Preservation Systems

Trustworthiness of Preservation Systems. David Minor. PASIG Presentation. March 11, 2015. [PDF]
We all want to trust systems, especially preservation systems. Trust is an iterative process to verify and clarify. The principles of trust include:
  •  Institutional commitment to collections
  •  Infrastructure demands
  •  Technical system and staffing capabilities
  •  Sustainability (particularly funding, technology, collaboration)
  •  Identify and communicate risks to content, examining “what if” questions

There are three levels of auditing:
  •  "Basic certification" is a simple self assessment
  •  "Extended certification" represents a plausibility checked assessment
  •  "Formal certification" is an audit driven by external experts

Major auditing frameworks include:
  •  Data Seal of Approval (Basic)
  •  nestor (Extended)
  •  TRAC/ISO 16363 (Formal)
  •  DRAMBORA (Range)

A risk-based self-assessment such as DRAMBORA works through these steps:
  1.  Identify organizational context
  2.  Document policy and regulatory framework
  3.  Identify activities, assets, and their owners
  4.  Identify risks
  5.  Assess risks
  6.  Manage risks
In the future, we need to know how these audit frameworks apply to distributed digital preservation environments, and how flexible the questions and the audit models are.
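
The last three steps (identify, assess, and manage risks) are often tracked in a simple risk register. The sketch below is one possible way to do that, not a method taken from the presentation or from any of the audit frameworks: the `Risk` class, the 1-to-6 scales, and the probability-times-impact score are assumptions made for illustration, and the sample risks are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    name: str
    owner: str
    probability: int  # assumed scale: 1 (rare) to 6 (very likely)
    impact: int       # assumed scale: 1 (negligible) to 6 (catastrophic)

    @property
    def severity(self) -> int:
        """Assess a risk as probability times impact (an assumed convention)."""
        return self.probability * self.impact


def prioritize(risks: List[Risk]) -> List[Risk]:
    """Order risks so the most severe are managed first."""
    return sorted(risks, key=lambda risk: risk.severity, reverse=True)


if __name__ == "__main__":
    # Invented example entries.
    register = [
        Risk("Loss of funding", "Director", probability=2, impact=6),
        Risk("Storage media failure", "IT", probability=4, impact=4),
        Risk("Format obsolescence", "Archivist", probability=3, impact=3),
    ]
    for risk in prioritize(register):
        print(f"{risk.severity:>2}  {risk.name} (owner: {risk.owner})")
```

Recording an owner with each risk ties the register back to step 3, where activities, assets, and their owners are identified.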

Thursday, April 16, 2015

Sony and Memnon announce partnership to enhance digital preservation capabilities

Sony and Memnon announce partnership to enhance digital preservation capabilities. Press release. April 13, 2015.
The partnership is offering their technology and experience in delivering large-scale digital preservation projects involving audio, video and film content. Some of the existing customers include Danish Radio, the British Library, Bibliothèque Nationale de France, Indiana University, BBC Worldwide and Sony Pictures. The need for large-scale digital preservation in organizations has accelerated due to the continuous physical deterioration of media carriers and the need for interoperability, lower storage costs and a stable long-term digital storage format. The research suggests that only 21% of broadcasters have completed digitisation of their tape libraries, and the others hold on average more than 100,000 legacy tapes. “As a result, many content owners have assets that are literally depreciating, yet simultaneously have increased opportunities for reusing and monetising their digital content, once it is made readily accessible.”

“The time to tackle this challenge is undoubtedly now, but any successful digital preservation project is reliant on proven technological and operational expertise. We believe that large-scale digitisation is a distinct discipline that requires industrial processes and methodologies for high efficiency and consistent quality.”

Wednesday, April 15, 2015

Tracking Digital Collections at the Library of Congress, from Donor to Repository

Tracking Digital Collections at the Library of Congress, from Donor to Repository. Mike Ashenfelder. The Signal, Library of Congress. April 13, 2015.
An interesting look at the processing of content by the Library of Congress specialists.
When a collection is first received the contents are reviewed and if digital media devices are found, they are transferred to the digital collections registrar, who then records that the materials were received, including the collection name, collection number, a registration number and any additional notes. The following tasks are performed:
  1. Physical inventory of the storage devices (and photograph of the medium)
  2. Write-protecting, documenting, and transferring the files using the BagIt tool, which produces a bag made up of:
    1. a directory containing the file or files (data)
    2. a checksummed manifest of the files in the bag
    3. a “bagit.txt” file
  3. The content is cataloged, described, and inventoried. 
  4. Transfer of the files to the Library’s digital repository for long-term preservation.
If there are difficulties accessing the content, other tools can be used, such as the Forensic Recovery of Evidence Device (FRED), the Forensic Toolkit, or BitCurator. The final step is to shelve the original digital hardware and software for preservation.
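
The three parts of a bag listed above (a data directory, a checksummed manifest, and a `bagit.txt` file) can be produced by hand, as in the following sketch. It is not the Library's BagIt tool and makes several assumptions: the function name `make_bag` is hypothetical, SHA-256 is chosen as the checksum algorithm, the `bagit.txt` declaration follows version 1.0 of the BagIt specification (RFC 8493), and source files are assumed to have distinct names so they can be copied flat into `data/`.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def make_bag(source_files: Iterable[Path], bag_dir: Path) -> None:
    """Copy files into bag_dir/data and write the manifest and bagit.txt."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for src in sorted(source_files):
        dest = data_dir / src.name
        shutil.copy2(src, dest)
        # Whole-file read keeps the sketch short; chunked hashing would be
        # better for large files.
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{src.name}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )


if __name__ == "__main__":
    # Invented demo files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "letter.txt"
        source.write_text("Dear donor,\n", encoding="utf-8")
        bag = Path(tmp) / "bag"
        make_bag([source], bag)
        print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Because the manifest travels with the files, the receiving repository can recompute the checksums after transfer and confirm that nothing changed in transit.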

Researchers visiting the Library of Congress can access copies of some of the digital collections but access depends on copyright and the conditions established by the collection donor. There are also technological challenges to serving up records.  Access is currently available only onsite. Also, the Library does not have the software or drives to read every file format. Not all researchers require a perfect rendering of the original file. A lot of researchers "are just interested in the information. They don’t care what the file format is. They want the information.”  For the Library, access and appraisal of digital collections is an ongoing issue.