Monday, January 28, 2019

Introduction to Digital Preservation: What is Digital Preservation?

Introduction to Digital Preservation: What is Digital Preservation? Bodleian Libraries, Oxford LibGuides. Aug 28, 2018.    
     Digital preservation at Bodleian Libraries is defined as: "The formal activity of ensuring access to digital information for as long as necessary. It requires policies, planning, resource allocation (funds, time, people) and appropriate technologies and actions to ensure accessibility, accurate rendering and authenticity of digital objects. A “lifecycle management” approach to digital preservation is taken, where action is done at regular intervals and future activity is planned. This includes policies and recommendations for appraising and selecting digital information to preserve, acknowledging resources are finite."

There are two different kinds of digital preservation:
  1.  Bit-level Preservation: a "very basic level of preservation of the digital object as it was submitted (literally preserving the bits forming a digital object)." It is a beginning step to the more complete set of digital preservation practices and processes that ensure the survival and usability of digital materials over time.
  2. Logical Preservation: The part of preservation management that ensures the continued usability of content by ensuring the existence of a usable manifestation of the digital object. Sometimes referred to as format preservation or active preservation, it includes:
  • Understanding what digital materials are in the repository.
  • Identifying threats to the materials and planning actions to be taken for at-risk digital materials
  • Putting things into action 
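
Bit-level preservation is often checked in practice with fixity information: a checksum recorded when the object is submitted and recomputed later. The sketch below illustrates that general technique only; it is not taken from the Bodleian guide, and the file name and contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bits_unchanged(path: Path, recorded_digest: str) -> bool:
    """True if the file's current digest matches the one recorded at ingest."""
    return sha256_of(path) == recorded_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "object.bin"  # hypothetical submitted object
        sample.write_bytes(b"digital object as submitted")
        recorded = sha256_of(sample)  # digest stored at ingest time
        print("after ingest:", bits_unchanged(sample, recorded))
        sample.write_bytes(b"digital object as submittee")  # simulate bit damage
        print("after damage:", bits_unchanged(sample, recorded))
```

A matching digest only shows that the bits are intact. Whether the content is still usable is the concern of logical preservation.
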
Defining other terms:
  • "Digital curation involves maintaining, preserving and adding value to digital files throughout their lifecycle—not just at the end of their active lives. This active management of digital files reduces threats to their long-term value and mitigates the risk of digital obsolescence. Digital curation includes digital preservation, but the term adds the curatorial aspects of: selection, appraisal and ongoing enhancement of the object for reuse." It is commonly used in the sciences and social sciences for research data and is often being replaced with research data management, especially when referring to active digital files.
  • Digital archiving is often used interchangeably with digital preservation in archives. It has two main definitions used by computing personnel and archivists and librarians respectively. Recognize both definitions of the term and be aware of the audiences that use this term differently.
    1. The process of storage, backup and ongoing maintenance as opposed to strategies for long-term digital preservation
    2. The long-term storage, preservation and access to information that is "born digital" or for which the digital version is considered to be the primary archive
  • Digital Stewardship, more commonly used in the US, "combines both curation and preservation—the active life of a digital asset and its continual preservation afterwards for long-term use. But this school of thought splits digital curation & digital preservation into two separate categories and then uses digital stewardship as the umbrella term."
The Bodleian Libraries consider digital preservation to be a "holistic term that includes aspects of digital curation and stewardship". They work with creators to organise and manage their digital objects for preservation, and to follow best practice for creating and managing active files so that these will be easier to manage and provide access to in the long term.



Wednesday, December 19, 2018

Nothing succeeds like success: An approach for evaluating digital preservation efficacy

Nothing succeeds like success: An approach for evaluating digital preservation efficacy. Stephen Abrams. Paper, iPres 2018. [PDF]
     Digital preservation encompasses the theory and practice ensuring purposeful future use of digital resources. But how can one tell whether it has been effective or not? The evaluation of the effectiveness of preservation actions has two dimensions: trustworthiness of managerial programs and systems; and successful use of managed resources.
  • Preservation should be viewed as facilitating meaningful communication across time and cultural distance.
  • The preservation field has not yet matured to a point of having established metrics for evaluating the success or failure of its outcomes
  • We should be asking what measures can be used to evaluate the success of the digital preservation efforts
  • A proper model should treat digital preservation as human communication rather than data management, and should evaluate success through operational, not just descriptive, evidence.
  • The goal of that communication is to transfer an intangible but intentional unit of meaning from the producer to a consumer across temporal, technical, and cultural distance
  • Like any formal discipline, digital preservation should be viewed as a complex of actors, policies, technologies, and practices; its maturity is dependent on its capacity for reflective self-evaluation
  • There are two primary measures of preservation efficacy: trustworthiness of managerial systems and programs; and successful use of preserved resources.
  • Because of the open-ended time horizon of preservation commitments, preservation success should be understood as a provisional, rather than absolute value. One can’t make categorical assertions beyond the ever-forward-moving point of now, since the consequences of the future cannot be fully anticipated
  • A model of the digital preservation enterprise provides a way to analyze, explicate, and understand the domain. It can lead to new criteria and metrics for evaluating success. It also will form the basis for rational prioritization of strategic goals, allocation of programmatic resources, and transparent accountability to stakeholder communities.

Friday, December 14, 2018

In-House Digitization with the Lossless FFV1 Codec At the University of Notre Dame Archives. AMIA Poster

In-House Digitization with the Lossless FFV1 Codec At the University of Notre Dame Archives. Erik Dix and Angela Fritz, University of Notre Dame Archives. AMIA 2018. Poster. [pptx].
     An interesting poster at AMIA which shows their digitization workflow and processing steps from accessioning to preservation system.
WHY FFV1 as a codec for Digital Preservation Masters?
1. Lossless compression (no quality loss)
2. A Standard Definition FFV1 file is ca. 46 % of the size of the uncompressed file.
    A High Definition FFV1 file is ca. 57 % of the size of the uncompressed file.
3. FFV1 is part of the FFmpeg project and open source
4. It is safe for long term preservation.
5. Encoding into FFV1 can be done with low cost Windows PCs.
6. The video is captured in FFV1 in real time.
7. Standard definition FFV1 files can be played with the VLC media player
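
The size ratios in point 2 can be turned into a rough storage estimate. This is a minimal sketch that uses only the two percentages quoted on the poster; the 100 GB input is a hypothetical uncompressed size, and actual ratios will vary with the content.

```python
# Ratios reported on the poster: FFV1 size as a fraction of uncompressed size.
FFV1_RATIO = {"SD": 0.46, "HD": 0.57}


def estimate_ffv1_size(uncompressed_gb: float, definition: str) -> float:
    """Estimate the FFV1 file size in GB from the uncompressed size."""
    return uncompressed_gb * FFV1_RATIO[definition]


if __name__ == "__main__":
    # 100 GB is a hypothetical uncompressed size, not a figure from the poster.
    for definition in ("SD", "HD"):
        estimate = estimate_ffv1_size(100.0, definition)
        print(f"{definition}: about {estimate:.0f} GB")
```
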

Digitization Workflow

Accessioning as Processing:
  • Archives conducts a preliminary inventory, assigns collection code,  creates CMS record 
  • AV materials transferred to AV Archivist for a preservation and digitization assessment
  • Descriptive and technical metadata gathered
  • Analog materials reorganized and stabilized for long-term storage.
Basic Metadata Creation:
  • AV Archivist creates item–level metadata
  • Descriptive and technical metadata promotes access and discoverability
  • Descriptive metadata added to finding aid and uploaded to the Archives, the IR, and ILS
Inspection & Prep of A-V materials:
  • Only requested AV items or at-risk items will be digitized 
  • Videotapes often require baking or splicing
VCRs without SDI output:
  • The digitization capture card uses SDI [Serial Digital Interface] connections. 
  • VHS, Betamax, and older professional formats, e.g. 1” Type C and U-matic, don’t have SDI outputs. 
  • A DPS-575 frame synchronizer is used to create an SDI signal from the S-video or output of these items
  • Basic color correction is done at this step if necessary. 
  • The SDI signal from the frame synchronizer is then split in two to feed a Windows PC for the creation of the FFV1 preservation file and to feed a Mac computer to create an Apple ProRes 422 mezzanine file.
VCRs with SDI output:
  • They have VCRs for the DV tape family from Mini DV up to DVCPro HD, DVCam, and HDV, as well as the Betacam tape family from Betacam to HDCAM that can output an SDI or HD-SDI signal. 
  • The signal is also split in two to simultaneously create an FFV1 file on a Windows PC and an Apple ProRes 422 file on a Mac.
Digital Preservation System:
  • The Archives uses an LTO tape library to store its digitized files. 
  • Currently, the Archives is evaluating digital preservation systems for implementation. 
  • The Archives' capabilities will be expanded to provide digital preservation micro-services to ensure continued access to its digital collections.


Thursday, December 13, 2018

Why Is the Digital Preservation Network Disbanding? Lessons from organizational challenges

Why Is the Digital Preservation Network Disbanding? Roger C. Schonfeld. The Scholarly Kitchen; Society for Scholarly Publishing. Dec 13, 2018.
     "The long term stewardship of digital objects and collections through digital preservation is an essential imperative for scholarship and society. Yet its value is intangible and its rewards are deferred. It falls on organizations to invest in preservation, often less out of a sense of anticipated exclusive returns and more out of a sense of contributing to a community mission." It is essential that we discuss the lessons we can learn from organizational challenges.

DPN was a commitment to replicate the data of research and scholarship across diverse environments and to enable existing preservation capacity. It offered an elegant technical solution but the product offering was never as clear as it could have been, and ultimately could not be sustained. Most DPN members did not use the network services and membership declined. Some patterns emerged: 
  • Not every storage need requires a preservation solution, and the members were "in some cases, unsuccessful in distinguishing the added value of a preservation solution from cloud storage."
  • Many library systems were not originally prepared to support DPN’s ingest workflow. For a number of members, the content to be preserved was spread across servers and systems, often with limited curatorial control. 
  • The product definition took too long to emerge and the value proposition was not uniformly understood.
  • DPN’s pricing did not generate the revenue that its model anticipated. 
  • Some libraries signed up more out of courtesy or community citizenship than commitment.
  • Membership models are ill-suited to product organizations and marketplace competition.  
There are broader implications in the disbandment of DPN. The article states that DPN will not be the last closure, merger, or other reorganization. "It seems clear that we are in a period of instability for collaborative library community efforts and more major changes are surely on the horizon."

Wednesday, December 12, 2018

Preservation of AV Materials in Manuscript Collections. Training for AV format identification and risk assessment. Actions to take


Preservation of AV Materials in Manuscript Collections; Internal Training.  Ben Harry. Brigham Young University. November 2018.
     The presentation is not yet available on the internet. Some notes from the training:
“There is now consensus among audiovisual archives internationally that we will not be able to support large-scale digitisation of magnetic media in the very near future. Tape that is not digitised by 2025 will in most cases be lost.”  -NFSA.gov.au, Oct. 2018

The problem with AV is Fragility:
  • Playback equipment is disappearing
  • Knowledgeable experts are disappearing
  • Materials breaking down
  • Untrained handling easily destroys materials
The solution to the fragility is to address materials in a timely manner:
  • Priority and Speed and Efficiency
  • Train transfer operators
A Challenge of AV is Neglect:
  • Unable to describe AV Content adequately in finding aids or catalogs. 
  • Requires certain level of specific knowledge of formats and physical carriers.
  • Requires machine to read information that may not be available
  • Time-consuming process for little reward
  • Expensive, unstructured, uncoordinated
To overcome the challenge:
  • Digitize material for description in basic processing
  • MUST be a lean process to minimize the effect upon processing
Audio-video preservation requires a certain level of specific knowledge. Staff must be trained to recognize and report AV formats. It is also important to have risk assessment guidelines to help make informed decisions. Coordinate efforts and resources to reduce confusion, prioritize and set goals, and unify proposals for equipment and staffing.


Actions to take:
  • Prioritize Formats for Migration / Reformatting
  • Maintain Transparent Records on Preservation and Access
  • Link Preservation and Access (one does not happen without the other)
  • Provide Curators with AV Assessment tools
  • Organize a Queue System to keep things equitable (what about 12 items per month, per curator? Adjust as Necessary)
  • Create Digital File Naming guidelines
  • Establish Access and Preservation format standards for AV materials:
 For Access and Preservation, the following standards will be used:

Audio Preservation
  • Preservation Format: PCM / wav, 96 kHz sampling, 24-bit depth. 1 GB/hour
  • Access Copy: mp3. Music: 256 kbps. Voice: 192 kbps.
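
The 1 GB/hour figure for the audio preservation format can be checked with simple arithmetic on the PCM parameters. This is a sketch, not part of the training notes: the channel count is an assumption, since the notes do not state it. One channel works out to roughly 1.04 GB per hour and two channels to roughly 2.07 GB per hour, so the quoted figure is consistent with a single channel.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM payload size for one hour of audio, in bytes."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600


if __name__ == "__main__":
    # Channel count is an assumption; the notes give only rate and bit depth.
    for channels in (1, 2):
        size_gb = pcm_bytes_per_hour(96_000, 24, channels) / 1e9
        print(f"{channels} channel(s): {size_gb:.2f} GB/hour")
```
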

Video Preservation: Standard Def
  • Preservation Format: ffv1 / mkv 720 x 486. 33 GB/Hour  
  • Access Copy: H.264 / mp4

Video Preservation: Hi Def
  • Preservation Format: ffv1 / mkv Native: 1080i / 1080p. 100 GB/Hour?  
  • Access Copy: H.264 / mp4

Film Preservation
  • Preservation Format: RGB ffv1 / mkv 1080i scan (MPS capability ceiling). 100 GB/Hour?  
  • Access Copy: H.264 / mp4

Archive and delivery methods:
  • Preservation: Rosetta
  • Access: various options are available. 


Monday, December 10, 2018

A Preservationist’s Guide to the DMCA Exemption for Software Preservation

A Preservationist’s Guide to the DMCA Exemption for Software Preservation. Kee Young Lee and Kendra Albert. Software Preservation Network and the Cyberlaw Clinic @ the Berkman Klein Center. December 10, 2018.  [PDF]
     "The Library of Congress recently adopted several exemptions to the Digital Millennium Copyright Act (DMCA) provision prohibiting circumvention of technological measures that control access to copyrighted works. The exemptions went into effect on October 28, 2018 and last until October 28th, 2021. This guide is intended to help preservationists determine whether their activities fall under the new exemption."  The Software Preservation Network has obtained temporary exemptions which remove the legal liability for circumventing technological protection measures for preserving the software or resulting files, provided that certain conditions are met. These exemptions do not remove legal liability for copyright infringement of the underlying software itself.

The guide provides excellent information on the issues and the exemptions. The exemptions are generally directed to preservation activities by libraries, archives, and museums, but there are five criteria required in order to claim the exemption. The library, archive, or museum must:
  1. Make its collections open to the public or routinely available to unaffiliated outside researchers.
  2. Ensure that its collections are composed of lawfully acquired or licensed materials.
  3. Implement reasonable digital security measures for preservation activities.
  4. Have a public service mission.
  5. Have trained staff or volunteers that provide services normally provided by libraries, archives, or museums
In addition, there are requirements for using the preserved software:
  • The computer program must have been lawfully acquired.
  • The software must no longer be reasonably available in the commercial marketplace.
  • The sole purpose of the circumvention activity must be for lawful preservation of the computer program or digital materials that are dependent on a computer program.
  • The computer programs cannot be used for commercial advantage.
  • Use of the exemptions can only be for non-infringing uses of the software.
  • Copies of the computer programs cannot be made available outside of the physical premises of the library, archive, or museum.
These exemptions are only for three years, so evidence of software preservation activities will help to renew the exemptions.

The Guide also includes a DMCA Exemption for Software Preservation Checklist.


Saturday, December 08, 2018

Make The Case for Digital Preservation in Your Organisation

Make The Case for Digital Preservation in Your Organisation. Digital Preservation Coalition. 2018.
     This page provides some useful guides, examples and other resources that can help with building a business case and more broadly making the case for digital preservation within your organisation.
When preparing a business case or briefing, these resources provide an array of helpful information to assist in constructing it, from planning and preparation all the way through to polishing and communicating the finished case for digital preservation in your organisation.
Communication is critical for understanding your stakeholders and creating a foundation for establishing digital preservation within your organisation. These resources provide guidance on engaging with particular audiences.

Thursday, December 06, 2018

3 Principles for Selecting a Digital Preservation Solution


3 Principles for Selecting a Digital Preservation Solution. Daniel Greenberg. Ex Libris. November 29, 2018.
   This post was in honor of World Digital Preservation Day and lists some important elements to remember when reviewing digital preservation systems:

1. Interoperability: handling different types of data and integrating with other systems
  • Support common protocols for harvesting, publishing and searching, e.g. Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and SRU (Search/Retrieve via URL).
  • Ingest content with multiple methods and structures; e.g., BagIt, METS, CSV, and XML.
  • Provide well-documented external APIs
  • Integrate with other information systems
2. Follow Industry Standards, particularly standard metadata schemas and communication protocols. Benefits of doing this:
  • Interoperability between new and existing services and applications.
  • Compliance with policies and regulations.
  • Introduction of innovative features.
  • Enable a robust exit strategy, in case the vendor goes out of business.
3. Scalability:
  • Architectural scalability: Start small and grow big. Ability to expand the throughput over time without compromising performance.
  • Operational scalability: Ability to customize the system to the institutions’ needs.
  • Informational scalability: Keep up with latest strategies, practices, tools and policies by an active user community.
  • Organizational scalability: Administer multiple institutions with a single installation; support a flexible consortium model.
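
As an illustration of the harvesting protocol named under interoperability, an OAI-PMH request is an HTTP request whose query string carries a `verb` and its arguments. The sketch below only builds such URLs. The endpoint is a placeholder rather than a real repository, and the helper name `oai_pmh_url` is hypothetical.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL: base URL plus the verb and its arguments."""
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    # The endpoint below is a placeholder, not a real repository.
    base = "https://repository.example.org/oai"
    print(oai_pmh_url(base, "Identify"))
    print(oai_pmh_url(base, "ListRecords", metadataPrefix="oai_dc"))
```

`Identify` and `ListRecords` are verbs defined by the protocol, and `oai_dc` is the Dublin Core metadata prefix that every OAI-PMH repository is required to support.
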

Wednesday, December 05, 2018

Digital Preservation Network (DPN) Sunset

Community Announcement - DPN Sunset. Digital Preservation Network. December 4, 2018.
     The Board of Trustees of the Digital Preservation Network (DPN) is ending DPN. The DPN Board "determined that it is not feasible to design and implement changes that would ensure sustainability." 
"The landscape of digital preservation services has changed considerably in the past six years, as have the community’s preservation needs. Our highest priority is to affect an orderly sunset for the organization’s operations and for the disposition of its deposits."

The closure of a community-based organization created to provide long-term digital preservation storage highlights the numerous challenges of maintaining digital resources over the long term.


Thursday, November 29, 2018

The File Discovery Tool - A simple tool to gather file and filepath information, and ingest into our Rosetta Digital Archive


The File Discovery Tool. Chris Erickson. Brigham Young University. November 29, 2018.
     We have created a File Discovery Tool that analyzes directories of objects and prepares a spreadsheet of all the files it discovers for preservation/ingest. This file allows the curators to discover and work with the materials, select those that need to be preserved, and then add collection and other metadata information. The tool fits our workflow, but the source code may be useful for others trying to accomplish a similar task.

A sample command to run the tool:
>> java -jar FileDiscovery.jar [path name of files to check] [output path name for saving the report]
>> java -jar C:\FileDiscovery\FileDiscovery.jar "R:\test\objects"  C:\output\files
 The commands and syntax are outlined in a brief document: File Discovery Outline
  
The spreadsheet that is created has the following column headings:
 FILENAME, ITEM ID, FILEPATH, BYTESIZE, SIZE, COLLECTION, IE_LEVEL, DATE_CREATED, DATE_MODIFIED, TITLE, CREATOR, DESCRIPTION, RIGHTS_POLICY
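
To make the shape of that report concrete, here is a minimal Python sketch that walks a directory and writes a CSV with the same column headings. It is not the File Discovery Tool's code (the tool itself is run as a Java jar); it fills only the columns whose meaning is evident from their names and leaves the rest blank for curators. The function names are hypothetical.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["FILENAME", "ITEM ID", "FILEPATH", "BYTESIZE", "SIZE", "COLLECTION",
           "IE_LEVEL", "DATE_CREATED", "DATE_MODIFIED", "TITLE", "CREATOR",
           "DESCRIPTION", "RIGHTS_POLICY"]


def discover_files(root: Path) -> list[dict]:
    """Walk root and return one row per file; unknown columns stay blank."""
    rows = []
    for folder, _subdirs, names in os.walk(root):
        for name in sorted(names):
            path = Path(folder) / name
            info = path.stat()
            row = dict.fromkeys(COLUMNS, "")
            row["FILENAME"] = name
            row["FILEPATH"] = str(path)
            row["BYTESIZE"] = info.st_size
            row["DATE_MODIFIED"] = datetime.fromtimestamp(
                info.st_mtime, tz=timezone.utc).isoformat()
            rows.append(row)
    return rows


def write_report(rows: list[dict], report: Path) -> None:
    """Write the rows as CSV with the column headings as the first row."""
    with report.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp) / "objects"  # hypothetical folder of files to check
        objects.mkdir()
        (objects / "letter.txt").write_text("sample", encoding="utf-8")
        found = discover_files(objects)
        write_report(found, Path(tmp) / "report.csv")
        print(f"{len(found)} file(s) listed")
```
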

Metadata can be added as needed before ingesting the content into Rosetta.

The files and the metadata can then be submitted to Rosetta using the csv option in the Rosetta File Harvester tool. A second row of Dublin Core names is added to the spreadsheet to map the columns. A standard template has been created to help in preparing the file for ingest: Rosetta File Ingest template (PDF)
The source is available at https://bitbucket.org/byuhbll/filediscovery


The File Harvester tool - Our tool for ingesting content to our Rosetta Digital Archive


The File Harvester tool. Chris Erickson. Brigham Young University. November 29, 2018. 
     We have created a harvester tool for harvesting, processing, and submitting content to Rosetta. Our Library IT department has made this open source. The tool fits our workflow, but the source code may be useful for others trying to accomplish a similar task.

The File Harvester tool gathers content from several different sources:
  • Our hosted CONTENTdm (cdm)
  • Open Journal System (ojs)
  • Internet Archive (ia)
  •  Unstructured files in a folder with metadata in a spreadsheet (csv)
The tool creates SIPs by gathering objects and metadata from the specified source, creating a Rosetta METS XML file and a Dublin Core XML file, and placing everything in the folder structure that our Rosetta system expects. The objects can either be on the hosted system or in a source folder. The harvest tool can also submit the content to Rosetta for ingest.

The structure is:
  1. Folder: collection-itemid and it contains the dc.xml and subfolder content 
  2. Sub-Folder: content and it contains the mets.xml and the folder streams 
  3. Sub-Folder: streams which contains the file objects
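
The folder structure above can be sketched as follows. This is an illustration of the layout only, not the harvester's code: the function name, collection code, and item id are hypothetical, and the two XML files are created as empty placeholders because their real content is produced by the tool.

```python
import shutil
import tempfile
from pathlib import Path


def build_sip_layout(base: Path, collection: str, item_id: str,
                     files: list[Path]) -> Path:
    """Create collection-itemid/content/streams and copy the file objects in."""
    sip = base / f"{collection}-{item_id}"
    streams = sip / "content" / "streams"
    streams.mkdir(parents=True)
    for source in files:
        shutil.copy2(source, streams / source.name)
    # Empty placeholders: the real dc.xml and mets.xml come from the harvester.
    (sip / "dc.xml").touch()
    (sip / "content" / "mets.xml").touch()
    return sip


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "page1.tif"  # hypothetical file object
        source.write_bytes(b"placeholder bytes")
        sip = build_sip_layout(Path(tmp) / "sips", "MSS123", "0001", [source])
        for path in sorted(sip.rglob("*")):
            print(path.relative_to(sip.parent))
```
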
The commands and syntax are outlined in a brief document:
Rosetta File Harvester outline

The source is available at: https://bitbucket.org/byuhbll/rosetta-tools