Friday, September 15, 2017

Preservation with PDF/A

Preservation with PDF/A (2nd Edition). Betsy A Fanning. DPC Technology Watch Report 17-01. July 2017.
     This report is an updated edition of the original Technology Watch Report 08-02, Preserving the Data Explosion: Using PDF (Fanning, 2008). It looks at PDF/Archive (PDF/A) as a digital document file format for long-term preservation. The PDF/A versions of the PDF format have been developed as a family of open ISO Standards that address the preservation of PDF files by removing features that pose preservation risks. For preservation purposes, it is important to know how closely a file conforms to the requirements defined in the standard. Preservation risks that may exist in the standard PDF file format include:
  • any file type can be embedded;
  • the primary document can be conformant as a static document, but the embedded files may not be static;
  • embedded files may be infected by computer viruses;
  • embedded files may have extended metadata requirements, may introduce unexpected dependencies or be subject to format obsolescence;
  • embedded files may complicate matters relating to information security, data protection or the management of intellectual property rights.
By restricting risky features, and so reducing preservation risks, the PDF/A format seeks to maximize:
  • device independence;
  • self-containment;
  • self-documentation.
Some reasons why an organization might use PDF/A to preserve its digital documents include:
  • its standardized format for storing digital documents for long periods of time;
  • it allows for digitally signed documents using the very latest digital signature software;
  • it reliably displays special characters for mathematics and languages, since the fonts are embedded within the file;
  • it displays correctly on any device as the author intended, including the reading order;
  • platform independence;
  • provision of fully searchable documents through Optical Character Recognition.
History and Features of PDF and PDF/A. The Standard was drafted in multiple parts in order to make it easier to implement. "Unfortunately, the committee’s philosophy of multiple parts resulted in confusion in the market place, making it more difficult for users to select the optimum file format." Users may need to do a file format assessment, based on their requirements, to help them decide which PDF/A Standard to implement.

Metadata helps manage a file effectively throughout its life cycle, as well as assist in document discovery searches. "Establishing a long-term digital document preservation system requires careful consideration of the metadata that will be needed to locate and render documents years from now." Collecting metadata for PDF/A documents is optional in the standard, except for the identifier, which is generated when the PDF/A file is created. Preservation metadata should:
  • be appropriate to the materials;
  • support interoperability;
  • use standardized controlled vocabulary;
  • include clear statements on the conditions and terms of use;
  • be authoritative and verifiable;
  • support the long-term management of the document.
Just because a file purports to be PDF/A does not necessarily mean that it is. Format validation of a file can increase confidence that a viewer will be able to render the file correctly. A number of PDF/A validators are available. The development work on the PDF Standards is a continuing effort, and additional preservation challenges in the format are in the process of being addressed.
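To illustrate the gap between claiming and conforming, here is a minimal Python sketch (my own, not from the report) that reads only the PDF/A identification a file declares in its XMP metadata, the pdfaid:part and pdfaid:conformance properties. It assumes the XMP packet is stored uncompressed, which is the usual case for PDF/A files. It checks none of the standard's actual requirements, so a file that "passes" here still needs a real validator.

```python
from __future__ import annotations

import re
from pathlib import Path


def _pattern(key: str) -> re.Pattern[bytes]:
    # XMP may serialise a property as an attribute (pdfaid:part="2")
    # or as an element (<pdfaid:part>2</pdfaid:part>); match either form.
    k = re.escape(key.encode())
    attr = rb'pdfaid:' + k + rb'\s*=\s*["\']([^"\']+)["\']'
    elem = rb'<pdfaid:' + k + rb'>\s*([^<\s]+)\s*</pdfaid:' + k + rb'>'
    return re.compile(attr + rb'|' + elem)


_PATTERNS = {key: _pattern(key) for key in ("part", "conformance")}


def declared_pdfa(data: bytes) -> dict[str, str] | None:
    """Return the PDF/A part/conformance a file *claims*, or None.

    This reads the declaration only; it does not validate conformance.
    """
    if not data.startswith(b"%PDF-"):  # PDF/A expects the header at byte 0
        return None
    found = {}
    for key, pattern in _PATTERNS.items():
        m = pattern.search(data)
        if m:
            found[key] = (m.group(1) or m.group(2)).decode("ascii", "replace")
    return found if "part" in found else None


def declared_pdfa_file(path: str | Path) -> dict[str, str] | None:
    return declared_pdfa(Path(path).read_bytes())
```

A result such as {"part": "2", "conformance": "B"} only tells you what to validate the file against.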

The report lists some recommendations, which are directed at groups that use the standard. They include:
  • For those evaluating PDF/A as a digital preservation solution:
    • Before adopting PDF/A as a preservation solution it is "essential to understand the organizational requirements and how PDF/A will support" the organization's needs.
    • PDF/A is not a preservation solution on its own; it is part of a wider preservation strategy that must be consistent with other components of the preservation infrastructure, such as backups, integrity checks and documentation.
    • Different versions of PDF/A have different purposes, with different capabilities as well as different preservation risks. These should be understood and decisions should be documented and explained.
    • Different vendors offer different tools for managing PDF/A, and these should be compared against your requirements.
  • For organizations collecting and preserving digital data:
    • While it may not be possible to control or restrict how documents are produced, it may be useful to give document creators guidance on what is desired.
    • Embed PDF/A validation tools into preservation workflows and record the results to help manage the digital preservation risks associated with PDF/A files received.
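For the last recommendation, here is a minimal sketch of recording validation results, assuming the validator is wrapped as a Python callable that returns a pass/fail flag and a list of messages. That callable is a placeholder for whichever validator you choose, not any real tool's interface. Each result is appended to a JSON Lines log with a SHA-256 checksum and a UTC timestamp, so the log shows which exact version of a file was checked, and when.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

# Hypothetical validator signature: takes a file path, returns (passed, messages).
Validator = Callable[[Path], tuple[bool, list[str]]]


def record_validation(path: Path, validator: Validator, log_path: Path) -> dict:
    """Validate one file and append the outcome to a JSON Lines log."""
    passed, messages = validator(path)
    entry = {
        "file": str(path),
        # The checksum ties the result to this exact version of the file.
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "passed": passed,
        "messages": messages,
    }
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```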

Wednesday, September 13, 2017

Self-preservation: The Gibraltar National Archives uses cloud to safeguard its history

Self-preservation: The Gibraltar National Archives uses cloud to safeguard its history. Caroline Donnelly. ComputerWeekly. 13 September 2017.
     Many enterprises are familiar with the concept of retaining corporate data as part of their regulatory and compliance obligations, but some fail to understand that the data must also be kept accessible. "While regulatory compliance is the key reason why many enterprises embark on this process in the corporate world, for the Gibraltar National Archives (GNA), digital preservation is an essential part of ensuring the annals of its cultural heritage and democratic history are safeguarded forever." After a long process of digitizing historical content, the GNA realized that digitizing content is not the same as preserving it. "The risk was we could have spent all this time and money doing digitisation only to lose [this information] a few years down the line because it is not preserved correctly." Digital preservation is about:
  • actively managing the file formats
  • ensuring they remain readable in future
  • being proactive and managing the content
Just as it is important to be able to prove the provenance of physical records, the fixity of digital documents needs to be maintained. “People often ask me when our digital preservation project will be finished. I tell them never, because every day we are collecting records. Every day we are archiving unique material from newspapers to government records all for generations to come.”
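Maintaining fixity usually means recording a checksum for each file and comparing it on later checks. Here is a minimal Python sketch, not the GNA's actual system, that records a SHA-256 baseline for a folder and later reports files that are missing, changed, or new.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Read in chunks so large files do not have to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def record_fixity(root: Path) -> dict[str, str]:
    """Baseline: map each file's path (relative to root) to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def check_fixity(root: Path, baseline: dict[str, str]) -> dict[str, list[str]]:
    """Compare the folder's current state against a stored baseline."""
    current = record_fixity(root)
    return {
        "missing": sorted(set(baseline) - set(current)),
        "changed": sorted(k for k in baseline.keys() & current.keys()
                          if baseline[k] != current[k]),
        "new": sorted(set(current) - set(baseline)),
    }
```

In practice the baseline would be stored separately from the files it describes, and each check would be logged.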

Saturday, August 19, 2017

IBM and Sony cram up to 330 terabytes into tiny tape cartridge

IBM and Sony cram up to 330 terabytes into tiny tape cartridge. Sebastian Anthony. Ars Technica UK. August 2, 2017.
     IBM and Sony have developed a new magnetic tape system capable of storing 201 gigabits of data per square inch, or approximately 330 terabytes in a single palm-sized cartridge. To achieve this density, Sony developed a new type of tape that has a higher density of magnetic recording sites, and IBM Research developed new heads and signal processing technology to process the data from the "nanometre-long patches of magnetism". The new cartridges and tape drives, "when eventually commercialised, will be significantly more expensive because of the tape's complex manufacturing process."
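As a rough check of what these two figures imply (my own back-of-the-envelope arithmetic, not from the article), assume decimal units (1 TB = 10^12 bytes) and ignore servo tracks, error correction and other formatting overhead. The recorded area needed for 330 TB at 201 gigabits per square inch is then:

```latex
\[
A = \frac{330 \times 10^{12}\,\text{bytes} \times 8\,\text{bits/byte}}{201 \times 10^{9}\,\text{bits/in}^2}
  = \frac{2.64 \times 10^{15}}{2.01 \times 10^{11}}\,\text{in}^2
  \approx 1.31 \times 10^{4}\,\text{in}^2
  \approx 8.5\,\text{m}^2
\]
```

Spread across half-inch-wide tape, that is roughly 26,000 inches, or about 670 metres of tape. The real figure would be higher once formatting overhead is included.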

Friday, August 18, 2017

Evaluating Your DPN Metadata Approach

Evaluating Your DPN Metadata Approach. DPN Preservation Metadata Standards Working Group. July 27, 2017. [PDF, 6 pp.]
     This brief guide can help determine a clear metadata approach to recovering data "in the far future among unpredictable circumstances". The document can help users create a sound approach to preserving their institution’s data and make decisions that fit their own institutional needs.

The first section is:
What information is needed to understand and contextualize an object? It examines both descriptive and structural metadata.

Descriptive Metadata: for the purpose of identification and discovery of an object. Dublin Core, MODS and VRA Core are common standards used for descriptive metadata.

Structural Metadata: describes relationships between objects, such as pages in a book. The METS Structural Map can express hierarchical relationships or parent/child relationships. The PREMIS "relationship" element can express version relationships.
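To make the structural map idea concrete, here is a minimal Python sketch using the standard library's ElementTree. It builds a METS fileSec and a physical structMap for a book: one div for the book, with an ordered div per page pointing at that page's image file. The file names, the FILE0001-style IDs and the USE="master" label are illustrative assumptions. A production METS document would normally also carry a header, descriptive and administrative sections, and be validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def book_struct_map(title: str, page_files: list[str]) -> ET.Element:
    """Build a METS fileSec plus a physical structMap of ordered pages."""
    mets = ET.Element(f"{{{METS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="master")
    struct_map = ET.SubElement(mets, f"{{{METS}}}structMap", TYPE="physical")
    # The parent div (the book) contains one child div per page.
    book = ET.SubElement(struct_map, f"{{{METS}}}div", TYPE="book", LABEL=title)
    for n, href in enumerate(page_files, start=1):
        file_id = f"FILE{n:04d}"  # hypothetical ID scheme
        f = ET.SubElement(file_grp, f"{{{METS}}}file", ID=file_id)
        ET.SubElement(f, f"{{{METS}}}FLocat", LOCTYPE="URL",
                      **{f"{{{XLINK}}}href": href})
        page = ET.SubElement(book, f"{{{METS}}}div", TYPE="page",
                             ORDER=str(n), LABEL=f"Page {n}")
        # fptr links the page in the structure to its file in the fileSec.
        ET.SubElement(page, f"{{{METS}}}fptr", FILEID=file_id)
    return mets


# Usage: print(ET.tostring(book_struct_map("Sample book", ["p1.tif", "p2.tif"]), encoding="unicode"))
```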

The document also looks at how to:
  • understand and contextualize a collection; 
  • connect/relate objects to a collection; 
  • connect/relate versions to each other; 
  • connect metadata records to associated objects and collections;
  • ensure the authenticity of an object;
  • ensure the essential characteristics of the original are maintained in a data migration.

Thursday, August 17, 2017

DPN: Metadata Considerations for Deposits

Metadata Considerations for Deposits. DPN. August 2017.
     The Digital Preservation Network working groups have provided an overview of the types of metadata to consider while preparing deposits for DPN. Several areas are addressed:
  1. DPN-specific metadata, especially DPN’s BagIt specification, Tag Directories and Bag Structure.
  2. DuraCloud-specific metadata. While they do not restrict metadata, they "indicate that local policies should be used to define metadata approaches". Each snapshot contains four DuraCloud-created files: two checksum files (MD5 and SHA-256), a content properties file, and a collection-snapshot file.
  3. Core descriptive metadata records. The DPN Preservation Metadata Standards Working Group examined minimal metadata records from a variety of member institutions to find common metadata schemas. This resulted in a “core record,” or the "minimum level of information needed in order to understand digital assets at a later date," shown in a clear chart.
  4. Significant properties of content. "In order for digital files to be usable and accessible in the long-term, it is important to recognize the importance of significant properties and to ensure that the properties of your digital materials are being documented in some form." They list content types, with examples of common significant properties. 
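Since DPN deposits are built on BagIt, here is a minimal Python sketch of a plain BagIt 1.0 bag as described in RFC 8493: payload files under data/, a bagit.txt declaration, and a SHA-256 payload manifest, plus a function to re-verify the bag. It deliberately leaves out DPN's own bag profile and tag files, and it does not percent-encode line breaks or % characters in file names, which the specification requires. The Library of Congress bagit-python library is a more complete option.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(src: Path, bag_dir: Path) -> Path:
    """Copy src into bag_dir/data and write bagit.txt plus a SHA-256 manifest."""
    data_dir = bag_dir / "data"
    shutil.copytree(src, data_dir)  # also creates bag_dir
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    lines = []
    for p in sorted(data_dir.rglob("*")):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            # Manifest paths are relative to the bag root, e.g. data/sub/b.txt
            lines.append(f"{digest}  {p.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> list[str]:
    """Return manifest paths whose files are missing or whose checksums differ."""
    bad = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel = line.split(maxsplit=1)
        f = bag_dir / rel
        if not f.is_file() or hashlib.sha256(f.read_bytes()).hexdigest() != digest:
            bad.append(rel)
    return bad
```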

Tuesday, August 08, 2017

Universal Electronic Records Management Requirements

Universal Electronic Records Management Requirements. Courtney Anderson. National Archives Records Express. August 4, 2017.
     The National Archives has released the Universal Electronic Records Management Requirements as part of the Federal Electronic Records Modernization Initiative (FERMI). Universal ERM Requirements identify high level business needs for managing electronic records. The program requirements are derived from existing NARA regulations, policy, and guidance and are a starting point for agencies to use when developing system requirements. "Records management staff should work with acquisitions and IT personnel to tailor any final system requirements". The document contains an abstract, a glossary, and lists of lifecycle requirements and transfer format requirements.
There are six sections based on the lifecycle of electronic records management:

1.    Capture
2.    Maintenance and Use
3.    Disposal
4.    Transfer
5.    Metadata
6.    Reporting

The requirements are either “program” requirements, relating to the design and implementation of policies and procedures, or “system” requirements, providing technical guidance for creating or acquiring ERM tools, which also indicate “Must Have” or “Should Have”. NARA will be supporting these requirements going forward and will be updating them to stay current with changes in technology, regulations and guidance products.

Saturday, August 05, 2017

Elsevier Acquires bepress

Elsevier Acquires bepress. Roger C. Schonfeld. Society for Scholarly Publishing; The Scholarly Kitchen. Aug 2, 2017.
     Elsevier announced its acquisition of bepress. The move is entirely consistent with Elsevier's strategy to pivot beyond content licensing to preprints, analytics, workflow, and decision support, and it makes Elsevier probably the foremost single player in the institutional repository area. There is some concern this acquisition will allow them to co-opt open access. The bepress product, Digital Commons, has more than 500 participating institutions, predominantly US colleges and universities.

bepress Joins Elsevier, with Exciting Potential for Growth. Press release. bepress. Aug 2, 2017.
bepress has joined Elsevier, the largest content provider in the world. The management is "confident that this is the right choice for bepress and for our community. Both parties are committed to sustaining the elements that make bepress bepress, and supporting your open access initiatives."

Thursday, August 03, 2017

Library Preservation Workflows: Importing, Exporting, and Managing Content

Library Preservation Workflows: Importing, Exporting, and Managing Content. Chris Erickson. June 12, 2017. [PDF slides]
     This is my presentation at the Eighth Annual Rosetta Advisory Group meeting held in June at the wonderful University of Sheffield. This is my favorite conference because of the attendees, the topics discussed, the interaction with the Ex Libris employees who attend, and the many things I learn about digital preservation and Rosetta, in the Advisory meeting and in the accompanying Rosetta Users Group. I hate to see this conference end.

The short presentation is a view of some of the ongoing changes and refinements we have made to our digital preservation workflow in the past year. We have worked to streamline our processes, both because of the increased volume of content we ingest into Rosetta, and also the desire to minimize the file movement and copying during the processing. In addition, we have used our preservation repository to recover documents in our access systems that became unavailable.

During the year we have updated our digital preservation policies to help determine our preservation selection workflow. They include:
The changes have helped with a smoother transition from selection and acquisition, processing and SIP creation and submission to Rosetta, and the preservation disposition.

Wednesday, August 02, 2017

Digital Preservation Workflow Curriculum

Digital Preservation Workflow Curriculum. Mary Molinaro. DPN, AVPreserve. August 2, 2017.
     DPN and AVPreserve have developed a "digital preservation workflow curriculum to share with DPN members and others in the digital preservation community". This workshop curriculum, released with a Creative Commons license, will provide participants with skills and knowledge to implement and manage a digital preservation program within their organization. They ask that the terms of the CC-BY-SA license be observed.

The workshop modules show the requirements of a digital preservation ecosystem from the viewpoints of governance / program management, as well as asset management. This is not an introduction to digital preservation or the OAIS model; instead it looks at the 'why' and 'how' questions of "making digital preservation an underlying, operational function of an organization". The curriculum, which is available here in a zip file, consists of:

Friday, July 21, 2017

ePADD 4.0 Released

ePADD 4.0 Final. July 21, 2017.
    This is the latest release of ePADD, a software tool "developed by Stanford University's Special Collections & University Archives that supports archival processes around the appraisal, ingest, processing, discovery, and delivery of email archives."

The software consists of four modules:
  1. Appraisal: Allows users to gather and review email archives.
  2. Processing: Tools to arrange and describe email archives.
  3. Discovery: Tools to share a view of email archives with users through web discovery.
  4. Delivery: Enables repositories to provide access within a reading room environment.
System Requirements:
  • OS: Windows 7 SP1 / 10, Mac OS X 10.10 / 10.11 
  • Memory: 8 GB RAM (4 GB RAM allocated to the application by default) 
  • Browser: Chrome 50/51, Firefox 47/48 
  • Windows installations: Java Runtime Environment 64-bit, 8u101 or later required
ePADD Installation and User Guide
ePADD Github website

Saturday, July 15, 2017

Email preservation: How hard can it be?

Email preservation: How hard can it be? Edith Halvarsson. Digital Preservation at Oxford and Cambridge. 7 July, 2017.
      The post summarises highlights of the Digital Preservation Coalition’s briefing on email preservation. What is email? It is "an object, several things and a verb”, a heavily linked and complex object, like the web. "Retention decisions must be made, not only about text content but also about email attachments and external web links. In addition, supporting features (such as instant messaging and calendars) are increasingly integrated into email services and potential candidates for capture."
Email is also a cultural and social practice; capturing relationships and structures of communication is an additional layer to preserve. 

What is being done, or can be done? Migration is the most common approach to email preservation. Mbox (which is a family of formats) and EML are the most common formats migrated to, and they take different approaches to storing content. Others choose to unpack content, which provides a way to display emails and normalise the content within them. The emulation approach provides access to content within the original operating environment. Also, ePADD, an open source tool, provides functions for processing and appraisal of Mbox files, and has other features as well.

There are still questions and issues to explore, particularly regarding web links. "Email archives may be more valuable to historians as they acquire critical mass." Some things that institutions can do are:
  • Participate with the Email Preservation Task Force
  • Share your workflows to the Email Preservation Task Force and the community
  • Run trial migrations between different email formats such as PST, Mbox and EML and blog about your findings
  • Support open source tools such as ePADD and make them sustainable! 
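For anyone trying the trial-migration suggestion, here is a minimal Python sketch using the standard library's mailbox module that splits an Mbox file into one EML file per message. It is only a starting point under simple assumptions: output names are sequential numbers, attachments stay embedded in each message, and body lines that the mbox format escaped as ">From " are not restored. PST files would need a separate conversion tool first.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path: Path, out_dir: Path) -> int:
    """Write each message in an Mbox file to its own .eml file; return the count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(str(mbox_path), create=False)
    count = 0
    try:
        for count, msg in enumerate(box, start=1):
            # as_bytes() omits the mbox "From " envelope line by default,
            # leaving a standalone RFC 5322 message suitable for .eml.
            (out_dir / f"{count:06d}.eml").write_bytes(msg.as_bytes())
    finally:
        box.close()
    return count
```

Comparing message counts and a sample of headers before and after is a simple first check that nothing was lost in the migration.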

Friday, July 14, 2017

Six Priority Digital Preservation Demands

Six Priority Digital Preservation Demands. Somaya Langley. Digital Preservation at Oxford and Cambridge. 13 July, 2017.
     Post discusses the gap between what activities need to be done as part of a digital stewardship end-to-end workflow and the maturity level of digital preservation systems. It presents a list of "my six top ‘digital preservation demands’ (aka user requirements)":
  • Integration with other systems: A digital preservation ‘system’ is only one piece of a much larger puzzle. In the ‘digital ecosystem’, end-to-end digital stewardship workflows are of primary importance. Metadata and/or files should flow from one system to another.
  • Standards-based: Libraries rely on standards. "If we don’t use (or fully implement) existing standards, this means we risk mangling data, context or meaning; potentially losing or not capturing parts of the data; or just wasting a whole lot of time".
  • Error Handling: With more work and fewer people, we "have to be smart about how we work. This requires prioritisation." The preservation workflows need smarter systems to aid the processes, especially for understanding and resolving errors from the many third-party tools.
  • Reporting: The types of reports needed include: 
    • High-level reporting (annual reports, monthly reports, reports to managers, projections, costings, etc.)
    • Collection and preservation management reporting 
    • Reporting for preservation planning purposes, based on preservation plans
  • Provenance: Support for identifying where a file has come from. This is often handled by metadata and documenting changes as Provenance Notes. The essential metadata (administrative, preservation, structural, technical) needs to be captured and retained.
  • Managing Access Rights: We must ensure we can provide access to the content in a variety of ways that support both the content and its users, particularly the new ways they want to use the content.
"It’s imperative to keep in mind the whole purpose of preserving digital materials is to be able to access them...." Addressing these six concerns may not be easy, but we need to "make iterative improvements, one step at a time."

Thursday, July 13, 2017

Integrating Research Data management and digital preservation systems at the University of Sheffield

Integrating Research Data management and digital preservation systems at the University of Sheffield. Chris Loftus. Digital Preservation Coalition. 31 May 2017.
     The University Library is leading the active management and curation of research data within the institution. This includes implementing a research data catalogue and repository powered by figshare. They safeguard library collections and University assets using Rosetta, a digital preservation platform from Ex Libris. "We are now working with figshare and Ex Libris to integrate both services to provide seamless preservation of published research data across the research lifecycle." The integration will:

  • provide a complete lifecycle data management service for the university’s research community; 
  • identify, understand and act on risks associated with preserving data sets; 
  • better inform advice and guidance around use of data formats for sharing and preservation purposes; and 
  • encourage researchers to share their data more openly with others by guaranteeing the long term sustainability of that data.
Initial integration work uses the OAI-PMH protocol and METS packages to transfer content efficiently. Rosetta will be the dark archive, with figshare the interface for researchers and external users.
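The post does not describe how the figshare and Rosetta integration is coded, but the OAI-PMH side of such a transfer generally looks like this minimal Python sketch: request ListRecords, collect record identifiers, and follow the resumptionToken until the list is exhausted. The base URL is whatever endpoint the repository exposes (none is assumed here), oai_dc is simply the default metadata format, records flagged as deleted are skipped, and retries and throttling are not handled.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def parse_list_records(xml_bytes: bytes) -> tuple[list[str], Optional[str]]:
    """Extract record identifiers and the resumption token from one response page."""
    root = ET.fromstring(xml_bytes)
    err = root.find(f"{OAI}error")
    if err is not None:
        if err.get("code") == "noRecordsMatch":
            return [], None
        raise RuntimeError(f"OAI-PMH error {err.get('code')}: {err.text}")
    ids = [h.findtext(f"{OAI}identifier") for h in root.iter(f"{OAI}header")
           if h.get("status") != "deleted"]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An empty resumptionToken element marks the final page.
    return ids, token or None


def harvest_identifiers(base_url: str, metadata_prefix: str = "oai_dc") -> Iterator[str]:
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            ids, token = parse_list_records(resp.read())
        yield from ids
        if not token:
            break
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}
```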

File format issues: Research data is often in niche and proprietary formats. Of the material currently deposited in the archive, only a small percentage was recognised by a DROID survey. They will need to invest some time to identify and plan for these formats, and hopefully the work will be of use to the wider digital preservation community.
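A survey like this can be summarised in a few lines of Python. This minimal sketch assumes a DROID CSV export with TYPE and PUID columns (check the column names in your own export), skips folder rows, and reports the percentage of files that received a PRONOM identifier along with a count per PUID.

```python
import csv
import io
from collections import Counter


def summarise_droid(csv_text: str) -> tuple[float, Counter]:
    """Return (% of files identified, count of files per PUID) from a DROID CSV export."""
    rows = [r for r in csv.DictReader(io.StringIO(csv_text))
            if r.get("TYPE", "File") != "Folder"]  # assumed column name
    if not rows:
        return 0.0, Counter()
    # Files with an empty PUID were not recognised by DROID.
    puids = Counter(r.get("PUID") or "UNIDENTIFIED" for r in rows)
    identified = len(rows) - puids["UNIDENTIFIED"]
    return 100.0 * identified / len(rows), puids


# Usage: pct, counts = summarise_droid(Path("droid_export.csv").read_text(encoding="utf-8"))
```

The unidentified files are the ones that would need the format research the post describes.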

Metadata: They plan to improve the quality and volume of metadata accompanying research data. Material from researchers often lacks needed metadata, which can cause future data access issues. They are investigating solutions.

Thursday, June 08, 2017

Videotapes Are Becoming Unwatchable As Archivists Work To Save Them

Videotapes Are Becoming Unwatchable As Archivists Work To Save Them. NPR: All Things Considered. Scott Greenstone. June 3, 2017.
     Research suggests that magnetic tapes, such as videotapes, will not last beyond 15 to 20 years, a problem sometimes called the "magnetic media crisis." The magnetic information on tapes slowly fades, and when it diminishes too much, the information on the tape is lost. Groups are working to migrate the tapes before the content becomes unrecoverable. Part of this process is identifying what is on the tapes and which tapes need to be preserved long term.

Friday, June 02, 2017

Ex Libris joins the Open Preservation Foundation

Ex Libris joins the Open Preservation Foundation. Becky McGuinness. Press Release. Open Preservation Foundation. June 1, 2017.
     The Open Preservation Foundation announced that Ex Libris is its newest charter member. "Ex Libris’ Rosetta is an end-to-end digital asset management and preservation solution for libraries, archives, museums and other institutions, enabling institutions to safely and securely collect, manage, publish, deliver, and ensure longevity for digital information of many different types. With Rosetta’s unique content preservation planning module and its Format Library knowledge base, shared by the entire Rosetta community, institutions can identify format risks, evaluate mitigation alternatives, and select the best preservation actions."  "Rosetta reflects Ex Libris involvement in industry standards and commitment to extensibility and open architecture."  "Rosetta itself is based on an open architecture that allows customers to easily use Rosetta with external tools and plugins such as JHOVE and other open-source software. By supporting OPF, we can further improve open-source tools for the benefit of all."

Saturday, May 13, 2017

Design Requirements for Better Open Source Tools

OSS4Pres 2.0: Design Requirements for Better Open Source Tools. Heidi Elaine Kelly. bloggERS! April 25, 2017.
     Free and Open Source Software needs to "integrate easily with digital preservation institutional systems and processes.” The FOSS Development Requirements Group created a design guide to ensure easier adoption of open-source tools and their integration with other software and tools.

Minimum Necessary Requirements for FOSS Digital Preservation Tool Development. The premise is that "digital preservation is an operating system-agnostic field."

  • Provide publicly accessible documentation and an issue tracker
  • Have a documented process so people can contribute to development, report bugs, and suggest new documentation
  • Every tool should do the smallest possible task really well; if you are developing an end-to-end system, develop it in a modular way in keeping with this principle
  • Follow established standards and practices for development and use of the tool
  • Keep documentation up-to-date and versioned
  • Follow test-driven development philosophy
  • Don’t develop a tool without use cases, and stakeholders willing to validate those use cases
  • Use an open and permissive software license to allow for integrations and broader use
  • Have a mailing list or other means for community interaction
  • Establish community guidelines
  • Provide a well-documented mechanism for integration with other tools/systems
  • Provide functionality of tool as a library, separate UI from the actual functions
  • Package tool in an easy-to-use way, that supports any dependencies
  • Provide examples of functionality for potential users
  • Consider the long-term sustainability of the tool
  • Consider a way for internationalization of the tool  
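Two of these requirements, doing the smallest possible task well and separating the functionality from the UI, can be shown in a minimal, hypothetical Python tool that counts files by extension. The counting lives in a plain library function that other systems can import, and the command-line interface is a thin argparse wrapper around it.

```python
import argparse
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable, Optional, Sequence


# Library layer: one small task, no printing and no argument parsing.
def count_extensions(paths: Iterable[Path]) -> Counter:
    return Counter(p.suffix.lower() or "(none)" for p in paths if p.is_file())


# UI layer: a thin command-line wrapper around the library function.
def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Count files by extension.")
    parser.add_argument("root", type=Path, help="folder to scan recursively")
    args = parser.parse_args(argv)
    if not args.root.is_dir():
        print(f"not a directory: {args.root}", file=sys.stderr)
        return 2
    for ext, n in count_extensions(args.root.rglob("*")).most_common():
        print(f"{n}\t{ext}")
    return 0
```

As a script, the wrapper would be run under the usual `if __name__ == "__main__": sys.exit(main())` guard. Other code and tests can call count_extensions directly, or main(["some/folder"]) to exercise the CLI.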

Tuesday, May 09, 2017

Using Open-Source Tools to Fulfill Digital Preservation Requirements

OSS4EVA: Using Open-Source Tools to Fulfill Digital Preservation Requirements. Marty Gengenbach, et al. Code4Lib. 2016-10-25.
     Open-source software, such as LOCKSS, DSpace, and DROID, has played an increasingly prominent role in digital preservation. As the number and variety of such tools increased, there was a growing need among preservationists to assess how and when to adopt particular tools so that they could better support their institutions’ specific requirements and workflows. Open-source projects allow the user community to contribute by developing and documenting tools.

There are some challenges with open-source software:
  • Perceptions of instability:  One challenge is the perception that these tools are "inherently unstable and therefore present a risk". 
  • Resources and funding: Administrators often are reluctant to commit resources to an open source project. Funding problems can threaten the long-term sustainability of open source tools.
  • System updates: Open source tools require regular patches, updates, and upkeep. Without this, the tool would be outdated, and open to security holes. "The choice to maintain an unsupported version of a particular open-source tool simply because it meets (or has been customized to meet) an organization’s needs is problematic. For what an institution may stand to gain from this tool in terms of functionality and local integration, it may stand to lose in terms of the stability of a mainstream code release, the risk to information security, and the likelihood that the tool in question will become increasingly less functional and reliable as it ages".
  • Integration. Integrating open-source tools into institutional workflows can be a challenge, taking into account software dependencies, system requirements, and local configuration to put the tools into a production environment. This can require considerable time and resources.
One of the possible benefits is that institutions can customize open source tools for use within a specific context, but that comes with its own hurdles, such as reducing the ability to draw on the user community. The digital preservation open source landscape has evolved from a scattered set of standalone tools to complex software environments. "Nevertheless, these tools still are not watertight." There are real concerns about open-source tools that can pose serious risks to collections.

Thursday, May 04, 2017

Personal Digital Archiving Guide Part 1: Preservation Planning

Personal Digital Archiving Guide Part 1: Preservation Planning. Scott Witmer. Bits and Pieces. April 26, 2017.
      Digital materials require active intervention to be usable over time, since technology is constantly changing. "The more we use these files or transfer them from one technology to another, the greater the potential for data corruption. Digital files also run the risk of deletion due to accident or disaster. Having a preservation plan can mitigate the risks of obsolescence, erasure, or other forms of data loss." This post lists some simple suggestions for organizing digital files for long-term preservation, although everyone will have their own methods. Some digital preservation is better than none.

Preservation Steps for Personal Digital Collections:
  • Identify digital materials to save. Make a list or inventory
  • Gather the files you want to save into one place
  • Select what you really want to save; define the scope of your digital collection
  • Organize your digital files and add descriptive information to the file name, or other important information
  • Give your files short, meaningful names, preferably when creating the files
  • Use a meaningful directory structure to organize the files 
  • Back-up the files and have multiple copies:  
    • 3 copies 
    • 2 of the copies on 2 different types of storage media 
    • 1 copy in a different location  
Digital preservation is an ongoing process, so files and storage technology should be checked periodically.
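The 3-2-1 rule above is simple enough to check mechanically. Here is a minimal Python sketch, assuming you describe each copy by its storage media type and its location; the media and location names in the usage example are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    media: str     # type of storage, e.g. "external HDD" or "cloud"
    location: str  # where the copy physically lives, e.g. "home" or "offsite"


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return a list of problems; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies on one type of storage media; need at least 2")
    if len({c.location for c in copies}) < 2:
        problems.append("no copy in a different location")
    return problems


# Usage: check_3_2_1([Copy("laptop SSD", "home"), Copy("external HDD", "home"), Copy("cloud", "offsite")])
```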

Wednesday, April 26, 2017

Sustaining The Value: The British Library Digital Preservation Strategy 2017-2020

Sustaining The Value: The British Library Digital Preservation Strategy 2017-2020. British Library. January 2017.
     The strategy document is intended to guide the Library’s digital preservation activities for the next few years. It identifies strategic priorities as well as the roles and responsibilities of those who will deliver the strategy. The digital preservation challenges include technological obsolescence, media integrity, bit rot, digital rights management, metadata and others. Also important are:
  • Proactive Lifecycle management
  • Integrity & validation
  • Fragility of storage media
"Digital Preservation is the combination of actions and interventions required throughout the digital content lifecycle to ensure continued and reliable access to authentic digital materials." Digital preservation is not just a technical challenge. "It necessitates an ongoing and typically recursive series of actions and interventions throughout the lifecycle to ensure continued & reliable access to authentic digital objects,for as long as they are deemed to be of value."  

Their vision is to make sure that "end-to-end workflows are in place that deliver and preserve our digital collections in a trusted long term digital repository so that they may be accessed by future users.” Other notes:
  • Control and consistency throughout the lifecycle is therefore an essential aspect of large scale, sustainable preservation.   
  • Priorities include: 
    • Changes to the existing technical repository infrastructure 
    • Ingest digital collections with metadata for long term preservation
    • Management and reporting will be documented and provide assurance and evidence of preservation 
    • Deliver content to users from the long term repository in a timely and reliable manner
  • Also important is to embed the skills and resources needed to sustain this approach into the future.


Monday, April 24, 2017

Three Keys to Digital Preservation: Management, Technology, and Content.

Three Keys to Digital Preservation: Management, Technology, and Content. Edward Corrado, Heather Moulaison Sandy. ACRL Webinar.  Apr 12, 2017.
     This is a webinar by Edward Corrado and Heather Moulaison Sandy that examines the basics of digital preservation, starting with what it is and what it is not. They then examine three fundamental and interrelated concerns in digital preservation: management, technology, and the content. The webinar also looks at:
  • The life cycle of digital objects
  • Things to know before starting digital preservation projects
  • Preservation techniques designed to endure changes in technology, as well as models and technical resources currently available
Some notes from the webinar:
  • Digital preservation is the active management of digital content over time to ensure ongoing access.
  • Digital objects are mediated by technology
  • It is not possible to leave the digital object alone and expect it to survive
  • By definition, digital preservation is a long-term activity. It requires policies to support this
  • A preservation plan must balance priorities over time
  • The greatest danger to digital materials is that we forget the meaning of them
  • Preservation metadata supports the long-term access and use of content
  • It is important to get content creators on board with preserving and describing the content, since they know the field and the content, and they will potentially be the content users
  • Important steps to take now:
    • Identify and organize content
    • Manage multiple copies of the content
    • Do a risk assessment of your digital operations
    • Document your processes and decisions
Digital preservation is an opportunity that can be both challenging and exciting.

Tuesday, April 18, 2017

Understanding PREMIS

Understanding PREMIS. Priscilla Caplan. Library of Congress Network Development and MARC Standards Office. 2017.
     PREMIS stands for "PREservation Metadata: Implementation Strategies". This document is a relatively brief overview of the PREMIS preservation metadata standard. It can also serve as a "gentle introduction" to the much larger document PREMIS Data Dictionary for Preservation Metadata. PREMIS defines preservation metadata as "the information a repository uses to support the digital preservation process."  Preservation metadata also supports activities "intended to ensure the long-term usability of a digital resource."

The Data Dictionary defines a core set of metadata elements needed to perform preservation functions, so that digital objects can be read from the digital media and displayed or played. For each element it gives a definition, the reason it is part of the metadata, and examples and notes about how the value might be obtained and used. The elements address the information needed to manage files properly and to document any changes made. PREMIS defines only the metadata elements commonly needed to perform preservation functions on the materials to be preserved. The focus is on the repository and its management, not on the content authors or the associated staff, so it can serve as a guide or checklist for those developing or managing a repository or software applications. Some of the information needed:
  • Provenance: The record of the chain of custody and change history of a digital object. 
  • Significant Properties: Characteristics of an object that should be maintained through preservation actions. 
  • Rights: knowing what you can do with an object while trying to preserve it.
The Data Model defines several kinds of Entities:
  • Objects (including Intellectual Entities)
  • Agents
  • Events
  • Rights
PREMIS provides an XML schema that "corresponds directly to the Data Dictionary to provide a straightforward description of Objects, Events, Agents and Rights."
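To give a feel for what such XML looks like, here is a minimal Python sketch that builds a single PREMIS-style Event recording a fixity check on an Object. It assumes the PREMIS version 3 namespace and uses only the standard library. The helper names and identifier values are hypothetical, and the output has not been validated against the official schema, so treat it as an illustration rather than a conformant record.

```python
import xml.etree.ElementTree as ET

# Assumed PREMIS 3 namespace; check it against the schema version you target.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def fixity_event(event_id: str, object_id: str, when: str, outcome: str) -> ET.Element:
    """Build a PREMIS-style Event recording a fixity check on one Object."""
    event = ET.Element(q("event"))
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = "local"
    ET.SubElement(ident, q("eventIdentifierValue")).text = event_id
    ET.SubElement(event, q("eventType")).text = "fixity check"
    ET.SubElement(event, q("eventDateTime")).text = when
    info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(info, q("eventOutcome")).text = outcome
    # Link the Event back to the Object it acted on.
    link = ET.SubElement(event, q("linkingObjectIdentifier"))
    ET.SubElement(link, q("linkingObjectIdentifierType")).text = "local"
    ET.SubElement(link, q("linkingObjectIdentifierValue")).text = object_id
    return event


if __name__ == "__main__":
    record = fixity_event("evt-001", "obj-042", "2017-04-18T10:00:00Z", "success")
    print(ET.tostring(record, encoding="unicode"))
```

The Event links to its Object by identifier rather than nesting inside it, which mirrors the way the Data Model keeps Objects, Events, Agents and Rights as separate entities.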

Monday, April 17, 2017

Rosetta Knowledge Center

Rosetta Knowledge Center. Ex Libris. April 17, 2017.
     One of the things that I like about Rosetta is Ex Libris's commitment to an open system. While the software may be proprietary, the essential content is open. The permanent objects and metadata are stored openly, so that they can be accessed or managed outside of the Rosetta software.

Another area that Ex Libris has opened is their Knowledge Center. This is very helpful in training new employees, learning new things about the software, or refreshing my memory. The open website includes:
  • Product Documentation
  • Training: Learn new skills with tutorials, recorded training and other materials
  • Release Notes about the features and capabilities of each product version
  • Implementation Guides that explain the methodology and requirements
  • Knowledge Articles that answer common questions

Saturday, April 15, 2017

ETD+ Toolkit

ETD+ Toolkit. Dr. Katherine Skinner, et al. Educopia Institute. April 10, 2017.
     Very helpful website for dealing with ETDs. The Toolkit is an open set of six modules to help students create, store, and maintain their research outputs. It was designed to:
  • Help administrators understand the digital research outputs students are creating
  • Help administrators assess what to collect and care for as part of the institutional memory
  • Help students make sure that research outputs are in durable formats and on durable devices;
  • Help students make informed decisions about file formats, documentation, and rights.
The Modules, which include "Learning Objectives, a one-page Handout, a Guidance Brief, a Slideshow with full presenter notes, and an evaluation Survey", are:
  1. Copyright: How can students gain appropriate permissions and how can students signal copyright for their own works?
  2. Data Organization: How can students structure, describe, store, and deposit data and other research files for reuse and/or future access?
  3. File Formats: How will the formats students choose make future access to their research easier or more difficult?
  4. Metadata: How can students store information describing their files to make sure they can tell what they are in the future?
  5. Storage: How can students make well informed choices about where to store their research materials?
  6. Version Control: What mechanisms can students use to make it easier to see the history of a file with multiple versions?
"In a 2014 survey of nearly 800 students across nine universities, students reported that non-PDF files - including research data, video, digital art, and software code - are either as important or more important than the Electronic Thesis and Dissertation (ETD) PDF as research outputs and evidence. Fully 80% of these students are producing non-PDF research outputs, most commonly tabular data (43%), digital images (38%), software code (29%), and digital text (28%)."
The ETD+ Toolkit provides introductory training for data curation and digital longevity techniques. It helps students identify and offset risks and threats to their digital research.

Tuesday, April 11, 2017

It’s not just a word

It’s not just a word. Helen Hockx. Things I cannot say in 140 characters.  April 7, 2017.
     This post talks about her new job: to coordinate and develop a campus-wide strategy, and to oversee its implementation. Digital assets are already managed, but the new role provides an opportunity to revisit the topic and address the gaps.  "A key finding is the strong focus on “now” – archiving and preservation are routinely overlooked. As a result, some digital assets have been lost and some are at risk."  A recommendation, considering "the 3 pillars of policy, process and technology," is to add “digital resources” to the university's goals where superb stewardship is required. Adding the word “digital” or calling out “digital resources” specifically may seem unnecessary to some, but it emphasizes that the university could "do a much better job with digital assets, if we applied the same rigor and coordinated approach." We still have a ways to go with digital archiving and preservation.

"So it is not just a word. Digital assets are a new class of resources which requires active care and management over time.  Adding it to the strategic mix is a recognition of their value, and of digital stewardship as a strategic priority. No, it is not just a word, it will have to come with commitment, ownership and resources." Some day we can remove the word “digital” from our strategic plan, "when preservation of digital assets is embedded in the organisational culture and operations, when there is no need to even mention it."

Monday, April 10, 2017

Encoding and Wrapper Decisions and Implementation for Video Preservation Master Files

Encoding and Wrapper Decisions and Implementation for Video Preservation  Master Files. Mike Casey. Indiana University. March 27, 2017.
     "There is no consensus in the media preservation community on best practice for encoding and wrapping video preservation master files." Institutions preserving video files long term generally choose from three options:
  • 10-bit, uncompressed, v210 codec, usually with a QuickTime wrapper
  • JPEG 2000, mathematically lossless profile, usually with an MXF wrapper
  • FFV1, a mathematically lossless format, with an AVI or Matroska wrapper
The few institutions digitizing and preserving video for the long term are roughly evenly divided among the three options above. This report examines in detail a set of choices and an implementation that has worked well for their institution. Originally they chose the first option, but with recent advances in FFV1, they reopened this decision and initiated a research and review process:
  • Exit strategy research and testing
  • Capture research (use FFmpeg within their system to generate FFV1 files).
  • Comparison of issues
  • Consultation with an outside expert
Results: In their exit strategy research, they were able to move FFV1 files to a lossless codec with no loss of data. They decided to capture using FFmpeg, which required developing a simple capture tool, and they wrote specifications for a minimal capture interface that uses FFmpeg to encode and wrap the video data.
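The report does not give the exact FFmpeg settings used, so the following is only a sketch of what an FFV1/Matroska encoding command can look like, assembled in Python. The options shown (`-level 3`, `-g 1`, `-slices`, `-slicecrc 1`) are commonly used FFV1 encoder settings in FFmpeg; the function name and the default of 16 slices are assumptions, and the options should be checked against `ffmpeg -h encoder=ffv1` for your own build.

```python
import shlex


def ffv1_mkv_command(source: str, target: str, slices: int = 16) -> list[str]:
    """Build a hypothetical ffmpeg argument list: FFV1 video in a Matroska wrapper."""
    return [
        "ffmpeg",
        "-i", source,
        "-c:v", "ffv1",          # FFV1 lossless video codec
        "-level", "3",           # FFV1 version 3, which supports slices and slice CRCs
        "-g", "1",               # every frame is a keyframe (intra-only)
        "-slices", str(slices),  # split each frame into independently coded slices
        "-slicecrc", "1",        # store a CRC for each slice
        "-c:a", "copy",          # keep the audio stream unchanged
        target,                  # a .mkv name selects the Matroska wrapper
    ]


if __name__ == "__main__":
    print(shlex.join(ffv1_mkv_command("tape01.mov", "tape01.mkv")))
```

The list can be passed directly to `subprocess.run(cmd, check=True)`. Building it as a list rather than one shell string avoids quoting problems with unusual file names.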

Technical: they identified a number of key advantages to FFV1, including:
  • roughly 65% less data than a comparable file using the v210 codec
  • open source, non-proprietary, and hardware independent
  • largely designed for the requirements of digital preservation
  • employs CRCs for each frame, allowing any corruption to be associated with a much smaller digital area than the entire file
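The CRC advantage can be illustrated with a toy example: when each frame carries its own checksum, a damaged byte points to one frame rather than to the whole file. This Python sketch applies `zlib.crc32` to whole in-memory frames for simplicity; it is not how FFV1 computes or stores its CRCs (FFV1 version 3 stores them per slice inside the bitstream), and the function names are hypothetical.

```python
import zlib


def frame_crcs(frames: list[bytes]) -> list[int]:
    """Compute a CRC-32 for each frame independently."""
    return [zlib.crc32(frame) for frame in frames]


def damaged_frames(frames: list[bytes], stored: list[int]) -> list[int]:
    """Return the indexes of frames whose CRC no longer matches the stored value."""
    return [i for i, (frame, crc) in enumerate(zip(frames, stored)) if zlib.crc32(frame) != crc]


if __name__ == "__main__":
    frames = [bytes([i]) * 1000 for i in range(10)]  # ten fake "frames"
    stored = frame_crcs(frames)
    frames[7] = b"\x00" + frames[7][1:]              # corrupt one byte of frame 7
    print(damaged_frames(frames, stored))            # [7]
```

A single whole-file checksum would only say that something changed; per-frame (or per-slice) checksums say where, which limits the damage assessment and repair to a small part of the file.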
FFV1 appears to be "trending upwards among developers and cultural heritage organizations engaged in preservation work". They also chose the Matroska wrapper, which is an audiovisual container or wrapper format in use since 2002, and which is a more flexible wrapper option.

As more and more archives undertake video digitization, "they will not accept older and limited formats" (AVI or MOV); instead they will be looking for standards-based, open source options developed specifically for archival preservation. "Both FFV1 and Matroska are open source and are more aligned with preservation needs than some of the other choices and we believe they will see rapidly increasing adoption and further development."

Implementation: They developed a quality control program that checks the FFV1/Matroska preservation master files and validates that the output meets their specification for long-term preservation. These files are viewed using the VLC media player, a free, open source, cross-platform multimedia player that supports FFV1 and Matroska.
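The report does not describe the internals of the quality control program. As a hypothetical illustration of one such check, the Python sketch below reads the JSON report produced by `ffprobe -v error -show_format -show_streams -of json file.mkv` and confirms that the container is Matroska and that every video stream is FFV1. The `EXPECTED` profile and the function name are assumptions; a real QC program would check many more properties.

```python
import json

# Hypothetical target profile; adjust it to your own preservation specification.
EXPECTED = {"codec_name": "ffv1", "container": "matroska"}


def check_master(ffprobe_json: str) -> list[str]:
    """Return the problems found in an ffprobe JSON report (an empty list means pass)."""
    report = json.loads(ffprobe_json)
    problems = []
    # ffprobe reports Matroska files with a format_name such as "matroska,webm".
    container = report.get("format", {}).get("format_name", "")
    if EXPECTED["container"] not in container.split(","):
        problems.append(f"unexpected container: {container!r}")
    video = [s for s in report.get("streams", []) if s.get("codec_type") == "video"]
    if not video:
        problems.append("no video stream found")
    for stream in video:
        if stream.get("codec_name") != EXPECTED["codec_name"]:
            problems.append(f"stream {stream.get('index')}: codec {stream.get('codec_name')!r}")
    return problems
```

Returning a list of problems rather than a single true/false value makes it easy to write a per-file QC log that explains why a master file was rejected.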

Currently, they have created over 38,000 video files using FFV1 and Matroska. "We have chosen two file formats that are open source, developed in part with preservation in mind, and on the road to standardization with tools in active development. We have aligned ourselves with the large and active FFmpeg community rather than a private company. While the future is ultimately unknowable, we believe that this positions us well for long-term preservation of video-based content."

Saturday, April 08, 2017

New Home and Features for Sustainability of Digital Formats Site

New Home and Features for Sustainability of Digital Formats Site.  Kate Murray, Jaime Mears. The Signal. April 6, 2017.
     The Library of Congress web site, Sustainability of Digital Formats, contains "the technical aspects of digital formats with a focus towards strategic planning regarding formats for digital content, especially collection policies." The formats are divided into the type of object, which includes:
  • still image, sound, textual, moving image, web archive, datasets, geospatial and generic formats
The website shows the relationships between formats, the quality and functionality factors for each content category, and these sustainability factors:
  • Disclosure
  • Adoption
  • Transparency
  • Self-documentation
  • External dependencies
  • Impact of patents
  • Technical protection mechanisms
The site has moved to a new home, and it now includes:
  • The PRONOM ID and the Wikidata Title ID, both of which help to document the formats, and 
  • The Library of Congress Recommended Formats Statement
The digital formats site continues to evolve to meet the Library’s and the digital preservation community’s changing needs.

Friday, April 07, 2017

How a Browser Extension Could Shake Up Academic Publishing

How a Browser Extension Could Shake Up Academic Publishing. Lindsay McKenzie. The Chronicle of Higher Education. April 06, 2017
     There are several open-access initiatives. One of them, called Unpaywall, is just a browser extension. Unpaywall is an open-source, nonprofit organization "dedicated to improving access to scholarly research". It has created a browser extension to hopefully do one thing really well: instantly deliver legal, open-access, full text as you browse. "When an Unpaywall user lands on the page of a research article, the software scours thousands of institutional repositories, preprint servers, and websites like PubMed Central to see if an open-access copy of the article is available. If it is, users can click a small green tab on the side of the screen to view a PDF." A legally uploaded open-access copy is delivered to users more than half the time.

"It’s the scientists who wrote the articles, it’s the scientists who uploaded them — we’re just doing that very small amount of work to connect what the scientists have done to the readers who need to read the science." Open-access papers have the information but don’t always look like the carefully formatted articles in academic journals. Some users might not feel comfortable citing preprints or open-access versions obtained through Unpaywall, "without the trappings and formatting of traditional paywalled publishing," even if the copy is credible.

Friday, March 31, 2017

Procuring Digital Preservation: A Briefing

Procuring Digital Preservation: A Briefing.  Digital Preservation Coalition. 21 March 2017.
     Selecting and deploying solutions is especially challenging where the processes are new or the available resources are stretched, and moving from project to ‘business as usual’ can be hard. This may be the case with digital preservation, where new tools, services, and suppliers are emerging rapidly. This requires digital preservation staff to make confident choices between different products. The increasing number and type of choices can lead to ‘information overload,’ and delay the already complicated process. Even organisations that "properly understand their digital preservation needs can be frustrated in solving them, while solution providers have to meet impractical and at times unfeasible expectations."

The Digital Preservation Coalition hosted a briefing day to clarify requirements and help find solutions. The presentations:
  • examine requirements from the perspective of the developer and the collection owner
  • discuss procedures for acquiring a preservation solution
  • discuss case studies and good practices for documenting requirements
  • examine current proprietary and open source solutions for digital preservation
  • allow vendors to explain their own requirements 

Slides from several sessions are available:

Thursday, March 30, 2017

ACRL Closes with Carla Hayden

ACRL Closes with Carla Hayden. Amy Carlton. American Libraries. March 27, 2017.
     Some quotes from the article about libraries, collections, and information:
  • “When we seek information, we examine the privilege of the voices and sources of our information, and we learn to identify whose voices are present and whose voices are missing and how that impacts and influences our understanding of that information.” Margaret Brown-Salazar
  • "Hayden said her goal is to make the Library of Congress’s (LC) priceless collections available to everybody—for LC to live up to its nickname of America’s Library. Obama told her that he went to an exhibit there and saw Lincoln’s reading copy of the Gettysburg Address and the contents of his pockets from the night he was assassinated, but he was pretty sure this access was because of his being president. He told her he wanted someone for the job who could make sure a kid in Baltimore, a person at a public library, a student at a community college, and anyone would be able to see these treasures. “And that’s when I said yes,” she said."
  • “Our materials are nothing without the people and staff. That’s what makes it come alive”
  • “Librarians are having a moment! Trustworthiness is our strength. We should revel in it and be confident in it. If we’re having a moment, let’s seize the moment!”

Wednesday, March 29, 2017

Archives Unlocked vision launched at the Southbank Centre

Archives Unlocked vision launched at the Southbank Centre. Press release. The National Archives. 29 March 2017.
     The National Archives (UK) has launched a vision and action plan to help archives secure their future through digital transformation, investing in new workforce skills, and encouraging innovation. This vision and action plan offers a future where "businesses, creative industries, arts organisations, academia, and communities can fully exploit a more resilient archives sector, with the UK leading the world in digital transformation."  It is built on themes of Trust, Enrichment and Openness, that highlight "the importance of archives in holding authority to account through scrutiny, in driving innovation and creativity for businesses and across society, and in cultivating an open approach to knowledge accessible to all."

The UK's rich national collections of archives "are the nation’s collective memory." An updated vision is needed to sustain the archives for the long term. "The Archives Unlocked action plan embodies this. It sets out what is required to release the power of the archives."

"Working with partners, stakeholders, investors and individuals, we will have greater potential and influence to accomplish what we need to do. The UK will be home to world-leading archives: both digital and physical."