Showing posts with label PREMIS. Show all posts

Tuesday, April 18, 2017

Understanding PREMIS

Understanding PREMIS. Priscilla Caplan. Library of Congress Network Development and MARC Standards Office. 2017.
     PREMIS stands for "PREservation Metadata: Implementation Strategies". This document is a relatively brief overview of the PREMIS preservation metadata standard. It can also serve as a "gentle introduction" to the much larger PREMIS Data Dictionary for Preservation Metadata. PREMIS defines preservation metadata as "the information a repository uses to support the digital preservation process." Preservation metadata also supports activities "intended to ensure the long-term usability of a digital resource."

The Data Dictionary defines a core set of metadata elements needed to perform preservation functions, so that digital objects can be read from the digital media and can be displayed or played. For each element it gives a definition, the reason it is part of the metadata, and examples and notes about how the value might be obtained and used. The elements address information needed to manage files properly and to document any changes made. PREMIS defines only the metadata elements commonly needed to perform preservation functions on the materials to be preserved. The focus is on the repository and its management, not on the content authors or the associated staff, so it can be a guide or checklist for those developing or managing a repository or software applications. Some information needed is:
  • Provenance: The record of the chain of custody and change history of a digital object. 
  • Significant Properties: Characteristics of an object that should be maintained through preservation actions. 
  • Rights: Knowing what you can do with an object while trying to preserve it.
The Data Model defines several kinds of Entities:
  • Objects (including Intellectual Entities)
  • Agents
  • Events
  • Rights
PREMIS provides an XML schema that "corresponds directly to the Data Dictionary to provide a straightforward description of Objects, Events, Agents and Rights."
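
To make the link between the entities and the XML schema more concrete, here is a minimal Python sketch that builds a PREMIS-style record with one Object and one Event that points back to it. The namespace URI, the identifier values, and the helper names are assumptions for illustration. The output leaves out semantic units that the schema requires (for example objectCharacteristics), so it is not schema-valid and should be checked against the Data Dictionary before any real use.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI used by the PREMIS version 3 schema.
PREMIS_NS = "http://www.loc.gov/premis/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def q(tag: str) -> str:
    """Return a namespace-qualified PREMIS tag name."""
    return f"{{{PREMIS_NS}}}{tag}"


def add_identifier(parent, prefix, id_type, id_value):
    """Append a <prefix>Identifier container with its Type and Value children."""
    ident = ET.SubElement(parent, q(f"{prefix}Identifier"))
    ET.SubElement(ident, q(f"{prefix}IdentifierType")).text = id_type
    ET.SubElement(ident, q(f"{prefix}IdentifierValue")).text = id_value
    return ident


def build_premis(object_id, event_id, event_type, when):
    """Build a record holding one file Object and one Event linked to it."""
    root = ET.Element(q("premis"), {"version": "3.0"})

    # The Object entity: here a file, identified by a local identifier.
    obj = ET.SubElement(root, q("object"), {f"{{{XSI_NS}}}type": "premis:file"})
    add_identifier(obj, "object", "local", object_id)

    # The Event entity: what happened, when, and to which Object.
    event = ET.SubElement(root, q("event"))
    add_identifier(event, "event", "local", event_id)
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = when
    add_identifier(event, "linkingObject", "local", object_id)
    return root


# Hypothetical identifiers and timestamp, for illustration only.
record = build_premis("obj-001", "evt-001", "ingestion", "2017-04-18T12:00:00Z")
print(ET.tostring(record, encoding="unicode"))
```

Agents and Rights statements would be added in the same way, each with its own identifier and linking identifiers to the Objects or Events they concern.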

Tuesday, June 28, 2016

Protecting the Long-Term Viability of Digital Composite Objects through Format Migration

Protecting the Long-Term Viability of Digital Composite Objects through Format Migration. Elizabeth Roke, Dorothy Waugh. iPres 2015 Poster. November, 2015.
     The poster discusses work done at Emory University’s Manuscript, Archives, and Rare Book Library to "review policy on disk image file formats used to capture and store digital content in our Fedora repository". The goal was to migrate existing disk images to formats more suitable for long-term digital preservation. Trusted Repositories Audit & Certification (TRAC) requires that digital repositories monitor changes in technology so that they can respond to them. Advanced Forensic Format offered a good solution for capturing forensic disk images along with disk image metadata, but it has been replaced by Libewf by Joachim Metz, a library of tools to access the Expert Witness Compression Format (EWF). They have decided to acquire raw disk images or, when that is not possible, to use tar files, because raw disk images may be less vulnerable to obsolescence.

In attempting to migrate formats, they had to develop methods for migrating the files and set up the repository to accept the new files. They also rely on PREMIS metadata. The migration of disk images from a proprietary or unsupported format to a raw file format has made it easier for them to manage and preserve these objects and mitigates the threat of obsolescence for the near term. There have been some consequences: some metadata is no longer available, the process will be more complicated and require other workflows, and files will no longer contain embedded metadata. "The migration to a raw file format has made the digital file itself easier to preserve."
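
Because a raw image no longer carries embedded metadata, a record of the migration has to be kept outside the file. The following Python sketch shows one way such a record might look. It is not the workflow used at Emory: the function names, file names, dictionary layout, and the choice of SHA-256 are all assumptions, and only the key names borrow from PREMIS semantic units.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def record_migration(source_name, source_bytes, target_name, target_bytes, dropped_metadata):
    """Describe one format migration as a PREMIS-like event dictionary.

    Checksums of both files are stored so that each can be verified later,
    and metadata that did not survive the migration is noted explicitly.
    """
    return {
        "eventType": "migration",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "source": {"name": source_name, "sha256": sha256_hex(source_bytes)},
        "outcome": {"name": target_name, "sha256": sha256_hex(target_bytes)},
        "eventOutcomeDetailNote": "Metadata not carried over: " + ", ".join(dropped_metadata),
    }


# Hypothetical file names and contents, for illustration only.
event = record_migration(
    "disk01.aff", b"proprietary-bytes",
    "disk01.raw", b"raw-bytes",
    ["examiner name", "acquisition notes"],
)
print(event["eventType"], event["outcome"]["name"])
```

Keeping the list of dropped metadata in the event record makes the loss visible to later users instead of leaving it implicit in the change of format.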



Friday, July 17, 2015

Filling the Digital Preservation Gap. A Jisc Research Data Spring project. Phase One report - July 2015

Filling the Digital Preservation Gap. A Jisc Research Data Spring project. Phase One report - July 2015. Jenny Mitcham, et al. Jisc Report. 14 July 2015.
     Research data is a valuable institutional asset and should be treated accordingly. This data is often unique and irreplaceable. It needs to be kept to validate or verify conclusions recorded in publications. Preservation of the data in a usable form may be required by the research funders, publishers, or universities. The research data should be preserved and available for others to consult after the project that generated it is complete. This means the research data needs to be actively managed and curated. "Digital preservation is not just about implementing a good archival storage system or ‘preserving the bits’ it is about working within the framework set out by international standards (for example the Open Archival Information System) and taking steps to increase the chances of enabling meaningful re-use in the future."

Accessing research data is clearly already a problem for researchers when formats and media become obsolete. A 2013 survey showed that 25% of respondents had encountered the “Inability to read files in old software formats on old media or because of expired software licences”. A digital preservation program should address these issues. Archivematica, which the report recommends, is based on the Open Archival Information System and uses standards such as PREMIS and METS to store metadata about the objects that are being preserved. A digital preservation system such as the one recommended in the report would consist of a variety of different systems performing different functions within the workflow. "Archivematica should not be seen as a magic bullet. It does not guarantee that data will be preserved in a re-usable state into the future. It can only be as good as digital preservation theory and practice is currently and digital preservation itself is not a fully solved problem."

Research data is particularly challenging from a preservation point of view because of the many data types and formats. For many of these formats no digital preservation tools and policies exist, so they will not receive as high a level of curation when ingested into Archivematica.
The rights metadata within Archivematica may not fit the granularity that would be required for research data. This information would need to be held elsewhere within the infrastructure.

The value of research data can be subjective and difficult to assess and there may be disagreement on the value of the data. However, the bottom line is "in order to comply with funder mandates, publisher requirements and institutional policies, some data will need to be retained even if the researchers do not believe anyone will ever consult it." Knowing the types of formats used is a key to digital archiving and planning, and without that there will be problems later. In the OAIS Reference Model, information about file formats needs to be part of the ‘Representation Information’ that an end user must have to open and view a file.
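
As a rough illustration of what "knowing the types of formats used" can mean in practice, the Python sketch below counts files by extension to give a first format profile of a deposit. The function name and sample file names are hypothetical. An extension is only a weak hint of the actual format, so a real repository would rely on signature-based identification tools rather than on this shortcut.

```python
from collections import Counter
from pathlib import PurePath


def format_profile(filenames):
    """Count files per lower-cased extension.

    Files without an extension are grouped under "(none)".
    """
    counts = Counter()
    for name in filenames:
        suffix = PurePath(name).suffix.lower()
        counts[suffix or "(none)"] += 1
    return counts


# Hypothetical deposit listing, for illustration only.
sample = ["results.CSV", "run1.csv", "notes.txt", "model.sav", "README"]
profile = format_profile(sample)
for ext, n in profile.most_common():
    print(f"{ext}\t{n}")
```

Even a crude profile like this shows which formats dominate a collection and which are rare enough to need individual attention.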

Saturday, June 20, 2015

PREMIS Data Dictionary for Preservation Metadata, Version 3.0

PREMIS Data Dictionary for Preservation Metadata, Version 3.0. Library of Congress.
June 10, 2015. [Full PDF]
The PREMIS Data Dictionary and its supporting documentation is a comprehensive, practical resource for implementing preservation metadata in digital archiving systems. The Data Dictionary is built on a data model that defines five entities: Intellectual Entities, Objects, Events, Rights, and Agents. Each semantic unit defined in the Data Dictionary is a property of one of the entities in the data model.

The new publications are:
  • PREMIS Data Dictionary. Version 3.0. This is the full document which includes the PREMIS Introduction, the Data Dictionary, Special Topics, and Glossary.
  • PREMIS Data Dictionary. This document has only the Data Dictionary, without the introductory materials.
  • Hierarchical Listing of Semantic Units: PREMIS Data Dictionary, Version 3.0
  • The Version 3.0 PREMIS Schema is not yet available
Version 3 of the Data Dictionary includes some major changes and additions to the Dictionary, which are:
  • Reposition Intellectual Entity as a category of Object to enable additional description within PREMIS and linking to related PREMIS entities.
  • Reposition Environments (i.e. hardware and software needed to use digital objects) so that they can be described and preserved reusing the Object entity. That is to say, they can be described as Intellectual Entities and preserved as Representation, File or Bitstream Objects.
  • Add physical Objects to the scope of PREMIS so that they can be described and related to digital objects.
  • Add a new semantic unit to the Object entity: preservationLevelType (O, NR) to indicate the type of preservation functions expected to be applied to the object for the given preservation level.
  • Add a new semantic unit to the Agent entity to express the version of software Agents: agentVersion (O, NR).
  • Add a new semantic unit to the Event entity: eventDetailInformation (O, R)

There are major additions in the “PREMIS Data Model” and “Environment” sections.
The entities in the PREMIS data model are:
  • Object: a unit subject to digital preservation. This can now be an environment.
  • Environment: technology supporting a Digital Object. Can now be described as an Intellectual Entity.
  • Event: an action concerning an Object or Agent associated with the preservation repository.
  • Agent: entity associated with Rights, Events, or an environment Object.
  • Rights Statement: Rights or permissions pertaining to an Object and/or Agent.
With the advent of Intellectual Entities in PREMIS 3.0, environments have been transformed. "Before version 3.0, there was an environment container within an Object that described the environment supporting that Object. If a non-environment Object needs to refer to an environment, it is now recommended that the environment is described as an Object in its own right and the two Objects are linked with a dependency relationship."
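
The recommendation quoted above, an environment described as an Object in its own right and linked by a dependency relationship, can be sketched as a small data structure. This is an illustrative Python model, not the PREMIS schema: the class names, field names, identifiers, and the "requires" subtype are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Relationship:
    relationship_type: str     # e.g. "dependency"
    relationship_subtype: str  # e.g. "requires" (assumed value)
    related_object_id: str     # identifier of the Object pointed to


@dataclass
class PremisObject:
    identifier: str
    category: str              # "intellectual entity", "representation", "file" or "bitstream"
    description: str = ""
    relationships: List[Relationship] = field(default_factory=list)


def required_environments(obj: PremisObject, catalogue: Dict[str, PremisObject]) -> List[PremisObject]:
    """Return the Objects that obj depends on, looked up by identifier."""
    return [
        catalogue[rel.related_object_id]
        for rel in obj.relationships
        if rel.relationship_type == "dependency"
    ]


# Hypothetical objects: a viewer described as an environment, and a file that needs it.
viewer = PremisObject("env-001", "intellectual entity", "Image viewer software")
scan = PremisObject(
    "obj-042", "file", "Scanned map",
    relationships=[Relationship("dependency", "requires", "env-001")],
)
catalogue = {o.identifier: o for o in (viewer, scan)}
print([env.description for env in required_environments(scan, catalogue)])
```

Because the environment is an ordinary Object, it can carry its own identifiers, events, and rights, and several files can point to the same environment without repeating its description.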

Friday, June 19, 2015

Digital Preservation Metadata and Improvements to PREMIS in Version 3.0

Digital Preservation Metadata and Improvements to PREMIS in Version 3.0. Angela Dappert. May 27, 2015. [PDF]
These are the notes from a DCMI/ASIS&T joint webinar about PREMIS v. 3. The PDF document has 63 slides, which give an overview of why digital preservation metadata is needed, show examples of digital preservation metadata, show how PREMIS can be used to capture this metadata, and show some of the changes in version 3.0.

Digital preservation metadata is the metadata needed to ensure long-term accessibility of digital resources. Digital objects must be self-descriptive independently from the systems that were used to create them. PREMIS is the de facto standard for metadata to support the preservation of digital objects and ensure their long-term usability. It is a common data model for organizing and thinking about preservation metadata, or for exchanging information packages between repositories. It is not an out-of-the-box solution, nor does it cover all the metadata needed.

Wednesday, June 03, 2015

PREMIS Data Dictionary for Preservation Metadata: Approved Changes for version 3.0

PREMIS Data Dictionary for Preservation Metadata: Approved Changes for version 3.0. Library of Congress.  November 24, 2014.
The following changes were approved for PREMIS version 3.0:

  • Make Intellectual Entity another category of Object.
  • Define preservationLevelType to indicate the type of preservation functions expected to be applied to the object for the given preservation level.
  • Add agentVersion to record the version of software Agents.
  • Change the data model so that Environments can be described and preserved reusing the Object entity.
  • Allow physical Objects to be described as representations and related to digital objects.
  • Add a value of unknown to compositionLevel and format for cases where the information is not available.


Thursday, February 19, 2015

ArchivesDirect hosted service

ArchivesDirect website. February 18, 2015.
ArchivesDirect is a web-based hosted service of Archivematica offered by DuraSpace for creating OAIS-based digital preservation workflows with content packages that are archived with DuraCloud and Amazon Glacier. It includes open source preservation tools, and generates archival packages using microservices, PREMIS, and METS XML files. ArchivesDirect is intended for small to mid-sized institutions. DuraSpace is a partnership with DSpace, Fedora, and VIVO.

Pricing and subscription plans include:
ArchivesDirect Standard (System, training, 1 TB): $11,900
ArchivesDirect Digital Preservation Assessment: $4,500
Additional Storage in Amazon S3 and Glacier: $1,000/TB/year

Saturday, February 07, 2015

Digital Preservation Coalition publishes ‘OAIS Introductory Guide (2nd Edition)’ Technology Watch Report

Digital Preservation Coalition publishes ‘OAIS Introductory Guide (2nd Edition)’ Technology Watch Report. Brian Lavoie.  Digital Preservation Coalition. Watch Report. October, 2014. [PDF]

The report describes the OAIS, its core principles and functional elements, as well as the information model which supports long-term preservation, access and understandability of data. The OAIS reference model was approved in 2002 and revised and updated in 2012. Perhaps “the most important achievement of the OAIS is that it has become almost universally accepted as the lingua franca of digital preservation”.

The central concept in the reference model is that of an open archival information system. An OAIS-type archive must meet a set of six minimum responsibilities to do with the ingest, preservation, and dissemination of archived materials. Its functional model is made up of six entities: Ingest, Archival Storage, Data Management, Preservation Planning, Access, and Administration. There are also Common Services, which consist of basic computing and networking resources.

An OAIS-type archive interacts with three types of entities: Management, Producer, and Consumer. Consumers include the Designated Community: the consumers expected to independently understand the archived information in the form in which it is preserved and made available by the OAIS. The reference model is also a framework to encourage dialogue and collaboration among participants in standards-building activities, as well as to identify areas most likely to benefit from standards development.

An OAIS-type archive is expected to:
  • Negotiate for and accept appropriate information from information producers;
  • Obtain sufficient control of the information in order to meet long-term preservation objectives;
  • Determine the scope of the archive’s user community;
  • Ensure the preserved information is independently understandable to the user community;
  • Follow documented policies and procedures to ensure the information is preserved against all reasonable contingencies;
  • Make the preserved information available to the user community, and enable dissemination of authenticated copies of the preserved information.
An OAIS should be committed to making the contents of its archival store available to its intended user community, through access mechanisms and services which support users’ needs and requirements. Such requirements may include preferred medium and access channels, and any access restrictions should be clearly documented.

The OAIS information model is built around the concept of an information package, which comes in three variants: the Submission Information Package, the Archival Information Package, and the Dissemination Information Package. Preservation requires metadata to support and document the OAIS’s preservation processes, called Preservation Description Information, which ‘is specifically focused on describing the past and present states of the Content Information, ensuring that it is uniquely identifiable, and ensuring it has not been unknowingly altered’. Preservation Description Information consists of:
  • Reference Information (identifiers)
  • Context Information (describes relationships among information and objects)
  • Provenance Information (history of the content over time)
  • Fixity Information (verifying authenticity)
  • Access Rights Information (conditions or restrictions)
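
The five components listed above can be pictured as fields of a record that travels with the content. The Python sketch below is only an illustration of that idea, since OAIS itself prescribes no implementation. The class name, field names, sample values, and the use of SHA-256 for fixity are assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class PreservationDescriptionInformation:
    reference: str          # identifier of the content
    context: str            # relationship to other information and objects
    provenance: List[str]   # history of the content over time
    fixity_sha256: str      # digest recorded when the content was archived
    access_rights: str      # conditions or restrictions


def fixity_ok(content: bytes, pdi: PreservationDescriptionInformation) -> bool:
    """Return True if the content still matches the digest stored in the PDI."""
    return hashlib.sha256(content).hexdigest() == pdi.fixity_sha256


# Hypothetical content and description, for illustration only.
content = b"archived report, version 1"
pdi = PreservationDescriptionInformation(
    reference="archive:0001",
    context="Part of the 2014 annual reports series",
    provenance=["received from producer", "ingested"],
    fixity_sha256=hashlib.sha256(content).hexdigest(),
    access_rights="open",
)
print(fixity_ok(content, pdi))            # unchanged content
print(fixity_ok(content + b"!", pdi))     # altered content
```

A check like this detects that the bits have changed. It does not say who changed them or why, which is what the provenance entries are for.
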
OAIS is a model and not an implementation. It does not address system architectures, storage or processing technologies, database design, computing platforms, or other technical details of setting up a functioning archival system. But it has been used as a foundation or starting point. Efforts, such as TRAC, have been made to put the attributes of a trusted digital archive into a ‘checklist’ that could be used to support a certification process. PREMIS is a preservation metadata initiative that has emerged as the de facto standard. METS, an XML-based document format, has become widely used for encoding OAIS archival information packages.

The ‘OAIS reference model provides a solid theoretical basis for digital preservation efforts, though theory and practice can sometimes have an uneasy fit.’




Monday, May 27, 2013

National Library of Australia’s Digital Preservation Policy

Digital Preservation Policy 4th Edition (2013). National Library of Australia.  May 26, 2013.
This site outlines the National Library of Australia’s policy on preserving its digital collections, and collaborating with others to preserve digital information resources. The primary objective of their digital preservation activities is maintaining the ability to meaningfully access digital collection content over time. The primary concern is preserving the ability to access the Preservation Master File from which derivative files may be created or re-created over time. To this end, preservation of digital library material includes:
  • Bit-level preservation of all digital objects, i.e. keeping the original files intact;
  • Ensuring that authenticity and provenance are maintained;
  • Ensuring that appropriate preservation information is maintained;
  • Understanding and reporting on risks which affect ongoing access;
  • Performing appropriate actions to ensure that objects remain accessible;
  • Periodic review of preferred formats and digital metadata standards.
Preservation of the Library's digital collections involves four main goals:
  1. Maintaining access to reliable data at bit-stream level;
  2. Maintaining access to content encoded in the bit streams;
  3. Maintaining access to the intended content; and
  4. Maintaining the stated preservation intent for all digital material over time.
While specific preservation activities may focus on one or more of these goals, the Library’s preservation responsibility is only fulfilled when all four goals have been adequately addressed.

The Library uses the concepts in the Open Archival Information System (OAIS) Reference Model and other international standards and best practices, such as PREMIS and Open Planets Foundation.