Showing posts with label DPN. Show all posts

Thursday, December 13, 2018

Why Is the Digital Preservation Network Disbanding? Lessons from organizational challenges

Why Is the Digital Preservation Network Disbanding? Roger C. Schonfeld. The Scholarly Kitchen; Society for Scholarly Publishing. Dec 13, 2018.
     "The long term stewardship of digital objects and collections through digital preservation is an essential imperative for scholarship and society. Yet its value is intangible and its rewards are deferred. It falls on organizations to invest in preservation, often less out of a sense of anticipated exclusive returns and more out of a sense of contributing to a community mission." It is essential that we discuss the lessons we can learn from organizational challenges.

DPN was a commitment to replicate the data of research and scholarship across diverse environments and to enable existing preservation capacity. It offered an elegant technical solution but the product offering was never as clear as it could have been, and ultimately could not be sustained. Most DPN members did not use the network services and membership declined. Some patterns emerged: 
  • Not every storage need requires a preservation solution, and the members were "in some cases, unsuccessful in distinguishing the added value of a preservation solution from cloud storage."
  • Many library systems were not originally prepared to support DPN’s ingest workflow. For a number of members, the content to be preserved was spread across servers and systems, often with limited curatorial control. 
  • The product definition took too long to emerge and the value proposition was not uniformly understood.
  • DPN’s pricing model did not generate the revenue that DPN’s model anticipated. 
  • Some libraries signed up more out of courtesy or community citizenship than commitment.
  • Membership models are ill-suited to product organizations and marketplace competition.  
There are broader implications in the disbandment of DPN. The article states that  DPN will not be the last closure, merger, or other reorganization. "It seems clear that we are in a period of instability for collaborative library community efforts and more major changes are surely on the horizon."

Wednesday, December 05, 2018

Digital Preservation Network (DPN) Sunset

Community Announcement - DPN Sunset. Digital Preservation Network. December 4, 2018.
     The Board of Trustees of the Digital Preservation Network (DPN) is ending DPN. The DPN Board "determined that it is not feasible to design and implement changes that would ensure sustainability."
"The landscape of digital preservation services has changed considerably in the past six years, as have the community’s preservation needs. Our highest priority is to affect an orderly sunset for the organization’s operations and for the disposition of its deposits."

The closure of a community-based organization that provided long-term digital preservation storage highlights the numerous challenges of maintaining digital resources over the long term.


Friday, August 18, 2017

Evaluating Your DPN Metadata Approach

Evaluating Your DPN Metadata Approach.  DPN Preservation Metadata Standards Working Group. July 27, 2017. [PDF, 6 pp.]
     This brief guide can help determine a clear metadata approach to recovering data "in the far future among unpredictable circumstances". The document can help users create a sound approach to preserving their institution’s data and make decisions that fit their own institutional needs.

The first section is:
What information is needed to understand and contextualize an object? It examines both descriptive and structural metadata.

Descriptive Metadata: for the purpose of identification and discovery of an object. Dublin Core, MODS and VRA Core are common standards used for descriptive metadata.

Structural Metadata: describes relationships between objects, such as pages in a book. The METS Structural Map can express  hierarchical relationships or parent/child relationships. The PREMIS "relationship" element can express version relationships.
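As a rough illustration of the "pages in a book" case, the sketch below builds a small METS structural map with Python's standard library. The book label, the file IDs, and the helper name `build_struct_map` are hypothetical; the guide itself does not prescribe this layout, and a real METS document would also need a file section that declares the referenced files.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def build_struct_map(book_label, page_file_ids):
    """Return a METS structMap: one parent div for the book, one child div per page."""
    struct_map = ET.Element(f"{{{METS_NS}}}structMap", {"TYPE": "physical"})
    book = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", {"TYPE": "book", "LABEL": book_label}
    )
    for order, file_id in enumerate(page_file_ids, start=1):
        page = ET.SubElement(
            book, f"{{{METS_NS}}}div", {"TYPE": "page", "ORDER": str(order)}
        )
        # fptr points at a file that would be declared in the METS fileSec.
        ET.SubElement(page, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return struct_map


# Hypothetical two-page book; the file IDs are placeholders.
struct_map = build_struct_map("Sample book", ["FILE_0001", "FILE_0002"])
xml_text = ET.tostring(struct_map, encoding="unicode")
print(xml_text)
```

The nesting of the page divs inside the book div is what carries the parent/child relationship, and the ORDER attribute records the page sequence.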

The document also looks at how to:
  • understand and contextualize a collection; 
  • connect/relate objects to a collection; 
  • connect/relate versions to each other; 
  • connect metadata records to associated objects and collections;
  • ensure the authenticity of an object;
  • ensure the essential characteristics of the original are maintained in a data migration.

Thursday, August 17, 2017

DPN: Metadata Considerations for Deposits

Metadata Considerations for Deposits. DPN. August 2017.
     The Digital Preservation Network working groups have provided an overview of the types of metadata to consider while preparing deposits for DPN. Several areas are addressed:
  1. DPN-specific metadata, especially DPN’s BagIt specification, Tag Directories and Bag Structure.
  2. DuraCloud-specific metadata. DuraCloud does not restrict metadata, but they "indicate that local policies should be used to define metadata approaches". Each snapshot contains four DuraCloud-created files: checksums (md5, sha256), a content properties file, and a collection-snapshot file.
  3. Core descriptive metadata records. The DPN Preservation Metadata Standards Working Group examined minimal metadata records from a variety of member institutions to find common metadata schemas. This resulted in  a “core record,” or the "minimum level of information needed in order to understand digital assets at a later date," shown in a clear chart.
  4. Significant properties of content. "In order for digital files to be usable and accessible in the long-term, it is important to recognize the importance of significant properties and to ensure that the properties of your digital materials are being documented in some form." They list content types, with examples of common significant properties. 
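To make the checksum files mentioned above more concrete, here is a minimal sketch that writes md5 and sha256 manifests for the files under a `data/` folder, using the "checksum, two spaces, relative path" line format found in generic BagIt manifests. It is an assumption-laden illustration, not DPN's bag profile or DuraCloud's snapshot format: the folder layout, the file names, and the helper names `file_digest` and `write_manifests` are all hypothetical, and a complete bag needs further tag files.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifests(bag_dir, algorithms=("md5", "sha256")):
    """Write manifest-<algorithm>.txt with one checksum line per payload file."""
    bag_dir = Path(bag_dir)
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    for algorithm in algorithms:
        lines = [
            f"{file_digest(p, algorithm)}  {p.relative_to(bag_dir).as_posix()}"
            for p in payload
        ]
        (bag_dir / f"manifest-{algorithm}.txt").write_text(
            "\n".join(lines) + "\n", encoding="utf-8"
        )


# Demonstration on a throwaway directory with a single payload file.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "example_bag"
    (bag / "data").mkdir(parents=True)
    (bag / "data" / "hello.txt").write_bytes(b"hello\n")
    write_manifests(bag)
    sha256_manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    md5_manifest = (bag / "manifest-md5.txt").read_text(encoding="utf-8")

print(sha256_manifest, end="")
```

Recomputing the same digests later and comparing them with the stored manifest is the basic fixity check that these files make possible.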

Thursday, December 15, 2016

DPN and uploading to DuraCloud Spaces

DPN and uploading to DuraCloud Spaces. Chris Erickson. December 15, 2016.
     For the past while we have been uploading preservation content into DuraCloud as the portal to DPN. DuraCloud can upload files by drag-and-drop, but a better way is the DuraCloud Sync Tool. (The wiki had helpful information on setting this up.) The sync tool can copy files from any number of local folders to a DuraCloud Space, and can add, update, and delete files. I preferred running the GUI version in one browser window with the DuraCloud account open in another.

We have been reviewing all of our long term collections and assigning Preservation Priorities, Preservation Levels, and also the number of Preservation Copies. From all this we decided on three collections to add to DPN, and created a Space (which goes into an Amazon bucket) for each. The Space will then be processed into DPN:
  1. Our institutional repository, including ETDs, which are now born digital, and research information. From our ScholarsArchive repository.
  2. Historic images that have been scanned; the original content is either fragile or not available. Exported from Rosetta Digital Archive.
  3. University audio files; the original content was converted from media that is at risk. Some from hard drives, others exported from Rosetta Digital Archive.
Some of the files were already in our Rosetta preservation archive, and some were in processing folders ready to be added to Rosetta. They all had metadata files with them. The sync tool worked well for uploading these collections: we configured the source location as the Rosetta folders and the target as the corresponding DuraCloud Space. Initially, uploading was extremely slow, taking several days to load 200 GB. But DuraCloud support provided a newer, faster version of the sync tool, and we changed to a faster connection. The upload threads changed from 5 to 26 and we uploaded the next TB in about a day.
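For a sense of scale, the upload rates can be turned into rough throughput figures. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and, because the post only says "several days", uses an illustrative value of 3 days for the slow upload. That number is an assumption, not a measurement from the post.

```python
SECONDS_PER_DAY = 86_400


def throughput_mb_per_s(total_bytes, days):
    """Average throughput in decimal megabytes per second."""
    return total_bytes / (days * SECONDS_PER_DAY) / 1_000_000


# Assumption: "several days" is taken as 3 days purely for illustration.
ASSUMED_SLOW_DAYS = 3

slow = throughput_mb_per_s(200 * 10**9, ASSUMED_SLOW_DAYS)  # 200 GB, old tool
fast = throughput_mb_per_s(10**12, 1)                       # 1 TB in about a day

print(f"slow: {slow:.2f} MB/s, fast: {fast:.2f} MB/s, speedup: {fast / slow:.0f}x")
```

One terabyte in a day works out to roughly 11.6 MB/s sustained. The speedup figure depends entirely on the assumed number of days for the first upload.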

We also had a very informative meeting with DPN and the two other universities in Utah that are DPN members, where Mary and Dave told us that the price per TB was now half the original cost. Also, that unused space could be carried over to the next year. This will be helpful in planning additional content to add. Instead of replicating our entire archive in DPN, we currently have a hierarchical approach, based on the number and location of copies, along with the priorities and preservation levels.


Monday, December 05, 2016

Digital Preservation Network - 2016

Digital Preservation Network - 2016. Chris Erickson. December 5, 2016.
     An overview of  the reason for DPN. Academic institutions require that their scholarly histories, heritage and research remain part of the academic record. This record needs to continue beyond the life spans of individuals, technological systems, and organizations. The loss of academic collections that are part of these institutions could be catastrophic. These collections, which include oral history collections, born digital artworks, historic journals, theses, dissertations, media and fragile digitizations of ancient documents and antiquities are irreplaceable resources.

DPN is structured to preserve the stored content by using diverse geographic, technical, and institutional environments. The preservation process consists of:
  1. Content is deposited into the system through an Ingest Node, which is itself a preservation repository;
  2. Content is replicated to at least two other Replicating Nodes and stored in different types of repository infrastructures; 
  3. Content is checked by bit auditing and repair services to prevent change or loss; 
  4. Changed or corrupted content is restored by DPN; 
  5. As Nodes enter and leave DPN, preserved content is redistributed to maintain the continuity of preservation services into the far-future.
The Ingest Node that we are using is through DuraCloud.
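The replicate, audit, and repair steps above can be sketched in a few lines. This is a toy model under stated assumptions, not DPN's software: the `Node` class, the function names, and the in-memory storage are hypothetical, and SHA-256 is chosen here only as a familiar fixity algorithm.

```python
import hashlib


def sha256_of(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


class Node:
    """Hypothetical stand-in for a repository node; keeps objects in memory."""

    def __init__(self, name):
        self.name = name
        self.objects = {}


def ingest(ingest_node, replica_nodes, object_id, data):
    """Store on the ingest node, copy to each replicating node, return the checksum."""
    if len(replica_nodes) < 2:
        raise ValueError("need at least two replicating nodes")
    for node in [ingest_node, *replica_nodes]:
        node.objects[object_id] = data
    return sha256_of(data)


def audit_and_repair(nodes, object_id, expected):
    """Recompute every copy's checksum and overwrite bad copies from an intact one."""
    def intact(node):
        return object_id in node.objects and sha256_of(node.objects[object_id]) == expected

    good = [node for node in nodes if intact(node)]
    if not good:
        raise RuntimeError("no intact copy left to repair from")
    repaired = []
    for node in nodes:
        if not intact(node):
            node.objects[object_id] = good[0].objects[object_id]
            repaired.append(node.name)
    return repaired


nodes = [Node("ingest"), Node("replica-a"), Node("replica-b")]
expected = ingest(nodes[0], nodes[1:], "obj-1", b"scholarly record")
nodes[1].objects["obj-1"] = b"scholarly rec0rd"  # simulate silent corruption
repaired = audit_and_repair(nodes, "obj-1", expected)
print(repaired)
```

Keeping more than one independent copy is what makes repair possible: a damaged copy can only be detected by its checksum and replaced if another node still holds an intact one.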


Monday, May 09, 2016

Looking Across the Digital Preservation Landscape

Looking Across the Digital Preservation Landscape. Margaret Heller. ACRL TechConnect Blog. April 25, 2016.
     "When it comes to digital preservation, everyone agrees that a little bit is better than nothing." The article cited refers to two presentations from Code4Lib 2016, “Can’t Wait for Perfect: Implementing “Good Enough” Digital Preservation” by Shira Peltzman and Alice Sara Prael, and “Digital Preservation 101, or, How to Keep Bits for Centuries” by Julie Swierczek. This article mentions two major items about digital preservation:
  1. Digital preservation doesn’t have to be hard, but it does have to be intentional.
  2. Digital preservation requires institutional commitment. 
Understanding all the basic issues and what your options are can be daunting. They had a committee that started by examining born digital materials, but expanded the focus to all digital materials because it made it easier to test their ideas. Some of the tasks they accomplished included creating a rough inventory of digital materials, writing a workflow manual, and securing networked storage to replace all removable hard drives used for backups. "While backups aren’t exactly digital preservation, we wanted to at the very least secure the backups we did have". The inventory and workflow manual are living documents and are useful for identifying gaps in the processes.

They also looked at the end-to-end systems available for digital preservation, such as Preservica, ArchivesDirect, and Rosetta. Migrating from one system to another if you change your mind may involve some very difficult processes, so people may tend to stay with their providers. Another option is to join a preservation network, such as the Digital Preservation Network (DPN) or APTrust, which have the larger preservation goal of ensuring long-term access to material even if the owning institution disappears.

Sustainable Financing for many is the crux of the digital preservation problem. "It’s possible to do a sort of ok job with digital preservation for nothing or very cheap, but to ensure long term preservation requires institutional commitment for the long haul, just as any library collection requires."

Digital preservation has been receiving more attention lately, and hopefully more libraries will see it as a priority.

Tuesday, March 15, 2016

The Digital Preservation Network (DPN) Has Launched and Is Accepting Content

The Digital Preservation Network (DPN) Has Launched and Is Accepting Content.  Mary Molinaro. D-Lib Magazine. March/April 2016.
    Several years ago a group of academic leaders examined the risk to future scholars if the digital output from academia is not properly preserved and felt that the risk of loss was very high if nothing was done to protect against natural disasters, technological failure, or institutional failure. They pledged to create a large-scale digital preservation service that is built to last beyond the life spans of individuals, technological systems, and organizations. After three years of work, the resulting Digital Preservation Network is open and is accepting content from members. Five preservation repositories make up the DPN network. They have varying technical architectures and replicate content and perform services to safeguard the content. Content from member institutions can be added to DPN through two sites: DuraCloud Vault and the Academic Preservation Trust. The deposited content is replicated to the other nodes (HathiTrust, the Texas Preservation Node, and the Stanford Digital Repository).

DPN operates as an independent organization under the umbrella of Internet2 and is currently examining ways to open up DPN to other kinds of members.  More information is available at the DPN website.


Tuesday, February 16, 2016

A Digital ‘Library of Alexandria’

A Digital ‘Library of Alexandria’. Katie McNally. UVAToday. February 10, 2016.
     "Scholars often lament the knowledge that might have been preserved if the great Library of Alexandria had been better protected."  Digital collections face a similar threat of "steady extinction" because of technological obsolescence. One safeguard is the Academic Preservation Trust (APTrust) which is "a large-scale solution that preserves digital scholarship by storing it across multiple technologies and physical locations." The primary goal is to package and preserve information in a way so it will be accessible to future generations.  Besides proper description, “Deep dark preservation” refers to all the pieces needed to "effectively archive a digital file and the technology it runs on for future use."  So the digital preservation is really a phased thing: address those items which can be done quickly, then work on the more difficult problems, such as finding ways to preserve the software that makes digital objects accessible. Emulation environments are being worked on to keep old software running. 

APTrust stores files in two separate Amazon data centers, one in Virginia and one in Oregon, and each of these uses different technologies to store the data, to help protect against the "failure of future and modern technologies." [APTrust is also one of the DPN nodes.]

Monday, January 25, 2016

Figshare Joins the Digital Preservation Network

Figshare Joins the Digital Preservation Network to ensure survival, ownership and management of research data into the future. Carol Minton Morris.  DuraSpace. January 20, 2016.
     "Figshare, a platform that supports the management of research content, is the first research data repository to join the DPN Federation". The research data from Figshare will be deposited in DPN through the DuraSpace DuraCloud Vault node and this will provide long-term access to scholarly resources.

Thursday, December 10, 2015

The Digital Preservation Network (DPN) Explained

The Digital Preservation Network (DPN) Explained. DuraSpace.org. December 8, 2015.
     The DPN digital preservation service guarantees academic institutions that scholarly resources will survive into the “far-future”. DPN is "the only large-scale digital preservation service that is built to last beyond the life spans of individuals, technological systems, and organizations". Like insurance, DPN provides a guarantee that future access to scholarly resources will be available in the event of any type of change in administrative or physical institutional environments. This is possible by establishing a redundant and varied technical and legal infrastructure at multiple administrative levels. DPN is a scholarly “dark archive” which means that the content stored is not actively used or accessed, but that it can be made available for use at any time from multiple digital storage facilities.

Academic institutions require that key aspects of their scholarly histories, heritage and research remain part of the record of human endeavor. DPN members will begin adding digital assets to the network through DuraCloud Vault, a cooperative development between DPN, DuraSpace and Chronopolis which will serve as the primary ingest point beginning in January.


Tuesday, October 06, 2015

The Digital Preservation Network (DPN) at the DLF Forum

The Digital Preservation Network (DPN) at the DLF Forum. Evviva Weinraub. Digital Library Federation website. October 1, 2015.
      Digital preservation starts when you take possession of a digital object. When institutions create or accept a digital object, "they are beginning the long journey of digital preservation." What happens to your data if there is a disaster or the storage institution fails? The Digital Preservation Network (DPN) is working to create a network of dark, diverse, replicated repositories that guarantee member data for at least 20 years. The DPN partners are APTrust, Texas Digital Libraries, HathiTrust, Chronopolis, DuraSpace, and Stanford Digital Libraries and they plan to start preservation ingestion in January 2016.

Monday, June 08, 2015

New Sources and Storage Options For Rosetta

Rosetta Users Group 2015: New Sources and Storage Options For Rosetta. Chris Erickson. June 3, 2015. [PDF slides]
This is my presentation from the Rosetta Users Group / Advisory Group meeting held this past week.

We installed Rosetta in March 2012 and have ingested a number of collections in the preservation repository. In addition to those sources and collections we have already set up to work with Rosetta, we have been working with some new areas. These include:
  • University Academic electronic records in SharePoint. Rosetta harvest tool for SharePoint
  • New Library repositories, such as Digital Commons
  • Harvest Canon camera raw images and ingest into Rosetta
  • Ingest University videos and digitized files from the Audio Digitization project
  • Program to gather information from Unstructured folders of archival objects and ingest into Rosetta

One of the things that everyone is struggling with is the lack of sufficient storage. Conventional storage is expensive and limited. So we are investigating alternative long term storage possibilities:
  • Develop a Proof of Concept Project to ingest Rosetta content in DPN
  • Amazon S3 Cloud Storage from Rosetta. Easy to connect to Rosetta.
  • Hitachi - LG Data Storage (HLDS) Optical Archive System
    • Long Term storage system 
    • Lowest storage cost
    • Single rack unit holds 1 PB of permanent storage; unlimited expansion
    • Currently testing with our Rosetta system
    • Reduce the need for refreshing or migrating content
    • Plan to use with Millenniata M-Discs in the archive

What we are looking for in preservation storage:
  • Sufficient capacity for our ever increasing content
  • Reasonable long term cost
    • Lower total costs of ownership
    • Reduced cost of refreshing or migration
  • Reliable and recoverable
    • Archival media
    • Industry Storage Partner
    • Multiple copies, locations
    • Secure storage
  • Accessible
    • Rosetta
    • Network

Monday, April 27, 2015

Chronopolis and DuraCloud: Doing integration right

Chronopolis and DuraCloud: Doing integration right. Bill Branan, David Minor. PASIG Presentation. March 12, 2015. [PDF]
DuraCloud is a hosted digital preservation service. Chronopolis is a digital preservation storage network spanning multiple institutions and geographic regions, based on active preservation (constant checking of items). The reasons for integrating the services and becoming a DPN node:
  • Digital content preservation is important to the future of society
  • All preserved digital content should be handled equally
  • Need an economically viable option to support the preservation needs of all institutions, regardless of size or technical capability
  • Need to simplify the preservation process as much as possible
These are two very independent existing systems with different workflows and processes. DuraCloud works with real-time data, and Chronopolis works with well defined data collections. Sometimes the best way to integrate two systems is to not require either system to know anything about the other.
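The idea of integrating two systems without either knowing about the other can be sketched as a small bridge that talks to each side through a narrow interface. Everything named here is hypothetical: the interfaces, the in-memory classes, and the `bridge` function only illustrate the design principle and do not describe the actual DuraCloud or Chronopolis APIs.

```python
from typing import Dict, Protocol


class SnapshotSource(Protocol):
    """What the bridge needs from the uploading side (hypothetical interface)."""

    def export_snapshot(self, space: str) -> Dict[str, bytes]: ...


class CollectionStore(Protocol):
    """What the bridge needs from the preservation side (hypothetical interface)."""

    def deposit(self, collection: str, files: Dict[str, bytes]) -> None: ...


class InMemorySource:
    """Toy source that holds files per space; knows nothing about any store."""

    def __init__(self, spaces):
        self.spaces = spaces

    def export_snapshot(self, space):
        return dict(self.spaces[space])  # frozen copy of the space's files


class InMemoryStore:
    """Toy store that accepts whole collections; knows nothing about any source."""

    def __init__(self):
        self.collections = {}

    def deposit(self, collection, files):
        self.collections[collection] = dict(files)


def bridge(source: SnapshotSource, store: CollectionStore, space: str, collection: str) -> int:
    """Move one snapshot across; only the bridge is aware of both systems."""
    files = source.export_snapshot(space)
    store.deposit(collection, files)
    return len(files)


source = InMemorySource({"audio": {"a.wav": b"\x00\x01", "a.xml": b"<meta/>"}})
store = InMemoryStore()
moved = bridge(source, store, space="audio", collection="audio-2015")
print(moved, sorted(store.collections["audio-2015"]))
```

Because each side only implements its own interface, either system can change its internal workflow without the other being touched; only the bridge has to follow.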

Thursday, October 30, 2014

Digital Preservation Network (DPN) Launches Member Content Pilot

Digital Preservation Network (DPN) Launches Member Content Pilot. Carol Minton Morris. Duraspace.org. 2014-10-29. 
DPN has launched a Member Content Pilot program as a step toward establishing an operational, long-term preservation system. The pilot is testing real-world interactions between DPN members through DPN “nodes” that ingest data from DPN members and package it for preservation storage. Chronopolis/DuraCloud, the Texas Preservation Node, and the Stanford Digital Repository will be functioning as First Nodes. APTrust and HathiTrust, in addition to the above three, will be providing replication services for the pilot data.