Tuesday, November 26, 2019

Jennifer Paustenbaugh - Digital Preservation: We have to get this right

From the time Jennifer Paustenbaugh was hired as the University Librarian at the Harold B. Lee Library, she was a strong proponent of digital preservation. With her sudden passing, it is worth remembering some of her insights on digital preservation. Jennifer will be greatly missed.

"We have to get this right." Jennifer Paustenbaugh. Digital Preservation. Harold B. Lee Library, Brigham Young University. June, 2016.
  • “We have to get this right. If we don't, then not much else that we’re doing in research libraries matters. If we don’t fully develop a sustainable digital preservation program, we could negatively impact whole areas of research, because materials created right now could just disappear. I think about gaps that exist in records because of man-made events and natural disasters. This could be a disaster of our own making.” 
  • "I truly believe that of all the things we’re doing in the library, this is the thing that has the potential to make the biggest difference to scholars 20 or 50 years from now. Much of the digital content that we are preserving will be gone forever if we don’t do this right. It’s a role that at once is formidable and humbling. And for most people, it will probably never be important until something that is vital to their research is just missing (and forever unavailable) from the historical record."

Thursday, September 26, 2019

National Film and Sound Archive of Australia to collect and preserve Australian video games

NFSA to collect and preserve Australian video games. National Film and Sound Archive of Australia. September 26 2019.
     The Archive announced they will start collecting Australian video games for archival preservation. "Today we welcome video games into our collection of more than 3 million items. The collection represents the cultural diversity and breadth of experience of all Australians, and it is constantly evolving just like our creative industries."  "It is essential that games be collected alongside other audiovisual media, to ensure their continued preservation and access."

An initial list of eight games has been selected for preservation. They span almost 40 years of gaming history, from 1982 to 2019, across all platforms, from cassette tape to mobile devices and virtual reality headsets. This will provide "an overview of the evolution of the medium, as well as an opportunity to identify the archival challenges in preserving the different technologies employed - both software and hardware."

The Archive will work with game developers to develop solutions for long-term preservation of their work which will benefit future generations. The initial selection will "allow the NFSA to explore what components and documentation must be collected in order to paint a complete picture of the game’s creative process from concept to finished product. It will also identify challenges around software and hardware obsolescence, long-term storage and access, rights and proprietorial platforms, etc., to inform the ongoing preservation strategy. Following this phase, Australian games will be collected on an ongoing basis."

Friday, August 30, 2019

Saskatchewan Archives digitizing 560,000 newspaper pages from Second World War years

Saskatchewan Archives digitizing 560,000 newspaper pages from Second World War years. Arthur White-Crummey. Regina Leader-Post. August 28, 2019.
     This is an article about the digital preservation laboratory of the Provincial Archives of Saskatchewan and the project of scanning and auditing a collection of weekly newspapers from 160 Saskatchewan communities. The Archives is marking the 80th anniversary of Canada’s entry into the Second World War by digitizing its trove of community newspapers covering the years 1939 to 1945. There are 560,000 pages for that period alone, just part of a massive collection of 10 million pages extending from 1878 to the 1960s.

The archivist views digitization "as a way of ensuring the survival of those priceless records. The newspapers themselves have largely disappeared". But digitization is about more than preservation; it’s a way of “democratizing the record.” People no longer have to travel to local areas to view the resources. “They can now go online anywhere in the world. That’s what it’s all about.”

Thursday, July 25, 2019

The Library of Congress 2019-2020 Recommended Formats Statement

     The Library of Congress has released the 2019-2020 Recommended Formats Statement. This version provides some valuable updates to the sections on Moving Image Works and Audio Works in particular. The goal of the Recommended Formats has always been to provide useful information furthering the shared goal of ensuring the preservation of and long-term access to creative works.  By providing up-to-date information about the file types, physical and technical characteristics and associated metadata which support these worthy goals, the Statement hopes to provide the building blocks upon which libraries can build their collections, now and for the future.

The Library remains committed to acquiring and preserving digital works and to providing whatever support it can to other similarly committed stakeholders.  "We shall continue to build our collections with their preservation and long-term access firmly in mind; and we shall continue to engage with others in the community in efforts such as the Recommended Formats Statement".  "And we shall continue to engage in an annual review process to ensure that it meets the needs of all stakeholders in the preservation and long-term access of creative works."

Friday, June 21, 2019

A new maturity model for digital preservation

A new maturity model for digital preservation. Jenny Mitcham. Digital Preservation Coalition blog. 20 June 2019.
     This blog post discusses a new digital preservation maturity model, not yet available, that the DPC has been developing in a project with the UK Nuclear Decommissioning Authority (NDA). They wanted to "measure the NDA’s digital preservation maturity now. This is helpful to do at the start of any digital preservation journey, both to see where you are now, and to consider where you would like to be. The benchmarking tool could then be applied at the end of the project and at regular intervals further down the line to measure progress and review goals."  Digital preservation is usually implemented incrementally, so being able to map progress is incredibly valuable. The effort started with the maturity model created by Adrian Brown of the UK Parliamentary Archives and then made some substantial changes to it, such as changing the roadmaps and promoting the community element. "Digital preservation is not a one-off activity and in an evolving field like this it is important to keep one eye on the horizon to see what is coming up and consider how to react."
The model, called the DPC Rapid Assessment Model, should be:
  • Applicable for all organizations
  • Applicable for all content of long-term value
  • Preservation strategy and solution agnostic
  • Based on existing good practice
  • Simple to understand and quick to apply

Thursday, May 30, 2019

How Archivists Saved Damaged WWII Film

How Archivists Saved Damaged WWII Film for 'The Cold Blue'. Chuck Thompson. Popular Mechanics. May 23, 2019.
    Shrinkage is the biggest problem with old film, according to the article. To use original footage for a new movie, the archivist transferred 15 hours of 16mm film to 4K for the World War II documentary The Cold Blue. The film stock, shot in 1943, has shrunk since it was created. Kodachrome maintains its vibrancy, but tends to lose pliability and moisture over time. All of the outtakes had shrunk by an average of 1.4 percent, which is "considered an immediate preservation risk. Once the film reaches that stage, it’s difficult to preserve the film photochemically because the pitch of the sprocket holes won’t seat accurately on the sprocket teeth of the printers, causing registration and stability issues on the new copy".  “Photochemical preservation” means preserving a film by printing a new copy on new film stock and then developing and fixing the image using traditional photographic processes.

The largest outtake reel had 36,880 frames, at 922 feet long, generating 2.6 TB of data for 25 minutes of run time. The entire project generated just over 80 TB of material. The preservation DPX files were wrapped with Bagger and written to LTO-6 tape. Three copies of the tapes were made: one kept in near-line storage, another kept offline, and a third sent offsite to maintain geographical separation. The original film was returned to its 25-degree Fahrenheit vault, which slows any further deterioration.
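
The reel figures quoted above can be cross-checked with a little arithmetic. The following is only a rough sketch, not anything from the article: it assumes the standard 40 frames per foot for 16mm film, a 24 frames-per-second running speed, and decimal terabytes (10^12 bytes). The function name reel_stats is made up for illustration.

```python
# Rough cross-check of the reel figures quoted in the article.
# Assumptions (not stated in the article): 40 frames per foot for 16mm,
# 24 frames per second, and 1 TB = 10**12 bytes.
FRAMES_PER_FOOT_16MM = 40
ASSUMED_FPS = 24


def reel_stats(feet, total_bytes, fps=ASSUMED_FPS):
    """Return (frame count, run time in minutes, average bytes per frame)."""
    frames = feet * FRAMES_PER_FOOT_16MM
    minutes = frames / fps / 60
    bytes_per_frame = total_bytes / frames
    return frames, minutes, bytes_per_frame


frames, minutes, per_frame = reel_stats(feet=922, total_bytes=2.6e12)
print(frames)                      # 36880, matching the article's frame count
print(round(minutes, 1))           # 25.6, close to the 25 minutes quoted
print(round(per_frame / 1e6, 1))   # 70.5 (megabytes per scanned frame)
```

Under those assumptions, each scanned frame averages roughly 70 MB, which gives a sense of why 15 hours of footage produced tens of terabytes.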

Wednesday, May 29, 2019

Digital Data Storage Outlook 2019

Digital Data Storage Outlook 2019. SpectraLogic, May 2019. [Download]
    The fourth annual Data Storage Outlook report from Spectra Logic looks at the management, usage and storage of data. Some notes on data:
  • A 2018 IDC report predicts that the Global Datasphere will grow to 175 zettabytes (ZB) of digital data by 2025, though this report projects that much of this data will never be stored or will be retained for only a brief time. The amount of “stored” digital data is a smaller subset.
  • While there will be great demand for storage, a lack of advances in a particular technology, such as magnetic disk, means a greater use of other storage mediums such as flash and tape.
  • Increasing scale, level of collaboration, and diverse workflows are moving users from traditional file-based storage to object / web storage. Rather than attempting to force all storage into a single model, a sensible combination of both is the best approach.
  • There is a need for project assets to be shared across a team so they can be the basis for future work. An example is video footage that needs to be used by teams of editors, visual effects, audio editing, music scoring, color grading, and more. 
  • The lifetime of raw assets is effectively forever, and they may be migrated across storage technologies many times.

Tuesday, April 16, 2019

International Federation of Film Archives: Survey on Long-term Digital Storage and Preservation

Digital Statement Part V: Survey on Long-term Digital Storage and Preservation. Céline Ruivo and Anne Gant. FIAF Technical Commission, International Federation of Film Archives. April 2019.
    The sustainability of digital files and formats for long-term preservation has been a major concern in this field for almost two decades now. The FIAF Digital Preservation Principles, published in 2016, use the OAIS (Open Archival Information System) Model. Increasingly, film archives are publishing their own technical specifications online. Digitizing a film includes not only archiving a final result (the master), but also archiving the “raw files” which are uncompressed. Some of the results of the survey of 16 institutions who responded:
  • DPX is the main format used for preservation: 14 archives
  • TIFF is used as a second preservation format: 4 archives
  • Most use 4K resolution when they scan 35mm negatives for preservation 
  • Few have written technical specifications for the deposit of new digital acquisitions, which are mostly born-digital films.
  • Some archives use lossless compression for long-term preservation of a master to reduce storage space 
  • Some archives are considering implementing the FFV1 format this year for storing files. 
  • A checksum called framemd5 is integrated with the MKV/FFV1 files.
  • The recording back to film of restorations is applied by 8 archives

    In terms of sound, digital formats are more variable than image formats, depending upon their final distribution (cinema or TV broadcast). RAW formats are usually the same as the restored
  • Most of the archives use a tape system for long-term conservation. 
  • They generally wait for 2 generations to migrate their data to reduce the cost 
  • Access storage is by a server that allows direct access to the files.
  • Most of the archives store and manage their files in their own facility. 

     This initial survey of the current digital landscape shows there is much more work to be done to get a global view of digital film archiving, and to hear from more archives at all stages in the development of digital workflows. Some conclusions that can be drawn from the current set of responses:
  1. There is a stabilization in language and a conceptual clarity emerging about the stages of a digital workflow within archives. The terms are becoming clear and are recognized as necessary parts of daily archival practice. This will allow for better information exchange and better comparison of workflows.
  2. There are some choices which seem to be predominant, such as 10 bit Log DPX, for example, or the use of ProRes, LTO, .wav files, etc. It is helpful to detail the reasons why certain archives chose uncommon formats or processes.
  3. There are reasons behind each archive’s choices, which make sense at the given moment. But it would be useful to revisit this survey in 5-10 years (or sooner), and see how digital film practices and archiving are progressing.

Thursday, March 28, 2019

A Public Record at Risk: The Dire State of News Archiving in the Digital Age

A Public Record at Risk: The Dire State of News Archiving in the Digital Age. Sharon Ringel and Angela Woodall. Columbia Journalism Review. March 28, 2019.
     This research report looks at archiving practices and policies across newspapers, magazines, wire services, and digital-only news producers, to identify the current state of preserving content in an age of digital distribution. The majority of news outlets had not given any thought to even basic strategies for preserving their digital content, and not one was properly saving a holistic record of what it produces. Digitization and storage in a database are not, by themselves, adequate for long-term preservation. True archiving requires forethought and custodianship.

Staff equated digital backup and storage in Google Docs or content management systems with archiving and were unable to distinguish between backups and an archive, but the two are not the same. Backups are temporary copies for data recovery in case of damage or loss, while archiving refers to long-term preservation to ensure records will still be available even as technologies change in the future. News organizations also expect that third-party organizations, such as the Internet Archive, Google, Twitter, and Facebook, will have copies. Even if the Internet Archive has captured a website, what it collects may be limited to the first level of content and could exclude links, comments, personalized content, and different versions of a story.

There are news archiving technologies being developed; preserving digital content is not a technical challenge, but a matter of priority and a decision that demonstrates intent. The findings should be a wake-up call to an industry which claims that democracy cannot be sustained without journalism to be a truth and accountability watchdog. "In an era where journalism is already under attack, managing its record and future are as important as ever."

The news organizations are interested in the present: “Who cares what existed 10 years ago? I need my thing now. And so, for better, for worse, if there was some value in [archiving], I probably got a better value out of the new thing.” In short, newsrooms are doing very little to nothing to preserve digital news. And none of the content creators interviewed made an effort to download and preserve the stories they produced.

Deletion is the opposite side of preservation and "news organizations, in certain cases, actively remove content from the public record", which raises questions about the role of journalism in society.

Some key findings of the news organizations participating in the research:

  • 19 of the 21 news organizations had no policies or practices for preserving their content and were not taking any protective steps at all to archive their web output. The remaining two lacked formal preservation strategies.
  • Of the 21 news organizations, only six employed news archivists or librarians, and their other responsibilities took the focus away from the work required for preservation.
  • None of the digital-only outlets had a news librarian or archivist on staff. 
  • None of the news organizations were preserving their social media publications. Only one was attempting to address the problem.
  • Digital-only news organizations were less aware than print publications of the importance of preservation. Very little is currently being done to preserve news.
  • Journalism’s primary focus is on “what is new” and on preserving documentation of the reporting and what makes it accurate, rather than on preserving what ultimately gets published.
  • News apps are at high risk of being lost because these new technologies become obsolete before anyone thinks to save them. 
  • Partnerships among archivists, technologists, memory institutions, and news organizations will be vital to ensure future access to digitally distributed news content. Two questions to start with: What should be preserved? Who should preserve it?
  • To enact lasting change, opinion leaders in the field must persuade staff and management that archiving makes sense, has advantages, and is compatible with their priorities.

News organizations should care about preserving news for the future just as they care about integrity, reliability, and informing the public not just in the present.

Wednesday, March 27, 2019

Next Phase OAIS Review

Next Phase OAIS Review. Barbara Sierman. Digital Preservation Seeds. March 24, 2019.
     The OAIS standard had its 5-year review in 2017, which resulted in over 200 suggestions for change. All the changes were discussed in the DAI Working Group and most of them were accepted. The next step is for the updated OAIS standard to go to CCSDS and ISO for final approval. Most of the changes concerned terminology, both clarification and consistency.

Some concepts got a more extensive description, while others have changed. The new OAIS standard shows that a transparent process can lead to a standard that reflects the current practices. The standards group will now have a final opportunity to decide whether all suggested changes are clear and implementable.

Friday, March 22, 2019

Datanomics Costs, Benefits, and Value of Research Data

Datanomics: the value of research data. Neil Beagrie. Jisc Invitational Workshop, Glasgow. February 2019.
     Slides from a presentation on Datanomics: Costs, Benefits, and Value of Research Data. His description of the slides:

Twenty years ago format obsolescence was seen as the greatest long-term threat to digital information.  Arguably, experience to date has shown that funding and organisational challenges are perhaps more significant threats. I hope this presentation helps those grappling with these challenges and shows some key advances in how to use knowledge of costs, benefits and value to support long-term sustainability of digital data and services.

These are the slides from my keynote presentation to the joint Digital Preservation Coalition / Jisc workshop on Digital Assets and Digital Liabilities - the Value of Data held in Glasgow in February 2018. The slides summarise work over the last decade in the key areas of exploring costs, benefits and value for data. The slides posted here have additional slide notes and references to new publications since the workshop and some modifications such as removal of animations.

Some notes from the slides:
Costs. Keeping Research Data Safe (KRDS) rules of thumb:
  1. Getting data in takes about half of the lifetime costs, preservation about a sixth, and access about a third.
  2. Preservation costs decline over time.
  3. Fixed costs are significant for most data archives.
  4. Staff are the most significant proportion of archive costs.

The KRDS Benefits Framework. Benefit from Curation of Research Data. Framework arranged on 3 dimensions.
  1. What is the outcome?
  2. When is it received?
  3. Who benefits?
Valuing Intangible Assets: Measuring the value of intangible assets is much harder than for physical assets. We measure the value of data services, not just data alone.

Economic Metrics Used
  • Investment value: the amount spent on producing the good or service
  • Use value: the amount spent by users to obtain the good or service
  • Contingent value: the amount users are “willing to accept” in return for giving up access
  • Efficiency gain: user estimates of time saved by using the Data Service resources
  • Return on investment: the estimated increase in return on investment due to the additional use
Must also look at the Costs of Inaction
  • Rate of loss of research data sets: 17% per annum
  • Partial information loss: 7% per annum
  • Rate of loss for web-links to data: c. 5.5% per annum
  • Access / Data requests fulfilled
  • Delay in elapsed time to fulfill data requests. Up to 6 months
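
To get a feel for what an annual loss rate means over time, here is a small illustrative calculation. It is only a sketch: it assumes the quoted rate stays constant and compounds year over year, which the slides do not state, and the function name fraction_remaining is made up for illustration.

```python
# Illustrative only: assumes a constant annual loss rate that compounds,
# which is an assumption and not a claim made in the slides.
def fraction_remaining(annual_loss_rate, years):
    """Fraction of items still available after the given number of years."""
    return (1 - annual_loss_rate) ** years


# 17% per annum is the rate quoted for loss of research data sets.
for years in (1, 5, 10, 20):
    remaining = fraction_remaining(0.17, years)
    print(f"after {years:2d} years: {remaining:.1%} remaining")
```

Under that assumption, fewer than half of the data sets would remain after five years, which helps explain why the costs of inaction belong next to the costs of curation.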

Recommendations: Investigate the relative costs and benefits of curation levels, storage, or appraisal for what to keep.

“Five or six decades since the beginning of the Information Age, the namesake of this age, and the major asset driving today’s economy, is still not considered an accounting asset”

“Corporations typically exhibit greater discipline in tracking and accounting for their office furniture than their data”

Use cost data to look for trends, leverage our efforts, investigate the relative costs and benefits of curation levels, storage, and look towards hierarchical curation management.

Monday, March 11, 2019

Arctic World Archive receives more world treasures

Arctic World Archive receives more world treasures. Press release. 21 February 2019.
     Institutions and companies from around the world, including Utah Valley University, have deposited their digital content in the Arctic World Archive in Svalbard, Norway.  The Archive is a repository for world memory where the data will last for centuries.  The Archive is a collaboration between Piql, digital preservation specialists, and Store Norske Spitsbergen Kulkompani (SNSK), a state-owned Norwegian mining company based on Svalbard with vast experience and resources to build and maintain mountain vaults.

The top 10 items of cultural heritage, as nominated by the public, were also stored away for the future. These items include famous religious texts, paintings, architectural designs, science breakthroughs and popular contemporary music.


Saturday, March 09, 2019

What to Keep: A Jisc research data study

What to Keep: A Jisc research data study. Neil Beagrie. Jisc. February 2019.  [PDF]
     This study is about research data management and also appraisal and selection. This is an issue that has become more significant in recent years as volumes of data have grown. "The purpose is to provide new insights that will be useful to institutions, research funders, researchers, publishers, and Jisc on what research data to keep and why, the current position, and suggestions for improvement."

"Not all research data is the same: it is highly varied in terms of data level; data type; and origin. In addition, not all disciplines are in the same place or have identical needs."

"It is essential to consider not only What and Why to keep data, but for How Long to keep it, Where to keep it, and increasingly How to keep it in ways that reflects its potential value, cost, and available funding."

The study lists ten recommendations:
  1. Consider what is transferable between disciplines. Support adoption of effective practice via training, technologies, case studies, and guidance checklists.
  2. Bring communities together with workshops to evolve disciplinary norms 
  3. Harmonise funder requirements for research data where relevant
  4. Investigate the costs and benefits of curation levels, storage, or appraisal for what to keep.
  5. Implement the FAIR principles as appropriate for kept data.  
  6. Enhance data discoverability and identification of data by recording and identifying data generated by research projects in existing research databases.
  7. Require Data Access Statements in all published research articles where data is used as evidence, and encourage adoption of the Transparency and Openness Promotion (TOP) guidelines 
  8. Improve incentives and lower the barriers for data sharing.
  9. Increase publisher and funder collaborations around research data. 
  10. Improve communication on what research data management costs can be funded and by whom
Definition of research: "a process of investigation leading to new insights, effectively shared. It includes work of direct relevance to the needs of commerce, industry, and to the public and voluntary sectors; scholarship ...; the invention and generation of ideas, images, performances, artefacts including design, where these lead to new or substantially improved insights; and the use of existing knowledge in experimental development to produce new or substantially improved materials, devices, products and processes, including design and construction.”

Other notes from the study:
Costs of research data management seen as too high
Obsolescence of data format or software

The volume of research data and the number of new research data services and repositories is increasing.

"The high-level principles for research data management may be established but the everyday practice and procedures for the full-range of research data, what and why to keep, for how long, and where and how to keep it, are still evolving."

“All those engaged with research have a responsibility to ensure the data they gather and generate is properly managed, and made accessible, intelligible, assessable and usable by others unless there are legitimate reasons to the contrary. Access to research data therefore carries implications for cost and there will need to be trade-offs that reflect value for money and use.”

The Core Trustworthy Data Repositories Requirements notes four curation levels that can be performed by trusted repositories:
a. As deposited
b. Basic curation, e.g., brief checking, addition of basic metadata or documentation
c. Enhanced curation, e.g., conversion to new formats, enhancement of documentation
d. Data level curation (as in c above, with additional editing of data for accuracy)

UVU Cylinder Project

UVU Cylinder Project. Website. Utah Valley University. March 8, 2019.
     This website and the cylinder project were showcased at a digital preservation symposium. The site has an extensive searchable library of cylinders and a Cylinder Player. The un-archived cylinders are being transcribed, given metadata, and having their recordings cleaned before being posted to the site.

Wednesday, March 06, 2019

Texas Digital Library Digital Preservation Services

Texas Digital Library Digital Preservation Services. Press release. Texas Digital Library, 5 March 2019. [PDF]
     The organization now offers Digital Preservation Services to its members to help stewards of Texas cultural heritage and scholarship provide access for the long term. The services include direct consulting, training, and workflow support, with a combination of technologies suited to each member's content needs. Content can be stored in multiple geographically dispersed locations, with fixity checking, in Chronopolis and Amazon storage through the DuraCloud interface.
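
For readers unfamiliar with fixity checking, the idea can be sketched in a few lines. This is a generic illustration using Python's standard hashlib module, not the DuraCloud or Chronopolis API; the function names sha256_of and check_copies are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
# Generic fixity-check sketch. This does NOT use the DuraCloud or
# Chronopolis APIs; the function names and the choice of SHA-256 are
# assumptions made for illustration.
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(expected_digest, copy_paths):
    """Map each copy's path to True if its digest matches the recorded one."""
    return {str(path): sha256_of(path) == expected_digest for path in copy_paths}
```

The digest is recorded when the content is first deposited, and each stored copy is re-hashed later. Any copy whose digest no longer matches has changed and can be repaired from a copy that still matches, which is the reason for keeping copies in more than one location.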

Tuesday, March 05, 2019

Accessible Archives Inc. Partners with Portico

Accessible Archives Inc. Partners with Portico. Press release. Accessible Archives, Inc. Mar 05, 2019.
     Accessible Archives Inc., an electronic publisher of full-text primary source historical databases, has partnered with Portico in order to fully support the digital preservation of its content. With more content in a digital-only format, this will help the archival collections remain accessible. Preservation will help to ensure the long-term availability of these resources for future scholars.

Monday, January 28, 2019

Introduction to Digital Preservation: What is Digital Preservation?

Introduction to Digital Preservation: What is Digital Preservation? Bodleian Libraries, Oxford LibGuides. Aug 28, 2018.    
     Digital preservation at Bodleian Libraries is defined as: "The formal activity of ensuring access to digital information for as long as necessary. It requires polices, planning, resource allocation (funds, time, people) and appropriate technologies and actions to ensure accessibility, accurate rendering and authenticity of digital objects.  A “lifecycle management” approach to digital preservation is taken, where action is done at regular intervals and future activity is planned. This includes policies and recommendations for appraising and selecting digital information to preserve, acknowledging resources are finite."

There are two different kinds of digital preservation:
  1.  Bit-level Preservation: a "very basic level of preservation of the digital object as it was submitted (literally preserving the bits forming a digital object)." It is a beginning step to the more complete set of digital preservation practices and processes that ensure the survival and usability of digital materials over time.
  2. Logical Preservation: The part of preservation management that ensures the continued usability of content by ensuring the existence of a usable manifestation of the digital object. Sometimes referred to as format preservation or active preservation, it includes:
  • Understanding what digital materials are in the repository.
  • Identifying threats to the materials and planning actions to be taken for at-risk digital materials
  • Putting things into action 
Defining other terms:
  • "Digital curation involves maintaining, preserving and adding value to digital files throughout their lifecycle—not just at the end of their active lives. This active management of digital files reduces threats to their long-term value and mitigates the risk of digital obsolescence. Digital curation includes digital preservation, but the term adds the curatorial aspects of: selection, appraisal and ongoing enhancement of the object for reuse."It is commonly used in the science and social sciences for research data and is often being replaced with research data management, especially when referring to active digital files.
  • Digital archiving is often used interchangeably with digital preservation in archives. It has two main definitions used by computing personnel and archivists and librarians respectively. Recognize both definitions of the term and be aware of the audiences that use this term differently.
    1. The process of storage, backup and ongoing maintenance as opposed to strategies for long-term digital preservation
    2. the long-term storage, preservation and access to information that is "born digital" or for which the digital version is considered to be the primary archive 
  • Digital Stewardship, more commonly used in the US, "combines both curation and preservation—the active life of a digital asset and its continual preservation afterwards for long-term use. But this school of thought splits digital curation & digital preservation into two separate categories and then uses digital stewardship as the umbrella term."
The Bodleian Libraries consider digital preservation to be a "holistic term that includes aspects of digital curation and stewardship". They work with creators to organise and manage their digital objects in order to preserve them, and to follow best practice for creating and managing active files so that the files will be easier to manage and provide access to in the long term.
