Wednesday, April 01, 2020

Digital Asset Management: BYU Photo

Digital Asset Management: BYU Photo. Jaren Wilkey. BYU Behind the Scenes. March 31, 2020.
     A great article by the BYU University Photographer. Some notes and quotes are:
  • "Professional photographers are always responsible for their photos. Clients are not worried about backing up your photos, or how they are keyworded. If they lose the files they will always come back the creator and expect them to be perfectly preserved and readily accessible. You are 100% responsible for the content you create, period."  
  • ..."we are not just capturing photos for brochures or videos for Instagram, we are recording the history of our universities for our future generations. When you see yourself as a historian you should approach DAM with a completely different mindset. Digital asset management is important because it will ensure that your photos and videos will outlast you."
  • The truth of the matter is that every single method of storing digital photos is only temporary. Every method will fail at some point. 
  • The cardinal rule of backups is simple: Multiple Copies, Multiple Locations.
  • More and more people are interested in using online storage for their archive. While I think that  online storage can be part of your strategy, it is important to ask yourself if that company will still exist in 50 years. For your long term archive it is best that you have a physical copy that you manage.
  • Whether you are a seasoned archiving expert or a novice, start where you stand and move forward.
It is important to note that there isn’t just one right way to approach digital asset management; it all depends on your specific goals and resources. The 3 Basics of DAM - the goal is to make sure that your photos and videos are:
  1. Organized 
  2. Accessible 
  3. Archived

Monday, March 09, 2020

The Future of Past Email is PDF

The Future of Past Email is PDF. Chris Prom. Information and Data Manager (IDM). March 6, 2020.
     The article reports on a group examining the question: how should governments, universities, businesses, and archives ensure that future generations can access and render email? The group looks at ways to capture, preserve, and render email. It builds on an earlier report:
The Future of Email Archives: A Report from the Task Force on Technical Approaches for Email Archives. CLIR Publications. August 2018. [PDF]
  • Email is an increasingly important part of the historical record, yet it is particularly difficult to preserve, putting future access to this vast resource at risk. It looks at what makes email archiving so complex and describes emerging strategies to meet the challenge.
  • Addressing the challenges will require commitment from stakeholders, as well as tool support, testing, and development.
Some institutions preserve emails with MBOX, EML or PST; maintain or emulate old email environments; or transform them to XML. All these ways require a high level of technical support. Others simply store email archives.

The group suggests the PDF format could be used for email, though there are gaps and risks.
  • PDF includes data structures that could fully accommodate the diversity of email content and metadata. It is completely self-contained and designed to capture text and graphical content for archival purposes.
  • Email-to-PDF provides a migration pathway for email messages independent of email applications and could preserve essential attributes of the message.
  • A standardized application of PDF technology could provide source data, universally usable archival-quality renderings including attachments, and provenance metadata.
  • It could use existing standards and a diverse vendor community for preserving, searching and reusing email.
  • Using PDF could integrate with existing preservation tools for ingesting, storing, preserving and disseminating content from established repository systems already in use in government, academic, public, and corporate archives and libraries.
  • Since the PDF format is so widely implemented, there would already be a common understanding of best-practices for archiving email with PDF.
"In short, the "email archiving in PDF" concept seeks to build on widely implemented standards and technologies.  It would allow individuals and institutions a pathway to migrate email into the most widely used format for the distribution of text documents."

Currently there is a drawback for using PDF for email preservation: "attachments, metadata, context, and sometimes, even searchable text are missing. Simply "printing to PDF" fails to meet the specific needs of institutions archiving volumes of complex email messages, at least as currently implemented."  So how can "institutions ensure authenticity, completeness, privacy, security and other needs, especially when working with thousands or millions of messages, when most header metadata and attachments are lost in the conversion?"

The group identified and documented the essential characteristics and technical requirements for converting email into PDF, which will soon be published as a set of fundamental requirements for archiving email.

Tuesday, November 26, 2019

Jennifer Paustenbaugh - Digital Preservation: We have to get this right

From the time Jennifer Paustenbaugh was hired as the University Librarian at the Harold B. Lee Library, she was a strong proponent of digital preservation. With her sudden passing, it is worth remembering some of her insights on digital preservation. Jennifer will be greatly missed.

"We have to get this right." Jennifer Paustenbaugh. Digital Preservation. Harold B. Lee Library, Brigham Young University. June, 2016.
  • “We have to get this right. If we don't, then not much else that we’re doing in research libraries matters. If we don’t fully develop a sustainable digital preservation program, we could negatively impact whole areas of research, because materials created right now could just disappear. I think about gaps that exist in records because of man-made events and natural disasters. This could be a disaster of our own making.” 
  • "I truly believe that of all the things we’re doing in the library, this is the thing that has the potential to make the biggest difference to scholars 20 or 50 years from now. Much of the digital content that we are preserving will be gone forever if we don’t do this right. It’s a role that at once is formidable and humbling. And for most people, it will probably never be important until something that is vital to their research is just missing (and forever unavailable) from the historical record."

Thursday, September 26, 2019

National Film and Sound Archive of Australia to collect and preserve Australian video games

NFSA to collect and preserve Australian video games. National Film and Sound Archive of Australia. September 26 2019.
     The Archive announced they will start collecting Australian video games for archival preservation. "Today we welcome video games into our collection of more than 3 million items. The collection represents the cultural diversity and breadth of experience of all Australians, and it is constantly evolving just like our creative industries."  "It is essential that games be collected alongside other audiovisual media, to ensure their continued preservation and access."

An initial list of eight games has been selected for preservation. They span almost 40 years of gaming history, from 1982 to 2019, across all platforms, from cassette tape to mobile devices and virtual reality headsets. This will provide "an overview of the evolution of the medium, as well as an opportunity to identify the archival challenges in preserving the different technologies employed - both software and hardware."

The Archive will work with game developers to develop solutions for long-term preservation of their work which will benefit future generations. The initial selection will "allow the NFSA to explore what components and documentation must be collected in order to paint a complete picture of the game’s creative process from concept to finished product. It will also identify challenges around software and hardware obsolescence, long-term storage and access, rights and proprietorial platforms, etc., to inform the ongoing preservation strategy. Following this phase, Australian games will be collected on an ongoing basis."

Friday, August 30, 2019

Saskatchewan Archives digitizing 560,000 newspaper pages from Second World War years

Saskatchewan Archives digitizing 560,000 newspaper pages from Second World War years. Arthur White-Crummey. Regina Leader-Post. August 28, 2019.
     This is an article about the digital preservation laboratory of the Provincial Archives of Saskatchewan and the project of scanning and auditing a collection of weekly newspapers from 160 Saskatchewan communities. The Archives is marking the 80th anniversary of Canada’s entry into the Second World War by digitizing its trove of community newspapers covering the years 1939 to 1945. There are 560,000 pages for that period alone, just part of a massive collection of 10 million pages extending from 1878 to the 1960s.

The archivist views digitization "as a way of ensuring the survival of those priceless records. The newspapers themselves have largely disappeared".  But digitization is about more than preservation, it’s a way of “democratizing the record.” People no longer have to travel to local areas to view the resources. “They can now go online anywhere in the world. That’s what it’s all about.”

Thursday, July 25, 2019

The Library of Congress 2019-2020 Recommended Formats Statement

     The Library of Congress has released the 2019-2020 Recommended Formats Statement. This version provides some valuable updates to the sections on Moving Image Works and Audio Works in particular. The goal of the Recommended Formats has always been to provide useful information furthering the shared goal of ensuring the preservation of and long-term access to creative works.  By providing up-to-date information about the file types, physical and technical characteristics and associated metadata which support these worthy goals, the Statement hopes to provide the building blocks upon which libraries can build their collections, now and for the future.

The Library remains committed to acquiring and preserving digital works and to providing whatever support it can to other similarly committed stakeholders.  "We shall continue to build our collections with their preservation and long-term access firmly in mind; and we shall continue to engage with others in the community in efforts such as the Recommended Formats Statement".  "And we shall continue to engage in an annual review process to ensure that it meets the needs of all stakeholders in the preservation and long-term access of creative works."

Friday, June 21, 2019

A new maturity model for digital preservation

A new maturity model for digital preservation. Jenny Mitcham. Digital Preservation Coalition blog. 20 June 2019.
     This blog discusses a new digital preservation maturity model, which is not yet available, that the DPC has been developing in a project with the UK Nuclear Decommissioning Authority (NDA). They wanted to "measure the NDA’s digital preservation maturity now. This is helpful to do at the start of any digital preservation journey, both to see where you are now, and to consider where you would like to be. The benchmarking tool could then be applied at the end of the project and at regular intervals further down the line to measure progress and review goals."  Digital preservation is usually implemented incrementally, so being able to map progress is incredibly valuable. The effort started with the maturity model created by Adrian Brown of the UK Parliamentary Archives and then made some substantial changes to it, such as changing the roadmaps and promoting the community element, among others. "Digital preservation is not a one-off activity and in an evolving field like this it is important to keep one eye on the horizon to see what is coming up and consider how to react."
The model, called the DPC Rapid Assessment Model, should be:
  • Applicable for all organizations
  • Applicable for all content of long-term value
  • Preservation strategy and solution agnostic
  • Based on existing good practice
  • Simple to understand and quick to apply

Thursday, May 30, 2019

How Archivists Saved Damaged WWII Film

How Archivists Saved Damaged WWII Film for 'The Cold Blue'. Chuck Thompson. Popular Mechanics. May 23, 2019.
    Shrinkage is the biggest problem with old film, according to the article. To use original footage for a new movie, the archivist transferred 15 hours of 16mm film to 4K for the World War II documentary The Cold Blue. The film stock that was shot in 1943 has shrunk since it was created. Kodachrome maintains its vibrancy, but tends to lose pliability and moisture over time. All of the outtakes had shrunk by an average of 1.4 percent, which is "considered an immediate preservation risk. Once the film reaches that stage, it’s difficult to preserve the film photochemically because the pitch of the sprocket holes won’t seat accurately on the sprocket teeth of the printers, causing registration and stability issues on the new copy".  “Photochemical preservation” means preservation of a film by printing a new copy on new film stock and then developing and fixing the image using traditional photographic processes.

The largest outtake reel had 36,880 frames, at 922 feet long, generating 2.6 TBs of data for 25 minutes of run time. The entire project generated just over 80 TBs of material. The preservation DPX files were wrapped with Bagger and written to LTO-6 tape. There are three copies of the tapes: one in near-line storage, another offline, and the last sent offsite to maintain geographical separation. The original film was returned to its 25-degree Fahrenheit vault, which slows down any deterioration that may continue.
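The reel figures above are consistent with one another, which a short calculation can confirm. The following is a minimal sketch, assuming the standard 16mm pitch of 40 frames per foot, a playback rate of 24 frames per second (the article does not state the rate), and decimal terabytes; the function name `reel_stats` is made up for illustration.

```python
FRAMES_PER_FOOT_16MM = 40  # standard 16mm frame pitch
ASSUMED_FPS = 24           # assumption: the article does not state the frame rate


def reel_stats(feet, total_bytes, fps=ASSUMED_FPS):
    """Derive frame count, run time, and average scan size per frame for a reel."""
    frames = feet * FRAMES_PER_FOOT_16MM
    seconds = frames / fps
    return {
        "frames": int(frames),
        "minutes": seconds / 60,
        "mb_per_frame": total_bytes / frames / 1e6,  # decimal megabytes
    }


# Figures from the article: 922 feet of film and 2.6 TB of scan data.
stats = reel_stats(feet=922, total_bytes=2.6e12)
print(stats)
```

Under these assumptions the frame count matches the 36,880 frames reported, the run time comes out at a little over 25 minutes, and each scanned frame averages roughly 70 MB.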

Wednesday, May 29, 2019

Digital Data Storage Outlook 2019

Digital Data Storage Outlook 2019. Spectra Logic. May 2019. [Download]
    The fourth annual Data Storage Outlook report from Spectra Logic looks at the management, usage and storage of data. Some notes on data:
  • A 2018 IDC report predicts that the Global Datasphere will grow to 175 zettabytes (ZB) of digital data by 2025, though this report projects that much of this data will never be stored or will be retained for only a brief time. The amount of “stored” digital data is a smaller subset.
  • While there will be great demand for storage, a lack of advances in a particular technology, such as magnetic disk, means a greater use of other storage mediums such as flash and tape.
  • Increasing scale, level of collaboration, and diverse workflows are moving users from traditional file-based storage to object / web storage. Rather than attempting to force all storage into a single model, a sensible combination of both is the best approach.
  • There is a need for project assets to be shared across a team so they can be the basis for future work. An example is video footage that needs to be used by teams of editors, visual effects, audio editing, music scoring, color grading, and more. 
  • The lifetime of raw assets is effectively forever and they may be migrated across storage technologies many times

Tuesday, April 16, 2019

International Federation of Film Archives: Survey on Long-term Digital Storage and Preservation

Digital Statement Part V: Survey on Long-term Digital Storage and Preservation. Céline Ruivo and Anne Gant. FIAF Technical Commission, International Federation of Film Archives. April 2019.
    The sustainability of digital files and formats for long-term preservation has been a major concern in this field for almost two decades now. The FIAF Digital Preservation Principles, published in 2016, use the OAIS (Open Archival Information System) Model. Increasingly, film archives are publishing their own technical specifications online. Digitizing a film includes not only archiving a final result (the master), but also archiving the “raw files” which are uncompressed. Some of the results of the survey of 16 institutions who responded:
  • DPX is the main format used for preservation: 14 archives
  • TIFF is used as a second preservation format: 4 archives
  • Most use 4K resolution when they scan 35mm negatives for preservation 
  • Few have written technical specifications for the deposit of new digital acquisitions, which are mostly born-digital films.
  • Some archives use lossless compression for long-term preservation of a master to reduce storage space 
  • Some archives are considering implementing the FFV1 format this year for storing files. 
  • A checksum called framemd5 is integrated with the MKV/FFV1 files.
  • The recording back to film of restorations is applied by 8 archives
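The appeal of a frame-level checksum such as framemd5 is that damage can be traced to an individual frame rather than to the file as a whole. The sketch below illustrates that idea only: it hashes small in-memory byte strings standing in for decoded frames, and it does not reproduce the actual framemd5 output format or read MKV/FFV1 files. The names `frame_checksums` and `changed_frames` are hypothetical.

```python
import hashlib


def frame_checksums(frames):
    """Return one MD5 hex digest per frame (each frame is a bytes object)."""
    return [hashlib.md5(frame).hexdigest() for frame in frames]


def changed_frames(reference, current):
    """Return the indices of frames whose digest differs from the reference."""
    return [i for i, (a, b) in enumerate(zip(reference, current)) if a != b]


# Hypothetical stand-in data: three tiny "frames" instead of real decoded video.
original = [b"frame-0", b"frame-1", b"frame-2"]
reference = frame_checksums(original)

# Simulate corruption of the second frame in a later copy.
damaged = [b"frame-0", b"frame-X", b"frame-2"]
print(changed_frames(reference, frame_checksums(damaged)))  # prints [1]
```

A single whole-file checksum would only report that the copy differs; the per-frame list shows where.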

    In terms of sound, digital formats are more variable than image formats, depending upon their final distribution (cinema or TV broadcast). RAW formats are usually the same as the restored files.
  • Most of the archives use a tape system for long-term conservation. 
  • They generally wait for 2 generations to migrate their data to reduce the cost 
  • Access storage is by a server that allows direct access to the files.
  • Most of the archives store and manage their files in their own facility. 

     This initial survey of the current digital landscape shows there is much more work to be done to get a global view of digital film archiving, and to hear from more archives at all stages in the development of digital workflows. Some conclusions that can be drawn from the current set of responses:
  1. There is a stabilization in language and a conceptual clarity emerging about the stages of a digital workflow within archives. The terms are becoming clear and are recognized as necessary parts of daily archival practice. This will allow for better information exchange and better comparison of workflows.
  2. There are some choices which seem to be predominant, such as 10 bit Log DPX, for example, or the use of ProRes, LTO, .wav files, etc. It is helpful to detail the reasons why certain archives chose uncommon formats or processes.
  3. There are reasons behind each archive’s choices, which make sense at the given moment. But it would be useful to revisit this survey in 5-10 years (or sooner), and see how digital film practices and archiving are progressing.

Thursday, March 28, 2019

A Public Record at Risk: The Dire State of News Archiving in the Digital Age

A Public Record at Risk: The Dire State of News Archiving in the Digital Age. Sharon Ringel and Angela Woodall. Columbia Journalism Review. March 28, 2019.
     This research report looks at archiving practices and policies across newspapers, magazines, wire services, and digital-only news producers, to identify the current state of preserving content in an age of digital distribution. The majority of news outlets had not given any thought to even basic strategies for preserving their digital content, and not one was properly saving a holistic record of what it produces. Digitization and storage in a database are not alone adequate for long-term preservation. True archiving requires forethought and custodianship.

Staff equated digital backup and storage in Google Docs or content management systems with archiving, although the two are not the same, and they were unable to distinguish between backups and an archive. Backups are temporary copies for data recovery in case of damage or loss, while archiving refers to long-term preservation to ensure records will still be available even as technologies change in the future. Staff expect that third-party organizations, such as the Internet Archive, Google, Twitter, or Facebook, will have copies. Even if the IA has captured a website, what it collects may be limited to the first level of content and could exclude links, comments, personalized content, and different versions of a story.

There are news archiving technologies being developed; preserving digital content is not a technical challenge, but  a matter of priority and a decision that demonstrates intent. The findings should be a wake-up call to an industry which claims that democracy cannot be sustained without journalism to be a truth and accountability watchdog. "In an era where journalism is already under attack, managing its record and future are as important as ever."

The news organizations are interested in the present: “Who cares what existed 10 years ago? I need my thing now. And so, for better, for worse, if there was some value in [archiving], I probably got a better value out of the new thing.” In short, newsrooms are doing very little to nothing to preserve digital news. And none of the content creators interviewed made an effort to download and preserve the stories they produced.

Deletion is the opposite side of preservation and "news organizations, in certain cases, actively remove content from the public record", which raises questions about the role of journalism in society.

Some key findings of the news organizations participating in the research:

  • Of the 21 news organizations in the study, 19 were not taking any protective steps at all to archive their web output. The remaining two lacked formal preservation strategies.
  • Of the 21 news organizations, only six employed news archivists or librarians, and their other responsibilities took the focus away from the work required for preservation.
  • None of the digital-only outlets had a news librarian or archivist on staff. 
  • None of the news organizations were preserving their social media publications. Only one was attempting to address the problem.
  • Digital-only news organizations were less aware than print publications of the importance of preservation. Very little is currently being done to preserve news.
  • Journalism’s primary focus is on “what is new” and on preserving documentation of the reporting and what makes it accurate, rather than on preserving what ultimately gets published.
  • News apps are at high risk of being lost because these new technologies become obsolete before anyone thinks to save them. 
  • Partnerships among archivists, technologists, memory institutions, and news organizations will be vital to ensure future access to digitally distributed news content. Two questions to start with: What should be preserved? Who should preserve it?
  • To enact lasting change, opinion leaders in the field must show staff and management that archiving makes sense, has advantages, and is compatible with their priorities.

News organizations should care about preserving news for the future just as they care about integrity, reliability, and informing the public not just in the present.

Wednesday, March 27, 2019

Next Phase OAIS Review

Next Phase OAIS Review. Barbara Sierman. Digital Preservation Seeds. March 24, 2019.
     The OAIS standard had its 5-year review in 2017, which resulted in over 200 suggestions for change. All the changes were discussed in the DAI Working Group and most of them were accepted. The next step is for the updated OAIS standard to go to CCSDS and ISO for final approval. The main part of the changes concerned terminology, both clarification and consistency.

Some concepts got a more extensive description, while others have changed. The new OAIS standard shows that a transparent process can lead to a standard that reflects the current practices. The standards group will now have a final opportunity to decide whether all suggested changes are clear and implementable.

Friday, March 22, 2019

Datanomics Costs, Benefits, and Value of Research Data

Datanomics: the value of research data. Neil Beagrie. Jisc Invitational Workshop, Glasgow, February 2019.
 Slides from presentation on Datanomics Costs, Benefits, and Value of Research Data. His description of the slides: 

Twenty years ago format obsolescence was seen as the greatest long-term threat to digital information.  Arguably, experience to date has shown that funding and organisational challenges are perhaps more significant threats. I hope this presentation helps those grappling with these challenges and shows some key advances in how to use knowledge of costs, benefits and value to support long-term sustainability of digital data and services.

These are the slides from my keynote presentation to the joint Digital Preservation Coalition / Jisc workshop on Digital Assets and Digital Liabilities - the Value of Data held in Glasgow in February 2018. The slides summarise work over the last decade in the key areas of exploring costs, benefits and value for data. The slides posted here have additional slide notes and references to new publications since the workshop and some modifications such as removal of animations.

Some notes from the slides:
Costs. Keeping Research Data Safe (KRDS)  rules of thumb.
  1. Getting data in takes about half of the lifetime costs, preservation about a sixth, access about a third.
  2. Preservation costs decline over time. 
  3. Fixed costs are significant for most data archives 
  4. Staff are the most significant proportion of archive costs.
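The first rule of thumb divides lifetime cost into shares that add up to the whole (a half, a sixth, and a third). As a rough illustration, the sketch below applies those shares to a made-up budget figure; the function name `split_lifetime_cost` and the 120,000 total are hypothetical and do not come from the slides.

```python
from fractions import Fraction

# Shares taken from the KRDS rule of thumb quoted above.
KRDS_SHARES = {
    "ingest": Fraction(1, 2),        # getting data in
    "preservation": Fraction(1, 6),
    "access": Fraction(1, 3),
}


def split_lifetime_cost(total):
    """Split a lifetime cost across the three activities using the KRDS shares."""
    return {activity: float(total * share) for activity, share in KRDS_SHARES.items()}


# The three shares account for the whole lifetime cost.
assert sum(KRDS_SHARES.values()) == 1

budget = split_lifetime_cost(120_000)  # hypothetical lifetime budget
print(budget)
```

Exact fractions are used so that the shares sum to exactly one, without floating-point rounding.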

The KRDS Benefits Framework. Benefit from Curation of Research Data. Framework arranged on 3 dimensions.
  1. What is the outcome?
  2. When is it received?
  3. Who benefits?
Valuing Intangible Assets: Measuring the value of intangible assets is much harder than for physical assets. We measure the value of data services, not just data alone.

Economic Metrics Used
  • Investment value: the amount spent on producing the good or service
  • Use value: the amount spent by users to obtain the good or service
  • Contingent value: the amount users are “willing to accept” in return for giving up access
  • Efficiency gain: user estimates of time saved by using the Data Service resources
  • Return on investment: the estimated increase in return on investment due to the additional use
Must also look at the Costs of Inaction
  • Rate of loss of research data sets: 17% per annum
  • Partial information loss: 7% per annum
  • Rate of loss for web-links to data: c. 5.5% per annum
  • Access / Data requests fulfilled
  • Delay in elapsed time to fulfill data requests. Up to 6 months
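To get a feel for what annual loss rates like these imply over time, the sketch below compounds them year on year. This is an interpretation rather than something stated in the slides: it assumes each rate is constant and applies to whatever remains at the start of each year. The function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(annual_loss_rate, years):
    """Fraction still available after `years`, assuming a constant compounding loss rate."""
    return (1 - annual_loss_rate) ** years


# Rates quoted above: 17% per annum for data sets, about 5.5% for web links to data.
for label, rate in [("data sets", 0.17), ("web links", 0.055)]:
    for years in (5, 10):
        share = fraction_remaining(rate, years)
        print(f"{label}: about {share:.0%} remaining after {years} years")
```

Under that assumption, well under half of the data sets would still be available after five years.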

Recommendations: Investigate the relative costs and benefits of curation levels, storage, or appraisal for what to keep.

“Five or six decades since the beginning of the Information Age, the namesake of this age, and the major asset driving today’s economy, is still not considered an accounting asset”

“Corporations typically exhibit greater discipline in tracking and accounting for their office furniture than their data”

Use cost data to look for trends, leverage our efforts, investigate the relative costs and benefits of curation levels, storage, and look towards hierarchical curation management.

Monday, March 11, 2019

Arctic World Archive receives more world treasures

Arctic World Archive receives more world treasures. Press release. 21 February 2019.
     Institutions and companies from around the world, including Utah Valley University, have deposited their digital content in the Arctic World Archive in Svalbard, Norway.  The Archive is a repository for world memory where the data will last for centuries.  The Archive is a collaboration between Piql, digital preservation specialists, and Store Norske Spitsbergen Kulkompani (SNSK), a state-owned Norwegian mining company based on Svalbard with vast experience and resources to build and maintain mountain vaults.

The top 10 items of cultural heritage, as nominated by the public, were also stored away for the future. These items include famous religious texts, paintings, architectural designs, science breakthroughs and popular contemporary music.

Saturday, March 09, 2019

What to Keep: A Jisc research data study

What to Keep: A Jisc research data study. Neil Beagrie. Jisc. February 2019.  [PDF]
     This study is about research data management and also appraisal and selection. This is an issue that has become more significant in recent years as volumes of data have grown. "The purpose is to provide new insights that will be useful to institutions, research funders, researchers, publishers, and Jisc on what research data to keep and why, the current position, and suggestions for improvement."

"Not all research data is the same: it is highly varied in terms of data level; data type; and origin. In addition, not all disciplines are in the same place or have identical needs."

"It is essential to consider not only What and Why to keep data, but for How Long to keep it, Where to keep it, and increasingly How to keep it in ways that reflects its potential value, cost, and available funding."

The study lists ten recommendations:
  1. Consider what is transferable between disciplines. Support adoption of effective practice via training, technologies, case studies, and guidance checklists.
  2. Bring communities together with workshops to evolve disciplinary norms 
  3. Harmonise funder requirements for research data where relevant
  4. Investigate the costs and benefits of curation levels, storage, or appraisal for what to keep
  5. Implement the FAIR principles as appropriate for kept data.  
  6. Enhance data discoverability and identification of data by recording and identifying data generated by research projects in existing research databases.
  7. Require Data Access Statements in all published research articles where data is used as evidence, and encourage adoption of the Transparency and Openness Promotion (TOP) guidelines 
  8. Improve incentives and lower the barriers for data sharing.
  9. Increase publisher and funder collaborations around research data. 
  10. Improve communication on what research data management costs can be funded and by whom
Definition of research: "a process of investigation leading to new insights, effectively shared. It includes work of direct relevance to the needs of commerce, industry, and to the public and voluntary sectors; scholarship ...; the invention and generation of ideas, images, performances, artefacts including design, where these lead to new or substantially improved insights; and the use of existing knowledge in experimental development to produce new or substantially improved materials, devices, products and processes, including design and construction.”

Other notes from the study:
Costs of research data management seen as too high
Obsolescence of data format or software

The volume of research data and the number of new research data services and repositories is increasing.

"The high-level principles for research data management may be established but the everyday practice and procedures for the full-range of research data, what and why to keep, for how long, and where and how to keep it, are still evolving."

“All those engaged with research have a responsibility to ensure the data they gather and generate is properly managed, and made accessible, intelligible, assessable and usable by others unless there are legitimate reasons to the contrary. Access to research data therefore carries implications for cost and there will need to be trade-offs that reflect value for money and use.”

The Core Trustworthy Data Repositories Requirements notes four curation levels that can be performed by trusted repositories:
a. As deposited
b. Basic curation eg, brief checking, addition of basic metadata or documentation
c. Enhanced curation eg, conversion to new formats, enhancement of documentation
d. Data level curation (as in C above, with additional editing of data for accuracy)

UVU Cylinder Project

UVU Cylinder Project. Website. Utah Valley University. March 8, 2019.
     This website and the cylinder project were showcased at a digital preservation symposium. The site has an extensive searchable library of cylinders and a Cylinder Player. The un-archived cylinders are being transcribed, given metadata, and having their recordings cleaned before being posted to the site.

Wednesday, March 06, 2019

Texas Digital Library Digital Preservation Services

Texas Digital Library Digital Preservation Services. Press release. Texas Digital Library, 5 March 2019. [PDF]
     The organization now offers Digital Preservation Services to its members to help Texas cultural heritage and scholarship stewards provide access for the long term through direct consulting, training, and workflow support that includes the right combination of technologies for your unique content needs. The content can be stored in multiple geographically-dispersed locations with fixity checking with Chronopolis and Amazon through the DuraCloud interface.

Tuesday, March 05, 2019

Accessible Archives Inc. Partners with Portico

Accessible Archives Inc. Partners with Portico. Press release. Accessible Archives, Inc. Mar 05, 2019.
     Accessible Archives Inc., an electronic publisher of full-text primary source historical databases, has partnered with Portico in order to fully support the digital preservation of their content. With more content in a digital-only format, this will help the archival collections remain accessible. Preservation will help to ensure the long-term availability of these resources for future scholars.

Monday, January 28, 2019

Introduction to Digital Preservation: What is Digital Preservation?

Introduction to Digital Preservation: What is Digital Preservation? Bodleian Libraries, Oxford LibGuides. Aug 28, 2018.    
     Digital preservation at Bodleian Libraries is defined as: "The formal activity of ensuring access to digital information for as long as necessary. It requires policies, planning, resource allocation (funds, time, people) and appropriate technologies and actions to ensure accessibility, accurate rendering and authenticity of digital objects.  A “lifecycle management” approach to digital preservation is taken, where action is done at regular intervals and future activity is planned. This includes policies and recommendations for appraising and selecting digital information to preserve, acknowledging resources are finite."

There are two different kinds of digital preservation:
  1.  Bit-level Preservation: a "very basic level of preservation of the digital object as it was submitted (literally preserving the bits forming a digital object)." It is a beginning step to the more complete set of digital preservation practices and processes that ensure the survival and usability of digital materials over time.
  2. Logical Preservation: The part of preservation management that ensures the continued usability of content by ensuring the existence of a usable manifestation of the digital object. Sometimes referred to as format preservation or active preservation, it includes:
  • Understanding what digital materials are in the repository.
  • Identifying threats to the materials and planning actions to be taken for at-risk digital materials
  • Putting things into action 
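Bit-level preservation is commonly verified with fixity checks: a checksum is recorded when the object is submitted and recomputed later to confirm the bits have not changed. The following is a minimal sketch of that idea, not a description of the Bodleian workflow. The function names and the manifest layout (relative path mapped to a SHA-256 digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare stored digests with freshly computed ones.

    `manifest` is a hypothetical mapping of relative path -> expected digest.
    Returns a status per path: "ok", "changed", or "missing".
    """
    results = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            results[relative_path] = "missing"
        elif sha256_of(target) == expected:
            results[relative_path] = "ok"
        else:
            results[relative_path] = "changed"
    return results
```

A check like this only shows that the bits are intact. Whether the content can still be rendered is the concern of logical preservation.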
Defining other terms:
  • "Digital curation involves maintaining, preserving and adding value to digital files throughout their lifecycle—not just at the end of their active lives. This active management of digital files reduces threats to their long-term value and mitigates the risk of digital obsolescence. Digital curation includes digital preservation, but the term adds the curatorial aspects of: selection, appraisal and ongoing enhancement of the object for reuse." It is commonly used in the sciences and social sciences for research data and is often being replaced with research data management, especially when referring to active digital files.
  • Digital archiving is often used interchangeably with digital preservation in archives. It has two main definitions used by computing personnel and archivists and librarians respectively. Recognize both definitions of the term and be aware of the audiences that use this term differently.
    1. The process of storage, backup and ongoing maintenance as opposed to strategies for long-term digital preservation
    2. The long-term storage, preservation and access to information that is "born digital" or for which the digital version is considered to be the primary archive 
  • Digital Stewardship, more commonly used in the US, "combines both curation and preservation—the active life of a digital asset and its continual preservation afterwards for long-term use. But this school of thought splits digital curation & digital preservation into two separate categories and then uses digital stewardship as the umbrella term."
The Bodleian Librarians consider digital preservation to be a "holistic term that includes aspects of digital curation and stewardship". They work with creators to organise and manage their digital objects to preserve them, to follow best practice for creation and managing active files so that they will be easier to manage and provide access to in the long-term.

Wednesday, December 19, 2018

Nothing succeeds like success: An approach for evaluating digital preservation efficacy

Nothing succeeds like success: An approach for evaluating digital preservation efficacy. Stephen Abrams. Paper, iPres 2018. [PDF]
     Digital preservation encompasses the theory and practice ensuring purposeful future use of digital resources. But how can one tell whether it has been effective or not? The evaluation of the effectiveness of preservation actions has two dimensions: trustworthiness of managerial programs and systems; and successful use of managed resources.
  • Preservation should be viewed as facilitating meaningful communication across time and cultural distance.
  • The preservation field has not yet matured to a point of having established metrics for evaluating the success or failure of its outcomes
  • We should be asking what measures can be used to evaluate the success of the digital preservation efforts
  • A proper model should view digital preservation as human communication rather than data management, and should evaluate success through operational, not just descriptive, evidence.
  • The goal of that communication is to transfer an intangible but intentional unit of meaning from the producer to a consumer across temporal, technical, and cultural distance
  • Like any formal discipline, digital preservation should be viewed as a complex of actors, policies, technologies, and practices; its maturity is dependent on its capacity for reflective self-evaluation
  • There are two primary measures of preservation efficacy: trustworthiness of managerial systems and programs; and successful use of preserved resources.
  • Because of the open-ended time horizon of preservation commitments, preservation success should be understood as a provisional, rather than absolute value. One can’t make categorical assertions beyond the ever-forward-moving point of now, since the consequences of the future cannot be fully anticipated
  • A model of the digital preservation enterprise provides a way to analyze, explicate, and understand the domain. It can lead to new criteria and metrics for evaluating success. It also will form the basis for rational prioritization of strategic goals, allocation of programmatic resources, and transparent accountability to stakeholder communities.

Friday, December 14, 2018

In-House Digitization with the Lossless FFV1 Codec At the University of Notre Dame Archives. AMIA Poster

In-House Digitization with the Lossless FFV1 Codec At the University of Notre Dame Archives. Erik Dix and Angela Fritz, University of Notre Dame Archives. AMIA 2018. Poster. [pptx].
     An interesting poster at AMIA which shows their digitization workflow and processing steps from accessioning to preservation system.
WHY FFV1 as a codec for Digital Preservation Masters?
1. Lossless compression (no quality loss)
2. A Standard Definition FFV1 file is ca. 46 % of the size of the uncompressed file.
    A High Definition FFV1 file is ca. 57 % of the size of the uncompressed file.
3. FFV1 is part of the FFmpeg project and open source
4. It is safe for long term preservation.
5. Encoding into FFV1 can be done with low cost Windows PCs.
6. The video is captured in FFV1 in real time.
7. Standard definition FFV1 files can be played with the VLC media player
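
The size ratios in point 2 can be turned into a rough storage estimate. The sketch below is illustrative only: the function names are hypothetical, and the example assumes 8-bit 4:2:2 video (2 bytes per pixel) at 29.97 frames per second with audio ignored, none of which is stated on the poster.

```python
# Approximate FFV1-to-uncompressed size ratios reported on the poster.
FFV1_RATIO = {"sd": 0.46, "hd": 0.57}


def uncompressed_gb_per_hour(width: int, height: int,
                             bytes_per_pixel: float, fps: float) -> float:
    """Size of one hour of uncompressed video in GB (10**9 bytes), video only."""
    return width * height * bytes_per_pixel * fps * 3600 / 1e9


def estimate_ffv1_gb(uncompressed_gb: float, definition: str) -> float:
    """Estimate the FFV1 size in GB from the uncompressed size ("sd" or "hd")."""
    return uncompressed_gb * FFV1_RATIO[definition]


# Assumed capture parameters: 720 x 486, 8-bit 4:2:2, 29.97 fps.
sd_uncompressed = uncompressed_gb_per_hour(720, 486, 2, 29.97)
sd_ffv1 = estimate_ffv1_gb(sd_uncompressed, "sd")
print(f"SD uncompressed: {sd_uncompressed:.1f} GB/hour, FFV1: {sd_ffv1:.1f} GB/hour")
```

Under those assumptions an hour of standard definition video comes to about 75.5 GB uncompressed and about 34.7 GB as FFV1. A different bit depth or frame rate changes the result proportionally.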

Digitization Workflow

Accessioning as Processing:
  • Archives conducts a preliminary inventory, assigns collection code,  creates CMS record 
  • AV materials transferred to AV Archivist for a preservation and digitization assessment
  • Descriptive and technical metadata gathered
  • Analog materials reorganized and stabilized for long-term storage.
Basic Metadata Creation:
  • AV Archivist creates item–level metadata
  • Descriptive and technical metadata promotes access and discoverability
  • Descriptive metadata added to finding aid and uploaded to the Archives, the IR, and ILS
Inspection & Prep of A-V materials:
  • Only requested AV items or at-risk items will be digitized 
  • Videotapes often require baking or splicing
VCRs without SDI output:
  • The digitization capture card uses SDI [Serial Digital Interface] connections. 
  • VHS, Betamax, and older professional formats, e.g. 1” type C and U-matic, don’t have SDI outputs. 
  • A DPS-575 frame synchronizer is used to create an SDI signal from the S-video or output of these items
  • Basic color correction is done at this step if necessary. 
  • The SDI signal from the frame synchronizer is then split in two to feed a Windows PC for the creation of the FFV1 preservation file and to feed a Mac computer to create an Apple ProRes 422 mezzanine file.
VCRs with SDI output:
  • They have VCRs for the DV tape family from Mini DV up to DVCPro HD, DVCam, and HDV, as well as the Betacam tape family from Betacam to HDCAM, that can output an SDI or HD-SDI signal. 
  • The signal is also split in two to simultaneously create an FFV1 file on a Windows PC and an Apple ProRes 422 file on a Mac.
Digital Preservation System:
  • Use an LTO tape library for the storage of our digitized files. 
  • Currently, the Archives is evaluating digital preservation systems for implementation. 
  • Archives capabilities will be expanded to provide digital preservation micro-services to ensure continued access to its digital collections.

Thursday, December 13, 2018

Why Is the Digital Preservation Network Disbanding? Lessons from organizational challenges

Why Is the Digital Preservation Network Disbanding? Roger C. Schonfeld. The Scholarly Kitchen; Society for Scholarly Publishing. Dec 13, 2018.
     "The long term stewardship of digital objects and collections through digital preservation is an essential imperative for scholarship and society. Yet its value is intangible and its rewards are deferred. It falls on organizations to invest in preservation, often less out of a sense of anticipated exclusive returns and more out of a sense of contributing to a community mission." It is essential that we discuss the lessons we can learn from organizational challenges.

DPN was a commitment to replicate the data of research and scholarship across diverse environments and to enable existing preservation capacity. It offered an elegant technical solution but the product offering was never as clear as it could have been, and ultimately could not be sustained. Most DPN members did not use the network services and membership declined. Some patterns emerged: 
  • Not every storage need requires a preservation solution, and the members were "in some cases, unsuccessful in distinguishing the added value of a preservation solution from cloud storage."
  • Many library systems were not originally prepared to support DPN’s ingest workflow. For a number of members, the content to be preserved was spread across servers and systems, often with limited curatorial control. 
  • The product definition took too long to emerge and the value proposition was not uniformly understood.
  • DPN’s pricing model did not generate the revenue that DPN’s model anticipated. 
  • Some libraries signed up more out of courtesy or community citizenship than commitment.
  • Membership models are ill-suited to product organizations and marketplace competition.  
There are broader implications in the disbandment of DPN. The article states that  DPN will not be the last closure, merger, or other reorganization. "It seems clear that we are in a period of instability for collaborative library community efforts and more major changes are surely on the horizon."

Wednesday, December 12, 2018

Preservation of AV Materials in Manuscript Collections. Training for AV format identification and risk assessment. Actions to take

Preservation of AV Materials in Manuscript Collections; Internal Training.  Ben Harry. Brigham Young University. November 2018.
     The presentation is not yet available on the internet. Some notes from the training:
“There is now consensus among audiovisual archives internationally that we will not be able to support large-scale digitisation of magnetic media in the very near future. Tape that is not digitised by 2025 will in most cases be lost.”, Oct. 2018

The problem with AV is Fragility:
  • Playback equipment is disappearing
  • Knowledgeable experts are disappearing
  • Materials breaking down
  • Untrained handling easily destroys materials
The solution to the fragility is to address materials in a timely manner:
  • Priority and Speed and Efficiency
  • Train transfer operators
  • Untrained handling easily destroys materials
A Challenge of AV is Neglect:
  • Unable to describe AV Content adequately in finding aids or catalogs. 
  • Requires certain level of specific knowledge of formats and physical carriers.
  • Requires machine to read information that may not be available
  • Time-consuming process for little reward
  • Expensive, unstructured, uncoordinated
To overcome the challenge:
  • Digitize material for description in basic processing
  • Time-consuming process for little reward
  • MUST be a lean process to minimize the effect upon processing
Audio-video preservation requires a certain level of specific knowledge. Staff must be trained to recognize and report AV formats. It is also important to have risk assessment guidelines to help make informed decisions. Coordinate efforts and resources to reduce confusion, prioritize and set goals, and unify our proposals for equipment and manpower.

Actions to take:
  • Prioritize Formats for Migration / Reformatting
  • Maintain Transparent Records on Preservation and Access
  • Link Preservation and Access (one does not happen without the other)
  • Provide Curators with AV Assessment tools
  • Organize a Queue System to keep things equitable (what about 12 items per month, per curator? Adjust as Necessary)
  • Create Digital File Naming guidelines
  • Establish Access and Preservation format standards for AV materials:
 For Access and Preservation, the following standards will be used:

Audio Preservation
  • Preservation Format: PCM / wav, 96 kHz sampling, 24-bit depth. 1 GB/Hour
  • Access Copy: mp3. Music: 256 kbps. Voice: 192 kbps.

Video Preservation: Standard Def
  • Preservation Format: ffv1 / mkv 720 x 486. 33 GB/Hour  
  • Access Copy: H.264 / mp4

Video Preservation: Hi Def
  • Preservation Format: ffv1 / mkv Native: 1080i / 1080p. 100 GB/Hour?  
  • Access Copy: H.264 / mp4

Film Preservation
  • Preservation Format: RGB ffv1 / mkv 1080i scan (MPS capability ceiling). 100 GB/Hour?  
  • Access Copy: H.264 / mp4
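
The audio storage figure can be checked with simple arithmetic: uncompressed PCM size is sample rate times bytes per sample times channels times duration. The sketch below uses hypothetical function names. The channel count is an assumption, since the training notes do not say whether the 1 GB/Hour figure is for mono or stereo.

```python
def pcm_gb_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Uncompressed PCM size for one hour of audio, in GB (10**9 bytes)."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * 3600 / 1e9


def mp3_mb_per_hour(bitrate_kbps: int) -> float:
    """Constant-bitrate mp3 size for one hour of audio, in MB (10**6 bytes)."""
    return bitrate_kbps * 1000 / 8 * 3600 / 1e6


print(f"96 kHz / 24-bit mono:   {pcm_gb_per_hour(96_000, 24, 1):.2f} GB/hour")
print(f"96 kHz / 24-bit stereo: {pcm_gb_per_hour(96_000, 24, 2):.2f} GB/hour")
print(f"mp3 at 256 kbps: {mp3_mb_per_hour(256):.1f} MB/hour")
print(f"mp3 at 192 kbps: {mp3_mb_per_hour(192):.1f} MB/hour")
```

At 96 kHz and 24 bits, one mono channel comes to about 1.04 GB per hour, which matches the 1 GB/Hour figure. A stereo recording at the same settings needs about 2.07 GB per hour.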

Archive and delivery methods:
  • Preservation: Rosetta
  • Access: various options are available. 

Monday, December 10, 2018

A Preservationist’s Guide to the DMCA Exemption for Software Preservation

A Preservationist’s Guide to the DMCA Exemption for Software Preservation. Kee Young Lee and Kendra Albert. Software Preservation Network and the Cyberlaw Clinic @ the Berkman Klein Center. December 10, 2018.  [PDF]
     "The Library of Congress recently adopted several exemptions to the Digital Millennium Copyright Act (DMCA) provision prohibiting circumvention of technological measures that control access to copyrighted works. The exemptions went into effect on October 28, 2018 and last until October 28th, 2021. This guide is intended to help preservationists determine whether their activities fall under the new exemption."  The Software Preservation Network has obtained temporary exemptions which remove the legal liability for circumventing technological protection measures for preserving the software or resulting files, provided that certain conditions are met. These exemptions do not remove legal liability for copyright infringement of the underlying software itself.

The guide provides excellent information on the issues and the exemptions. The exemptions are  generally directed to preservation activities  by libraries, archives, and museums, but there are five criteria required in order to claim the exemption. The library, archive, or museum must:
  1. Make its collections open to the public or routinely available to unaffiliated outside researchers.
  2. Ensure that its collections are composed of lawfully acquired or licensed materials.
  3. Implement reasonable digital security measures for preservation activities.
  4. Have a public service mission.
  5. Have trained staff or volunteers that provide services normally provided by libraries, archives, or museums
In addition, there are requirements for using the preserved software:
  • The computer program must have been lawfully acquired.
  • The software must no longer be reasonably available in the commercial marketplace.
  • The sole purpose of the circumvention activity must be for lawful preservation of the computer program or digital materials that are dependent on a computer program.
  • The computer programs cannot be used for commercial advantage.
  • Use of the exemptions can only be for non-infringing uses of the software.
  • Copies of the computer programs cannot be made available outside of the physical premises of the library, archive, or museum.
These exemptions are only for three years, so evidence of software preservation activities will help to renew the exemptions.

The Guide also includes a DMCA Exemption for Software Preservation Checklist.

Saturday, December 08, 2018

Make The Case for Digital Preservation in Your Organisation

Make The Case for Digital Preservation in Your Organisation. Digital Preservation Coalition. 2018.
     This page provides some useful guides, examples and other resources that can help with building a business case and more broadly making the case for digital preservation within your organisation.
When preparing a business case or briefing, these resources provide an array of helpful information, from planning and preparation all the way through to polishing and communicating the finished case for digital preservation in your organisation.
Communication is critical for understanding your stakeholders and creating a foundation for establishing digital preservation within your organisation. These resources provide guidance on engaging with particular audiences.

Thursday, December 06, 2018

3 Principles for Selecting a Digital Preservation Solution

3 Principles for Selecting a Digital Preservation Solution. Daniel Greenberg. Ex Libris. November 29, 2018.
   This post was in honor of World Digital Preservation Day and lists some important elements to remember when reviewing digital preservation systems:

1. Interoperability: handling different types of data and integrating with other systems
  • Support common protocols for harvesting, publishing and searching, e.g. Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and SRU (Search/Retrieve via URL).
  • Ingest content with multiple methods and structures; e.g., BagIt, METS, CSV, and XML.
  • Providing well-documented external APIs
  • Integrating with other information systems
2. Follow Industry Standards, particularly standard metadata schemas and communication protocols. Benefits of doing this:
  • Interoperability between new and existing services and applications.
  • Compliance with policies and regulations.
  • Introduction of innovative features.
  • Enable a robust exit strategy, in case the vendor goes out of business.
3. Scalability:
  • Architectural scalability: Start small and grow big. Ability to expand the throughput over time without compromising performance.
  • Operational scalability: Ability to customize the system to the institutions’ needs.
  • Informational scalability: Keep up with latest strategies, practices, tools and policies by an active user community.
  • Organizational scalability: Administer multiple institutions with a single installation; support a flexible consortium model.
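
As an illustration of the harvesting protocols named under the first principle, an OAI-PMH request is an ordinary HTTP request whose query string carries a `verb` and its arguments. The sketch below only builds such a URL. The base URL is a placeholder and the helper name is hypothetical, while `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH specification.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


# Placeholder endpoint; a real repository publishes its own base URL.
url = oai_pmh_url(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```

A system that answers requests of this shape can be harvested by any standard OAI-PMH client, which is the practical benefit of supporting common protocols.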

Wednesday, December 05, 2018

Digital Preservation Network (DPN) Sunset

Community Announcement - DPN Sunset. Digital Preservation Network. December 4, 2018.
     The Board of Trustees of the Digital Preservation Network (DPN) is ending the organization.  The DPN Board "determined that it is not feasible to design and implement changes that would ensure sustainability." 
"The landscape of digital preservation services has changed considerably in the past six years, as have the community’s preservation needs. Our highest priority is to affect an orderly sunset for the organization’s operations and for the disposition of its deposits."

The ending of a community-based organization to provide long-term digital preservation storage  highlights the numerous challenges with maintaining digital resources long-term.

Thursday, November 29, 2018

The File Discovery Tool - A simple tool to gather file and filepath information, and ingest into our Rosetta Digital Archive

The File Discovery Tool. Chris Erickson. Brigham Young University. November 29, 2018.
     We have created a File Discovery Tool that analyzes directories of objects and prepares a spreadsheet of all the files it discovers for preservation/ingest. This file allows the curators to discover and work with the materials, select those that need to be preserved, and then add collection and other metadata information. The tool fits our workflow, but the source code may be useful for others trying to accomplish a similar task.

A sample command to run the tool:
>> java -jar FileDiscovery.jar [path name of files to check] [output path name for saving the report]
>> java -jar C:\FileDiscovery\FileDiscovery.jar "R:\test\objects"  C:\output\files
 The commands and syntax are outlined in a brief document: File Discovery Outline
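
For readers who want to try the general idea without the Java tool, the following is a rough Python sketch of the same kind of task: walk a directory tree and write one spreadsheet row per file. It is not the File Discovery Tool's code, and the column names are hypothetical rather than the tool's actual headings.

```python
import csv
import os
from datetime import datetime, timezone

# Hypothetical column names, chosen for illustration only.
FIELDNAMES = ["filename", "filepath", "size_bytes", "modified"]


def discover_files(root: str) -> list[dict]:
    """Walk `root` and return one row of basic information per file."""
    rows = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(directory, name)
            info = os.stat(full_path)
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "filename": name,
                "filepath": full_path,
                "size_bytes": info.st_size,
                "modified": modified.isoformat(),
            })
    return rows


def write_report(rows: list[dict], report_path: str) -> None:
    """Write the rows to a CSV file that opens as a spreadsheet."""
    with open(report_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_report(discover_files(source_dir), report_path)` with two paths of your own produces the report. Writing it outside the scanned directory keeps it from appearing in later scans.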
The spreadsheet that is created has the following column headings:

Metadata can be added as needed before ingesting the content into Rosetta.

The files and the metadata can then be submitted to Rosetta using the csv option in the Rosetta File Harvester tool by adding in a second row of Dublin Core names in order to map the column. A standard template has been created to help in preparing the file for ingest and is found on the resources page: RosettaFile Ingest template for Excel, or (PDF)
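
The step of adding a second row of Dublin Core names can also be scripted. The sketch below inserts a mapping row directly under the header of an existing CSV file. The column headings and the `dc:` naming shown in the mapping are assumptions for illustration; the names Rosetta actually expects should be taken from the template mentioned above.

```python
import csv


def add_mapping_row(source_csv: str, target_csv: str, mapping: dict[str, str]) -> None:
    """Copy a CSV file, inserting a row of mapped names under the header.

    Columns without an entry in `mapping` get an empty cell.
    """
    with open(source_csv, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    if not rows:
        raise ValueError("source CSV has no header row")

    header = rows[0]
    mapping_row = [mapping.get(column, "") for column in header]

    with open(target_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)       # original column headings
        writer.writerow(mapping_row)  # second row: Dublin Core names
        writer.writerows(rows[1:])    # remaining rows, unchanged


# Hypothetical headings mapped to assumed Dublin Core names.
EXAMPLE_MAPPING = {"Title": "dc:title", "Creator": "dc:creator"}
```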
The source is available at