Thursday, March 28, 2019

A Public Record at Risk: The Dire State of News Archiving in the Digital Age

A Public Record at Risk: The Dire State of News Archiving in the Digital Age. Sharon Ringel and Angela Woodall. Columbia Journalism Review. March 28, 2019.
     This research report looks at archiving practices and policies across newspapers, magazines, wire services, and digital-only news producers to identify the current state of content preservation in an age of digital distribution. The majority of news outlets had not given any thought to even basic strategies for preserving their digital content, and not one was properly saving a holistic record of what it produces. Digitization and storage in a database are not, on their own, adequate for long-term preservation. True archiving requires forethought and custodianship.

Staff equated digital backup and storage in Google Docs or content management systems with archiving, and were unable to distinguish between backups and an archive, but the two are not the same. Backups are temporary copies for data recovery in case of damage or loss, while archiving refers to long-term preservation that ensures records will still be available as technologies change. Staff also expected that third-party organizations such as the Internet Archive (IA), Google, Twitter, and Facebook would have copies. Even if the IA has captured a website, what it collects may be limited to the first level of content and could exclude links, comments, personalized content, and different versions of a story.

News archiving technologies are being developed; preserving digital content is not a technical challenge but a matter of priority, and a decision that demonstrates intent. The findings should be a wake-up call to an industry which claims that democracy cannot be sustained without journalism acting as a watchdog for truth and accountability. "In an era where journalism is already under attack, managing its record and future are as important as ever."

The news organizations are interested in the present: “Who cares what existed 10 years ago? I need my thing now. And so, for better, for worse, if there was some value in [archiving], I probably got a better value out of the new thing.” In short, newsrooms are doing little or nothing to preserve digital news, and none of the content creators interviewed made an effort to download and preserve the stories they produced.

Deletion is the flip side of preservation, and "news organizations, in certain cases, actively remove content from the public record", which raises questions about the role of journalism in society.

Some key findings about the news organizations participating in the research:

  • 19 of the 21 news organizations had no policies or practices for preserving their content and were not taking any protective steps at all to archive their web output. The remaining two lacked formal preservation strategies.
  • Of the 21 news organizations, only six employed news archivists or librarians, and those staff members' other responsibilities took their focus away from the work required for preservation.
  • None of the digital-only outlets had a news librarian or archivist on staff. 
  • None of the news organizations were preserving their social media publications. Only one was attempting to address the problem.
  • Digital-only news organizations were less aware than print publications of the importance of preservation. Very little is currently being done to preserve news.
  • Journalism’s primary focus is on “what is new”, and journalists are more concerned with preserving documentation of their reporting, which shows what makes it accurate, than with preserving what ultimately gets published.
  • News apps are at high risk of being lost because these new technologies become obsolete before anyone thinks to save them. 
  • Partnerships among archivists, technologists, memory institutions, and news organizations will be vital to ensure future access to digitally distributed news content. Two questions to start with: What should be preserved? Who should preserve it?
  • To enact lasting change, opinion leaders in the field must show staff and management that archiving makes sense for their positions, has advantages, and is compatible with their priorities.

News organizations should care about preserving news for the future, just as they care about integrity, reliability, and informing the public in the present.

Wednesday, March 27, 2019

Next Phase OAIS Review

Next Phase OAIS Review. Barbara Sierman. Digital Preservation Seeds. March 24, 2019.
     The OAIS standard had its 5-year review in 2017, which resulted in over 200 suggestions for change. All the changes were discussed in the DAI Working Group and most of them were accepted. The next step is for the updated OAIS standard to go to CCSDS and ISO for final approval. Most of the changes concerned terminology, both clarifying terms and making them consistent.

Some concepts got a more extensive description, while others have changed. The new OAIS standard shows that a transparent process can lead to a standard that reflects the current practices. The standards group will now have a final opportunity to decide whether all suggested changes are clear and implementable.

Friday, March 22, 2019

Datanomics Costs, Benefits, and Value of Research Data

Datanomics: the value of research data. Neil Beagrie. Jisc Invitational Workshop, Glasgow, February 2019.
 Slides from Neil Beagrie's presentation on the costs, benefits, and value of research data. His description of the slides:

Twenty years ago format obsolescence was seen as the greatest long-term threat to digital information.  Arguably, experience to date has shown that funding and organisational challenges are perhaps more significant threats. I hope this presentation helps those grappling with these challenges and shows some key advances in how to use knowledge of costs, benefits and value to support long-term sustainability of digital data and services.

These are the slides from my keynote presentation to the joint Digital Preservation Coalition / Jisc workshop on Digital Assets and Digital Liabilities - the Value of Data held in Glasgow in February 2018. The slides summarise work over the last decade in the key areas of exploring costs, benefits and value for data. The slides posted here have additional slide notes and references to new publications since the workshop and some modifications such as removal of animations.

Some notes from the slides:
Costs. Keeping Research Data Safe (KRDS) rules of thumb:
  1. Getting data in takes about half of the lifetime costs, preservation about a sixth, and access about a third.
  2. Preservation costs decline over time.
  3. Fixed costs are significant for most data archives.
  4. Staff are the most significant proportion of archive costs.
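The first rule of thumb is simple arithmetic: a half, a sixth, and a third add up to the whole lifetime cost. A minimal sketch of applying it to a budget, assuming the shares are used as fixed fractions (the stage names and the 120,000 total are hypothetical, chosen only for illustration):

```python
from fractions import Fraction

# KRDS rule-of-thumb shares of lifetime cost (approximate, per the slides).
# Stage names are hypothetical labels for "getting data in", preservation, and access.
KRDS_SHARES = {
    "ingest": Fraction(1, 2),
    "preservation": Fraction(1, 6),
    "access": Fraction(1, 3),
}

# Exact fractions confirm the three shares cover the whole lifetime cost.
assert sum(KRDS_SHARES.values()) == 1


def split_lifetime_cost(total: float) -> dict[str, float]:
    """Apportion a total lifetime cost across stages using the KRDS shares."""
    return {stage: float(total * share) for stage, share in KRDS_SHARES.items()}


# Hypothetical total lifetime cost, in any currency unit.
print(split_lifetime_cost(120_000))
```

Using `Fraction` avoids rounding error in the shares themselves; real cost models would replace these fixed ratios with local cost data, which is what the slides recommend collecting.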

The KRDS Benefits Framework, covering the benefits of curating research data, is arranged on three dimensions:
  1. What is the outcome?
  2. When is it received?
  3. Who benefits?
Valuing Intangible Assets: Measuring the value of intangible assets is much harder than for physical assets. We measure the value of data services, not just the data alone.

Economic metrics used:
  • Investment value: the amount spent on producing the good or service
  • Use value: the amount spent by users to obtain the good or service
  • Contingent value: the amount users are “willing to accept” in return for giving up access
  • Efficiency gain: user estimates of time saved by using the Data Service resources
  • Return on investment: the estimated increase in return on investment due to the additional use
We must also look at the costs of inaction:
  • Rate of loss of research data sets: 17% per annum
  • Partial information loss: 7% per annum
  • Rate of loss for web links to data: c. 5.5% per annum
  • Access / data requests fulfilled
  • Delay in elapsed time to fulfill data requests: up to 6 months
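Annual loss rates add up quickly. A minimal sketch, assuming each rate applies as a constant yearly loss compounding on whatever survives (the slides do not say how the rates were measured, so this model is an assumption): after n years a fraction (1 − r)^n remains, and half is gone after ln(0.5) / ln(1 − r) years.

```python
import math


def surviving_fraction(annual_loss_rate: float, years: int) -> float:
    """Fraction still available after `years`, assuming a constant annual loss rate."""
    return (1 - annual_loss_rate) ** years


def half_life_years(annual_loss_rate: float) -> float:
    """Years until half the original items are lost at a constant annual loss rate."""
    return math.log(0.5) / math.log(1 - annual_loss_rate)


# Rates taken from the "costs of inaction" figures above.
for label, rate in [("research data sets", 0.17), ("web links to data", 0.055)]:
    print(
        f"{label}: {surviving_fraction(rate, 5):.1%} left after 5 years, "
        f"half-life about {half_life_years(rate):.1f} years"
    )
```

Under this assumption, roughly 39% of research data sets would remain after five years, with a half-life of under four years.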

Recommendations: Investigate the relative costs and benefits of curation levels, storage, or appraisal for what to keep.

“Five or six decades since the beginning of the Information Age, the namesake of this age, and the major asset driving today’s economy, is still not considered an accounting asset”

“Corporations typically exhibit greater discipline in tracking and accounting for their office furniture than their data”

Use cost data to look for trends, leverage our efforts, investigate the relative costs and benefits of curation levels, storage, and look towards hierarchical curation management.

Monday, March 11, 2019

Arctic World Archive receives more world treasures

Arctic World Archive receives more world treasures. Press release. 21 February 2019.
     Institutions and companies from around the world, including Utah Valley University, have deposited their digital content in the Arctic World Archive in Svalbard, Norway. The Archive is a repository for world memory where the data will last for centuries. It is a collaboration between Piql, a digital preservation specialist, and Store Norske Spitsbergen Kulkompani (SNSK), a state-owned Norwegian mining company based on Svalbard with vast experience and resources to build and maintain mountain vaults.

The top 10 items of cultural heritage, as nominated by the public, were also stored away for the future. These items include famous religious texts, paintings, architectural designs, science breakthroughs, and popular contemporary music.


Saturday, March 09, 2019

What to Keep: A Jisc research data study

What to Keep: A Jisc research data study. Neil Beagrie. Jisc. February 2019.  [PDF]
     This study is about research data management, particularly appraisal and selection, an issue that has become more significant in recent years as volumes of data have grown. "The purpose is to provide new insights that will be useful to institutions, research funders, researchers, publishers, and Jisc on what research data to keep and why, the current position, and suggestions for improvement."

"Not all research data is the same: it is highly varied in terms of data level; data type; and origin. In addition, not all disciplines are in the same place or have identical needs."

"It is essential to consider not only What and Why to keep data, but for How Long to keep it, Where to keep it, and increasingly How to keep it in ways that reflects its potential value, cost, and available funding."

The study lists ten recommendations:
  1. Consider what is transferable between disciplines. Support adoption of effective practice via training, technologies, case studies, and guidance checklists.
  2. Bring communities together with workshops to evolve disciplinary norms.
  3. Harmonise funder requirements for research data where relevant.
  4. Investigate the costs and benefits of curation levels, storage, or appraisal for what to keep.
  5. Implement the FAIR principles as appropriate for kept data.
  6. Enhance data discoverability by recording and identifying data generated by research projects in existing research databases.
  7. Require Data Access Statements in all published research articles where data is used as evidence, and encourage adoption of the Transparency and Openness Promotion (TOP) guidelines.
  8. Improve incentives and lower the barriers for data sharing.
  9. Increase publisher and funder collaborations around research data.
  10. Improve communication on what research data management costs can be funded and by whom.
Definition of research: "a process of investigation leading to new insights, effectively shared. It includes work of direct relevance to the needs of commerce, industry, and to the public and voluntary sectors; scholarship ...; the invention and generation of ideas, images, performances, artefacts including design, where these lead to new or substantially improved insights; and the use of existing knowledge in experimental development to produce new or substantially improved materials, devices, products and processes, including design and construction.”

Other notes from the study:
  • Costs of research data management are seen as too high
  • Obsolescence of data format or software

The volume of research data and the number of new research data services and repositories is increasing.

"The high-level principles for research data management may be established but the everyday practice and procedures for the full-range of research data, what and why to keep, for how long, and where and how to keep it, are still evolving."

“All those engaged with research have a responsibility to ensure the data they gather and generate is properly managed, and made accessible, intelligible, assessable and usable by others unless there are legitimate reasons to the contrary. Access to research data therefore carries implications for cost and there will need to be trade-offs that reflect value for money and use.”

The Core Trustworthy Data Repositories Requirements note four curation levels that can be performed by trusted repositories:
a. As deposited
b. Basic curation, e.g., brief checking, addition of basic metadata or documentation
c. Enhanced curation, e.g., conversion to new formats, enhancement of documentation
d. Data-level curation (as in c above, with additional editing of data for accuracy)

UVU Cylinder Project

UVU Cylinder Project. Website. Utah Valley University. March 8, 2019.
     This website and the cylinder project were showcased at a digital preservation symposium. The site has an extensive searchable library of cylinders and a Cylinder Player. Cylinders not yet on the site are being transcribed, described with metadata, and cleaned before being posted.

Wednesday, March 06, 2019

Texas Digital Library Digital Preservation Services

Texas Digital Library Digital Preservation Services. Press release. Texas Digital Library, 5 March 2019. [PDF]
     The organization now offers Digital Preservation Services to its members to help Texas cultural heritage and scholarship stewards provide long-term access through direct consulting, training, and workflow support that includes the right combination of technologies for their unique content needs. Content can be stored in multiple geographically dispersed locations, with Chronopolis and Amazon, through the DuraCloud interface, with fixity checking.
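Fixity checking means recording a checksum for each file at deposit and recomputing it later to detect loss or silent change. The sketch below is a generic illustration using SHA-256 from Python's standard library; it is not the DuraCloud or Chronopolis API, and the manifest layout and function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under `root` (done once, at deposit)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Recompute checksums and report files that are missing or have changed."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(p) != expected:
            problems.append(f"changed: {rel}")
    return problems


# Demonstration with a throwaway directory standing in for one stored copy.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "item.txt").write_text("deposited content")
    manifest = build_manifest(root)
    clean_report = check_fixity(root, manifest)
    (root / "item.txt").write_text("silently altered")
    damaged_report = check_fixity(root, manifest)

print(clean_report, damaged_report)
```

With copies in several geographically dispersed locations, a failed check at one site can be repaired from a copy whose checksums still match the manifest.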

Tuesday, March 05, 2019

Accessible Archives Inc. Partners with Portico

Accessible Archives Inc. Partners with Portico. Press release. Accessible Archives, Inc. Mar 05, 2019.
     Accessible Archives Inc., an electronic publisher of full-text primary source historical databases, has partnered with Portico to fully support the digital preservation of its content. With more content in digital-only formats, this will help the archival collections remain accessible. Preservation will help to ensure the long-term availability of these resources for future scholars.