This blog contains information related to digital preservation, long term access, digital archiving, digital curation, institutional repositories, and digital or electronic records management. These are my notes on what I have read or been working on. I enjoyed learning about Digital Preservation but have since retired and I am no longer updating the blog.
Saturday, November 19, 2016
Software Sustainability and Preservation: Implications for Long-term Access to Digital Heritage
Digital content requires software for interpretation, processing, and use, and sustaining that software's functionality beyond its normal life span is an issue. It may not be possible, economically or otherwise, for software vendors to maintain software over the long term. Virtualization and emulation are two techniques that may be viable options for long-term access to objects, and there are current efforts to preserve the essential software needed to access or render digital content. These efforts include the earlier KEEP Emulation Framework project and, currently, the bwFLA Emulation as a Service (EaaS) project, which has demonstrated the ability to provide access to emulated and virtualized environments via a simple web browser and as part of operational archival and library workflows.
Memory institutions and software vendors have valuable digital heritage software collections that need to be maintained. A growing number of digital objects require software in order to be used and viewed. Yale University, the Society of American Archivists and others are working to resolve legal barriers to software preservation practices. The preservation community "continues to evolve their practices and strive for more comprehensive and complete technical registries to support and coordinate software preservation efforts".
Wednesday, April 06, 2016
Validating migration via emulation
"Automated migration of content between files of different formats can often lead to content being lost or altered." Verifying migrated content is mostly a manual process, and doing it for a large number of objects is not cost-effective. One possible approach is to automatically migrate content to preferred formats wherever possible and give users the option of working with the object in the “original” software via an emulation service. Users could then compare the migrated and emulated versions and verify that the migrated object is valid. Once multiple users have validated it, the migrated object becomes a trusted object.
If this were done together with migration or emulation on demand, validated digital objects could be separately ingested into a digital preservation system and preserved along with the original version. This could reduce the storage needed for migrated versions by "only preserving 'validated' migrated versions" and also help ensure that trusted content was "available and properly preserved".
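The validation idea can be sketched as a simple tally of user verdicts. This is a minimal illustration, not a description of any existing service: the threshold of three confirmations, the rule that a single rejection blocks trust, and the names `MigratedObject` and `objects_to_ingest` are all hypothetical assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical rule: number of distinct users who must confirm a migration.
TRUST_THRESHOLD = 3


@dataclass
class MigratedObject:
    """A migrated file that users compare against its emulated original."""

    object_id: str
    confirmations: set = field(default_factory=set)  # users who saw no difference
    rejections: set = field(default_factory=set)  # users who reported a difference

    def record_review(self, user: str, matches_original: bool) -> None:
        """Store one user's verdict; a later verdict replaces that user's earlier one."""
        self.confirmations.discard(user)
        self.rejections.discard(user)
        (self.confirmations if matches_original else self.rejections).add(user)

    @property
    def trusted(self) -> bool:
        """Trusted once enough users confirm it and no one reports a problem."""
        return len(self.confirmations) >= TRUST_THRESHOLD and not self.rejections


def objects_to_ingest(objects: list) -> list:
    """IDs of migrated versions validated enough to be ingested alongside the original."""
    return [obj.object_id for obj in objects if obj.trusted]


if __name__ == "__main__":
    doc = MigratedObject("thesis-1994.wpd")
    for reviewer in ("alice", "bob", "carol"):
        doc.record_review(reviewer, matches_original=True)
    print(doc.trusted)  # True: three confirmations, no rejections
```

Letting any single rejection block trust is a deliberately cautious choice; a real service might instead weigh reviewers or send disputed objects to an archivist.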
Thursday, March 03, 2016
Research Software Sustainability: Report on a Knowledge Exchange Workshop
"Without software, modern research would not be possible since it is connected to the software that is used to generate results." Overlooking software puts at risk both the reliability of research and the ability to reproduce it. Software must stand up to the same scrutiny as the research itself and any other research tool. Software sustainability is not easy to define, but broadly it is the set of practices that allow software to continue to function as expected in the future, and achieving it is neither easy nor straightforward. Software has a lifecycle: it is conceived, matures, and decays. Not all software should be sustained; we should concentrate on sustaining the software that is most useful. Software always relies on other software to work, including the operating system, system libraries, and other necessary packages, and any change or decay at one level can affect the operation of software higher up the stack. "If we attempt to preserve software, it quickly becomes out of step with its dependent software."
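The point about dependencies can be made concrete with a small graph walk. This is a minimal sketch using an invented software stack (every component name is hypothetical): given which components each piece of software depends on, everything that depends on a changed component, directly or transitively, is potentially affected.

```python
from collections import deque

# Hypothetical stack: each component maps to the components it depends on.
DEPENDS_ON = {
    "analysis-tool": ["numeric-lib", "plotting-lib"],
    "plotting-lib": ["graphics-lib"],
    "numeric-lib": ["system-libc"],
    "graphics-lib": ["system-libc"],
    "system-libc": ["operating-system"],
    "operating-system": [],
}


def affected_by(changed: str, depends_on: dict) -> set:
    """Return every component that directly or transitively depends on `changed`."""
    # Invert the graph so we can walk from a dependency up to its dependents.
    dependents = {name: set() for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Breadth-first walk up the stack from the changed component.
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


print(sorted(affected_by("system-libc", DEPENDS_ON)))
# ['analysis-tool', 'graphics-lib', 'numeric-lib', 'plotting-lib']
```

A change near the bottom of the stack touches every layer above it, which is why preserving one piece of software in isolation quickly falls out of step with what it depends on.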
Definitions:
- Research software: software developed within academia for the purposes of research, particularly to generate, process and analyze results.
- Software sustainability: the technical and non-technical practices that allow software to continue to operate as expected in the future. A constant level of effort is required to maintain the software’s operation.
- Software preservation: an approach to extend the lifetime of software that is no longer actively maintained.
- Software archiving: one important aspect of software preservation. It is the process of storing a copy of software so that it may be referred to in the future.
Approaches to software preservation:
- Encapsulation. Preserve the original hardware and software to ensure that the software continues to operate (an example is recomputation.org)
- Emulation. Emulate the original hardware and operating environment so that the software continues to operate
- Migration. Update the software to maintain the original functionality and transfer it to new platforms as necessary to prevent obsolescence
- Cultivation. Keep the software up to date by adopting an open development model that allows new contributors to be brought on board
- Hibernation. Preserve knowledge of how to resuscitate the software’s exact functionality at a later date
- Deprecation. Formally retire the software. Unlike hibernation, no time is invested into preparations to make it easier to resuscitate the software
- Procrastination. Do nothing
Recommendations:
- Raise awareness of the fundamental role of software in research
- Recognize research software as a valuable research object
- Promote software sustainability
- Embed software sustainability skills in the research community
- Create organizations as focal points for software sustainability expertise
Expected benefits:
- Trusted research
- Increased rate of discovery
- Increased return on investment
- Research data remains readable and usable
Saturday, February 27, 2016
Back in a Flash
Flashback is a proof-of-concept project run by the British Library’s Digital Preservation Team that examines emulation and migration solutions as methods for preserving the content on CDs, DVDs, and 3.5” and 5.25” disks. The team acquired original hardware for their legacy lab to analyze and handle content from those formats, and they have found that the old hardware itself can have problems. The first step is a capture process that extracts data from the storage media, characterizes the media's physical components, and lists the files they contain. The captured content can then be placed in a controlled environment that ensures the bits are retained regardless of how the original storage media deteriorates. The technical information gathered about the content is important for preservation planning.
For less complex content, such as text, the solution is to migrate the files from old or obsolete formats to more contemporary and reliable formats. The large majority of the content, though, is so "tightly bound up with its original environment that it cannot be migrated", as is the case for software. For this content, the option is to emulate the item’s original hardware and software environment; the emulated environments were supplied by the University of Freiburg via bwFLA – Emulation As A Service. Flashback is gathering data about the performance and viability of emulating groups and comparing the characteristics of the software on original hardware and on emulators.
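Keeping the bits intact after capture usually means recording a checksum for each captured file and re-checking it later. The sketch below is a generic fixity approach using SHA-256, offered under the assumption that captured files sit in one directory; it is not the British Library's actual capture tooling, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(capture_dir: Path) -> dict:
    """List every captured file (relative path) with its size and checksum."""
    manifest = {}
    for path in sorted(capture_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(capture_dir).as_posix()
            manifest[rel] = {"size": path.stat().st_size, "sha256": sha256_of(path)}
    return manifest


def verify_manifest(capture_dir: Path, manifest: dict) -> list:
    """Return files that are now missing or whose checksum no longer matches."""
    problems = []
    for rel, entry in manifest.items():
        path = capture_dir / rel
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            problems.append(rel)
    return problems
```

Running `verify_manifest` on a schedule turns "the bits are retained" from an assumption into something that can be checked.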
Monday, February 01, 2016
Preserving and Emulating Digital Art Objects
This white paper describes the media archiving project's findings, discoveries, and challenges. The goal is the creation of a preservation and access practice as well as sustainable, realistic, and cost-efficient service frameworks and policies. The project was looking at new media art, but its findings should also help inform other types of complex born-digital collections. It aims to develop scalable technical frameworks and associated tools to facilitate enduring access to complex, born-digital media objects.
Interactive digital assets are much more complex to preserve and manage than regular digital media files. A single interactive work can include a range of digital objects, dependencies, different types and formats, applications and operating systems. The artwork can consist of "sound recordings, digital paintings, short video clips, densely layered audiovisual essays that the user navigates and explores with the clicks and movements of a computer mouse. Expansive and complex, the artwork may include many sections, each with its own distinct aesthetic, expressed through rich sound and video quality and intuitive but non-standard modes of interactivity." The interactive and technological nature of these assets poses serious challenges to digital media collections.
About 70 percent of the project artworks could not be accessed at all without using legacy hardware. The project team realized that operating system emulation could be a viable access strategy for those complex digital media holdings.
Project Goals
- Identify significant properties needed to preserve and access new media objects.
- Define a metadata framework to support capture of technical and descriptive information for preservation and reuse.
- Create SIPs (Submission Information Packages) that can be ingested into a preservation repository.
- Explore resource requirements, staff skills, equipment needs, and associated costs.
- Help understand “preservation viability” for complex digital assets
"Emulation seems an excellent and flexible approach to providing fully interactive access to obsolete artworks at very reasonable quality." However, there are issues with using emulation as an archival access strategy:
- emulators must be preserved as well as artworks.
- creating archival identities for emulators is difficult and documentation tends to be inconsistent.
- emulators will eventually become obsolete with new operating systems
- new emulators must be created
- no emulator can provide a fully “authentic” rendering of a software-based artwork.
Artists have increasing access to tools for creating complex art exhibits and objects, but it is "nearly impossible to preserve these works through generations of technology and context changes." Digital curation is more important than ever, and access is the keystone of preservation. The appendices include Emulation Documentation, the Pre-Ingest Work Plan, and Artwork Classifications:
- Structure of the classifications
- Browser-Based Works
- Virtual Reality Components
- Executables in Works
- Macromedia and Related Executables
- HFS File System
Monday, November 02, 2015
Emulation as a Tool. What Can Emulation Do for You?
Emulation can be used as a tool for:
- Contextualization. To identify, describe and preserve object environments
- Generalization. To allow the environment to be run everywhere
- Preservation Planning. Prepare environments to run long term
- Publication & Access. Provide citation of objects in context; allow reuse
Features of Emulation as a Service (EaaS):
- Encapsulation of different emulators and technologies into a common component
- Centralize technical services
- Hide technical complexity of emulation through web interfaces
- Browser-based access
- Provides citation support
- Available with simple browser-based access
- Make emulated content embeddable and shareable like YouTube videos
Tuesday, August 04, 2015
Software dependent content
Many people assume that emulation is only for preserving the “look and feel” of digital objects. In fact, it can also be used to interact with content that requires a particular and limited range of software environments. Software can add, remove, or alter content, not just its “look and feel”. The post introduces the term “software dependent content”: "I use this term to refer to content that requires a particular and limited range of software environments in order to be interacted with, rendered, viewed or consumed." The term is useful when discussing the need to invest in emulation-based digital preservation approaches.
Wednesday, October 15, 2014
Functional Access To Electronic Media Collections using Emulation-as-a-Service.
- User layer: ingest workflow, data evaluation and prioritization, license evaluation, and image creation.
- Workflow layer: image retrieval, rendering evaluation, technical metadata, and object rendering.
- Technical layer: EaaS environments, local resources, and resource allocation.
Saturday, August 10, 2013
Game Walkthroughs As A Metaphor for Web Preservation
Some things, such as computer games, can't really be preserved digitally, even though it would be possible to create emulators. So for some games, the best way to experience them is through walkthroughs on YouTube.
"I think game walkthroughs can provide us with an interesting metaphor for web archiving, not simply walkthroughs of web instead of game sessions (though that is possible), but in the sense of capturing a series of snapshots of dynamic services and archiving them. Given "enough" snapshots, we might be able to reconstruct the output of a black box"
Tuesday, April 03, 2012
Dream of perpetual access comes true!
The KEEP project has released the final version of its open source Emulation Framework software. The project has taken emulation in the digital preservation context to the next level by making it user-friendly. The easy-to-install package runs on all major computer platforms and automates several steps:
- identify what kind of digital file you want to render;
- find the required software and computer platform you need;
- match the requirements with available software and emulators;
- install the emulator;
- configure the emulator and prepare software environment;
- inject the digital file you selected into the emulated environment;
- give you control over the emulated environment.
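The first three steps (identify, find requirements, match) amount to lookups against registries. This is a minimal sketch assuming small, invented registries keyed by file extension; the real Emulation Framework relied on its own identification and registry components, which are not reproduced here, and a real tool would identify formats by signature rather than by extension.

```python
from pathlib import Path
from typing import Optional

# Hypothetical registry: file extension -> format identifier.
FORMAT_BY_EXTENSION = {
    ".wpd": "wordperfect-doc",
    ".adf": "amiga-disk-image",
}
# Hypothetical requirements: format -> (hardware platform, software environment).
REQUIREMENTS = {
    "wordperfect-doc": ("x86-pc", "DOS + WordPerfect"),
    "amiga-disk-image": ("amiga-500", "Workbench"),
}
# Hypothetical emulators installed locally, keyed by hardware platform.
EMULATORS = {"x86-pc": "dos-emulator"}


def plan_rendering(filename: str) -> Optional[dict]:
    """Walk the identify -> requirements -> match steps; None if any step fails."""
    fmt = FORMAT_BY_EXTENSION.get(Path(filename).suffix.lower())  # identify
    if fmt is None:
        return None
    platform, environment = REQUIREMENTS[fmt]  # find required platform and software
    emulator = EMULATORS.get(platform)  # match against available emulators
    if emulator is None:
        return None
    # Installing, configuring, injecting the file and handing over control
    # would act on this plan; here the plan is only returned.
    return {"file": filename, "format": fmt, "emulator": emulator, "environment": environment}


print(plan_rendering("letter.WPD"))
print(plan_rendering("game.adf"))  # None: no Amiga emulator in this registry
```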
Tuesday, March 16, 2010
Digital Preservation Matters - March 16 2010
This article looks at an author's archival material, including digital material, that is on display at Emory University. It highlights what research libraries and archives are discovering: “born-digital” materials are much more complicated and costly to preserve than anticipated. The “archivists are finding themselves trying to fend off digital extinction at the same time that they are puzzling through questions about what to save, how to save it and how to make that material accessible.” Computers have been in use for over two decades, but the digital materials created on them are only now finding their way into archives. The curator said “We don’t really have any methodology as of yet to process born-digital material. We just store the disks in our climate-controlled stacks, and we’re hoping for some kind of universal Harvard guidelines.” The challenges include cataloging the material and acquiring the equipment and expertise to access data stored on obsolete media. Should archivists try to save the look and feel of the material, or just the content? Because the author edited on a computer, there are no manuscript pages with “lots of crossings-out and scribbling”. The display provides “emulation to a born-digital archive” by reproducing the author’s work environment. Emory is providing $500,00 to build a computer forensics lab to do this kind of work. Others are impressed with the emulation, but their focus is on storing and preserving digital content. One center is trying to raise money to hire a digital collections coordinator; until then, its digital materials are unavailable to researchers.
---
More on using DROID for Appraisal. Chris Prom. Practical E-Records. March 10, 2010.
The information that DROID supplies is useful, but the output is not optimally organized for reuse. Regularizing the DROID CSV output made the information sortable and more useful. DROID was also useful for identifying files that did not use the standard file extension for their application, and for finding files that needed attention or conversion. It was very useful in the appraisal process as well: it helped identify the major migration problems and weed out inappropriate, duplicate, or private content.
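A sketch of that kind of regularization: load the CSV, trim values, then sort and filter. It assumes the column names of a DROID 6-style export (`FILE_PATH`, `EXTENSION_MISMATCH`, `HASH`, `FORMAT_NAME`); older DROID versions, including the one used in the post, produced different layouts, so the names may need adjusting. `HASH` is only populated when hashing was enabled for the profile.

```python
import csv
import io
from collections import defaultdict


def load_droid_rows(csv_text: str) -> list:
    """Parse a DROID CSV export into dicts with trimmed string values.

    Extra trailing columns (e.g. additional format matches) are kept as a
    list under "EXTRA_MATCHES" instead of breaking the parse.
    """
    reader = csv.DictReader(io.StringIO(csv_text), restkey="EXTRA_MATCHES")
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()} for row in reader]


def cell(row: dict, name: str) -> str:
    """Return a column value as a string, treating missing cells as empty."""
    value = row.get(name)
    return value if isinstance(value, str) else ""


def extension_mismatches(rows: list) -> list:
    """Paths DROID flagged as not using the expected extension for their format."""
    return sorted(cell(r, "FILE_PATH") for r in rows if cell(r, "EXTENSION_MISMATCH").lower() == "true")


def duplicate_groups(rows: list) -> list:
    """Groups of paths that share the same (non-empty) hash: candidates for weeding."""
    by_hash = defaultdict(list)
    for r in rows:
        if cell(r, "HASH"):
            by_hash[cell(r, "HASH")].append(cell(r, "FILE_PATH"))
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def sorted_by_format(rows: list) -> list:
    """Sort rows by format name, then path, so similar files sit together."""
    return sorted(rows, key=lambda r: (cell(r, "FORMAT_NAME"), cell(r, "FILE_PATH")))
```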
---
Data, data everywhere. Economist. February 25, 2010.
The world contains an unimaginably vast amount of digital information which is increasing rapidly. This makes it possible to do many things that previously could not be done but it is also creating a host of new problems. The proliferation of data is making them increasingly inaccessible. The way that information is managed touches all areas of life. The data-centered economy is still new and the implications are not yet understood.
---
Archon™: The Simple Archival Information System. Website. 15 February 2010.
Version 3 of this software has been released. The software is for archivists and manuscript curators. It publishes archival descriptive information and digital archival objects to a user-friendly website. Functionality includes:
- Create standards-compliant collection descriptions and full finding aids using web forms.
- Describe the series, subseries, files, items, etc. within each collection.
- Upload digital objects/electronic records or link archival descriptions to external URLs.
- Batch import data.
- Export MARC and EAD records.
---
Deluge of scientific data needs to be curated for long-term use. Carole L. Palmer. PhysOrg.com. February 24, 2010.
Data curation is the active and ongoing management of data through their lifecycle. It is an important part of research, and data are a valuable asset to institutions and to the scientific enterprise. Saving the publications that report research results isn't enough; researchers also need access to the data. Data curation begins long before the data are generated; it needs to start at the proposal stage. Without the data, it is difficult to replicate and validate a research project's conclusions. "Digital content, including digital data, is much more vulnerable than the print or analog formats we had before." Selecting, appraising and organizing data to make them accessible and interpretable takes a lot of work and expense. "The bottom line is that many very talented scientists are spending a lot of time and effort managing data. Our aim is to get scientists back to doing science, where their expertise can make a real difference to society."
---
Is copyright getting in the way of us preserving our history? Victor Keegan. The Guardian. 25 February 2010.
In theory, future historians will have a lot of information about our age. In reality, much of it may be lost. Much of the information is on web pages, and they have a short life expectancy. The British Library has launched the UK Web Archive, which will guarantee longevity to thousands of hand-picked UK websites. But this is only a small part. “The issue of copyright is a global nightmare for anyone interested in digital preservation.”
---
"Zubulake Revisited: Six Years Later": Judge Shira Scheindlin Issues her Latest e-Discovery Opinion. Electronic Discovery Law. January 27, 2010.
This is a review of a case that addresses parties’ preservation obligations. The case revisits an earlier decision concerning e-discovery (finding electronic documents, email, and other records for court cases), preservation obligations, and negligence in failing to keep records correctly. Some statements from the court opinion:
- By now, it should be abundantly clear that the duty to preserve means what it says and that a failure to preserve records, paper or electronic, and to search in the right places for those records, will inevitably result in the spoliation of evidence.
- While litigants are not required to execute document productions with absolute precision, at a minimum they must act diligently and search thoroughly at the time they reasonably anticipate litigation.
- The following failures support a finding of gross negligence, when the duty to preserve has attached: to issue a written litigation hold; to identify all of the key players and to ensure that their electronic and paper records are preserved; to cease the deletion of email or to preserve the records of former employees that are in a party's possession, custody, or control; and to preserve backup tapes when they are the sole source of relevant information or when they relate to key players, if the relevant information maintained by those players is not obtainable from readily accessible sources.
- The case law makes crystal clear that the breach of the duty to preserve, and the resulting spoliation of evidence, may result in the imposition of sanctions by a court because the court has the obligation to ensure that the judicial process is not abused.
Tuesday, February 09, 2010
Digital Preservation Matters - February 8, 2010
Online Recordkeeping: It's All in a Name. Mimi Dionne. Internet Evolution. February 2, 2010.
The born-digital record lifecycle has five stages, in chronological order: creation; distribution and use; storage and maintenance; retention; and disposition or archival preservation. All five stages are important. One of the best practices for born-digital records is uniform file naming protocols, including location, to encourage strong content management. These should align with the records retention policies. Organizations are better off if they select the information they need to retain and destroy what they don’t need. “The benefits of implementing a records program that includes regular records destruction have far-reaching influence not only on compliance issues and maintenance of a company’s IT environment but also the health of its budget.”
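One way to make a naming protocol enforceable is to check names against a pattern and derive retention information from them. The sketch below assumes an invented convention (`DEPT_YYYY-MM-DD_RecordType_Description.ext`) and an invented retention schedule; neither comes from the article, which does not prescribe any particular format.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical naming convention: DEPT_YYYY-MM-DD_RecordType_Description.ext
NAME_PATTERN = re.compile(
    r"^(?P<dept>[A-Z]{2,5})_(?P<created>\d{4}-\d{2}-\d{2})_"
    r"(?P<record_type>[A-Za-z]+)_(?P<description>[A-Za-z0-9-]+)\.(?P<ext>[a-z0-9]+)$"
)

# Hypothetical retention schedule: record type -> years to keep before disposition.
RETENTION_YEARS = {"Invoice": 7, "Minutes": 10, "Draft": 1}


def parse_name(filename: str) -> Optional[dict]:
    """Return the named parts of a conforming filename, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def disposition_date(filename: str) -> Optional[date]:
    """Date on which the record becomes eligible for disposition, if it can be derived."""
    parts = parse_name(filename)
    if parts is None or parts["record_type"] not in RETENTION_YEARS:
        return None
    try:
        created = date.fromisoformat(parts["created"])
    except ValueError:
        return None  # looks like a date but is not a valid calendar date
    target_year = created.year + RETENTION_YEARS[parts["record_type"]]
    try:
        return created.replace(year=target_year)
    except ValueError:
        # A 29 February creation date falls back to 28 February in non-leap years.
        return created.replace(year=target_year, day=28)
```

Names that fail `parse_name` are exactly the files a records manager would want to review before they drift outside the retention program.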
---
SPIE to Preserve E-Books in Portico. Press Release. Portico. 2 February 2010.
Portico has agreed with SPIE (the international society for optics and photonics) to preserve SPIE's collection of e-books, currently 93 items. SPIE already works with Portico to preserve its e-journals. Portico now holds over 34,000 e-books and over 10,000 e-journals. SPIE has also announced the launch of its digital library, which includes 120 SPIE Press titles from the Field Guides, Monographs, and Tutorial Texts series.
---
Long-Term Preservation Of Web Archives – Experimenting With Emulation And Migration Methodologies. Andrew Stawowczyk Long. IIPC. December 2009. [54 p. PDF]
Decisions to emulate or migrate are largely based on personal beliefs rather than on any particular evidence, and we do not know which of the two is more useful in the long term. All objects change over time, so ensuring long-term, useful access to collections requires that we first define the most important aspects of an object that need to be preserved. A “Preservation Intent”, a statement of what the institution intends to preserve for a given digital object and for how long, may be useful for this. Also needed are the creator’s intent, the contextual information, and the technical information.
Two possible approaches for institutions may be:
- preserve digital objects over the next twenty years;
- find means of preserving objects for longer.
Or an approach may include both: preserve items for 20 years while the search for longer preservation mechanisms continues. “Significant properties” means the properties of a digital object that are essential to the representation of the intended meaning of that object.
The author does not currently recommend either emulation or migration as a perfect solution to the problem. Other findings and recommendations include:
- There are no tools suitable for long-term preservation of very large web archives
- All preservation actions need to be based on a clearly defined “Preservation Intent”
- Migration and emulation offer some time extension for short-term access to digital objects.
- Emulation seems to present higher risks as a long-term preservation methodology.
It is not possible to preserve everything, so priorities need to be established for practical, long-term preservation solutions. The best hope for adequate long-term preservation lies in continuous and systematic work, researching various preservation methodologies, and improving our understanding of the future use of web archives.
---
Is NAND flash about to hit a dead end? Lucas Mearian. Computerworld. February 4, 2010.
IM Flash Technologies has said that shrinking the technology much further may not be possible because of problems with bit errors and reliability. The number of electrons that can be stored in the memory cell decreases with each generation of flash memory, making it more difficult for the cells to reliably retain data.
---
CNRI Digital Object Repository™. Corporation for National Research Initiatives. 19 January 2010.
CNRI has developed a new version of its Digital Object Repository Software. It is open source, flexible, scalable, and secure, and it includes a suite that provides a common interface for accessing all types of digital objects. Redundancy is supported by a mirroring system with software that keeps replicated objects in sync.
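The announcement does not describe how the mirroring software keeps replicas in sync. As a generic illustration only (not CNRI's protocol), one common approach is to compare per-object checksums between a primary and a mirror and derive what must be copied or removed. The in-memory dictionaries below are stand-ins for real repositories.

```python
import hashlib


def checksums(store: dict) -> dict:
    """Map each object id to the SHA-256 digest of its content (bytes)."""
    return {oid: hashlib.sha256(data).hexdigest() for oid, data in store.items()}


def sync_plan(primary: dict, mirror: dict) -> dict:
    """Work out what the mirror needs so that it matches the primary repository."""
    primary_sums, mirror_sums = checksums(primary), checksums(mirror)
    return {
        # Objects missing from the mirror or whose content differs.
        "copy": sorted(
            oid for oid, digest in primary_sums.items() if mirror_sums.get(oid) != digest
        ),
        # Objects the mirror still holds but the primary no longer does.
        "delete": sorted(oid for oid in mirror_sums if oid not in primary_sums),
    }


def apply_plan(primary: dict, mirror: dict, plan: dict) -> None:
    """Bring the mirror in sync by copying and deleting objects per the plan."""
    for oid in plan["copy"]:
        mirror[oid] = primary[oid]
    for oid in plan["delete"]:
        del mirror[oid]
```

Comparing digests rather than timestamps also catches silent corruption on the mirror, since a damaged copy no longer matches the primary's checksum.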