Protecting the Long-Term Viability of Digital Composite Objects through Format Migration. Elizabeth Roke, Dorothy Waugh. iPres 2015 Poster. November, 2015.
The poster discusses work done at Emory University’s Manuscript, Archives, and Rare Book Library to "review policy on disk image file formats used to capture and store digital content in our Fedora repository". The goal was to migrate existing disk images to formats more suitable for long-term digital preservation. Trusted Repositories Audit & Certification (TRAC) requires that digital repositories monitor changes in technology so that they can respond to them. Advanced Forensic Format offered a good solution for capturing forensic disk images along with disk image metadata, but Libewf by Joachim Metz, a library of tools to access the Expert Witness Compression Format (EWF), has replaced it. They have decided to acquire raw disk images or, when that is not possible, to use tar files, because these formats may be less vulnerable to obsolescence.
In attempting to migrate formats, they had to develop methods for migrating the files and set up the repository to accept the new files. They also rely on PREMIS metadata. The migration of disk images from a proprietary or unsupported format to a raw file format has made it easier for them to manage and preserve these objects and mitigates the threat of obsolescence for the near term. There have been some consequences: some metadata is no longer available, the process will be more complicated and require other workflows, and files will no longer contain embedded metadata. "The migration to a raw file format has made the digital file itself easier to preserve."
This blog contains information related to digital preservation, long term access, digital archiving, digital curation, institutional repositories, and digital or electronic records management. These are my notes on what I have read or been working on. Please note: this does not reflect the views of my employer or anyone else.
Tuesday, June 28, 2016
Monday, June 27, 2016
A Digital Dark Now? : Digital Information Loss at Three Archives in Sweden
A Digital Dark Now? : Digital Information Loss at Three Archives in Sweden. Anna-Maria Underhill and Arrick Underhill. Master’s thesis. Lund University. 2016. [PDF]
The purpose of this study is to examine the loss of digital information at three Swedish archives. Digital preservation is a complex issue that most archival institutions struggle with. Focusing on successes to the exclusion of failures runs the risk of creating a blind spot for existing problems. The definition of digital information in this study includes digital objects and their metadata. The study includes digital internal work documents that serve as a contextual support for an archive’s collections; results are analyzed from the transition between the Records Lifecycle Model and the Records Continuum Model, an ontological understanding of digital information, the SPOT model for risk assessment and the OAIS Reference Model.
Some of the conclusions re-affirm previous research, such as the need to prioritize organizational issues. Others look at the current state of digital preservation at these archives which includes the delicate balancing act "between setting up systems for successful future digital preservation while managing existing digital collections which may not have been preserved correctly". Some institutions are unable to undertake a more proactive form of digital preservation because of the nature of the materials they preserve. The study points out that "when discussing digital preservation, the tendency remains to think of digitized material first rather than born digital information". The loss of a file may be only a part of the loss; there is also a loss of metadata and the connections between information, which may be more common than the loss of entire digital objects. "Finally, one question has followed this study from the beginning to the end: How can you know that you have lost something you never knew existed".
- When discussing digital preservation, it is important to clarify that storage is not the same thing as preservation.
- The survival of information is dependent upon the maintenance of its infrastructure and migrating it to contemporary formats.
- Authenticity can be a major issue for digital records and is important to their evidentiality.
- Emulation is another option for digital preservation, which targets the operating environment of the information rather than the file.
- Emulation will eventually require migration. Emulation can become too complicated to be viable in the long run.
- Sometimes digital preservation fails to preserve what it intends to save, which can be termed information loss.
- Obsolescence is currently one of the greatest threats to successful digital preservation. If a file cannot be read, then it is nearly the same thing as a document having been destroyed.
- "Without the provenance and the contextual links between records, records cannot be demonstrated to be authentic and reliable, evidentiality is lost and the use of the records for knowledge and understanding about what has happened will be difficult."
One definition of short, medium, and long-term preservation is:
- Short-term preservation – solutions that are used for a short time, 5 years maximum.
- Medium-term preservation – solutions that are used during a system’s lifetime, 10 years maximum.
- Long-term preservation – solutions that are used after the originating system’s lifetime, the number of years varies, usually from 10 to 50 years.
Six essential properties that must be preserved:
- Availability
- Identity
- Persistence
- Renderability
- Understandability
- Authenticity
Types of digital information loss:
- Loss of parts or whole digital objects during migration
- Loss of the connections between analog and digital information belonging to the same archive
- Loss of information due to it having been saved in an incorrect format
- Loss of data in connection with technological changes
- Loss of digital information when stored together with analog
- Loss of information due to obsolete hardware
- Loss of metadata due to databases written in code that is not open source
Causes of the loss:
- Human error during the production of information
- An analog understanding and treatment of digital information
- A lack of organizational structure and strategies for digital preservation
- Lack of resources
- Technological limitations
- Lack of competencies among staff who produce digital information
Friday, June 24, 2016
File-format analysis tools for archivists
File-format analysis tools for archivists. Gary McGath. LWN. May 26, 2016.
Preserving files for the long term is more difficult than just copying them to a drive; other issues are involved. "Will the software of the future be able to read the files of today without losing information? If it can, will people be able to tell what those files contain and where they came from?"
Digital data is more problematic than analog materials, since file formats change. Detailed tools can check the quality of digital documents, analyze the files and report problems. Some concerns:
- Exact format identification: Knowing the MIME type isn't enough.
- Format durability: Software can fade into obsolescence if there isn't enough interest to keep it updated.
- Strict validation: Archiving accepts files in order to give them to an audience that doesn't even exist yet. This means it should be conservative in what it accepts.
- Metadata extraction: A file with a lot of identifying metadata, such as XMP or Exif, is a better candidate for an archive than one with very little. An archive adds a lot of value if it makes rich, searchable metadata available.
Some of the tools:
- JHOVE (JSTOR-Harvard Object Validation Environment)
- DROID and PRONOM
- ExifTool
- FITS (File Information Tool Set)
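To make the "exact format identification" concern a little more concrete, here is a minimal sketch of signature-based identification. It is my own illustration and not how DROID or JHOVE are implemented: the `SIGNATURES` table and the `identify` and `identify_file` names are hypothetical, and the table only holds a handful of well-known magic numbers at offset 0.

```python
# Hypothetical, deliberately tiny signature table: (label, offset, magic bytes).
# The byte sequences are the widely published magic numbers for each format.
SIGNATURES = [
    ("PDF", 0, b"%PDF-"),
    ("PNG", 0, b"\x89PNG\r\n\x1a\n"),
    ("GIF", 0, b"GIF87a"),
    ("GIF", 0, b"GIF89a"),
    ("JPEG", 0, b"\xff\xd8\xff"),
    ("ZIP container", 0, b"PK\x03\x04"),
]


def identify(data: bytes) -> str:
    """Return the label of the first matching signature, or 'unknown'."""
    for label, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def identify_file(path: str, probe_size: int = 16) -> str:
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(probe_size))
```

Note that a match on the ZIP signature only says "some ZIP-based container": DOCX, EPUB and ODT files all start with the same bytes, so telling them apart means looking inside the container. That is one reason a coarse type label is not enough and why comparing the output of several tools is useful.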
"Specialists are passionate about the answers, and there often isn't one clearly correct answer. It's not surprising that different tools with different philosophies compete, and that the best approach can be to combine and compare their outputs"
Wednesday, June 22, 2016
Five Star File Format Signature Development
Five Star File Format Signature Development. Ross Spencer. Open Preservation Foundation blog. 14 Jun 2016.
Discussion about formats and the importance of developing identification techniques for text formats. DROID is a useful tool but it has its limitations. For those wanting to be involved in defining formats, there are five principles of file format signature development:
- Tell the community about your identification gaps
- Share sample files
- Develop basic signatures
- Where feasible, engage openly with the community
- Seek supporting evidence
Tuesday, June 21, 2016
Vienna Principles: A Vision for Scholarly Communication
Vienna Principles: A Vision for Scholarly Communication. Peter Kraker, et al. June 2016.
The twelve principles of Scholarly Communication are:
- Accessibility: be immediately and openly accessible by anyone
- Discoverability: should facilitate search, exploration and discovery.
- Reusability: should enable others to effectively build on top of each other’s work.
- Reproducibility: should provide reproducible research results.
- Transparency: should provide open means for judging the credibility of a research result.
- Understandability: should provide research in an understandable way adjusted to different stakeholders.
- Collaboration: should foster collaboration and participation between researchers and their stakeholders.
- Quality Assurance: should provide transparent and competent review.
- Evaluation: should support fair evaluation.
- Validated Progress: should promote both the production of new knowledge and the validation of existing knowledge.
- Innovation: should embrace the possibilities of new technology.
- Public Good: should expand the knowledge commons.
Monday, June 20, 2016
Preserving Transactional Data
Preserving Transactional Data. Sara Day Thomson. DPC Technology Watch Report 16-02. May 2016.
This report examines the requirements for preserving transactional data and the challenges in re-using these data for analysis or research. "Transactional" is used to refer to "data that result from single, logical interactions with a database and the ACID properties (Atomicity, Consistency, Isolation, Durability) that support reliable records of interactions."
Transactional data, created through interactions with a database, can come from many sources and different types of information. "Preserving transactional data, whether large or not, is imperative for the future usability of big data, which is often comprised of many sources of transactional data." Such data have potential for future developments in consumer analytics and in academic research and "will only lead to new discoveries and insights if they are effectively curated and preserved to ensure appropriate reproducibility."
Organizations that collect transactional data aim to manage and preserve the collected data for business purposes as part of their records management. There are strategies for database preservation as well as tools and standards that can look at data re-use. The strategies for managing and preserving big transactional data must adapt to both SQL and NoSQL environments. Some significant challenges include the large amounts of data, rapidly changing data, and different sources of data creation.
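As a small illustration of a "single, logical interaction" and of the atomicity part of ACID, here is a sketch using Python's built-in `sqlite3` module. The `accounts` table, the `make_demo_db`, `record_transfer` and `balances` names, and the sample values are all hypothetical and are not taken from the report.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
    )
    conn.commit()
    return conn


def record_transfer(conn, source, target, amount) -> bool:
    """Apply one logical interaction (two updates) as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back too, so no half-finished record is left behind.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn) -> dict:
    """Return the current balances as {name: balance}."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

With the sample data, a transfer of 30 from alice to bob succeeds, while a later transfer of 500 fails and leaves both balances exactly as they were. It is this all-or-nothing behavior that lets each stored interaction serve as a reliable record.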
Some notes:
- understanding the context and how the data were created may be critical in preserving the meaning behind the data
- data purpose: preservation planning is critical in order to make preservation actions fit for purpose while keeping preservation cost and complexity to a minimum
- how data are collected or created can have an impact on long-term preservation, particularly when database systems have multiple entry points, leading to inconsistency and variable data quality.
- Current technical approaches to preserving transactional data primarily focus on the preservation of databases.
- Database preservation may not capture the complexities and rapid changes enabled by new technologies and processing methods
- As with all preservation planning, the relevance of a specific approach depends on the organization’s objectives.
Database preservation strategies:
- Encapsulation
- Emulation
- Migration/Normalization
Tools and standards:
- Archival Data Description Markup Language (ADDML)
- Standard Data Format for Preservation (SDFP)
- Software Independent Archiving of Relational Databases (SIARD)
Best practices:
- choose the best possible format, either preserving the database in its original format or migrating to an alternative format.
- after a database is converted, encapsulate it by adding descriptive, technical, and other relevant documentation to understand the preserved data.
- submit database to a preservation environment that will curate it over time.
Friday, June 17, 2016
The Web’s Past is Not Evenly Distributed
The Web’s Past is Not Evenly Distributed. Ed Summers. Maryland Institute for Technology in the Humanities. May 27, 2016.
This post discusses ways to structure the content "with the grain of the Web so that it can last (a bit) longer." The web was created without a central authority to make sure all the links work, and permission is not needed to link to a site. The result is a web where about 5% of links break per year, according to one site.
"The Web dwells in a never-ending present. It is—elementally—ethereal, ephemeral, unstable, and unreliable. Sometimes when you try to visit a Web page what you see is an error message: Page Not Found. This is known as link rot, and it’s a drag, but it’s better than the alternative." (Jill Lepore) If we didn’t have a partially broken Web, where content constantly changes and links break, it’s quite possible we wouldn’t have a Web at all. Some things to take note of:
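To get a feel for what a 5% annual breakage figure implies, here is a rough sketch. It assumes, purely for illustration, that the rate is constant and applies independently to the links still working each year; the post does not claim this, and the function names are my own.

```python
import math


def surviving_fraction(annual_break_rate: float, years: float) -> float:
    """Fraction of links still working after the given number of years."""
    return (1.0 - annual_break_rate) ** years


def half_life_years(annual_break_rate: float) -> float:
    """Years until half of the original links are broken."""
    return math.log(0.5) / math.log(1.0 - annual_break_rate)
```

Under that assumption, `surviving_fraction(0.05, 10)` is about 0.60, so roughly 40% of links would be broken after ten years, and `half_life_years(0.05)` is about 13.5 years.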
- problems with naming things
- redirects
- proxies
- web archives
- static sites
- data export
"Our knowledge of the past has always been mediated by the collective care of those who care to preserve it, and the Web is no different."
Thursday, June 16, 2016
Current Game Preservation is Not Enough
Current Game Preservation is Not Enough. Eric Kaltman. Eric Kaltman's blog. 6 June, 2016.
The current preservation practices we use for games and software must be reconsidered for modern computer games. The standard preservation model considers three major areas of interest:
- the physical extent of the game,
- the data stored on it, and
- the hardware necessary to run it.
- Consider what we are trying to save when we preserve video games. Is it to save the ability to play a historical game at some point in the future, or to record the act of play itself?
- Get the people creating games to dedicate time to basic preservation activities, such as providing records of development, production processes, and legacy source code that would help to recreate or recover the games.
- There needs to be more pressure and motivation from society to legitimate games as cultural production worth saving, and to create institutional structures to fight for preservation activity, similar to what is being done for film.
- This all applies not just to games but also to software in general, which may be in an even worse situation.
- Knowledge Cafe - Games in Sound and Vision (with Eric Kaltman)
- The Troubles with Game History: Objects and Game Play
Wednesday, June 15, 2016
Keep Calm and do Practical Records Preservation
Keep Calm and do Practical Records Preservation. Matthew Addis. Conference on European Electronic Data Management and eHealth Topics. 23 May 2016.
The presentation looks at some of the practical tools and approaches that can be used to ensure that digital content remains safe, secure, accessible and readable over multiple decades. It covers mostly "practical and simple steps towards doing digital preservation for electronic content" but also some ways to determine how well prepared you are for preservation. Some things you need to show:
- ongoing integrity and authenticity of content in an auditable way.
- that content is secured and access is controlled.
- ability to access content when needed that is readable and understandable.
- ability to do this over decades, which is a very long time in the IT world
- have an archivist with clear responsibility for making all this happen
- have appropriate processes that manage all the risks proportionally.
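One simple way to work toward the first point, showing ongoing integrity in an auditable way, is to record checksums and re-verify them on a schedule. The sketch below is my own illustration and not something taken from the presentation; the `sha256_of`, `build_manifest` and `audit` names are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file under root (as a relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict) -> dict:
    """Return {path: 'missing' or 'changed'} for files that fail the check."""
    problems = {}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif sha256_of(target) != expected:
            problems[rel] = "changed"
    return problems
```

The manifest would be stored separately from the content, and each `audit` run logged with its date and result, so that there is a trail showing the files have not changed. An empty result means every recorded file is present and unchanged.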
"Focus on the basic steps that need to be done now in order to support something bigger and better in the future." Know what you have and get the precious stuff in a safe storage environment.
Tuesday, June 14, 2016
Digital Preservation: We have to get this right
"We have to get this right." Jennifer Paustenbaugh. Digital Preservation. Harold B. Lee Library, Brigham Young University. June, 2016.
Here are some recent email comments from Jennifer Paustenbaugh, our University Librarian, on digital preservation:
- “We have to get this right. If we don't, then not much else that we’re doing in research libraries matters. If we don’t fully develop a sustainable digital preservation program, we could negatively impact whole areas of research, because materials created right now could just disappear. I think about gaps that exist in records because of man-made events and natural disasters. This could be a disaster of our own making.”
- "I truly believe that of all the things we’re doing in the library, this is the thing that has the potential to make the biggest difference to scholars 20 or 50 years from now. Much of the digital content that we are preserving will be gone forever if we don’t do this right. It’s a role that at once is formidable and humbling. And for most people, it will probably never be important until something that is vital to their research is just missing (and forever unavailable) from the historical record."
Monday, June 13, 2016
Macro & Micro Digital Preservation Services & Tools
Rosetta Users Group 2016: Macro & Micro Digital Preservation Services & Tools. Chris Erickson. June 7, 2016. [PDF slides]
This is my presentation at the Rosetta User Group / Advisory Group meeting held this past week in New York (I always enjoy these meetings; probably my favorite conference).
- Preservation Micro Services: free-standing applications that perform a single or limited number of tasks in the larger preservation process. The presentation includes some of those we use in our processes, both from internet sites and those that we have created in-house. Micro services are often used in the following processes:
- Capture
- Appraisal
- Processing
- Description
- Preservation
- Access
- Preservation Macro Services: Institutional services and directions that assist organizations in identifying and implementing a combination of policies, strategies, and tactics to effectively meet their preservation needs. Some of these are:
- Digital Preservation Policy Framework
- Workflows
- Storage plans
- Financial Commitment
- Engaging the Community
“We have to get this right. If we don't, then not much else that we’re doing in research libraries matters. If we don’t fully develop a sustainable digital preservation program, we could negatively impact whole areas of research, because materials created right now could just disappear. I think about gaps that exist in records because of man-made events and natural disasters. This could be a disaster of our own making.” Jennifer Paustenbaugh. University Librarian.
Since I started in this position in 2002, our digital preservation challenges have changed and increased. Re-evaluating where we are heading and how we proceed is important. A combination of broad visions and practical applications can ensure the future use of digital assets.
Thursday, June 02, 2016
The Three C’s of Digital Preservation: Contact, Context, Collaboration
The Three C’s of Digital Preservation: Contact, Context, Collaboration. Brittany. DigHist Blog. May 5, 2016.
The post looks at three themes from learning about digital preservation: "every contact leaves a trace, context is crucial, and collaboration is the key".
Contact: A digital object is more than we see, and we need to take into consideration the hardware, software, code, and everything that runs underneath it. There are "layers and layers of platforms on top of platforms for any given digital object", the software, the browser, the operating system and others. These layers or platforms are constantly obsolescing or changing and "cannot be relied upon to preserve the digital objects. Especially since most platforms are proprietary and able to disappear in an instant."
Context is Crucial: "There’s no use in saving everything about a digital object if we don’t have any context to go with it." Capture the human experience with the digital objects.
Collaboration is the Key: "There are a number of roles played by different people in digital preservation, and these roles are conflating and overlapping." As funding becomes tighter and the digital world more complex, "collaboration is going to become essential for a lot of digital preservation projects".
There are still many unanswered questions that need to be asked and answered.