
Wednesday, September 13, 2017

Self-preservation: The Gibraltar National Archives uses cloud to safeguard its history

Self-preservation: The Gibraltar National Archives uses cloud to safeguard its history. Caroline Donnelly. ComputerWeekly. 13 September 2017.
     Many enterprises are familiar with the concept of retaining corporate data as part of their regulatory and compliance obligations, but some fail to understand that the data must also be kept accessible. "While regulatory compliance is the key reason why many enterprises embark on this process in the corporate world, for the Gibraltar National Archives (GNA), digital preservation is an essential part of ensuring the annals of its cultural heritage and democratic history are safeguarded forever." After a long process of digitising historical content, they realised that digitising content is not the same as preserving it. "The risk was we could have spent all this time and money doing digitisation only to lose [this information] a few years down the line because it is not preserved correctly." Digital preservation is about:
  • actively managing the file formats
  • ensuring they remain readable in future
  • being proactive and managing the content
Just as it is important to be able to prove the provenance of physical records, the fixity of the digital documents needs to be maintained. “People often ask me when our digital preservation project will be finished. I tell them never, because every day we are collecting records. Every day we are archiving unique material from newspapers to government records all for generations to come.”


Saturday, April 02, 2016

Protecting digital material: Strategies for digital preservation

Protecting digital material: Strategies for digital preservation. Matthew Miguez. Illuminations. March 25, 2016.
     An important question for digital preservation is: "How can you tell when a computer file has been corrupted?" One way to tell is with checksums, "character strings generated by a class of algorithms called hash functions or cryptographic hashes". At the beginning of the digital preservation process, you should create a checksum for each file or bitstream. By rerunning the hash and comparing its output to the original checksum, you can test the integrity of your digital files. If a checksum comparison fails, you need to restore a copy of the failed file and verify it against the original checksum. "Digital preservation is a rapidly developing field. New challenges requiring new solutions arise every day."
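A minimal sketch of that generate-and-compare cycle, using Python's standard hashlib (the filename and the choice of SHA-256 are illustrative, not from the article):

    import hashlib

    def file_checksum(path, algorithm="sha256", chunk_size=65536):
        """Compute a checksum by hashing the file in chunks."""
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # At the start of the preservation process: record the checksum.
    recorded = file_checksum("scan_0001.tiff")  # hypothetical file

    # Any time later: rerun the hash and compare against the record.
    if file_checksum("scan_0001.tiff") != recorded:
        print("Checksum mismatch: restore the file from a verified copy.")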


Monday, November 16, 2015

Fixity: Architecting for Integrity

Fixity: Architecting for Integrity. Scott Rife, Library of Congress, presentation. Designing Storage Architectures for Digital Collections 2015. September 2015. [PDF]
     The Problem: “This is an Archive. We can’t afford to lose anything!” They are custodians of the history of the United States and do not want to consider that the loss of data is likely to happen. The current solutions:
  • At least 2 copies of everything digital
  • Test and monitor for failures or errors
  • Refresh the damaged copy from the good copy (see the sketch after this list)
  • This process must be as automated as possible
  • Recognize that someday data loss will occur
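A minimal sketch of that verify-and-refresh loop, with hypothetical paths and names (nothing here comes from the presentation; it only illustrates the process the list describes):

    import hashlib
    import shutil

    def sha256(path):
        """Hash a file in chunks to avoid loading it all into memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def audit_and_repair(copies, expected):
        """Test every copy against the recorded checksum, then refresh
        any damaged copy from one that still verifies."""
        good = [p for p in copies if sha256(p) == expected]
        if not good:
            raise RuntimeError("All copies failed verification: data loss.")
        for p in copies:
            if p not in good:
                shutil.copy2(good[0], p)  # refresh damaged from good copy

    # Hypothetical replicas of one archived object:
    # audit_and_repair(["/archive/a/item.tif", "/archive/b/item.tif"],
    #                  expected=checksum_recorded_at_ingest)
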
Fixity is the process of verifying that a digital object has not been altered or corrupted. It is a function of the whole architecture of Archive/Long Term Storage (hardware, software, network, processes, people, budget).
What costs are reasonable to reduce the loss of data?
Need to understand the possible solutions. How much more secure will our customers’ content be if:
  • There is a third, fourth or fifth copy?
  • All content is verified once a year versus every 5 years?
  • More money is spent on higher quality storage?
  • More staff are hired?
RAID and erasure coding are at risk due to larger disk sizes. With storage, there is wide variation in price, performance and reliability, and performance and reliability are not always correlated with price. Choose hardware combinations that limit likely failures based on your duty cycle.

Tuesday, October 13, 2015

Presentations from Library of Congress Storage Architectures Symposium 2015

Presentations from Library of Congress Storage Architectures Symposium 2015. Clifford Lynch. CNI. October 12, 2015. [PDF files]
     The presentations from the Library of Congress 2015 Symposium on Storage Architectures for Digital Collections are now available. The presentations during the symposium include:
  • Technology Overviews of Library of Congress and Industry Storage Architectures
  • Technical Presentations: Tape Futures, Object Storage, Fixity and Integrity
  • Community Presentations
  • Alternative Media Presentations: Digital Optical, DNA
  • Look Back/Future Predictions of Storage

Friday, October 02, 2015

Bit Preservation: How do I ensure that data remains unharmed and readable over time?

Bitbevaring – hvordan sikrer jeg, at data forbliver uskadte og læsbare over tid? Eld Zierau, Det Kongelige Bibliotek. Original November 2010; edited January 2015.
       Preservation of bits ensures that the values and order of digital bits are correct, undamaged and readable: the bits are the same as when they were received, and by managing them they will remain available in the future. If the bits are changed, in the best case the object will appear different, and in the worst case it will be unreadable in the future. Fixity can only ensure that the bits are unchanged; alongside bit preservation it is important to plan for logical preservation as well, so that the file can still be rendered.
Bit security is based upon assessing the risks to the objects and then protecting the objects from events that will change the bits. The more you protect the bit integrity of the files, the more confidence you have that the files are accurately preserved.

The traditional method of file security is to make multiple copies. Those copies must be checked regularly for errors, which then need to be corrected. All copies are equally important and must be checked. You must also make sure that the copies will not be affected by the same failure event; if that happens and the error is not discovered, you could lose all copies. This is part of the risk assessment process, and you should consider the following items in order to make sure at least one copy is intact:
  • Number of copies stored: The more copies stored, the more likely that at least one copy is intact
  • Frequency of checking copies: The more often copies are checked, the more likely that at least one copy is intact
  • Independence of copies: The more independently copies are stored (different hardware, organizational custody, or geographical location), the greater the chance that the copies won't be affected by the same problem (the toy calculation after this list illustrates this)
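A back-of-the-envelope illustration of the first and third points, with made-up numbers: if copies fail independently, the chance of losing every copy between checks shrinks exponentially with the number of copies.

    # Illustrative figures only; independence is the key assumption,
    # which is exactly what different hardware, custody and locations
    # are meant to achieve.
    p = 0.01  # chance a given copy is damaged between two checks
    for n in (1, 2, 3):
        print(f"{n} copies: P(all damaged) = {p**n:.0e}")
    # 1 copy: 1e-02, 2 copies: 1e-04, 3 copies: 1e-06.
    # Checking more often shrinks p itself, which is why frequency
    # matters as much as the number of copies.
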
Integrity check: Use a checksum to verify the integrity of the file, and store that information. The checksum acts like a fingerprint for determining which files have not changed.

Media migration: Storage media do not last forever, so the digital content must be migrated regularly. It is important that the different copies are not exposed to the migration process at the same time.

Other bit preservation considerations include understanding the costs, determining the desired level of object security, and handling the confidentiality of materials. The Royal Library, the National Archives and the National Library are working together to provide Bitmagasinet, a shared hosted service that stores data cooperatively, with copies on different media, in different locations and at different organizations.

Wednesday, September 30, 2015

Checking Your Digital Content: What is Fixity, and When Should I be Checking It?

Checking Your Digital Content: What is Fixity, and When Should I be Checking It? Paula De Stefano, et al. NDSA. October 2014.
     A fundamental goal of digital preservation is to verify that an object has not changed over time or during transfer processes. This is done by checking the “fixity”, or stability, of the digital content. The National Digital Stewardship Alliance provides this guide to help answer questions about fixity.

Fixity, the property of a digital file or object being fixed or unchanged, is synonymous with bit-level integrity and offers evidence that one set of bits is identical to another. PREMIS defines fixity as "information used to verify whether an object has been altered in an undocumented or unauthorized way." The most widely used tools for fixity are checksums (CRCs) and cryptographic hashes (the MD5 and SHA algorithms). Fixity is a tool, but by itself it is not sufficient to ensure long-term access to digital information. Fixity information must be put to use, through audits of the objects, replacement or repair processes, and other methods that show the object is or will be understandable. Long-term access means the ability to "make sense of and use the contents of the file in the future".

Fixity information helps answer three primary questions:
  1. Have you received the files you expected?
  2. Is the data corrupted or altered from what you expected?
  3. Can you prove the data/files are what you intended and are not corrupt or altered? 
Fixity has other uses and benefits as well, which include:
  • Support the repair of corrupt or altered files by knowing which copy is correct 
  • Monitor hardware degradation: Fixity checks that fail at high rates may be an indication of media failure.
  • Provide confidence to others that the file or object is unchanged
  • Meet best practices such as ISO 16363/TRAC and NDSA Levels of Digital Preservation
  • Support the monitoring of processes that move content, verifying integrity along the way
  • Document provenance and history by maintaining and logging fixity information (see the sketch after this list)
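A minimal sketch of that last point, recording each fixity check as a logged event; the field names are loosely modeled on PREMIS event metadata and are assumptions, not taken from the report:

    import json
    from datetime import datetime, timezone

    def log_fixity_event(logfile, object_id, algorithm, expected, actual):
        """Append one fixity-check event, building a provenance trail."""
        event = {
            "object": object_id,
            "eventType": "fixity check",
            "dateTime": datetime.now(timezone.utc).isoformat(),
            "algorithm": algorithm,
            "expected": expected,
            "actual": actual,
            "outcome": "pass" if expected == actual else "fail",
        }
        with open(logfile, "a") as f:
            f.write(json.dumps(event) + "\n")
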
Workflows for checking the fixity of digital content include:
  • Generating/Checking Fixity Information on Ingest
  • Checking Fixity Information on Transfer
  • Checking Fixity at Regular Intervals
  • Building Fixity Checking into Storage Systems
Considerations for Fixity Check Frequency include:
  • Storage Media: Fixity checks increase media use, which could increase the rate of failure
  • Throughput: Your rate of fixity checking will depend on how fast you can run the checks (a back-of-the-envelope estimate follows this list)
  • Number and Size of Files or Objects: Resource requirements change as the scale of the objects increases
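To make the throughput point concrete, a quick estimate with invented figures (none of these numbers come from the guide):

    # Hypothetical collection and hardware:
    collection_bytes = 50 * 10**12   # 50 TB of content
    read_rate = 150 * 10**6          # 150 MB/s sustained read speed

    days = collection_bytes / read_rate / 86400
    print(f"One full fixity pass takes about {days:.1f} days")
    # ~3.9 days, so a monthly check cycle is realistic here; a 5 PB
    # collection at the same rate would need more than a year per
    # pass, forcing a longer interval or faster, parallel checking.
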
Fixity information may be stored in different ways, which will depend on your situation, such as:
  • In the object metadata records
  • In databases and logs
  • Alongside content, such as with BagIt (a minimal manifest sketch follows)
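BagIt keeps checksums in a manifest file stored next to the payload. A minimal sketch of writing one (the layout follows RFC 8493, but this is an illustration, not a full BagIt implementation):

    import hashlib
    from pathlib import Path

    def write_bagit_manifest(bag_dir):
        """Write manifest-sha256.txt with one '<checksum>  <path>' line
        per payload file under data/, so fixity travels with content."""
        bag = Path(bag_dir)
        lines = []
        for path in sorted((bag / "data").rglob("*")):
            if path.is_file():
                # read_bytes() is fine for small files; hash in chunks
                # for large ones.
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")
        (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")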