Showing posts with label best practices. Show all posts

Friday, October 14, 2016

Digital Preservation Program: Levels of Digital Preservation Support

Digital Preservation Program. South Dakota State Historical Society. 2015.
     A look at the South Dakota State Archives webpage concerning the levels of digital preservation.  They are committed to collecting, preserving, and providing access to their materials.

Levels of Digital Preservation Support:  The Archives has established three distinct levels of preservation support for digital archival materials that will be applied to digital materials at the time of accession. The levels are:
  • Full Support:  The Archives will take all reasonable actions to maintain usability including migration, emulation, or normalization and will ensure data fixity for all original and transformed files and will provide access to transformed files.
  • Limited Support:  The Archives will take limited steps to maintain usability and undertake strategic monitoring. They may actively transform a file from one format to another to mitigate format obsolescence, and will ensure data fixity for all original and transformed files and will provide access to transformed files.
  • Basic Support: The Archives will provide access to the item in its submission file format only and will work to ensure data fixity of the submitted file. No transformations will be enacted on these files for preservation purposes.
The Archives has also created a chart that outlines the preservation tasks associated with each level of preservation support. The tasks are:
  • Create preservation metadata for accessibility, provenance, and management
  • Perform fixity checks on a regular basis using proven checksum methods
  • Periodically refresh storage media
  • Provide for discovery of objects via online descriptive finding aid  
  • Undertake strategic monitoring of file format
  • Plan and perform file normalization if necessary
  • Plan and perform migration to succeeding format upon obsolescence
  • Offer long-term storage in a trusted preservation-worthy format

Tuesday, September 27, 2016

Digital Preservation File Names

Digital Preservation File Names. Chris Erickson. September 27, 2016. Updated 31 Oct. 2016.
     While processing some collections, we had difficulty creating the METS XML files because of some characters in the file names. The characters may be valid in some systems, but may cause difficulties in others. From comments on the internet it appears that only a few characters are actually forbidden, but the experience of a number of people suggests that some systems may not support all characters in file names. We decided that it was better to use only alphanumeric characters, with underscores as a separator and a full stop (period) before the extension. When preserving digital files it is important to remember that the files may be used by a variety of computer systems over their lifetime. To have the greatest chance of keeping the files usable in the future, it is best to follow some basic standards when naming files.

Here are some suggestions we are considering:
  1. Decide on file naming conventions so that file names have meaning.
  2. Use file extensions, which help identify the type of file (such as .txt, .doc, .wav, .jpg).
  3. Maximum file name length varies between operating systems, so keep file names to a reasonable length, generally under 30 characters.
  4. Avoid spaces in file names. Spaces are an acceptable character for most file names, but they can cause difficulty when processing. Underscores may be used as a separator.
  5. Avoid punctuation and special characters. The safest characters to use are numbers and letters. Operating systems differ in whether they treat file names as case sensitive. Some characters to avoid for our preservation system are spaces, ampersands, brackets, and commas.
  6. Don’t start or end the filename with a space, special character, or punctuation mark.
  7. These conventions apply to folders as well as files.
Characters that others have had difficulties with and which should not be used in filenames:

# pound            < left angle bracket     $ dollar sign           + plus sign
% percent          > right angle bracket    ! exclamation point     ` backtick
& ampersand        * asterisk               ' single quotes         | pipe
{ left bracket     ? question mark          " double quotes         = equal sign
} right bracket    / forward slash          : colon
\ back slash         blank spaces           @ at sign
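
The naming suggestions above can be checked mechanically. The following is a minimal Python sketch, not part of our preservation system: the function names, the 30-character limit as a hard rule, and the choice to replace runs of unsupported characters with a single underscore are all assumptions made for illustration.

```python
import re

# Assumed hard limit, taken from the "under 30 characters" suggestion above.
MAX_LENGTH = 30

# Letters, digits and underscores, then one full stop and an alphanumeric extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")


def is_safe_name(name: str) -> bool:
    """Return True if the name follows the conventions listed above."""
    return len(name) <= MAX_LENGTH and SAFE_NAME.fullmatch(name) is not None


def suggest_safe_name(name: str) -> str:
    """Propose a replacement name; a person should review it before renaming."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        # No full stop at all: treat the whole name as the stem.
        stem, ext = name, ""
    # Replace every run of unsupported characters with a single underscore.
    stem = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")
    ext = re.sub(r"[^A-Za-z0-9]", "", ext)
    # Trim the stem so that stem + "." + extension fits the length limit.
    # (An unusually long extension could still push the result over the limit.)
    room = max(1, MAX_LENGTH - (len(ext) + 1 if ext else 0))
    stem = stem[:room].rstrip("_") or "unnamed"
    return f"{stem}.{ext}" if ext else stem
```

With these assumptions, `suggest_safe_name("Smith & Jones (draft).doc")` gives `Smith_Jones_draft.doc`. Two different originals can map to the same suggestion, so the results should be checked for collisions before any files are renamed.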

It appears that XML in general has a specific problem with ampersands and brackets in file names.


Friday, October 02, 2015

Bit Preservation: How do I ensure that data remains unharmed and readable over time?

Bitbevaring – hvordan sikrer jeg, at data forbliver uskadte og læsbare over tid? Eld Zierau, Det Kongelige Bibliotek.  Original November 2010; edited January 2015.
     Preservation of bits ensures that the values and order of digital bits are correct, undamaged and readable. The bits are the same as when they were received, and by managing them they will be available in the future. If the bits are changed, in the best case the object will appear different, and in the worst case the object will be unreadable in the future. Fixity can only ensure that the bits are the same; along with bit preservation, it is important to plan for logical preservation as well, to make sure that the file can be rendered.
Bit security is based upon assessing the risks to the objects and then protecting the objects from events that will change the bits. The more you protect the bit integrity of the files, the more confidence you have that the files are accurately preserved.

The traditional method of file security is to make multiple copies. Those copies must be checked regularly for errors, which then need to be corrected. All copies are equally important and must be checked. You must also make sure that the copies will not be affected by the same failure event. If that happens and the error is not discovered, you could lose all copies. This is part of the risk assessment process, and you should consider the following items in order to make sure at least one copy is intact:
  • Number of copies stored: The more copies stored, the more likely that at least one copy is intact
  • Frequency of checking copies: The more often copies are checked, the more likely that at least one copy is intact
  • Independence of copies: The more independently the copies are stored, such as on different types of hardware, under different organizational custody, or in different geographical locations, the greater the chance that the copies won't be affected by the same problem
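
The effect of the number of copies can be illustrated with a small calculation. This is only a sketch under a strong simplifying assumption that is not from the article: each copy is lost between two checks with the same probability, and the losses are fully independent. The function name and parameters are hypothetical.

```python
def chance_at_least_one_intact(copies: int, loss_probability: float) -> float:
    """Chance that at least one copy survives until the next check.

    Assumes every copy fails independently with the same probability,
    which is exactly what independent storage tries to approximate.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    # All copies are lost only if every single one fails.
    return 1.0 - loss_probability ** copies
```

For example, with an assumed 10% chance of losing any one copy, three independent copies give a survival chance of about 0.999. If the copies share hardware, custody, or location, failures are correlated and the real figure will be lower than this estimate.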
Integrity Check: Use a checksum to verify the integrity of the file and store the information. This is like a fingerprint to determine which files have not changed.
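
As an illustration of such an integrity check, here is a minimal Python sketch using the standard `hashlib` module. The choice of SHA-256, the function names, and the idea of keeping the recorded values in a dictionary are assumptions for the example, not something the article prescribes.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Compute the checksum of one file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(folder: Path) -> dict:
    """Map each file's path (relative to the folder) to its checksum."""
    return {
        item.relative_to(folder).as_posix(): file_checksum(item)
        for item in sorted(folder.rglob("*"))
        if item.is_file()
    }


def changed_files(folder: Path, recorded: dict) -> list:
    """List recorded files that are now missing or whose checksum differs."""
    problems = []
    for name, expected in recorded.items():
        target = folder / name
        if not target.is_file() or file_checksum(target) != expected:
            problems.append(name)
    return problems
```

In practice the recorded checksums would be stored separately from the files themselves, and `changed_files` would be run against every copy on a regular schedule.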

Media migration: Storage media do not last forever, so the digital content must be migrated regularly. It is important that the different copies are not exposed to the migration process at the same time.

Other considerations of bit preservation include understanding the cost, determining the level of object security desired, and protecting the confidentiality of materials. The Royal Library, the National Archives and the National Library are working together to provide Bitmagasinet, a shared hosted service in which the institutions cooperate to store data, with copies on different media, in different locations and at different organizations.

Wednesday, August 05, 2015

The ABCs of Digital Preservation

The ABCs of Digital Preservation. Jenny Mullins. Preservation Services at Dartmouth College. July 21, 2015.
     The purpose of the presentation was to introduce some basic digital preservation concepts, such as choosing file formats, file naming best practices, and the basics of preservation metadata. Also discussed were tools and models for managing digital materials. The slides of the talk are available. “File extensions tell computers how to read, open and render files. If a file extension is wrong or missing, your computer will be confused. Digital Preservation is about not confusing your computer.”

Formats: Choose open, widely supported file formats:
  • Images: .tif, .png
  • Text: .pdf, .rtf, .txt
  • Audio: .wav
  • Video: .avi, MPEG-2 or 4
  • Spreadsheets: .csv, .ods
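
A list like this can be turned into a simple check at the time files are received. The sketch below is only an illustration: the dictionary and function names are hypothetical, and MPEG-2 and MPEG-4 are left out because the list names them as formats rather than as file extensions.

```python
from pathlib import PurePath
from typing import Optional

# Extensions copied from the list above; MPEG-2/4 are omitted because
# the list does not give an extension for them.
RECOMMENDED_EXTENSIONS = {
    "images": {".tif", ".png"},
    "text": {".pdf", ".rtf", ".txt"},
    "audio": {".wav"},
    "video": {".avi"},
    "spreadsheets": {".csv", ".ods"},
}


def format_category(filename: str) -> Optional[str]:
    """Return the category of a recommended format, or None if not listed."""
    suffix = PurePath(filename).suffix.lower()
    for category, extensions in RECOMMENDED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```

A file whose name returns `None` is not necessarily a problem; it only means the extension is not on this particular list and may deserve a closer look.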
File Naming: Provide unique descriptive names; be consistent. Only use letters, numbers, underscores, and periods. Include a version number or date.

Identify and Describe the content:
  • Provenance: How was it created, and by whom? How and why has it changed?
  • Context: How does this file relate to others? What is needed to understand it?
  • Descriptive: Who? What? Where? Why? When?
  • Technical: What are the file characteristics?
  • Rights: Who owns it? Who’s allowed to use it and for what purpose?
Some general resources...
There are many software tools and training materials available for large scale data preservation.

Friday, July 31, 2015

Advice for our donors and depositors

Advice for our donors and depositors. Jenny Mitcham. Digital Archiving at the University of York. 25 October 2013.
     One of the best ways to ensure the longevity of your digital data is to plan for it at the point of creation. If data is created with long term archiving in mind and if a few simple and common sense data management rules are followed, then the files will be much easier for the file creator and the digital archivist to work with and to manage in the future. It is important for those who deposit digital materials in the archives to put good data management into practice. We should speak to them about this and the sooner the better.

Some of the tips include:
  • Name files sensibly
  • Organize files within a directory structure
  • Document your files
  • Always back up your files
  • Use anti-virus software

Some of the current 'hot topics' in the digital archiving world include:
  • How do you archive e-mails?
  • Is cloud storage safe?
  • What is wrong with pdf files?
  • What is the life span of a memory stick?

Thursday, July 30, 2015

DPOE Interview with Danielle Spalenka of the Digital POWRR Project

DPOE Interview with Danielle Spalenka of the Digital POWRR Project. Susan Manus, Barrie Howard. The Signal. July 20, 2015.
     Article about an interview with Danielle Spalenka, Project Director for the Digital POWRR Project. They had a National Leadership Grant to investigate digital preservation at institutions with limited resources. They have prepared a workshop, a white paper and the Tool Grid. The workshop, free through the end of 2016 with funding from the NEH, looks at best practices and standards.

Our review of the landscape of digital preservation instruction was that it is largely aimed at an audience beginning to come to grips with the idea that digital objects are subject to loss if we don’t actively care for them. There are lots of offerings discussing the theory of digital preservation – the “why” of the problem – and we found that there were limited opportunities to learn the “how” of digital preservation, both on the advocacy and technical sides. We also found that other great offerings, like the Digital Preservation Management Workshop Series based at MIT, had a tuition fee that was unaffordable for many prospective attendees, especially from under-funded institutions. Our goal in this phase is to make the workshops free to attend.

"A major goal of the workshop is to discuss specific tools and provide a hands-on portion so that participants could try a tool that they could apply directly at their own institutions." It provides an  overview of how digital preservation services and tools actually relate to the standards, how to use them in a workflow, and how to advocate for implementation. The POWRR Tool Grid is now maintained by COPTR (Community Owned digital Preservation Tool Registry).

Some recommendations for those just starting out:
  • First consider what type of tool you might be interested in (processing, storage, etc.) Looking at the specific function of a tool might be a good place to understand the wide variety of tools better.
  • A number of tools and services offer free webinars and information sessions to learn more about a specific tool. Download the tools to gain some hands-on experience.
  • Remember that digital preservation is an incremental process, and there are small steps you can take now to start digital preservation activities at your own institution. 
  • Remember you are not alone! 
  • See what others are doing and talking about. 

Wednesday, May 06, 2015

Preparing the Workforce for Digital Curation

Preparing the Workforce for Digital Curation. The National Academies Press. 2015.  
This 105-page report focuses on the need for digital curation education and training in order to provide meaningful use of digital information, now and in the future. This study defines digital curation as: “The active management and enhancement of digital information assets for current and future use.” Digital curation is more than preserving the digital information in secure storage because curation may add value to digital information and increase its utility.

Digital curation is similar to traditional curation. "Regardless of whether a collection is physical or digital, a curator must appraise its value and relevance to the community of potential users; determine the need for preservation; document provenance and authenticity; describe, register, and catalog its content; arrange for long-term storage and preservation; and provide a means for access and use." But it also has many new challenges: the quantities of material to be curated, the need for active and ongoing management, continually changing uses and technology, and the diversity of organizational contexts in which curation occurs. It is more than simply collecting and storing data and information. Active management denotes planned, systematic, coordinated, purposeful, and directed actions that make digital information fit for a purpose and ensure that it will remain discoverable, accessible, and usable for as long as users have a need and a right to use it.

A new pattern of data usage puts a greater emphasis on the standardization of digital curation practices so that the data can be shared more easily.  Archiving digital data requires a more active management approach, and a more collaborative partnership between producers, archivists and users.

The Loss of Cultural Heritage Through Deterioration of Records and Technological Change: Sound recordings are a striking example of cultural heritage data at high risk of loss. These include music, oral histories, and radio broadcasts preserved in a wide variety of formats and media.

Some benefits of digital curation include:
  • Increased collaboration and cost sharing;
  • Greater use of data in teaching and research training;
  • New opportunities and uses for data, including data mining;
  • Creation of a more complete record of research;
  • Creation of new areas of research, new industries, or new support services.
Some principal conclusions:
  1. Significant opportunities exist to embed digital curation deeply into an organization’s practices to reduce costs and increase benefits. Digital curation will be increasingly in demand across many sectors of society.
  2. Digital curation can be advanced by various organizations that can serve as leaders, models, and sources of good curation practices, and build trust by preserving assets.
  3. Some barriers to digital curation include: lack of sharing of resources and insufficient resources.
  4. There is a need to identify, segregate, and measure the costs of curation tasks in scientific research and business processes.
  5. Standards and existing practices vary greatly, which can lead to a lack of coordination across different sectors. This in turn can lead to limited adoption of consistent standards for digital curation and fragmented dissemination of good practices.
  6. Automation of at least some digital curation tasks is desirable.
  7. The knowledge and skills required of those engaged in digital curation are dynamic and highly interdisciplinary.
Some recommendations include:
  1. Research communities, educational institutions, and others should work together to develop and adopt digital curation standards and good practices.
  2. Work to identify and predict the costs associated with digital curation.
  3. Organizations should identify, explain, and measure the benefits derived from digital curation.

Monday, March 30, 2015

Digital Preservation Challenges with an ETD Collection: A Case Study at Texas Tech University

Digital Preservation Challenges with an ETD Collection: A Case Study at Texas Tech University. Joy M. Perrin, Heidi M. Winkler, Le Yang. The Journal of Academic Librarianship. January, 2015.
The potential risk of loss seems distant and theoretical until it actually happens. The "potential impact of that loss increases exponentially" for a university when the loss is part of the research output. This excellent article looks at a case study of the challenges one university library encountered with its electronic theses and dissertations (ETDs).  Many institutions have been changing from publishing paper theses and dissertations to accepting electronic copies. One of the challenges that has not received as much attention is that of preserving these electronic documents for the long term.  The electronic documents require more hands-on curation.

Texas Tech University encountered difficulties with preserving their ETD collection. They hope the lessons learned from these data losses will help other organizations looking to preserve ETDs and other types of digital files and collections. Some of the losses were:
  1. Loss of metadata edits. Corrupted database and corrupted IT backups required a rebuild of the database, but the entered metadata was lost.
  2. Loss of administrative metadata (embargo periods). The ETD-db files imported into DSpace did not include the embargoed files. Plans were not documented and personnel changed before the problem was discovered. Some items were found accidentally on a personal drive years later.
  3. Loss of scanned files. The scanning server was also the location to store files after scanning. Human error beyond the backup window resulted in the deletion of over a thousand scanned ETDs, which were eventually recovered.
  4. Failure of policies: loss of embargo status changes. The embargo statement recorded in the ETD management system did not match what was published in DSpace.
The library started on real digital preservation for the ETD collection. Funds were set aside to increase the storage of the archive space and provide a second copy of the archived files. A digital resources unit was created to handle the digital files which finally brought the entire digital workflow, from scanning to preservation, under one supervisor. The library joined DPN in hopes that it would yield a level of preservation far beyond what the university would be able to accomplish alone. The clean-up of the problems has been difficult and will take years to accomplish. Lessons learned:
  1. Systems designed for managing or publishing documents are not preservation solutions.
  2. System backups are not reliable enough to act as a preservation copy. Institutions must make digital preservation plans beyond backups.
  3. Organizations with valuable digital assets should invest in storing their items somewhere other than only in a display system.
  4. Multiple copies of digital items must reside on different servers in order to guarantee that files will not be accidentally deleted or lost through technical difficulties. 
  5. All metadata, including administrative data, should be preserved outside of the display system. The metadata is a crucial part of the digital item.
  6. Digital items are collections of files and metadata.
  7. Maintaining written procedures and documentation for all aspects of digital collections is vital.
  8. The success of digital preservation will require collaboration between curators and the IT people who maintain the software and hardware, and consistent terminology (e.g. archived).
 "Even though this case study has primarily been a description of local issues, the grander lessons gleaned from these crises are not specific to this institution. Librarians are learning and re-learning every day that digital collections cannot be managed in the same fashion as their physical counterparts. These digital collections require more active care over the course of their lifecycles and may require assistance from those outside the traditional library sphere...."

Friday, February 27, 2015

Data on the Web Best Practices

Data on the Web Best Practices. W3C First Public Working Draft. 24 February 2015.
This document provides best practices related to the publication and usage of data on the Web. Data should be discoverable and understandable by humans and machines, and the efforts of the data publisher should be recognized. This will help the interaction between publishers and users.

The Web allows multiple ways to represent and access data, which is itself a challenge. Other challenges include metadata, formats, provenance, quality, access, versions, and preservation. The proposed Best Practices should help data publishers and data consumers overcome the challenges faced during the data life cycle on the Web. The draft proposes best practices for each of the described challenges.