The potential risk of loss seems distant and theoretical until it actually happens. The "potential impact of that loss increases exponentially" for a university when the loss is part of the research output. This excellent article presents a case study of the challenges one university library encountered with its electronic theses and dissertations (ETDs). Many institutions have been moving from publishing paper theses and dissertations to accepting electronic copies. One challenge that has received less attention is preserving these electronic documents for the long term. Electronic documents require more hands-on curation than their paper counterparts.
Texas Tech University encountered difficulties with preserving their ETD collection. They hope the lessons learned from these data losses will help other organizations looking to preserve ETDs and other types of digital files and collections. Some of the losses were:
- Loss of metadata edits. Corrupted database and corrupted IT backups required a rebuild of the database, but the entered metadata was lost.
- Loss of administrative metadata: embargo periods. The ETD-db files imported into DSpace did not include the embargoed files. Plans were not documented and personnel changed before the problem was discovered. Some items were found accidentally on a personal drive years later.
- Loss of scanned files. The scanning server also served as the storage location for files after scanning. Human error beyond the backup window resulted in the deletion of over a thousand scanned ETDs, which were eventually recovered.
- Failure of policies: loss of embargo status changes. The embargo statement recorded in the ETD management system did not match what was published in DSpace.
Some of the lessons learned were:
- Systems designed for managing or publishing documents are not preservation solutions.
- System backups are not reliable enough to act as a preservation copy. Institutions must make digital preservation plans beyond backups.
- Organizations with valuable digital assets should invest in storing their items somewhere other than only the display system.
- Multiple copies of digital items must reside on different servers in order to guarantee that files will not be accidentally deleted or lost through technical difficulties.
- All metadata, including administrative data, should be preserved outside of the display system. The metadata is a crucial part of the digital item.
- Digital items are collections of files and metadata.
- Maintaining written procedures and documentation for all aspects of digital collections is vital.
- The success of digital preservation will require collaboration between curators and the IT staff who maintain the software and hardware, as well as consistent terminology (e.g., an agreed meaning for "archived").
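
As an illustration of the points about multiple copies and about treating files and metadata as one item, here is a minimal Python sketch of a fixity check. It is not the approach described in the article; the function names (`sha256_of`, `build_manifest`, `audit_copy`) and the directory layout are hypothetical, and it assumes each digital item is a folder holding both the document and its metadata file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(item_dir: Path) -> dict[str, str]:
    """Map every file in the item (documents and metadata alike) to its checksum."""
    return {
        p.relative_to(item_dir).as_posix(): sha256_of(p)
        for p in sorted(item_dir.rglob("*"))
        if p.is_file()
    }


def audit_copy(manifest: dict[str, str], copy_dir: Path) -> dict[str, list[str]]:
    """Compare one stored copy against the manifest (hypothetical helper).

    Returns the relative paths that are missing from the copy and those
    whose contents no longer match the recorded checksum.
    """
    report: dict[str, list[str]] = {"missing": [], "altered": []}
    for rel_path, expected in manifest.items():
        candidate = copy_dir / rel_path
        if not candidate.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(candidate) != expected:
            report["altered"].append(rel_path)
    return report
```

In use, one might build the manifest once when an item is accepted, store it alongside the written procedures, and run `audit_copy` on a schedule against each server that holds a copy. A deleted scan or a silently changed embargo record would then show up in the report rather than being discovered years later.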