This short paper describes the authors' preservation workflow for digitized documents,
their in-house mass digitization workflow based on the Kitodo software, and the three
major challenges they encountered:
- validating the target file format and checking the constraints placed on it,
- handling updates of content already submitted to the preservation system,
- checking the integrity of all archived data in an affordable way.
To ensure robustness, only single-page, uncompressed TIFF files are accepted. They use the open-source tool checkit-tiff to check files against a specified configuration. To deal with AIP updates, files can be submitted multiple times: the first submission is an ingest, and all later transfers are updates. Rosetta ingest functions can add, delete, or replace a file. Rosetta can also manage multiple versions of an AIP, so older versions of digital objects remain accessible to users.
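To make the "single-page, uncompressed" constraint concrete, here is a minimal Python sketch of such a check. It is not checkit-tiff and does not reproduce its configuration format. It is an independent illustration that reads only the first image file directory (IFD) of a classic TIFF, and the names `check_tiff` and `minimal_tiff` are hypothetical.

```python
import struct

COMPRESSION_TAG = 259  # TIFF 6.0 "Compression" tag; value 1 means uncompressed


def check_tiff(data):
    """Return a list of problems found; an empty list means the file passed.

    Assumption: classic TIFF only (magic number 42), not BigTIFF.
    """
    if len(data) < 8:
        return ["file too short for a TIFF header"]
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        return ["not a TIFF file (bad byte-order mark)"]
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        return ["not a classic TIFF (magic number is not 42)"]

    problems = []
    try:
        (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
        compression = 1  # TIFF default when the tag is absent
        for i in range(entry_count):
            entry_offset = ifd_offset + 2 + 12 * i
            (tag,) = struct.unpack_from(endian + "H", data, entry_offset)
            if tag == COMPRESSION_TAG:
                # SHORT values are stored left-justified in the value field
                (compression,) = struct.unpack_from(
                    endian + "H", data, entry_offset + 8)
        # The offset of the next IFD follows the last entry; 0 means "none"
        (next_ifd,) = struct.unpack_from(
            endian + "I", data, ifd_offset + 2 + 12 * entry_count)
    except struct.error:
        return ["truncated or corrupt image file directory"]

    if compression != 1:
        problems.append("compressed (Compression tag = %d)" % compression)
    if next_ifd != 0:
        problems.append("more than one page (a second IFD is present)")
    return problems


def minimal_tiff(compression=1, next_ifd=0):
    """Build a tiny little-endian TIFF skeleton for demonstration only."""
    header = b"II" + struct.pack("<HI", 42, 8)
    entry = struct.pack("<HHIHH", COMPRESSION_TAG, 3, 1, compression, 0)
    return header + struct.pack("<H", 1) + entry + struct.pack("<I", next_ifd)
```

Calling `check_tiff(minimal_tiff())` returns an empty list, while a skeleton built with `compression=5` or a non-zero `next_ifd` is reported. A real profile check would cover many more tags than these two.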
They manage three copies of the data, which totals 120 TB. An integrity check of all digital documents, including the three copies, is not feasible because of the time required to read all the data from tape storage and check it. To get reliable results without checking all data in the archive, they use two different methods:
- Sample method: the integrity of a 1% sample of archival copies is checked yearly.
- Pattern method: a workflow with a specified fixed bit pattern is checked quarterly.
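The sample method can be sketched roughly as follows. The paper summary does not say how samples are drawn or which checksum is used, so the SHA-256 digests, the `manifest` mapping of object IDs to expected digests, the `read_bytes` callable, and the function name `sample_integrity_check` are all assumptions made for illustration.

```python
import hashlib
import math
import random


def sample_integrity_check(manifest, read_bytes, fraction=0.01, seed=None):
    """Verify a random sample of archived objects against stored digests.

    manifest   -- dict mapping object ID to expected SHA-256 hex digest
                  (hypothetical structure)
    read_bytes -- callable returning the stored bytes for an object ID
                  (hypothetical; a real system would stream from tape)
    fraction   -- share of objects to check; 0.01 corresponds to 1%
    """
    object_ids = sorted(manifest)
    if not object_ids:
        return {"checked": [], "failed": []}

    # Always check at least one object, and never more than exist
    sample_size = max(1, math.ceil(len(object_ids) * fraction))
    sample_size = min(sample_size, len(object_ids))
    sample = random.Random(seed).sample(object_ids, sample_size)

    failed = []
    for object_id in sample:
        try:
            digest = hashlib.sha256(read_bytes(object_id)).hexdigest()
        except (OSError, KeyError):
            # An unreadable or missing object counts as a failure
            failed.append(object_id)
            continue
        if digest != manifest[object_id]:
            failed.append(object_id)
    return {"checked": sample, "failed": failed}
```

A sample check like this only estimates the state of the archive: a clean result lowers the likelihood of widespread corruption but cannot rule out damage to objects outside the sample.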
Their current challenges are supporting new media types (digital video, audio, photographs, and PDF documents), unifying pre-ingest processing, and automating processes (e.g. to test new software versions).