One of the tasks was to fix the broken links in the catalogue. A report showed that of about 16,000 links to external resources, about 1,200 (7.5%) were non-functional. Many of these could be repaired, but about 10% of the links referred to documents which no longer existed, many of them government publications. The question was how to handle such links differently. They looked at adding materials into their own repository, which would allow them to solve the link rot problem while "building in a core level of digital preservation and increasing the discoverability of these documents. We were convinced that a citation which linked to a record in a Web archive was far more likely to survive than one which did not."
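The kind of report described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool the library used: the function names `check_link`, `link_report`, and `broken_share` are hypothetical, and a link is treated as functional simply when a HEAD request returns a status below 400.

```python
import http.client
from urllib import request


def check_link(url, timeout=10.0):
    """Return True if the URL answers a HEAD request with a status below 400.

    Assumption: any network or HTTP error counts as a broken link.
    """
    try:
        req = request.Request(url, method="HEAD",
                              headers={"User-Agent": "link-check-sketch"})
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, and timeouts;
        # ValueError covers malformed URLs.
        return False


def link_report(urls, checker=check_link):
    """Map each URL to True (functional) or False (broken)."""
    return {url: checker(url) for url in urls}


def broken_share(report):
    """Return (number of broken links, percentage of all links checked)."""
    total = len(report)
    broken = sum(1 for ok in report.values() if not ok)
    percent = 100.0 * broken / total if total else 0.0
    return broken, percent
```

Passing the checker in as a parameter keeps the counting logic testable without network access. Some servers reject HEAD requests, so a real audit would probably retry with GET before declaring a link dead.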
They needed to clarify the intellectual property rights and add descriptive metadata, such as the type of document, a collection name, subjects, and the organization that created the document. They also found that they "had to accept all common file formats at present. In practice, the majority are pdf, some MS Word and a few Excel files. It would, for preservation purposes, be preferable to convert and ingest in PDF/A format, at least for the textual formats. However our view was that the small overhead of batch migrating to that format at a later stage means it would be better to spend time upfront now on metadata rather than file conversion. We felt that this was a pragmatic response which meant that we would be working within the spirit of digital preservation best practice." Also, they found that "data-based formats such as Excel cannot be meaningfully integrated into a full-text search and that these objects would benefit from better representations."
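As a rough illustration of the metadata and format decisions described above, a repository record might be modelled as follows. Everything here is an assumption made for the sketch: the class `RepositoryRecord`, its field names, and the extension sets are hypothetical and do not reflect the repository's actual schema.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import List

# Assumed groupings, based on the formats mentioned in the text.
TEXTUAL_FORMATS = {".pdf", ".doc", ".docx"}   # candidates for later PDF/A migration
DATA_BASED_FORMATS = {".xls", ".xlsx"}        # poorly served by full-text search


@dataclass
class RepositoryRecord:
    filename: str
    rights_statement: str
    document_type: str
    collection: str
    creator_organization: str
    subjects: List[str] = field(default_factory=list)

    @property
    def extension(self):
        """Lower-cased file extension, e.g. '.pdf'."""
        return PurePath(self.filename).suffix.lower()

    def pdfa_migration_candidate(self):
        """True for textual formats that could be batch-migrated to PDF/A later."""
        return self.extension in TEXTUAL_FORMATS

    def suits_full_text_search(self):
        """False for data-based formats such as Excel."""
        return self.extension not in DATA_BASED_FORMATS

    def missing_fields(self):
        """Names of descriptive fields that are still empty."""
        required = {
            "rights_statement": self.rights_statement,
            "document_type": self.document_type,
            "collection": self.collection,
            "creator_organization": self.creator_organization,
            "subjects": self.subjects,
        }
        return [name for name, value in required.items() if not value]
```

The sketch mirrors the stated priority: files are accepted in whatever common format they arrive in, the metadata is checked for completeness at ingest, and the format flag only marks what could be migrated in a later batch.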
Other things they learned include:
- Placing files in a repository provides digital preservation for key documents in the subject field and eradicates the link rot problem.
- Adding high-quality metadata enhances the resource and allows it to hold its head high and become an integral part of a library's collection.
- A library can play an important role in preserving content as part of its long-term strategy and ensure high-quality resources remain available.
- The added value of being able to search the full text provides a potentially very rich resource for researchers.