After 10 years of the UK web archive, what has been saved? Three collections, over 8 billion resources, and 160 TB of compressed data. "Looking inward is not enough: To understand the value of our collection, we need to look beyond our walls and put it in context." A review shows how much has been lost from the live web. Almost 100% of the crawled URLs in the UK web archive are gone or missing on the internet, and about 40% of those from 2013 are gone or missing. Link rot and content drift dominate:
- 50% of resources unrecognisable or gone after 1 year
- 60% after 2 years, 65% after 3 years (islands of stability)
- Noticeably higher rot rate than results for legal/academic web
Simple similarity measures provide some insight, but more work is needed to look for old content in new locations.
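
To make "simple similarity measure" concrete, here is a minimal sketch of how an archived copy might be compared with the live page at the same URL. It is an illustration only and not the method used in the review: the function names `similarity` and `classify`, the 0.5 threshold, and the use of Python's `difflib` character-level ratio are all assumptions chosen for brevity.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity(archived: str, live: str) -> float:
    """Return a 0..1 similarity score between two text bodies."""
    # Normalise whitespace and case so trivial formatting changes are ignored.
    a = " ".join(archived.split()).lower()
    b = " ".join(live.split()).lower()
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def classify(archived: str, live: Optional[str], threshold: float = 0.5) -> str:
    """Label a URL by comparing its archived text with the live text.

    `live` is None when the URL no longer resolves (link rot).
    The threshold is a hypothetical cut-off, not a value from the review.
    """
    if live is None:
        return "gone"
    if similarity(archived, live) >= threshold:
        return "recognisable"
    return "unrecognisable"  # content drift


if __name__ == "__main__":
    print(classify("Hello world", None))
    print(classify("Hello world", "hello   world"))
    print(classify("Hello world", "zzzz qqqq"))
```

A measure like this only compares a URL with itself over time. It cannot tell that the old content has moved to a new address, which is the gap noted above.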