The post looks at the many projects that have been launched to archive and preserve the digital world; the best known is the Internet Archive, "which has been crawling and preserving the open web for more than two decades" and has preserved more than 510 billion distinct URLs from over 361 million websites. The author asks: "With such an incredible repository of global society’s web evolution, why don’t we see more applications of this unimaginable resource?"
Possible reasons why there isn't a more vibrant research and software development community around web archives include:
- Economic factors
- The complex nature of web archives
- The sheer size of the Internet Archive's collection: over 15 petabytes, which is difficult to manipulate
- A shortage of tools that can work with the archive, particularly for indexing
"At the end of the day, web archives are our only record capturing the evolution of human society from the physical to the virtual domains. The Internet Archive in particular represents one of the greatest archives ever created of this immense transition in human existence and with the right tools and a greater focus on non-traditional avenues, perhaps we can launch a whole new world of research into how humans evolved into a digital existence."