Wednesday, February 04, 2015

The Cobweb. Can the Internet be archived?

The Cobweb. Can the Internet be archived? Jill Lepore. The New Yorker. January 26, 2015.

The average life of a Web page is about a hundred days. A page can disappear outright, which is known as “link rot.” Or the page may still load but show something other than the original: the content has been updated or overwritten, or the page has moved and something else now sits at its old address. This is known as “content drift.” Content drift is worse than an error message, since it’s impossible to tell that what you’re seeing isn’t what you went to look for: the overwriting, erasure, or moving of the original is invisible.

Link rot and content drift, collectively known as “reference rot,” have been disastrous for the law and courts. Legal scholars, lawyers, and judges often cite Web pages in their footnotes as evidence, and they expect that evidence to remain where they found it. But a 2013 survey of law- and policy-related publications found that after six years, nearly fifty per cent of the URLs cited in those publications no longer worked. A Harvard Law School study in 2014 showed that “more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information.”

The overwriting, drifting, and rotting of the Web also affect engineers, scientists, and doctors. Recently, researchers at Los Alamos National Laboratory reported the results of a study of three and a half million scholarly articles published in science, technology, and medical journals between 1997 and 2012: one in five links provided in the notes suffers from reference rot.

The problem of disappearing links has been known since the early days of the Web. Tim Berners-Lee proposed HTTP to link Web pages, and he had also considered a time axis for the protocol, but “preservation was not a priority.” Other internet pioneers are also concerned. Vint Cerf has talked about the need for long-term storage, a “digital vellum”: “I worry that the twenty-first century will become an informational black hole.” Brewster Kahle started the Internet Archive, which has archived more than four hundred and thirty billion Web pages.

Herbert Van de Sompel has been working on Memento, which allows a user to look at a Web page as it appeared around a chosen time, such as the date when the document citing it was written.
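
To make the idea concrete, here is a minimal sketch of how a Memento-style request might be prepared. The Memento protocol (RFC 7089) has a client send an `Accept-Datetime` header to a “TimeGate,” which then points to an archived copy close to that date. The TimeGate prefix below is an assumption chosen for illustration, and the function name `build_memento_request` is hypothetical; the sketch only builds the request and does not contact any archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: an archive that exposes a TimeGate at this prefix.
# Substitute the TimeGate of whichever archive you actually use.
TIMEGATE_PREFIX = "https://web.archive.org/web/"


def build_memento_request(url, when):
    """Return (timegate_url, headers) asking for `url` as it was near `when`.

    Hypothetical helper: it prepares the request but does not send it.
    """
    # Treat naive datetimes as UTC so the header is unambiguous.
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # Accept-Datetime uses the HTTP date format, expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE_PREFIX + url, {"Accept-Datetime": stamp}


if __name__ == "__main__":
    # Ask for a page as it looked when the article citing it was published.
    gate, headers = build_memento_request(
        "http://example.com/", datetime(2015, 1, 26)
    )
    print(gate)
    print(headers)
```

Sending the prepared URL and header with any HTTP client would, if the archive supports Memento, normally result in a redirect to the archived copy nearest the requested date.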

