Digital Preservation Matters: persistent ID


Wednesday, February 04, 2015

The Cobweb. Can the Internet be archived?

The Cobweb. Can the Internet be archived? Jill Lepore. The New Yorker. January 26, 2015.

The average life of a Web page is about a hundred days. A page can disappear entirely, which is known as “link rot.” Or the page may have been updated, overwritten, or moved, so that something else now sits where it used to be; this is known as “content drift.” Content drift is worse than an error message, since it is impossible to tell that what you are seeing is not what you went to look for: the overwriting, erasure, or moving of the original is invisible.
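The distinction between the two failure modes can be sketched in a few lines of Python. This is only an illustration, not a method from the article: the function name `classify_link` and its parameters are hypothetical, and it assumes a copy of the page as originally cited is available for comparison. The byte-for-byte hash comparison is deliberately strict, so even a trivial change to the page counts as drift.

```python
import hashlib
from typing import Optional


def classify_link(status: Optional[int],
                  cited_body: bytes,
                  current_body: Optional[bytes]) -> str:
    """Classify a cited link as 'link rot', 'content drift', or 'intact'.

    status       -- HTTP status of a fresh fetch, or None if the fetch failed
    cited_body   -- page content saved when the citation was made (assumed available)
    current_body -- page content returned by the fresh fetch, if any
    """
    # Link rot: the address no longer returns a page at all.
    if status is None or status >= 400 or current_body is None:
        return "link rot"
    # Content drift: the address still works but the content is different.
    if hashlib.sha256(cited_body).digest() != hashlib.sha256(current_body).digest():
        return "content drift"
    return "intact"
```

Note that content drift can only be detected because a saved copy exists; without one, a drifted page looks exactly like a working link.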

Link rot and content drift, collectively known as “reference rot,” have been disastrous for the law and courts. In providing evidence, legal scholars, lawyers, and judges often cite Web pages in their footnotes; they expect that evidence to remain where they found it as their proof. But a 2013 survey of law- and policy-related publications found that after six years, nearly fifty per cent of the URLs cited in those publications no longer worked. A Harvard Law School study in 2014 showed “more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information.”

The overwriting, drifting, and rotting of the Web also affect engineers, scientists, and doctors. Recently, researchers at Los Alamos National Laboratory reported the results of a study of three and a half million scholarly articles published in science, technology, and medical journals between 1997 and 2012: one in five links provided in the notes suffers from reference rot.

The problem of disappearing links has been known since the start of the Web. Tim Berners-Lee proposed HTTP to link web pages, and he had also considered a time axis for the protocol, but “preservation was not a priority.” Other internet pioneers are also concerned. Vint Cerf has talked about the need for a long-term storage medium, a “digital vellum”: “I worry that the twenty-first century will become an informational black hole.” Brewster Kahle started the Internet Archive, which has archived more than four hundred and thirty billion Web pages.

Herbert Van de Sompel has been working on Memento, which allows a user to look at linked pages as they appeared around the time the citing document was written.
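As a rough sketch of how this works in practice: Memento (RFC 7089) adds the time dimension through an `Accept-Datetime` request header, sent to a "TimeGate" that redirects to an archived copy close to the requested date. The Python below only builds such a request and does not send it. The TimeGate address is a made-up placeholder, and the helper names are hypothetical; a real archive publishes its own TimeGate URL pattern.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate base URL, for illustration only.
TIMEGATE = "https://timegate.example.org/timegate/"


def accept_datetime(when: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP-date in GMT."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def memento_request(url: str, when: datetime) -> Request:
    """Build (but do not send) a TimeGate request for `url` as of `when`."""
    return Request(TIMEGATE + url, headers={"Accept-Datetime": accept_datetime(when)})


# Ask for a page as it looked on the article's publication date.
req = memento_request("http://example.com/", datetime(2015, 1, 26, tzinfo=timezone.utc))
print(req.full_url)
print(accept_datetime(datetime(2015, 1, 26, tzinfo=timezone.utc)))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup against a real TimeGate.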

Digital Preservation Matters.

Monday, February 02, 2015

Websites Change, Go Away and Get Taken Down

Websites Change, Go Away and Get Taken Down. Website. January 2015.

Perma.cc is a beta service that allows users to create citation links that will never break.
When a user creates a link, Perma.cc archives a copy of the referenced content, and generates a link to an unalterable hosted instance of the site. Regardless of what may happen to the original source, if the link is later published by a journal using the Perma.cc service, the archived version will always be available through the Perma.cc link.

When readers click on a Perma.cc link, they are directed to a page that lets them either visit the original site (which may have changed since the link was created) or view the archived copy of the site in its original state.

Perma.cc is an online preservation service developed by the Harvard Law School Library in conjunction with university law libraries across the country and other organizations in the “forever” business.


Sunday, May 12, 2013

ZENODO. Research. Shared.

ZENODO. Research. Shared. Website. May 12, 2013.
ZENODO is a new open digital repository service that enables researchers, scientists, projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of existing institutional or subject-based repositories. The repository is created by OpenAIRE and CERN, and supported by the European Commission. It promotes peer-reviewed openly accessible research; all items have a DOI, so they are citable. All formats are allowed. There is a size limit of 1 GB per file. Data files are versioned, but records are not. Files may be deposited under closed, open, embargoed or restricted access.
It is named after Zenodotus, the first librarian of the Ancient Library of Alexandria and father of the first recorded use of metadata, a landmark in library history. ZENODO is provided free of charge for educational and informational use.


Wednesday, May 16, 2012

Implementing DOIs for Research Data.

Implementing DOIs for Research Data. Natasha Simons. D-Lib Magazine. May/June 2012.
As research becomes more collaborative and global, it is also becoming more difficult to manage the large amounts of research data generated daily. The Digital Object Identifier (DOI) system is one way to create persistent identifiers for research data collections and datasets. "Data that is richly described, organised, integrated and connected allows the data to be more easily discovered by other researchers." Identifying such resources allows research data collections and datasets to be open and discoverable to others, but there are questions that need to be answered: what type of material should get a persistent ID, at what granularity, whether the landing page or the resource itself should get the ID, who creates and maintains the IDs, and for how long. These questions, common to other institutions, should encourage discussion and collaboration.
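For readers unfamiliar with how a DOI is put together, here is a minimal Python sketch. A DOI consists of a prefix beginning with "10." that identifies the registrant, a slash, and a suffix chosen by the registrant; prepending the public resolver address https://doi.org/ turns it into a link. The function names and the sample DOI are made up for illustration and do not come from the article.

```python
from typing import Tuple
from urllib.parse import quote


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix), with only a basic shape check."""
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and len(prefix) > 3 and sep and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Build the resolver link for a DOI, percent-encoding unsafe characters."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


# Made-up DOI, for illustration only.
print(doi_url("10.1234/example.dataset.1"))
```

Where the link leads (a landing page or the dataset itself) is not encoded in the identifier; it depends on the URL the registrant records for the DOI, which is one of the decisions the article raises.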
