The two institutions behind Netarchive.dk, the Royal Library and the State and University Library, have developed a web archiving system called NetarchiveSuite, which has been released as open source.
They gave an update on their web archiving activity. The Act on Legal Deposit of Published Material of 22 December 2004 provides their legal foundation. The major issue for web archiving is access, which is currently limited to researchers. Access is governed by their data protection act.
Their archive contains about 112 terabytes, comprising about 3.5 billion objects. The top-level domain .dk now has more than 1.3 million domain names, of which about 1 million are active. In addition, they harvest about 44,000 Danish sites on other domains.
The system provides technical information on harvests, but it is also important to document the decisions made about what to collect and what not to collect, so that future researchers can know what the archive contains.