Friday, October 24, 2014

The Many Uses of Rhizome’s New Social Media Preservation Tool.

The Many Uses of Rhizome’s New Social Media Preservation Tool.  Benjamin Sutton. Hyperallergic Media. October 21, 2014.
New York’s digital art nonprofit Rhizome is developing Colloq, a conservation tool that helps artists preserve social media projects not only by archiving them but also by replicating the exact look and layout of the sites used and the interactions with other users. The idea for Colloq came from the realization that Rhizome will be unable to accession new, contemporary Internet art unless it rethinks its archival practices. Colloq is still in the early stages of development.

Saturday, October 18, 2014

Investing in Curation: A shared path to sustainability.

Digital curation involves managing, preserving and adding value to digital assets over their entire lifecycle. The active management of digital assets maximises their reuse potential, mitigates the risk of obsolescence and reduces the likelihood that their long-term value will diminish. However, this requires effort so there are costs associated with this activity. As the range of organisations responsible for managing and providing access to digital assets over time continues to increase, the cost of digital curation has become a significant concern for a wider range of stakeholders.

Establishing how much investment an organisation should make in its curation activities is a difficult question. If a shared path can be agreed that allows the costs and benefits of digital curation to be collectively assessed, shared and understood, a wider range of stakeholders will be able to make more efficient investments throughout the lifecycle of the digital assets in their care. With a shared vision, it will be easier to assign roles and responsibilities to maximise the return on the investment of digital curation and to clarify questions about the supply and demand of curation services. This will foster a healthier and more effective marketplace for services and solutions and will provide a more robust foundation for tackling future grand challenges.

Situating the Roadmap:  The six messages in the roadmap have been carefully considered to effect a step change in attitudes over the next five years. It starts with a focus on the costs of digital curation—but the end point and the goal is to bring about a change in the way that all organisations think about and sustainably manage their digital assets.

D5.1 - Draft Roadmap (PDF - 2.5 MB)

Thursday, October 16, 2014

Web archiving in the United States: a 2013 survey and NDSA Report.

Web archiving in the United States: a 2013 survey and NDSA Report. Jefferson Bailey, et al. National Digital Stewardship Alliance. September 2014. [PDF]

Report on a survey of organizations in the United States that are actively involved in, or planning to start, programs to archive content from the Web. Over half of the respondents were from colleges or universities. Respondents consider technical skills to be the most necessary to the development and success of their programs, and they are most interested in metrics relating to volume and usage. Most do not participate in collaborative archiving. Overall, the results suggest that web archiving programs are maturing and moving towards standard practices.
  • 81% devote half or less of an FTE's time to archiving the web
  • 40% indicated that knowledge of web technologies or archiving tools is essential
  • 58% capture web content without either notifying or seeking permission from content owners
  • 55% of respondents conditionally respect robots.txt
  • 63% use external web archiving services exclusively, a 3% increase over the last survey
Concern about ability to archive types of content (multiple selections):
  • social media – 79%
  • databases – 74%
  • video – 73% (63
  • interactive media – 56%
  • audio – 45%
  • blogs – 36%
  • art – 17%
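The robots.txt figure above concerns whether a crawler consults a site's exclusion rules before capturing a page. A minimal sketch of such a check, using Python's standard `urllib.robotparser`, might look like the following. The function name `may_capture`, the user-agent string and the sample rules are assumptions made for illustration; they are not taken from the survey or from any particular archiving tool.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def may_capture(robots_txt, url, agent="example-archiver"):
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the robots.txt body as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for candidate in ("https://example.org/index.html",
                      "https://example.org/private/report.html"):
        print(candidate, may_capture(SAMPLE_ROBOTS, candidate))
```

A program that respects robots.txt only conditionally could treat a negative result as advice and apply its own policy on top of it.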

Wednesday, October 15, 2014

Functional Access To Electronic Media Collections using Emulation-as-a-Service.

Functional Access To Electronic Media Collections using Emulation-as-a-Service.  Thomas Bähr, et al. German National Library of Science and Technology. October 14, 2014.
Poster that looks at the emulation of CD collections. Examines:

  • User Layer: the ingest workflow, data evaluation and prioritization, license evaluation, and creation of images.
  • Workflow layer: retrieving the image, evaluating rendering, technical metadata, object rendering.
  • Technical layer: EaaS environments, local resources, resource allocation.

This won the Best Poster Award at iPRES 2014.

Wednesday, February 12, 2014

ISO Freely Available Standards

Freely Available Standards.

ISO Copyright for the freely available standards

The following standards are made freely available for standardization purposes. They are protected by copyright and therefore and unless otherwise specified, no part of these publications may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, microfilm, scanning, reproduction in whole or in part to another Internet site, without permission in writing from ISO. Requests should be addressed to the ISO Central Secretariat.

The documents you are about to download are a single-user, non-revisable Adobe Acrobat PDF file, to store on your personal computer. You may print out and retain one printed copy of the PDF file. This printed copy is fully protected by national and international copyright laws, and may not be photocopied or reproduced in any form. Under no circumstances may it be resold.

Monday, October 14, 2013

The WARC File Format (ISO 28500) - Information, Maintenance, Drafts

The WARC File Format (ISO 28500) - Information, Maintenance, Drafts.

I confirm - as convenor of the WARC format ISO working group - that there are no substantial modifications between the version on and the ISO standard, except some little editorial changes. So it may be used as a trustworthy reference.

As far as I know, ISO organization resources comes largely from the selling of their standards, so it is not possible to make them freely available, except in some cases. The case of ISO/IEC standards is one of these exceptions; it is due to the fact that the standards are developed by two organizations with different publication rules (ISO and IEC).
Even as convenor, I had no free copies of the standard.
I will check again with ISO secretariat but I doubt it will be legal to make freely available the official version.

This is a reason why there is a common practice to publish draft standards, such as we did on BnF website.

Best regards,

Saturday, August 10, 2013

Game Walkthroughs As A Metaphor for Web Preservation

Game Walkthroughs As A Metaphor for Web Preservation. Michael Nelson. Web Science and Digital Libraries Research Group. May 25, 2013.
Some things can't really be preserved digitally, such as computer games, even though it would be possible to create emulators. So for some, the best way to experience the game is through walkthroughs on YouTube.
"I think game walkthroughs can provide us with an interesting metaphor for web archiving, not simply walkthroughs of web instead of game sessions (though that is possible), but in the sense of capturing a series of snapshots of dynamic services and archiving them.  Given "enough" snapshots, we might be able to reconstruct the output of a black box"

Google Maps is another site that has preservation issues. 

"There are a number of issues to be researched to make this easy enough for people to do (many of which our group is investigating), but the popularity of game walkthroughs and their preservation side-effects suggests to me that the web archiving community should be informed by them."

Sunday, July 14, 2013

Supports Memento

Supports Memento. Web Science and Digital Libraries Research Group. July 9, 2013.
A new page-at-a-time personal web archiving utility. It archives a single page on request. Features include a simple search/upload interface, a bookmarklet to push pages into the archive while reading, thumbnails and full-sized images of captured pages, and it now supports Memento.


The age of data: Strategies for response

The age of data: Strategies for response. John W. Thompson. Computerworld. June 14, 2013.
The scale of data growth today is so massive it can be numbing. A recent study shows that "in the last minute there were 204 million emails sent, 61,000 hours of music listened to on Pandora, 20 million photo views and 3 million uploads to Flickr, 100,000 tweets, 6 million views and 277,000 Facebook logins, and 2 million plus Google searches." Data is continuing to grow at a phenomenal pace. The total of all digital data created and replicated will reach 4 zettabytes in 2013, almost 50 percent more than in 2012. The growth of data also gives organizations an opportunity to analyze the information being gathered and use it to their advantage. One thing that has helped is technology that reduces the amount of data by managing it and eliminating dozens and dozens of redundant copies.

Edward R. Murrow's audio essays with the famous -- and not-so-famous -- have been digitized and put online

Edward R. Murrow's audio essays with the famous -- and not-so-famous -- have been digitized and put online. Lucas Mearian. Computerworld. July 12, 2013.
Over 800 oral essays from Edward R. Murrow's 1950s radio series, This I Believe, have been placed online for public use by Tufts University. The audio collection comes from almost 800 reel-to-reel tape recordings "that were nearly lost forever due to natural wear and tear from more than 50 years in less than ideal storage." The engineers captured the analogue recordings in a 96 kHz, 24-bit high-resolution WAV format.
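To give a rough sense of what 96 kHz, 24-bit capture implies for storage, the sketch below computes the size of uncompressed PCM audio data. The function name `pcm_bytes` is hypothetical, and the channel count and durations are assumptions; the article does not say whether the recordings are mono or stereo or how long each essay runs. The small WAV header is ignored.

```python
def pcm_bytes(seconds, sample_rate_hz=96_000, bit_depth=24, channels=1):
    """Approximate size in bytes of uncompressed PCM audio (WAV header excluded)."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


if __name__ == "__main__":
    # One minute of audio, under the assumed mono and stereo layouts.
    for channels in (1, 2):
        size = pcm_bytes(60, channels=channels)
        print(f"{channels} channel(s): {size / 1_000_000:.2f} MB per minute")
```

Under these assumptions, one minute of mono audio occupies 17,280,000 bytes, and stereo doubles that.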

Friday, July 12, 2013

NDSA Storage Report: Reflections on National Digital Stewardship Alliance Member Approaches to Preservation Storage Technologies

NDSA Storage Report: Reflections on National Digital Stewardship Alliance Member Approaches to Preservation Storage Technologies. Micah Altman, et al. D-Lib Magazine. June 2013.

The structure and design of digital storage systems is a cornerstone of digital preservation.  To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. This article reports on the findings of the survey. 

Key Findings

The key findings from the survey were:
  • 90% of respondents are distributing copies of at least part of their content geographically.
  • 88% of respondents are responsible for their content for an indefinite period of time.
  • 80% of respondents use some form of fixity checking for their content.
  • 75% of respondents report a strong preference to host and control their own technical infrastructure for preservation storage.
  • 69% of respondents are considering, or currently participating in, a distributed storage cooperative or system (e.g. LOCKSS Alliance, MetaArchive, Data-PASS).
  • 64% of respondents are planning to make significant technological changes in their preservation storage architecture in the next three years.
  • 51% of respondents are considering or already using a cloud storage provider to keep one or more copies of their content.
  • 48% of respondents are considering, or currently contracting out, storage services to be managed by another organization or company.
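Fixity checking, which the findings above report for 80% of respondents, generally means recomputing a checksum for each stored file and comparing it with a value recorded earlier. The following is a minimal sketch of that idea, assuming SHA-256 digests and a simple in-memory manifest that maps relative paths to expected digests. The function names and the manifest layout are illustrative assumptions, not practices described in the report.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest, root):
    """Compare stored files against a manifest of {relative_path: expected_digest}.

    Returns {relative_path: "ok" | "changed" | "missing"}.
    """
    results = {}
    for relative_path, expected in manifest.items():
        target = Path(root) / relative_path
        if not target.is_file():
            results[relative_path] = "missing"
        elif sha256_of(target) == expected:
            results[relative_path] = "ok"
        else:
            results[relative_path] = "changed"
    return results
```

Running the same check against each geographically distributed copy would show which copy can be used to repair a file reported as changed or missing.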

WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy.

WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy.  July 10, 2013.
A goal of the Web Science and Digital Libraries Research Group is to assist in making web preservation accessible to regular users instead of just power users.  A few digital preservation software packages that were created by WS-DLers include:
  • Warrick - a utility for reconstructing/recovering a website using various archives and caches.
  • Synchronicity - a Firefox extension that supports rediscovering missing web pages
  • mcurl - a command-line Memento client
  • WARCreate - a Google Chrome extension that can create WARC files from any webpage 
  • Web Archiving Integration Layer (WAIL) - a re-packaged Wayback and Heritrix that aims to be "One-Click User Instigated Preservation"
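Several of the tools above read or write WARC files. As a rough illustration of what a single record looks like, the sketch below assembles a minimal WARC/1.0 `resource` record by hand: a version line, named header fields, a blank line, the payload, and two trailing CRLF pairs. This is a simplified illustration and not how WARCreate or Heritrix work internally; the function name `build_warc_resource_record` is hypothetical, and real crawlers typically store full HTTP exchanges with additional headers.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri, payload, content_type="text/html"):
    """Assemble one WARC/1.0 'resource' record as bytes (illustrative only)."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {timestamp}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; an empty line separates the header from the block.
    header = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLF pairs after the payload.
    return header + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = build_warc_resource_record("http://example.org/", b"<html></html>")
    print(record.decode("utf-8"))
```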

Friday, June 21, 2013

JHOVE 1.10b3

JHOVE 1.10b3. Gary McGath. File Formats Blog.

Saturday, June 15, 2013

EPUB for archival preservation: an update

EPUB for archival preservation: an update. Johan van der Knijff's blog on Open Planets.
In 2012 the KB released a report on the suitability of the EPUB format for archival preservation. A substantial number of EPUB-related developments have happened since then, and as a result some of the report's findings and conclusions have become outdated, particularly the observations on EPUB 3 and the support of EPUB by characterisation tools. This blog post provides an update to those findings:
  • Use of EPUB in scholarly publishing
  • Adoption and use of EPUB 3
  • EPUB 3 reader support
  • Support of EPUB by characterisation tools
The use of EPUB is increasing, and a number of publishers are using EPUB 2. Several organisations representing the publishing industry also support EPUB 3, though actual use of EPUB 3 is still limited. The 2012 report concluded that EPUB was not optimally supported by characterisation tools. This situation has improved considerably since then: EPUB is now included in PRONOM and DROID. Overall, EPUB's credentials as a preservation format appear to have improved over the last year.

Friday, June 14, 2013

EPUB for archival preservation

EPUB for archival preservation. Johan van der Knijff. KB/National Library of the Netherlands. 20 July 2012.
The EPUB format has become increasingly popular in the consumer market. A number of publishers have indicated their wish to use EPUB for supplying their electronic publications to the KB. This document looks at the characteristics and functionality of the format, and whether or not it is suitable for preservation.  Conceptually, an EPUB file is just an ordinary ZIP archive which includes one or more XHTML files, in one or more directories.  Cascading Style Sheets are used to define layout and formatting. A number of XML files provide metadata.
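Because an EPUB file is a ZIP archive, its contents can be inspected with ordinary ZIP tooling. The sketch below uses Python's standard `zipfile` module to group the entries of an EPUB into the content documents, style sheets and XML files mentioned above. The function name `summarize_epub` and the extension-based grouping are simplifying assumptions for illustration; the sketch is not a validator or characterisation tool.

```python
import zipfile


def summarize_epub(path_or_file):
    """Group the entries of an EPUB (ZIP) container by broad file type."""
    with zipfile.ZipFile(path_or_file) as archive:
        names = archive.namelist()
    return {
        # Content documents (XHTML).
        "xhtml": [n for n in names if n.lower().endswith((".xhtml", ".html", ".htm"))],
        # Cascading Style Sheets that define layout and formatting.
        "css": [n for n in names if n.lower().endswith(".css")],
        # XML files carrying metadata and packaging information.
        "xml": [n for n in names if n.lower().endswith((".xml", ".opf", ".ncx"))],
    }


if __name__ == "__main__":
    import sys

    for kind, entries in summarize_epub(sys.argv[1]).items():
        print(kind, entries)
```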

EPUB has a number of strengths that make it attractive for preservation. It is an open format that is well documented, and there are no known patents or licensing restrictions. The format's specifications are freely available. It is largely based on well‐established and widely‐used standards so it scores high marks for transparency and re‐usability. For situations where authenticity is crucial (e.g. legal documents) all or parts of a document can be digitally signed. Also, EPUB 2 is a popular format with excellent viewer support, including several open source implementations. However, there is concern that its role is limited because the current e‐book market is dominated by proprietary formats. In addition, EPUB 3 is currently less stable. The report includes a chart of recommendations for using EPUB.

Strategy for archiving digital records at the Danish National Archives

Strategy for archiving digital records at the Danish National Archives. Statens Arkiver. January 2013.
Their aim is to ensure the preservation of records that are of historical value, or that serve as documentation of significant administrative matters or legal importance for citizens and authorities. The vision is to ensure that digital records are preserved so as to maintain their authenticity, and so that they can be found and reused. Preserving digital information for the long term, in a form that makes it reusable, requires some deliberate choices to be made in terms of methods, technologies and documentation. Digital preservation must also take economic considerations into account.

The basic strategy choice faced by preservation institutions is whether to pursue an emulation strategy or a migration strategy. This will determine how digital preservation in the institution is organised. The Danish National Archives have chosen a migration strategy, which requires the Archives to migrate digital records to a few well-defined standard formats and, from time to time, to migrate them again to new formats and structures.

The Danish National Archives’ strategy must not be dependent on continuous access to the system in which the data was originally created. It must be possible to interpret and re-use data in other systems. The term “original” cannot be applied in the same way to digital records. Whether data is extracted from tables in a database or digital documents, a representation of the content is preserved in the preservation format. A digital archive primarily preserves data or information. The key aspect is the preservation of authentic information. The implementation of the strategy requires:
  1. Early identification and approval of systems for submission purposes
  2. Frequent submission in non-system dependent format
  3. Ongoing planning of preservation and periodical migration to a new preservation format
The Archives uses distributed digital preservation by keeping several identical copies on several different types of media, both optical and magnetic, at several different geographical locations. The Archives also conducts ongoing preservation planning and continuously adjusts the implementation of its strategy so that the vision remains attainable and within its reach.