Friday, October 24, 2014

The Many Uses of Rhizome’s New Social Media Preservation Tool.



The Many Uses of Rhizome’s New Social Media Preservation Tool.  Benjamin Sutton. Hyperallergic Media. October 21, 2014.
New York’s digital art nonprofit Rhizome is developing Colloq, a conservation tool to help artists preserve social media projects not only by archiving them, but by replicating the exact look and layout of the sites used, and the interactions with other users. The idea for Colloq came from the realization that Rhizome will be unable to accession new, contemporary Internet art unless it rethinks its archival practices. Colloq is still in its early stages of development.

Friday, October 17, 2014

Safeguard the Future of Your Data: Digital Preservation Technology for the U.S. Federal Market.

Safeguard the Future of Your Data: Digital Preservation Technology for the U.S. Federal Market. Hitachi brochure. 2014.
Hitachi’s Digital Preservation Platform (HDPP) is a non-magnetic storage solution that has the ability to preserve unlimited amounts of data for decades on end with minimal migration. The projected capacity of the storage solution is 1 PB per rack by the end of 2014. Offline media is also supported.

Cost-efficiency is another factor when considering long-term preservation. Traditional archives use a migration strategy that requires regular media refreshing, which has proven costly over time. Migration is an ongoing process that takes a significant amount of resources.

Blu-ray optical media and M-DISC media ensure longevity and compatibility across generations of technology, so the data remains accessible as formats continue to evolve. Blu-ray discs are projected to reach 1 TB per disc. M-DISC capacity is currently 25 GB per disc, with plans for 300 GB per disc. The brochure also includes quick specs and diagrams.
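To put the brochure's figures side by side, the following is a rough back-of-the-envelope sketch of how many discs a 1 PB rack would need at each quoted disc capacity. It assumes decimal units (1 PB = 1,000,000 GB) and ignores any redundancy or parity overhead; neither assumption comes from the brochure, and the function name is only illustrative.

```python
# Rough arithmetic only: decimal units and zero redundancy overhead are assumptions.
GB_PER_PB = 1_000_000  # assumed decimal conversion, 1 PB = 1,000,000 GB


def discs_needed(disc_capacity_gb, target_pb=1):
    """Return the number of whole discs needed to hold target_pb petabytes."""
    total_gb = target_pb * GB_PER_PB
    # Ceiling division so that a partly filled disc still counts as one disc.
    return -(-total_gb // disc_capacity_gb)


# Disc capacities quoted in the brochure summary above.
for label, capacity_gb in [
    ("M-DISC today (25 GB)", 25),
    ("M-DISC planned (300 GB)", 300),
    ("Blu-ray projected (1 TB)", 1000),
]:
    print(f"{label}: {discs_needed(capacity_gb):,} discs per PB")
```

Under these assumptions, the move from 25 GB to 1 TB per disc cuts the disc count for a petabyte by a factor of 40.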
 

Millenniata Announces Results of ISO/IEC 10995 Standard Tests – Storage Newsletter

Millenniata Announces Results of ISO/IEC 10995 Standard Tests – Storage Newsletter. Press Release. StorageNewsletter.com. 2013.06.10.
Millenniata, Inc. announced the results of its longevity test program based on the ISO/IEC 10995 standard. The results showed the median expected life of the M-DISC DVD was 1,332 years.  The same tests showed that other archival DVDs have an expected life of only 2.7 to 3.0 years.

Thursday, October 16, 2014

Web archiving in the United States: a 2013 survey and NDSA Report.


Web archiving in the United States: a 2013 survey and NDSA Report. Jefferson Bailey, et al. National Digital Stewardship Alliance. September 2014. [PDF]

Report on a survey of organizations in the United States that are actively involved in, or planning to start, programs to archive content from the Web. Over half of the respondents were from colleges or universities. Respondents consider technical skills to be the most necessary to the development and success of their programs. Respondents are most interested in metrics relating to volume and usage. Most do not participate in collaborative archiving. Overall, the results suggest that web archiving programs are maturing and are moving towards standard practices.
  • 81% devote half or less of an FTE time to archiving the web
  •  40% indicated that knowledge of web technologies or archiving tools is essential
  • 58% capture web content without either notifying or seeking permission from content owners
  •  55% of respondents conditionally respect robots.txt
  •  63% use external web archiving services exclusively, a 3% increase over last survey
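As an illustration of what "conditionally respecting" robots.txt could look like in a capture tool, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the user agent name, and the `override` flag are all hypothetical; the survey does not describe how respondents implement their policies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string so no network access is needed.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_capture(url, user_agent="example-archiver", override=False):
    """Return True if the crawler should capture the URL.

    override=True models a curatorial decision to ignore robots.txt for a
    given capture (an assumption about what "conditionally" might mean).
    """
    if override:
        return True
    parser = RobotFileParser()
    parser.parse(SAMPLE_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_capture("https://example.org/index.html"))
print(may_capture("https://example.org/private/report.html"))
print(may_capture("https://example.org/private/report.html", override=True))
```

Keeping the override as an explicit parameter makes each exception to robots.txt a visible, recordable decision rather than a silent default.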
Concern about ability to archive types of content (multiple selections):
  • social media – 79%
  • databases – 74%
  • video – 73% (63
  • interactive media – 56%
  • audio – 45%
  • blogs – 36%
  • art – 17%


Wednesday, October 15, 2014

Functional Access To Electronic Media Collections using Emulation-as-a-Service.



Functional Access To Electronic Media Collections using Emulation-as-a-Service.  Thomas Bähr, et al. German National Library of Science and Technology. October 14, 2014.
Poster that looks at the emulation of CD collections. Examines:

  • User layer: ingest workflow, data evaluation and prioritization, license evaluation, and creation of images.
  • Workflow layer: image retrieval, rendering evaluation, technical metadata, and object rendering.
  • Technical layer: EaaS environments, local resources, and resource allocation.

This won the Best Poster Award at iPRES 2014.

Friday, September 26, 2014

Library of Congress Recommended Format Specifications: Encouraging Preservation Without Discouraging Creation

Library of Congress Recommended Format Specifications: Encouraging Preservation Without Discouraging Creation. Theron Westervelt. Library of Congress. D-Lib Magazine September/October 2014.
The Library of Congress has relied upon the specifications in the copyright regulation known as the 'Best Edition Statement', a hierarchy of preference among certain physical characteristics of creative works, and upon its own work on digital format sustainability.
The Library has devised the Recommended Format Specifications to enable it to identify what formats will most easily lend themselves to preservation and long-term access, especially with regard to digital formats. This was also done to provide guidance to its staff in their work of acquiring content for its collection and to  share with other stakeholders that have a need and interest in preservation and access. To ensure ongoing accuracy and relevancy, the Library of Congress will be reviewing and revising the specifications on an annual basis and welcomes feedback and input from all interested parties.

The Recommended Format Specifications are not intended to answer all questions raised in preserving and providing long-term access to creative content. They are to provide guidance on identifying sets of formats which are not drawn so narrowly as to discourage creators from working within them, but will instead encourage creators to use them to produce works in formats which will make preserving them and making them accessible simpler. They also are to identify the characteristics of creative works which best enable them to last and to be accessible in the long-term.

Thursday, September 25, 2014

The Twelve Principles of Digital Preservation (and a cartridge in a repository…)

The Twelve Principles of Digital Preservation (and a cartridge in a repository…). Christina Duffy. British Library, Collection Care blog. 03 September 2013.
Following the library's strategic priorities are Twelve Principles of Digital Preservation. These principles define at a very high level how the library will approach the preservation of digital collections:

  1. We integrate curatorial assessments of our digital collection content into preservation decisions, so that technical activities support curatorial requirements for the collections
  2. We preserve metadata about our digital collections, so that we may understand and preserve the collections over time
  3. We preserve the provenance of our digital collection content, so that we understand and can demonstrate its authenticity over time
  4. We record any modifications to digital collection content (e.g. preservation action, normalisation) during the lifecycle, so that we can understand and demonstrate its integrity over time
  5. We consistently apply and document our application of metadata standards, so that future generations can understand our collections
  6. We maintain file-level integrity of our digital collections, so that we can protect against loss and damage
  7. We preserve original files in our long term repository, alongside any other required representations of the content, so that we maintain the original artefacts acquired or deposited into our care as a ground truth representation of the content for future, currently unknown, preservation and access scenarios
  8. We maintain Preservation Master copies of collection content in our long term repository, so that the format-based risks of preservation over time are minimised
  9. We maintain and implement preservation plans for our digital collections, so that preservation actions are reliable and based on a holistic understanding of the collections and their context
  10. We implement comprehensive end-to-end workflows, so that we may consistently manage and preserve our digital collections across the entire lifecycle
  11. We regularly monitor our digital collection content for emergent preservation risks, so that we may mitigate against them
  12. We integrate quality assurance checks into the lifecycle where appropriate, so that the authenticity and integrity of the content is maintained
These Principles are the first output of a workstream dedicated to defining the Library’s digital preservation standards, which will help ensure the strategic priorities are met. Other activities include:
  • collection profiling: provide top level descriptions and preservation direction for different types of digital collections
  • risk and preservation condition assessment: for content temporarily stored outside of our long term digital repository
  • file format assessment: define preferred preservation formats for different types of content
  • tool assessment: evaluate the performance of different tools on library content so that evidence-based recommendations can be made on which tools to use in which context
  • training: ensure colleagues across the library are aware of digital preservation responsibilities, requirements, and recommendations relevant whilst content is in their care
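
The principle on maintaining file-level integrity is commonly put into practice with fixity checks: a checksum is recorded when a file is ingested and recomputed later to detect change. The sketch below shows the general idea with SHA-256 from Python's standard library. It is not the British Library's workflow; the manifest structure and function names are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest):
    """Return the paths whose current checksum differs from the recorded one."""
    return [path for path, recorded in manifest.items() if sha256_of(path) != recorded]


# Demonstration with a temporary file standing in for a collection item.
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "object.bin"
    sample.write_bytes(b"original content")
    manifest = {str(sample): sha256_of(sample)}  # hypothetical manifest: path -> digest

    unchanged = verify(manifest)   # nothing has changed yet
    sample.write_bytes(b"corrupted content")
    damaged = verify(manifest)     # the altered file is now reported

print("failed before change:", len(unchanged))
print("failed after change:", len(damaged))
```

In a real repository the manifest would be stored separately from the files it describes, so that damage to one does not silently affect the other.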
 

Tuesday, August 12, 2014

Facebook uses 10,000 Blu-ray discs to store 'cold' data

Facebook uses 10,000 Blu-ray discs to store 'cold' data. Jon Brodkin. Ars Technica. Jan 29, 2014.
Facebook showed a prototype of its Blu-ray data-center storage system at the Open Compute Project summit meeting, which it plans to expand to 5 PB. The system was designed to store data that hardly ever needs to be accessed, or for so-called “cold storage.”  The Blu-ray system reduces costs by 50 percent and energy use by 80 percent compared with its current cold-storage system, which uses hard disk drives.

Why Facebook thinks Blu-ray discs are perfect for the data center. Jon Brodkin. Ars Technica. Jan 31, 2014.
While the Blu-ray storage system is just a prototype, Facebook hopes to get it in production sometime this year and share the design with the Open Compute Project community to spur adoption elsewhere.
"Economies of scale could take over really quickly, and they could start producing those discs for the Open Compute community at much lower cost than they do today because, believe it or not, this is one of those areas where really high-capacity Blu-ray discs are in relatively low demand on the consumer side and in relatively high demand on the data center side."
"Each disc is certified for 50 years of operation; you can actually get some discs that are certified for 1,000 years of reliability. Because the media is separate from the drives, if you ever have a drive issue, you simply replace the drive, and you won't have to replace the data within a disc. From a reliability and operational standpoint it's quite elegant and efficient."
"A large portion of that is going to be warm to cold data, and we need something better than tape and disk to store it." 

Friday, July 11, 2014

Preserving eBooks

Preserving eBooks. Amy Kirchhoff and Sheila Morrissey. DPC Technology Watch Report 14-01. 01 June 2014.
There is some question as to whether one can even speak of ‘selling’ and, correspondingly, ‘owning’ eBooks. The right to permanent possession, including perpetual access and preservation rights, is the exception rather than the norm in eBook licensing. Libraries and publishers are still experimenting with how to purchase or license eBooks and then how to lend them to patrons.

There is concern about the possibility of modification, retraction or withdrawal of an eBook. This happened in 2009, when Amazon deleted some eBook editions that customers had already purchased. Memory institutions need to be able to ensure the stability of eBook content in their collections and maintain control over any withdrawal or de-accessioning of that content.

Preservation of eBooks is not free. It is expensive to identify content for preservation, gather it, perform initial actions on it, and then preserve that content for the long term. Some approaches that exist are:
  • Collective model, such as HathiTrust
  • Subscription service, such as Portico
  • Government support, such as the national libraries of the United Kingdom, France, the Netherlands, etc.
Some of the general formats used for eBook Publication include:
  • HTML
  • PDF
  • MOBI
  • EPUB
  • OEB (Open eBook Publication Structure), superseded by EPUB
  • Microsoft LIT
  • DAISY
  • Text Encoding Initiative

Recommended Actions for Libraries and other institutions:
  • Specify who has responsibility for preserving eBook content 
  • Co-ordinate with other institutions, to eliminate preservation gaps and avoid duplicating efforts
  • When acquiring or licensing eBook content, ensure the acquisition includes preservation rights, and prohibits DRM technologies in the preservation copy acquired from the vendor;
  • Consider and understand what preservation rights are provided when eBooks are licensed and exactly how long-term access will be ensured by the publisher;
  • Articulate preservation policies for the handling of embedded objects, including articulation of legal rights to the content, and workflow requirements to ascertain preservation risks for that embedded content;
  • Encourage publishers to participate in preservation institutions to ensure the long-term viability of their eBook content; and
  • Invest in maturing existing characterization tools, and extending the toolset. Establish whether there is a preservation requirement somehow to maintain the hardware,



Monday, June 30, 2014

University of Nebraska–Lincoln Selects Ex Libris Rosetta to Preserve Digital Collections

University of Nebraska–Lincoln Selects Ex Libris Rosetta to Preserve Digital Collections. Ex Libris. June 24, 2013.
University of Nebraska–Lincoln (UNL) has adopted the Rosetta digital preservation solution. The university will implement Rosetta across the entire campus to preserve the school’s renowned digital humanities collections, campus websites, research data, and other digital assets. Rosetta will be under the management of the UNL library system, a member of the Association of Research Libraries and Committee on Institutional Cooperation. The Rosetta community comprises academic, research, and national libraries; museums; and archives.

Saturday, May 17, 2014

The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East.

The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East. John Gantz and David Reinsel. IDC, sponsored by EMC. December 2012.
The "digital universe" is a measure of all the digital data created, replicated, and used in a single year, and a projection of the size of that universe to the end of the decade. Findings from IDC's sixth annual study of the digital universe:
  • From 2005 to 2020, the digital universe will grow from 130 exabytes to 40,000 exabytes.
  • From now until 2020, the digital universe will about double every two years.
  • A majority of the information in the digital universe, 68% in 2012, is created and consumed by consumers (TV, social media, images and video)
  • Yet enterprises have liability or responsibility for nearly 80% of the information in the digital universe.
  • By 2020, nearly 40% of the information in the digital universe will be "touched" by cloud computing providers
  • The amount of information individuals create themselves (writing documents, taking pictures, downloading music, etc.) is far less than the amount of information being created about them
  • Much of the digital universe is transient (not saved)
  • The amount of information in the digital universe that is "tagged" accounts for only about 3% of the digital universe in 2012, and that which is analyzed is half a percent of the digital universe.
  • The share of the digital universe attributable to emerging markets is up to 36% in 2012 and will be 62% by 2020. By then, China alone will generate 21% of the bit stream entering the digital universe.
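
The first two figures in the list can be checked against each other with a little arithmetic. The sketch below takes only the 130 and 40,000 exabyte endpoints from the summary above and assumes smooth exponential growth between 2005 and 2020, which is a simplification rather than IDC's actual model.

```python
import math

# Endpoints quoted in the summary above; smooth exponential growth is an assumption.
start_eb, end_eb = 130, 40_000
years = 2020 - 2005

growth_factor = end_eb / start_eb               # overall multiple over the period
doublings = math.log2(growth_factor)            # how many times the total doubled
doubling_time = years / doublings               # average years per doubling
annual_growth = growth_factor ** (1 / years) - 1  # equivalent compound annual rate

print(f"growth factor: {growth_factor:.0f}x")
print(f"doublings: {doublings:.2f}")
print(f"doubling time: {doubling_time:.2f} years")
print(f"compound annual growth: {annual_growth:.1%}")
```

The implied average doubling time comes out a little under two years, which is consistent with the report's statement that the digital universe will about double every two years.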
     

Saturday, March 29, 2014

Recommended Format Specifications

Recommended Format Specifications. Library of Congress. March 2014.
Recommended Format Specifications are hierarchies of the physical and technical characteristics of creative formats, both analog and digital, which will best meet the needs of all concerned, maximizing the chances for survival and continued accessibility of creative content well into the future.

There are two primary purposes of the specifications. One purpose of the specifications is to provide internal guidance within the Library to help inform acquisitions of collections materials (other than materials received through the Copyright Office). A second purpose is to inform the creative and library communities on best practices for ensuring the preservation of, and long-term access to, the creative output of the nation and the world.
The specifications cover six broad categories of creative output, each with particular format specifications listed in descending order of preference:
  • Textual Works and Musical Compositions
  • Still Image Works
  • Audio Works
  • Moving Image Works
  • Software and Electronic Gaming and Learning
  • Datasets/Databases
Format Specifications: PDF

Friday, March 14, 2014

Measuring signals – challenges for the digitisation of sound and video

Measuring signals – challenges for the digitisation of sound and video.  Richard Wright. DPC Tech Watch Report. March 2012.
This report from the Digital Preservation Coalition outlines the unique challenges involved in digitising audio and audiovisual material. ‘Moving image and sound content is at great risk.’ ‘Preserving the quality of the digitized signal’ is one major digital preservation challenge for audiovisual files.
Understanding how data changes as it is played back, or moved from location to location, is important for thinking about digitisation as a long term project. When data is encoded, decoded or reformatted it changes shape, which can compromise quality. This is a technical way of describing how elements of a data object are added to, taken away or otherwise transformed when they are played back across a range of systems and software that differ from those the original data object was created with.


Wednesday, February 12, 2014

ISO Freely Available Standards

Freely Available Standards.

ISO Copyright for the freely available standards

The following standards are made freely available for standardization purposes. They are protected by copyright and therefore and unless otherwise specified, no part of these publications may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, microfilm, scanning, reproduction in whole or in part to another Internet site, without permission in writing from ISO. Requests should be addressed to the ISO Central Secretariat.

The documents you are about to download are a single-user, non-revisable Adobe Acrobat PDF file, to store on your personal computer. You may print out and retain one printed copy of the PDF file. This printed copy is fully protected by national and international copyright laws, and may not be photocopied or reproduced in any form. Under no circumstances may it be resold.

Monday, October 14, 2013

The WARC File Format (ISO 28500) - Information, Maintenance, Drafts

The WARC File Format (ISO 28500) - Information, Maintenance, Drafts.




I confirm - as convenor of the WARC format ISO working group - that there are no substantial modifications between the version on http://bibnum.bnf.fr/WARC/index.html and the ISO standard, except some little editorial changes. So it may be used as a trustworthy reference.

As far as I know, ISO's resources come largely from the sale of its standards, so it is not possible to make them freely available, except in some cases. The case of ISO/IEC standards is one of these exceptions; it is due to the fact that the standards are developed by two organizations with different publication rules (ISO and IEC).
Even as convenor, I had no free copies of the standard.
I will check again with ISO secretariat but I doubt it will be legal to make freely available the official version.

This is a reason why there is a common practice to publish draft standards
- such as we did on BnF website.

Best regards,
Clément