Monday, August 29, 2011

Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries

Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries. Yuan Li, Meghan Banach. D-Lib Magazine. May/June 2011.
If the digital scholarly record is to be preserved, libraries need to establish new best practices for preservation. For their part, creators need to be more proactive about archiving their work. Institutional repositories (IRs) may provide some help in preserving digital materials, but some question whether IRs were intended to provide long-term preservation of digital scholarship.

The most important roles that IRs play are to collect, manage, and disseminate the digital scholarship that their communities produce. Most content in an IR is deposited by author self-archiving, by a third party on behalf of the author, or by repository staff. Regardless of how content is deposited in the IR, the quality of deposited content should be examined before digital preservation actions are considered, since the quality of content can directly affect the success of digital preservation efforts. Problems may include format obsolescence, poor quality images, and insufficient metadata to manage and preserve the materials.

While most libraries surveyed report that their IRs currently provide long-term digital preservation, a closer look shows that they are still planning for long-term preservation rather than providing it in a fully operational way. An increasing number of research libraries have started to move digital preservation programs ahead by developing preservation policies.

Criteria for the Trustworthiness of Data Centres

Criteria for the Trustworthiness of Data Centres. Jens Klump. D-Lib Magazine. January/February 2011.
The rapid decay of URLs for research resources is an important reason to use persistent identifiers. The use of persistent identifiers implies that the data objects are persistent themselves. The rapid obsolescence of the technology to read the information, along with the physical decay of the media, represents a serious threat to preservation of the content. Since research projects only run for a relatively short time, it is advisable to shift the responsibility for long-term data curation from the individual researcher to a trusted data repository or archive.

We need criteria for the assessment of trustworthiness of digital archives. Some of the methods presented have been:
  •     Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
  •     Catalogue of Criteria for Trusted Digital Repositories (nestor Catalogue)
  •     DCC and DPE Digital Repository Audit Method Based on Risk Assessment (DRAMBORA)
  •     DINI-Certificate Document and Publication Services
  •     Data Seal of Approval (Sesink et al., 2008)
These provide useful feedback on developing additional criteria and auditing procedures to certify trusted digital archives.

Google Strikes Deal With French Publisher La Martiniere Groupe

Google has signed a deal with French publishing house La Martiniere Groupe for the scanning of books no longer on sale but still protected by copyright. They will jointly set up a catalog of books to be scanned that are no longer sold by the publisher. La Martiniere Groupe will decide which books Google is allowed to scan and also which of the scanned books can then be sold on Google's Ebooks platform. The deal was seen as setting a precedent for how publishing companies across the continent can make money from the digitization of books still under their copyright protection but no longer sold in stores.

Thursday, August 25, 2011

The Conference on World Affairs Archive Online: Digitization and Metadata for a Digital Audio Pilot.

The Conference on World Affairs Archive Online: Digitization and Metadata for a Digital Audio Pilot.  Michael Dulock, Holley Long.  D-Lib Magazine. March/April 2011.
The University of Colorado Archives began a project to digitize a sample set of 80 tapes from their substantial collection of audio recordings. To prepare for the project, the media specialist inspected a sampling of the audio in the collection to determine media formats and the collection's condition. The media specialist also listened to the selection of materials to rate the sound quality on a scale of excellent, very good, good, fair, or poor. Beyond the age of the media, playback equipment for analog audio formats is becoming increasingly difficult to acquire and maintain.

They followed the technical recommendations in the Collaborative Digitization Program's Digital Audio Best Practices. The team chose to digitize the materials at the recommended 44.1 kHz and 24 bit, since these specifications adequately capture the spoken word.
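
To get a feel for what those specifications mean for storage, here is a minimal sketch that estimates the size of uncompressed PCM audio. The recording length and channel count are assumptions for illustration only; the article summary does not state them, and container overhead is ignored.

```python
def pcm_bytes(seconds, sample_rate_hz=44_100, bit_depth=24, channels=1):
    """Return the size in bytes of uncompressed PCM audio (no container overhead)."""
    bytes_per_sample = bit_depth // 8
    return seconds * sample_rate_hz * bytes_per_sample * channels


# Assumed example: a one-hour recording, mono versus stereo.
one_hour_mono = pcm_bytes(3600)
one_hour_stereo = pcm_bytes(3600, channels=2)

print(f"One hour, mono:   {one_hour_mono / 1_000_000:.1f} MB")
print(f"One hour, stereo: {one_hour_stereo / 1_000_000:.1f} MB")
```

Multiplying the per-tape figure by the number of tapes gives a rough storage budget for a pilot of this kind.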

With existing metadata, volunteers added descriptive summaries and topical keywords.  The project used several schemas:  Broadcasting Metadata Dictionary Project (PBCore) for the audio, qualified Dublin Core for text documents, and Visual Resources Association Core metadata 4.0 (VRA Core) for photographs and other images.

Digital Preservation, Digital Curation, Digital Stewardship: What’s in (Some) Names?

Digital Preservation, Digital Curation, Digital Stewardship: What’s in (Some) Names? Butch Lazorchak. The Signal. August 23, 2011.
We often use “digital preservation,” “digital curation” and “digital stewardship” interchangeably without thinking about the differences or what each name means.
Preservation is defined as keeping something in its original state.
Curation looks at selection, maintenance, collection and archiving of digital assets in addition to their preservation.

Curation is useful for looking at the entire life of the materials, but it concentrates on "building and managing collections of digital assets" and so does not fully describe a broader approach to digital materials management.

Stewardship looks at holding resources in trust for future generations, which can include both preservation and curation.

Tuesday, August 23, 2011

Digital Video Preservation: Further Challenges for Preserving Digital Video and Beyond By Killian Escobedo

Digital Video Preservation: Further Challenges for Preserving Digital Video and Beyond. Killian Escobedo. The Signal. August 16, 2011.
Standards for preserving and maintaining digital video are now emerging. For archives, a sustainable video file format must be either lossless or uncompressed. There are efforts to establish the Motion JPEG2000 video codec and MXF wrapper as the preservation target format for digital video. But the codec has not been widely adopted, which makes it difficult to support the format for preservation purposes. There are also other digital video preservation challenges, such as video files that are part of larger multimedia objects, like CD-ROMs, DVDs and websites. Preserving these will require new workflows and tools, as well as new metadata schemas and strategies.

"Currently, the Archives is looking at screen capture software as a potential means of recording how a CD-ROM functions and links to other material before software and hardware obsolescence renders the content unplayable. A similar method can be employed for capturing the navigational structure of DVD menus and Flash-based websites".  Once the object cannot play on regular equipment, it will require recreating environments with obsolete hardware, software, and operating systems.

Wednesday, August 17, 2011

When Data Disappears.

When Data Disappears. Kari Kraus. The New York Times. August 6, 2011.
A writer said he didn't include digital media in his archive because he felt digital preservation is doomed to fail. “There are forms of media which are just inherently unstable.” It is more difficult, but it is not pointless.  "If we’re going to save even a fraction of the trillions of bits of data churned out every year, we can’t think of digital preservation in the same way we do paper preservation. We have to stop thinking about how to save data only after it’s no longer needed, as when an author donates her papers to an archive. Instead, we must look for ways to continuously maintain and improve it. In other words, we must stop preserving digital material and start curating it."

Digital preservation faces major challenges, and one of them is the sheer amount of data being created. The world now creates over 1.8 zettabytes of digital information a year. There will never be enough capacity to save everything if we continue to replicate the practices used to maintain paper archives. In the paper archives model, preservation begins at the end of the life cycle. Data preservation must happen earlier, ideally when the item is created. The decisions about what to save and how to save it must be made early in the life cycle; the data should then be curated, not preserved. Not all data is worth preserving, either in paper or electronically. Video games offer an interesting model that may be useful with other types of information. That model "allows us to see preservation as active and continuing: managing change to data rather than trying to prevent it, while viewing data as a living resource for the future rather than a relic of the past."

The 2011 Digital Universe Study: Extracting Value from Chaos.

The 2011 Digital Universe Study: Extracting Value from Chaos. IDC Website. June 2011.
In 2011 the amount of information created and replicated will surpass 1.8 zettabytes in about 500 quadrillion 'files'. About 75% of the information is created by individuals. The amount of information individuals create themselves is less than the information being created about them (your digital shadow).  As the digital universe expands and gets more complex, processing, storing, managing, securing, and disposing of the information in it become more complex as well.  The calls to action include:

  • Investigate new tools for creating metadata
  • Decide on the most important data projects, along with the needed data sets and tools and create an enterprise data strategy
  • Stay close to the latest strategies and practices
  • Be aggressive in developing and managing storage management tools
  • Set the strategy and build a process for sharing resources
  • Begin creating the needed skill sets, mindsets, and processes needed to best use the data
  • Collaborate with partners and suppliers
The growth of the digital universe is a challenge but also brings a way for new and exciting uses of data.

Binary Powers of 10

Binary Powers of 10.  Website.
[There are lots of sites with this information, but this has some good information.  There are others that continue the list with lumabyte, though that is not on any standard list.  And of course my favorite is the brontobyte. The current list is extending with the alphabet going from Z to A.]
1 byte = 1 byte
1 kilobyte = 1,024 bytes
1 megabyte = 1,048,576 bytes
1 gigabyte = 1,073,741,824 bytes
1 terabyte = 1,099,511,627,776 bytes
1 petabyte = 1,125,899,906,842,624 bytes
1 exabyte = 1,152,921,504,606,846,976 bytes
1 zettabyte = 1,180,591,620,717,411,303,424 bytes
1 yottabyte = 1,208,925,819,614,629,174,706,176 bytes
1 xonabyte = 1,237,940,039,285,380,274,899,124,224 bytes
1 wekabyte = 1,267,650,600,228,229,401,496,703,205,376 bytes
1 vundabyte = 1,298,074,214,633,706,907,132,624,082,305,024 bytes
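
Each value in the list is simply the next power of 1,024. A small sketch that regenerates the figures follows; the function name is hypothetical, and only the prefixes up to yotta are listed because the later names are the informal ones from the site.

```python
# Prefixes for successive powers of 1,024, starting at 1024 ** 1.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def binary_unit_bytes(power):
    """Return the number of bytes in the unit that is 1,024 raised to `power`."""
    return 1024 ** power


for power, prefix in enumerate(PREFIXES, start=1):
    print(f"1 {prefix}byte = {binary_unit_bytes(power):,} bytes")
```

Calling `binary_unit_bytes` with 9, 10, or 11 gives the values shown for the three informal names at the end of the list.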

About the Data Seal of Approval (DSA)

The Data Seal of Approval ensures that research data can still be processed in the future by establishing quality guidelines. 
There are five criteria that together determine whether or not the digital research data may be qualified as sustainably archived:
  1. The research data can be found on the Internet.
  2. The research data are accessible, while taking into account relevant legislation with regard to personal information and intellectual property of the data.
  3. The research data are available in a usable format.
  4. The research data are reliable.
  5. The research data can be referred to.
In addition, there are three groups that must use the data responsibly:
  1. The data producer is responsible for the quality of the digital research data.
  2. The data repository is responsible for the quality of storage and availability of the data: data management.
  3. The data consumer is responsible for the quality of use of the digital research data.
The seal shows that the data archive or repository is in compliance with the sixteen DSA guidelines:
  1. The data producer deposits the research data in a data repository with sufficient information for others to assess the scientific and scholarly quality of the research data and compliance with disciplinary and ethical norms.
  2. The data producer provides the research data together with the metadata requested by the data repository.
  3. The data repository has an explicit mission in the area of digital archiving and promulgates it.
  4. The data repository uses due diligence to ensure compliance with legal regulations and contracts.
  5. The data repository applies documented processes and procedures for managing data storage.
  6. The data repository has a plan for long-term preservation of its digital assets.
  7. Archiving takes place according to explicit workflows across the data life cycle.
  8. The data repository assumes responsibility from the data producers for access to and availability of the digital objects.
  9. The data repository enables the users to utilize the research data and refer to them.
  10. The data repository ensures the integrity of the digital objects and the metadata.
  11. The data repository ensures the authenticity of the digital objects and the metadata.
  12. The technical infrastructure explicitly supports the tasks and functions described in internationally accepted archival standards like OAIS.
  13. The data consumer must comply with access regulations set by the data repository.
  14. The data consumer conforms to and agrees with any codes of conduct that are generally accepted in higher education and research for the exchange and proper use of knowledge and information.
  15. The data consumer respects the applicable licences of the data repository regarding the use of the research data.
  16. The data producer provides the research data in formats recommended by the data repository.
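
Guideline 10 (integrity of the digital objects) is commonly supported by fixity checking: recording a checksum at deposit and recomputing it later. The sketch below is one possible illustration, not something the DSA prescribes; the choice of SHA-256 and the function names are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_digest):
    """Return True if the file still matches the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest
```

A repository would store the digest alongside the object's metadata at ingest and run `verify_fixity` on a schedule, treating any mismatch as a signal to restore from another copy.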

Record Industry Braces for Artists’ Battles Over Song Rights.

Record Industry Braces for Artists’ Battles Over Song Rights.  Larry Rohter. New York Times.  August 15, 2011.
When copyright law was revised in the mid-1970s, musicians, like creators of other works of art, were granted “termination rights,” which allow them to regain control of their work after 35 years.  The record companies believe the termination right doesn’t apply to most sound recordings. The copyright law went into effect on Jan. 1, 1978, so the earliest any recording can be reclaimed is Jan. 1, 2013.  A resolution is probably not possible without a definitive court ruling.

Tuesday, August 16, 2011

Will Kindles kill libraries?

Will Kindles kill libraries? Eugenia Williamson. The Phoenix. July 27, 2011.
[There are many sources discussing this major change for publishing and libraries.  This post really looks at the preservation aspect.]

"Preserving materials for future generations is a big part of why libraries exist in the first place. According to the American Library Association, preservation upholds the First Amendment by contributing to the free flow of information."

But a library can't preserve a book it doesn't own, and many digital works are now being licensed rather than purchased. One company, OverDrive, is a middleman negotiating between the libraries and publishers.  It is unknown if there are long term rights for these materials, and if so, what the rights are, and how this fits into a preservation model.

As reported in Library Journal, the Kansas state library system began using OverDrive's services in 2006, and last year the company proposed a new contract that would raise administrative fees 700 percent by 2015.

Kansas has announced its intent to petition for the right to terminate its contract. Kansas believes that it owns the e-books it licensed and has the right to transfer them to a new service provider. If the library cannot do this, it will have spent $568,000 for books it can no longer access, which is more than if it had purchased print copies that it would own.

Friday, August 12, 2011

New Statistics Model for Book Industry Shows Trade Ebook Sales Grew Over 1,000 Percent

New Statistics Model for Book Industry Shows Trade Ebook Sales Grew Over 1,000 Percent. Library Journal. Michael Kelley. August 9, 2011.
A new annual survey of the total U.S. book publishing industry shows growing revenue and exponential growth in eBook sales.

The industry sold 2.57 billion books in all formats in 2010, a 4.1 percent increase over 2008.  Publishers' net sales revenue grew to $27.94 billion in 2010, a 5.6 percent increase over 2008. Net revenue from trade books grew 5.8 percent since 2008, to $13.94 billion.

Within the trade segment, eBooks, again excluding the robust growth that has occurred in 2011, grew from 0.6 percent of the total trade market share in 2008 to 6.4 percent in 2010, which translates to a 1,274.1 percent increase in publisher net sales revenue year-over-year, with total net revenue for 2010 at $878 million. In the same three years, 114 million ebooks were sold, a 1,039.6 percent increase. In adult fiction, ebooks represent 13.6 percent of the net revenue market share.

Online sales became an increasingly important distribution channel. Net sales revenue for content distributed online was $2.82 billion in 2010, a three-year overall growth of 55.2 percent. Net unit sales by publishers to online channels grew 68.6 percent, to 276 million in 2010.

For 2010, overall bricks-and-mortar trade retail remained the largest distribution channel in the United States (40.8 percent). In contrast to the eBook numbers, total net sales revenue of trade hardcovers in 2010 was $5.26 billion, an increase of only 0.9 percent over the three years, and its share of the market declined from 39.6 percent in 2008 to 37.7 percent in 2010. Softcover revenue was up 1.2 percent to $5.27 billion, with a similar decline in market share, and mass-market paperback net sales revenue was down 13.8 percent to $1.28 billion.
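
The percentages above can be cross-checked with a little arithmetic. The sketch below uses only figures quoted in this summary; the helper names are hypothetical, and the 2008 eBook baseline is derived from the stated growth rate, not reported by the survey.

```python
def percent_change(old, new):
    """Percentage increase (or decrease) from old to new."""
    return (new - old) / old * 100


def implied_baseline(new, percent_increase):
    """Starting value implied by an ending value and a percentage increase."""
    return new / (1 + percent_increase / 100)


def market_share(part, whole):
    """Share of the whole, as a percentage."""
    return part / whole * 100


# eBook net revenue: $878 million in 2010 after a 1,274.1 percent increase.
ebook_2008_millions = implied_baseline(878, 1274.1)

# Trade hardcover revenue ($5.26 billion) as a share of trade revenue ($13.94 billion).
hardcover_share_2010 = market_share(5.26, 13.94)

print(f"Implied 2008 eBook revenue: ${ebook_2008_millions:.1f} million")
print(f"2010 hardcover share of trade revenue: {hardcover_share_2010:.1f}%")
```

The hardcover share computed this way agrees with the 37.7 percent quoted above, which suggests the shares are measured against total trade net revenue.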

Washington State Archives - Digital Archives

The archives is dedicated specifically to the preservation of electronic records from both State and Local agencies that have permanent legal, fiscal or historical value. This is a Microsoft solution. The web interface and database storehouse were custom designed specifically for the Digital Archives. The documents on their website about the project (in PDF or PowerPoint formats) are worth reviewing.

Thursday, August 11, 2011

Building a Sustainable Institutional Repository

Building a Sustainable Institutional Repository. Chenying Li, et al. D-Lib Magazine. July/August 2011.
Institutional Repositories are an increasingly important resource and service offered by libraries. Increasing the use of the content is a key to building a sustainable IR. The article describes two organizational approaches:

1. Structured Content Organization
Organizing content according to its role in the University, which provides a more orderly process of content organization and more efficient metadata.

2. Modular Content Publishing
Creating modules as independent publishing units that work together as a complete and comprehensive publishing system. This uses themed publishing and metadata aggregation.

It is becoming more important for libraries to provide users with the contents and services that are found in institutional repositories.  

Free Tools for Your Preservation Toolbelt.

Free Tools for Your Preservation Toolbelt. Randy Stern, Spencer McEwen. Harvard. June 2011. A presentation delivered at the Open Repositories 2011 conference.
The Digital Repository Service models “objects” rather than files.  Examples:
- Delivery, archival master, and production master images comprise one object
- All images and OCR text for a book comprise one object

Object Preservation Metadata: Digital preservation requires accurate and sufficient technical metadata to support preservation planning and activities. Descriptive metadata is also valuable for identification and management by curators.  Standards-based schema maximize tool support and ability to exchange data with other repositories.

Here are some tools they use, which are open source or will be soon.

Tool 1 - FITS (File Information Tool Set)
Identifies, validates, and extracts technical metadata from files

Tool 2 - OTS-Schemas (Object Tool Set Schemas)
Java library for reading and writing documents in common XML schemas

Tool 3 - OTS (Object Tool Set)
Java library for creating, reading, updating, and writing METS Object Descriptors

Tool 4 - BatchBuilder
Builds OTS METS objects (and SIP) from directory hierarchies of content files

Wednesday, August 10, 2011

Five Tips for Designing Preservable Websites.

Five Tips for Designing Preservable Websites. Robin C. Davis.  The Bigger Picture.  Smithsonian Institution Archives. August 2, 2011.
Smithsonian Institution Archives is preserving the Institution’s history, including its large web presence. The Archives crawls each website using Heritrix, an open-source tool created by the Internet Archive, to capture content in an archival format. The purpose is to preserve the appearance, behavior, and content of digital objects. The Archives tailors crawl configurations to each specific website to capture as much of it as possible while adhering to the collections policy. Sometimes the structure of the site itself makes a perfect crawl difficult or impossible.

The post offers five suggestions for web developers that can help ensure that their websites will be easier to crawl, to make accessible, and to preserve.
  1. Follow accessibility standards
  2. Avoid proprietary formats for important content or provide alternate versions
  3. Maintain stable URLs and redirect when necessary.  Avoid linkrot, meaning links which point to resources that are no longer available. Carefully plan and implement a URL design scheme with a policy of persistence. They have found websites with as many as 40% broken links.
  4. Design navigation carefully and include a sitemap. The crawler is usually set to capture only six levels deep. To help others discover your entire website, provide a sitemap and “view all” link for documents.
  5. Allow browsing of collections, not just searching, such as by arranging images by genre.
Designing a web site with preservation in mind can help safeguard it for the future. This is part of our cultural legacy.
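
The third tip (stable URLs with redirects) lends itself to a quick audit. The following is a minimal offline sketch, assuming a site keeps a set of live paths and a redirect map; the function names and data structures are hypothetical and it makes no network requests.

```python
def resolve(url, live_urls, redirects, max_hops=5):
    """Follow redirects until a live URL is reached; return None if the link is broken."""
    for _ in range(max_hops + 1):
        if url in live_urls:
            return url
        if url not in redirects:
            return None
        url = redirects[url]
    # Too many hops, which usually means a redirect loop.
    return None


def broken_fraction(links, live_urls, redirects):
    """Return the fraction of links that do not resolve to a live URL."""
    if not links:
        return 0.0
    broken = [link for link in links if resolve(link, live_urls, redirects) is None]
    return len(broken) / len(links)


# Hypothetical site data for illustration.
live = {"/about", "/collections/maps"}
moved = {"/maps": "/collections/maps"}
links_found = ["/about", "/maps", "/old-exhibit", "/collections/maps"]

print(f"Broken links: {broken_fraction(links_found, live, moved):.0%}")
```

Keeping the redirect map current whenever content moves is what keeps old inbound links, and archived crawls that depend on them, working.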

Hawaiian Heritage sites set to launch on CyArk website.

Hawaiian Heritage sites set to launch on CyArk website. Press release. Hawaii 24/7. August 10, 2011.   
CyArk, with the help of its partners, conducted the field work for the Digital Preservation of three culturally significant Hawaiian sites, which include site animations, photography, panoramas, perspectives, and drawings. This information will showcase oft-overlooked heritage sites and highlight the need for cultural resource preservation.

CyArk is a non profit organization with the mission of digitally preserving cultural heritage sites through collecting, archiving and providing open access to data created by laser scanning, digital modeling, and other state-of-the-art technologies.

Start-up to release 'stone-like' optical disc that lasts forever

Start-up to release 'stone-like' optical disc that lasts forever.  Lucas Mearian. Computerworld. August 8, 2011.

Millenniata has partnered with Hitachi-LG Data Storage to launch an M-Disc read-write player in early October. Any DVD player maker will be able to produce M-Disc machines by simply upgrading their product's firmware. Millenniata said it has also proven it can produce Blu-ray format discs with its technology, a product it plans to release in future iterations. Currently the discs write at only 4x speed, but the company is working to increase it.

Millenniata partnered with Hitachi-LG Data Storage to provide M-Ready technology in most of its DVD and Blu-ray drives. The technology is addressing the needs of the long-term data archive market.  This disc does not need special temperature or humidity controls.

Friday, August 05, 2011

Economics and Digital Preservation: Final Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access

Economics and Digital Preservation: Final Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access.  Fran Berman and Brian Lavoie. Library of Congress. July 21, 2011. PDF. [Old link has disappeared. To read, use the 2015 link]
Digital Preservation is both a technical and economic problem. There must be solutions to both for there to be success.  Even the most elegant technical solution is no solution at all if it is not economically sustainable.  Some of the challenges they list:
  • “One‐time” funding models are inadequate to address persistent long‐term access and preservation needs
  • Poor alignment between stakeholders in the digital preservation and access world and their roles, responsibilities and support models
  • Lack of institutional, enterprise, and/or community incentives to support the collaboration needed to enforce sustainable economic models
  • Complacency that current practices are “good enough” and / or the problem is not urgent.
  • Fear that digital access and preservation is too big to take on
Stakeholders are:
  • Those who benefit from use of a preserved asset
  • Those who select what to preserve
  • Those who own or have rights to an asset
  • Those who preserve the asset
  • Those who pay
There is no magic bullet, and there is no "free" solution. The recommended actions are:


  1. Create sustainability‐friendly policies and mandates
  2. Invest in preservation infrastructure
  3. Create preservation‐aware communities
    a. Create public‐private partnerships to align distinct stakeholder groups
    b. Convene expert communities to address the selection and preservation needs of valuable materials for which there is no stewardship
  4. Raise awareness
    a. Provide leadership in training and education for 21st century digital preservation
    b. Promote digital preservation skills and awareness
  5. Take individual responsibility
    a. Provide nonexclusive rights to preserve and distribute created content
    b. Partner with preservation experts throughout the data lifecycle to ensure your data will be maintained in a form that will be useful over the long term
    c. Pro‐actively participate in professional organizations to create best practices and selection priorities.

Library of Congress Digital Preservation Newsletter.

Library of Congress Digital Preservation Newsletter. Library of Congress.  August 2011. [PDF]
The newsletter includes information on:
  • “Make it Work: Improvisations on the Stewardship of Digital Information”
  • All About Archiving the Web 
  • Possible uniform law on the authentication of online legal materials
  • Exploring Cultural Heritage Collections With Recollection
    • Recollection is a free and open source platform that lets archivists, librarians, scholars and curators create easy to navigate web interfaces (like maps, timelines, facets, tag clouds) to their digital, cultural heritage collections.
  • Finding digital preservation training.  The training calendar.
  • Digital Time Capsules and our "Digital Afterlife"
    • Creating and organizing personal digital content for future access.
  • The Signal: Library of Congress blog to discuss digital stewardship in a way that is informative and appealing.
    • Tending the machines
  • What skills does a digital archivist or librarian need?  Skills students need to compete in the archives and libraries job market.  Expertise with programming, formats and standards is, of course, very important.  But other talents have a greater bearing on success in today’s workplace, such as:
    • an ability to understand and adapt to new ways of using technology
    • eagerness to help refine how things are done
    • a basic understanding of how the different system parts contribute to doing the job at hand
    • ability to bridge two distinct social camps: the highly technical and the highly not-technical
    • knowing how to choose among tools and software options to meet the needs of users
    • communication skills, including presentation, writing, speaking and persuading
    • ability to use social media and to integrate photographs, graphics and video with text to get the right message out to as many people as possible

Thursday, August 04, 2011

GE pushes ahead with 500GB holographic disc storage

GE pushes ahead with 500GB holographic disc storage. Lucas Mearian. Computerworld. July 28, 2011.
GE hopes to license its technology for a 500GB holographic disc.  This was first announced in 2009. They hope to create a 1 TB disc.  InPhase Technologies is also working on a 300GB holographic optical disk.
[These are still a long way from implementation, if they ever do succeed.]

Digital Preservation Courses & Workshops For Organizations and Institutions

Digital Preservation Courses & Workshops For Organizations and Institutions. Library of Congress. August 2011.  The Library of Congress, as part of its outreach and education efforts, provides this calendar to help people find training related to digital preservation. The calendar can be sorted by date, course, format, location, and cost. To find out more about an offering, click on its title. The list includes a number of offerings that are free and online, including:
  • Preserving Your Personal Digital Memories
  • Protecting Future Access Now: Models for Preserving Digitized Books and Other Content at Cultural Heritage Organizations
  • An Introduction to Digital Preservation

Digital Preservation in a Box: NDSA Outreach

Digital Preservation in a Box: NDSA Outreach. Butch Lazorchak. The Signal. August 3rd, 2011.
A group has been working on what it calls “Digital Preservation in a Box.” This is an introduction to the concepts of preserving digital information through a suite of resources that are available to anyone planning an outreach event or presentation, or preparing to teach introductory digital preservation concepts. A few resources are listed so far, and the collection will grow over time.

Poll: How do you back up your data?

Poll: How do you back up your data? Geoff Gasior. The Tech Report. February 16, 2011.
The more digital content we accumulate, the more we have to lose in the event of storage failure.  There are options. One poll showed how some people personally back up their data:
  • USB thumb drive: 4%
  • External hard drive: 35%
  • Optical disc: 4%
  • Memory card: 0.3% 
  • Tape: 1%
  • Network-attached storage device: 15%
  • Online backup service: 4%
  • Multiple methods: 19%
  • Gonna lose everything if my hard drive dies: 20%