Friday, August 28, 2009

Digital Preservation Matters - 27 August 2009

Encoded Archival Context – Corporate bodies, Persons, and Families. Society of American Archivists, Berlin State Library. August 21, 2009.

Archivists have expressed the need for a standard structure to record and exchange information about the creators of archival materials. Draft information and a schema (EAC-CPF) are available at this site and feedback is welcome.

10 Ways to Archive Your Tweets. Sarah Perez. ReadWriteWeb. August 11, 2009.

Tweets effectively have an expiration date: they become unsearchable after about a week and a half, and that window may shrink as more content is added. The article explores several options for saving them.

Startup crafts DVD-Rs for the 31st century. Rik Myslewski. The Register. 23 July 2009.

The Millenniata company has developed a new DVD-R technology that it claims will be readable for 1,000 years. The Millennial Disc Series is designed to eliminate the need for governments, financial institutions, libraries, and others to regularly refresh and rotate their digital-data collections. The data is etched into a "carbon layer with the hardness of a diamond". Recording requires a specialized writer and special discs [but the discs are readable on any DVD player]. The discs are stable from minus 100° to plus 200° centigrade, and they are dunked in liquid nitrogen as part of the testing. These discs are one element of a data preservation strategy.

“Why you never should leave it to the University”. JISC-PoWR website. Blog Post by Brian Kelly on August 19th, 2009.

Discussion of an article about a person who lost his academic website after the School of Business redesigned its web site. With the changes, “ten years worth of virtually daily updates were gone. That included most of the manuscripts for my published work. The same thing happened to lecture notes, powerpoint slides, course documentations, useful links, etc. It had all disappeared from the Web!” The issues need to be discussed, and in the current climate that discussion must include the costs: “disk storage may be cheap but management of content is not”. The JISC archive has a number of other interesting posts about the preservation of blogs, websites, and wikis, and about preservation policies.

Digital Preservation in the Wild. Tim Donohue. Slide show. July 21, 2009.

Thirty slides about digital preservation. Some notes from it:

  • It is not about the technology.
  • You don’t have to preserve everything to the fullest extent if you say you aren’t.
  • Say what you do, do what you say.
  • We acknowledge our gaps.

Labeling Library Archives Is a Game at Dartmouth College. Marc Beja. The Chronicle of Higher Education. August 25, 2009.

A digital-humanities professor is creating an Internet-based game in which users create descriptive tags for library images to improve searching. Adding keywords can be costly, so this could be a way for the library to generate metadata. Users could gain points as they compete to label images with keywords that match those of other players. The project is funded by NEH and should be available next summer. [Some image sites already have similar functionality.]
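The matching idea behind such a game can be sketched as a simple scoring rule. This is a hypothetical illustration, not the Dartmouth game's actual design: it assumes two players tag the same image independently, and both earn one point for every tag (compared case-insensitively) that the other player also entered. The agreed tags are the ones a library might keep as metadata.

```python
def normalize(tags):
    """Lower-case and trim tags so 'Bridge ' and 'bridge' count as the same keyword."""
    return {t.strip().lower() for t in tags if t.strip()}


def score_round(tags_a, tags_b, points_per_match=1):
    """Hypothetical scoring rule: players earn points for every tag they agree on.

    Returns the agreed tags (candidate metadata for the image) and the points earned.
    """
    agreed = normalize(tags_a) & normalize(tags_b)
    return sorted(agreed), len(agreed) * points_per_match


if __name__ == "__main__":
    tags, points = score_round(["Bridge", "river ", "snow"], ["bridge", "River", "boat"])
    print(tags, points)  # ['bridge', 'river'] 2
```

Requiring agreement between independent players is what makes the tags usable: a keyword only one person thought of earns nothing and is not kept.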

‘Digital-Only’ Confusion in Scholarly Publishing: American Chemical Society. Barbara Quint. Information Today. July 23, 2009.

What happens if (or when) scholarly publishing reaches the tipping point of going "digital-only"? Publishers have been creating digital versions for some time, but some are now moving to digital-only versions. The American Chemical Society journals are all available electronically, but none are going digital-only. “Studies have shown that more and more users now prefer the digital mode.” The society will publish two titles next year that will not appear in print. It will continue to monitor the situation, but for “today, and throughout 2010, online access and print subscriptions both remain options.”

Thursday, August 27, 2009

Reinventing academic publishing online. Part I: Rigor, relevance and practice.

Brian Whitworth, Rob Friedman. First Monday. 3 August 2009.

“The current gate–keeping model of academic publishing is performing poorly as knowledge expands and interacts, and that academic publishing must reinvent itself to be inclusive and democratic rather than exclusive and plutocratic.” Many of the applications that have become popular today are not media-rich but simple and text-based, such as blogs and wikis. Timeliness is important; out-of-date materials may not be useful. Innovators are the agents of change. “A system that rejects its own agents of change rejects its own progress.”

Why Academic Libraries Matter.

Barbara Fister. Peer to Peer Review; Library Journal. August 13, 2009.
Values the library can provide:
  1. A total experience that works well
  2. Provide meaning, depending on the user's goal
  3. Create relationships with the users and the institution
Find out how the library can fit in the life of the user. The library is important as a broker or purchaser of information. Library principles go beyond collections and local needs; it is about access, about the importance of knowledge and about what we do with information.

Monday, August 24, 2009

Introduction to ANDS

Introduction to ANDS. Ron Sandland. July 2009.
Share - the newsletter of the Australian National Data Service.
We need to find new ways to capture and share data. To do this we need to create accessible repositories and make it possible to access their holdings. Important issues are access control, storage solutions, training, and guidelines on best practices. They are launching online services.
The key challenge is to build an environment where researchers can store, share, and find data, as they do with publications.

Thursday, August 20, 2009

Digital Preservation Matters - 20 August 2009

The Next-Generation Architecture for Format-Aware Characterization: About JHOVE2. Website. August 18, 2009.

Because of limitations in the original JHOVE program, NDIIPP, the California Digital Library, Portico, and Stanford University are collaborating on a new project. An alpha prototype is available for download. The project looks at identification, validation, feature extraction, and policy-based assessment for simple digital files and for potentially complex digital objects that may span multiple files.

The Digital Continuity Action Plan. Website. Archives New Zealand. 10 August 2009.

A unique, inclusive, and unified initiative in New Zealand to prevent important public records from being lost and to ensure that information will be available tomorrow. A brochure gives an overview of the plan. It notes that “Sixty-seven percent of New Zealand public sector agencies hold some information that they can no longer access.” The full plan is set out in a 48p. PDF. The aim is to keep the information available, authentic, and trusted. If no action is taken, digital information will be lost; a proactive approach is needed to maintain it for the future. “Failure to implement digital continuity strategies will result in irretrievable loss of information.” The six goals (explained in detail in the longer document) are:

  1. Understanding: Communicate effectively and have a common understanding of the problem.
  2. Digital information is well-managed from the point of creation onwards.
  3. Infrastructure exists to support the interoperability of systems and efficient digital continuity.
  4. High-value information is identified, so critical information is not lost.
  5. Digital information is accessible now and in the future, and protected from unauthorized use.
  6. Information management is characterized by good governance, leadership and accountability.

Sony to back open e-book format. BBC News. 14 August 2009.

Sony has announced it will use the open ePub format instead of its proprietary format. This will give Sony the option of making its e-book store compatible with other readers.

Long Term Digital Preservation of Web Sites. Mikael Tylmad. Thesis. Royal Institute of Technology for the Swedish National Archive. May 31, 2009. [38p. PDF]

Websites have become a standard way for organizations to present information to the public, and keeping this information long term raises a number of archival concerns. Few web pages are written in standard HTML anymore; they use a number of different technologies, such as Flash, and many formats. “The fewer file types the better and if they are human readable it is even better.” This requires archivists to keep the software as well as the entire website. Besides the textual and graphical parts of a web page, the relationships between the parts and how they are presented are important (content and context). Archived sites lose interactivity and become static. Links in Flash and similar technologies can be hidden from crawlers, and important parts will be lost. Heritrix, used by the Internet Archive, is a powerful solution to web archiving. Emulation through virtualization is another. A third solution is SWAT (Snappy Web Archiving Tool). The tool, written in Ruby, is available at: It does the following:

  1. Harvests all files from the website and analyzes them with DROID for future compatibility.
  2. Creates screenshots of all web pages as TIFFs to show the page design.
  3. Creates XML metadata about files, links, etc. (METS standard).
  4. Puts the web archive and its documentation in a tar package with an ADDML description.
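The last step, packing the harvested site and its documentation into one tar file, can be sketched with Python's standard `tarfile` module. SWAT itself is written in Ruby, so this is not its code; it assumes the harvested files, TIFF screenshots, and XML metadata already sit under one directory, and it leaves out the ADDML description.

```python
import tarfile
from pathlib import Path


def package_web_archive(site_dir, tar_path):
    """Pack a harvested site directory into a single uncompressed tar file.

    Assumption: everything to preserve (pages, screenshots, metadata) is under site_dir.
    Returns the member names stored in the archive.
    """
    site_dir = Path(site_dir)
    with tarfile.open(tar_path, "w") as tar:
        # arcname stores paths relative to the site folder instead of absolute paths;
        # tar.add recurses into subdirectories by default.
        tar.add(site_dir, arcname=site_dir.name)
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

An uncompressed tar is a deliberate choice in this sketch: it is simple, widely supported, and a damaged byte affects only one member rather than the whole compressed stream.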

Amazon Erases Orwell Books From Kindle. Brad Stone. The New York Times. July 17, 2009.

Amazon remotely deleted some digital editions of Orwell's books from the Kindle devices of readers who had bought them. It also appears to have deleted other purchased e-books from Kindles recently.

Chrysler Destroys Its Historical Archives; GM to Follow? Bob Elton. The Truth About Cars. July 26, 2009.

Archives are the foundation of historical research. Without access to primary material (documents, photographs, financial statements, engineering and test reports, etc.), historians lack the sources needed to understand the past. Some automakers have worked to preserve and protect their historical documents. However, Chrysler and GM have recently closed their library, and the librarian was laid off. All materials were “offered to anyone who could carry them away.” Many of the GM divisions no longer know where their historical documents are, how they are organized, or how researchers can gain access to them.

Digital Archives That Disappear. Inside Higher Ed. April 22, 2009

As digital archives have become more important and more popular, opinions differ about how best to guarantee that they will remain available long term. Some think the creators of the archives should keep control, while others believe larger organizations with more resources would be better. The article looks at the example of "Paper of Record," a digital archive of early newspapers with a strong collection of Mexican newspapers. The archive was purchased secretly by Google in 2006; shortly thereafter, the archive disappeared from view. Historians and others complained to Google that they could no longer do their work. Other sources suggest that the articles are now partially available in the Google news reader.

Thursday, August 13, 2009

Digital Preservation Matters - 13 August 2009

File Information Tool Set (FITS). August 6, 2009.

With the increase of digital projects that introduce new formats, it is increasingly important to have tools that handle file format identification, validation, and metadata extraction. FITS, developed by Harvard, acts as a wrapper for several existing tools, including JHOVE, Exiftool, the National Library of New Zealand Metadata Extractor, DROID, and Ffident, and adds two original tools: FileInfo and XmlMetadata. It can identify a file with a single result or, in the case of a conflict, handle it in several ways. It is written in Java and can be run from the command line or through an interface. It is available for download and has a user guide.
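The wrapper idea, asking several tools and reporting either one agreed identification or a conflict, can be sketched as follows. This is a simplified, hypothetical illustration, not FITS's code or output format: it assumes each tool has already returned a format name (or `None`), and it resolves disagreements by strict majority, which is only one of several possible conflict strategies.

```python
from collections import Counter


def consolidate(tool_results):
    """Combine per-tool format identifications into one result or a reported conflict.

    tool_results maps a tool name to the format it reported (None if it gave no answer).
    """
    answers = {tool: fmt for tool, fmt in tool_results.items() if fmt is not None}
    if not answers:
        return {"status": "unidentified", "format": None, "tools": {}}

    counts = Counter(answers.values())
    if len(counts) == 1:
        # Every tool that answered agrees: a single identification.
        return {"status": "single", "format": next(iter(counts)), "tools": answers}

    (top, top_n), (_, second_n) = counts.most_common(2)
    if top_n > second_n:
        # Hypothetical policy: a strict majority wins, but all answers are kept for review.
        return {"status": "majority", "format": top, "tools": answers}
    # A tie is reported as a conflict instead of guessing.
    return {"status": "conflict", "format": None, "tools": answers}
```

For example, if two tools report "PDF 1.4" and one reports "PDF 1.5", this sketch returns "PDF 1.4" with status `majority` while still recording the dissenting answer.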

Research Data Preservation and Access: The Views of Researchers. Neil Beagrie, et al. Ariadne. July 2009.

Data is becoming more central to interdisciplinary projects and has grown in size and complexity. This study tries to assess the feasibility and costs of developing and maintaining a shared digital research data service. It shows, with text and graphs, the disciplines where research data issues were of greatest concern, the storage features that are needed most, the retention period for data once the projects have ended, and how the data is shared. University managers have serious concerns about the cost, scalability and sustainability of purely local solutions.

Library of Congress Digital Preservation Newsletter. August 2009.

LC has developed new tools (including BagIt) to transfer large quantities of digital content. BagIt and related transfer tools prepare data for transfer by packaging the collection in a directory with a manifest file that lists the contents. Specifications and other tools are on the tools and services page. More on this: 21st Century Shipping. D-Lib Magazine. Michael Ashenfelder. July/August 2009.
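The packaging idea (a payload directory plus a manifest listing every file with its checksum) can be sketched with the Python standard library. This is a minimal illustration, assuming MD5 checksums and a version 0.97 declaration file; it omits optional pieces such as bag-info.txt and tag manifests, and it is not LC's tool. For real transfers, follow the published specification and LC's own tools.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write a BagIt-style declaration and MD5 manifest."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # raises an error if bag_dir/data already exists

    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    lines = []
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            # Paths are relative to the bag root with forward slashes, e.g. data/img/001.tif
            lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-md5.txt").write_text("".join(lines), encoding="utf-8")
    return lines


def verify_bag(bag_dir):
    """On the receiving end: recompute checksums and list entries that are missing or changed."""
    bag_dir = Path(bag_dir)
    problems = []
    for line in (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8").splitlines():
        digest, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or hashlib.md5(target.read_bytes()).hexdigest() != digest:
            problems.append(rel_path)
    return problems
```

The manifest is what makes large transfers checkable: the receiver runs the same checksums and compares, so a silently corrupted or missing file shows up by name.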

The California Digital Library has opened its Web Archiving Service collections. The service was created to support the Web-at-Risk project, and is funded by the NDIIPP and the University of California.

A workshop on photo metadata aimed to help digital photographers use metadata when creating and distributing their work. The program demonstrated applications that embed metadata in photographs; presenters stated that each digital photo can and should contain information about itself, its creator, and its licensing conditions. Industry professionals described how metadata had increased their business.

Online textbooks are gaining popularity, changing how students study. Dani Martinson. Missourian. August 6, 2009.

Online textbooks can provide additional information and resources for students, including direct links to audio and video. Digital textbooks are usually 50% cheaper than regular textbooks, though there is no buyback, and the books are often available only for a semester. Information can be updated more easily and more frequently. A study found that professors were more accepting of digital textbooks than students were. Demand is expected to increase when the content is designed specifically for digital use, rather than being just a PDF version of the printed textbook.

Wednesday, August 12, 2009

Elsevier Announces the “Article of the Future”

Elsevier Announces the “Article of the Future”. July 21, 2009.

Elsevier announced the ‘Article of the Future’ project, an ongoing collaboration with the scientific community to redefine how a scientific article is presented online. The project gives readers individualized entry points and routes through content, while exploiting the latest advances in visualization techniques. The prototype will be launched this week. Its key feature is a hierarchical presentation of text and figures; a second key feature is that each article includes highlights and a graphical abstract.

In a Digital Future, Textbooks Are History

In a Digital Future, Textbooks Are History. Tamar Lewin. New York Times. August 8, 2009.

Textbooks have not gone the way of the scroll yet, but many educators say that it will not be long before they are replaced by digital versions — or supplanted altogether by lessons assembled from the wealth of free courseware, educational games, videos and projects on the Web. “In five years, I think the majority of students will be using digital textbooks. They can be better than traditional textbooks.”

“We believe that the world is going digital, but the jury’s still out on how this will evolve. We’re agnostic, so we’ll provide digital, we’ll provide print, and we’ll see what our customers want.”

CK-12 Foundation develops free “flexbooks” that can be customized to meet state standards, and added to by teachers. Its physics flexbook, a Web-based, open-content compilation, was introduced in Virginia in March.

“You can use them online, you can download them onto a disk, you can print them, you can customize them, you can embed video. When people get over the mind-set issue, they’ll see that there’s no reason to pay $100 a pop for a textbook, when you can have the content you want free.”

Create Data Destruction Policies

How and Why to Create Data Destruction Policies. Mark Grossman and Tate Stickles. Computerworld. 23 June 2009.
This column looks at creating an effective data destruction policy. Having a consistent data destruction policy followed by everyone at all times is vital. Consistency is key. Your data destruction policy needs to address how to classify and handle each type of data residing on your media. Educate your people and verify they are complying with your policy.

Document the entire data destruction policy so you will know what media is sanitized and destroyed. Your documentation should allow you to quickly answer those who, what, where, when, why, and how questions.
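One way to make those questions quick to answer is to keep each destruction event as a structured record. This is a hypothetical sketch; the column does not prescribe a format, and the field names and sample values are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DestructionRecord:
    """One documented destruction event (hypothetical field names, one per question)."""
    who: str    # person or vendor who performed the work
    what: str   # media identifier, e.g. a drive serial number
    where: str  # location where sanitization or destruction took place
    when: date
    why: str    # retention rule or other reason that triggered destruction
    how: str    # method used, e.g. overwrite, degauss, shred


def find_by_media(log, media_id):
    """Answer 'what happened to this media?' by returning every record for it."""
    return [rec for rec in log if rec.what == media_id]
```

Making the record immutable (`frozen=True`) fits its purpose as evidence: once an event is logged, it should not be edited after the fact.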

An effective data destruction policy also needs a process for regularly scheduled testing of both the process and the media, to ensure that the policy is working.

Tuesday, August 11, 2009

Online textbooks gaining popularity, changing how students study

Online textbooks are gaining popularity, changing how students study. Dani Martinson. Missourian. August 6, 2009.
Online textbooks can provide additional information and resources for students, including direct links to audio and video. Digital textbooks are usually 50% cheaper than regular textbooks, though there is no buyback, and the books are often available only for a semester. Information can be updated more easily and more frequently. A study found that professors were more accepting of digital textbooks than students were. Demand is expected to increase when the content is designed specifically for digital use, rather than being just a PDF version of the printed textbook.

Measuring Mass Text Digitization Quality and Usefulness

Measuring Mass Text Digitization Quality and Usefulness. Simon Tanner. D-Lib Magazine. July/August 2009.
This article discusses the accuracy of Optical Character Recognition (OCR) output in a way that is relevant to the needs of the end users of digital resources. It looks at the benefits to be gained from measuring not just character accuracy but also word and significant word accuracy.
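A small calculation shows why the three measures can tell different stories. This sketch is an illustration, not Tanner's methodology: character accuracy is taken as 1 minus the edit distance divided by the ground-truth length, word accuracy as the share of ground-truth words that also appear in the OCR output, and the stop-word list that defines "significant" words is a hypothetical placeholder. It assumes a non-empty ground truth.

```python
from collections import Counter

STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "in"})  # hypothetical list


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def character_accuracy(truth, ocr):
    return max(0.0, 1 - edit_distance(truth, ocr) / len(truth))


def word_accuracy(truth, ocr, ignore=frozenset()):
    """Share of ground-truth words (minus ignored ones) that also occur in the OCR text."""
    truth_words = Counter(w for w in truth.lower().split() if w not in ignore)
    ocr_words = Counter(ocr.lower().split())
    matched = sum((truth_words & ocr_words).values())
    return matched / sum(truth_words.values())


truth = "the history of the parish church"
ocr = "tbe history of tbe parish church"
print(character_accuracy(truth, ocr))                       # 0.9375
print(round(word_accuracy(truth, ocr), 4))                   # 0.6667
print(word_accuracy(truth, ocr, ignore=STOP_WORDS))          # 1.0
```

Two wrong characters out of 32 look like a minor problem, and two of six words lost looks serious, yet every significant word a user would search for ("history", "parish", "church") was recognized correctly.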

Tuesday, August 04, 2009

Digital Preservation Matters - 4 August 2009

OpenWMS: Workflow Management System for Digital Objects. Rutgers. August 2009.

Rutgers has released its OpenWMS software for creating metadata for analog and digital materials. It is platform-independent and open source. The web-accessible system can be used as a standalone application or integrated with other repository architectures. It provides a complete metadata creation system, with services to ingest objects and metadata into a Fedora repository, and it can export these objects and metadata, individually or in bulk, in a METS/XML wrapper.
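For a sense of what a METS/XML wrapper looks like, here is a minimal sketch that wraps Dublin Core descriptive metadata and one file reference in METS using Python's `xml.etree.ElementTree`. It is not OpenWMS's export format; the choice of sections (one `dmdSec`, one `fileSec`, one `structMap`), the IDs, and the sample values are assumptions made to keep it small.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
DC = "http://purl.org/dc/elements/1.1/"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("dc", DC), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def mets_wrapper(object_id, title, creator, file_href, mimetype):
    """Build a small METS document: Dublin Core metadata, one file, and a structMap linking them."""
    mets = ET.Element(f"{{{METS}}}mets", {"OBJID": object_id})

    # Descriptive metadata wrapped inline (MDTYPE="DC" declares Dublin Core).
    dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC}}}title").text = title
    ET.SubElement(xml_data, f"{{{DC}}}creator").text = creator

    # File inventory: one file, referenced by URL rather than embedded.
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", {"USE": "master"})
    file_el = ET.SubElement(group, f"{{{METS}}}file", {"ID": "FILE1", "MIMETYPE": mimetype})
    ET.SubElement(file_el, f"{{{METS}}}FLocat", {"LOCTYPE": "URL", f"{{{XLINK}}}href": file_href})

    # The structMap (the one required METS section) ties the metadata and the file together.
    struct = ET.SubElement(mets, f"{{{METS}}}structMap")
    div = ET.SubElement(struct, f"{{{METS}}}div", {"DMDID": "DMD1", "TYPE": "item"})
    ET.SubElement(div, f"{{{METS}}}fptr", {"FILEID": "FILE1"})
    return ET.tostring(mets, encoding="unicode")
```

Bulk export would amount to calling a function like this once per object (or adding more `file` and `div` entries to one document); the wrapper keeps the metadata and the file references together in a single XML package.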

RODA Open Source Repository for Archives. July 2009.

RODA is an open source digital repository specially designed for Archives, with long-term preservation and authenticity as its primary objectives. Created by the Portuguese Directorate-General for the Portuguese Archives and the University of Minho, it was designed to support the most recent archival standards and become a trustworthy digital repository. Try an online demo at

To download the full installation package or sources go to:

To register and participate in discussion forums and report issues

Digital Preservation Survey. Fedora Commons. July 2009.

The Fedora Preservation Solutions Community survey was created to gather information about and examples of digital preservation developments, practices, and needs, regarding the management of digital content in repositories, specifically using open source software like Fedora. The results were not specific to Fedora users. The results of the survey were shared at the Open Repositories Conference (May 18 – 21, 2009) in Atlanta, GA. The survey and the results are available from this site. Some items of interest in the results:

  • 55.7% who answered are currently archiving and preserving digital materials
  • 36.9% are planning to preserve digital materials
  • 71.6% are using an open source platform for their digital archive
  • 45.0% use Fedora, 43.3% use DSpace

The Preservation and Archiving Solution Community posted the three items above on the Fedora Commons website. Its wiki and listserv are open to all who are interested in archiving and digital preservation.

The Research Library’s Role in Digital Repository Services. A report of the ARL Digital Repository Issues Task Force. Association of Research Libraries. January 2009. [52p. pdf]

Digital repositories are a key element of research cyberinfrastructure. Repository services are built on a foundation of content, context, and access, which need to be balanced, and they are still developing. Digital is not just a new way to collect and distribute; it has brought new kinds of content and services. Institutions produce large and ever-growing quantities of data, images, multimedia works, learning objects, and digital records. Repositories should not be managed as isolated collections. They are about the users as much as the content, and services need to be developed to meet users' needs. “As research libraries embark on repository service development, they enter a brand new business in many ways.”

Sustainability is about the institutional commitment and the ability to create persistent structures. Libraries have a key role in the new informational structures. Libraries should look at these areas:

  1. Understand needs of users and creators in order to develop repository-related services.
  2. Use a life-cycle management framework to guide services and policies.
  3. Express the value of repository services to justify resources and to promote partnerships and efforts.
  4. Integrate collections into emerging services that are outside of library-managed repositories.
  5. Participate in shaping the technology of repositories and service mechanisms.

Important actions for libraries include:

  1. Build new kinds of partnerships and alliances, within and between institutions.
  2. Develop service strategies based on assessment of local needs
  3. Develop outreach and marketing strategies to connect others to the library environment
  4. Define responsibilities to guide the development of repository services for different types of content.

Books Online:

Amazon deal to reprint rare books. BBC News. 22 July 2009.

Amazon is working with the University of Michigan to provide reprints of 400,000 rare, out-of-print and out-of-copyright books. The books will be printed in soft cover editions. Items that have been out of print for years will “be able to go back into print, one copy at a time”.

Harvard U. Press to Sell 1,000 Books Online. Marc Beja. The Chronicle of Higher Education. July 22, 2009.

Harvard University Press has created a profile on Scribd and has already posted hundreds of works for download, for which it is charging. Others, such as New York University and MIT, have also posted items on the site but do not charge.