Friday, March 14, 2008

Digital Preservation Matters - 14 March 2008

The Diverse and Exploding Digital Universe. An Updated Forecast of Worldwide Information Growth Through 2011. John F. Gantz. IDC. Whitepaper. March 2008. [PDF]

The paper, created by IDC and sponsored by EMC, estimates the extent of the digital universe. It puts 2007's digital resources at 281 exabytes, a figure expected to increase tenfold by 2011. The amount of information created already exceeds the total storage capacity available; not everything is stored, and not everything needs to be kept. To deal with this volume of information, institutions must:

  • Include more than just IT in the processes. It is not just a technical problem
  • Develop policies for the creation, storage and retention of the material
  • Develop new tools and standards to handle large volumes of digital materials

A simple email with a 1MB attachment may consume as much as 50MB of storage once all the replications and backups are counted. The ‘digital shadow’ about a person is larger than the data they create themselves, and most of the growth is visual in nature.
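The arithmetic behind that 50MB figure can be sketched in a few lines. The recipient, backup, and replication counts below are illustrative assumptions, not numbers from the paper:

```python
# Illustrative storage amplification for one email attachment.
# All parameters are assumptions chosen to show the effect, not IDC figures.
attachment_mb = 1.0
recipients = 4        # assumed copies held in recipients' mailboxes
backup_copies = 3     # assumed backup generations per mailbox copy
replication = 2       # assumed storage-level mirroring per copy

mailbox_copies = 1 + recipients   # sender's copy plus each recipient's
total_mb = attachment_mb * mailbox_copies * (1 + backup_copies) * replication
print(f"{total_mb:.0f} MB stored for a {attachment_mb:.0f} MB attachment")  # 40 MB
```

Even these modest assumptions multiply a 1MB file into tens of megabytes of stored data, which is how an estimate like the paper's 50MB arises.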



Long-term preservation costs - some figures. Available Online. Website. March 10, 2008.

The Archaeology Data Service has published revised costs for various digital preservation tasks. The service charges users a preservation cost when the items are ingested into the system: “asking researchers who work on fixed term projects to pay annual costs for storage is just not feasible.” The costs increase with the size and complexity of the data. For the example given of 2000 images, the estimated preservation cost would be about 3.5% of the total funds. The full charging policy is available at the Archaeology Data Service website.



Digital Library Federation Panel discusses Moving Image Preservation. Library of Congress. Website. March 2008.

The Library has made available a presentation from Carl Fleischhauer on preserving moving images: encoding, wrappers, formats, metadata, and more. Some projects save the video signal uncompressed, which can run 70-100 GB per hour; others use lossless or lossy compression. The most frequently used lossy compression is MPEG-2, at about 28 GB per hour. The Material Exchange Format (MXF) is one wrapper being developed, and the PBCore specification is one example of video metadata work in progress. The closing statement: “work is under way but there is still plenty to do!” The slides and notes are available in PDF format at Video Formatting and Preservation DLF Presentation.
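The per-hour figures quoted follow directly from bitrate. A small conversion sketch (the 62 Mbit/s value is back-calculated from the ~28 GB per hour quoted, not stated in the presentation):

```python
def gb_per_hour(mbit_per_sec: float) -> float:
    """Convert a video bitrate in Mbit/s to storage in GB (10^9 bytes) per hour."""
    return mbit_per_sec * 1e6 * 3600 / 8 / 1e9

# An MPEG-2 stream around 62 Mbit/s works out to roughly 28 GB per hour
print(round(gb_per_hour(62), 1))   # 27.9
# Uncompressed standard-definition video (~160-220 Mbit/s, an assumed range)
# lands in the 70-100 GB per hour bracket quoted above
print(round(gb_per_hour(200), 1))  # 90.0
```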



Sun and Fedora Introduce a Petabyte-scale Object Store. Fedora Commons News. February 20, 2008. [PDF]

A 6-page brochure describes the integration of Fedora with the Sun storage system, done with the help of researchers at Johns Hopkins University. Oxford University Services are in the process of deploying Fedora and Sun as part of a repository framework for all digital content in their libraries. “…Sun is virtually guaranteeing that there will be no format obsolescence.”

Friday, March 07, 2008

Digital Preservation Matters - 07 March 2008


National Archives Chooses Digital Vision to Automate Film and Video Restoration. Press Release. March 5, 2008.

NARA has selected Digital Vision to digitize and restore some of their 700,000 titles, in order to preserve them and make them available for public access. The solution, which requires minimal operator intervention, includes high-speed, 4K and 2K scanning and telecine systems. Both the film and video footage will be color corrected, along with sound normalizing and scratch/dust removal.


Scoping study for a registry of electronic journals that indicates where they are archived. JISC. 14 January 2008. [pdf]

A study to determine the scope and feasibility of a registry for archived e-journals. There is no single view of what constitutes a registry. Many feel it should be a place where the information is gathered, audited, then made available to local databases. Some ask why they are paying for the services when others also benefit without paying. A registry should contain the information on where an item is and how to access it. The study identified ten basic characteristics of digital preservation repositories:

  1. The repository commits to ongoing maintenance of digital objects for their communities.
  2. Demonstrates organizational ability to fulfill its commitment.
  3. Acquires and maintains contractual and legal rights and fulfills responsibilities.
  4. Has an effective and efficient policy framework.
  5. Acquires / ingests digital objects based on criteria that fit its commitments and capabilities.
  6. Maintains/ensures the integrity, authenticity and usability of digital objects over time.
  7. Creates and maintains preservation metadata about creation, maintenance, and actions taken.
  8. Fulfills dissemination requirements.
  9. Has a strategic program for preservation planning and action.
  10. Has the technical infrastructure to adequately maintain its digital objects.


New Digital Preservation Newsletter from the Library of Congress. Press Release. March 3, 2008.

The Library of Congress has started a digital preservation newsletter. The March edition includes information on state partnerships, digital video reformatting, preservation tips, and NDIIPP information.


Publishers Phase Out Piracy Protection on Audio Books. Brad Stone. The New York Times. March 3, 2008.

Some of the largest publishers are removing the Digital Rights Management protections on audio books. This will allow the files to be copied to different devices and will allow retailers to sell content that will work on all digital devices. Random House was the first to move away from DRM and others appear to be following.

Friday, February 29, 2008

Digital Preservation Matters - 29 February 2008

Preservation in the Age of Large-Scale Digitization: A White Paper. Oya Y. Rieger. CLIR Reports. February, 2008

This looks at four large-scale digitization efforts (Google Book Search, Microsoft Live Search Books, the Open Content Alliance, and the Million Book Project) to identify issues arising from the projects. Some notes:

  • Access and preservation goals are usually interrelated in complex ways
  • There is not yet a common understanding of what preservation responsibility is
  • Making sure content is accessible over time may be different from preserving it
  • For brittle books, the digital copy may be the only one that survives into the future
  • Open Content Alliance maintains data in multiple repositories in order to preserve the files, test the preservation action, and restore lost files.
  • “Preserving digital objects entails the preservation of digitized materials, including those resulting from the reformatting process, to ensure their longevity and usability”
  • Preservation metadata is “the information a repository uses to support the digital preservation process”
  • “Technology alone cannot solve preservation problems. Institutional policies, strategies, and funding models are also important.”
  • “The challenge is not only to incorporate the preservation mandate in organizational mission and programs but also to characterize the goals in a way that will make it possible to understand the terms and conditions of such a responsibility.”
  • “There are also significant differences between a preservation program that focuses on bitstream preservation and one that encompasses the processes required to provide enduring access to digital content.”
  • A great section to read: Key Organizational-Infrastructure Requirements for Preservation Programs
  • The organizational preservation infrastructure (mandate, governance, funding) is critical for success.
  • Expanding educational opportunities for preservation and curation staff involved is important
  • Developing a common archival strategy is a complex process. Collaboration is important.
  • We need to reassess requirements and practices and devise policies for designating digital preservation levels



Archiving agreement Portico and KB. Press Release. Koninklijke Bibliotheek News. 26 February 2008.

The Royal Library (KB) and Portico have agreed that an off-line copy of the Portico archive of over 60 million files will be held for safekeeping by the library. The copy will be kept in a secure, access- and climate-controlled facility. This is one component of Portico’s strategy to ensure the security of the archive, and one way in which organizations can cooperate to preserve digital materials.

"Preserving electronic scholarly publications is a key priority for the KB, and formalizing this arrangement with Portico is a natural extension of the KB's active archival role."


OpenSolaris Project: HoneyComb Fixed Content Storage. Web site. February 27, 2008.

Sun Microsystems has donated the source code for the Sun StorageTek 5800 System, a digital archive storage system built on the Solaris Operating System and open source software. Developers can download the code, which runs on any x86 system, for free (see Sun Makes Digital Archiving Free, Open).


Friday, February 22, 2008

Digital Preservation Matters - 22 February 2008

JPEG 2000 - a Practical Digital Preservation Standard? Robert Buckley. Digital Preservation Coalition. February 2008.

JPEG 2000 is an open standard developed by the ISO JPEG committee to improve on the existing JPEG format. It is platform independent. The format includes:

• Lossless and visually lossless image compression
• Multiple derivative images from a single image
• Progressive display, multi-resolution imaging and scalable image quality
• The ability to handle large and high-dynamic range images
• The ability to interactively zoom, pan, and rotate
• Metadata support

It is being used increasingly for archival images. “Most applications, including those in the digital preservation domain, can meet their needs with JPEG 2000 codestreams…” With JPEG 2000 you can make derivatives from a compressed image without having to decompress and recompress it. The Digital Preservation Coalition endorses the format as a great step for digital preservation.
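One practical preservation task with JP2 files is confirming that a stored file really is one. Part 1 of the JPEG 2000 standard mandates a fixed 12-byte signature box at the start of every JP2 file, which can be checked without any imaging library. A minimal sketch; it verifies only the container signature, not that the codestream decodes:

```python
# The 12-byte JP2 signature box required by ISO/IEC 15444-1 (JPEG 2000 Part 1).
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")

def looks_like_jp2(path: str) -> bool:
    """True if the file begins with the JP2 signature box.

    Raw JPEG 2000 codestreams (which start with the SOC marker 0xFF4F)
    return False, since they lack the JP2 container wrapper."""
    with open(path, "rb") as f:
        return f.read(12) == JP2_SIGNATURE
```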



Microsoft Embraces Open Source. Elizabeth Montalbano. PC World. February 21, 2008.

Microsoft promised "greater transparency" in its development and business practices, and more access to proprietary protocols for Windows, Office, and other software products. The new interoperability principles and actions should “ensure open connections, promote data portability, enhance support for industry standards, and foster more open engagement with customers and the industry, including open-source communities.” Microsoft is publishing documentation on protocols that were previously under trade secret licenses, and providing a covenant not to sue open-source developers for using the protocols. It will also start a Document Interoperability Initiative to address data exchange issues between formats.



Microsoft's New Stance and Data Preservation. Melissa Perenson. PCWorld Blog. February 21, 2008.

There is potential in Microsoft's statement that they are making changes to their “technology and business practices to increase the openness of its products and drive greater interoperability.” Archiving no longer just refers to the government or large institutions preserving their data. Media longevity is only part of the problem; the other is data format longevity. Microsoft needs to make sure that the file formats are readable in the future.



Toshiba Gives Up On HD DVD, Ends High-Def Format War. Antone Gonsalves. InformationWeek. February 19, 2008.

Toshiba has announced it will no longer make or market HD DVD players and recorders, leaving Blu-ray as the standard high-definition disc format. Partners who had dropped their support for HD DVD said they wanted to eliminate customer confusion over the incompatible technologies.

Friday, February 15, 2008

Digital Preservation Matters - 15 February 2008

How can we be sure we'll remember our digital past? Chris Gaylord. The Christian Science Monitor. February 14, 2008.

The same qualities that make digital so easy also make preservation difficult. “Losing personal files can be upsetting. But failing to protect academic, government, or corporate data could erase irreplaceable pieces of history.” The problems include both the media and the formats. "It's the great challenge of the Information Age." Methods include emulation (writing programs to run the original programs on new equipment) and migration (updating the file without changing the content). Some are working on access models to pay for the preservation. There are several ways to address saving data, but you must know your needs and choose carefully.


Roundup of commentary on Harvard OA policy. Gavin Baker. Open Access News. February 13, 2008.

The Faculty of Arts and Sciences at Harvard adopted a Self-Archiving Mandate. The objective is to provide Open Access to its own scholarly output by saving the articles to its institutional repository. The policy also grants the university the right to exercise copyright in those articles. “In legal terms, the permission granted by each Faculty member is a nonexclusive, irrevocable, paid-up, worldwide license to exercise any and all rights under copyright relating to each of his or her scholarly articles, in any medium, and to authorize others to do the same, provided that the articles are not sold for a profit.” The Provost has said: “The goal of university research is the creation, dissemination, and preservation of knowledge….”


Supporting Digital Preservation & Asset Management in Institutions. Maureen Pennock. JISC. January 2008. [pdf]

This is a summary and reference list of various projects, models, workbooks, and resources, dealing with digital preservation and asset management. Some notes from the projects:

  • Automated tools are still not capable of assigning a value to digital assets, a vital step in determining where preservation resources are best spent.
  • Preservation is one part of the life cycle of the information; stages include creation, accession, ingest, metadata, storage and technical storage systems, access, and re-use
  • Metadata collection from depositors must be as simple as possible; automatically captured or generated metadata is the key to both information retrieval and information services.
  • An assessment of formats in the institutional repository is the first step toward developing and implementing a preservation action plan and a technology watch service.
  • Preservation and digital asset management is, in every case, wholly reliant upon one thing: money.
  • Preservation should not be considered as an end in itself: it should be considered within the life cycle of digital object management.


New Sharp 250mW Blu-ray Laser enables 6x faster Recording. Press release. February 14, 2008.

The company has introduced new components that will allow faster recording of Blu-ray discs. The units will be available later this year.

Friday, January 04, 2008

Digital Preservation Matters - 04 January 2008

Sound Directions: Best Practices for Audio Preservation. Mike Casey, Bruce Gordon. Indiana University, Harvard University. 4 December 2007.

In 2005 Indiana University and Harvard University, with funding from the National Endowment for the Humanities, began a joint project to develop best practices for audio preservation. Their newly released report describes four key results:

  1. publication of their findings and best practices,
  2. development of software tools for audio preservation,
  3. audio preservation systems at each institution, and
  4. preservation of critically endangered and highly valuable recordings.

The chapters of the 168-page document are divided into an overview of the key concepts for collection managers and curators, and a section of recommended technical practices for audio engineers, digital librarians, and other technical staff. The appendices include XML and METS documents, and also a listing of the 40 open source software tools that will be available in a toolkit in 2008. The document chapters include: personnel and equipment needed; digital files; metadata; storage; preservation packages and interchange; systems and workflows; and a summary of best practices. Some notes from the report:

  • It is critical that audio preservation systems use technologies, formats, procedures, and techniques that conform to international standards and best practices.
  • The Broadcast Wave Format itself has become a de facto standard in the audio world.
  • Preservation transfer work is best undertaken in a studio designed as a critical listening space.
  • There is no substitute for experience with audio formats and equipment.
  • For both technical and economic reasons, the preservation of audio must rely upon transfer to, and storage in, the digital domain.
  • The WAV or Broadcast WAV file is the best target preservation format.
  • The integrity of every file created for preservation must be verified over time.
  • They use the MD5 hash algorithm for verifying the data integrity of every file.
  • Without metadata, digital audio preservation is not possible.
  • Once content is digitized, the new strategy relies upon regular migration from one carrier to another in the digital domain.
  • They have chosen LTO tape for their storage media over DVD and hard drives.
  • The Fedora repository system is being used as the basis for the IU digital preservation repository.
  • Selection often consists of an assessment of both research value and preservation condition.
  • Develop a prioritized list of recordings / collections for preservation treatment based on, at least, an analysis of research value and preservation condition.
  • Audio preservation systems in the digital age necessarily require greater specialization, and more collaboration among specialists, than their analog counterparts.
  • Preservation is partly a race against time due to format deterioration and obsolescence.
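The fixity checking the report describes can be sketched with Python's standard library. This shows the general technique, not Sound Directions' actual tooling; the function names are my own:

```python
import hashlib

def md5_fixity(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 checksum, streaming in 1 MB chunks so that
    multi-gigabyte audio files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(path: str, recorded_md5: str) -> bool:
    """Compare the current checksum against the one recorded at ingest."""
    return md5_fixity(path) == recorded_md5
```

Checksums recorded at ingest are recomputed on a schedule; any mismatch signals bit-level corruption and can trigger restoration from another copy.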


Challenges of Film, Video, and New Media Preservation. Educause Live! December 19, 2007

This audio presentation covers the challenges of film, video, and media preservation, including new types of media, obsolescence, copyright, and deterioration of objects. Over 50% of all moving images made before 1950 have vanished. We need to look at preservation in a new way: worry less about the ‘original’ physical embodiment; reformat constantly; and recognize that long-term planning and administration is essential. We need a managed environment, and we need to monitor both the objects and the technical environment periodically. PowerPoint slides are also included.


Google Plans Service to Store Users' Data. Kevin J. Delaney. Wall Street Journal. November 27, 2007.

Google, in trying to increase web-based computing, is developing a service to let users store contents of their computers on the internet. This would include files such as word-processing documents, digital music, video clips and images. Users could access their files via the Internet from different computers and share them online with friends. The service would face questions on issues such as data privacy, copyright, cost, and technical challenges of offering service without interruption.


China IC designers boosting use of RealNetworks file format in digital media players. Jimmy Hsu, Adam Hwang. Digitimes. 17 December 2007.

Integrated circuit (IC) designers in China have been promoting solutions for digital media players based on the Real Media (RM) and Real Media variable bitrate (RMVB) multimedia file formats. The formats can compress a 3-4GB video into a 250-400MB file.

Friday, December 14, 2007

Digital Preservation Matters - 14 December 2007

CNI in DC: Integrated Digital Library on the Fedora Platform. David Kennedy. December 12, 2007.
This is one item in a blog report covering the CNI conference and the Digital Curation Conference: National Perspectives; the other items are also worth reading. The University of Maryland uses Fedora not for its IR (they use DSpace) but for its digital collections, aiming to build in sustainability and smooth transitions. Organizational issues included institutional support, development time, and choosing between off-the-shelf and Fedora-type systems. Development took almost 18 months. They found working with Fedora similar to Java, and "programmer friendly." They use a hybrid metadata schema with METS wrappers. What have they learned?
  • metadata - uses a complex schema, but don't force users to understand the underlying schema
  • authentication - not dealt with yet, but need to do more work
  • archival storage - greater need for more space
  • need to have Quality Control standards when modifying objects and creating metadata

They have at least three or four developers working on the project, along with a number of other team members. Since they use their own metadata schema, it may not be possible to offer their work to others; if they were to do it again, they might use a standard metadata schema.


New 1 day AIIM PDF/Archive Training Program. Atle Skjekkeland. AIIM Knowledge Center Blog. December 12, 2007.
The AIIM organization intends to introduce a new PDF/A training program next year, focused on PDF/A and its use as a file format for archiving data. The PDF/Archive effort began as an AIIM standards committee in 2002 and has since been accepted as an ISO standard.


Digital Preservation Pioneers: Margaret Hedstrom. Resource Shelf. December 13, 2007.
A brief bio about Margaret Hedstrom who has done a great deal for digital preservation. Her works include several articles that are definitely worth reading: Digital Preservation: A Time Bomb for Digital Libraries, It’s About Time, Invest to Save, and Incentives for Data Producers to Create Archive-Ready Data Sets.


Pooling Scholars’ Digital Resources. Andy Guess. Inside Higher Ed. December 12, 2007.
Access to documents and copyright issues have been two factors slowing the development of online scholarly repositories. George Mason University seeks to bypass libraries entirely and go directly to scholars by creating an open archive of scholarly resources in the public domain. They are building a way for scholars to upload existing documents, make them text-searchable, and place them in a database available to the public. It will use the Zotero plug-in for Firefox, which stores web pages, collects citations, and lets scholars annotate and organize online documents. It is funded by a two-year Mellon grant.


Manakin: A New Face for DSpace. Scott Phillips et al. D-Lib Magazine. November/December 2007.
The increase in online scholarly communication makes digital repositories ever more important for preserving and managing information. This article looks at Manakin, which was designed to let institutions create individual, customized repository interfaces separate from the underlying repository (currently DSpace). It helps a library ‘brand’ its content, supports better understanding of the metadata, and provides tools to create extensions of the repository. It uses schemas, aspects, and themes as its basic components. There is a movement to adopt Manakin as the default DSpace user interface.


SOA. IT Strategy Guide. Dave Linthicum. InfoWorld. December 10, 2007. [pdf]
The essence of an organization must be identified so that all activities influencing it can be identified and improved. This is the first step in realizing the benefits of a service-oriented architecture (SOA). It requires not only technology but also a shift in the way business and IT work together. Organizations need to adopt clearly defined roles, allowing stakeholders to understand each other’s goals and tasks, including both the human aspects and the lifecycle management of the services. Management support for the strategy is crucial, as is an investment in people and technology to establish the appropriate context. “The hardest part isn’t the technology; it’s redrawing the business processes that provide the basis for the architecture — and the often contentious reshuffling of roles and responsibilities that ensues.” It is important to define the value, get investment and commitment from the top, and concentrate on the long term.


Census of Institutional Repositories in the U.S. Soo Young Rieh, et al. D-Lib Magazine. November/December 2007.
There are great uncertainties underlying institutional repositories regarding practices, policies, content, systems, and other infrastructure issues. This article looks at IRs in five areas (leaders, funding, content, contributors, and systems) and how they are perceived. Some notes:

  • College and university libraries are the driving force behind most IRs
  • The vast majority of survey respondents have done no IR planning to date
  • Only 10.8% of respondents have actually implemented an IR
    • 52.1% have been operational less than one year
    • 27.1% have been operational between one and two years
  • Respondents agree that funding comes or will come from the library, typically by absorbing costs into routine library operating expenses
  • The majority of existing IRs contain fewer than 1000 items
  • DSpace is the most prevalent system for pilot-testing and use. Fedora and ContentDM are regularly pilot-tested but rarely implemented.

“Once each academic institution has a clear vision and definition of what the IR will be for its own community, subsequent decisions such as content recruitment, software redesigning, file formats guaranteed in perpetuity, metadata, and policies can flow from that vision.”

Friday, December 07, 2007

Digital Preservation Matters - 07 December 2007

Ten years after. Priscilla Caplan. Library Hi Tech. Editorial. Vol. 25 N. 4 2007.

This editorial from Priscilla Caplan reflects on the progress made in digital preservation over the past 10 years. Digital preservation is no longer a little-known concept but a problem to be solved; it is part of the mainstream. Much has been accomplished, though there is still a lot of progress to be made. Europe has a different approach; it sees this as “part of a set of curation activities.” That approach would “help reduce our apparent confusion between institutional repositories and preservation repositories.” Few institutions will have the resources to run a true preservation repository. “Digital curation may be departmental, and archiving institutional, but I believe preservation will have to be consortial.” The US approach has focused on short-term projects rather than long-term infrastructure. There are still some basic infrastructure needs: schemas, conversion utilities, and registries. We also need to develop centers to promote and assist digital preservation, and to provide more education for both data creators and data curators.


Standards Group Accepts PDF. Sumner Lemon. IDG News Service. December 05, 2007.

Adobe PDF 1.7 has been approved as an ISO standard: the ballot to make PDF 1.7 the ISO 32000 standard passed by a vote of 13 to 1. Specialized subsets of PDF (PDF/Archive, etc.) had previously been proposed or approved as ISO standards; PDF 1.7 now serves as an "umbrella" standard to unify these different subsets. Adobe gives up some control over the development of future versions.


Project SPECTRa: JISC Final Report. March 2007.

The principal aim of the SPECTRa project (Submission, Preservation and Exposure of Chemistry Teaching and Research Data) was to enable high-volume ingest and reuse of experimental data through institutional repositories. It used the DSpace platform because of existing infrastructure and previous experience. The project developed open source software tools and customizations that could easily be incorporated into chemists' workflows. Metadata was based on Dublin Core. They felt that serious preservation work must be done at the institutional, rather than departmental, level. Metadata, identifiers, and normalizing data into open formats would make long-term preservation more achievable. Preservation of chemistry data file formats is a difficult area; their approach was to capture essential metadata at submission or extract it automatically from the data files where possible. All files should be validated against specifications. Depositing files in an institutional repository should guard against loss or corruption of the raw data, but this is insufficient to ensure future usability; a policy of format migration will be necessary for much of the data.
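Capturing essential metadata at submission typically means emitting a small Dublin Core record alongside the data file. A minimal sketch using only the standard library; the element choices and sample values are illustrative, not taken from the SPECTRa report:

```python
# Generate a minimal Dublin Core metadata record as XML at submission time.
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

def dublin_core_record(fields: dict) -> str:
    """Serialize a dict of Dublin Core element names/values to an XML string."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        el = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        el.text = value
    return ET.tostring(root, encoding="unicode")

# Illustrative sample values for a hypothetical chemistry deposit
record = dublin_core_record({
    "title": "Crystal structure determination of compound X",
    "creator": "A. Researcher",
    "format": "chemical/x-cif",
    "date": "2007-03-01",
})
print(record)
```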

Other project findings included:

  • it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organizational capability of digital repositories;
  • scientific data repositories are more complex to build and maintain than those designed primarily for text-based materials;
  • the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;
  • institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;
  • IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.


Google Plans Service to Store Users' Data. Kevin J. Delaney. Wall Street Journal. November 27, 2007.

Google is developing a service to let users store contents of their computers, such as word-processing documents, digital music, video clips and images. It would let users access their files via the Internet from different computers and share them online with friends. The service would face questions on issues such as data privacy, copyright, cost, and technical challenges of offering service without interruption.


Iron Mountain Acquires Xepa Digital, LLP. Press Release. November 19, 2007.

Iron Mountain has acquired Xepa, a company that converts analog and out-of-date digital audio and video to high-resolution digital file formats. Iron Mountain will offer on-site digital conversion for the items being stored.


Saturday, December 01, 2007

IT Disasters

The top 10 IT disasters of all time. Colin Barker. ZDNet.co.uk. 22 Nov 2007.

A list of some of the worst IT-related disasters and failures caused by faulty hardware and software or human error.

  1. Faulty Soviet early warning system nearly causes WWIII (1983)
  2. The AT&T network collapse (1990)
  3. The explosion of the Ariane 5 (1996)
  4. Airbus A380 suffers from incompatible software issues (2006)
  5. Mars Climate Observer metric problem (1998)
  6. EDS and the Child Support Agency (2004)
  7. The two-digit year-2000 problem (1999/2000)
  8. When the laptops exploded (2006)
  9. Siemens and the passport system (1999)
  10. LA Airport flights grounded (2007)

Friday, November 30, 2007

Digital Preservation Matters - 30 November 2007

Council Conclusions on scientific information in the digital age: access, dissemination and preservation. The Council Of The European Union. November 2007.

The Council of the European Union presents some conclusions regarding digital preservation and recommendations for the next few years:

  • access to and dissemination of scientific information is crucial and can help accelerate innovation;
  • effective digital preservation of scientific information is fundamental for the current and future development of research;
  • it is important to ensure the long-term preservation of scientific information, publications and data, and to include scientific information in preservation strategies;
  • monitor good practices for open access to scientific information and the development of new models;
  • experiment with open access to scientific data and publications to understand contractual needs;
  • encourage research and experimentation in digital preservation, deploying scientific data as widely as possible to support open access to and preservation of scientific information.


Shifting Gears: Gearing Up to Get Into the Flow. Ricky Erway. OCLC. October 2007.

Efforts to digitize special collections mean we need to re-examine what we are doing. Do we digitize for access, for preservation, or both? How do our selection criteria affect the digitizing efforts? Access is important. We should preserve unique items to the best of our ability, but that doesn’t mean we only have one chance to do it right. We may want to re-digitize when the technology improves. Scan items as part of the initial accessioning process; create a single unified process. Metadata can be improved as needed; it can be an iterative approach. Move to a program approach, not just special projects. It should be part of the regular budget. To do a better job we need to “integrate digitization into all workflows and user services”.


Digital library surpasses initial goal of 1 million books. International Herald Tribune. November 27, 2007.

The Universal Library project has surpassed its latest target, having scanned more than 1.5 million books. At least half the books are out of copyright or scanned with the permission of copyright holders. The library's mission is to make information freely available and to preserve rare and decaying texts. It is the largest university-based digital library of free books and its purpose is noncommercial. The library has books published in 20 languages, including 970,000 in Chinese, 360,000 in English, 50,000 in the southern Indian language of Telugu and 40,000 in Arabic.


Presentations from iPRES - 2007 International Conference on Preservation of Digital Objects. National Science Library. November 2007.

This site contains PDF files of the presentations given at the October iPRES conference in China. They are worth reviewing. Some that I found particularly useful include:

  • Exploring and Charting the Digital Preservation Research Landscape, Seamus Ross
  • Chinese Digital Archival Network of Foreign STM Material, Xiaolin Zhang
  • A Practical Approach to Digital Preservation: Update from PLANETS, Helen Hockx-Yu
  • Challenges of Digital Preservation: Early Lessons from the Portico Archive, Eileen Fenton
  • Developing a CAS E-Journal Archiving System, Zhixiong Zhang
  • Comparative Evaluation of Major IR Systems for Preservation, Ting Zeng
  • New Partnerships for Scientific Data Preservation and Publication Systems, Zhongming Zhu


Towards the Australian Data Commons: A proposal for an Australian National Data Service. The ANDS Technical Working Group. October 2007.

This paper, among other topics, discusses the reasons to focus on data management, the issues, and the programs to deliver the data. While the paper looks specifically at a national data service, there are aspects that are useful for local digital preservation. Here are some interesting notes from it.

  • Important activities include identifying and deploying policies and technologies that allow users to gain seamless access to data collected within multiple institutionally operated repositories.
  • The intent is to provide common services that make it easier for researchers to discover, access, use, analyze, and combine digital resources as part of their activities, and to support and advise researchers and research data managers about appropriate digital preservation strategies.
  • We are in a data deluge. It can only continue and grow in intensity as the number, frequency and resolution of data sources rise and as information becomes universally ‘born digital’.
  • Data is an increasingly important and expensive ingredient of research and needs increasing attention if it is to be managed efficiently and effectively.
  • The sponsors of data capture and care should help determine the accessibility of the data.
  • Not everyone can use the same solution, so multiple responses may be needed.
  • There should be a registry of repositories and the services they offer.
  • Provide assistance to others in adopting the plans and getting the services they need.
  • Collecting and managing metadata is critical; it is best collected early and automatically.
The data service believes it can contribute most effectively by developing services and activities that enable stewardship within multiple federations of data management and data user communities.
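The “collect early and automatically” recommendation could look something like this at ingest time (a hypothetical sketch using only Python’s standard library; the field names are invented, not ANDS specifications):

```python
import hashlib
import os
from datetime import datetime, timezone

def capture_metadata(path: str) -> dict:
    """Capture basic technical metadata automatically at ingest."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    stat = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "sha256": digest,          # fixity value for later integrity checks
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
```

Recording a fixity value at the moment of deposit means later audits can detect corruption without anyone having to remember to do it by hand.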

In ten years’ time, it will be successful if:

  • A data commons exists in a network of research repositories and the data is discoverable;
  • Researchers and data managers perform well with well-formed data management policies;
  • More research data is routinely deposited into stable, accessible and sustainable environments;
  • More people have relevant expertise in data management.


Stewardship of digital resources involves both preservation and curation. Preservation entails standards-based, active management practices that guide data throughout the research life cycle, as well as ensure the long-term usability of these digital resources. Curation involves ways of organizing, displaying, and repurposing preserved data.


Friday, November 16, 2007

Digital Preservation Matters - 16 November 2007

Electronic Records Management and Digital Preservation: Protecting the Knowledge Assets of the State Government Enterprise. Eric Sweden. NASCIO. October 2007. [pdf]

Electronic records management and digital preservation must be a shared responsibility, including understanding and support from the CIO. Everyone needs to be part of managing digital assets. These initiatives must be managed at the organizational level. The team needs enterprise architects, project managers, electronic records managers, librarians and archivists to ensure the knowledge assets are managed properly. Technology creates both opportunities and challenges. The goal of digital preservation systems is to make sure the information they contain remains accessible to users over a long period of time. A challenge is to keep bit streams intact and usable long term. You need to know what to preserve and how to preserve the records. The strategy must address preservation for the life of the record. There is not a single best way to preserve digital materials. Digital materials do not allow preservation procrastination. If a record needs to be maintained for over 10 years, the original technology will probably be obsolete. Digital preservation must be a routine operation, not a special event.


RSA 2007: long-term data storage presents legal risks. Ian Grant. Computer Weekly. 23 Oct 2007.

Art Coviello, executive vice-president of EMC, stated at a conference that storing every piece of data long term may place organizations at risk of legal liability. The organization needs to know what data they have, who is looking at it and what they are doing with it. They should classify data and users before they store data. This is needed to protect the data and to reduce information clutter.


Keep 'Smoking Gun' E-Mails From Backfiring. H. Christopher Boehning, Daniel J. Toal. New York Law Journal. October 25, 2007.

While this is written from a legal rather than archival perspective, the article discusses the importance of validating/authenticating electronic documents. It lists the legal rules for authenticating emails and other electronic documents, including:

  • testimony by a witness with knowledge of the object;
  • circumstantial means ("appearance, contents, substance, internal patterns or other distinctive characteristics, taken in conjunction with circumstances," such as the email address);
  • hash values that serve as a digital fingerprint;
  • comparison to existing documents;
  • self-authentication of items with labels, tags, or ownership marks.
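The hash-value method is easy to demonstrate: a cryptographic digest recorded at collection time acts as a fingerprint against which a later copy can be compared. A minimal sketch using Python’s standard hashlib (the email content is an invented example):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest that serves as a digital fingerprint."""
    return hashlib.sha256(data).hexdigest()

# A message produced in discovery can be compared against the
# fingerprint recorded when it was first collected.
original = b"From: alice@example.com\nSubject: Q3 figures\n\nSee attached."
recorded = fingerprint(original)

produced = original  # the copy produced later in litigation
assert fingerprint(produced) == recorded   # identical digest: unaltered

tampered = produced.replace(b"Q3", b"Q4")
assert fingerprint(tampered) != recorded   # any change breaks the match
```

Even a one-character change yields a completely different digest, which is what makes the technique useful for showing a document has not been altered.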


The Aftermath: Examining the E-Discovery Landscape After the 2006 Rule Changes. Eric Sinrod. FindLaw. October 16, 2007.

Another article emphasizing the importance of records management plans for electronic data. It mentions that “Data can be located live on networks, servers, hard drives, laptops, PDAs and on backup tapes.” Purging according to retention policies is important. Data may be required in ‘native’ format with all metadata intact.


‘Digital curators’ lead cultural IT projects. Shane Schick. ComputerWorld Canada. 8 Nov 2007.

As cultural organizations try to reach new audiences online and integrate their collections into multimedia-friendly exhibits, they are starting to face the same challenges as others who have been moving away from paper-based processes. These challenges include not only figuring out how to digitize content, but also deciding what gets preserved first, what can wait, and what doesn’t need to be digitized at all. Institutions face the difficulty of trying to preserve something indefinitely without knowing how formats might change over time. They must collect the right hardware and software along with the content itself. “Archives are now building in budgets for migration strategies for data.”


Friendly Advice Machine. John Cleese. Iron Mountain. October 2007.

On the lighter side: For those with an interest in digital archiving and secure storage, and a ‘British’ sense of humor, these clips may be of interest.



Friday, November 09, 2007

Weekly Readings - 9 November 2007

HD Photo to become JPEG XR. Stephen Shankland. CNet News. November 2, 2007.
The Joint Photographic Experts Group has approved Microsoft's HD Photo format as a standard called JPEG XR. This is an important step toward making the format vendor-neutral. It is designed for the next generation of digital cameras and was based on Microsoft’s Windows Media Format. Microsoft is committed to making the patents available without charge. The standardization process typically takes about a year. (See also http://www.jpeg.org/newsrel19.html).

PRONOM and DROID - new versions released. Neil Beagrie. National Archives UK. November 2, 2007.
The National Archives in the UK has released new versions of PRONOM and DROID. PRONOM is an online registry of file formats, software, and other technical information used for digital preservation purposes, available at http://www.nationalarchives.gov.uk/pronom. DROID (Digital Record Object Identification) is open source software at http://droid.sourceforge.net/ that is used to identify file formats in batch mode. They are freely available.
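The core idea behind DROID’s batch identification is matching files against byte signatures registered in PRONOM. A much-simplified sketch of that technique (the signature table below is illustrative, not PRONOM’s actual registry data):

```python
# A tiny, illustrative signature table: magic bytes -> format name.
# Real PRONOM signatures are far richer (offsets, wildcards, versions).
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
}

def identify(header: bytes) -> str:
    """Identify a format from its leading bytes, signature-style."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"

print(identify(b"%PDF-1.4 sample"))       # PDF
print(identify(b"\x89PNG\r\n\x1a\nrest"))  # PNG
```

Identifying formats from internal signatures rather than file extensions matters for preservation, since extensions are easily lost or wrong; this is why DROID works this way in batch mode.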

An overview of LOCKSS, how it works, and issues related to it. (LOCKSS, developed at Stanford University, stands for Lots of Copies Keep Stuff Safe.) One of the main issues surrounding it is trust. “Trusting a single provider, a single institution, and a single archive represents the real risk”. LOCKSS is built on the principle of building confidence in the archive. LOCKSS was built to archive electronic journals but has been enhanced to also archive blogs on Google’s Blogger.
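The replication-instead-of-trust principle can be sketched as a simple majority vote among copies (a toy illustration of the idea, not the actual LOCKSS polling protocol):

```python
import hashlib
from collections import Counter

def audit(copies: list[bytes]) -> bytes:
    """Return the majority version among replicated copies.

    The LOCKSS-style reasoning: no single copy is trusted; agreement
    among independent replicas is what establishes authenticity, and a
    damaged replica can be repaired from the agreeing majority.
    """
    digests = [hashlib.sha256(c).hexdigest() for c in copies]
    majority_digest, votes = Counter(digests).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority -- cannot repair")
    return next(c for c, d in zip(copies, digests) if d == majority_digest)

good = b"volume 12, issue 3"
copies = [good, good, b"volume 12, issue 3 -- corrupted", good]
assert audit(copies) == good  # the damaged replica is outvoted
```

With enough independent copies, corruption or loss at any single site is detectable and repairable, which is the sense in which lots of copies keep stuff safe.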

Looking Ahead. Lee J. Nelson. Advanced Imaging Magazine. November 9, 2007.
The article looks at some of the industry trends. Included is an announcement on an HD Photo Plug-in for Adobe Photoshop. “HD Photo is geared for end-to-end digital photography, offering better image quality, greater preservation of data and advanced features. Its still image codec for continuous-tone images is underpinned by lossy and lossless compression, multiple colorspaces, wide dynamic range and extensive metadata.”

Government Pledges £25m To Preserve Uk's Film Archives. 24 Hour Museum. October 17, 2007.
The British government has taken steps to preserve the country’s film archives. It has given money to the UK Film Council to secure the films in the archives. “It’s absolutely right that they should be safe and accessible for future generations.” The £25 million, plus a further £3 million, is to be used to preserve, restore and increase access to the collections, some of which are deteriorating and in danger of being lost.

The Library and Xerox are studying the potential of using the JPEG 2000 format in large repositories of digital materials. The project is designed to help develop guidelines and best practices for digital content. The trial will convert up to one million TIFF images to JPEG 2000. Xerox will build and test the system, focusing specifically on creating profiles for the objects; Xerox has already created a profile for using the JPEG 2000 format for newspapers.