Digital Preservation Matters: Preservation Readings 7 October 2005

New ISO standard will ensure long life for PDF documents. ISO Press Release. 7 October 2005. [Updated link, August 7, 2015.]

The archival PDF file format (PDF/A) has been approved as an ISO standard. The standard “enables organizations to archive documents electronically in a way that will ensure the preservation of content and visual appearance over an extended period of time. It also allows documents to be retrieved and rendered with a consistent and predictable result in the future, independent of the tools and systems used for creating, storing and rendering the files.” This will have a significant impact on the digital preservation community, because it will allow documents to be delivered in a standard way for a long time. "PDF/A files will be more self-contained, self-describing, device-independent than generic PDF 1.4 files, and should allow information to be retained longer as PDF." It is estimated that over 9% of the surface web consists of PDF documents. The current standard is ISO 19005-1, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1). Future updates will provide compatibility with additional changes to the PDF specification, but will still standards and applications. An announcement is also available from AIIM and NPES, The Association for Suppliers of Printing, Publishing and Converting Technologies.

Digital History: A Guide to Gathering, Preserving, And Presenting the Past on the Web. Daniel J. Cohen, Roy Rosenzweig. University of Pennsylvania Press. 2005.
http://chnm.gmu.edu/digitalhistory/
http://chnm.gmu.edu/digitalhistory/preserving/

This website has a free online version of the book. It looks at the qualities of digital media and networks that potentially allow us to do things better: capacity, accessibility, flexibility, diversity, manipulability, interactivity, and hypertextuality, as well as the hazards of quality, durability, readability, passivity, and inaccessibility. “One vision of the digital future involves the preservation of everything—the dream of the complete historical record. The current reality, however, is closer to the reverse of that—we are rapidly losing the digital present that is being created because no one has worked out a means of preserving it.”

One chapter deals specifically with digital preservation: the fragility of digital materials, technical considerations, websites, selection of materials, and the future of digital materials. Future preservation should be part of the planning of any digital project. Readers may now understand that digital preservation can require as much work as preserving paper, or more. Any web project requiring a great deal of time to produce also needs a great deal of time to preserve. “It would be a shame to ‘print’ your website on the digital equivalent of the acidic paper.”

The Library of Congress estimates that possibly as much as 10% of its disc collection already contains serious data errors. “No acceptable methods exist today to preserve complex digital objects that contain combinations of text, data, images, audio, and video and that require specific software applications for reuse.”

Archivists who have studied the problem of constant technological change have realized that “the ultimate solution to digital preservation will come less from specific hardware and software than from methods and procedures related to the continual stewardship of these resources.” The book talks about various technologies and software, such as DSpace and Fedora. “Because digital copies are so cheap, it does not hurt to have copies of digital documents and images in a variety of formats; if you are lucky, one or more will be readable in the distant future.” Backing up files is not preservation. Preservation also involves dealing with technological change. Digitization is not preservation either, because digital copies currently cannot be perfect copies of analog materials, though digitization may be the best solution in some cases. “Digital preservation is here to stay.” It is not the total answer, but it is another tool to use. “For now, you are the best preserver of your own materials.” Back up your work and create good documentation.

Microsoft says Office beta coming in November. Ina Fried. CNET News.com. October 3, 2005.

Microsoft to support PDF in Office 12. Martin LaMonica. CNET News.com. October 3, 2005.

Microsoft has been under pressure to provide open formats for Office. It has announced that the next version of Office (version 12, due in the second half of 2006) will support the PDF format: users will be able to convert an Office document to PDF, but PDF files will not be readable within Office applications. The Microsoft XML-based document format will be the default setting. Office 12 will not support OpenDocument. Windows Vista will have a format, called Metro, that will offer features similar to PDF. Microsoft has said that it has been getting 120,000 requests a month for PDF support. Office currently supports the RTF and HTML formats.

Seagate exec: Hard disks anything but obsolete. Martyn Williams. Computerworld. October 5, 2005.

“Hard disk drive technology is anything but dead and isn't in danger of being replaced by memory chips anytime soon,” said a Seagate executive in response to a Samsung announcement. The exchange may be more a reflection of the battle for the storage market.

A new way to stop digital decay. The Economist. September 15, 2005.

The digital documents of today face a serious threat: disappearing. Even simple files may not be readable in the future if the software or hardware needed to read them is obsolete. One strategy is to migrate copies to new hardware and software, but migration can be difficult and can introduce problems of its own. The National Library of the Netherlands is exploring the possibility of a Universal Virtual Computer that is being developed by IBM. It will be able to run programs that can read different file formats. In the future, libraries will have to write software that emulates the virtual computer on each new generation of computer systems. Once that is done, the documents can be read using decoding programs that are written and tested today, while the formats are still readable. Decoding programs have been written for JPEG and GIF, and the PDF format will be added.

Descriptive metadata for copyright status. Karen Coyle. First Monday. 3 October 2005.
http://firstmonday.org/issues/issue10_10/coyle/index.html

One of the main characteristics of digital materials is that they can be reproduced easily. In a networked world, this has caused a near crisis in terms of intellectual property rights. Two approaches to resolving the problem have been to 1) change the copyright law, and 2) protect the digital format. This paper tries to define the metadata needed to provide the copyright information that determines the use of the item. The metadata must be able to capture copyright status and to assert what copyright information is unknown. It must also be able to provide contact information for those who need more information. Currently there is little copyright information in a MARC record. Besides typical information, the metadata needs elements to show the copyright information taken from the piece only, and any additional research undertaken to determine the copyright if it is unknown. Adding copyright information is a burden for those who create the metadata; the lack of information, though, creates an even larger burden for those who would like to use the material. “Copyright–related metadata, therefore, should be seen as an essential component of the resource description.” This should be kept with the work itself.
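The kinds of elements described in this summary can be pictured as a small record. The following is only a rough sketch: the class and field names are hypothetical and are not taken from Coyle's paper or from any metadata standard. It simply shows a record that can state a copyright status (including "unknown"), keep what is printed on the piece separate from later research, and carry contact information.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CopyrightStatus(Enum):
    """Hypothetical status values; 'unknown' is an explicit, assertable state."""
    COPYRIGHTED = "copyrighted"
    PUBLIC_DOMAIN = "public domain"
    UNKNOWN = "unknown"


@dataclass
class CopyrightMetadata:
    """Illustrative record only; field names are assumptions, not a standard."""
    status: CopyrightStatus = CopyrightStatus.UNKNOWN
    statement_on_piece: Optional[str] = None  # information taken from the piece only
    research_notes: List[str] = field(default_factory=list)  # additional research
    contact: Optional[str] = None  # where to ask for more information

    def summary(self) -> str:
        """Build a one-line, human-readable description of the record."""
        parts = [f"Copyright status: {self.status.value}"]
        if self.statement_on_piece:
            parts.append(f"Stated on piece: {self.statement_on_piece}")
        if self.research_notes:
            parts.append("Research: " + "; ".join(self.research_notes))
        if self.contact:
            parts.append(f"Contact: {self.contact}")
        return ". ".join(parts)


# A record for an item whose status has not been determined yet.
record = CopyrightMetadata(contact="rights@example.org")
print(record.summary())
```

Defaulting the status to "unknown" reflects the point that the metadata should be able to say what is not known, rather than leaving the field empty.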

Yahoo Works With 2 Academic Libraries and Other Archives on Project to Digitize Collections. Scott Carlson, Jeffrey Young. The Chronicle of Higher Education. October 3, 2005.
http://chronicle.com/free/2005/10/2005100301t.htm

Yahoo will be working with a number of partners to digitize millions of volumes. These include the University of California, the University of Toronto, the Internet Archive, Adobe, the European Archive, the National Archives of England, O'Reilly Media, and Hewlett Packard Labs. The project will not include copyrighted books unless permission has been obtained. The texts will be available to be searched by other search engines as well as Yahoo. The project is modeled on open source software projects. The Internet Archive has been working on a pilot project with the University of Toronto for about a year. So far, about 2,000 books have been scanned.

Friday, October 07, 2005
