Friday, September 30, 2005

Preservation Matters

This blog has been created to post information about preservation of materials, particularly digital materials. Anyone is welcome to post preservation matters to it.

Much of the information that I post here comes from internet resources that I read and summarize. Other comments and resources are welcomed.

The name "Preservation Matters" comes from a newsletter that I publish.

Sept 30, 2005

The Value Proposition in Institutional Repositories. Erv Blythe and Vinod Chachra. EDUCAUSE Review. September 2005.

http://www.educause.edu/apps/er/erm05/erm0559.asp

Institutional repositories must have institutional organization, coordination, and investment. However, they will be successful only when the individuals in the community participate. Institutional repositories are a “managed collection of digital objects, institutional in scope, with consistent data and metadata structures for similar objects, enabling resource discovery.” They need to allow reading, uploading, exporting, and resource sharing. A repository focuses on developing, enhancing, and protecting the value in the creative output of the members of the sponsoring institution. Repositories need a broad scope and a critical mass to be successful. They are valuable to an institution because they house the items together and allow interconnections, archiving, and preservation. The value to individuals comes through sharing resources for research and teaching. But that value also depends on low cost to the individuals, as measured by their time and effort.

Glossary: Image Terminology and Acronyms. TASI. September 28, 2005.

http://www.tasi.ac.uk/glossary/images_glossary.html

TASI has just released an updated glossary of image terminology and acronyms. It covers technical terms and acronyms related to digital imaging, with explanations and additional links where needed. Some examples are:

Archival Image: A digital image taken at the highest practicable resolution and stored securely

Digital Preservation: The Arts and Humanities Data Service (AHDS) describes digital preservation as "the preservation of digital materials and to the preservation of paper based materials and other artefacts through their digitisation"

Image Archive: Collection of images kept in secure storage

Artefacts: A term used to denote unwanted blemishes, which may have been introduced to an image by electrical noise during scanning or during compression

Oldies, Music Rights, and the Digital Age. Peter McDonald. EDUCAUSE Review. September 2005.

http://www.educause.edu/apps/er/erm05/erm0557.asp

The recording industry focuses on current sales, yet looks to national sound archives to preserve the music for the future. “At almost every turn, the industry has stymied the legitimate efforts of recorded sound archives to provide digital preservation of and access to their vast collections of “oldies” (recordings from 1890 to the 1950s).” Archives can let users listen to recordings and, under certain conditions, provide fair use “copies” on an item-by-item basis. But the archives are prohibited from creating digital repositories of commercial audio files. The archives are all about access, whereas the recording industry is all about revenue. The two groups need to find a way to work together.

Toshiba First in World to Develop Notebook PC with HD DVD-ROM Drive. Press Release. 27 September, 2005.

http://www.toshiba.co.jp/about/press/2005_09/pr2702.htm

Toshiba has introduced a laptop with high-definition imaging in the world’s first notebook PC with a slim-type HD DVD-ROM drive. It will be commercially available next year. The height of the drive is less than 13 mm. It has a single optical lens that can read HD DVD discs and read and write to standard DVD and CD. It also comes with a high resolution LCD display.

StoneD: A Bridge between Greenstone and DSpace. Ian H. Witten, et al. D-Lib Magazine. September 2005.

http://www.dlib.org/dlib/september05/witten/09witten.html

This article compares Greenstone with DSpace and describes their similarities and differences. The authors present StoneD, a bridge between the two systems that allows data to migrate between them or be used in combination. The two systems have different goals and strengths, though they can both build digital collections. Greenstone is primarily for building and distributing collections, mostly on the web. DSpace is for self-deposit institutional repositories and preservation of information. StoneD allows data to be imported and exported between the two systems to take advantage of the strengths of each.

USB Flash Drives - You CAN Take It with You. Imation website. 2005.

http://www.imation.com/didyouknow/technology_info/USB_Flash_Drives_Take_it_with.html

· Flash memory has a write endurance limit. This limit is the number of times the flash memory cell can be written until it cannot be restored to its initial condition. The industry refers to this as the erase cycles. The endurance is rated between 10,000 and 100,000 erase cycles for different types of flash memory.

Flash SSDs - Inferior Technology or Closet Superstar? Kelly Cash. BiTMICRO Networks. 2005.

http://www.bitmicro.com/press_resources_flash_ssd.php

· … flash memory chips have a limited lifespan. Further, different flash chips have a different number of write cycles before errors start to occur. Flash chips with 300,000 write cycles are common, and currently the best flash chips are rated at 1,000,000 write cycles per block (with 8,000 blocks per chip). Now, just because a flash chip has a given write cycle rating, it doesn't mean that the chip will self-destruct as soon as that threshold is reached. It means that a flash chip with a 1 million Erase/Write endurance threshold limit will have only 0.02 percent of the sample population turn into a bad block when the write threshold is reached for that block.

· With usage patterns of writing gigabytes per day, each flash-based SSD [solid-state disks] should last hundreds of years, depending on capacity.
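
The lifespan claim above can be checked with rough arithmetic. The following is only a sketch, not the calculation used in either article: it assumes ideal wear leveling (writes spread evenly across all blocks) and no write amplification, and the 16 GB capacity and 10 GB per day workload are hypothetical figures chosen for illustration.

```python
def flash_lifetime_years(capacity_gb, erase_cycles, written_gb_per_day):
    """Estimate years until every block reaches its rated erase-cycle limit.

    Assumes ideal wear leveling (writes spread evenly over all blocks)
    and no write amplification; both are simplifications.
    """
    if written_gb_per_day <= 0:
        raise ValueError("written_gb_per_day must be positive")
    total_writable_gb = capacity_gb * erase_cycles
    days = total_writable_gb / written_gb_per_day
    return days / 365.25


# Hypothetical drive: 16 GB, written at 10 GB per day,
# evaluated at the endurance ratings quoted above.
for cycles in (10_000, 100_000, 1_000_000):
    years = flash_lifetime_years(16, cycles, 10)
    print(f"{cycles:>9,} cycles: about {years:,.0f} years")
```

The estimate scales linearly with capacity and rated cycles, which is why the quoted lifespan depends on capacity. Real drives will fall short of this ideal figure.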

Friday, September 23, 2005

Sept 23, 2005

The digital Dark Age. The Sydney Morning Herald. September 23, 2005.

http://www.smh.com.au/articles/2005/09/22/1126982184206.html

Article about digital preservation, the possibility of losing digital information, and the digital dark age. The computer is the most dramatic record keeping system since the invention of printing. The concern is that the obsolescence of hardware and software systems may make digital information unreadable. Emulators may help read the information. The State Records Authority of New South Wales, Australia, has created a strategy for preserving the information, called Future Proof. It includes conservation, conversion, and migration of information, plus retaining the original equipment (which is a major effort with considerable problems). The answer may lie in a combination of solutions, including keeping a hard copy of the information.

More information about Future Proof is available at their website:

http://www.records.nsw.gov.au/publicsector/rk/guidelines/techdependent/TechDependentTOC.htm

The ten strategies are:

1. Take a planned approach

2. Build partnerships

3. Build recordkeeping systems

4. Use recordkeeping metadata

5. Move records through new formats, media & systems

6. Manage the media

7. Use technical standards

8. Practice data management

9. Retain equipment/technology

10. Use viewer/player technology

Copy Your Digital Photos Onto Film. Mark Goldstein. PhotographyBLOG. September 19, 2005.

http://www.photographyblog.com/index.php/weblog/comments/copy_your_digital_photos_onto_film/

Press release about a laboratory that copies digital images to film. Copying to the latest standards is difficult, and current media may not be readable in the future because of hardware or software problems. With this system, “the picture is systematically reproduced in colour and resolution to the analogue image.” The capability of the recorder is 11 million pixels; the customer can send their images to the lab for copying.

[The blog responses at the end are interesting to read as they discuss digital preservation.]

Toshiba Develops 30Gb Dual-Layer HD DVD-R Discs. PhysOrg.com. September 21, 2005.

http://www.physorg.com/news6649.html

Toshiba announced a 30GB dual-layer HD DVD-R (recordable) disc which extends the capacity of optical discs. The disc is based on the same structure as current DVDs, with bonding of two layers, organic dye, and a spin-coating process for spreading the dye on the discs. The manufacturers will start tests next month to verify the disc compatibility. They hope to finalize the specifications by year end.

Too Much ETL Signals Poor Data Management. Ken Karacsony. Computerworld. September 5, 2005.

http://www.computerworld.com/databasetopics/data/story/0,10801,104330,00.html?source=NLT_DM&nid=104330

When a system uses extensive extract, transform and load (ETL) processes, it is a symptom of poorly managed data and a poorly developed data strategy. IT staff are maintaining more databases that recreate or move data between systems. Much of this is redundant. The best approach is to create a single, sharable database for each major area and design the database to meet the needs of its users. Since information is an organizational asset, it doesn’t belong to just one group or department. So databases must be designed for both the producers and consumers of the data. The entire organization must be involved in defining the relationships and attributes. “The database, and not the application, is the center of the universe.”

Friday, September 16, 2005

Sept 16, 2005

Technology Watch Report: Preservation Metadata. Brian Lavoie. Digital Preservation Coalition. September 2005.

http://www.dpconline.org/docs/reports/dpctw05-01.pdf

Preservation metadata is the information that supports and documents the long-term preservation of digital materials, especially:

· Provenance: Origin and history, and chain of custody

· Authenticity: The document is what it is supposed to be and has not been altered

· Preservation activity: What was done to preserve the item and what the effects were

· Technical environment: The hardware or software to read the document

· Rights management: Any limitations on preserving or accessing the materials

It makes the archive self-documenting. The metadata will accumulate over time. Automated tools are needed for preservation metadata to keep costs from rising to prohibitive levels. We must be able to distinguish preservation metadata from other types. “Preservation metadata is descriptive, structural, and administrative metadata that supports the long-term preservation of digital materials.” Preservation metadata is important because digital items are technology dependent, they are easily altered, and they are bound by intellectual property rights. There is often a brief window of opportunity in which to act. Digital preservation activities often aim to avert damage before it happens, rather than repair it later. It is difficult to anticipate what metadata will be needed over time. Preservation metadata requires that we “get it right” the first time.

A preservation metadata schema must be comprehensive, oriented toward implementation, and interoperable. Metadata plays an important role in preserving content for the long term and in using it. The OAIS model is based on the information package and establishes preservation metadata. PREMIS helps relate the theory and practice of preservation metadata. METS, a metadata standard, is an XML-based structure that can store the metadata either internally in the METS file or by external reference. It is cheaper and more efficient to collect metadata on an item when it is most readily available. We need to explore collaborative methods of gathering and sharing metadata. Resources need to be continually tested and refined.
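
To make the five areas of preservation metadata more concrete, here is a minimal sketch of a record that carries them and accumulates a history of preservation events. The class and field names are hypothetical, chosen only for illustration; this is not the PREMIS or METS schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class PreservationEvent:
    """One preservation action and its observed effect."""
    when: date
    action: str
    outcome: str


@dataclass
class PreservationRecord:
    """Hypothetical record covering the five areas listed in the report summary."""
    identifier: str
    provenance: str             # origin, history, chain of custody
    checksum: str               # authenticity: lets later audits detect alteration
    technical_environment: str  # hardware or software needed to read the item
    rights: str                 # limits on preserving or accessing the item
    events: List[PreservationEvent] = field(default_factory=list)

    def log(self, action: str, outcome: str, when: date) -> None:
        """Append an event; the history accumulates and is never overwritten."""
        self.events.append(PreservationEvent(when, action, outcome))


# Illustrative values only.
record = PreservationRecord(
    identifier="item-0001",
    provenance="Deposited by the author, 2005",
    checksum="(digest recorded at ingest)",
    technical_environment="TIFF 6.0 viewer",
    rights="Open access",
)
record.log("migration", "TIFF converted to JPEG 2000", date(2005, 9, 16))
```

Keeping events in an append-only list reflects the point that preservation metadata accumulates over time.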

Preserving The Archive - A Race Against Time. WAMU radio interview. August 19, 2005.

http://www.wamu.org/audio/mc/05/08/m1050819-8582.ram [Audio]

Interview with Michael Taft, Head of the Archive of Folk Culture at the Library of Congress, and Matthew Barton, an audio engineer. They have mountains of material to deal with. As technology progresses, they can do more with some of these recordings. They have about 100,000 audio items in the collection and have possibly transferred only about 5%. Digital transfer is about preserving these items. None of these media were meant to last forever, and most of the media used were not for professional use. It is a race against time to preserve the items before they are lost. A CD is just another medium that will deteriorate over time, and when it goes it really goes, not like a wax cylinder that you can still listen to as it degrades. We don’t throw out the original, because there may be new ways of getting the recordings off the media, such as taking a digital image of a broken phonograph album and being able to recreate the music.

Expanding the Stage for Political Theater. Jerome McDonough. Bija Gutoff. Apple. September 2005.

http://www.apple.com/pro/video/mcdonough/index.html

Description of a project “to preserve and make available on a global basis” these cultural documents. The number of scholars whose work depends on video documentation is increasing. “But videos don’t last very long. Without the digital library, these performances are not only inaccessible for study, but they’re in danger of disintegrating.” “We want to preserve these materials for the long term - and in the library world, long term means 300 to 500 years!”

They try to capture performances as 4:2:2 uncompressed 10-bit files, but the large files must be compressed to put on DVD or the Internet. They hope to create uncompressed masters on hard drives, and move away from DigiBeta. “When you throw out color information during sampling, you’re using lossy compression; then, each time you change formats, you introduce artifacts that can damage the video stream. But if we capture the complete video signal, we’ll be able to migrate to new formats without having to worry about introducing artifacts.” “Having access to these performances is vital to scholars who want to achieve a deeper understanding of the cultural and political life of the Americas.” “By bringing all this material together in one place, making it publicly available and ensuring that it will live on and be available in the future, our library is making a real contribution to scholarship in the world.”
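
A rough calculation shows why these files are large. The sketch below is an illustration under stated assumptions, not the project's own figures: the article gives no frame size or frame rate, so the 720x486 frame at 29.97 frames per second is a hypothetical standard-definition example, and audio and blanking intervals are ignored. The samples-per-pixel values follow from the usual meaning of the subsampling ratios: 4:4:4 keeps full color for every pixel, 4:2:2 halves the horizontal color resolution, and 4:2:0 halves it in both directions.

```python
# Average number of samples stored per pixel (one luma plus shared chroma).
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}


def bytes_per_hour(width, height, fps, bit_depth, subsampling):
    """Storage needed for one hour of uncompressed video, picture data only."""
    bits_per_frame = width * height * SAMPLES_PER_PIXEL[subsampling] * bit_depth
    return bits_per_frame * fps * 3600 / 8


# Hypothetical standard-definition capture at 10 bits per sample.
for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    gigabytes = bytes_per_hour(720, 486, 29.97, 10, scheme) / 1e9
    print(f"{scheme}: about {gigabytes:.0f} GB per hour")
```

Under these assumptions, 4:2:2 already stores one third fewer samples than 4:4:4, and 4:2:0 stores a quarter fewer again, which is the color information "thrown out during sampling" that the quotation refers to.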

Web ARchive Access (WERA). Website. Nordic Web Archive. August 31, 2005.

http://nwa.nb.no/

WERA is a tool for searching and displaying archived collections of web documents. The documents can also include different versions of the same document. An overview shows the dates of the various versions. The archived web documents are stored in ARC files. The tool is freely available for download.

Samsung Predicts End Of Hard Drives. Chris Mellor. Computerworld. September 13, 2005.

http://www.computerworld.com/newsletter/0,4902,104582,00.html?nlid=HW2

Samsung has just created a 16GB flash chip, and expects that computer hard drives will be replaced by solid-state flash memory. Flash memory continues to double its density about every 12 months. Laptop memory cards with 32GB of memory should be available in 2006 or 2007. Here is a link with more information on flash memory and how it works. http://en.wikipedia.org/wiki/Flash_memory

Friday, September 09, 2005

Sept 9, 2005

National Archives Names Lockheed Martin to Build Archives of the Future. NARA Press Release. September 8, 2005.

http://www.archives.gov/press/press-releases/2005/nr05-112.html

NARA has awarded a $308 million, six-year contract to Lockheed Martin to build the Electronic Records Archives. The system will “capture and preserve the electronic records of the federal government, regardless of format, ensure hardware and software independence, and provide access to the American public and Federal officials.” This comes after a year-long competition between two firms.

The CPU's next 20 years. Tom Yager. InfoWorld. September 07, 2005.

http://www.infoworld.com/article/05/09/07/37OPcurve_1.html

The Intel Itanium computer processor is not like the current computing processors. It is seen as incompatible with everything else. The road ahead isn’t about hardware at all. It will be about development suites and tools that can optimize an application based on changing environments. We’ll end up with a naturally occurring matrix of CPU types and deployment patterns that provides customers with meaningful choices.

LDS Church to put microfilmed records online. Daily Herald. September 10, 2005.

http://www.harktheherald.com/modules.php?op=modload&name=News&file=article&sid=64017

The LDS Church announced plans to digitize and index the more than 2 million rolls of microfilmed genealogical records which are stored in granite vaults near Salt Lake City. "Currently, you have to look at images on paper or burn them on a CD and distribute those to index the data. We're moving the whole process to the Internet."

DSpace Federation 2nd User Group meeting. Conference held on 6 July 2005. 9 September 2005.

http://www.lib.cam.ac.uk/dspace/usergroup2005/programme.htm

The program from the DSpace user group meeting is now online. The website includes descriptions of the presentations and the PowerPoint slides; some include Word or PDF documents as well. The presentations include:

· The Australian National University: Case Study: Creating Publications from a DSpace Repository

· Using Multiple Metadata Formats in DSpace

· Exploring Strategies for Digital Preservation for DSpace@Cambridge

· Expanding the Focus of the IR: Scholars' Bank at the University of Oregon

· Introduction: Incorporating local developments to DSpace

· Conversion and metadata extraction frameworks

· Use of DSpace as an audiovisual archive

Sony goes 8x for Double Layer DVD+R writing. Kelly Ellis. PC Pro. September 6, 2005.

http://www.pcpro.co.uk/news/77145/sony-goes-8x-for-double-layer-dvdr-writing.html

Sony has introduced a double layer DVD burner that can burn the 8.5 GB discs at 8x speed. It can also burn many other DVD formats and speeds, and also CDs.

Random Musings on Apple’s iPod Nano. Harry McCracken. PC World. September 07, 2005.

http://blogs.pcworld.com/techlog/archives/000872.html

“If flash memory was as cheap per gigabyte as hard-disk space, and available in disk-like capacities, the hard drive might go away. That’s not going to happen anytime soon, but I suspect that flash memory will start to replace drive storage in some devices over the next few years, resulting in smaller, more reliable products.”

Friday, September 02, 2005

Sept 2, 2005

An Audit Checklist for the Certification of Trusted Digital Repositories. RLG. August 2005.

http://www.rlg.org/en/pdfs/rlgnara-repositorieschecklist.pdf

This 70-page document is for those who are responsible for digital repository certification and for those who will carry out the process. The requirements touch every part of the repository and the institution. The analysis of the functions and requirements of the repository can help assure the repository is operating according to best practices. The document relies on the “Trusted Repository” and OAIS documents. This draft is for public comment. The document outlines the audit and certification process, the criteria to be used, a checklist, and a glossary of terms. This is a very thorough method of certifying that the repository and the organization exist and are following the standard practices as outlined in other documents and international standards.

The audit & certification criteria are organized as follows:

· Organization. Making sure the organization is viable, that it has the appropriate staff and structure; that there is accountability for actions; that it is financially sustainable. The organizational attributes are just as important as the technical. It must follow prevailing standards, policies, and practices. Ongoing training is important. Repository review processes should be annual. The appropriate contracts, agreements, and licenses must be in place to detail the rights, responsibilities and expectations of those involved.

· Repository function. The processes and procedures exist to ingest, manage and provide access to digital materials for the long term. There are minimal conditions for the preservation of the information packages. Documented and demonstrated strategies must be in place. Metadata must allow the items to be located and managed. The digital objects accepted by a repository for preservation should reflect both its mission statement and the interests of the designated community, and the relationships must be clearly understood. Complete documentation is needed, which may include metadata, codes, sample forms, record layouts, explanations, minimum and maximum values, and related studies and results. The repository must know what will be preserved for each object. It must verify and authenticate each object, and monitor the integrity of the items. Every item must have descriptive information, and the method of getting it needs to be documented. The minimum requirements may be very basic. Access should always deliver what is requested, or else a reason why it is not possible.

· The designated community. The users should be identified, and they must be able to understand the information. The information returned must be usable. The understandability and usability should also be verified.

· Technologies & technical infrastructure. The technical aspects are not prescribed, but good computing practices are required. The certification looks at general system infrastructure requirements; the use of technologies and strategies appropriate to the community; and security. Security can refer to the environment, data, systems, personnel, physical plant, security needs, etc. The disaster plan should be tested regularly.
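
The requirement above that a repository monitor the integrity of its items is commonly met with checksums, although the checklist as summarized here does not prescribe a method. The following is a minimal sketch of that approach; the function names and the manifest layout (a mapping of file paths to digests recorded at ingest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest):
    """Compare each file against the digest recorded for it at ingest.

    `manifest` maps file paths to expected hex digests.
    Returns a dict of path -> "ok", "altered", or "missing".
    """
    report = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            report[path] = "missing"
        elif sha256_of(path) == expected:
            report[path] = "ok"
        else:
            report[path] = "altered"
    return report
```

A repository would record the digest when an object is ingested and rerun the audit on a schedule, so that any "altered" or "missing" result is caught while a good copy may still exist elsewhere.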

Plasmon Launches Compliant Write Once UDO Media. Computer Technology Review. August 30, 2005.

http://www.wwpi.com/index.php?option=com_content&task=view&id=823&Itemid=39

Plasmon has launched the new UDO (Ultra Density Optical) Compliant Write Once media. It is designed for archive applications that are subject to regulatory compliance and for Information Lifecycle Management environments. The media combines Write Once authenticity with the ability to physically destroy records on the media according to data retention and disposition regulations. This hybrid media is offered in addition to Rewriteable media and Write Once media.

Hitachi Unveils World's First Terabyte DVD Recorder. Reuters. eWeek. August 24, 2005.

http://www.eweek.com/article2/0,1895,1851795,00.asp

Hitachi has unveiled the world's first hard disk drive/DVD recorder that can store one terabyte of data. The primary target market is for digital broadcasting. The recorders will go on sale next month in Japan.

Indian classical treasure-trove goes digital. Fakir Balaji. Hindustan Times. August 30, 2005.

http://www.hindustantimes.com/news/181_1476380,00040006.htm

A project between Carnegie Mellon University and the Indian government will digitize a million rare manuscripts, palm leaves, copper plates and age-old classical literature. About 130,000 documents have been scanned in 31 digital centers across the country. The target is to reach about a million by 2008. The documents are brought to the centers, digitized, and returned to the owners. They intend to offer full text searching. Font recognition is a problem for the optical character recognition. The site is available at http://dli.iiit.ac.in/.

Sun Starts Digital Rights Project. Tom Sanders. Forbes. August 23, 2005.

http://www.forbes.com/personaltech/2005/08/23/sun-copyright-technology-cx_vnu_0823sun.html?partner=my_msn

Sun Microsystems intends to create an open and free digital rights management (DRM) technology, which ensures access to digital content for legitimate users but blocks use that violates copyright licenses. Other companies have created similar systems. The large number of DRM systems and the incompatibilities are causing problems. By creating the open technology, they feel they can set an industry standard. "We fundamentally believe that a federated DRM solution must be built by the community, for the community."