The Archive Ingest and Handling Test: The Johns Hopkins University Report. Tim DiLauro, et al. D-Lib Magazine. December 2005.
Johns Hopkins University (JHU) performed the Archive Ingest and Handling Test with two repository applications, DSpace and Fedora. Their data model consisted of two classes of objects, data-group and data-item, each carrying an identifier, a name, metadata, and group and item IDs. They used METS as a wrapper for each digital object's metadata, generated the SIP package, and then ingested it into the repositories. Bulk ingestion was extremely slow; both DSpace and Fedora imposed constraints on the process, and errors, memory limits, and server issues caused crashes. The size of the collection was also a factor. In the second phase, JHU exported its data to Stanford and imported data from Harvard. Each participant chose a different approach to its dissemination package; having common elements in the process would have been an advantage. In the format transformation phase, the goal was a flexible mechanism for systematic migration of specified content. They chose to migrate JPEGs to TIFFs, adding metadata about the conversion to the TIFF itself in addition to the item metadata. The article reports the problems encountered with both DSpace and Fedora.
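The migration step is easy to picture with a short sketch. This is a minimal illustration, not JHU's actual tooling: it assumes Python with the Pillow imaging library, and it uses the TIFF ImageDescription tag to hold the conversion note.

```python
from datetime import datetime, timezone
from PIL import Image  # Pillow is an assumption; the article does not name the tooling

def migrate_jpeg_to_tiff(src_path: str, dst_path: str) -> str:
    """Convert a JPEG to TIFF and embed a note about the conversion
    in the TIFF's ImageDescription tag (a hypothetical field choice)."""
    note = (f"Migrated from JPEG source {src_path} on "
            f"{datetime.now(timezone.utc).isoformat()}")
    with Image.open(src_path) as img:
        img.save(dst_path, format="TIFF",
                 description=note,          # written to the ImageDescription tag
                 compression="tiff_lzw")    # lossless compression
    return note  # the same note can also be copied into the item metadata

# Example (hypothetical file names):
# migrate_jpeg_to_tiff("item-0001.jpg", "item-0001.tif")
```

Embedding the note in the file itself keeps the conversion history with the TIFF even if it is separated from its item-level metadata.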
Lessons learned included:
· The log files produced during ingest, export, and migration are important and should be structured so they can be used later (see the logging sketch after this list).
· After a bulk operation there should be an easy way to roll back the changes.
· Memory consumption was a big issue.
· When processing the objects, it was necessary to access objects one at a time and write any intermediate results to disk.
· Storing the full METS document in each DataObject complicated the export; the METS should instead have been derived from the content and reassembled during export.
· The configuration information for the repositories should have been stored in one config file and referred to symbolically.
· Be prepared for tools not working perfectly.
· The metadata provided was inconsistent.
· Format registries will form the basis for future automated processes for ingestion and management / migration of content already in repositories.
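The first lesson invites a concrete illustration. The sketch below is an assumption about form, not the report's actual log format: it writes one JSON object per line so a later audit or rollback pass can parse the file mechanically.

```python
import json
import time

def log_event(logfile, op, object_id, status, detail=""):
    """Append one structured event per line (JSON Lines).
    Field names here are illustrative, not from the report."""
    event = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "op": op,              # e.g. "ingest", "export", "migrate"
        "object": object_id,
        "status": status,      # "ok" or "error"
        "detail": detail,
    }
    logfile.write(json.dumps(event) + "\n")

with open("ingest.log", "a", encoding="utf-8") as f:
    log_event(f, "ingest", "data-item:0042", "ok")
    log_event(f, "ingest", "data-item:0043", "error", "OutOfMemoryError")
```

A per-line event log like this can also drive the rollback the second lesson asks for: replay the file and undo every object whose status is "ok".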
---
Archive Ingest and Handling Test: The Old Dominion University Approach. Michael L. Nelson, et al. D-Lib Magazine. December 2005.
Old Dominion University (ODU) was the only non-library participant in the test. The focus was on
· self-archiving objects
· archive models & granularity
· archive export & import
They used the MPEG-21 Digital Item Declaration Language (DIDL) complex object format. Because they do not have an institutional repository, the process is more of a pre-ingest phase. The metadata descriptors follow the general Dublin Core structure, and the file processing workflow is represented in the article. Imported items are given a new identifier based on the MD5 of the file name, and an additional MD5 checksum is generated on the contents of the object for verification. Each object was processed with JHOVE to provide technical metadata and was also checked against the Format Registry Demonstrator. Format conversion is handled as a batch process for archived items, and the workflow is outlined. Through this process, they regarded the repository as “a preservation threat the data must survive.”
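The identifier and fixity scheme is simple to sketch. Assuming Python's standard hashlib (the article does not specify an implementation), one MD5 is taken over the file name to mint the identifier and a second over the contents for later verification:

```python
import hashlib

def make_identifier(filename: str) -> str:
    """Mint the new identifier from the MD5 of the file *name*,
    as the article describes."""
    return hashlib.md5(filename.encode("utf-8")).hexdigest()

def content_checksum(path: str) -> str:
    """Second MD5, over the file *contents*, kept for fixity checks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Example (hypothetical file):
# ident  = make_identifier("page001.jpg")
# digest = content_checksum("page001.jpg")
```

Keeping the two digests separate means a renamed copy keeps its content checksum while receiving a new identifier.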
---
Plextor Ships Industry's First 18x DVD±R Burner. Press Release. Computer Technology Review. February 13, 2006.
Plextor has announced a DVD±R/RW CD-R/RW drive aimed at users who require reliability, high performance, and premium recording features. The drive records DVD±R at 18X on certified 16X DVD±R media and also supports lower speeds.
---
Plumbing and storing e-archives: an industry blooms. Brian Bergstein. January 31, 2006.
With the increase in digital communications, memos, presentations, and other bits of data are increasingly finding their way into "electronic discovery" centers. As one source in the article puts it: "The big risk for companies is too much data that there's really no business need for, being kept in ways that if they had to go looking for it, would be uneconomic." And: "In litigation today, if e-discovery is done wrong, it can have huge implications." Other times evidence comes not from what's in a file but from its "metadata": the automatically applied labels that record such things as when a file was made, reviewed, changed, or transferred.
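As a small illustration of the filesystem-level "metadata" the article refers to, a sketch in Python can read the automatically maintained labels; the field selection here is illustrative, not from the article.

```python
import os
import time

def file_metadata(path: str) -> dict:
    """Read the automatically applied labels the article calls
    'metadata': size plus the timestamps for modification, access,
    and (on Unix) the last metadata change."""
    st = os.stat(path)
    fmt = lambda t: time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(t))
    return {
        "size_bytes": st.st_size,
        "modified": fmt(st.st_mtime),
        "accessed": fmt(st.st_atime),
        "changed": fmt(st.st_ctime),
    }

# print(file_metadata("memo.docx"))  # hypothetical file
```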