
Wednesday, November 16, 2016

A Doomsday Scenario: Exporting CONTENTdm Records to XTF

A Doomsday Scenario: Exporting CONTENTdm Records to XTF. Andrew Bullen. D-Lib Magazine. November/December 2016.
     Because of budgetary concerns, the Illinois State Library asked Andrew Bullen to explore how their CONTENTdm collections could be migrated to another platform. (The Illinois Digital Archives repository is based on CONTENTdm.) He chose methods that would allow him to migrate the collections quickly using existing tools, particularly PHP, Perl, and XTF, which the library already uses as the platform for a digital collection of electronic Illinois state documents. The article walks through the process and shows the Perl code, metadata, and example records. He also started A Cookbook of Methods for Using CONTENTdm APIs. Each collection presented different challenges and required custom programming. He recommends reviewing the metadata elements of each collection, normalizing like elements as much as possible, and planning which elements can be indexed and how faceted browsing could be implemented. The test was to see whether the data could reasonably be converted, so not all parts were implemented. In a real migration, CONTENTdm's APIs could be used as a data transfer medium.
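The article's code is Perl; as a rough illustration of the same idea, a short Python sketch of pulling one record through CONTENTdm's dmwebservices API might look like the following. The server URL, collection alias, and item pointer are placeholders, not values from the article.

  import json
  import urllib.request

  # Hypothetical values -- substitute your own CONTENTdm server and collection alias.
  SERVER = "https://cdm12345.contentdm.oclc.org"   # assumption, not from the article
  ALIAS = "p12345coll1"                            # assumption, not from the article

  def dm_call(query):
      """Call the CONTENTdm dmwebservices API and return parsed JSON."""
      url = f"{SERVER}/dmwebservices/index.php?q={query}"
      with urllib.request.urlopen(url) as resp:
          return json.load(resp)

  # Field definitions for the collection (useful when normalizing like elements).
  fields = dm_call(f"dmGetDublinCoreFieldInfo/{ALIAS}/json")

  # Full metadata record for one item (item pointer 42 is illustrative).
  record = dm_call(f"dmGetItemInfo/{ALIAS}/42/json")

  for f in fields:
      name, nick = f.get("name"), f.get("nick")
      print(f"{name}: {record.get(nick, '')}")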

Tuesday, June 28, 2016

Protecting the Long-Term Viability of Digital Composite Objects through Format Migration

Protecting the Long-Term Viability of Digital Composite Objects through Format Migration. Elizabeth Roke, Dorothy Waugh. iPres 2015 Poster. November, 2015.
     The poster discusses work done at Emory University’s Manuscript, Archives, and Rare Book Library to "review policy on disk image file formats used to capture and store digital content in our Fedora repository". The goal was to migrate existing disk images to formats more suitable for long-term digital preservation. Trusted Repositories Audit & Certification (TRAC) requires that digital repositories monitor changes in technology and respond to them. The Advanced Forensic Format offered a good solution for capturing forensic disk images along with disk image metadata, but libewf by Joachim Metz, a library of tools for accessing the Expert Witness Compression Format (EWF), has replaced it. They have decided to acquire raw disk images or, when that is not possible, to use tar files, because raw images may be less vulnerable to obsolescence.

In attempting to migrate formats, they had to develop methods for migrating the files and set up the repository to accept the new files. They also rely on PREMIS metadata. The migration of disk images from a proprietary or unsupported format to a raw file format has made these objects easier to manage and preserve and mitigates the threat of obsolescence for the near term. There have been some consequences: some metadata is no longer available, the process will be more complicated and require other workflows, and files will no longer contain embedded metadata. "The migration to a raw file format has made the digital file itself easier to preserve."
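As an illustration of the kind of migration step described above, here is a minimal sketch using libewf's ewfexport command-line tool; the file names are hypothetical and the exact flags can vary between libewf releases, so check ewfexport's help output before relying on them.

  import hashlib
  import subprocess
  from pathlib import Path

  # Hypothetical accession; ewfexport writes a raw image (typically adding ".raw").
  ewf_image = Path("accession_001.E01")
  raw_target = Path("accession_001")

  # Export the Expert Witness (EWF) disk image to a raw image; -u runs unattended.
  subprocess.run(["ewfexport", "-u", "-t", str(raw_target), str(ewf_image)], check=True)

  # Record a post-migration checksum so a PREMIS fixity/migration event can be written.
  sha256 = hashlib.sha256()
  with open(raw_target.with_suffix(".raw"), "rb") as fh:
      for chunk in iter(lambda: fh.read(1 << 20), b""):
          sha256.update(chunk)
  print("sha256:", sha256.hexdigest())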



Wednesday, April 06, 2016

Validating migration via emulation

Validating migration via emulation. Euan Cochrane. Digital Continuity Blog. Apr 07, 2016.
     "Automated migration of content between files of different formats can often lead to content being lost or altered." Verifying the migration of content is mostly a manual process, and when done for a large number of objects it is not-cost effective. A possible way to do this is to automatically migrate to preferred formats as much as possible and give users the option of working with the object in the “original” software as well as an emulation service. The users could look at both the migrated and emulated versions and verify that the migrated object is valid. By involving multiple users, the migrated object becomes a trusted object.

If this were done together with migration or emulation on demand, then validated digital objects could be separately ingested into a digital preservation system and preserved along with the original version. This could reduce the storage of migrated versions by "only preserving 'validated' migrated versions" and also ensure that trusted content was "available and properly preserved". 
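A toy sketch of how such crowd-validation might be tallied; the verdict data and the three-confirmation threshold are purely illustrative, not part of the post.

  from collections import defaultdict

  # Illustrative only: the verdicts and the threshold below are made up.
  verdicts = [
      ("obj-001", "alice", True),   # user judged the migrated version faithful
      ("obj-001", "bob", True),
      ("obj-001", "carol", True),
      ("obj-002", "dave", False),   # migrated version judged not faithful
  ]

  confirmed = defaultdict(set)
  rejected = set()
  for obj, user, ok in verdicts:
      if ok:
          confirmed[obj].add(user)
      else:
          rejected.add(obj)

  THRESHOLD = 3   # independent confirmations before the migrated copy counts as "trusted"
  for obj, users in confirmed.items():
      if obj not in rejected and len(users) >= THRESHOLD:
          print(f"{obj}: trusted migrated version, eligible for ingest")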

Saturday, February 27, 2016

Back in a Flash

Back in a Flash. Edith Halvarsson. Open Preservation Foundation, ehalvarsson's Blog. 27 Jan 2016.
     Flashback is a proof of concept project run by the British Library’s Digital Preservation Team that examines emulation and migration solutions as methods for preserving content on CD, DVD, and 3.5” and 5.25” floppy disks. The team acquired original hardware for their legacy lab to analyze and deal with content from those formats, and has found that the old hardware itself can be unreliable. The first step is a capture process that extracts data from the storage media, characterizes its physical components, and lists the files on the media. The content can then be placed in a controlled environment that ensures the bits are retained regardless of deteriorating storage media. The technical information about the content is important for preservation planning.
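A rough sketch of what a capture step like this could look like, assuming GNU ddrescue and cdrkit's isoinfo are installed and the optical drive appears as /dev/sr0 (all assumptions, not details from the post):

  import hashlib
  import subprocess
  from pathlib import Path

  device = "/dev/sr0"              # assumed optical drive
  image = Path("cd_0001.iso")
  mapfile = Path("cd_0001.map")

  # Image the disc; the map file lets ddrescue resume and records unreadable sectors.
  subprocess.run(["ddrescue", device, str(image), str(mapfile)], check=True)

  # Basic characterization for preservation planning: size and fixity of the capture.
  sha256 = hashlib.sha256(image.read_bytes()).hexdigest()
  print(f"{image.name}: {image.stat().st_size} bytes, sha256 {sha256}")

  # List the files on the disc (works for ISO 9660 images).
  subprocess.run(["isoinfo", "-l", "-i", str(image)], check=True)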

For less complex content such as text, the solution is to migrate files from old or obsolete formats to more contemporary and reliable formats. Much of the content, though, is so "tightly bound up with its original environment that it cannot be migrated", as is the case with software. For these items, the option is to emulate the item’s original hardware and software environment, using emulation environments supplied by the University of Freiburg via bwFLA – Emulation as a Service. Flashback is gathering data about the performance and viability of emulation for groups of content, and comparing the characteristics of the software on original hardware and on emulators.

Friday, November 13, 2015

Alternatives for Long-Term Storage Of Digital Information

Alternatives for Long-Term Storage of Digital Information. Chris Erickson, Barry Lunt. iPres 2015. November 2015. [Poster; Abstract]
     This is the poster and abstract that Dr. Lunt and I created and presented at iPres 2015. The most fundamental component of digital preservation is storing the digital objects in archival repositories. Preservation repositories must archive digital objects and associated metadata on an affordable and reliable type of digital storage. There are many storage options available; each institution should evaluate them in order to determine which options best fit its particular needs. This poster examines three criteria to help preservationists determine the best storage option for their institution:
  1. Cost
  2. Longevity
  3. Migration time frame
Each institution may have different storage policies and environments. Not every situation will be the same. By considering the criteria above (the storage costs, the average lifespan of the media and the migration time frame), institutions can make a more informed choice about their archival digital storage environment. The poster has more recent cost information than what is in the abstract.
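As a back-of-the-envelope illustration of how the three criteria interact, the sketch below compares media options over a fixed horizon; the cost and lifespan figures are placeholders, not numbers from the poster.

  # Illustrative comparison only -- the cost and lifespan figures are placeholders.
  options = {
      # name: (cost per TB in dollars, expected media lifespan in years)
      "HDD":      (30, 5),
      "LTO tape": (10, 8),
      "Optical":  (80, 50),
  }

  horizon_years = 50
  size_tb = 100

  for name, (cost_per_tb, lifespan) in options.items():
      migrations = -(-horizon_years // lifespan)    # ceiling division: media generations needed
      total = migrations * cost_per_tb * size_tb    # media cost only, ignores labor and power
      print(f"{name}: {migrations} generations, ~${total:,} media cost over {horizon_years} years")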

Thursday, October 22, 2015

Preparing for format migration

Preparing for format migration. Chris Erickson. Presentation to the Utah State Archives fall conference. October 22, 2015. [PDF presentation]
     The presentation begins with terms and definitions of digital preservation, obsolescence, fixity, migration, refreshing, and formats. Formats include hardware, software, media, and systems. The purpose of migration is to:
  1. Avoid media failure
  2. Avoid obsolescence
  3. Benefit from new technologies
The goal of migration is to change the object to deal with software and hardware developments but not affect the original representation. There are some cautions (cited):
  • “Data migration success rates are never 100%”
  • Successive storage/migration cycles accumulate failures, data corruption and loss.
  • Even if data migration is flawless, repeated migrations will take their toll on the data: “the nearly universal experience has been that migration is labor-intensive, time-consuming, expensive, error-prone, and fraught with the danger of losing or corrupting information.”
The presentation provides an overview of creating a migration plan, advance preparations, and follow-up actions. Some of the issues come from my personal data migrations, as well as corporate examples. In the end, it is important to clearly understand what you have and what you need to do, then to start, even if it is a small step.
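One concrete example of advance preparation and follow-up is verifying fixity on either side of a storage migration; a minimal sketch (the paths are hypothetical):

  import hashlib
  import shutil
  from pathlib import Path

  src_root = Path("/mnt/old_storage")   # hypothetical source
  dst_root = Path("/mnt/new_storage")   # hypothetical destination

  def sha256_of(path: Path) -> str:
      h = hashlib.sha256()
      with path.open("rb") as fh:
          for chunk in iter(lambda: fh.read(1 << 20), b""):
              h.update(chunk)
      return h.hexdigest()

  failures = []
  for src in src_root.rglob("*"):
      if not src.is_file():
          continue
      dst = dst_root / src.relative_to(src_root)
      dst.parent.mkdir(parents=True, exist_ok=True)
      before = sha256_of(src)           # fixity before the move
      shutil.copy2(src, dst)
      if sha256_of(dst) != before:      # fixity after the move
          failures.append(src)          # flag for re-copy rather than silent loss

  print(f"{len(failures)} files failed verification")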

Tuesday, September 22, 2015

Taking Control: Identifying Motivations for Migrating Library Digital Asset Management Systems

Taking Control: Identifying Motivations for Migrating Library Digital Asset Management Systems. Ayla Stein, Santi Thompson. D-Lib Magazine. September/October 2015.
     "Digital asset management systems (DAMS) have become important tools for collecting, preserving, and disseminating digitized and born digital content to library patrons." This article looks at why institutions are migrating to other systems and in what direction. Often migrations happen as libraries refine their needs. The literature on the migration process and the implications is limited; this article provide several case studies of repository migration.A presentation by Lisa Gregory "demonstrated the important role digital preservation plays in deciding to migrate from one DAMS to another and reiterated the need for preservation issues and standards to be incorporated into the tools and best practices used by librarians when implementing a DAMS migration".  Repository migration gives institutions the opportunity to move from one type of repository, such as home grown or proprietary, to another type.  Some of the reasons that institutions migrated to other repositories (by those ranked number 1) are:
  • Implementation & Day-to-Day Costs
  • Preservation
  • Extensibility
  • Content Management
  • Metadata Standards
Formats they wanted in the new system included:

Format      Responses   %
PDF         28          98
JPEG        26          90
MP3         22          76
JPEG2000    21          72
TIFF        21          72
MP4         19          66
MOV         17          59
CSV         16          55
DOC         13          45
DOCX        12          41

For metadata, they wanted the new system to support multiple metadata schemas; administrative, preservation, structural, and/or technical metadata standards; local and user-created metadata; and linked data. In addition, METS and PREMIS were highly desirable.

The new system should support, among others:
  • RDF/XML
  • Ability to create modules/plugins/widgets/APIs, etc.  
  • Support DOIs and ORCIDs
Preservation features and functionality were the ability to:
  • generate checksum values for ingested digital assets.
  • perform fixity verification for ingested digital assets.
  • assign unique identifiers for each AIP
  • support PREMIS or local preservation metadata schema.
  • produce AIPs.
  • integrate with other digital preservation tools.
  • synchronize content with other storage systems (including off site locations).
  • support multiple copies of the repository — including dark and light (open and closed) instances.
The survey suggests that "many information professionals are focused on creating a mechanism to ensure the integrity of digital objects." Other curatorial actions were viewed as important, but some "inconclusive results lend further support [to] claims of a disconnect between digital preservation theory and daily practices". About two-thirds were moving to open source repositories, while one-fifth were moving to proprietary systems.
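Several of the desired preservation features (generating checksums, assigning identifiers, producing AIPs) are often handled with BagIt-style packages; a minimal sketch using the Library of Congress bagit library, with hypothetical directory and metadata values:

  import bagit   # pip install bagit; Library of Congress BagIt implementation

  # Turns the directory into a bag in place, generating sha256 manifests for every file.
  bag = bagit.make_bag(
      "aip_0001",                                   # hypothetical directory of ingested content
      {"Source-Organization": "Example Library",    # placeholder bag-info metadata
       "External-Identifier": "urn:example:aip:0001"},
      checksums=["sha256"],
  )

  # Later fixity verification: re-reads every payload file and checks it against the manifest.
  print("valid:", bag.is_valid())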


Tuesday, August 11, 2015

Digital Heritage: Semantic Challenges of Long-term Preservation

Digital Heritage: Semantic Challenges of Long-term Preservation. Christoph Schlieder. Semantic Web – Interoperability, Usability, Applicability. 05/06/2010.
     This is an excellent article on long term preservation. It is argued that a period of 100 years constitutes an appropriate temporal frame of reference for addressing the problem of semantic aging. Ongoing format migration constitutes currently the best option for temporal scaling at the semantic level.

Digital preservation focuses on finding solutions that scale well along the temporal dimension. In the pre-digital world, the preservation of written records over long periods of time depended on several factors:
  1. The record needs to be preserved physically
  2. The ability to read the record and language need to exist
  3. There must be a community that still shows interest in the record
There are a number of reasons why digital preservation may fail.
  •  Media aging: Any medium that carries a digital encoding will physically deteriorate until it is no longer possible to recover the original bit stream.
  • Semantic aging: The evolution of data formats and the fact that knowledge about data semantics quickly disappears if not specified explicitly.
  • Cultural aging: The community loses interest in some content; when documents are no longer retrieved, maintained, or transmitted, their loss is almost unavoidable.
It is important to identify the time scale for preserving digital items. In the short term, 10 years, there are any number of solutions, so the short-term problem "may be considered solved". The most ambitious time frame for digital preservation is the one promoted by the Long Now Foundation, a formidable period of 10,000 years. This article looks at a more modest period, 100 years. The agenda for this period is:
  1. find strategies to access digital contents from the past 50 years in spite of aging factors
  2. plan the preservation of currently accessible digital content for future use the next 50 years
This problem cannot be solved just by "simply agreeing on a standard format for digital archiving". There are two main digital preservation strategies, migration and emulation. Migration seems to work best for a document-centered workflow, while emulation "constitutes the best solution for archives of highly interactive media". Digital preservation initiatives have come to conceive of preservation as an ongoing process around a digital curation lifecycle model and incorporate migration strategies to reduce risks. This is the basic link between digital preservation and the semantic web. How well this works depends on how solutions scale over the 100 years: "Only by looking at periods that are significantly longer than the 10 years" can we tell how the solutions will work. The aging processes have always been at work. "The best way to overcome the effects of semantic ageing is by migrating digital records into new formats." Long-term preservation forces the research community to adopt a much longer temporal frame of reference.

"Taking cultural ageing seriously means to abandon the idea that digital preservation operates like a time capsule. The picture of content that is enclosed in a digital capsule to be opened at some moment in the future is misleading because it is not the past that sends messages to the future. Rather, it is the present that makes choices, selecting content from the past and linking to it. This ongoing process of linking from the present into the past makes up digital heritage." 


Monday, August 10, 2015

One downside to digital innovation: as formats die, we lose our past

One downside to digital innovation: as formats die, we lose our past. Jess Zimmerman. The Guardian. 5 August 2015.
     Flash, a proprietary animation platform made by Adobe, used to be a leading format for multimedia, but it has fallen out of favor due to security and compatibility issues. The death of Flash may destroy Flash content by making it not only obsolete but irretrievable. When formats become obsolete, something is often lost in the changeover. "That’s not an inevitable factor of age – books, an unusually obsolescence-resistant format, have remained accessible for hundreds of years. But for many other technologies, continued survival means shedding the past." This has happened to audio and video before, such as with VHS tapes. "Access to obsolete video formats will always be constrained by the fact that they require an older, tricky-to-source piece of hardware."

"But as early users move into middle age and beyond, we can’t expect our youth – digital or otherwise – to be accessible forever. We’re aging, but the internet’s aging too." We may need to migrate the digital content to other locations to be more shareable and compatible. "But changes in look and outlook don’t erase the past; it remains as a monument to an obsolete age. Changes in technology sometimes do."


Friday, July 31, 2015

Rosetta Customer Testimonial - Jennifer L. Thoegersen, University of Nebraska–Lincoln

Rosetta Customer Testimonial - Jennifer L. Thoegersen, University of Nebraska–Lincoln. Jennifer Thoegersen. University of Nebraska–Lincoln / Ex Libris. July 5, 2015. [YouTube video.]
      Jennifer Thoegersen, Data Curation Librarian at the University of Nebraska–Lincoln, talks about her experience using Rosetta for managing and preserving different types of digital content, and its impact at UNL. The challenge they were facing was that digital materials throughout the library and the campus were being backed up, but they wanted to do more to actively preserve and manage the materials far into the future. Libraries have been tasked to be the gatekeepers for this information. They have many different types of content, such as research data, audiovisual content, born digital content, websites, and digitized images. They have moved content from CONTENTdm into Rosetta.

One of the things she really likes about being a Rosetta user is that the Rosetta user community is very helpful. The group provides insights into working with different types of situations and challenges, and members share code as well. The major benefits for UNL are the ability to validate their content, monitor their digital assets over an extended period of time, and tailor the system to meet their needs. Rosetta is an open, extendable, and customizable digital preservation system. The implementation team worked well, and they have also been able to work with the system developers to suggest improvements and have those changes added to the system.


Friday, May 15, 2015

What Do We Mean by ‘Preserving Digital Information’? Towards Sound Conceptual Foundations for Digital Stewardship

What Do We Mean by ‘Preserving Digital Information’? Towards Sound Conceptual Foundations for Digital Stewardship. Simone Sacchi. Dissertation, University of Illinois at Urbana-Champaign. 2015. [PDF]
Preserving digital information is a fundamental concept in digital and data stewardship. This dissertation explains what successfully ‘preserving information’ really is, and provides a framework for understanding when and why failures might happen and how to avoid them. The lack of a formal analysis of digital preservation is problematic. Some notes and quotes from the dissertation:
  • "At a high level of generality, bit preservation means enabling the possibility for the same (set of) bit sequence(s) to be discriminated at different points in time, and, potentially, across changes in the underlying storage technology."
  • Bit-level preservation is a means, not the goal, in digital stewardship.
  • As suggested by the OAIS definition of digital preservation, successful digital preservation is about “maintaining” or “preserving” information.
  • Preserving information appears to be a metaphorical expression where a complex set of requirements needs to be satisfied in order for an agent to be presented with intended information
  • The best contemporary theories of digital preservation do not focus on the preservation of any sort of object, but rather on preserving access.
  • It is impossible to preserve a digital document as a physical object. One can only preserve the ability to reproduce the document.
  • "You cannot prove that you have preserved the object until you have re–created it in some form that is appropriate for human use or for computer system applications.”
  • “digital records are not stable artefacts”; they last only when certain circumstances are met
  • Bit preservation is only the first required step for successful digital stewardship. Interpreting the bits such that an intended digital material obtains through appropriate performances is essential as well.
  • Successful digital preservation of information can be conceived as sustained and reliable communication mediated by digital technology and agents involved in the communication process.

Saturday, February 28, 2015

OxGarage Conversion

OxGarage Conversion. Website. February 27, 2015.
An interesting web tool from the University of Oxford for converting documents to different formats. OxGarage is a web-based RESTful service that transforms documents between a variety of formats, using the Text Encoding Initiative (TEI) format as a pivot format. The initial option is to select:
  • Documents
  • Presentations
  • Spreadsheets
There are dozens of source and target formats listed, such as Word, WordPerfect, RSS, PDF, PPT, CSV, and XLS. When you upload an XML document with links to images on your computer, there is an option to add the images as well. If you have a document with links to images on the internet, these will be downloaded and included with your document.
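A rough sketch of driving a RESTful conversion service like this from a script; the base URL, conversion path, and form field name below are illustrative guesses rather than the documented OxGarage API, so check the service's own listing of conversions first.

  import requests   # pip install requests

  # Assumptions, not the real OxGarage endpoints: base URL and conversion path are placeholders.
  BASE = "https://oxgarage.tei-c.org/ege-webservice"
  conversion = "Conversions/docx/TEI/xhtml"        # hypothetical source-to-target path

  with open("report.docx", "rb") as fh:            # hypothetical input file
      resp = requests.post(f"{BASE}/{conversion}", files={"upload": fh})

  resp.raise_for_status()
  with open("report.html", "wb") as out:           # save the converted result
      out.write(resp.content)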
 

Monday, December 08, 2014

Agreement Elements for Outsourcing Transfer of Born Digital Content.

Agreement Elements for Outsourcing Transfer of Born Digital Content. Ricky Erway, Ben Goldman and Matthew McKinley. Dublin, Ohio: OCLC Research. 2014. [PDF]
The article Swatting the Long Tail of Digital Media: A Call for Collaboration (2012) held that few institutions would have the hardware, software, and expertise needed to read all digital media types. A group of archival practitioners started a pilot project to test outsourcing the transfer of content from physical media they could not read in-house. They realized the need for agreements between repositories and service providers to spell out the terms of such collaboration, and began compiling a list of elements that should be considered when creating these agreements.

This article suggests elements to consider when creating an agreement for outsourcing the transfer of born-digital content from a physical medium, while encouraging adherence to both archival principles and technical requirements. The main areas are:
  1. General Provisions: desired outcome, description of work, responsibilities and liabilities
  2. Information Supplied by Service Provider: handling instructions
  3. Information Supplied by Client: content, inventory,
  4. Statement of Work: processing, exceptions, documentation, delivery, acceptance
  5. Cost and Liability: schedule of costs and charges, responsibilities of each party
The parties should agree upon a clear set of requirements regarding the services that the Service Provider is to provide. 




Friday, October 17, 2014

Safeguard the Future of Your Data: Digital Preservation Technology for the U.S. Federal Market.

Safeguard the Future of Your Data: Digital Preservation Technology for the U.S. Federal Market. Hitachi brochure. 2014.
Hitachi’s Digital Preservation Platform (HDPP) is a non-magnetic storage solution that has the ability to preserve unlimited amounts of data for decades on end with minimal migration. The projected capacity of the storage solution is 1 PB per rack by the end of 2014. Offline media is also supported.

Cost-efficiency is another factor when considering long-term preservation. Traditional archives use a migration strategy that requires regular media refreshing which has proven to be costly over time. Migration is an ongoing process that takes a significant amount of resources.

Blu-ray optical media and M-DISC media ensure longevity and compatibility across generations of technology so the data can still be accessed as formats continue to evolve. Blu-ray discs are projected to reach 1 TB per disc. M-DISC capacity is currently 25 GB per disc, with plans for 300 GB per disc. The brochure also includes quick specs and diagrams.
 

Sunday, July 14, 2013

Edward R. Murrow's audio essays with the famous -- and not-so-famous -- have been digitized and put online

Edward R. Murrow's audio essays with the famous -- and not-so-famous -- have been digitized and put online. Computerworld. Lucas Mearian. July 12, 2013.
Over 800 oral essays from Edward R. Murrow's 1950s radio series, This I Believe, have been placed online for public use by Tufts University. The audio collection comes from almost 800 reel-to-reel tape recordings "that were nearly lost forever due to natural wear and tear from more than 50 years in less than ideal storage." The engineers captured the analogue recordings in a 96 kHz, 24-bit high-resolution WAV format.


Thursday, September 20, 2012

Swatting the Long Tail of Digital Media: A Call for Collaboration.

Swatting the Long Tail of Digital Media: A Call for Collaboration. Ricky Erway. OCLC Research. September 2012.
Archiving born digital content stored on a wide range of physical media types requires specialized knowledge, expertise, and equipment to read and preserve the content on physical media, ranging from punched cards to flash drives. In general, transferring content from a particular physical medium requires a compatible computer that can read the data in the format stored on the medium, as well as other hardware and software components, such as cables and drivers. A community-based approach could establish software and workstations for antiquated technology (SWAT) sites, where a few institutions acquire and maintain the technology and expertise to read data and transfer content from particular types of obsolete media.

Tuesday, May 01, 2012

Preserving Moving Pictures and Sound.

Preserving Moving Pictures and Sound. Richard Wright. DPC Technology Watch Report 12-01. March 2012. [PDF]
This excellent report is for anyone with responsibility for collections of sound or moving image content and an interest in preservation of that content. For audiovisual materials, digitization is critical to the survival of the content because of the obsolescence of playback equipment and decay and damage of physical items, whether analogue or digital.

The basic technology issue for audio/visual content is to digitize all items on the shelves, either for preservation or access. The risk of loss is high. Another issue is moving content from the current media to digital files. A third issue is preserving the digital files. This report describes the techniques for preservation planning, digitization and digital preservation of audiovisual content, and describes the technologies.  Preservation of these materials is difficult because they are physically, culturally, and economically different.
The report explains signals and carriers: "Digital technology produces recordings that are independent of carriers. Carrier independence is liberation". Digital preservation of the digitized signal means preserving the numbers, but also the technology to decode the numbers. ‘Maximum integrity’ means keeping the full quality of the audio and video: ‘As far as possible, the new preservation copy should be an exact replica of the original: the content should not be modified in any way’. This may be difficult to achieve.

The two basic kinds of preservation action are: 1) changing the audiovisual content within a collection, or normalization; 2) changing the system that holds the collection.
There are four main factors in an analogue or digital conservation program:
  1. packaging (wrappers), handling and storing;
  2.  environmental conditions;
  3.  protecting the masters; and
  4.  condition monitoring, maintaining quality.

The four PrestoPRIME requirements for effective access to time-based media are:
  1. granularity: division of the content into meaningful units;
  2. navigation: the ability to select and use just one unit,
  3. citation: the ability to cite a point on the time dimension of an audio or video file, with a permanent link
  4. annotation: the ability of a user of content to make time-based contributions
Other topics include access rights and the implications for small collections or institutions. The digitization standards, encoding, wrappers, and metadata are all agreed and well documented. There is no reason for the basic encoding ever to be changed, though wrappers may eventually become obsolete. All archives need to be aware of the risk of losing embedded metadata. Finally, surveys have shown that in universities there is a major problem of material that is scattered, unidentified, undocumented, and not under any form of preservation plan.

Thursday, December 08, 2011

A literature review: What exactly should we preserve? How scholars address this question and where is the gap

A literature review: What exactly should we preserve? How scholars address this question and where is the gap.  Jyue Tyan Low. Cornell University Library. 7 Dec 2011.
There are generally two approaches to the long-term preservation of digital materials:
  1. preserving the object in its original form as much as possible, along with the accompanying systems;
  2. migration or transformation: transforming the object to make it compatible with more current systems while retaining the original “look and feel.”
Migration is the most widely used method, but there can be changes to the original. If some of the original properties are lost, what then are the essential properties for maintaining its integrity? Currently there is no formal and objective way to help stakeholders decide what the significant properties of the objects are, which are defined as:
    "The characteristics of digital objects that must be preserved over time in order to ensure the continued accessibility, usability, and meaning of the objects, and their capacity to be accepted as evidence of what they purport to record."
An important goal of digital preservation is more than just retrieving the objects; it is to ensure the authenticity of the information. A digital object can change as long as the final output is what it is expected to be. The properties to preserve come from the purpose of the object, and at least one purpose for the object needs to be defined. Archivists have created standards that look at records in the context of their creation, intended use, and preservation. It is important to ask which features of the object are important when delivering it to the user. There may be many uses by many communities that were not intended by the object creator, so we should not let the ideal limit the reasonable.

Friday, April 30, 2010

Digital Preservation Matters - April 30, 2010

Digital Preservation: An Unsolved Problem. Jonathan Shaw. Harvard Magazine. April 27, 2010.

With the advantages of digital, why do libraries not embrace the digital future now? One of the main obstacles is the issue of preservation. For books: "the greatest risks to printed material are the environment, wear and tear, security, and custodial neglect." For digital: using data is one of the best ways to preserve it because you know it is usable; digital data must be read and checked constantly to ensure integrity. Another concern about digital is that current formats may not be readable in the future (reference to June 2009 New Yorker cover). Born digital materials are not as easy to save since they have many different formats. This is difficult for librarians keeping records of the university's intellectual life, because of both the legal and digital challenges. "We are in a period of unprecedented lack of documentation of academic output."

---

Gutenberg 2.0. Harvard's libraries deal with disruptive change. Jonathan Shaw. Harvard Magazine. April 27, 2010.

In the scientific disciplines, information, from online journals to databases, must be recent to be relevant. Books in libraries to some seem more like a museum. Some think that massive digital projects will make research libraries irrelevant. The future of libraries is clearly digital. "Yet if the format of the future is digital, the content remains data. And at its simplest, scholarship in any discipline is about gaining access to information and knowledge." Access to the information will mean different things and be done in different ways. In the meantime, "Who has the most scientific knowledge of large-scale organization, collection, and access to information? Librarians."

How do we deal with large scale collections and the access to the information? "We ought to be leveraging that expertise to deal with this new digital environment. That's a vision of librarians as specialists in organizing and accessing and preserving information in multiple media forms, rather than as curators of collections of books, maps, or posters." The role of libraries isn't going away, but it is changing.

The idea that libraries will be stewards of vast data collections raises very serious concerns about the long-term preservation of digital materials. The worry is that the longevity of the resources has not been tested. There are 3 copies of the 109 TB Harvard repository. It is in a constant process of checking and refreshing to make sure everything is readable.

---

The Floppy is Dead: Time to Move Memories to the Cloud. Lance Ulanoff. PC Magazine. Apr 26, 2010.

The decision by Sony to stop producing 3.5-inch disks marks an end to that format. The end of any popular format can have a ripple effect on the technology world. If the data is not migrated to later formats, it could be "trapped on its obsolete format". All media will become obsolete sometime; it is the natural progression of technology. Since change is inevitable, the article suggests everyone consider cloud-based backup storage options, arguing that this is better than storing data on eventually-to-be-obsolete media.

---

Google is not the last word in information. Lia Timson. Sydney Morning Herald. April 29, 2010.

Interesting article concerning primary and secondary sources, what is on the internet and how it gets there, special collections, etc.

  • "Better still is the lesson and the realisation that information and history don't just appear on Google. Someone has to publish it onto the web, put it there in the first place."
  • "As educators we must ask that assignment bibliographies include more than just "three websites". We must insist on a variety of media as sources, including interviews with real people, be they witnesses, historians or surviving relatives, and even insist on trips to the local library."
  • … researching is much wider and deeper than searching online.

---

A Gentle Reminder to Special-Collections Curators. Todd Gilman. The Chronicle of Higher Education. April 29, 2010.

Article and a librarian's experience trying to use special collections. The "job is not to keep readers from your books but just the opposite: to facilitate readers' use of the collections."

---

Tuesday, February 09, 2010

Digital Preservation Matters - February 8, 2010


Online Recordkeeping: It's All in a Name. Mimi Dionne. Internet Evolution. February 2, 2010.


The born-digital record lifecycle has five stages, in chronological order: creation; distribution and use; storage and maintenance; retention; and disposition or archival preservation. All five stages are important. One of the best practices for born-digital records is uniform file naming protocols, including location, to encourage strong content management. These should align with the records retention policies. Organizations are better off if they select the information they need to retain and destroy what they don’t need. “The benefits of implementing a records program that includes regular records destruction have far-reaching influence not only on compliance issues and maintenance of a company’s IT environment but also the health of its budget.”


---


SPIE to Preserve E-Books in Portico. Press Release. Portico. 2 February 2010.

Portico has agreed with SPIE (the international society for optics and photonics) to preserve its collection of e-books, currently 93 items. It already participates with Portico to preserve its e-journals. Portico now holds over 34,000 e-books and over 10,000 e-journals. The SPIE has also announced the launch of their digital library, which includes 120 SPIE Press titles from the Field Guides, Monographs, and Tutorial Texts series.


---


Long-Term Preservation Of Web Archives – Experimenting With Emulation And Migration Methodologies. Andrew Stawowczyk Long. IIPC. December 2009. [54 p. PDF]

The decisions to emulate or migrate are largely based on personal beliefs rather than on any particular evidence; we do not know which of these is more useful in the long term. All objects change over time, so ensuring long-term, useful access to collections requires that we first define the most important aspects of an object that need to be preserved. The “Preservation Intent” may be useful for this: it states what the institution intends to preserve for any given digital object and for how long. Also needed are the creator’s intent, the contextual information, and the technical information.

Two possible approaches for institutions may be:

  1. preserve digital objects over the next twenty years;
  2. find means of preserving objects for longer.

Or an approach may include both: preserve items for 20 years while the search for longer preservation mechanisms continues. “Significant properties” means the properties of a digital object that are essential to the representation of the intended meaning of that object.

The author does not recommend either emulation or migration as a perfect solution to the problem at this current time. Also, their findings and recommendations include:

  1. There are no tools suitable for long-term preservation of very large web archives
  2. All preservation actions need to be based on a clearly defined “Preservation Intent”
  3. Migration and emulation offer some extension of time for short-term access to digital objects.
  4. Emulation seems to present higher risks as a long-term preservation methodology.

It is not possible to preserve it all. Priorities need to be established for practical, long-term preservation solutions. The best hope for adequate long-term preservation lies in continuous and systematic work: researching various preservation methodologies and improving our understanding of the future use of web archives.

---

Is NAND flash about to hit a dead end? Lucas Mearian. Computerworld. February 4, 2010.

IM Flash Technologies has said that shrinking the technology much further may not be possible because of problems with bit errors and reliability. The number of electrons that can be stored in the memory cell decreases with each generation of flash memory, making it more difficult for the cells to reliably retain data.

---

CNRI Digital Object Repository™. Corporation for National Research Initiatives. 19 January 2010.

CNRI has developed a new version of its Digital Object Repository software. It is open source, flexible, scalable, and secure, and has a suite that provides a common interface for accessing all types of digital objects. Redundancy is supported by a mirroring system with software that ensures replicated objects are kept in sync.