Friday, July 31, 2009

Update on Progress - Fedora Preservation and Archiving

From: fedora_preservation-bounces@email.rutgers.edu On Behalf Of Ron Jantz
Sent: Friday, July 31, 2009 9:06 AM
To: fedora_preservation@email.rutgers.edu
Subject: Update on Progress - Fedora Preservation and Archiving

To All,

This email provides an update on progress of the Fedora Preservation and Archiving Solution Community (Wiki url attached below). From our Birds of a Feather (BOF) discussion at the OR 2009 conference, there was considerable interest in preservation policies. We have posted on the Wiki several policies from libraries and archives that you may find useful. There is also a "model" document available that you can use to develop a preservation policy framework.

There has been considerable interest in sites that are actually doing digital preservation. From the survey that we conducted in the Spring, many of you submitted urls - these have been posted under the category "Preservation and Archiving Sites" on the Wiki.

From the BOF session, there was also considerable interest in case studies. We are working to trial a few case studies. However, we would also like to get feedback from the list on what topics, or more specifically, what case studies might be of interest.

Finally, we are exploring the possibility of doing another BOF discussion at the iPres 2009 conference in San Francisco (October 5 & 6). Your comments and feedback will help us focus the BOF on your specific needs.

Thanks for your interest and comments. If you know others who want to join the list, the url is attached below.

Preservation and Archiving Solution Community - Core Team

Chris Erickson, Nancy McGovern, and Ron Jantz

-----------------------------------------------------------------------------------------------------

Wiki:

http://www.fedora-commons.org/confluence/display/FCCWG/Preservation+and+Archiving

Join the list:

https://email.rutgers.edu/mailman/listinfo/fedora_preservation


Friday, April 24, 2009

Digital Preservation Matters - 24 April 2009

Archiving Writers' Work in the Age of E-Mail. Steve Kolowich. The Chronicle of Higher Education. April 10, 2009.

Archiving authors' materials is difficult in the digital age. Authors have kept their information on hard drives, floppies, and other kinds of disks, in a variety of formats. Three things are clear:

  1. the digital age will transform the way libraries preserve and exhibit literary collections.
  2. universities must spend money on new equipment and training for their archivists.
  3. scholars will be able to learn more about writers than they ever have before.

Archivists must know how to transfer data to new machines, since old machines will not survive for long. They must continue doing what they have been doing, but now do more. The files may give more information about authors and their influences. This is just the beginning because the authors may also have online accounts, such as Facebook, MySpace, Flickr, email, and others. It is not always clear who owns what data. “The speed at which universities adopt digital curation may depend on their willingness to divert funds from more traditional areas.”



DVD Copying Case: Why You Should Care. Christopher Bree. Macworld. April 24, 2009.

Details of a court case about copying DVDs and copy-protection. It has implications for fair use, archival copies, and the technology to create the copies.



The UN's World Digital Library. Frances Romero. Time. Apr. 22, 2009

On April 21, UNESCO and the Library of Congress unveiled the World Digital Library, which will allow institutions to share cultural and educational data. Users can browse objects by place, time, topic, type, and institution.

Friday, April 17, 2009

Digital Preservation Matters - 17 April 2009

Working Together or Apart: Promoting the Next Generation of Digital Scholarship. Report of a Workshop Cosponsored by the Council on Library and Information Resources and The National Endowment for the Humanities. March 2009. [88p. pdf]


Asking Questions and Building a Research Agenda for Digital Scholarship. Amy Friedlander

Searching across large collections is important but there are other opportunities for analysis and presentation, such as visualization, so users can identify patterns and differences as well as display results. The next generation will be graphically oriented, so visual means will be important for the analysis and not just the presentation. The Web is a graphical medium and can increase the possibility of confusion and misinformation. It also has a different notion of literacy.

The challenges of managing digital collections over time are substantial, but the goals are clear:

  • Allow digital collections to be explored, expanded, and repurposed
  • Users must be able to trust the repositories to safeguard their contents and make them viewable on request
  • Managing digital collections is a fundamental condition for any research agenda.


Tools for Thinking: ePhilology and Cyberinfrastructure. Gregory Crane, et al.

“Our ultimate goal must be to make the full record of humanity accessible to every human being.” The universal library is an unattainable point of reference but something to work towards. We need to build an infrastructure with at least three kinds of access:

  • Access to digital representations of the human record. This may have more information than the physical object.
  • Access to labeled information about the human record.
  • Access to automatically generated knowledge.

The Changing Landscape of American Studies in a Global Era. Caroline Levander.

Digital archives can offer new opportunities for rethinking and bringing materials together. A digital archive can reach researchers who may not otherwise have access to the materials. It can bring together materials that exist in different geographic locations and increase collaboration among an international audience.


A Whirlwind Tour of Automated Language Processing for the Humanities and Social Sciences. Douglas W. Oard

We never seem to get to the ends that we are trying to achieve. This may be because “those who could build these marvels don’t really understand what marvels we need, and we, who understand what we need all too well, don’t really understand what can be built.”

To get the future right:

  • Build useful tools, but don’t try to automate the intellectual work of scholars.
  • Dream big. Progress comes from combining the vision of what is needed with the understanding of what is possible.
  • Waste money wisely.
  • Don’t reinvent the wheel.
  • Make friends. Others have been working on these projects.



Information Visualization: Challenge for the Humanities. Maureen Stone.

“Digital archiving creates a vast store of knowledge that can be accessed only through digital tools.” Users of this information will need to be able to use the tools of digital access, exploration, visualization, analysis, and collaboration. This is a new form of literacy which must become fundamental for humanities scholars. Collaboration or sharing is fundamental to the Web and to digital archiving.



Art History and the New Media: Representation and the Production of Humanistic Knowledge. Stephen Murray

Instant and free access to information across geographic and institutional boundaries has made its value plummet in an economic sense. We value what is scarce, not what is plentiful, and the precious entity is now attention, which is always finite and claimed by many sources at the same time.

Digital Preservation Matters - 10 April 2009

Library of Congress in New Media Initiatives. Weekly News Digest. March 30, 2009.

The Library of Congress will start sharing video and audio content on YouTube and iTunes in order to make its resources more widely accessible. New video and podcasting channels will be devoted to LC content. The GSA (General Services Administration) also announced agreements with Flickr, YouTube, Vimeo, and blip.tv to allow other federal agencies to participate while meeting legal requirements and the needs of government. GSA plans to negotiate agreements with other providers; LC will explore these new media services.

LC has loaded 3,100 historic photos on Flickr in a photo-sharing service and will add new photos each week. Users have helped curators with new information on the photos with public review and tagging. Their photos have received more than 15 million views.


More authors turn to Web and print-on-demand publishing. Elham Khatami. CNN.com. April 6, 2009.

Companies like Author Solutions or Lulu.com allow any author to submit a digital manuscript. These publishers use print on demand, which produces hard copies of a book only when a customer buys one. The author retains the copyright to his or her book and is responsible for all costs, from printing to marketing. Lulu has digitally published more than 820,000 titles, with about 5,000 new titles added each week. Author Solutions has helped about 70,000 authors publish over 100,000 titles, with costs ranging from $399 to $12,999. Print-on-demand publishing is growing, and self-publishing through "vanity presses" is diminishing. On-demand publishing is more flexible, and there is less of a commitment on the author's part. Traditional publishers can benefit from the services of self-publishing companies, and can use them to find new and upcoming authors.


Preserving digital photos: What not to do. Isaiah Beard. Page2Pixel. Apr. 6, 2009.

Concerns about preserving born-digital photos. Trying to preserve them as printed copies leads to loss of image fidelity and technical metadata, and makes it impossible to adjust or enhance the image. It is best to keep them in digital format. The world of digital curation is addressing the best practices for doing this.


Copyright and Related Issues Relevant to Digital Preservation and Dissemination of Unpublished Pre-1972 Sound Recordings by Libraries and Archives. June M. Besek. CLIR Report. March 2009. [93p. pdf]

This looks at the complex ownership rights related to pre-1972 unpublished recordings and the related laws which govern them; in particular it looks at streaming works rather than downloading them. Experts believe that “the future of audio preservation is in the digital arena” and that involves making multiple copies. There are many issues that must be resolved, such as the definition of “premises,” fair use, what “published” means, and the new digital technologies available.

Tuesday, April 07, 2009

Digital Preservation Matters - 03 April 2009

Nevada Statewide Digital Initiative. Website. Updated 3 April 2009.

The purpose of the Nevada Statewide Digital Initiative is to: “Increase access to the collections held by Nevada's cultural heritage institutions through digital access to materials by residents of Nevada and scholars and researchers interested in Nevada's culture and history.” The series of activities to build statewide collaboration include:

  1. creating a collection policy;
  2. creating a website that links existing projects;
  3. adopting statewide best practice and standards;
  4. creating local partnerships that would build up to statewide partnerships;
  5. developing a digital pilot project to curate and manage their digital materials.

Millenniata continues to make progress with its patent-pending Millennial Disc and Millennial Writer. Press Release. February 2, 2009.

This press release has information about a new optical disc that has been developed. It is designed to be a permanent archiving product that has no degradable components and “safely stores data for 1,000 years”. The technology makes a permanent change to the disc. It is referred to as Write Once Read Forever™ and can be read in a standard DVD drive. [check back for test results.]


Systemwide organization of information resources: a multiscalar environment. Lorcan Dempsey. Higher Education in a global economy: the implications for technology and JISC. 23 March 2009. [pdf presentation]

Interesting presentation that looks at libraries and their environment. Compares core components of companies and libraries. Examines a grid of Uniqueness and Stewardship, from Freely accessible web resources in the low-low quadrant to Special collections in the high-high quadrant, and shows where preservation appears. Moving from the institution to the multiscalar level.


Digital Project Staff Survey of JPEG 2000 Implementation in Libraries. David Lowe, Michael J. Bennett. University of Connecticut. March 20, 2009. [xls]

Preliminary findings of a survey about JPEG 2000, conducted to understand the community's perception of it. JPEG 2000 is the product of efforts for an open standard. The concerns about implementing JPEG 2000 include: limited software tools, lack of functionality, and uncertainty of need. Some survey results of interest:

  • 59.5% said they use the format
  • 19.7% use it for new archival collections
  • 16.3% use it for converting TIFF collections
  • 53.5% use it for online access

Other questions discuss the tools used and include comments about them.


Rocks Don't Need to Be Backed Up. Henry Newman. Enterprise Storage Forum. March 27, 2009.

General article about the need for digital preservation. “The first thing we need is a standardized framework for file metadata, backup and archival information.” “The integrity of modern data is not guaranteed except at high cost.” “We have no real framework to change and transcribe formats.”

[This is more about transferring information between computer systems rather than archival metadata. It shows the lack of interaction between digital preservation worlds. Some of the comments about the article are interesting.]


Goodbye, Encarta. A cautionary tale for newspapers? John Yemma. The Christian Science Monitor. March 31, 2009.

An article about how Wikipedia replaced the Encarta digital encyclopedia and what that points to. What Encarta did not do was embrace the power of the internet, which includes almost instant updating. The “lesson is that general knowledge … can’t withstand an effort that was developed specifically for the Internet and that harnesses gifted amateurs.” There is power in open-source knowledge. Organizations can take their values with them, but they can’t take the old model or the old work habits. “The Web is its own universe with its own rules.”


INSIGHT into issues of Permanent Access to the Records of Science in Europe. PARSE.Insight. March 27, 2009. [pdf]

This document gives an overview and details of the technical and non-technical components which would be needed for science data infrastructures. The infrastructure components are aimed at bridging the gaps between areas of functionality, developed for particular projects, separated by either discipline or time. These components should play a unifying role in science data. They are developed within a Europe-wide infrastructure, but there should also be advantages if these components are used more widely. The group has defined four main roles: funding, research, publishing, and storage/preservation.

Science Data Infrastructure: those things, technical, organizational and financial, which are usable across communities to help in the preservation, re-use and open access of digital holdings.

Preservation: meant in the OAIS sense of maintaining the usability and understandability of a digital object.

Representation Information: the OAIS term for everything that is needed in order to understand a digital object.

The report discusses some major threats. Those who responded marked these as “Important” or “Very Important”:

  1. Users unable to understand or use the data e.g. the semantics, format, etc
  2. Inability to maintain the hardware, software or environment, which may make the information inaccessible
  3. No chain of evidence causing uncertain provenance or authenticity
  4. Access and use restrictions may not be respected in the future
  5. Inability to identify the data location
  6. The current data custodian may cease to exist
  7. Those responsible to look after the digital holdings may let us down

It must be possible to hand any of the components to another organization, and the Persistent Identifiers must transfer and resolve correctly. In general it is not possible to state that an object is authentic; one can only provide evidence, such as technical details, to show the provenance of the object, or rely on a social decision of trust.
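
The hand-over requirement can be pictured as a lookup table in which the identifier stays fixed while the custodian and location behind it change. The sketch below is only an illustration under stated assumptions: the Resolver class, its method names, and the "example:0001" identifier are hypothetical, everything is held in memory, and it is not modeled on any real identifier service or on the PARSE.Insight report itself.

```python
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Hypothetical in-memory resolver: identifier -> (custodian, location)."""

    records: dict = field(default_factory=dict)

    def register(self, pid: str, custodian: str, location: str) -> None:
        """Record where an object currently lives and who looks after it."""
        self.records[pid] = (custodian, location)

    def hand_over(self, pid: str, new_custodian: str, new_location: str) -> None:
        """Move an object to a new custodian without changing its identifier."""
        if pid not in self.records:
            raise KeyError(f"unknown identifier: {pid}")
        self.records[pid] = (new_custodian, new_location)

    def resolve(self, pid: str) -> str:
        """Return the current location for an identifier."""
        return self.records[pid][1]


# Usage: the identifier cited by users never changes, only what it points to.
resolver = Resolver()
resolver.register("example:0001", "Archive A", "https://a.example.org/obj/1")
resolver.hand_over("example:0001", "Archive B", "https://b.example.org/obj/1")
current_location = resolver.resolve("example:0001")
```

The point of the design is that citations made before the hand-over keep working afterward, because only the table entry is updated.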


Wednesday, April 01, 2009

Digital Preservation survey

We are interested in gathering some basic information regarding digital preservation initiatives by organizations, especially examples involving repositories and related tools. If you are working on a digital preservation project or initiative, please complete the survey. It's brief and will only take a few minutes:

http://www.surveymonkey.com/s.aspx?sm=QxAuojbaOoS2LpoJiqWW8A_3d_3d.

We are interested in responses regarding the use of any repository, not only Fedora. We are conducting the survey as part of the launch of the Fedora Commons Preservation and Archiving Solutions Community, a new group that is focused on providing examples of preservation in action. A challenge for organizations is in getting practical examples to use in modeling their own implementations. The preservation solution community hopes to bring together individuals and organizations to make implementation easier.

It would be ideal if you could complete the survey by April 15, 2009 because we are hoping to present preliminary results at a birds-of-a-feather (BOF) session at the Open Repositories Conference 2009 in mid-May in Atlanta, Georgia. However, the survey will remain open after the above date to continue to gather responses.

Thanks for your input.
Chris Erickson, Ron Jantz, Nancy McGovern
Fedora Preservation Solutions Community - core team

Friday, March 27, 2009

Digital Preservation Matters - 27 March 2009

Farewell to the Printed Monograph. Scott Jaschik. Inside Higher Ed. March 23, 2009.

The University of Michigan Press announced it will shift its scholarly publishing from a traditional print operation to primarily digital. They expect most of their monographs to be released only in digital editions. Readers will still be able to use print-on-demand systems, but the press will consider the digital monograph the norm. They say it's time to stop trying to make the old economics of scholarly publishing work. The press expects to publish more books, and to distribute them electronically to a much broader audience. "We will certainly be able to publish books that would not have survived economic tests and we'll be able to give all of our books much broader distribution." Michigan plans to develop site licenses so that libraries could gain access to all of the press's books over the course of a year for a flat rate.

Other presses are also experimenting with the digital format. Pennsylvania State University Press publishes a few books a year in digital, open access format. All chapters are provided in PDF format, half in a format to download and print, and half in read only. Readers may pay for print-on-demand versions.


PREMIS Data Dictionary for Preservation Metadata. Sarah Higgins. DCC Watch Report. 25 March 2009.

This is a 3 page overview to the PREMIS data dictionary, “the current authoritative metadata standard for digital preservation” and a brief look at its use in an Institutional Repository.


Thomson Introduces mp3HD File Format. Press Release. March 19, 2009.

The company has introduced the new mp3HD format which “allows mathematically lossless compression of audio material while preserving backward compatibility to the mp3 standard.” The mp3HD files have additional information which, when combined with the mp3 portion of the file, can be played on an mp3HD-capable player. Standard mp3 players would play only the mp3 portion of the file. A program can create mp3HD files from stereo material in 16-bit, 44.1 kHz wav files. It is available on Linux and Windows.


Internet Archive to unveil massive Wayback Machine data center. Lucas Mearian. Computerworld. March 19, 2009.

The Internet Archive has a new computer that fits in a 20-foot-long outdoor metal cargo container filled with 63 server clusters with 4.5 petabytes of storage and 1TB of memory. They have 151 billion archived web pages in addition to software, books and a moving image collection with 150,000 items and 200,000 audio clips. The Internet Archive also works with curators in about 100 libraries to help guide the Internet crawls.


Tuesday, March 24, 2009

Challenges facing Church history

R. Scott Lloyd. Church News. March 14, 2009.
Lecture presented at the Church History Symposium at BYU on "Preserving the History of the Latter-day Saints."
Mark L. Grover, a subject librarian at BYU who has spent 30 years gathering the history of the Church in Latin America, lamented that original records and documents are often in jeopardy of being destroyed by those who don't understand or appreciate their significance. There are several approaches being taken. "Some significant historical material surely has vanished, but much of it is still intact in private possession, and there is an increasingly greater probability that digital technology will improve the preservation odds."

Friday, March 20, 2009

Digital Preservation Matters - 20 March 2009

International Data curation Education Action (IDEA) Working Group: A Report from the Second Workshop of the IDEA. Carolyn Hank, Joy Davidson. D-Lib Magazine. March 2009.
This is a report of the workshops held in December, with links to programs and resources. In general the article acknowledges that curation of digital assets is a central challenge and opportunity for libraries and other data organizations. In order to meet this challenge, skilled professionals are needed who are trained “to perform, manage, and respond to a range of procedures, processes and challenges across the life-cycle of digital objects.” The presentations discuss developing a graduate-level curriculum to prepare master's students to work in the field of digital curation. Among the curricula at the institutions are: preparing faculty to research and teach in the field; data collection and management, knowledge representation, digital preservation and archiving, data standards, and policy. Collaboration between schools is important since they all recognize that no school can do it all. One item in particular: The skills, role and career structure of data scientists and curators: An assessment of current practice and future needs.


Report on the 2nd Ibero-American Conference on Electronic Publishing in the Context of Scholarly Communication (CIPECC 2008). Ana Alice Baptista. D-Lib Magazine. March 2009.
Some notes from this article:
  • IR (institutional repository) initiatives occur mostly in public universities
  • the main motivation for implementing an IR: answer specific demands and needs to digitally store the institution's scientific memory, rather than support for Open Access principles.
  • 40% of the analyzed IRs are maintained and coordinated by two or more sectors within each university
  • the databases with more than 3,000 documents are, in practice, OPACs with links to the full text versions.
  • the next step forward: provide new metrics on the impact factor (Scientometrics)

Items in this newsletter include:
  • CBS program on “Bye, Tech: Dealing with Data Rot.” Looks at obsolescence of computer hardware, software, and formats. “So the basic lesson is: Look after your own data and make sure that you take steps to keep it moving onto new formats about once every ten years." There are links where you can both read and watch the program. Their conclusions:
1. You should convert whatever you can afford to digital.
2. Store your tapes and films in a cool, dry place.
3. And above all, remain vigilant. As you now know, every ten years or so, you're going to have to transfer all your important memories to whatever format is current at the time, because there never has been, and there never will be, a recording format that lasts forever.
  • Federal Agencies Collaborate on Digitization Guidelines. A working group is developing best practices for digitizing recorded sound and moving images.


Got Data? A Guide to Data Preservation in the Information Age. (Updated link-August 2015.) Francine Berman. Communications of the ACM. December 2008.
Digital data is fragile, even though we all assume it will be there when we want it. “The management, organization, access, and preservation of digital data is arguably a "grand challenge" of the information age.” This article looks at the key trends and issues with preservation:
  1. More digital data is being created than there is storage to host it.
  2. Increasingly more policies and regulations require the access, stewardship, and/or preservation of digital data.
  3. Storage costs for digital data are decreasing (but other areas are increasing).
  4. Increasing commercialization of digital data storage and services.
These four trends point to the need to take a comprehensive and coordinated approach to data cyberinfrastructure. The greatest challenge in this is to develop an economically sustainable model. One approach is to create a data pyramid of the stewardship options. This shows that multiple solutions for sustainable digital preservation must be devised. There is also a need for ongoing research into and development of solutions that address these technical challenges as well as the economic and social aspects of digital preservation. They add 10 guidelines:
Top 10 Guidelines for Data Stewardship
1. Make a plan.
2. Be aware of data costs and include them in your overall IT budget.
3. Associate metadata with your data.
4. Make multiple copies of valuable data. Store some off-site and in different systems.
5. Plan for the transition and cost of digital data to new storage media ahead of time.
6. Plan for transitions in data stewardship.
7. Determine the level of "trust" required when choosing how to archive data.
8. Tailor plans for preservation and access to the expected use.
9. Pay attention to security and the integrity of your data.
10. Know the regulations.
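
Guidelines 4 and 9 can be made concrete with a fixity check across copies. The following is only a minimal sketch and not something taken from the article: it assumes the copies are ordinary files reachable by path, uses SHA-256 from Python's standard hashlib module, and the function names sha256_of and group_copies_by_digest are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_copies_by_digest(copies):
    """Group copies of one object by digest.

    A single group means all copies agree; more than one group means
    at least one copy has changed and needs investigation.
    """
    groups = {}
    for copy in copies:
        groups.setdefault(sha256_of(Path(copy)), []).append(Path(copy))
    return groups
```

Run periodically over the on-site and off-site copies of the same object, a result with more than one group is the signal to repair the odd copy from one that still matches the recorded digest.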

The Library of Congress has been moving into the digital world, and one way is by a scanning project with the Internet Archive that has put 25,000 books online to date. "To preserve book knowledge and book culture means preserving every word of every sentence in the right sequence of pages in the right edition, within the appropriate historical, scholarly and bibliographical context. You must respect what you scan and treat it as an organic whole, not just raw bits of slapdash data." A lot of items that have not literally seen the light of day are being downloaded. The cost is just 10 cents a page.

Friday, March 13, 2009

Digital Preservation Matters - 13 March 2009

Update from the Blue Ribbon Task Force on Sustainable Digital Preservation and Access. Sayeed Choudhury. Preservation and Archiving Special Interest Group (PASIG). November 19, 2008. [11p. pdf]

This is a presentation about the task force. The next item is the report from the task force.

There is a focus in the task force on the economic dimensions of digital preservation. Some notes from the update: “Definition of Economic Sustainability: The set of business, social, technological, and policy mechanisms that encourage the gathering of important information assets into digital preservation systems, and support the indefinite persistence of digital preservation systems, enabling access to and use of the information assets into the long-term future.”

Economically sustainable digital preservation requires:

  • Recognition of the benefits of preservation
  • A process for selecting long term digital materials
  • Ongoing, efficient allocation of resources to digital preservation activities;
  • Organization and governance of digital preservation activities

The task force website: http://brtf.sdsc.edu/


Blue Ribbon Task Force Interim Report: Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation. December 2008. [78p. pdf]

Digital information is fundamental to modern society but there is no agreement on who is responsible or who should pay for access to and preservation of the information. Creating sustainable economic models for digital access and preservation is a focus of this group. They are looking at current and best practices in order to find or create useful models. This is an urgent task. Access to data in the future requires actions today. “Institutional, enterprise, and community decision makers must be part of the access and preservation solution.” Preserving data now is an investment in the future. Digital information has value far into the future. Sometimes we make best guesses or a hedge against the future. Decisions not made now often cost far more in the future. Without ongoing maintenance digital assets will fall into disrepair. Maintaining the assets is a problem with many sides: technical, legal, financial, and policy. This crosses all industries. There is also an opportunity cost.

Preservation is not a one-time cost; it is an ongoing commitment to a series of costs, requiring ongoing and sustaining resource allocations. Economic sustainability requires

  • Recognition of the benefits of preservation on the part of key decision-makers;
  • Incentives for decision-makers to act in the public interest;
  • A process for selecting digital materials for long-term retention;
  • Mechanisms to secure an ongoing, efficient allocation of resources
  • Appropriate organization and governance of digital preservation activities.

Decision-makers need to be aware of the value-creating opportunities from preservation. Understanding the scope of digital preservation is important. The report examines the various economic models now being used and also shows a graph of Types of Information Retained the Longest. It is difficult to separate digital preservation costs from other costs. There is no substitute for a flexible, committed organization dedicated to preserving a collection of digital material. The Final Report of the task force is to be published at the end of 2009.

“Too often, digital preservation is perceived as an activity that is separable from the interests of today’s stakeholders, aimed instead at the needs of future generations. But in practice, digital preservation is very much part of the day-to-day process of managing digital assets in responsible ways; it is much more about ensuring that valuable digital assets can be handed off in good condition to the next succession of managers or stewards five, ten, or fifteen years down the road than it is about taking actions to benefit generations of users a hundred years hence.”


Fusion-io unveils SSD drives with 1.5GB throughput, 1.2TB capacity. Lucas Mearian. Computerworld. March 13, 2009.

Fusion-io, a Salt Lake based company, has announced a server-based solid-state drive with 1.5GB/sec. throughput. “Currently, the cards come in 160GB, 320GB and 640GB capacities. A 1.28TB card is expected out in the second half of this year.”


Preservation as a Process of a Repository. Tarrant, D. and Hitchcock, S. Sun Preservation and Archiving Special Interest Group. 18 - 21 November 2008. [pdf, ppt, pptx]

The presentation begins with different definitions of repository and Institutional Repository. Lynch defines IR as a set of services and processes, and a commitment to the digital materials created by an organization and its members. Diagrams of processes and OAIS / DCC and other models. Analysis of the preservation process. EPrints and digital preservation and repositories.

Friday, March 06, 2009

Digital Preservation Matters - 6 March 2009

DPE Digital preservation video training course. Digital Preservation Europe. February 2009.

The Digital Preservation Europe group has posted its Digital Preservation Video Training Course on the internet. These videos, from October 2008, cover topics such as:

  • Introduction to Digital Preservation
  • OAIS Model and Representation Information
  • Preservation Analysis Workflow and Preservation Descriptive Information
  • Digital Preservation Preparation and Requirements
  • File Formats, Significant Properties
  • Metadata
  • Planning, infrastructure
  • Trusted Repositories

The workshop was intended to give participants an understanding of digital preservation issues and challenges, as well as the roles, models and file elements involved.



Can We Outsource the Preservation of Digital Bits? Peter Murray. DLTJ Blog. March 5, 2009.

With the increasing need for large-scale digital preservation storage, the Iron Mountain storage facility may be worth considering. It is cloud based, and some of the preservation files do not need to be on the expensive SAN storage. A diagram of the architecture is included.



digiGO! — Video Content Support From Front Porch Digital To AFN. Satnews Daily. March 02, 2009.

Front Porch Digital, which recently acquired SAMMA Systems, will install the DIVArchive product for the American Forces Network (AFN) Broadcast Center. The product line includes a semi-automated system for the migration and preservation of videotape to digital files.



iPRES 2008: Proceedings of The Fifth International Conference on Preservation of Digital Objects. British Library. March 2, 2009. [pdf]

This is the compilation of the iPres 2008 proceedings, all 319 PDF pages, which looks at tools and methods for digital preservation. This is the first full collection of the conference papers, in addition to the conference presentations. Some of these have been reviewed earlier; some will be included later.



Samsung stuffs 1.5TB onto three-platter hard drive. Lucas Mearian. Computerworld. March 5, 2009.

Samsung Electronics announced its first 500GB-per-platter hard drive. The hard disk has 1.5TB on three platters. With fewer platters and fewer moving parts, the drive should be more reliable.

Western Digital announced its 500GB per platter, 2TB capacity drive in January. The drive is 40% lower in power consumption in idle mode and 45% lower in reading/writing mode, and has a retail price of $149.



Preservation and Archiving Special Interest Group (PASIG) Fall Meeting. Paul Walk. Ariadne. January 2009.

This is a report on the Sun-PASIG meeting of November 2008. There are quite a number of presentations. Some items of interest:

Martha Anderson emphasized the need to preserve 'practice' as well as data, so that even if the technology we use changes, our decisions and thinking behind the processes are preserved. Chris Wood talked about storage: there will be an increase in the use of solid-state storage, but tape and disk will remain viable for some time to come. Tape is a viable storage medium and is still relatively cheap. Blu-Ray optical storage is also a good bet for the medium term. In the next few years, the cost of buying equipment will be less than the cost of running the equipment.

Friday, February 27, 2009

Digital Preservation Matters - 27 February 2009

Understanding PREMIS. Priscilla Caplan. Library of Congress. February 1, 2009. [pdf]

This is an overview of PREMIS, the preservation metadata standard. There are different types of metadata: descriptive, administrative, structural, and preservation metadata, which supports activities intended to ensure the long-term usability of a digital resource. Preservation metadata is "the information a repository uses to support the digital preservation process." PREMIS is not concerned with discovery and access, nor does it try to define detailed format-specific metadata. It defines only the metadata commonly needed to perform preservation functions on all materials; the focus is on the repository system and its management. It can also serve as a checklist for evaluating possible software purchases. It looks at pieces of information (semantic units), not elements, which are ways of representing that information in a system.

“One of the main principles behind PREMIS is that you need to be very clear about what you are describing.” PREMIS defines five kinds of entities: Intellectual Entities, Objects, Agents, Events and Rights. It also talks about file objects, representation objects, and bitstream objects. It expects that information will be transferred in XML formats.
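As a rough illustration of the five kinds of entities and the three kinds of objects, here is a minimal sketch in Python. The class and field names are hypothetical and chosen only for readability; they are not the semantic unit names from the PREMIS Data Dictionary, and the way the entities are linked here is an assumption made to keep the example small.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ObjectCategory(Enum):
    """The three kinds of objects mentioned in the overview."""
    REPRESENTATION = "representation"
    FILE = "file"
    BITSTREAM = "bitstream"


@dataclass
class Agent:
    # A person, organization, or software program (hypothetical fields).
    agent_id: str
    name: str


@dataclass
class PreservedObject:
    # A unit of information held by the repository.
    object_id: str
    category: ObjectCategory


@dataclass
class Event:
    # An action involving an object, carried out by one or more agents.
    event_id: str
    event_type: str
    object_id: str
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class Rights:
    # A statement of rights or permissions covering one or more objects.
    rights_id: str
    basis: str
    object_ids: List[str] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    # A coherent set of content, such as a book or a photograph.
    entity_id: str
    title: str
    object_ids: List[str] = field(default_factory=list)


def events_for(object_id: str, events: List[Event]) -> List[Event]:
    """Return the events recorded against one object, in input order."""
    return [event for event in events if event.object_id == object_id]
```

Keeping each kind of entity in its own class reflects the principle quoted above: every piece of information is attached to exactly the thing it describes.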


Preservation at the Network Level: Challenges, Opportunities. Constance Malpas. OCLC. ALA Presentation. January 29, 2009.

Institutional value of print collections is being reassessed as scholarly workflows move to the Web. Digital preservation infrastructure addresses the survival of the content, but doesn’t change the value of print as a distribution medium. Large institutions are shifting resources to digital preservation, while smaller institutions rely on agreements for print preservation. The MARC 583 tag (Preservation Action Note) is under consideration as a disclosure mechanism. There are 400,000 MARC 583 tags in WorldCat, representing 1 million library holdings. The tags indicate titles for which some type of physical or digital preservation action is scheduled or has already been performed (microfilming, digitization, web archiving, assessment, repair, re-housing, de-acidification, etc.). The greatest challenge may be workflow integration between preservation and technical services. Preservation & Digitization Actions: MARC 583


Fresh start for lost file formats. BBC News. 13 February 2009.

A European project intends to create a universal emulator that can open and play obsolete formats. With this, they hope to ensure that digital materials such as games, websites and multimedia documents are not lost. This will require constant updating to make sure that formats are supported in the future. "Every digital file risks being either lost by degrading or by the technology used to 'read' it disappearing altogether." Without this we risk a “blank spot” in history. “Britain's National Archive estimates that it holds enough information to fill about 580,000 encyclopaedias in formats that are no longer widely available.” They believe that in the long term, emulation is a more workable solution than migration to new formats, which also runs the risk of data corruption and loss.



Probing Question: Can we save today's documents for tomorrow? Adam Eshleman. PhysOrg.com. February 12, 2009.

General article about the difficulty of saving digital files. At Penn State, preserving e-mail and text messages is one of the University’s greatest priorities, especially electronic communications from high-profile people, like the university president. “These files will one day become important historical documents.” Previous presidential papers are on paper, but for the future they will be on a server. “It’s a different research paradigm.” Preserving digital files is not a one-time event. It requires ongoing decisions to keep them readable. There are steps we can take to preserve electronic documents, but there are currently no final answers.


Digital Archivists, Now in Demand. Conrad De Aenlle. The New York Times. February 7, 2009.

Pre-digital information and records need to be adapted to computers and current information needs. “The people entrusted to find a place for this wealth of information are known as digital asset managers, or sometimes as digital archivists and digital preservation officers. Whatever they are called, demand for them is expanding.” Much of their effort is devoted to organizing and protecting material in digital form. Familiarity with information technology is necessary, but there is more to it than that. The need for people in these fields is expected to triple over the next decade, in the public and private sectors.

Thursday, December 04, 2008

Hard drive sounds

Datacent Data Recovery has recorded the sounds of failing hard drives.
Read more about each product and common problems, or listen to the sounds. They advise that if you hear these sounds and your drive is still working, you should back it up immediately. An interesting site.

PDF/A Competence Center.

The PDF/A Competence Center is a cooperation between companies and experts in PDF technology. The aim is to promote the exchange of information and experience in the area of long-term archiving in accordance with ISO 19005: PDF/A. The site contains links and information about the standard.

Transmitting data from the middle of nowhere.

Lucas Mearian. Computerworld. December 2, 2008.
Interesting look at a marine survey company and the methods it has developed to transmit large amounts of data from remote locations. They have set up servers that reduce the amount of information transmitted by removing duplicate information and then replicating what remains. The data is backed up using IBM's Tivoli software and a StorageTek tape library with LTO-3 and LTO-4 tape drives for archiving data. "So data movement is the number one problem for anyone in the survey or field scientific world." They use Data Domain storage appliances for data backup / disaster recovery; these have a compression algorithm that is supposed to make sure data is transmitted only if it does not already exist on the system at the San Diego data center. It can also stream the data for maximum performance.
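To illustrate the general idea of sending only data that the receiving system does not already hold, here is a minimal sketch using fixed-size chunks and SHA-256 digests. This is an assumed, simplified technique for illustration only; it is not the algorithm used by the Data Domain appliances, and the `DedupStore` class, its methods, and the 4096-byte chunk size are all hypothetical.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple


def chunk_bytes(data: bytes, size: int) -> Iterator[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


class DedupStore:
    """Hypothetical receiving system that keeps one copy of each chunk."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # digest -> chunk contents

    def send(self, data: bytes, chunk_size: int = 4096) -> Tuple[List[str], int]:
        """Store data; return its list of digests and the bytes actually sent."""
        recipe: List[str] = []
        bytes_sent = 0
        for chunk in chunk_bytes(data, chunk_size):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only chunks the store has never seen are transmitted.
                self.chunks[digest] = chunk
                bytes_sent += len(chunk)
            recipe.append(digest)
        return recipe, bytes_sent

    def restore(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its list of digests."""
        return b"".join(self.chunks[digest] for digest in recipe)
```

In this sketch, sending the same survey file a second time transmits nothing, because every chunk digest is already known to the store.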

Friday, November 28, 2008

Digital Preservation Matters - 28 November 2008

The Future of Repositories? Patterns for (Cross-) Repository Architectures. Andreas Aschenbrenner, et al. D-Lib Magazine. November/December 2008.

Repositories have been created mostly by academic institutions to share scholarly works, for the most part using Fedora, DSpace and EPrints. While it is important to look at manageability, cost efficiency, and functionalities, we need to keep our focus on the real end user (the Scholar). The OpenDOAR directory lists over 1200 repositories. The repository adoption curve shows cycles, trends, and developments. “It is the social and political issues that have the most significant effect on the scholarly user and whether or not that user decides to use a repository.” The repository's primary mission is to disseminate the university's primary output. Researchers, not institutions, are the most important users of repositories. The benefits of repositories may not be clear to researchers, and the repository needs to “become a natural part of the user's daily work environment.” To do this we should focus on features such as:

  • Preserve the user's intellectual assets in a long-term trusted digital repository
  • Allow scientific collaboration through reuse of publications as well as primary data
  • Embed repositories into the user's scientific workflows and technology (workbench)
  • Customize the repository to the local user needs and technology
  • Manage intellectual property rights and security

Individual repositories may not be able to address all these issues. Preservation is one of the main motivators for people to use a repository. “Trust in a stable and secure repository service is established through the repository's policies, status among peers, and added-value services.” Users want someone to take responsibility for the servers and tools. Trust depends on:

  • The impact a service has on users' daily lives
  • How the service blends into their routine
  • Whether the repository's policies and benefits work for the users


Managing the collective Collection. Richard Ovenden. OCLC. 6 November 2008. [pdf]

A PowerPoint presentation on managing a collection in the future. Looks at Uniformity vs. Uniqueness, and the sameness of e-resources. The collective collection is now an aggregated digital collection rather than a distributed print collection. Access to the core aggregated collection is no longer a factor of time and craft but one of money. With this new sense of uniformity, uniqueness has a new value.

Local unique: Sensible stewardship of locally-generated assets:

  • Institutional repositories
  • University archives
  • Research data

Global unique: Selected and curated content that has been actively acquired through competition:

  • “Traditional” special collections
  • Personal digital collections
  • Copy-specific printed books

Personal digital collections: new phenomenon, new problem.

  • Acquisition from older media
  • New management issues

Implications of Google: Google is not curated!

  • Preservation of the unique is more important than ever.
  • Who will bear the cost of keeping print?
  • New models of collaboration


Expectations of the Screenager Generation. Lynn Silipigni Connaway. OCLC. 6 November 2008. [pdf]

A lot of information here. Some notes on the attitudes of the newer generation: Information is information; Media formats don’t matter; Visual learners; Different research skills. They meet Information Needs mostly through the Internet or other people. They are attracted to resources based on convenience, immediate answers, and no cost. They prefer to do their own research. They don’t use libraries because they don’t know about them, they are satisfied with other sources, or the library takes too long or is too difficult to use. The image of libraries is Books. They do not think of a library as an information resource. Search engines are trusted about the same as a library. What can we do? Encourage, promote, use creative marketing, build relationships, understand their needs better.


Digital preservation of e-journals in 2008: Urgent Action revisited. Portico. January 2008. [pdf]

The document has been out for a while, but I found it interesting in light of current efforts. It presents the results of a survey concerning eJournals. The survey was designed to:

  1. Analyze attitudes and priorities that can be used to guide Portico
  2. Assist library directors in prioritizing and allocating limited resources.

Here are some of the findings:

  • 76% said they do not yet participate in an e-journal preservation initiative.
  • 71% felt it would be unacceptable to lose access to e-journal materials permanently
  • 82% agreed that “libraries need to support community preservation initiatives because it’s the right thing to do.”
  • 73% agreed that “our library should ensure that e-journals are preserved somewhere”
  • 4% believed preservation could be achieved by publishers holding redundant copies of eJournals

Libraries are unsure about how urgent the issue is and whether they need to take any action in the next two years. This appears to follow the interest of the faculty in the issue. Where the library was interested in eJournal preservation, 74% had been approached by faculty on the issue. When the library was not interested, only 34% had ever been approached by faculty, and less than 10% had ever been approached by faculty more than twice. Many libraries feel the issue is complicated and are not sure who should preserve the eJournals. They are uncertain about the best approach, and there are competing priorities. “Research institutions are far more likely than teaching institutions to have taken action on e-journal preservation.” Most libraries do not have an established digital preservation budget, and the money is borrowed from other areas, such as the collections budget.

Friday, November 21, 2008

Digital Preservation Matters - 21 November 2008

Archives: Challenges and Responses. Jim Michalko. OCLC. 6 November 2008. [pdf]

Interesting view of ‘The Collective Collection’. A framework for representing content.

  • Published Content: books, journals, newspapers, scores, maps, etc.
  • Special Collections: Rare books, local histories, photos, archives, theses, objects, etc.
  • Open Web Content: Web resources, open source software, newspaper archives, images, etc.
  • Institutional Content: ePrints, reports, learning objects, courseware, manuals, research, data, etc.

Describes an End-to-End Archival Processing Flow:
Select > Deliver > Describe > Acquire > Appraise > Survey > Disclose > Discover


Managing the Collective Collection: Shared Print. Constance Malpas. OCLC. 6 November 2008. [pdf]

There is concern that many print holdings will be ‘de-duped’ and that there will not be enough copies left to maintain the title. Some approaches are offsite storage, digitization, and distributed print archives. “Without system-wide frameworks in place, libraries will be unable to make decisions that effectively balance risk and opportunity with regard to de-accessioning of print materials.” Average institutional holdings in WorldCat: for serials=13; for books=9. Up to 40% of book titles have a single institution holding. There is a need for a progressive preservation strategy.


Ancient IBM drive rescues Apollo moon data. Tom Jowitt. Computerworld. November 12, 2008.

Data gathered by the Apollo missions to the moon 40 years ago looks like it may be recovered after all, thanks to a donation of an “ancient” IBM tape drive. The mission data had been recorded onto 173 data tapes, which had then been 'misplaced' before they could be archived. The tapes have been found, but there was no drive to read the data; one has now been found at the Australian Computer Museum Society. It will require some maintenance to restore it to working condition. "It's going to have to be a custom job to get it working again," which may take several months.


Google to archive 10 million Life magazine photos. Heather Havenstein. Computerworld. November 18, 2008.

Google plans to archive as many as 10 million images from the Life magazine archives, and about 20% are already online. Some of the images date back to the 1750s; many have never been published. The search archive is here.


PREMIS With a Fresh Coat of Paint. Brian F. Lavoie. D-Lib Magazine. May/June 2008.

Highlights from the Revision of the PREMIS Data Dictionary for Preservation Metadata. This looks at PREMIS 2.0 and the changes made:

  • Update to the data model clarifying relation between Rights and Agents, and Events and Agents
  • Completely revised and expanded Rights entity: a more complete description of rights statements
  • A detailed, structured set of semantic units to record information about significant properties
  • Added the ability to accommodate metadata from non-PREMIS specifications
  • A proposed registry of suggested values for semantic units

Thursday, November 20, 2008

Top five IT spending priorities for hard times

Tom Sullivan. ComputerWorld. November 19, 2008.

Given the current economic times, organizations are busy looking for costs they can cut. Analysts agree that these areas still need to be funded:
  1. Storage: Disks and management software. For many the largest expenditure is storage. Data doubles yearly.
  2. Business intelligence: Niche analytics. Information and resources to help accomplish key goals.
  3. Optimizing resources. Get the most out of what you already have.
  4. Security. Keeping the resources secure.
  5. Cloud computing: Business solutions.

PC Magazine will be online only

Stephanie Clifford. International Herald Tribune. November 19, 2008.

Ziff Davis Media announced it was ending print publication of its 27-year-old flagship PC Magazine; following the January 2009 issue, it will be online only. "The viability for us to continue to publish in print just isn't there anymore." PC Magazine derives most of its profits from its Web site. More than 80 percent of the profit and about 70 percent of the revenue come from the digital business. This is not too much of an adjustment, since all content already goes online first and the print version then chooses what it wants to print.

A number of other magazines have ended their print publications. The magazines that have gone to online only have been those that are declining. "Magazines in general are going to be dependent on print advertising for a long time into the future."

Massive EU online library looks to compete with Google

PhysOrg.com. November 19, 2008.

The European Union is launching the Europeana digital library, an online digest of Europe's cultural heritage, consisting of millions of digital objects, including books, film, photographs, paintings, sound files, maps, manuscripts, newspapers, and documents.

The prototype will contain about two million digital items already in the public domain. By 2010, the date when Europeana is due to be fully operational, the aim is to have 10 million works available of the estimated 2.5 billion books in Europe's more common libraries. The project plans to be available in 21 languages, though English, French and German will be most prevalent early on.