Friday, December 14, 2007

Digital Preservation Matters - 14 December 2007

CNI in DC: Integrated Digital Library on the Fedora Platform. David Kennedy. December 12, 2007.
This is one item in a blog report of the CNI conference and the Digital Curation Conference: National Perspectives conference. It is worth reading the others also. University of Maryland uses Fedora not for the IR (they use DSpace), but for the digital collections. They wanted to use it to build in sustainability and transitions. Some of their organizational issues were institutional support, development time, off the shelf vs. Fedora-type system, and others. It took almost 18 months of development. They found working with Fedora similar to Java, and "programmer friendly." They use a hybrid metadata schema with METS wrappers. What have they learned?
  • metadata - use a complex schema, but don't force users to understand the underlying schema
  • authentication - not dealt with yet; more work is needed
  • archival storage - greater need for more space
  • need to have Quality Control standards when modifying objects and creating metadata

They have at least three or four developers working on the project, as well as a number of other team members. Since they use their own metadata scheme, it may not be possible to offer their work to others, so if they were to do it again, they may use a standard metadata schema.

New 1 day AIIM PDF/Archive Training Program. Atle Skjekkeland. AIIM Knowledge Center Blog. December 12, 2007.
The AIIM organization intends to introduce a new PDF/A training program next year. It will focus on PDF/A and its use as a file format for archiving data. The concept of PDF/Archive began as an AIIM standards committee in 2002 and has been accepted as an ISO standard.

Digital Preservation Pioneers: Margaret Hedstrom. Resource Shelf. December 13, 2007.
A brief bio about Margaret Hedstrom who has done a great deal for digital preservation. Her works include several articles that are definitely worth reading: Digital Preservation: A Time Bomb for Digital Libraries, It’s About Time, Invest to Save, and Incentives for Data Producers to Create Archive-Ready Data Sets.

Pooling Scholars’ Digital Resources. Andy Guess. Inside Higher Ed. December 12, 2007.
Access to documents and copyright issues have been two factors slowing the development of online scholarly repositories. George Mason University seeks to bypass libraries entirely and go directly to scholars by creating an open archive of scholarly resources in the public domain. They are creating a way for scholars to upload existing documents, make them text-searchable, and put them in a database available to the public. It will use the Zotero plug-in for Firefox, which stores web pages, collects citations and lets scholars annotate and organize online documents. It is funded by a two-year Mellon grant.

Manakin: A New Face for DSpace. Scott Phillips et al. D-Lib Magazine. November/December 2007.
The growth of online scholarly communication makes digital repositories more important for preserving and managing information. This looks at Manakin, which was designed to help create individual, customized repository interfaces separate from the underlying repository, which is currently DSpace. It helps a library ‘brand’ its content, promotes a better understanding of the metadata, and provides tools to create extensions of the repository. It uses schema, aspects and themes as the basic components. There is a movement to adopt Manakin as the default DSpace user interface.

SOA. IT Strategy Guide. Dave Linthicum. InfoWorld. December 10, 2007. [pdf]
The essence of an organization must be identified so all activities influencing that can be identified and improved. This is the first step in realizing the benefits of a service-oriented architecture (SOA). This requires not only technology, but also a shift in the way business and IT work together. Organizations need to adopt clearly defined roles within an organization, allowing the stakeholders to understand each other’s goals and tasks. This includes understanding both the human aspects and the lifecycle management of the services. Management support for the strategy is crucial. This requires an investment in people and technology to establish the appropriate context for the strategy. “the hardest part isn’t the technology; it’s redrawing the business processes that provide the basis for the architecture — and the often contentious reshuffling of roles and responsibilities that ensues. It is important to define the value, get investment and commitment from the top, and concentrate on the long term.”

Census of Institutional Repositories in the U.S. Soo Young Rieh, et al. D-Lib Magazine. November/December 2007.
There are great uncertainties underlying institutional repositories regarding practices, policies, content, systems, and other infrastructure issues. This article looks at IRs in five areas: leaders, funding, content, contributors, and systems, and how they are perceived. Some notes:

  • college and university libraries are the driving force behind most IRs,
  • vast majority of survey respondents have done no planning of IRs to date
  • only 10.8% of respondents have actually implemented an IR
    • 52.1% have been operational less than one year,
    • 27.1% have been operational between one and two years,
  • respondents agree that the funding comes or will come from the library, typically by absorbing costs into routine library operating expenses
  • Majority of existing IRs contain fewer than 1000 items
  • DSpace is the most prevalent system for pilot-testing and use. Fedora and ContentDM are regularly pilot-tested but rarely implemented.

“Once each academic institution has a clear vision and definition of what the IR will be for its own community, subsequent decisions such as content recruitment, software redesigning, file formats guaranteed in perpetuity, metadata, and policies can flow from that vision.”

Friday, December 07, 2007

Digital Preservation Matters - 07 December 2007

Ten years after. Priscilla Caplan. Library Hi Tech. Editorial. Vol. 25 N. 4 2007.

This editorial from Priscilla reflects on the progress made in digital preservation in the past 10 years. Digital preservation is no longer a little known concept, but a problem to be solved. It is part of the mainstream. Much has been accomplished, though there is still a lot of progress to be made. Europe has a different approach; it sees this as “part of a set of curation activities.” Their approach would “help reduce our apparent confusion between institutional repositories and preservation repositories.” Few institutions will have the resources to run a true preservation repository. “Digital curation may be departmental, and archiving institutional, but I believe preservation will have to be consortial.” The US approach has been to focus on short term projects rather than long term infrastructure. There are still some basic infrastructure needs: schema, conversion utilities, and registries. We also need to develop centers to promote and assist digital preservation. We need to provide more education for both data creators and data curators.

Standards Group Accepts PDF. Sumner Lemon. IDG News Service. December 05, 2007.

Adobe PDF 1.7 has been approved as an ISO standard. The ballot for approval of PDF 1.7 to become the ISO 32000 Standard was passed by a vote of 13-1. Specialized subsets of PDF (PDF/Archive etc.) had been proposed or approved as standards by ISO. The approval of PDF 1.7 is now an "umbrella" standard to unify these different subsets. Adobe gives up some control over the development of future versions.

Project SPECTRa: JISC Final Report. March 2007.

The principal aim of the SPECTRa project (Submission, Preservation and Exposure of Chemistry Teaching and Research Data) was to provide the high-volume ingest and reuse of experimental data through institutional repositories. It used the DSpace platform because of existing infrastructure and previous experience. They developed Open Source software tools and customizations which could easily be incorporated within chemists' workflows. Metadata was based on Dublin Core. They felt that serious preservation work must be at the institutional, rather than departmental, level. The metadata, identifiers, and normalizing data in open formats would make long-term preservation more possible. Preservation of chemistry data file formats is a difficult area. Their approach was to capture essential metadata at submission or extract it automatically from the data files if possible. All files should be validated against specifications. Depositing files in an institutional repository should guarantee against the loss or corruption of the raw data, but this is insufficient to ensure future usability. A policy of format migration will be necessary for much of the data.

Other project findings included:

• it has integrated the need for long-term management of experimental chemistry data with the maturing technology and organizational capability of digital repositories;

• scientific data repositories are more complex to build and maintain than are those designed primarily for text-based materials;

• the specific needs of individual scientific disciplines are best met by discipline-specific tools, though this is a resource-intensive process;

• institutional repository managers need to understand the working practices of researchers in order to develop repository services that meet their requirements;

• IPR issues relating to the ownership and reuse of scientific data are complex, and would benefit from authoritative guidance based on UK and EU law.

Google Plans Service to Store Users' Data. Kevin J. Delaney. Wall Street Journal. November 27, 2007.

Google is developing a service to let users store contents of their computers, such as word-processing documents, digital music, video clips and images. It would let users access their files via the Internet from different computers and share them online with friends. The service would face questions on issues such as data privacy, copyright, cost, and technical challenges of offering service without interruption.

Iron Mountain Acquires Xepa Digital, LLP. Press Release. November 19, 2007.

Iron Mountain acquired Xepa, a company that deals with converting analog and out of date digital audio and video to high resolution digital file formats. They will offer on-site digital conversion for the items being stored.

Saturday, December 01, 2007

IT Disasters

The top 10 IT disasters of all time. Colin Barker 22 Nov 2007.

A list of some of the worst IT-related disasters and failures caused by faulty hardware and software or human error.

  1. Faulty Soviet early warning system nearly causes WWIII (1983)
  2. The AT&T network collapse (1990)
  3. The explosion of the Ariane 5 (1996)
  4. Airbus A380 suffers from incompatible software issues (2006)
  5. Mars Climate Orbiter metric problem (1998)
  6. EDS and the Child Support Agency (2004)
  7. The two-digit year-2000 problem (1999/2000)
  8. When the laptops exploded (2006)
  9. Siemens and the passport system (1999)
  10. LA Airport flights grounded (2007)

Friday, November 30, 2007

Digital Preservation Matters - 30 November 2007

Council Conclusions on scientific information in the digital age: access, dissemination and preservation. The Council Of The European Union. November 2007.

The Council of the European Union presents some conclusions regarding digital preservation, with recommendations for the next few years:

  • access to and dissemination of scientific information is crucial and can help accelerate innovation;
  • effective digital preservation of scientific information is fundamental for current and future development of research
  • it is important to ensure the long term preservation of scientific information, publications and data, and include scientific information in preservation strategies;
  • monitor good practices for open access to scientific information and develop new models
  • experiment with open access to scientific data and publications to understand contractual needs
  • encourage research and experiments into digital preservation on deploying scientific data as widely as possible for open access to and preservation of scientific information.

Shifting Gears: Gearing Up to Get Into the Flow. Ricky Erway. OCLC. October 2007.

Efforts to digitize special collections mean we need to re-look at what we are doing. Do we digitize for access or preservation, or both? How do our selection criteria affect the digitizing efforts? Access is important. We should preserve the unique items to the best of our ability, but it doesn’t mean we only have one chance to do it right. We may want to re-digitize when the technology improves. Scan items as part of the initial accessioning process; create a single unified process. Metadata can be improved as needed; it can be an iterative approach. Move to a program approach, not just special projects. It should be part of the regular budget. To do a better job we need to “integrate digitization into all workflows and user services”.

Digital library surpasses initial goal of 1 million books. International Herald Tribune. November 27, 2007.

The Universal Library project has surpassed its latest target, having scanned more than 1.5 million books. At least half the books are out of copyright or scanned with the permission of copyright holders. The library's mission is to make information freely available and to preserve rare and decaying texts. It is the largest university-based digital library of free books and its purpose is noncommercial. The library has books published in 20 languages, including 970,000 in Chinese, 360,000 in English, 50,000 in the southern Indian language of Telugu and 40,000 in Arabic.

Presentations from iPRES - 2007 International Conference on Preservation of Digital Objects. National Science Library. November 2007.

This site contains many pdf files of the presentations given at the October iPres conference in China. These are interesting to review. Some that I found particularly useful include:

  • Exploring and Charting the Digital Preservation Research Landscape, Seamus Ross
  • Chinese Digital Archival Network of Foreign STM Material, Xiaolin Zhang
  • A Practical Approach to Digital Preservation: Update from PLANETS, Helen Hockx-Yu
  • Challenges of Digital Preservation: Early Lessons from the Portico Archive, Eileen Fenton
  • Developing a CAS E-Journal Archiving System, Zhixiong Zhang
  • Comparative Evaluation of Major IR Systems for Preservation, Ting Zeng
  • New Partnerships for Scientific Data Preservation and Publication Systems, Zhongming Zhu

Towards the Australian Data Commons: A proposal for an Australian National Data Service. The ANDS Technical Working Group. October 2007.

This paper, among other topics, discusses the reasons to focus on data management, the issues, and the programs to deliver the data. While the paper looks specifically at a national data service, there are aspects that are useful for local digital preservation. Here are some interesting notes from it.

  • Important activities include identifying and deploying policies and technologies to allow users to gain seamless access to data collected within multiple institutionally operated repositories.
  • The intent is to provide common services to support research to make it easier to discover, access, use, analyze, and combine digital resources as part of their activities. They should also support and advise researchers and research data managers about appropriate digital preservation strategies.
  • We are in a data deluge. It can only continue and grow in intensity as the number, frequency and resolution of data sources rises and as information becomes universally ‘born digital’.
  • Data is an increasingly important and expensive ingredient of research activities and needs increasing attention to be managed efficiently and effectively.
  • The sponsors of data capture and care should help determine the accessibility of the data
  • Not everyone can use the same solution, so there may need to be multiple responses.
  • There should be a registry of repositories with services offered
  • Provide assistance to others on adopting the plans and getting the service they need.
  • Collecting and managing the metadata is critical. Best to collect early and automatically.

The data service believes it can contribute most effectively by developing services and activities that enable stewardship within multiple federations of data management and data user communities.

In ten years time, it will be successful if:

  • A data commons exists in a network of research repositories and the data is discoverable;
  • Researchers and data managers perform well with well formed data management policies;
  • More research data is routinely deposited into stable, accessible and sustainable environments;
  • More people have relevant expertise in data management

Stewardship of digital resources involves both preservation and curation. Preservation entails standards-based, active management practices that guide data throughout the research life cycle, as well as ensure the long-term usability of these digital resources. Curation involves ways of organizing, displaying, and repurposing preserved data.

Friday, November 16, 2007

Digital Preservation Matters - 16 November 2007

Electronic Records Management and Digital Preservation: Protecting the Knowledge Assets of the State Government Enterprise. Eric Sweden. NASCIO. October 2007. [pdf]

Electronic records management and digital preservation must be a shared responsibility, including understanding and support, from the CIO. Everyone needs to be part of managing digital assets. These initiatives must be managed on the organizational level. The team needs enterprise architects, project managers, electronic records managers, librarians and archivists to ensure the knowledge assets are managed properly. Technology creates both opportunities and challenges. The goal of Digital Preservation systems is to make sure the information they contain remains accessible to users over a long period of time. A challenge is to keep bit streams intact and usable long term. You need to know what to preserve and how to preserve the records. The strategy must address preservation for the life of the record. There is not a single best way to preserve digital materials. Digital materials do not allow preservation procrastination. If a record needs to be maintained for over 10 years, the original technology will probably be obsolete. Digital Preservation must be a routine operation, not a special event.

RSA 2007: long-term data storage presents legal risks. Ian Grant. Computer Weekly. 23 Oct 2007.

Art Coviello, executive vice-president of EMC, stated at a conference that storing every piece of data long term may place organizations at risk of legal liability. The organization needs to know what data they have, who is looking at it and what they are doing with it. They should classify data and users before they store data. This is needed to protect the data and to reduce information clutter.

Keep 'Smoking Gun' E-Mails From Backfiring. H. Christopher Boehning, Daniel J. Toal. New York Law Journal. October 25, 2007

While this is written from a legal and not archival perspective, the article discusses the importance of validating / authenticating electronic documents. It lists the legal rules for authenticating emails and other electronic documents, including:

  • testimony by a witness with knowledge of the object;
  • circumstantial means ("appearance, contents, substance, internal patterns or other distinctive characteristics, taken in conjunction with circumstances"), such as the email address;
  • hash values that serve as a digital fingerprint; comparison to existing documents;
  • self authentication of items with labels, tags, or ownership marks.
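
As a rough illustration of the hash-value method in the list above, the following Python sketch computes a digest of a message's bytes and checks it against a digest recorded earlier. The function names and the sample message are hypothetical, and the choice of SHA-256 is an assumption; the article does not name a particular algorithm.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_recorded(data: bytes, recorded_digest: str) -> bool:
    """Check whether the data still produces the digest recorded earlier."""
    return fingerprint(data) == recorded_digest


# Hypothetical sample message; changing even one byte changes the digest.
original = b"From: alice@example.com\nSubject: Q3 figures\n\nSee attached."
recorded = fingerprint(original)
altered = original.replace(b"Q3", b"Q4")

print(matches_recorded(original, recorded))  # unchanged copy
print(matches_recorded(altered, recorded))   # edited copy
```

A matching digest shows only that the bytes are unchanged since the digest was recorded; it says nothing about who wrote the message, which is why the other methods in the list still matter.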

The Aftermath: Examining the E-Discovery Landscape After the 2006 Rule Changes. Eric Sinrod. FindLaw. October 16, 2007.

Another article emphasizing the importance of records management plans for electronic data. It mentions that “Data can be located live on networks, servers, hard drives, laptops, PDAs and on backup tapes.” Purging according to retention policies is important. Data may be required in ‘native’ format with all metadata intact.

‘Digital curators’ lead cultural IT projects. Shane Schick. ComputerWorld Canada. 8 Nov 2007.

As cultural organizations try to reach new audiences online and integrate their collections into multimedia-friendly exhibits, they are starting to face the same challenges as others who have been moving away from paper-based processes. These challenges include not only figuring how to digitize content but what gets preserved first, what can wait and what doesn’t need to be digitized at all. Institutions face the difficulty of trying to preserve something indefinitely, without knowing how formats might change over time. They must collect the right hardware and software along with the content itself. “Archives are now building in budgets for migration strategies for data.”

Friendly Advice Machine. John Cleese. Iron Mountain. October 2007.

On the lighter side: For those with an interest in digital archiving and secure storage, and a ‘British’ sense of humor, these clips may be of interest.

Friday, November 09, 2007

Weekly Readings - 9 November 2007

HD Photo to become JPEG XR. Stephen Shankland. CNet News. November 2, 2007.
The Joint Photographic Experts Group has approved Microsoft's HD Photo format as a standard called JPEG XR. This is an important step to make the format neutral. It is designed for the next generation of digital cameras and was based on Microsoft’s Windows Media Format. Microsoft is committed to make the patents available without charge. The standardization process typically takes about a year. (See also

PRONOM and DROID - new versions released. Neil Beagrie. National Archives UK. November 2, 2007.
The National Archives in the UK has released new versions of PRONOM and DROID. PRONOM is an online registry of file formats, software, and other technical information used for digital preservation purposes. DROID (Digital Record Object Identification) is open source software that is used to identify file formats in batch mode. They are freely available.
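
To give a feel for what identifying file formats in batch mode involves, here is a minimal Python sketch that matches the first bytes of each file in a folder against a handful of well-known signatures. It is not DROID's implementation and does not use PRONOM's registry data; the signature table, function names, and folder argument are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, Optional

# A few widely documented leading byte signatures ("magic numbers").
# Illustrative only; a real registry holds far more detailed signatures.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify_bytes(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_folder(folder: str) -> Dict[str, Optional[str]]:
    """Batch mode: map each file name in a folder to its guessed format."""
    results: Dict[str, Optional[str]] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as handle:
                results[path.name] = identify_bytes(handle.read(16))
    return results
```

Identifying by content rather than by file extension matters for preservation because extensions are often missing or wrong in older collections.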

An overview of LOCKSS, how it works, and issues related to it. (LOCKSS, developed at Stanford University, stands for Lots of Copies Keeps Stuff Safe.) One of the main issues surrounding it is the issue of trust. “Trusting a single provider, a single institution, and a single archive represents the real risk”. LOCKSS is built on the principle of building confidence in the archive. LOCKSS was built to archive electronic journals but has been enhanced to also archive blogs on Google’s Blogger.

Looking Ahead. Lee J. Nelson. Advanced Imaging Magazine. November 9, 2007.
The article looks at some of the industry trends. Included is an announcement on an HD Photo Plug-in for Adobe Photoshop. “HD Photo is geared for end-to-end digital photography, offering better image quality, greater preservation of data and advanced features. Its still image codec for continuous-tone images is underpinned by lossy and lossless compression, multiple colorspaces, wide dynamic range and extensive metadata.”

Government Pledges £25m To Preserve UK's Film Archives. 24 Hour Museum. October 17, 2007.
The British government has taken steps to preserve the country’s film archives. They have given money to the UK Film Council to secure the films in the archives. “It’s absolutely right that they should be safe and accessible for future generations.” The £25million plus £3million are to be used to preserve, restore and increase access to the collections, some of which are deteriorating and in danger of being lost.

The Library and Xerox are studying the potential of using the JPEG 2000 format in large repositories of digital materials. The project is designed to help develop guidelines and best practices for digital content. The trial will include up to 1 million TIFF images to be converted to JPEG 2000. Xerox will build and test the system, and they look specifically to create profiles for the objects. Xerox already created a profile for using the JPEG 2000 format for newspapers.

Friday, April 20, 2007

Weekly readings - 20 April 2007

Digital Imaging - How Far Have We Come and What Still Needs to be Done? Steven Puglia, Erin Rhodes. RLG DigiNews. April 15, 2007.
This is an overview of digital imaging over the past 10 years. It has moved beyond the experimental stage, though in many institutions it still isn’t a mainstream program. There is a growing understanding of the significant investment required for digital initiatives, particularly in the infrastructure, standards, metadata, and managing them long term. The more we learn, the more there is to learn. Some feel we are carrying the flawed earlier ‘preservation reformatting’ such as microfilm into the digital areas. Technology is never “THE answer” to our problems. The goal is to use the tools wisely, not just to have the technology. Often digital preservation was an extension of microfilming brittle books. But users are asking for more than what microfilm could provide. We are still asking many of the same questions we were asking 10 years ago. We have accepted digitization, but it is not “completely synonymous with preservation”, though we are moving forward. Every project may have different digitization and preservation needs.

There has been progress in digital preservation since the first recommendations were published in 1996. Some elements include: OAIS; Trusted Digital Repositories; a preservation metadata data dictionary; and the options of digital repositories. There is a conference devoted just to digital preservation (iPres) and more literature on the topic. There is an increase in the number of standards now. The three components of a digital preservation approach presented at the Cornell workshop are organizational, technological, and resources. The organizational is the “what”, the technological is the “how” and the resources are the “how much” needed to produce the outcomes. A challenge is to balance time and resources of developing a repository internally against the external environment. ‘There is currently no “one stop shopping” for keeping up with digital preservation research and development. Keeping up takes effort, but it is worthwhile.’

Copyright Keeps Open Archives and Digital Preservation Separate. Peter B. Hirtle. RLG DigiNews. April 15, 2007.
Open access and self-archiving repositories enhance access to current research but do not necessarily provide long term preservation of the contents. Libraries cannot rely on those repositories for at least two reasons.
  • The repositories lack the technical, organizational and financial support needed to preserve materials.
  • The deposit agreements do not necessarily convey the preservation rights needed.
“Digital preservation, by its very nature, must impinge upon the rights of the copyright owner” since materials need to be copied and recopied. Copyright law does not give a general exemption for preservation. Typical deposit agreements do not include the preservation rights. Deposits without the right to preserve may put the repository at risk, though it is difficult to say how much. Only the “journals that are part of formal third party journal archiving programs can be said to be effectively preserved. In sum, libraries cannot yet rely upon open archives for long-term access to the journal literature.”

Friday, April 13, 2007

Weekly readings - 13 April 2007

Road Report: Second Annual Open Repositories Conference (OR07) in San Antonio. Carol Minton Morris. D-Lib Magazine. March 2007.

The conference presented sessions on DSpace, Fedora, and Eprints, including user groups for each software. Open source software may be free, but that does not mean “no cost”; it brings maintenance costs. Choose the right partners to create a competitive advantage instead of competing with your associates. Fedora allows for complex digital objects. The new Fedora Commons will provide a non-profit organization to support the growing community. The next conference will be held April 1-4, 2008.

Conference addresses archiving and preservation of e-journals. Phillip Pothen. JISC. 28 March 2007.

The uncertainty of long-term access to scholarly journals is a major issue for libraries and others. A recent conference discussed the topic and said that major concerns still remain even though progress has been made. A great deal of content is still at risk. Librarians should press the archiving programs to make sure they meet their archiving needs. Librarians are the custodians of the content. The group of libraries saving the data can do more than individuals alone. LOCKSS and Portico are some methods in use. The e-Depot in The Netherlands is also archiving journals from some publishers. “Old business models are breaking down while long-term archives require highly resilient architectures, long-term funding and a commitment to quality.” Blackwell suggests that 50% of all serials publications will be online by 2016, while 39% of science journals will be online by the end of this year. This means that there are considerable preservation challenges. Preservation, access, and open access are not the same thing. “Digital curation needs to be embedded in institutional strategies.” Responsibilities and requirements must be clear and agreed upon.

History 1980-2000 has disappeared into the ether. Sorry. Ben Macintyre. The Times. March 23, 2007.

This commentary warns of the short life of digital objects, which are “dangerously disposable.” Many do not bother to archive their digital data. Historians may look back at this period as a black hole. The most important real-time histories are written in online forums, which are fleeting. Many items have already been lost. The article ends with a plea for paper, which he feels is the best way to save things.

Tools and Methods for the Digital Historian. AHRC. March 23, 2007.

The Arts & Humanities Research Council (AHRC) has created an online forum, ‘Tools and Methods for the Digital Historian’ in order to encourage the exchange of ideas. The Methods Network is a UK initiative which provides a place for discussing digital history and research, but it is open to all who want to register and discuss the issues. It also refers to a set of Working Papers.

FastStone Image Viewer 3.1. FastStone Website. April 16, 2007.

Update on the FastStone Image Viewer: This downloadable program is an image browser, converter and editor. The features include viewing, managing, comparing and other adjustments to images. It provides access to EXIF information, lossless JPEG transitions, embedded thumbnails, and image annotation. It supports all major graphic formats, BMP, JPEG, JPEG 2000, GIF, PNG, PCX, TIFF, WMF, ICO and TGA, as well as many RAW formats, such as CRW, CR2, NEF, PEF, RAF, MRW, ORF, SRF and DNG. It also supports saving files in pdf format.

Friday, April 06, 2007

Weekly readings - 06 April 2007

JPEG 2000 - Do you use it? John Nack. Adobe blog. April 02, 2007.

Photoshop has contained a plug-in for reading and writing jpeg2000 files. However, Adobe has not seen the widespread adoption of the format. With Photoshop CS2, they decided to stop installing the plug-in by default, though it is still currently available. If features no longer make sense, they will retire them in order to focus on what is most important. Adobe is trying to gauge the value of standalone jpeg2000 reading and writing. [Lots of comments on the blog.]

Questioning the Future of JPEG2000 Support in Photoshop. Peter Murray. TLDJ. April 5, 2007.

There is still some uncertainty about the format and whether it will be used much. The response to the Adobe survey has been disappointing, showing that not many use JPEG 2000. It would be a shame if support were dropped since the format seems to be gaining ground. Google lists projects where some are working on wider adoption of the format.

Metadata mangling in Windows Vista. Stephen Shankland. CNet News. February 8, 2007.

Windows Vista and the Photo Info tool can cause problems with some images or the metadata. Some cameras use an EXIF Maker Note Tag in the image, and when updated, the digital camera software “may no longer recognize the metadata that is automatically added to the photo.” There have also been reports of some compatibility issues and the files becoming “unreadable in other applications, such as Adobe Photoshop.” Camera manufacturers may provide software for Vista users who want to open or print raw files.

Intel Gets More Time to Explain Lost E-Mails in Antitrust Case. Chris Preimesberger. eWeek. April 6, 2007.

Intel has been granted more time by the court to explain how they will locate missing emails. Guidelines enacted in December require enterprises to be able to quickly find data files required by the court. Some of the items may have to be recovered from backup tapes or user backups, neither of which is indexed. The court said Intel had an “ill-conceived plan of document retention and lackluster oversight”. People at the highest level “failed to receive or to heed instructions essential for the preservation of their records”.

Friday, March 30, 2007

Weekly readings - 30 March 2007

Testimony to Congress. James H. Billington. Library of Congress. March 20, 2007.

Statement given by the Librarian of Congress concerning the Library of the 21st Century. It now takes 15 minutes to produce the same amount of information that it took LC over 200 years to acquire. Most exists only in digital form. “There is a widely-held but false assumption that digital materials accessible today … will necessarily be available in the future.” Also, “information not actively preserved today could literally be gone tomorrow.” Recent important digital materials, such as those on the internet, have not been preserved and have vanished. These are the primary sources of our time. A key challenge is to “capture, collect, preserve, and provide access to important ‘born-digital’ material and Web-based information.” LC manages about 295 TB of digital information. “The Library's basic mission of acquiring, preserving and making accessible the world's knowledge and the nation's creativity is not changing.” We can’t save everything, so we need to identify and select what is critical to the collection. “We are not just creating endless digital data files; we are giving our collections context and making them increasingly accessible to the world.” As we add to our collections we need an infrastructure that will make the content available in the future. A new asset is LC’s National Audiovisual Conservation Center which will preserve and make accessible the audiovisual collections.

Killing risk, unifying data protection. Jim Damoulakis. Computerworld. February 27, 2007.

It is important to look at what we are doing with data protection. Some of the techniques include nightly backup, snapshot, mirroring, database dumps, host-based replication, and storage array-based replication. One way to create a unified strategy is to look at the risks that exist. They include physical device failure, data loss through deletion or corruption, and disasters. Data loss can occur undetected, and there needs to be a way to protect against this.
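One common way to catch the undetected loss mentioned above is a periodic checksum (fixity) check. The article does not name a specific technique, so the following is only a minimal sketch under that assumption; the function names and the manifest layout are hypothetical and chosen for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (hypothetical manifest format)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damage(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare the files on disk with a stored manifest and report differences."""
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"   # deleted or lost
        elif sha256_of(target) != expected:
            problems[name] = "changed"   # silent corruption or modification
    return problems
```

In use, the manifest would be built once when the data is known to be good, stored separately from the data, and `find_damage` run on a schedule; an empty result means every recorded file still matches its checksum.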

Perspectives on Trustworthy Information. H.M. Gladney. Digital Document Quarterly. March 2007.

Digital preservation activities are shifting from solving basic problems to implementing solutions and repository procedures. Selection is a challenge of building a long-term digital collection, but it needs to be balanced by practicalities. Archival objects need to include honest and adequate provenance information that is bound to the object. “Preserving an information collection is a different challenge than managing archives.” The need to preserve digital information, which is the base of most scientific research, is self-evident. Snapshots and logs may be sufficient for preserving databases.

JHU/UVA Medieval Manuscript Digitization Workshop. Timothy Stinson. Blog. March 28th, 2007.

This quote is from the blog report of the digitization workshop: “Staples has a great way of thinking about preservation - he pointed out that preservation isn’t simply a technological solution, an archive, e.g., where we can stick things and have them safe forever. Rather preservation is the result of usage, maintenance, and institutional commitment. Those things that are used the most, he argued, are the same ones that are migrated the most frequently, and are the least likely to become invisible and forgotten or to cease to be a priority to individuals and institutions. We need not only technical solutions, but also wide access and modeling of data in such a way that it is frequently used, migrated, and repurposed.”

Calif. CIO Steers Clear of Ideology on File Formats. Carol Sliwa. Computerworld. March 19, 2007.

The question of open formats is not an ideological struggle between competing visions of the future. It is a straight business decision, looking at the costs of one approach over another and deciding if it meets the business needs. They don’t have a preference between ODF and the Office Open XML file format, but they are moving toward interoperability and more open systems, and away from being locked in to proprietary ones. Open, XML-based formats provide flexibility.

Friday, March 23, 2007

Weekly readings - 23 March 2007

Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist. Robin Dale, et al. CRL. March 9, 2007.

TRAC is the revised and expanded version of the Audit Checklist originally developed by RLG-NARA. The 94-page report provides a very complete method for checking and certifying long-term repositories. It can also be used for planning and guiding the development of repositories. The document covers three areas: Organizational Infrastructure; Digital Object Management; and Technologies, Technical Infrastructure, & Security. It provides a checklist of criteria for measuring the trustworthiness of repositories. Another link to the site.

e-Journals: Archiving and Preservation. Briefing Paper. JISC. March 2007.

The traditional model of publishers supplying content and libraries preserving content does not work well with digital materials. Licensing agreements do not guarantee permanent access to materials. But the e-journal trend is increasing at a rapid rate. Many are searching for the solution. The terms ‘perpetual access’, ‘archiving’, and ‘long-term preservation’ are often used interchangeably. Perpetual access is usually used with e-journal license clauses to assure that access will be continued regardless of events. Archiving describes the management processes of e-journals. Long-term preservation refers to the processes to ensure the content remains accessible in the future, regardless of any technical or organizational changes. There need to be multiple options and strategies for preserving e-journals, including coordinated overlap. There are promising developments evolving, but the solutions must include libraries, publishers, and archiving services.

Iron Mountain launches active archiving for email. Computer Technology Review. March 20, 2007.

Iron Mountain has introduced an Active Archiving Service for email. This is a single solution which includes management, archiving, legal discovery, continuity and disaster recovery. Most legal discovery processes now include email. The new federal rules make an email archive critical. It is integrated with Outlook and allows users full access to emails and the ability to restore individual messages. The cost starts at $6 per user per month.

Metadata for All: Descriptive Standards and Metadata Sharing across Libraries, Archives and Museums. Mary W. Elings, Günter Waibel. First Monday. 5 March 2007.

The cultural heritage community has a large pool of digital resources for teaching, research and learning. A big challenge is integrating digital content from libraries, archives and museums which use different strategies for caring for their materials. Applying data content standards by material type rather than by organization could make the data more usable within the entire community. Two schemas used are Visual Resources Association (VRA) Core and the Categories for the Description of Works of Art (CDWA). The article lists the elements of metadata standards, and the relationship between them and museums, libraries and archives. There is a call among archives to process collections more efficiently so they achieve control over all their holdings. The successful use of digital materials in libraries, museums, and archives revolves around the ability to describe similar materials in different institutions.

Microsoft Announces HD Photo, a New Imaging File Format With Advanced Features for Today’s Digital Photographers. Press release. March 8, 2007.

Microsoft announced a new file format that offers higher image quality, greater preservation of data, and advanced features. HD Photo offers both lossless and lossy image compression. When compressed it has twice the efficiency of JPEG, with fewer artifacts. It preserves the entire original image. They also released a plug-in for Photoshop. [See Photoshop gets HD Photo support.]

Dell to ship PCs with 1TB drives. Chris Mellor. Techworld. March 16, 2007.

Dell will ship computers with Hitachi 1 TB drives, targeting users who wish to store large amounts of data. The computer can handle up to 4 TB. The drives use perpendicular recording. The 1TB drive is priced at $540. Dell is also introducing a 'video time capsule service' where users can upload videos to a site where Dell will store them for a claimed 50 years.

Blu-ray Aims to Oust DVDs Within Three Years. eWeek.

A Digital Life. Gordon Bell, Jim Gemmell. Scientific American. February 18, 2007.

New systems may allow people to record everything they see and hear--and even things they cannot sense--and to store all these data in a personal digital archive. The MyLifeBits project has provided the tools to create a person’s lifelong digital archive. Technological advances may make this easier but there are challenges, particularly with privacy rights and restrictions. They believe digital memories will yield benefits in many areas.

Hammer Storage Pounds Out 'Disruptive' 1TB Appliance. Chris Preimesberger. eWeek. March 22, 2007.

Hammer Storage has introduced Myshare, a new plug and play storage device with 1TB for $499. It can be used on a network, and the content can be made available through a web application, including selective access to folders. The content can also be mirrored and secured, and the device allows multiple user and group permissions.

Wednesday, March 21, 2007

Scholarly Communication

Web 2.0 Presentation to BYU Library. Gideon Burton. Blog. March 13, 2007.

This was a presentation to the library on trends in Scholarly Communication, Web 2.0, and other topics. This page includes his PowerPoint presentation and links to a video shown at the meeting, "The Web is Us/ing Us" by Michael Wesch.

Oops! Techie wipes out $38 billion fund

While doing routine maintenance work, the technician accidentally deleted applicant information for an oil-funded account — one of Alaska residents’ biggest perks — and mistakenly reformatted the backup drive, as well. There was still hope, until the department discovered its third line of defense, backup tapes, was unreadable. More than 300 cardboard boxes of paperwork have been scanned again.

Friday, March 16, 2007

Weekly readings - 16 March 2007

History, Digitized (and Abridged). Katie Hafner. New York Times. March 10, 2007.

Archives and museums hold many important items that will probably not be digitized in the near future. This increases the possibility that they will be ignored as people increasingly expect that all information is on the internet. A major problem is the cost of digitizing materials. Many items will still exist only in paper, LPs, magnetic tape and film. Libraries tend to digitize the items that are unique to their collection. But by putting the items on the internet, the number of people who use them increases dramatically. The LDS Church has initiated large scanning projects and hopes to have hundreds of millions of images online in the next five years. Others are digitizing collections which allow much broader access to the materials, but copyright is an issue. There is very little room in copyright law for preservation. The amount of material available can be overwhelming.

Director's Message. Anne-Imelda M. Radice. News & Events. March 2007.

The Webwise Conference: Stewardship in the Digital Age: Managing Museum and Library Collections for Preservation and Use highlighted the huge shift underway in museums and libraries. In a short time they have gone from knowing almost nothing about preserving digital objects to now understanding that digitization is an important part of conservation and use. Besides preserving the physical objects, institutions realize they need digital repositories for collections that are:

- physically vulnerable

- on fragile or unstable media

- born digital

Digitization protects historically important collections and addresses future collections. There is a great need to develop a new set of digital preservation skills in order to address digital objects. These collections can increase public awareness and interest in existing collections that may currently be unknown. Digital stewardship is an important part of the overall mission of libraries and museums as they care for their collections.

Quad-layer DVD Technology Becomes the Third HD Format. Marcus Yam. DailyTech. March 11, 2007.

New Medium Enterprises (NME) has developed the Versatile Multilayer Disc (VMD), a new optical-based format capable of storing 20GB of data. VMD is a red-laser technology that achieves its storage capacity by using a greater number of layers. VMD is the same size and thickness as DVD. However, while DVD technology uses two layers of a disc, VMD technology has multi-layering where up to 5GB can be stored on each layer.

Version 3.0 Launched. Mia Garlick. Creative Commons Website. February 23, 2007.

The latest version of the Creative Commons license is now available. A new generic license has been created. The new licenses ensure that there is consistent, express treatment of the issues of moral rights and collecting royalties and that there are no legal barriers to people being able to remix creativity as intended.

Intel Faces Up to E-Mail Retention Problems in AMD Lawsuit. Chris Preimesberger. eWeek. March 7, 2007.

A U.S. federal judge on March 7 gave Intel 30 days to try to recover about 1,000 lost e-mails that it was required to keep for an antitrust lawsuit. The suit was filed by AMD (a competitor) in 2005. Intel could have easily avoided the digital storage problems with careful planning. The new U.S. federal court rules enacted last December require companies to be able to quickly find data required by the federal court.

Principles for Digitized Content. ALA. Website. March 2, 2007.

An ALA task force introduced the draft Principles for Digitized Content. These have been put on the ALA blog and they are interested in comments. The principles in brief are:

1. Digital libraries ARE libraries and ALA policies and values apply fully.

2. Digital content must be given the same consideration as regular content, including preservation.

3. Digital collections must be sustainable and require long-term management capabilities.

4. Digitization requires collaboration, which will require strong organizational support.

5. Digital activity requires ongoing communication for its success.

6. Digital collections increasingly address an international audience.

7. Digital collections are developed and sustained by educated staff, requiring continuous learning.

8. Digital materials require appropriate preservation, including the development of standards, best practices, and models for sustainable funding to guarantee long term commitment.

9. Digital collections and their materials must adhere to standards, serve the broadest community of users, support sustainable access and use over time, and promote the core library values.

Model Plan for an Archival Authority Implementing Digital Recordkeeping and Archiving. Australian Digital Recordkeeping Initiative (ADRI). 2 March 2007.

This 32-page Word document is a list of the components, tasks and resources needed to develop a digital recordkeeping / archiving capability. It addresses creating recordkeeping standards and developing a digital archives repository. It is based on the OAIS model. It outlines the strategies, functions, and tasks to develop, implement, and review the repository. The functions for preservation planning are:

- Monitor / interact with the designated community to understand requirements and changes

- Monitor the emerging technology and standards

- Develop and recommend preservation strategies

- Develop packaging designs and detailed migration plans and prototypes

- Implement administrative policies and directives