Saturday, November 21, 2009

Avoiding Amnesia in a Digital World

Press Release: Avoiding Amnesia in a Digital World

The world's archivists have been looking into a collective crystal ball, imagining the future of their profession in the 21st century. Meeting in Malta, 240 archives and records professionals from 90 countries recognised the enormous challenges they face in a fast changing world.

Documents and records previously on paper are turning digital. Archivists need to act swiftly to save these fragile electronic entities and make sure we don't all suffer a memory loss in future. They need to exploit new technologies to reach out to new generations of users. And to do this, archives staff need the right training, the right skills and the right experience.

"We're aiming to create a new generation of archivists, completely comfortable with Google and YouTube," commented conference chair Nolda Römer-Kenepa from Curaçao. Running through the meeting was the idea that today's archive leaders are digital immigrants, coming to terms with a new world of electronic communication, while a new generation of digital natives is now growing up.

Delegates to the conference on the theme "Imagining the 21st Century Archivist: New Strategies for Education and Training" agreed to work together through national and international networks, promoting internships and exchanges for archives staff, sharing online resources, setting up mentoring programmes and collaborating on research. Of particular concern to participants were problems of indigenous and minority peoples, calling for more flexible options for entering the profession. Distance learning using the internet was seen as a key way for developing countries to acquire education and training.

The International Council on Archives (ICA), which meets every year to exchange views and develop policies, was hosted this year by the National Archives of Malta. Welcoming delegates, National Archivist Charles Farrugia said: "Malta has a long archive tradition, with records going back 700 years. We face the 21st century with new confidence: we renewed our Archives Act in 2005 and started courses in archives and records management at the University of Malta the same year."

Today's archivist needs to manage change, to master technology and to operate in an electronic environment, while keeping a historical perspective. Beyond that, delegates recognised the need to understand business processes, to develop communication and advocacy skills and to be able to capture oral traditions.

Conference goers were concerned with the whole learning cycle, from initial education, through on-the-job training, to continuing professional development. They called for a constant review and update of education and training curricula, using feedback from educators, professional associations, employers and students.

Another key conclusion was the need for a strong relationship between research and teaching programmes in universities offering recordkeeping education. Archives organisations, too, need to foster research and development projects, drawing in partners from the academic world, the voluntary sector and business.

Summing up the proceedings at a dinner in the historic surroundings of the La Valette Hall, ICA President Ian Wilson from Canada said: "Malta is a small country and it took courage and determination on the part of its National Archives to invite their worldwide colleagues here. Charles Farrugia's vision, the support of his Minister Dolores Cristina and the government have produced a most memorable conference in Malta."

ICA, the International Council on Archives, is based in Paris, and has nearly 1,500 individual and institutional members in 195 countries.
20 November 2009

Friday, October 30, 2009

Digital Preservation Matters - 30 October 2009

Copyright and Cultural Institutions: Guidelines for U.S. Libraries, Archives, and Museums. Peter B. Hirtle, et al. October 2009. [Book and 275p. PDF]

This book can help libraries learn how they can use the internet to provide access to their collections and comply with copyright laws. The book is also available in PDF form on the website. It “addresses the basics of copyright law and the exclusive rights of the copyright owner, the major exemptions used by cultural heritage institutions, and stresses the importance of “risk assessment” when conducting any digitization project.” A section on ‘Digital preservation and replacement copies’ is important to read to learn more about what we can do and can’t do.


Universities offering new perks to broke students. Carol Warner. HigherEdMorning. October 25, 2009.

An initiative in Florida makes more than 120 textbooks available to students to download for free. It also sells the books at a discount because findings show:

  • 22% of students are “uncomfortable” reading from a computer screen
  • 75% of students prefer to read print copy
  • 60% of students would buy a discounted book even if the textbook was available for free online.


Microsoft opens Outlook format, gives programs access to mail, calendar, contacts. John Fontana. HigherEdMorning. October 27, 2009.

Microsoft announced it will provide patent- and license-free use rights to the format behind its Outlook Personal Folders. It will document and publish the .pst format, which is used for the email, calendar, and contact functions. This will explain how to parse the contents of the file and how to access that data from other software applications.


iTunes for college courses? It’s true. Carin Ford. HigherEdMorning. October 24, 2009.

The University of Virginia now offers over 1,000 lectures, videos, etc. as free digital downloads from iTunes U. Others have done this for a long time, but Virginia has combined it with other features. An interesting one is that if a student subscribes to a specific course, new material will be downloaded automatically to the student’s iTunes library.


Obama Drupal-ing around; goes open source. Richi Jennings. Computerworld. October 26, 2009.

The White House has chosen Drupal, an open-source content management system, to run its Web site.


High Volume Document Storage Creates New Headaches for Content Managers. Steve Jones. CMS Wire. Oct 26, 2009.

Digital storage centers are filling rapidly. Some look at ‘single-instance storage’ as a viable option for organizations trying to reduce digital storage requirements. Single instancing is based on the principle of keeping one copy of a digital file that multiple users share and eliminating duplication.
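
The single-instancing principle can be illustrated with a small sketch. This is only a toy, in-memory illustration, not how any particular product works: the class name `SingleInstanceStore`, its methods, and the choice of SHA-256 to recognize identical content are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy in-memory store that keeps one copy of each distinct file body."""

    def __init__(self):
        self._blobs = {}   # content digest -> file bytes (stored once)
        self._names = {}   # (user, filename) -> content digest

    def put(self, user, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self._blobs.setdefault(digest, data)
        self._names[(user, filename)] = digest
        return digest

    def get(self, user, filename):
        # Every user reads from the single shared copy.
        return self._blobs[self._names[(user, filename)]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
report = b"quarterly report" * 1000
store.put("alice", "report.doc", report)
store.put("bob", "copy of report.doc", report)   # same bytes, different owner and name
store.put("carol", "notes.txt", b"meeting notes")
```

Here three files are registered but only two bodies are kept, so `store.stored_bytes()` counts the report once. The trade-off is that the one shared copy becomes a single point of failure, which is why backups still matter.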


Open Access to Research Is Inevitable, Libraries Are Told. Jennifer Howard. The Chronicle of Higher Education. October 15, 2009.

A panel told ARL libraries that public access to research is "inevitable," but it will take work to get there. Faculty are starting to understand that open access to research has to happen in order to have the most scholarly opportunities. The US is far behind other countries regarding access to research. Researchers who don’t have the latest research can't fully participate in the academic discussions. The National Science Foundation plans to build an international, large-scale data-curation network.

Monday, October 26, 2009

Digital Preservation Matters - 23 October 2009

Sidekick Data Restoration Has Started, Microsoft Says. Barry Levine. NewsFactor. October 20, 2009.

Danger, a Microsoft subsidiary using ‘cloud computing’, experienced a system problem that erased all the users' contacts, calendar entries, to-do lists, and photos for those using the Sidekick smart-phone. Much of the data may be eventually recovered, but effective data backup and protection measures were not being followed. It shows the importance of using reliable vendors and having data backups. [This is the first major loss of ‘cloud – data’ that I know of.]


Millennial disc guarantees data preservation. Logan Bradford. Daily Universe. September 15, 2009.

Barry Lunt, a BYU information technologies professor, will launch a product with the company, Millenniata, that produces a disc just like a CD or DVD that will last up to 1,000 years. He learned, through his seven years working for IBM in computer data, that data on CDs and DVDs would decay and be lost over just a few years because of optical discs’ ephemeral qualities, such as when they are exposed to sunlight and humidity. [We have been testing these discs and writers.]


Wellcome Library to use JPEG2000 image format. Library blog. September 18, 2009

The Wellcome Library in London has been using TIFF images as its archival storage format. But, anticipating adding over 30 million images, they wanted to find a way to store the digital content efficiently while still maintaining the high levels of quality and the open standards required for long-term preservation. To do this they have chosen to use the JPEG2000 format in their digitization program. The difficulty is that the JPEG2000 format has multiple versions. They wanted to know which version is best for long-term storage and access, so they commissioned a study by King's College London: JPEG 2000 as a Preservation and Access Format for the Wellcome Trust Digital Library. Robert Buckley, Simon Tanner.

Based on the study, the Library will adopt a "visually lossless" lossy compression to gain at least 75% storage savings in comparison to a TIFF version. “The recommended compression parameters will produce an image with no visible difference in image quality, but the compression is irreversible - i.e. the original bit stream will not be possible to reconstruct. As the Library will be digitising physical items that can (if necessary) be re-digitised, it was considered an acceptable compromise.” Some materials may be candidates for JPEG2000 lossless compression. They are also recommending that “JPEG 2000 be used with multiple resolution levels.”


The Swedish Research Council requires free access to research results. Press release. October 8, 2009.

Researchers granted funds by the Research Council should publish their scientific research in publications that are available according to Open Access guidelines within a maximum period of six months. "We consider that publication of research which has been paid for out of public funds should be made freely accessible to all." The Open-Access rules apply so far only to scientifically assessed texts in journals and conference reports, and not to monographs and chapters of books.


Sound archive of the British Library goes online, free of charge. Mark Brown. The Guardian News. 3 September 2009.

The British Library has made its archive of world and traditional music freely available on the internet. The Archival Sound Recordings archive contains about 28,000 recordings, estimated at 2,000 hours of sound. These recordings are from around the world and the oldest are from wax cylinders made in 1898. The Library wants to change the perception that “things are given to libraries and then are never seen again – we want these recordings to be accessible."


Keeping Research Data Safe2: Data Survey added to project website. Neil Beagrie. Blog. 26 Sep 2009.

The post gives information about the project and a link to the website. The project, which aims to identify long-lived datasets for the purpose of cost analysis, will be ending soon. It refers to the previous project. In the activity model it mentions it will look at the development of an archive’s selection policy, and also at staff training and development. One area of concern was OAIS terminology potentially being a barrier to understanding for some user groups.

Friday, October 23, 2009

Hardware updates

Micron boosts NAND flash endurance six-fold. Lucas Mearian. Computerworld. October 19, 2009.

Often techniques are used to increase flash memory performance, but they also cut the capacity by 90%. Micron says its technique to increase density also increases the number of times data can be written to the device.


Engineers create material that could hold 1TB of data on fingernail-sized chip. Lucas Mearian. Computerworld. October 21, 2009.

Engineers have created a material that could hold a terabyte of data in a chip the size of a fingernail.

Friday, October 16, 2009

Digital Preservation Matters - 15 October 2009

PREMIS Implementation Fair 2009. Reports. October 7, 2009.

This fair concerning PREMIS (digital preservation metadata) was presented following the iPres conference in San Francisco. The online agenda for the PREMIS fair now includes the PowerPoint and PDF presentations. They include:

  • Status of PREMIS (Brian Lavoie)
  • Implementation in METS (Rebecca Guenther)
  • PREMIS Rights implementation at University of California San Diego (Bradley Westbrook)
  • PREMIS for geospatial data (Nancy Hoebelheinrich)
  • Towards Interoperable Preservation Repositories (TIPR) project (Priscilla Caplan)


New E-Book Company to Focus on Older Titles. Motoko Rich. The New York Times. October 13, 2009.

A new company has been formed that will republish old titles, and seeks new authors willing to be published in electronic format. They look to an aggressive marketing campaign, and are looking at both the backlist and the electronic format.


The ARL Preservation Statistics 2006-07. David Green. Association of Research Libraries. September 25, 2009.

The latest edition of the preservation statistics. It looks mainly at physical preservation, and includes information on personnel, expenditures, conservation treatment, preservation treatment, and preservation microfilming. The section dealing with digital preservation states:

“Digitizing for preservation purposes is the reproduction of bound volumes, pamphlets, unbound sheets, manuscripts, maps, posters, works of art on paper, and other paper-based materials for the purpose of:

a) making duplicate copies that replace deteriorated originals (e.g., by digitizing texts and storing them permanently in electronic form and/or printing them on alkaline paper);

b) making preservation master copies to guard against irretrievable loss of unique originals (e.g., by making high-resolution electronic copies of photographs and storing them permanently and/or printing them); or

c) making surrogate copies that can be retrieved and distributed easily, thereby improving access to information resources without exposing original materials to excessive handling; or some combination of these factors.”


Alfresco Achieves DoD 5015.02 Records Management Certification. Marisa Peacock. CMS Wire. Oct 5, 2009.

Alfresco is certified as DoD 5015.02 compliant, the first open source content management system to be so certified. The Department of Defense standard outlines mandatory requirements for Records Management programs and is recognized as a base standard by many organizations.

Monday, October 12, 2009

netarchive Newsletter - August 2009

The two institutions behind the netarchive, the Royal Library and the State and University Library, have developed a system for web archiving called NetarchiveSuite, which has become open source.

An update on web archiving activity. The Act on Legal Deposit of Published Material of 22 December 2004 provides their legal foundation. The major issue for web archiving is access, which is currently limited to researchers. These come under their data protection act.

Their archive contains about 112 terabytes with about 3.5 billion objects. The top-level domain DK now has more than 1.3 million domain names, of which about 1 million are active. In addition they harvest about 44,000 Danish sites on other domains.

The system provides technical information on harvests but it is important to document the decisions made on what to collect and not collect so that future researchers may know the content of the archive.

Friday, October 09, 2009

Digital Preservation Matters - 9 October 2009

Preservation: Evolving Roles and Responsibilities of Research Libraries. ARL. September 15, 2009. Webcast, PDF.

This site has the webcast and a pdf of the original report. Some notes from the webcast:

There isn’t enough funding to meet all preservation needs

  • Need to align preservation activities with institutional concerns.
  • Can’t preserve without metadata. It is what connects preservation to the rest of the library.
  • Must maintain the context and address rights issues.
  • Research libraries must determine what researchers will need in the future, then preserve the content so that it can be used.
  • Need to balance print and digital collections
  • Our special collections are what differentiate us from other research libraries, and we need to make broad access available to these collections

Safeguarding Collections at the Dawn of the 21st Century: Describing Roles & Measuring Contemporary Preservation Activities in ARL Libraries. Lars Meyer. ARL. May 2009.

Notes: Preservation is a core function of the research library and a key element of the stewardship and access missions of research organizations. It is not just the responsibility of one department. Three perspectives for libraries:

  1. Reallocate priorities and resources in response to changing trends in publishing, research, and teaching activities.
  2. Expand practices related to preserving digital content through Web archiving, digital repositories, and efforts to preserve e-journals and other born digital content
  3. Build collaborative activities to effectively address digital preservation challenges

“Digital Preservation is a subset of library preservation.” A definition: data curation “is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education. Data curation activities enable data discovery and retrieval, maintain its quality, add value, and provide for re-use over time. [It] includes authentication, archiving, management, preservation, retrieval, and representation.” Preservation includes managing the relationship between content, context, and access. Rather than discussing analog vs. digital, or access vs. preservation, it is more important to ask whether relevant preservation work is being done throughout the life of digital content, and how or when it is done.

Preservation of born digital content begins with:

  1. decisions about the form in which a library should acquire digital content;
  2. a clear understanding of the library’s rights to preserve such content;
  3. policies, technical infrastructure, and staff to realize the ongoing work;
  4. possibly joining or developing cooperative digital preservation networks.

Digital Repositories: It is “vitally important for all research libraries to be engaging with digital repository development projects in some fashion.” These are not the same as preservation repositories; both are needed. One example given is the Illinois Digital Environment for Access to Learning and Scholarship (IDEALS), the institutional repository for the University of Illinois at Urbana-Champaign, which has very well defined policies. Digitization, to be considered a preservation process, must include an institutional plan to preserve the digital content.

Web resources: many government documents which used to be printed are now available online. Research libraries have a responsibility to determine how the community can best share the effort of identifying, describing, and preserving Web-based publications.

It is unlikely that all the needed preservation expertise will exist in one library so ARL members need to develop partnerships. Preservation is a continual learning process.

Some recommendations:

Institutions need well-developed policies, strategies, and practices.

  1. Preservation decisions to be made strategically throughout the life-cycle of the resources
  2. Community agreed-upon practices are needed for preserving digital surrogates.
  3. Digital curation: an active set of activities that requires active partnerships
  4. Digital preservation requires an understanding of rights, technical infrastructure, staffing
  5. Repositories: include preservation activities, work with faculty on continuing access.

Google lets you custom-print millions of books. Ryan Singel. Wired. September 18, 2009.

Out of print books can be printed individually with the Espresso Book Machine through a venture with Google Book Search and On Demand Books. The $100,000 printer can print a 300 page gray-scale book with a color cover in about 4 minutes; it then trims and binds the book. The materials cost about $3. The machine can print about 60,000 books a year. The books can be printed from pdf files. “Some 80 percent of the public-domain books are looked at in a given month.” “Neller said he'd love to see the day when Google Book Searchers can press a button next to a search result and find the closest local printer, but Google says that's a long way off.”

Friday, October 02, 2009

Digital Preservation Matters - 01 October 2009

Media Preservation Survey: A Report. Mike Casey. Indiana University Bloomington. 1 October 2009. [132 p. pdf]

An excellent and very detailed report from Indiana University Bloomington concerning the 560,000 audio and video recordings and reels of film on the campus. The report looks at the characteristics and condition of only one of the many groups of materials and the preservation challenges. This was a ten-month study by a team of archivists. The next step is developing a campuswide preservation plan. These historical "jewels" will be lost if not preserved soon. A few preservation activities exist on campus but they are too small to be effective or are not sustainable. They have 51 different media formats. They have over 180,000 digital files in the collection. "These formats require active preservation services from the moment of creation if their content is to survive."

Redundancy is a key strategy in saving the materials, but only 11% have a copy. "One copy is no copy." Preservation of audio and video objects requires transferring them to digital. Storing AV materials at the correct temperature and relative humidity is the "single most important factor in slowing the physical degradation of audiovisual media." At the current rate it will take 120 years to digitize the AV holdings. There is a very useful chart of Selected High and Medium Risk Formats in their collection.

Among their recommendations:

  • Appoint a campuswide taskforce to advise on preservation
  • Create a centralized media preservation and digitization center for the entire campus
  • Develop special funding to digitize the materials quickly
  • Create an appropriate and centralized physical storage space for the materials
  • Provide archival appraisal and control across campus
  • Develop cataloging services to accelerate research opportunities and improve access
  • Complete a digital preservation repository

Bagit: Transferring Content for Digital Preservation. Library of Congress. September 29, 2009.

Short video on YouTube about BagIt, a tool from The Library of Congress, California Digital Library and Stanford University. They have developed guidelines for creating, moving and verifying standardized digital containers, called "bags." BagIt requires a bag declaration, list of contents, and the actual content.
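
The three parts named above (declaration, list of contents, content) can be sketched with the Python standard library. This is a minimal illustration and not the Library of Congress tool itself: the function names `make_bag` and `verify_bag` are hypothetical, and the file names (`bagit.txt`, `manifest-sha256.txt`, `data/`), the version string, and the choice of SHA-256 are assumptions based on the published BagIt specification rather than on the video.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, files):
    """Write a minimal bag: declaration, payload directory, and manifest.

    `files` maps a relative payload name to its bytes.
    """
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)

    # 1. Bag declaration (the version string is an assumption).
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # 2. The actual content, and 3. the list of contents with checksums.
    lines = []
    for name, content in sorted(files.items()):
        target = data / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        lines.append(f"{digest}  data/{name}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag


def verify_bag(bag_dir):
    """Return True if every payload file listed in the manifest still matches."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


tmp = tempfile.mkdtemp()
bag = make_bag(Path(tmp) / "example_bag", {"letter.txt": b"Dear archivist"})
ok_before = verify_bag(bag)
(bag / "data" / "letter.txt").write_bytes(b"tampered")   # simulate damage in transit
ok_after = verify_bag(bag)
```

The manifest is what makes a transfer verifiable: the receiver recomputes each checksum and compares it with the listed value, so the simulated change to `letter.txt` is detected on the second check.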

Archiving Is For E-discovery; Backup Is For Recovery. Mathew Lodge. The Metropolitan Corporate Counsel. September 01, 2009.

There are challenges with court requests for the discovery of information from backup tapes. If backup tapes are used for information retrieval then they are accessible for e-discovery. But they were never designed for this. Many are doing archiving, but the term means different things to different people. "Active archiving is different: it's a way of centrally managing the storage, retention and hold of information while ensuring "live" (or active) access to any item." It means moving objects to a central repository and providing access to users.

Purple Cows and Fringy Propositions. Carol Minton Morris. D-Lib Magazine. September/October 2009.

Notes from the Fringe Festival. "At every stage of the Bodleian Library's development, Oxford changed practices and policies, and improved – first analog and later digital – technologies in response to changes in the world beyond the Library. Realization is still a catalyst for change." The most useful metaphor for the repository is the internet. Institutions like Oxford create institutional repositories as components of larger library service platforms, not stand-alone silos. Clifford Lynch said we may be better with incremental, structured assessments rather than open-ended preservation commitments. We should aim at preserving digital objects "for the next 20 years with subsequent assessments instead of aiming to preserve them forever." Repositories should look at collecting works from scholars at the end of their careers and create a legacy. Repositories will move beyond educational organizations, so we should look at being involved in that.

Wednesday, September 09, 2009

Digital Preservation Matters - 09 September 2009

Harvard's Web Archive Collection Service (WAX). Website. September 2009.

This site began as a pilot project to address the management of web sites by collection managers for long-term archiving. It is designed to capture, manage, store and display web sites in an archive. “With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage.” It is built using open source tools such as the Heritrix web crawler; the Wayback index and rendering tool; and the NutchWAX index and search tool. Documents concerning WAX are available at:

Glossary of Web Archiving Terms. Molly Bragg. Internet Archive website. August 06, 2009.

Part of the Archive-It help section. This page has a glossary of web archiving terms. The rest of the wiki has some good information about archiving web sites.

Missing links: the enduring web. Web Archiving Consortium Workshop. 21 July 2009.

Web pages are at risk. The early web pages are of similar historical importance to prehistoric writings, and both are at risk. “Key issues for long-term access and preservation remain unresolved.” This site includes the presentations from the workshop. Some of these include:

  • Web Archive and Citation Repository in One: DACHS
  • The future of researching the past of the Internet
  • Web Archiving Tools: An Overview
  • Context and content: Delivering Coordinated UK Web Archive to User Communities
  • Capture and Continuity: Broken links and the UK Central Government Web Presence
  • Diamonds in the Rough: Capturing and Preserving Online Content from Blogs
  • Beyond Harvest: Long Term Preservation of the UK Web Archive
  • From Web Page to Living Web Archive
  • Emulating access to the web
  • What we want with web-archives; will we win?

The following items are a few of the presentations at the Web Archiving Consortium Workshop:

Web Archive and Citation Repository in One: DACHS. Hanno Lecher.


  1. Capturing and archiving relevant resources as primary source for later research
  2. Providing citation repository for authors and publishers

When citing online resources:

  • Verify URL references
  • Evaluate reliability of online resources
  • Use PURLs
  • Tools include: Snagit, Zotero, WebCite, DACHS Citation Repository

“the best current solution to improve access to Internet references is for publishers to require capture and submission of all Internet information at the time of manuscript consideration”

The future of researching the past of the Internet. Eric T. Meyer.

You may not want to capture an entire web site, so consider which sub-links to follow. A ‘seed’ is a site from which other sites can be discovered through its links. Other points: look at annotating the web sites; moving from snapshots of a site to more continuous data capture; and how to share the results in a meaningful way.
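
The idea of a seed can be shown with a short sketch that reads one page and separates the links that stay on the seed site from those that lead to other sites. It is only an illustration under stated assumptions: the names `LinkCollector` and `discover`, the example URLs, and the sample page are hypothetical, and a real crawler such as Heritrix does far more (fetching, politeness rules, scoping).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the seed URL.
                self.links.append(urljoin(self.base_url, value))


def discover(seed_url, html):
    """Split the seed page's links into same-site pages and other sites."""
    collector = LinkCollector(seed_url)
    collector.feed(html)
    seed_host = urlparse(seed_url).netloc
    same_site = [u for u in collector.links if urlparse(u).netloc == seed_host]
    other_sites = [u for u in collector.links if urlparse(u).netloc != seed_host]
    return same_site, other_sites


# Hypothetical seed page; a real crawler would fetch this over HTTP.
SEED = "http://example.org/index.html"
PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <a href="news/2009.html">News</a>
  <a href="http://example.com/partner">Partner site</a>
</body></html>
"""

same_site, other_sites = discover(SEED, PAGE)
```

A collection policy could then decide which of the `same_site` sub-links are worth capturing and whether any of the `other_sites` should become seeds in their own right.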

Web Archiving Tools: An Overview. Helen Hockx-Yu.

  1. Selection: have a policy, decide what to capture.
  2. Collect data files (snapshot), examine for other sites to be collected, add to collection list
  3. Store the archived files on disk, virus check, integrity check
  4. Make accessible, index, add metadata, render the files, ensure long term access

Heritrix is the most commonly used tool; the Web Curator Tool is also used.

The Web ARChive (WARC) format is coming into use. Other tools needed:

  • Rendering, such as the open-source Wayback Machine;
  • Full-text search, such as Nutch/NutchWAX or Hanzo tools
  • Provide other search/retrieval options (subject, collection, site name, change over time)

No consensus on strategy, practices and specific tools. Crawlers work with HTML, but not advanced designs or tools. Need to handle problem sites. Decide what to duplicate.

Context and content: Delivering Coordinated UK Web Archive to User Communities. Cathy Smith.

The presentation starts with two questions:

  1. What audiences should web archives anticipate and what does this mean for selection, ingest and preservation?
  2. What will the web be like as an historical source, and what use will be made of archived web sites by future generations?


  1. Institutions continue to provide access to their individual collections, where appropriate, to support researchers; assure integrity of collections; allow integration with the institution’s other, non-web holdings;
  2. Coordinate with other institutions by sharing collection development policies; defining the metadata standard, and developing technical interfaces.

Diamonds in the Rough: Capturing and Preserving Online Content from Blogs. Richard Davis.

“New genres of publications are becoming increasingly important to participants. For example, blogs are cited as a good window into what expert practitioners are doing. This material is not duplicated in traditional sources, yet it is important to consult”. Perceived barriers of web archiving are the cost of implementation and the complexity of available tools. Institutional blog archives are part of the institutional record. They should go through a selection process; support authenticity and fixity; be persistent and citable. Blogs seem to be an area where the content is of primary importance and design is secondary. Create an institutional or thematic archive by using a WordPress database to gather and store the posts and comments and provide access.

Chinese HD DVD Successor Outsells Blu-Ray Discs in China.

Chinese HD DVD Successor Outsells Blu-Ray Discs in China. Anton Shilov. X-bit labs. July 27, 2009.

A Chinese HD DVD standard (CBHD) is being used in China more than Blu-ray. Optical disc manufacturers, who produce Blu-ray, are not planning to support it. They see little support for this standard outside China.

Wednesday, September 02, 2009

Digital Preservation Matters - 02 September 2009

Archival Masters - An RUcore Case Study. Ron Jantz, Isaiah Beard. Duraspace Case Studies. September 2009.

This case study is a summary of practices that Rutgers University Libraries has used with their Fedora system in the treatment of archival masters which have been developed over a period of years. They are recognized as compromises between preservation theory and practice. This will be valuable for others dealing with similar problems. The case study looks at topics such as policies, critical technologies, persistent IDs, normalizing archival masters, using checksums, documenting architectures, generating presentation files, content models, file formats, and others. Video files have been their greatest challenge.

A Data Deluge Swamps Science Historians. Robert Lee Hotz. The Wall Street Journal. August 28, 2009.

The first curator of e-Manuscripts in the British Library struggles with archiving the flood of computer materials. “Never have so many people generated so much digital data or been able to lose so much of it so quickly.” More technical data has been collected in the past year than all previous years combined. “The problem is forcing historians to become scientists, and scientists to become archivists and curators.” People are overwhelmed with all the data. “What you keep and how you pay for it are difficult issues.”

Time to clean up your digital closet. Chris O'Brien. Mercury News. August 3, 2009.

What will happen to data you have stored on devices that become outdated? People don’t really think about it. There isn’t an easy solution, and may never be one due to the dynamic nature of computers. There are some strategies you can put in place. “You will need to start thinking like a librarian and become an active curator of your files. That means relentlessly organizing, labeling and tagging, backing up and deleting.” Keep only the essential data. Develop a system for organizing files online and offline and remember where they are. Label every file and tag them with as much information as you can. Make multiple copies. Investigate ways to keep track of all this and update it regularly.

Think Tank: Google must let us forget. James Harkin. The Sunday Times. August 9, 2009.

With all the data that is now being stored online, there needs to be a way to purge unwanted information. Some companies are gathering information about people from public sites and storing it in a single database. Some data about individuals may be posted by other people. Some say we are creating a “digital memory that vastly exceeds the capacity of our collective human mind”, that there needs to be a way of forgetting the unimportant elements. One way suggested is to put an expiry date on data, then to remove the information on that date.

This article will self-destruct: A tool to make online personal data vanish. Hannah Hickey. University of Washington website. July 21, 2009.

Computers have made it difficult for data to be left behind, but the University of Washington has developed a way to make data expire with a system called Vanish. Vanish, a free, open-source tool that works with Firefox, can place a time limit on text uploaded to any Web service through a Web browser. “After a set time period, electronic communications such as e-mail, Facebook posts and chat messages would automatically self-destruct, becoming irretrievable from all Web sites, inboxes, outboxes, backup sites and home computers. Not even the sender could retrieve them.” It is intended to make information as private as a “phone conversation”.

The Norwegian National Digital Library. Marianne Takle. Ariadne. July 2009.

The National Library of Norway is establishing itself as a digital national library. It plans to digitize its entire collection and has added other practices and strategies. Resources have been redistributed to give priority to digitization, documents are being deposited in digital format, and agreements are in place for digital deposits. It is making collections available to users over the Internet. The three Guiding Principles of Selection for the library are:

  1. A strategy and priority for different collections: books (oldest information); newspapers (those in demand); photos (donations); music (endangered sound formats).
  2. The thematic selection of material across all media types
  3. Follow up enquiries from other users and institutions and co-operate with them

The greatest obstacle to making information available is copyright.

Friday, August 28, 2009

Digital Preservation Matters - 27 August 2009

Encoded Archival Context – Corporate bodies, Persons, and Families. Society of American Archivists, Berlin State Library. August 21, 2009.

Archivists have expressed the need for a standard structure to record and exchange information about the creators of archival materials. Draft information and a schema (EAC-CPF) are available at this site and feedback is welcome.

10 Ways to Archive Your Tweets. Sarah Perez. ReadWriteWeb. August 11, 2009.

Tweets have an expiration date on them and become unsearchable after a week and a half, though that may be reduced as more content is added. Several options for saving these are explored.

Startup crafts DVD-Rs for the 31st century. Rik Myslewski. The Register. 23 July 2009.

The Millenniata company has developed a new DVD-R technology that it claims will be readable for 1,000 years. The Millennial Disc Series is designed to eliminate the need for governments, financial institutions, libraries, and others to regularly refresh and rotate their digital-data collections. The data is etched into a "carbon layer with the hardness of a diamond". It requires a specialized writer and discs [but the discs are readable on any DVD player]. The discs are stable from minus 100° to plus 200° centigrade, and are dunked in liquid nitrogen as part of the testing. These discs are one element of a data preservation strategy.

“Why you never should leave it to the University”. JISC-PoWR website. Blog Post by Brian Kelly on August 19th, 2009.

Discussion of an article about a person who lost his academic website after the School of Business redesigned its web site. After the changes, about ten years' worth of virtually daily updates were gone: “That included most of the manuscripts for my published work. The same thing happened to lecture notes, powerpoint slides, course documentations, useful links, etc. It had all disappeared from the Web!” The issues need to be discussed, and in the current climate that must include the costs: “disk storage may be cheap but management of content is not”. The JISC archive has a number of other interesting posts about preservation of blogs, websites, wikis; and preservation policies.

Digital Preservation in the Wild. Tim Donohue. Slide show. July 21, 2009.

Thirty slides about digital preservation. Some notes from it:

  • It is not about the technology.
  • You don’t have to preserve everything to the fullest extent if you say you aren’t.
  • Say what you do, do what you say.
  • We acknowledge our gaps

Labeling Library Archives Is a Game at Dartmouth College. Marc Beja. The Chronicle of Higher Education. August 25, 2009.

A digital-humanities professor is creating an Internet-based game where users create descriptive tags for library images to improve searching. Adding keywords can be costly. This could be a way for the library to generate metadata. Users could gain points as they compete to label images that match the keywords of other players. It is being funded by NEH and should be available next summer. [Some image sites already have similar functionality.]

‘Digital-Only’ Confusion in Scholarly Publishing: American Chemical Society. Barbara Quint. Information Today. July 23, 2009.

What happens if (or when) scholarly publishing reaches the tipping point of going "digital-only"? Publishers have been creating digital versions for some time, but some are now moving to only digital versions. The American Chemical Society journals are all available electronically but none are going only digital. “Studies have shown that more and more users now prefer the digital mode.” They will be publishing two titles next year that are not in print form. They will continue to monitor the situation, but for “today, and throughout 2010, online access and print subscriptions both remain options.”

Thursday, August 27, 2009

Reinventing academic publishing online. Part I: Rigor, relevance and practice.

Brian Whitworth, Rob Friedman. First Monday. 3 August 2009.

“The current gate–keeping model of academic publishing is performing poorly as knowledge expands and interacts, and that academic publishing must reinvent itself to be inclusive and democratic rather than exclusive and plutocratic.” Many of the applications that have become popular today are not media rich, but simple and text based, such as blogs, wikis, etc. Timeliness is important; materials out of date may not be useful. Innovators are the agents of change. “A system that rejects its own agents of change rejects its own progress.”

Why Academic Libraries Matter.

Barbara Fister. Peer to Peer Review; Library Journal. August 13, 2009.
Values the library can provide:
  1. A total experience that works well
  2. Provide meaning, depending on the user's goal
  3. Create relationships with the users and the institution
Find out how the library can fit in the life of the user. The library is important as a broker or purchaser of information. Library principles go beyond collections and local needs; it is about access, about the importance of knowledge and about what we do with information.

Monday, August 24, 2009

Introduction to ANDS

Introduction to ANDS. Ron Sandland. July 2009.
Share - the newsletter of the Australian National Data Service.
We need to find new ways to capture and share data. To do this we need to create accessible repositories and make it possible to access their holdings. Important issues are access control, storage solutions, training, and guidelines on best practices. They are launching online services.
The key challenge is to build an environment where researchers can store, share, and find data, as they do with publications.

Thursday, August 20, 2009

Digital Preservation Matters - 20 August 2009

The Next-Generation Architecture for Format-Aware Characterization: About JHOVE2. Website. August 18, 2009.

Because of limitations in the original JHOVE program, NDIIPP, the California Digital Library, Portico, and Stanford University are collaborating on a new project. An alpha prototype is available for download. The project looks at identification, validation, feature extraction, and policy-based assessment for simple digital files and potentially complex digital objects that may be in multiple files.

The Digital Continuity Action Plan. Website. Archives New Zealand. 10 August 2009.

A unique inclusive and unified initiative in New Zealand to prevent important public records from being lost and to ensure information will be available tomorrow. A brochure gives an overview of their plan. It includes a note that “Sixty-seven percent of New Zealand public sector agencies hold some information that they can no longer access.” The full plan is set out in a 48p. pdf. The plan is to make the information available and authentic / trusted. If no action is taken, digital information will be lost. A proactive approach is needed to maintain digital information for the future. “Failure to implement digital continuity strategies will result in irretrievable loss of information.” Six goals (explained in detail in the longer document) are:

  1. Understanding: Communicate effectively and have a common understanding of the problem.
  2. Digital information is well-managed from the point of creation onwards.
  3. Infrastructure exists to support the interoperability of systems and efficient digital continuity.
  4. High-value information is identified, so critical information is not lost.
  5. Digital information is accessible now and in the future, and protected from unauthorized use.
  6. Information management is characterized by good governance, leadership and accountability.

Sony to back open e-book format. BBC News. 14 August 2009.

Sony has announced it will use the open ePub format instead of its proprietary standard. This will allow Sony the option of making its e-book store compatible with other readers.

Long Term Digital Preservation of Web Sites. Mikael Tylmad. Thesis. Royal Institute of Technology for the Swedish National Archive. May 31, 2009. [38p. PDF]

Websites have become a standard way for organizations to present information to the public. There are a number of archival concerns in keeping this information long term. Few web pages are written in standard HTML anymore; they use a number of different technologies, such as Flash, and many formats. “The fewer file types the better and if they are human readable it is even better.” This requires archivists to keep the software as well as the entire website. Besides the textual and graphical parts of a web page, the relationship of the parts and how they are presented are important (content and context). Archived sites lose interactivity, become static. Links in Flash etc can be hidden from crawlers and important parts will be lost. Heritrix, used by Internet Archive, is a powerful solution to web archiving. Emulation through virtualization is another powerful solution. Another solution is SWAT (Snappy Web Archiving Tool). The tool, written in Ruby, is available at: It does the following:

  1. Harvests all files from the website and analyzes for future compatibility with DROID.
  2. Screenshots of all web pages are created as tiffs to show the page design
  3. Creates XML metadata about files, links, etc (METS standard)
  4. The web archive with documentation is put in a tar package with an ADDML description.
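
The packaging idea in steps 1 and 4 can be illustrated with a minimal Python sketch. This is not SWAT itself (which is written in Ruby), and it does not run DROID, take screenshots, or produce METS or ADDML; the function names and the plain `inventory.xml` layout are assumptions made only for illustration. It walks a directory of already harvested files, records each one in a small XML inventory, and puts the whole directory in a tar package.

```python
import tarfile
import xml.etree.ElementTree as ET
from pathlib import Path

INVENTORY_NAME = "inventory.xml"  # hypothetical file name, not a SWAT convention


def build_inventory(site_dir: Path) -> ET.Element:
    """Describe each harvested file (path, size, extension) in a small XML tree."""
    root = ET.Element("inventory", {"source": site_dir.name})
    files = sorted(p for p in site_dir.rglob("*") if p.is_file())
    for path in files:
        if path.name == INVENTORY_NAME:
            continue  # do not list an inventory left over from an earlier run
        ET.SubElement(root, "file", {
            "path": path.relative_to(site_dir).as_posix(),
            "bytes": str(path.stat().st_size),
            "extension": path.suffix.lower(),
        })
    return root


def package_site(site_dir: Path, tar_path: Path) -> Path:
    """Write the inventory into site_dir, then tar the whole directory.

    tar_path should lie outside site_dir so the archive does not include itself.
    """
    tree = ET.ElementTree(build_inventory(site_dir))
    tree.write(str(site_dir / INVENTORY_NAME), encoding="utf-8", xml_declaration=True)
    with tarfile.open(str(tar_path), "w") as archive:
        archive.add(str(site_dir), arcname=site_dir.name)
    return tar_path
```

Keeping the description inside the package means the files and their documentation travel together, which is the point of step 4; a real workflow would replace the simple inventory with the standard formats the thesis names.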

Amazon Erases Orwell Books From Kindle. Brad Stone. The New York Times. July 17, 2009.

Amazon remotely deleted some digital editions of the books from the Kindle devices of readers who had bought them. And they appear to have deleted other purchased e-books from Kindles recently.

Chrysler Destroys Its Historical Archives; GM to Follow? Bob Elton. The Truth About Cars. July 26, 2009.

Archives are the foundation of historical research. Without access to primary material (documents, photographs, financial statements, engineering, test reports, etc) historians lack the sources needed to understand the past. Some automakers have worked to preserve and protect their historical documents. However, Chrysler and GM have recently closed their library, and the librarian was laid off. All materials were “offered to anyone who could carry them away.” Many of the GM divisions no longer know the location of their historical documents, how they are organized or how researchers can gain access.

Digital Archives That Disappear. Inside Higher Ed. April 22, 2009

As digital archives have become more important and more popular, there are different opinions about how best to guarantee that they will be available long term. Some think the creators of the archives should keep control, while others believe larger organizations with more resources would be better. The article looks at the example of "Paper of Record," a digital archive of early newspapers with a strong collection of Mexican newspapers. The archive was purchased secretly by Google in 2006; shortly thereafter, the archive disappeared from view. Historians and others complained to Google about the loss of their ability to work. It appears from other sources that the articles are now partially available in the Google news reader.

Thursday, August 13, 2009

FW: Digital Preservation Matters - 13 August 2009

File Information Tool Set (FITS). August 6, 2009.

With the increase of digital projects that introduce new formats, it is increasingly important to have tools that deal with issues such as file format identification, validation and metadata extraction. FITS, developed by Harvard, acts as a wrapper for some existing tools, including JHOVE, Exiftool, the National Library of New Zealand Metadata Extractor, DROID, Ffident, and two original tools: FileInfo and XmlMetadata. The tool can identify a file with a single result, or in the case of a conflict, can handle it in several ways. It is written in Java and can be run from a command line or an interface. It is available for download and has a user guide.

Research Data Preservation and Access: The Views of Researchers. Neil Beagrie, et al. Ariadne. July 2009.

Data is becoming more central to interdisciplinary projects and has grown in size and complexity. This study tries to assess the feasibility and costs of developing and maintaining a shared digital research data service. It shows, with text and graphs, the disciplines where research data issues were of greatest concern, the storage features that are needed most, the retention period for data once the projects have ended, and how the data is shared. University managers have serious concerns about the cost, scalability and sustainability of purely local solutions.

Library of Congress Digital Preservation Newsletter. August 2009.

LC has developed new tools (including BagIt) to transfer large quantities of digital content. BagIt and related transfer tools prepare data for transfer by packaging the collection in a directory with a manifest file that lists the contents. Specifications and other tools are on the tool and services page. More on this: 21st Century Shipping. D-Lib Magazine. Michael Ashenfelder. July/August 2009
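
As a rough illustration of the manifest idea, here is a minimal Python sketch. It is not the Library of Congress tool; it only assumes a layout loosely modeled on BagIt, with the content under a `data` directory and a `manifest-md5.txt` file listing a checksum and relative path for every file. The function names are hypothetical, and the sketch does not write the other files a complete bag needs.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> list[str]:
    """List every file under data_dir as '<checksum>  <path relative to the bag>'."""
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(data_dir.parent).as_posix()
        lines.append(f"{file_md5(path)}  {relative}")
    return lines


def write_manifest(bag_dir: Path) -> Path:
    """Write manifest-md5.txt at the top of bag_dir, covering bag_dir/data."""
    manifest = bag_dir / "manifest-md5.txt"
    lines = build_manifest(bag_dir / "data")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

The receiver can recompute the checksums after transfer and compare them with the manifest, which shows both that nothing is missing and that nothing was altered on the way.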

The California Digital Library has opened its Web Archiving Service collections. The service was created to support the Web-at-Risk project, and is funded by the NDIIPP and the University of California.

A workshop on photometadata was aimed at helping digital photographers use metadata when creating and distributing their work. The program demonstrated applications to embed metadata in photographs; it was stated that each digital photo can and should contain information about itself, its creator and its licensing conditions. Industry professionals told how metadata increased their business.

Online textbooks are gaining popularity, changing how students study. Dani Martinson. Missourian. August 6, 2009.

Online textbooks can provide additional information and resources for students, including direct links to audio and video. Digital textbooks are usually 50% cheaper than regular textbooks, though there is no buyback, and the books are often available only for a semester. Information can be updated more easily and more frequently. A study found that the professors were more accepting of digital textbooks than students. They expect the demand will increase when the digital content is specifically designed for digital, rather than just a PDF version of the printed textbooks.

Wednesday, August 12, 2009

Elsevier Announces the “Article of the Future”

Elsevier Announces the “Article of the Future”. July 21, 2009.

Elsevier announced the ‘Article of the Future’ project, an ongoing collaboration with the scientific community to redefine how a scientific article is presented online. The project allows readers individualized entry points and routes through content, while exploiting the latest advances in visualization techniques. The prototype will be launched this week. The key feature is a hierarchical presentation of text and figures. A second key feature is the article with highlights and a graphical abstract.

In a Digital Future, Textbooks Are History

In a Digital Future, Textbooks Are History. Tamar Lewin. New York Times. August 8, 2009.

Textbooks have not gone the way of the scroll yet, but many educators say that it will not be long before they are replaced by digital versions — or supplanted altogether by lessons assembled from the wealth of free courseware, educational games, videos and projects on the Web. “In five years, I think the majority of students will be using digital textbooks. They can be better than traditional textbooks.”

“We believe that the world is going digital, but the jury’s still out on how this will evolve. We’re agnostic, so we’ll provide digital, we’ll provide print, and we’ll see what our customers want.”

CK-12 Foundation develops free “flexbooks” that can be customized to meet state standards, and added to by teachers. Its physics flexbook, a Web-based, open-content compilation, was introduced in Virginia in March.

“You can use them online, you can download them onto a disk, you can print them, you can customize them, you can embed video. When people get over the mind-set issue, they’ll see that there’s no reason to pay $100 a pop for a textbook, when you can have the content you want free.”

Create Data Destruction Policies

How and Why to Create Data Destruction Policies. Mark Grossman and Tate Stickles. Computerworld. 23 June 2009.
This column looks at creating an effective data destruction policy. Having a consistent data destruction policy followed by everyone at all times is vital. Consistency is key. Your data destruction policy needs to address how to classify and handle each type of data residing on your media. Educate your people and verify they are complying with your policy.

Document the entire data destruction policy so you will know what media is sanitized and destroyed. Your documentation should allow you to quickly answer those who, what, where, when, why, and how questions.

An important step of an effective data destruction policy is to have a process in place so you can follow up with regularly scheduled testing of your process and media to ensure the effectiveness of your policy.

Tuesday, August 11, 2009

Online textbooks gaining popularity, changing how students study

Online textbooks are gaining popularity, changing how students study. Dani Martinson. Missourian. August 6, 2009.
Online textbooks can provide additional information and resources for students, including direct links to audio and video. Digital textbooks are usually 50% cheaper than regular textbooks, though there is no buyback, and the books are often available only for a semester. Information can be updated more easily and more frequently. A study found that the professors were more accepting of digital textbooks than students. They expect the demand will increase when the digital content is specifically designed for digital, rather than just a PDF version of the printed textbooks.

Measuring Mass Text Digitization Quality and Usefulness

Measuring Mass Text Digitization Quality and Usefulness. Simon Tanner. D-Lib Magazine. July/August 2009.
This article discusses the accuracy of Optical Character Recognition (OCR) output in a way that is relevant to the needs of the end users of digital resources. It looks at the benefits to be gained from measuring not just character accuracy but also word and significant word accuracy.
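
The difference between the three measures can be seen in a small Python sketch. It is not the article's method: it assumes accuracy means the share of the correct text that an alignment finds again in the OCR output, and the stop-word list and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Illustrative stop-word list only; a real study would choose its own.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def matched_fraction(truth, ocr) -> float:
    """Share of the items in truth that an alignment also finds in ocr."""
    if not truth:
        return 1.0
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)


def significant(words):
    """Drop stop words, keeping the words a searcher is likely to use."""
    return [word for word in words if word not in STOP_WORDS]


def accuracy_report(truth_text: str, ocr_text: str) -> dict:
    """Compare OCR output with the correct text at three levels."""
    truth_words = truth_text.lower().split()
    ocr_words = ocr_text.lower().split()
    return {
        "character": matched_fraction(truth_text, ocr_text),
        "word": matched_fraction(truth_words, ocr_words),
        "significant_word": matched_fraction(
            significant(truth_words), significant(ocr_words)
        ),
    }


report = accuracy_report("the archive of the library", "the archive of the 1ibrary")
```

In this example one wrong character out of 26 leaves character accuracy at about 96%, while word accuracy is 80% and significant word accuracy is 50%, because one of the two words a user would search for is now wrong.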

Tuesday, August 04, 2009

Digital Preservation Matters - 4 August 2009

OpenWMS: Workflow Management System for Digital Objects. Rutgers. August 2009.

Rutgers has released their OpenWMS software for creating metadata for analog and digital materials. It is platform-independent and open source. The web-accessible system can be used as a standalone application or integrated with other repository architectures. It provides a complete metadata creation system with services to ingest objects and metadata into a Fedora repository and can export these objects and metadata, individually and in bulk, in a METS/XML wrapper.

RODA Open Source Repository for Archives. July 2009.

RODA is an open source digital repository specially designed for Archives, with long-term preservation and authenticity as its primary objectives. Created by the Portuguese Directorate-General for the Portuguese Archives and the University of Minho, it was designed to support the most recent archival standards and become a trustworthy digital repository. Try an online demo at

To download the full installation package or sources go to:

To register and participate in discussion forums and report issues

Digital Preservation Survey. Fedora Commons. July 2009.

The Fedora Preservation Solutions Community survey was created to gather information about and examples of digital preservation developments, practices, and needs, regarding the management of digital content in repositories, specifically using open source software like Fedora. The results were not specific to Fedora users. The results of the survey were shared at the Open Repositories Conference (May 18 – 21, 2009) in Atlanta, GA. The survey and the results are available from this site. Some items of interest in the results:

  • 55.7% who answered are currently archiving and preserving digital materials
  • 36.9% are planning to preserve digital materials
  • 71.6% are using an open source platform for their digital archive
  • 45.0% use Fedora, 43.3% use DSpace

The Preservation and Archiving Solution Community posted the three items above on the Fedora Commons website. The wiki and listserv are open to all who are interested in archiving and digital preservation.

The Research Library’s Role in Digital Repository Services. A report of the ARL Digital Repository Issues Task Force. Association of Research Libraries. January 2009. [52p. pdf]

Digital repositories are a key element of research cyber infrastructure. The repository services are built on a foundation of content, context, and access, which need to be balanced. They are still developing. Digital is not just a new way to collect and distribute, but it has brought new kinds of content and services. Institutions produce large and ever-growing quantities of data, images, multimedia works, learning objects, and digital records. The repositories should not be managed as isolated collections. They are about the users as much as the content, and services need to be developed to meet the needs. “As research libraries embark on repository service development, they enter a brand new business in many ways.”

Sustainability is about the institutional commitment and the ability to create persistent structures. Libraries have a key role in the new informational structures. Libraries should look at these areas:

  1. Understand needs of users and creators in order to develop repository-related services.
  2. Use a life-cycle management framework to guide services and policies.
  3. Express the value of repository services to justify resources, promote partnerships, efforts.
  4. Integrate collections into emerging services that are outside of library-managed repositories.
  5. Participate in shaping the technology of repositories and service mechanisms.

Important actions for libraries include:

  1. Build new kinds of partnerships and alliances, within and between institutions.
  2. Develop service strategies based on assessment of local needs
  3. Develop outreach and marketing strategies to connect others to the library environment
  4. Define responsibilities to guide the development of repository services for different types of content.

Books Online:

Amazon deal to reprint rare books. BBC News. 22 July 2009.

Amazon is working with the University of Michigan to provide reprints of 400,000 rare, out-of-print and out-of-copyright books. The books will be printed in soft cover editions. Items that have been out of print for years will “be able to go back into print, one copy at a time”.

Harvard U. Press to Sell 1,000 Books Online. Marc Beja. The Chronicle of Higher Education. July 22, 2009.

Harvard University Press created a profile with Scribd, and the press has already posted hundreds of works for download. They are charging for the materials. Others, such as New York University and MIT have also posted items on the website, but do not charge.