Sunday, October 23, 2011

OAIS / TDR presentation at FDLP.

OAIS / TDR presentation at FDLP.  James A. Jacobs. Federal Depository Library Conference. Free Government Information. October 2011. [PDF]
A presentation giving an introduction to the “Reference Model for an Open Archival Information System” (OAIS) and the “Audit and Certification of Trustworthy Digital Repositories” (TDR). It includes slides with speaker notes and a useful handout of related information with links. A key point: every library decision should assess the impact of digital issues.  Notes:

OAIS
1. It defines the functional concepts of a long-term archive with consistent, unambiguous terminology.
2. It gives us a functional framework for designing archives, and a functional model.
3. It gives us a standard for “conformance.”
4. It is a “Reference Model” that describes functions; it is not an “implementation.”
5. Some key OAIS concepts are:
   - Designated Community: An identified group of potential Consumers who should
      be able to understand a particular set of information.
   - Description of roles and functions in the information life cycle.
   - The Long Term: Long enough for there to be concern about changing technologies,
      new media and data formats, and a changing user community.
   - Preserved content must be usable by the designated community.

TDR
Documents what is being done and how well it is being done.
Provides 109 “metrics” for measuring conformance to OAIS in three areas:
1. Organizational Infrastructure (including financial sustainability and a succession plan)
2. Digital Object Management
3. Technical Infrastructure and Security Risk Management

Saturday, October 22, 2011

Cite Datasets and Link to Publications

Cite Datasets and Link to Publications. Digital Curation Centre. 18 October 2011.
The DCC has published a guide to help authors / researchers create links between their academic publications and the underlying datasets.  It is important for those reading the publication to be able to locate the dataset.  This recognizes that data generated during research are just as valuable to the ongoing academic discourse as papers and monographs, and in many cases the data needs to be shared. "Ultimately, bibliographic links between datasets and papers are a necessary step if the culture of the scientific and research community as a whole is to shift towards data sharing, increasing the rapidity and transparency with which science advances."

This guide identifies a set of requirements for dataset citations and for any services set up to support them. Citations must be able to uniquely identify the object cited, and must identify the whole dataset as well as subsets of it.  The citation must be usable by people and software tools alike.  There are a number of elements needed, but the "most important of these elements – the ones that should be present in any citation – are the author, the title and date, and the location. These give due credit, allow the reader to judge the relevance of the data, and permit access to the data, respectively."  A persistent URL is needed, and there are several types that can be used.
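As a simple illustration of those core elements, here is a minimal sketch (not taken from the DCC guide; the function, field values, and DOI are all made up) of assembling a dataset citation that carries author, title, date, publisher, and a persistent location:

```python
# Minimal sketch of a dataset citation built from the core elements the DCC
# guide calls for (author, title, date, location). The field names and layout
# are illustrative only, not a DCC-prescribed citation format.

def format_dataset_citation(authors, title, year, publisher, persistent_id):
    """Return a human-readable citation string ending in a resolvable identifier."""
    author_part = "; ".join(authors)
    return f"{author_part} ({year}). {title}. {publisher}. {persistent_id}"

citation = format_dataset_citation(
    authors=["Example, A.", "Example, B."],
    title="Example Survey Dataset, version 2",
    year=2011,
    publisher="Example Data Archive",
    persistent_id="https://doi.org/10.1234/example",  # hypothetical DOI
)
print(citation)
```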

Audit And Certification Of Trustworthy Digital Repositories.

The Management Council of the Consultative Committee for Space Data Systems (CCSDS) has published this manual of recommended practices. It is based on the 2003 version from RLG. “The purpose of this document is to define a CCSDS Recommended Practice on which to base an audit and certification process for assessing the trustworthiness of digital repositories. The scope of application of this document is the entire range of digital repositories.”

The document addresses audit and certification criteria, organizational infrastructure, digital object management, and risk management.  It is a standard for those who audit repositories; and, for those who are responsible for the repositories, it is an objective tool they can use to evaluate the trustworthiness of the repository.

Thursday, October 20, 2011

National Archives Digitization Tools Now on GitHub

National Archives Digitization Tools Now on GitHub. NARAtions. October 18, 2011.
The National Archives has begun to share applications developed in-house to facilitate digitization workflows. These applications have significantly increased productivity and improved the accuracy and completeness of digitization. Two digitization applications, “File Analyzer and Metadata Harvester” and “Video Frame Analyzer,” are publicly available on GitHub.
  • File Analyzer and Metadata Harvester: This allows a user to analyze the contents of a file system or external drive and generate statistics about the contents, generate checksums, and verify that there is a one-to-one match of before and after files. The File Analyzer can import data from a spreadsheet, and can match and merge results with auxiliary data from an external spreadsheet or finding aid. (A rough sketch of this kind of before-and-after verification appears after this list.)
  • Video Frame Analyzer: This is used to objectively analyze technical properties of individual frames of a video file in order to detect quality issues within digitized video files.  It reduced the time to do quality checks by 50%. 
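As referenced above, here is a rough sketch of the kind of before-and-after checksum verification the File Analyzer description implies. This is not NARA's code (that is what's on GitHub); the directory paths in the usage comment are hypothetical:

```python
# Walk a directory, compute checksums, and confirm a one-to-one match between
# two manifests. Illustrative only; not the National Archives implementation.
import hashlib
from pathlib import Path

def checksum_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    manifest = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest

def verify_one_to_one(before, after):
    """Report files missing from either side and any checksum mismatches."""
    missing = sorted(set(before) - set(after))
    extra = sorted(set(after) - set(before))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return missing, extra, changed

# Example usage with hypothetical source and copy directories:
# before = checksum_manifest("/data/originals")
# after = checksum_manifest("/mnt/external_drive/copies")
# print(verify_one_to_one(before, after))
```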

Monday, October 17, 2011

Research Librarians Consider the Risks and Rewards of Collaboration.

Research Librarians Consider the Risks and Rewards of Collaboration. Jennifer Howard. The Chronicle of Higher Education. October 16, 2011.

The Association of Research Libraries’ meeting discussed research and preservation projects such as the HathiTrust digital repository and the proposed Digital Public Library of America, plans for which are moving ahead. Concerning the Digital Public Library of America: “Library” is a misnomer in this case; it is more a federation of existing objects than a library. It wouldn’t own anything. Its main contribution would be to set standards and link resources.  “The user has to drive this.”

They said that it’s almost three times more expensive to store materials locally than it is to store them with HathiTrust. Researchers now also create and share digital resources themselves via social-publishing sites such as Scribd. There is a need for collection-level tools that allow scholars and curators to see beyond catalog records.

The meeting also discussed Recollection, a free platform built by NDIIPP and a company named Zepheira to give a better “collection-level view” of libraries’ holdings. The platform can be used to build interactive maps, timelines, and other interfaces from descriptive metadata and other information in library catalogs. So, for instance, plain-text place names on a spreadsheet can be turned into points of latitude and longitude and plotted on a map.
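To make the place-name example concrete, here is a small sketch of the general idea: resolving plain-text place names in descriptive metadata to coordinates that can be plotted. The tiny gazetteer stands in for a real geocoding service, and nothing here reflects how Recollection or Zepheira actually implement it:

```python
# Attach coordinates to catalog records whose "place" field is plain text.
# The gazetteer below is a hypothetical stand-in for a geocoding service.
GAZETTEER = {
    "Washington, D.C.": (38.9072, -77.0369),
    "San Francisco, Calif.": (37.7749, -122.4194),
}

records = [
    {"title": "Panoramic photograph", "place": "Washington, D.C."},
    {"title": "Harbor view", "place": "San Francisco, Calif."},
]

def add_coordinates(items):
    """Attach (latitude, longitude) to each record whose place name resolves."""
    for item in items:
        item["coords"] = GAZETTEER.get(item["place"])  # None if not found
    return items

for rec in add_coordinates(records):
    print(rec["title"], rec["coords"])
```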

A session on “Rebalancing the Investment in Collections” discussed how libraries had painted themselves into a corner by focusing too much on their collection budgets. Investing in the right skills and partnerships is most critical now. “The comprehensive and well-crafted collection is no longer an end in itself.”

One speaker told librarians that they shouldn’t rush to be the first to digitize everything and invest in every new technology. “Everybody underestimates the cost of innovation,” he said. “Instead of rushing in and participating in a game where you don’t have the muscle, you want to stand back” and wait for the right moment.


Digital Preservation-Friendly File Formats for Scanned Images.

Digital Preservation-Friendly File Formats for Scanned Images.  Bill LeFurgy. The Signal. October 12, 2011.
Some digital file formats are better for preservation than others.  The best format for preservation is one in which the content can be viewed accurately regardless of changes in hardware, software, or other technology. The Library of Congress has created a web resource to help in selecting file formats and in understanding how effective different formats are for long-term preservation. The sustainability factors it considers include:
  • Disclosure of specifications and tools for validating technical integrity
  • Adoption by the primary creators and users of information resources
  • Openness to direct analysis with basic, non-proprietary tools
  • Self-documentation of metadata needed to render the data as usable information or understand its context
  • Degree to which the format depends on specific hardware, operating system, or software for rendering the information and how difficult that may be.
  • Extent that licenses or patents may inhibit the ability to sustain content.
  • Technical protection mechanisms. Embedded capabilities to restrict use in order to protect the intellectual property.
 Using these factors helps determine which formats may be more sustainable than others.
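As a toy illustration of weighing formats against factors like these, the sketch below averages made-up 0-to-5 ratings into a single comparative score. The shorthand factor keys, the ratings, and the weighting are invented for the example and are not Library of Congress assessments:

```python
# Compare candidate formats against preservation sustainability factors.
# Ratings are hypothetical; higher means more preservation-friendly.
FACTORS = ["disclosure", "adoption", "transparency", "self_documentation",
           "external_dependencies", "patents", "technical_protection"]

FORMAT_SCORES = {
    "TIFF": {"disclosure": 5, "adoption": 5, "transparency": 4,
             "self_documentation": 3, "external_dependencies": 5,
             "patents": 4, "technical_protection": 5},
    "CameraRaw": {"disclosure": 2, "adoption": 3, "transparency": 2,
                  "self_documentation": 3, "external_dependencies": 2,
                  "patents": 2, "technical_protection": 4},
}

def sustainability_score(scores):
    """Average the per-factor ratings into one comparative number."""
    return sum(scores[f] for f in FACTORS) / len(FACTORS)

for fmt, scores in FORMAT_SCORES.items():
    print(fmt, round(sustainability_score(scores), 2))
```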

To Save and Project Fest: Long Live Cinema!

To Save and Project Fest: Long Live Cinema!  J. Hoberman. The Village Voice. October 12, 2011.
Digital might be the future of the motion-picture medium, but for film preservation, it’s a mixed blessing. Archivists make it clear that digital technology is part of the solution—and part of the problem. Digital cinema is itself difficult to preserve, subtly distorts (by “improving”) the celluloid image, even as it often dictates (through commercial considerations) those movies deemed worthy of preservation. New York Times DVD critic Dave Kehr has pointed out that instead of increasing access, each new distribution platform (from 35mm to 16mm, VHS, DVD, Blu-ray, and online streaming) has narrowed the range of titles in active distribution and diminished the proportion of available films. Film restoration is also the restoration of cultural memory.

Sunday, October 16, 2011

Tim O'Reilly - Keynote for 2011 NDIIPP/NDSA Partners Meeting.

Tim O'Reilly - Keynote for 2011 NDIIPP/NDSA Partners Meeting. Tim O'Reilly. Library of Congress website.  October 7, 2011.
This is a 31-minute video about digital preservation. The things that turn out to be historic often are not thought of as historic at the time. You can’t necessarily do preservation at the institutional level; you have to teach the preservation mindset. Wikipedia, for example, is designed to keep all earlier versions. We should think about what kind of tools we need to build digital preservation into our everyday activities.

There will be a whole new dimension to digital. Imagine what will happen in situations if only digital books and maps are available and then they become unavailable.  That world may be closer than we think. Imagine a world if there are no print books. What would you need to keep the digital materials available?  It turns out that digital actually increases the manufacturing cost of books.   We need to have tools with digital preservation designed in, not necessarily in the way we think of scholarly preservation, but in terms of increasing the likelihood that things will survive. 

What should the web’s memory look like? There is an obligation to preserve the things that matter. We are engaged in the wholesale destruction of our history because we aren’t thinking about what we do as important to our descendants. Think of yourselves as people who are engaged in a task that matters to everyone.  As we move into an increasingly digital world, preservation won’t be just the concern of specialists, but of everyone. One of the arguments for open source is simply to preserve the code.  There have been a number of examples of technical companies not having their source code after they stop supporting it. Preserving everything may get in the way of our preserving the things that are important.

Thursday, October 13, 2011

Innovation, Disruption and the Democratization of Digital Preservation

Innovation, Disruption and the Democratization of Digital Preservation. Bill LeFurgy. Agogified. October 10, 2011.
Interesting article about innovation and society.  It asks the question about digital preservation: Is innovation the key to dealing with all that valuable digital data? "When considered from the popular perspective of innovation, digital preservation looks like a straightforward challenge for libraries, archives, museums and other entities that long have kept information on behalf of society." But it isn't that easy, since technology changes much faster than society's conventions and institutions. "Innovation is not a safe, orderly or controllable process.  It sends out big ripples of disruption with an unpredictable impact." Libraries are being bounced around because of such disruption and the traditional methods are not suited to address the changes.  "All this means that the ability of traditional institutions to fully meet the need for digital preservation is in doubt."
But with these changes comes a change in the people playing a role in preserving digital materials. Some see a greater role for individuals in digital preservation.  There is a great need for designing preservation functionality into tools used to create and distribute digital content to enable content creators to be involved in the digital stewardship. "Ultimately, we have to hope that innovation pushes along the trend toward the democratization of digital preservation.  The more people who care about saving digital content, and the easier it is for them to save it, the more likely it is that bits will be preserved and kept available."

Tuesday, October 11, 2011

Abingdon firm gets Queen's seal of approval

Abingdon firm gets Queen's seal of approval. Oxford Journal.  22 September 2011.
Tessella has been awarded one of the UK’s most prestigious business awards for its collaboration with a public sector organization in developing a unique system for preserving digital information.  Its product, Safety Deposit Box, which came out in 2003, is now used by governments in seven countries.

Sunday, October 09, 2011

Piggybacking to Avoid Going Down the Rabbit Hole, or What I Learned at the First DPOE Workshop.

Piggybacking to Avoid Going Down the Rabbit Hole, or What I Learned at the First DPOE Workshop. Sam Meister. The Signal. October 7, 2011.
This is an excellent post about the Digital Preservation Outreach and Education (DPOE) initiative’s first ever Train the Trainer Baseline Workshop and the experience gained there.  The workshop went over the core principles and concepts (Audience, Content, Instructors, Events) and the six modules that make up the curriculum (Identify, Select, Store, Protect, Manage, Provide). The group of 24 trainers broke into six regional groups to work through the modules and develop specific strategies to present the material of individual modules. "As the first group of trainers to review, analyze, revise and disseminate this curriculum, the result of a multi-year development process, we would be the “pioneers” for the DPOE program. To me, this made clear the role and level of responsibility that would be expected of us throughout the rest of the workshop and beyond."

Friday, October 07, 2011

New Guidelines: CrossRef DOIs to be Displayed as URLS.

New Guidelines: CrossRef DOIs to be Displayed as URLS. Carol Anne Meyer. D-Lib Magazine. September/October 2011.
CrossRef, a not-for-profit association of more than 1000 scholarly publishers, revised its recommendations for CrossRef Digital Object Identifiers (DOIs) to specify that DOIs on the web be displayed in URL format. The previous practice of prefixing the identifier with "doi:" is discontinued, and the guidelines also recommend that publishers create DOIs that are as short as possible.
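A small sketch of the display change, assuming the then-standard dx.doi.org resolver; the example DOI itself is made up:

```python
# Convert a bare or "doi:"-prefixed DOI into its URL display form.

def doi_as_url(doi):
    """Strip any 'doi:' prefix and prepend the public DOI resolver."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return "http://dx.doi.org/" + doi

print(doi_as_url("doi:10.1234/example.5678"))  # http://dx.doi.org/10.1234/example.5678
```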

From Link Rot to Web Sanctuary: Creating the Digital Educational Resource Archive (DERA).

From Link Rot to Web Sanctuary: Creating the Digital Educational Resource Archive (DERA). Bernard M. Scaife. Ariadne. July 2011.
One of the tasks was to fix the broken links in the catalogue. A report showed that of about 16,000 links to external resources, about 1,200 were non-functional (7.5%).  There were ways to fix many of these, but about 10% of the links referred to documents which no longer existed.  Many of these were government publications. The question was how to do this differently. They looked at adding materials into their own repository, which would allow them to solve the link rot problem while "building in a core level of digital preservation and increasing the discoverability of these documents. We were convinced that a citation which linked to a record in a Web archive was far more likely to survive than one which did not."
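For illustration, here is a minimal link-audit sketch in the spirit of that report: request each external URL and count the ones that no longer resolve. The URL list is a placeholder, and this is not how the original report was produced:

```python
# Check a list of catalogue links and report the ones that fail to resolve.
import urllib.request

def check_links(urls, timeout=10):
    """Return the subset of URLs that fail to return a successful response."""
    broken = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status >= 400:
                    broken.append(url)
        except (OSError, ValueError):  # network errors, HTTP errors, bad URLs
            broken.append(url)
    return broken

catalogue_links = ["http://example.org/report.pdf"]  # placeholder list
broken = check_links(catalogue_links)
print(f"{len(broken)} of {len(catalogue_links)} links are non-functional")
```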

They needed to clarify the intellectual property rights, add descriptive metadata, such as the type of document, a collection name, subjects, and the organization that created the document.  They also found that they "had to accept all common file formats at present. In practice, the majority are pdf, some MS Word and a few Excel files. It would, for preservation purposes, be preferable to convert and ingest in PDF/A format, at least for the textual formats. However our view was that the small overhead of batch migrating to that format at a later stage means it would be better to spend time upfront now on metadata rather than file conversion. We felt that this was a pragmatic response which meant that we would be working within the spirit of digital preservation best practice." Also, they found that "data-based formats such as Excel cannot be meaningfully integrated into a full-text search and that these objects would benefit from better representations." 

Other things they learned include:
  • Placing files in a repository gives digital preservation to key documents in the subject field and eradicates the link rot problem.
  • Adding high-quality metadata enhances the resource and allows it to hold its head high and become an integral part of a library's collection.
  • A library can play an important role in preserving content as part of its long-term strategy and ensure high-quality resources remain available.
  • The added value of being able to search the full text provides a potentially very rich resource for researchers.
Future plans are to build up content levels and to integrate these resources with the regular library content.

More (digital) wake-up calls for academic libraries

More (digital) wake-up calls for academic libraries. Rick Luce. LIBER 2011. Duurzame toegang blog. June 2011.
The topic was the core business of academic libraries: serving researchers and the scientific research process. There are many changes taking place in the sciences: "zettabytes of data; dynamic, complex data objects that require management; communities and data flows becoming much more important than static library collections, etc." The warning to academic libraries was that if libraries do not develop the services the new researcher needs, someone else will, and then there is no future for the research library. We need a "fundamental transformation process that will affect every aspect of the ‘library’ business."  The library needs to provide a repository between the scientific process and the IT infrastructure that supports and preserves workflows.

Wednesday, October 05, 2011

Graduates To Sow Seeds of New Training Program Across U.S.

Graduates To Sow Seeds of New Training Program Across U.S.  Bill LeFurgy.  The Signal. October 3, 2011.
The inaugural class of the digital preservation training workshop was held at the Library of Congress on September 20-23. The 24 professionals were selected from a nationwide pool. The DPOE workshop model is designed to produce a national corps of trainers equipped to teach others the basic principles and practices of preserving digital materials. “What’s unique about this workshop,” said George Coulbourne, Executive Program Manager, “is that we designed it for people who are going to be actual practitioners of digital preservation. This is not for administrators or managers, but for the novice practitioner. It’s also intended to be as open-source and low-cost as possible.  We hope this event accelerates a new national movement in open, accessible digital-preservation training.”
Those who were trained will take what they learned back to their home regions and present it there, including by holding one or more digital preservation training events by mid-2012.