Monday, April 30, 2012

University of Utah Selects Ex Libris Rosetta for Long-Term Digital Preservation

University of Utah Selects Ex Libris Rosetta for Long-Term Digital Preservation. Press Release. April 30, 2012.
Ex Libris is pleased to announce that the University of Utah has opted for Ex Libris Rosetta to preserve the school's extensive cultural heritage collections, which include newspapers and other historical textual documents, photographs, rare books, oral history interviews (including transcripts and audio), motion picture collections, and more. In addition to cultural heritage collections, Rosetta will enable the University of Utah to preserve faculty publications and research data. The J. Willard Marriott Library hosts the collections of many campus departments and, as a member of the Mountain West Digital Library network, hosts collections belonging to other Utah institutions.

Sunday, April 29, 2012

An Overview of Web Archiving.

An Overview of Web Archiving. Jinfang Niu. D-Lib Magazine. March/April 2012.
An article on the methods used at a variety of universities and other institutions to select, acquire, describe, and provide access to web resources in their archives. Some notes from the article:
  • Web archiving is the process of gathering up data that has been recorded on the World Wide Web, storing it, ensuring the data is preserved in an archive, and making the collected data available for future research.
  • The core workflow of web archiving includes appraisal and selection, acquisition, organization and storage, description, and access.
  • Creating a web archive presents many challenges.
  • When archiving web content through web crawling programs, selection decisions are the basis for compiling a site list to crawl and configuring crawler parameters. Crawling may replace deposit for some things.
  • In acquiring web resources, the decision of whether to seek permission from copyright owners depends on the legal environment of the web archive, the scale of the web archive, and the nature of archived content and the archiving organization.
  • Web archives need to preserve the authenticity and integrity of archived web content; the concept of provenance is important.
  • The library must decide how it will generate, store, and use metadata, and how it will make that metadata available to others.

Web Archives for Researchers: Representations, Expectations and Potential Uses.

Web Archives for Researchers: Representations, Expectations and Potential Uses. Peter Stirling, et al. D-Lib Magazine. March/April 2012.
Web archiving is one of the missions of the Bibliothèque nationale de France. This study looks at content and selection policy, services and promotion, and the role of communities and cooperation.  While the interest of maintaining the "memory" of the web is obvious to the researchers, they are faced with the difficulty of defining, in what is a seemingly limitless space, meaningful collections of documents. Cultural heritage institutions such as national libraries are perceived as trusted third parties capable of creating rationally-constructed and well-documented collections, but such archives raise certain ethical and methodological questions.

To find source material on the web, some researchers look for non-traditional sources, such as blogs and social networks.  Researchers recognize the value of web archives, especially because websites disappear or change quickly.  The Internet is no longer just a place for publishing things, “but rather the traces left by actions that people could equally perform in the streets or in a shop: talking to people, walking, buying things... It can seem improper to some to archive anything relating to this kind of individual activity. On the other hand, one of the researchers acknowledges that archiving this material would provide a rich source for research in the future, and thus compares archiving it to archaeology.”  Some ask, "How do you archive the flow of time?" New models may be needed. And when selecting an archive, the selection criteria should also be archived, as they may change over time.

The secrets of Digitalkoot: Lessons learned crowdsourcing data entry to 50,000 people (for free).

The secrets of Digitalkoot: Lessons learned crowdsourcing data entry to 50,000 people (for free). Tommaso De Benetti.  Microtask. June 16, 2011.
The National Library of Finland launched a project called Digitalkoot, a test of crowdsourcing with 50,000 volunteers.  The aim was to digitize the National Library’s archives and make them searchable over the internet. The volunteers input data that Optical Character Recognition (OCR) software struggles with (for example, documents that are handwritten or printed in old fonts). Digitalkoot relies on machines, humans, and a gaming twist.

Tuesday, April 03, 2012

Dream of perpetual access comes true!

Dream of perpetual access comes true! Jeffrey van der Hoeven. Open Planets Foundation. 
The KEEP project released the final version of its open source Emulation Framework software. This project has brought emulation in the digital preservation context to the next level: user-friendly.  The easy-to-install package runs on all major computer platforms.  It automates several steps:

  1. identify what kind of digital file you want to render;
  2. find the required software and computer platform you need;
  3. match the requirements with available software and emulators;
  4. install the emulator;
  5. configure the emulator and prepare software environment;
  6. inject the digital file you selected into the emulated environment;
  7. give you control over the emulated environment.
The software supports six computer platforms out of the box: x86, Commodore 64, Amiga, BBC Micro, Amstrad, and Thomson, using seven open source emulators that are distributed with the Emulation Framework.
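The matching steps (2 and 3 above) can be sketched as a simple registry lookup. The format names, software environments, and emulator assignments below are illustrative assumptions for the sketch, not the Emulation Framework's actual API or registry data:

```python
# Illustrative registry: file format -> (software environment, host platform).
# These entries are hypothetical examples, not the Emulation Framework's data.
FORMAT_REGISTRY = {
    "WordPerfect 5.1": ("MS-DOS 6.22 + WordPerfect 5.1", "x86"),
    "Commodore 64 PRG": ("Commodore BASIC", "Commodore 64"),
}

# Illustrative registry: host platform -> available open source emulators.
EMULATOR_REGISTRY = {
    "x86": ["Dioscuri", "QEMU"],
    "Commodore 64": ["VICE"],
}

def match_environment(file_format):
    """Steps 2-3: match a file format's requirements against the
    available software environments and emulators.
    Returns (environment, emulator) or None if no match exists."""
    if file_format not in FORMAT_REGISTRY:
        return None
    environment, platform = FORMAT_REGISTRY[file_format]
    emulators = EMULATOR_REGISTRY.get(platform)
    if not emulators:
        return None
    # Pick the first available emulator for the platform.
    return environment, emulators[0]

print(match_environment("WordPerfect 5.1"))
```

The remaining steps (installing and configuring the emulator, injecting the file) are what the Emulation Framework automates on top of a lookup like this.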

With tech breakthrough, Seagate promises 60TB drives this decade

With tech breakthrough, Seagate promises 60TB drives this decade.  Lucas Mearian. Computerworld. March 20, 2012.
Seagate said it has achieved a density of 1 terabit (1 trillion bits) per square inch on a disk drive platter. The technology could lead to the production this decade of 3.5-in. hard drives with up to 60TB of capacity.

As drive manufacturers store more bits per square inch on the surface of a disk, they also tighten the data tracks. The challenge as those tracks tighten is overcoming magnetic disruption between the bits of data, which causes bits to flip their magnetic poles resulting in data errors.
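A rough back-of-envelope check of these figures. Everything physical here is an assumption for illustration (recordable area per surface and platter count are typical-sounding guesses, not Seagate's specifications):

```python
# Back-of-envelope: what does 1 terabit per square inch buy, and what
# density would 60TB require at the same geometry? The area and platter
# count are assumed values for illustration only.

DENSITY_BITS_PER_SQIN = 1e12   # demonstrated: 1 terabit per square inch
AREA_SQIN_PER_SURFACE = 8.6    # assumed recordable area per 3.5-in. surface
SURFACES = 10                  # assumed: 5 platters, 2 surfaces each

# Capacity in bytes = density * area * surfaces / 8 bits per byte
capacity_bytes = DENSITY_BITS_PER_SQIN * AREA_SQIN_PER_SURFACE * SURFACES / 8
capacity_tb = capacity_bytes / 1e12
print(round(capacity_tb, 2))   # on the order of 10 TB per drive

# Density needed to reach 60TB with the same assumed geometry:
required_density = 60e12 * 8 / (AREA_SQIN_PER_SURFACE * SURFACES)
print(round(required_density / 1e12, 1))  # several terabits per square inch
```

Under these assumptions, the demonstrated density yields drives on the order of 10TB, so the promised 60TB implies a further severalfold increase in areal density over the decade.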

Tuesday, February 14, 2012

Brigham Young University Selects Ex Libris Rosetta

Brigham Young University Selects Ex Libris Rosetta. Press Release. February 14, 2012.
Ex Libris has announced that Brigham Young University (BYU) has selected the Ex Libris Rosetta digital preservation system. The first member of the Association of Research Libraries (ARL) to adopt Rosetta, Brigham Young will implement Rosetta initially for the Harold B. Lee Library’s digital collections and will later expand the implementation to the digital assets of the university’s colleges and schools.

Rosetta encompasses the entire digital preservation workflow, including the acquisition, validation, ingest, storage, preservation, and delivery of digital objects. Rosetta enables academic institutions as well as libraries, archives, and other memory institutions to manage, preserve, and provide access to institutional documents, research output in digital formats, digital images, Web sites, and other digitally born and digitized materials.


Thursday, December 08, 2011

Why don't we already have an Integrated Framework for the Publication and Preservation of all Data Products?

Why don't we already have an Integrated Framework for the Publication and Preservation of all Data Products? Alberto Accomazzi, et al. Astronomical Data Analysis Software and Systems. 7 Dec 2011.
Astronomy has long had a working network of archives supporting the curation of publications and data. There are examples of websites giving access to data sets, but they are sometimes short lived.  "We can only realistically take implicit promises of long-term data archival as what they are: well-intentioned plans which are contingent on a number of factors, some of which are out of our control." We should take steps to ensure that our system of archiving, sharing and linking resources is as resilient as it can be.  Some ideas are: 
  1. future-proof the naming system: assign persistent data IDs to items we want to preserve 
  2. provide the ability to cite complete datasets, just as we can cite websites
  3. include a data reference section in academic papers
Curated datasets need to be preserved indefinitely for scholarly purposes.

A literature review: What exactly should we preserve? How scholars address this question and where is the gap

A literature review: What exactly should we preserve? How scholars address this question and where is the gap.  Jyue Tyan Low. Cornell University Library. 7 Dec 2011.
There are generally two approaches to long-term preservation of digital materials:
  1. preserving the object in its original form as much as possible along with the accompanying systems,
  2. migration or transformation: transforming the object to make it compatible with more current systems while retaining the original “look and feel.”
Migration is the most widely used method, but it can introduce changes to the original.  If some of the original properties are lost, what then are the properties essential to maintaining the object's integrity?  Currently there is no formal and objective way to help stakeholders decide what the significant properties of the objects are, which are defined as:
   "The characteristics of digital objects that must be preserved over time in order to ensure the continued accessibility, usability, and meaning of the objects, and their capacity to be accepted as evidence of what they purport to record."
An important goal of digital preservation is more than just retrieving the objects; it is to ensure the authenticity of the information.  A digital object can change as long as the final output is what it is expected to be.  The properties to preserve come from the purpose of the object, and at least one purpose for the object needs to be defined. Archivists have created standards that look at records in the context of their creation, intended use, and preservation.  It is important to ask which features of the object are important when delivering it to the user.  There may be many uses to many communities that were not intended by the object's creator, so we should not let the ideal limit the reasonable.

Tuesday, November 15, 2011

Geospatial Data Preservation.

Geospatial Data Preservation. Website. November 2011.
The Geospatial Resource Center is being developed as a finding tool for freely available web-based resources about the preservation of geospatial information. A variety of selected resources are being added, including reports, presentations, standards, and information about tools for preparing geospatial assets for long-term access and use. The resources are indexed to enable searching of titles and are categorized to facilitate discovery by choosing among topics, resource types, or both. The website contains many valuable resources.  A few resources from these three categories:

Education & Training
  • Appraisal and selection of geospatial data for preservation
  • Best Practices for Geospatial Programs
  • Copyright Quickguide
Tools & Software
  • Cost Estimation Toolkit
  • Conversion tools for geospatial data
  • Geospatial metadata tools

Policies & Benefits

  • Collection policies
  • Content standards
  • Policies on open geospatial data access and preservation

Sunday, October 23, 2011

OAIS / TDR presentation at FDLP.

OAIS / TDR presentation at FDLP.  James A. Jacobs. Federal Depository Library Conference. Free Government Information. October 2011. [PDF]
A presentation giving an introduction to the "Reference Model for an Open Archival Information System" (OAIS) and the "Audit And Certification Of Trustworthy Digital Repositories" (TDR).  This includes slides with speaker notes and a nice handout about related information with links. Every library decision should assess the impact of digital issues.  Notes:

OAIS
1. It defines the functional concepts of a long-term archive with consistent, unambiguous terminology.
2. It gives us a functional framework for designing archives, and a functional model.
3. It gives us a standard for “conformance.”
4. It is a “Reference Model” that describes functions; it is not an “implementation.”
5. Some key OAIS concepts are:
   - Designated Community: An identified group of potential Consumers who should
      be able to understand a particular set of information.
   - Description of roles and functions in the information life cycle.
   - The Long Term: Long enough for there to be concern about changing technologies,
      new media and data formats, and a changing user community.
   - Preserved content must be usable according to the designated community

TDR
Documents what is being done and how well it is being done.
Provides 109 “metrics” for measuring conformance to OAIS in three areas: 
1. Organizational Infrastructure (including sustainability and succession planning)
2. Digital Object Management
3. Technical Infrastructure and Security Risk Management

Saturday, October 22, 2011

Cite Datasets and Link to Publications

Cite Datasets and Link to Publications. Digital Curation Centre. 18 October 2011.
The DCC has published a guide to help authors / researchers create links between their academic publications and the underlying datasets.  It is important for those reading the publication to be able to locate the dataset.  This recognizes that data generated during research are just as valuable to the ongoing academic discourse as papers and monographs, and in many cases the data needs to be shared. "Ultimately, bibliographic links between datasets and papers are a necessary step if the culture of the scientific and research community as a whole is to shift towards data sharing, increasing the rapidity and transparency with which science advances."

This guide has identified a set of requirements for dataset citations and any services set up to support them. Citations must be able to uniquely identify the object cited, and to identify the whole dataset as well as subsets.  The citation must be usable by people and software tools alike.  There are a number of elements needed, but the "most important of these elements – the ones that should be present in any citation – are the author, the title and date, and the location. These give due credit, allow the reader to judge the relevance of the data, and permit access to the data, respectively."  A persistent URL is needed, and there are several types that can be used. 
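A minimal sketch of assembling a citation from those required elements. The punctuation scheme, author, dataset title, and DOI below are all hypothetical, not the DCC's prescribed style:

```python
def format_dataset_citation(author, title, date, location):
    """Assemble the minimal elements the DCC guide says any dataset
    citation should carry: author, title and date, and location
    (a persistent URL such as a DOI). The punctuation scheme here
    is illustrative only."""
    return f"{author} ({date}). {title}. {location}"

# All values below are hypothetical placeholders.
citation = format_dataset_citation(
    author="Smith, J.",
    title="Example Survey Dataset, version 2",
    date=2011,
    location="https://doi.org/10.xxxx/example",
)
print(citation)
```

The point of fixing the element order and location field is that both a human reader and a citation-parsing tool can resolve the same string to the dataset.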

Audit And Certification Of Trustworthy Digital Repositories.

The Management Council of the Consultative Committee for Space Data Systems (CCSDS) has published this manual of recommended practices. It is based on the 2003 version from RLG. “The purpose of this document is to define a CCSDS Recommended Practice on which to base an audit and certification process for assessing the trustworthiness of digital repositories. The scope of application of this document is the entire range of digital repositories.”

The document addresses audit and certification criteria, organizational infrastructure, digital object management, and risk management.  It is a standard for those who audit repositories; and, for those who are responsible for the repositories, it is an objective tool they can use to evaluate the trustworthiness of the repository.

Thursday, October 20, 2011

National Archives Digitization Tools Now on GitHub

National Archives Digitization Tools Now on GitHub. NARAtions. October 18, 2011.
The National Archives has begun to share applications developed in-house to facilitate digitization workflows. These applications have significantly increased the productivity and improved the accuracy and completeness of the digitization. Two digitization applications, “File Analyzer and Metadata Harvester” and “Video Frame Analyzer,” are publicly available on GitHub.
  • File Analyzer and Metadata Harvester: This allows a user to analyze the contents of a file system or external drive and generate statistics about the contents, generate checksums, and verify that there is a one-to-one match of before and after files. The File Analyzer can import data in a spreadsheet, and can match and merge results with auxiliary data from an external spreadsheet or finding aid.
  • Video Frame Analyzer: This is used to objectively analyze technical properties of individual frames of a video file in order to detect quality issues within digitized video files.  It reduced the time to do quality checks by 50%. 
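The before-and-after fixity check the File Analyzer performs can be sketched as follows. This is an illustration of the idea only, not NARA's actual code:

```python
# Sketch of a fixity check: compute checksums for files before and after
# a transfer and verify a one-to-one match. Illustrative only.
import hashlib
from pathlib import Path

def checksum_manifest(directory):
    """Map each file's relative path to its SHA-256 checksum."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }

def verify_transfer(before_dir, after_dir):
    """Compare two manifests; return (missing, added, mismatched).
    An empty result in all three means a one-to-one match."""
    before = checksum_manifest(before_dir)
    after = checksum_manifest(after_dir)
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    mismatched = sorted(k for k in set(before) & set(after)
                        if before[k] != after[k])
    return missing, added, mismatched
```

A transfer passes only when all three lists come back empty; anything in `missing`, `added`, or `mismatched` flags a file to re-copy or investigate.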

Monday, October 17, 2011

Research Librarians Consider the Risks and Rewards of Collaboration.

Research Librarians Consider the Risks and Rewards of Collaboration. Jennifer Howard. The Chronicle of Higher Education. October 16, 2011.

The Association of Research Libraries’ meeting discussed research and preservation projects like the HathiTrust digital repository and the proposed Digital Public Library of America, plans for which are moving ahead. Concerning the Digital Public Library of America: “Library” is a misnomer in this case; it is more of a federation of existing objects. It wouldn’t own anything. The main contribution would be to set standards and link resources.  “The user has to drive this.”

They said that it’s almost three times more expensive to store materials locally than it is to store them with HathiTrust. Researchers now also create and share digital resources themselves via social-publishing sites such as Scribd. There is a need for collection-level tools that allow scholars and curators to see beyond catalog records.

The meeting also discussed Recollection, a free platform built by NDIIPP and a company named Zepheira to give a better “collection-level view” of libraries’ holdings. The platform can be used to build interactive maps, timelines, and other interfaces from descriptive metadata and other information in library catalogs. So, for instance, plain-text place names on a spreadsheet can be turned into points of latitude and longitude and plotted on a map.

A session, “Rebalancing the Investment in Collections,” discussed how libraries had painted themselves into a corner by focusing too much on their collection budgets. Investing in the right skills and partnerships is most critical now. “The comprehensive and well-crafted collection is no longer an end in itself.”

One person told librarians that they shouldn’t rush to be the first to digitize everything and invest in every new technology. “Everybody underestimates the cost of innovation,” he said. “Instead of rushing in and participating in a game where you don’t have the muscle, you want to stand back” and wait for the right moment.


Digital Preservation-Friendly File Formats for Scanned Images.

Digital Preservation-Friendly File Formats for Scanned Images.  Bill LeFurgy. The Signal. October 12, 2011.
Some digital file formats are better for preservation than others.  The best format for preservation is one where the content can be viewed accurately regardless of changes in hardware, software, or other technical factors. The Library of Congress has created a web resource to help in selecting file formats and in understanding how effective formats are for long-term preservation. Its sustainability factors include:
  • Disclosure of specifications and tools for validating technical integrity
  • Adoption by the primary creators and users of information resources
  • Openness to direct basic and non-proprietary tools
  • Self-documentation of metadata needed to render the data as usable information or understand its context
  • Degree to which the format depends on specific hardware, operating system, or software for rendering the information and how difficult that may be.
  • Extent that licenses or patents may inhibit the ability to sustain content.
  • Technical protection mechanisms. Embedded capabilities to restrict use in order to protect the intellectual property.
Using these factors has helped determine which formats may be more sustainable than others.

To Save and Project Fest: Long Live Cinema!

To Save and Project Fest: Long Live Cinema!  J. Hoberman. The Village Voice. October 12, 2011.
Digital might be the future of the motion-picture medium, but for film preservation, it’s a mixed blessing. Archivists make it clear that digital technology is part of the solution—and part of the problem. Digital cinema is itself difficult to preserve, subtly distorts (by “improving”) the celluloid image, even as it often dictates (through commercial considerations) those movies deemed worthy of preservation. New York Times DVD critic Dave Kehr has pointed out that instead of increasing access, each new distribution platform (from 35mm to 16mm, VHS, DVD, Blu-ray, and online streaming) has narrowed the range of titles in active distribution and diminished the proportion of available films. Film restoration is also the restoration of cultural memory.

Sunday, October 16, 2011

Tim O'Reilly - Keynote for 2011 NDIIPP/NDSA Partners Meeting.

Tim O'Reilly - Keynote for 2011 NDIIPP/NDSA Partners Meeting. Tim O'Reilly. Library of Congress website.  October 7, 2011.
This is a 31-minute video about digital preservation. The things that turn out to be historic often are not thought of as historic at the time. You can’t necessarily do preservation from the institution level.  You have to teach the preservation mindset. Wikipedia, for example, is designed to keep all earlier versions. We should think about what kind of tools we need to build digital preservation into our everyday activities.

There will be a whole new dimension to digital. Imagine what will happen in situations if only digital books and maps are available and then they become unavailable.  That world may be closer than we think. Imagine a world if there are no print books. What would you need to keep the digital materials available?  It turns out that digital actually increases the manufacturing cost of books.   We need to have tools with digital preservation designed in, not necessarily in the way we think of scholarly preservation, but in terms of increasing the likelihood that things will survive. 

What should the web’s memory look like? There is an obligation to preserve the things that matter. We are engaged in the wholesale destruction of our history because we aren’t thinking about what we do as important to our descendants. Think of yourselves as people who are engaged in a task that matters to everyone.  As we move into an increasingly digital world, preservation won’t be just the concern of specialists, but of everyone. One of the arguments for open source is simply to preserve the code.  There have been a number of examples of technical companies not having their source code after they stop supporting it. Preserving everything may get in the way of our preserving the things that are important.

Thursday, October 13, 2011

Innovation, Disruption and the Democratization of Digital Preservation

Innovation, Disruption and the Democratization of Digital Preservation. Bill LeFurgy. Agogified. October 10, 2011.
Interesting article about innovation and society.  It asks the question about digital preservation: Is innovation the key to dealing with all that valuable digital data? "When considered from the popular perspective of innovation, digital preservation looks like a straightforward challenge for libraries, archives, museums and other entities that long have kept information on behalf of society." But it isn't that easy, since technology changes much faster than society's conventions and institutions. "Innovation is not a safe, orderly or controllable process.  It sends out big ripples of disruption with an unpredictable impact." Libraries are being bounced around because of such disruption and the traditional methods are not suited to address the changes.  "All this means that the ability of traditional institutions to fully meet the need for digital preservation is in doubt."
But with these changes comes a change in the people playing a role in preserving digital materials. Some see a greater role for individuals in digital preservation.  There is a great need for designing preservation functionality into tools used to create and distribute digital content to enable content creators to be involved in the digital stewardship. "Ultimately, we have to hope that innovation pushes along the trend toward the democratization of digital preservation.  The more people who care about saving digital content, and the easier it is for them to save it, the more likely it is that bits will be preserved and kept available."

Tuesday, October 11, 2011

Abingdon firm gets Queen's seal of approval

Abingdon firm gets Queen's seal of approval. Oxford Journal.  22 September 2011.
Tessella has been awarded one of the UK’s most prestigious business awards for its collaboration with a public sector organization in developing a unique system for preserving digital information.  Its product, Safety Deposit Box, which came out in 2003, is now used by governments in seven countries.