Thursday, December 08, 2011

Why don't we already have an Integrated Framework for the Publication and Preservation of all Data Products?

Why don't we already have an Integrated Framework for the Publication and Preservation of all Data Products? Alberto Accomazzi, et al. Astronomical Data Analysis Software and Systems. 7 Dec 2011.
Astronomy has long had a working network of archives supporting the curation of publications and data. There are examples of websites giving access to data sets, but they are sometimes short-lived.  "We can only realistically take implicit promises of long-term data archival as what they are: well-intentioned plans which are contingent on a number of factors, some of which are out of our control." We should take steps to ensure that our system of archiving, sharing and linking resources is as resilient as it can be.  Some ideas are:
  1. future-proof the naming system: assign persistent data IDs to items we want to preserve 
  2. provide the ability to cite complete datasets, just as we can cite websites
  3. include a data reference section in academic papers
Curated datasets need to be preserved indefinitely for scholarly purposes.

A literature review: What exactly should we preserve? How scholars address this question and where is the gap

A literature review: What exactly should we preserve? How scholars address this question and where is the gap.  Jyue Tyan Low. Cornell University Library. 7 Dec 2011.
There are generally two approaches to long-term preservation of digital materials
  1. preserving the object in its original form as much as possible along with the accompanying systems,
  2. migration or transformation: transforming the object to make it compatible with more current systems but retaining the original “look and feel.”
Migration is the most widely used method, but there can be changes to the original.  If some of the original properties are lost, what then are the essential properties for maintaining its integrity?  Currently there is no formal and objective way to help stakeholders decide what the significant properties of the objects are, which are defined as:
"The characteristics of digital objects that must be preserved over time in order to ensure the continued accessibility, usability, and meaning of the objects, and their capacity to be accepted as evidence of what they purport to record."
An important goal of digital preservation is not just retrieving the objects but ensuring the authenticity of the information.  A digital object can change as long as the final output is what it is expected to be.  The properties to preserve derive from the purpose of the object, so at least one purpose for the object needs to be defined. Archivists have created standards that look at records in the context of their creation, intended use and preservation.  It is important to ask which features of the object are important when delivering it to the user.  There may be many uses for many communities that were not intended by the object creator, so we should not let the ideal limit the reasonable.

Tuesday, November 15, 2011

Geospatial Data Preservation.

Geospatial Data Preservation. Website. November 2011.
The Geospatial Resource Center is being developed as a finding tool for freely available web-based resources about the preservation of geospatial information. A variety of selected resources are being added, including reports, presentations, standards, and information about tools for preparing geospatial assets for long-term access and use. The resources are indexed to enable searching of titles and are categorized to facilitate discovery by choosing among topics, resource types, or both. The website contains many valuable resources.  A few resources from these three categories:

Education & Training
  • Appraisal and selection of geospatial data for preservation
  • Best Practices for Geospatial Programs
  • Copyright Quickguide
Tools & Software
  • Cost Estimation Toolkit
  • Conversion tools for geospatial data
  • Geospatial metadata tools

Policies & Benefits

  • Collection policies,
  • Content standards,
  • Policies on Open geospatial data access and preservation

Sunday, October 23, 2011

OAIS / TDR presentation at FDLP.

OAIS / TDR presentation at FDLP.  James A. Jacobs. Federal Depository Library Conference. Free Government Information. October 2011. [PDF]
A presentation giving an introduction to the "Reference Model for an Open Archival Information System" (OAIS) and the "Audit And Certification Of Trustworthy Digital Repositories" (TDR).  This includes slides with speaker notes and a nice handout about related information with links. Every library decision should assess the impact of digital issues.  Notes:

OAIS
1. It defines the functional concepts of a long-term archive with consistent, unambiguous terminology.
2. It gives us a functional framework for designing archives, and a functional model.
3. It gives us a standard for “conformance.”
4. It is a “Reference Model” that describes functions; it is not an “implementation”
5. Some key OAIS concepts are:
   - Designated Community: An identified group of potential Consumers who should
      be able to understand a particular set of information.
   - Description of roles and functions in the information life cycle.
   - The Long Term: Long enough for there to be concern about changing technologies,
      new media and data formats, and a changing user community.
   - Preserved content must be usable according to the designated community

TDR
Documents what is being done and how well it is being done.
Provides 109 “metrics” for measuring conformance to OAIS in three areas: 
1. Organizational Infrastructure (including sustainability and succession planning)
2. Digital Object Management
3. Technical Infrastructure and Security Risk Management

Saturday, October 22, 2011

Cite Datasets and Link to Publications

Cite Datasets and Link to Publications. Digital Curation Centre. 18 October 2011.
The DCC has published a guide to help authors / researchers create links between their academic publications and the underlying datasets.  It is important for those reading the publication to be able to locate the dataset.  This recognizes that data generated during research are just as valuable to the ongoing academic discourse as papers and monographs, and in many cases the data needs to be shared. "Ultimately, bibliographic links between datasets and papers are a necessary step if the culture of the scientific and research community as a whole is to shift towards data sharing, increasing the rapidity and transparency with which science advances."

This guide has identified a set of requirements for dataset citations and any services set up to support them. Citations must be able to uniquely identify the object cited, and to identify the whole dataset as well as subsets.  The citation must be usable by people and software tools alike.  There are a number of elements needed, but the "most important of these elements – the ones that should be present in any citation – are the author, the title and date, and the location. These give due credit, allow the reader to judge the relevance of the data, and permit access to the data, respectively."  A persistent URL is needed, and there are several types that can be used. 

Audit And Certification Of Trustworthy Digital Repositories.

The Management Council of the Consultative Committee for Space Data Systems (CCSDS) has published this manual of recommended practices. It is based on the 2003 version from RLG. “The purpose of this document is to define a CCSDS Recommended Practice on which to base an audit and certification process for assessing the trustworthiness of digital repositories. The scope of application of this document is the entire range of digital repositories.”

The document addresses audit and certification criteria, organizational infrastructure, digital object management, and risk management.  It is a standard for those who audit repositories; and, for those who are responsible for the repositories, it is an objective tool they can use to evaluate the trustworthiness of the repository.

Thursday, October 20, 2011

National Archives Digitization Tools Now on GitHub

National Archives Digitization Tools Now on GitHub. NARAtions. October 18, 2011.
The National Archives has begun to share applications developed in-house to facilitate digitization workflows. These applications have significantly increased the productivity and improved the accuracy and completeness of the digitization. Two digitization applications, “File Analyzer and Metadata Harvester” and “Video Frame Analyzer” are publicly available on GitHub.
  • File Analyzer and Metadata Harvester: This allows a user to analyze the contents of a file system or external drive and generate statistics about the contents, generate checksums, and verify that there is a one-to-one match of before and after files (a minimal checksum-comparison sketch follows this list). The File Analyzer can import data in a spreadsheet, and can match and merge results with auxiliary data from an external spreadsheet or finding aid.
  • Video Frame Analyzer: This is used to objectively analyze technical properties of individual frames of a video file in order to detect quality issues within digitized video files.  It reduced the time to do quality checks by 50%. 
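
The following is a minimal sketch of the kind of before-and-after checksum verification described above, not NARA's File Analyzer itself; the directory names and the choice of SHA-256 are assumptions for illustration.

    # Hash every file under two directory trees and confirm a one-to-one match.
    # "originals" and "digitized_copies" are hypothetical directory names.
    import hashlib
    from pathlib import Path

    def checksums(root):
        """Map each file's path, relative to root, to its SHA-256 checksum."""
        result = {}
        for path in Path(root).rglob("*"):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                result[path.relative_to(root).as_posix()] = digest
        return result

    before = checksums("originals")        # e.g. contents of the external drive
    after = checksums("digitized_copies")  # e.g. the transferred copies

    missing = set(before) - set(after)
    extra = set(after) - set(before)
    changed = {p for p in set(before) & set(after) if before[p] != after[p]}
    print(f"missing: {len(missing)}, unexpected: {len(extra)}, mismatched checksums: {len(changed)}")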

Monday, October 17, 2011

Research Librarians Consider the Risks and Rewards of Collaboration.

Research Librarians Consider the Risks and Rewards of Collaboration. Jennifer Howard. The Chronicle of Higher Education. October 16, 2011.

The Association of Research Libraries’ meeting discussed research and preservation projects like the HathiTrust digital repository and the proposed Digital Public Library of America, plans for which are moving ahead. Concerning the Digital Public Library of America: “Library” is a misnomer in this case; it is more of a federation of existing objects. It wouldn’t own anything. The main contribution would be to set standards and link resources.  “The user has to drive this.”

They said that it’s almost three times more expensive to store materials locally than it is to store them with HathiTrust. Researchers now also create and share digital resources themselves via social-publishing sites such as Scribd. There is a need for collection-level tools that allow scholars and curators to see beyond catalog records.

Discussed Recollection, a free platform built by NDIIPP and a company named Zepheira to give a better “collection-level view” of libraries’ holdings. The platform can be used to build interactive maps, timelines, and other interfaces from descriptive metadata and other information in library catalogs. So, for instance, plain-text place names on a spreadsheet can be turned into points of latitude and longitude and plotted on a map.

“Rebalancing the Investment in Collections,” discussed that libraries had painted themselves into a corner by focusing too much on their collection budgets. Investing in the right skills and partnerships is most critical now. “The comprehensive and well-crafted collection is no longer an end in itself.”

One person told librarians that they shouldn’t rush to be the first to digitize everything and invest in every new technology. “Everybody underestimates the cost of innovation,” he said. “Instead of rushing in and participating in a game where you don’t have the muscle, you want to stand back” and wait for the right moment.


Digital Preservation-Friendly File Formats for Scanned Images.

Digital Preservation-Friendly File Formats for Scanned Images.  Bill LeFurgy. The Signal. October 12, 2011.
Some digital file formats are better for preservation than others.  The best format for preservation is one in which the content remains accurately viewable regardless of changes in hardware, software or other technology. The Library of Congress has created a web resource to help in selecting file formats and in understanding how effective different formats are for long-term preservation. The sustainability factors it considers include:
  • Disclosure of specifications and tools for validating technical integrity
  • Adoption by the primary creators and users of information resources
  • Transparency: openness to direct analysis with basic, non-proprietary tools
  • Self-documentation of metadata needed to render the data as usable information or understand its context
  • Degree to which the format depends on specific hardware, operating system, or software for rendering the information and how difficult that may be.
  • Extent that licenses or patents may inhibit the ability to sustain content.
  • Technical protection mechanisms. Embedded capabilities to restrict use in order to protect the intellectual property.
Using these factors helps determine which formats may be more sustainable than others. 

To Save and Project Fest: Long Live Cinema!

To Save and Project Fest: Long Live Cinema!  J. Hoberman. The Village Voice. October 12, 2011.
Digital might be the future of the motion-picture medium, but for film preservation, it’s a mixed blessing. Archivists make it clear that digital technology is part of the solution—and part of the problem. Digital cinema is itself difficult to preserve, subtly distorts (by “improving”) the celluloid image, even as it often dictates (through commercial considerations) those movies deemed worthy of preservation. New York Times DVD critic Dave Kehr has pointed out that instead of increasing access, each new distribution platform (from 35mm to 16mm, VHS, DVD, Blu-ray, and online streaming) has narrowed the range of titles in active distribution and diminished the proportion of available films. Film restoration is also the restoration of cultural memory.

Sunday, October 16, 2011

Tim O'Reilly - Keynote for 2011 NDIIPP/NDSA Partners Meeting.

Tim O'Reilly - Keynote for 2011 NDIIPP/NDSA Partners Meeting. Tim O'Reilly. Library of Congress website.  October 7, 2011.
This is a 31-minute video about digital preservation. The things that turn out to be historic often are not thought of as historic at the time. You can’t necessarily do preservation from the institution level; you have to teach the preservation mindset. Wikipedia, for example, is designed to keep all earlier versions. We should think about what kind of tools we need to build digital preservation into our everyday activities.

There will be a whole new dimension to digital. Imagine what will happen if only digital books and maps are available and then they become unavailable.  That world may be closer than we think. Imagine a world in which there are no print books. What would you need to keep the digital materials available?  It turns out that digital actually increases the manufacturing cost of books.  We need tools with digital preservation designed in, not necessarily in the way we think of scholarly preservation, but in terms of increasing the likelihood that things will survive. 

What should the web’s memory look like? There is an obligation to preserve the things that matter. We are engaged in the wholesale destruction of our history because we aren’t thinking about what we do as important to our descendants. Think of yourselves as people who are engaged in a task that matters to everyone.  As we move into an increasingly digital world, preservation won’t be just the concern of specialists, but of everyone. One of the arguments for open source is simply to preserve the code.  There have been a number of examples of technical companies not having their source code after they stop supporting it. Preserving everything may get in the way of our preserving the things that are important.

Thursday, October 13, 2011

Innovation, Disruption and the Democratization of Digital Preservation

Innovation, Disruption and the Democratization of Digital Preservation. Bill LeFurgy. Agogified. October 10, 2011.
Interesting article about innovation and society.  It asks the question about digital preservation: Is innovation the key to dealing with all that valuable digital data? "When considered from the popular perspective of innovation, digital preservation looks like a straightforward challenge for libraries, archives, museums and other entities that long have kept information on behalf of society." But it isn't that easy, since technology changes much faster than society's conventions and institutions. "Innovation is not a safe, orderly or controllable process.  It sends out big ripples of disruption with an unpredictable impact." Libraries are being bounced around because of such disruption and the traditional methods are not suited to address the changes.  "All this means that the ability of traditional institutions to fully meet the need for digital preservation is in doubt."
But with these changes comes a change in the people playing a role in preserving digital materials. Some see a greater role for individuals in digital preservation.  There is a great need for designing preservation functionality into tools used to create and distribute digital content to enable content creators to be involved in the digital stewardship. "Ultimately, we have to hope that innovation pushes along the trend toward the democratization of digital preservation.  The more people who care about saving digital content, and the easier it is for them to save it, the more likely it is that bits will be preserved and kept available."

Tuesday, October 11, 2011

Abingdon firm gets Queen's seal of approval

Abingdon firm gets Queen's seal of approval. Oxford Journal.  22 September 2011.
Tessella has been awarded one of the UK’s most prestigious business awards for its collaboration with a public sector organization in developing a unique system for preserving digital information.  Its product, Safety Deposit Box, which came out in 2003, is now used by governments in seven countries.

Sunday, October 09, 2011

Piggybacking to Avoid Going Down the Rabbit Hole, or What I Learned at the First DPOE Workshop.

Piggybacking to Avoid Going Down the Rabbit Hole, or What I Learned at the First DPOE Workshop. Sam Meister. The Signal. October 7, 2011.
This is an excellent post about the Digital Preservation Outreach and Education (DPOE) initiative’s first ever Train the Trainer Baseline Workshop and the experience gained there.  The workshop went over the core principles and concepts (Audience, Content, Instructors, Events) and the six modules that make up the curriculum (Identify, Select, Store, Protect, Manage, Provide). The group of 24 trainers broke into six regional groups to work through the modules and develop specific strategies to present the material of individual modules. "As the first group of trainers to review, analyze, revise and disseminate this curriculum, the result of a multi-year development process, we would be the “pioneers” for the DPOE program. To me, this made clear the role and level of responsibility that would be expected of us throughout the rest of the workshop and beyond."

Friday, October 07, 2011

New Guidelines: CrossRef DOIs to be Displayed as URLS.

New Guidelines: CrossRef DOIs to be Displayed as URLS. Carol Anne Meyer. D-Lib Magazine. September/October 2011.
CrossRef, a not-for-profit association of more than 1000 scholarly publishers, revised its recommendations for displaying CrossRef Digital Object Identifiers (DOIs), specifying that DOIs on the web be presented as URLs. The previous practice of prefixing the identifier with "doi:" is discontinued, and publishers are also encouraged to create DOIs that are as short as possible.
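
A minimal sketch of the recommended display change, using the dx.doi.org resolver that was current in 2011; the example DOI suffix is made up for illustration.

    # Render a DOI as a resolvable URL instead of with the old "doi:" prefix.
    def doi_as_url(doi):
        doi = doi.strip()
        if doi.lower().startswith("doi:"):   # old display style
            doi = doi[len("doi:"):]
        return "http://dx.doi.org/" + doi

    print(doi_as_url("doi:10.1000/example.2011"))
    # -> http://dx.doi.org/10.1000/example.2011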

From Link Rot to Web Sanctuary: Creating the Digital Educational Resource Archive (DERA).

From Link Rot to Web Sanctuary: Creating the Digital Educational Resource Archive (DERA). Bernard M. Scaife. Ariadne. July 2011.
One of the tasks was to fix the broken links in the catalogue. A report showed that of about 16,000 links to external resources, about 1,200 were non-functional (7.5%).  There were ways to fix many of these, but about 10% of the links referred to documents which no longer existed.  Many of these were government publications. The question was how to do this differently. They looked at adding materials into their own repository, which would allow them to solve the link rot problem while "building in a core level of digital preservation and increasing the discoverability of these documents. We were convinced that a citation which linked to a record in a Web archive was far more likely to survive than one which did not."

They needed to clarify the intellectual property rights, add descriptive metadata, such as the type of document, a collection name, subjects, and the organization that created the document.  They also found that they "had to accept all common file formats at present. In practice, the majority are pdf, some MS Word and a few Excel files. It would, for preservation purposes, be preferable to convert and ingest in PDF/A format, at least for the textual formats. However our view was that the small overhead of batch migrating to that format at a later stage means it would be better to spend time upfront now on metadata rather than file conversion. We felt that this was a pragmatic response which meant that we would be working within the spirit of digital preservation best practice." Also, they found that "data-based formats such as Excel cannot be meaningfully integrated into a full-text search and that these objects would benefit from better representations." 

Other things they learned include    
  • Placing files in a repository gives digital preservation to key documents in the subject field and eradicates the link rot problem.
  • Adding high-quality metadata enhances the resource and allows it to hold its head high and become an integral part of a library's collection.
  • A library can play an important role in preserving content as part of its long-term strategy and ensure high-quality resources remain available.
  • The added value of being able to search the full text provides a potentially very rich resource for researchers.
Future plans are to build up content levels and to integrate these resources with the regular library content.

More (digital) wake-up calls for academic libraries

More (digital) wake-up calls for academic libraries. Rick Luce. LIBER 2011. Duurzame toegang blog. June 2011.
The topic was the core business of academic libraries: serving researchers and the scientific research process. There are many changes taking place in the sciences: "zettabytes of data; dynamic, complex data objects that require management; communities and data flows becoming much more important than static library collections, etc." The warning to academic libraries was that if libraries do not develop the services the new researcher needs, someone else will, and then there is no future for the research library. We need a "fundamental transformation process that will affect every aspect of the ‘library’ business."  The library needs to provide a repository between the scientific process and the IT infrastructure that supports and preserves workflows.

Wednesday, October 05, 2011

Graduates To Sow Seeds of New Training Program Across U.S.

Graduates To Sow Seeds of New Training Program Across U.S.  Bill LeFurgy.  The Signal. October 3, 2011.
The inaugural digital preservation train-the-trainer workshop was held at the Library of Congress on September 20-23.  The 24 participants were selected from a nationwide pool. The DPOE workshop model is designed to produce a national corps of trainers equipped to teach others basic principles and practices of preserving digital materials. “What’s unique about this workshop,” said George Coulbourne, Executive Program Manager, “is that we designed it for people who are going to be actual practitioners of digital preservation. This is not for administrators or managers, but for the novice practitioner. It’s also intended to be as open-source and low-cost as possible.  We hope this event accelerates a new national movement in open, accessible digital-preservation training.”
Those who were trained will take what they learned back to their home regions, including by holding one or more digital preservation training events by mid-2012.

Wednesday, September 28, 2011

ADS and the Data Seal of Approval – case study for the DCC.

The ADS and the Data Seal of Approval – case study for the DCC.  Jenny Mitcham and Catherine Hardman. Digital Curation Centre website. 2010.  
This page describes the experience of Archaeology Data Service in applying for the Data Seal of Approval (DSA). It provides some practical information about the DSA application process and outlines issues the ADS faced in undertaking the process, and several potential benefits they see from the self-certification.

“When undertaking to curate data for the foreseeable future (and beyond) the concept of ‘trust’ is of paramount importance. Yet in a young discipline such as digital archiving, it is very difficult to demonstrate the potential for longevity of curation.”

The Assessment Manual can be downloaded from the DSA website; it includes details of the 16 guidelines, the minimum requirements, and some guidance notes.  In the spirit of openness, the DSA recommends that the main policy and procedure documents be accessible to the world at large.  One of the benefits mentioned is that it shows users and depositors that the archive has a set of standards and is meeting them.

Tuesday, September 27, 2011

Fujitsu CTO: Flash is just a stopgap

Fujitsu CTO: Flash is just a stopgap. Chris Mellor. The Register. 8 August 2011.
Flash memory, according to the CTO, is "beset with problems that will become unsolvable". The increases in flash density come at the expense of the ability to read and write data. "Each shrink in process geometry, from 3X to 2X and onto 1X, shortens flash's endurance", and brings additional problems with access speed and endurance.

Tomorrow will be too late – (born) digital in library special collections.

Tomorrow will be too late – (born) digital in library special collections. 28 June 2011.
Report from the annual conference of LIBER, the Association of European Research Libraries: It appears that according to recent studies, research libraries are still functioning within an analogue paradigm. Many libraries have digitized collections and provide online access, but their digitization efforts “mostly lack strategic planning, access is still mostly provided in a controlled way (for a limited group of users), preservation issues are still not being addressed adequately, and born-digital material (including audiovisual content) is blatantly missing from collections and collection plans.”  If libraries stay in their comfort zone of digital access in a controlled network, their information role will become insignificant.  They need:
  • A knowledge of the user groups and their behaviours
  • Content, access, and preservation strategies
  • Strategic alliances
  • Permanent innovation
  • An open mind
There are opportunities for libraries, but they need to adapt their methods.

Google helps put Dead Sea Scrolls online


Google helps put Dead Sea Scrolls online. BBC News.  26 September 2011.
Ultra-high resolution images of several Dead Sea Scrolls are now available on line.  Five scrolls have been digitized (1,200 megapixel images); they are The Temple Scroll, The War Scroll, The Community Rule Scroll, The Great Isaiah Scroll, and The Commentary of Habakkuk Scroll.  

Friday, September 23, 2011

Practical Approaches to Electronic Records: What Works Now (ppt)

Practical Approaches to Electronic Records: What Works Now.  Chris Prom, et al. August 30, 2011.
This is a PowerPoint of a presentation given by several people at SAA.  A few notes from the slides:
Adjectives that are undesirable when describing an archive: Undercounted, undermanaged, inaccessible.
Basic requirements for the digital archive (a minimal sketch follows this list):
  • Perform a virus check
  • Capture descriptive metadata about the folders and files
  • Document the file formats
  • Record checksums for the files
  • Document the actions taken over time
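
The following is a minimal sketch of the last four requirements, not the presenters' tooling: it walks an accession directory, records basic descriptive and format information plus a checksum for each file, and appends a dated entry to an action log. A virus check would be run with a separate scanner. The directory and file names are assumptions.

    import csv, hashlib, datetime
    from pathlib import Path

    accession = Path("accession_2011_042")   # hypothetical accession folder

    # Capture per-file descriptive metadata, a rough format indicator, and a checksum.
    with open("manifest.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "size_bytes", "modified", "format", "sha256"])
        for f in sorted(accession.rglob("*")):
            if f.is_file():
                info = f.stat()
                writer.writerow([
                    f.relative_to(accession).as_posix(),
                    info.st_size,
                    datetime.datetime.fromtimestamp(info.st_mtime).isoformat(),
                    f.suffix.lower() or "unknown",
                    hashlib.sha256(f.read_bytes()).hexdigest(),
                ])

    # Document the action taken, with a timestamp, in a running log.
    with open("actions.log", "a") as log:
        log.write(f"{datetime.datetime.now().isoformat()} wrote manifest.csv for {accession}\n")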

    Meeting the Challenge of Media Preservation: Strategies and Solutions.

    Meeting the Challenge of Media Preservation: Strategies and Solutions. Indiana University Bloomington. September 2011. (128-page PDF.)
    This excellent study is the result of a year of research and planning to address the problems identified in an earlier report from 2009.  It looks at the preservation and conservation of audio, video, and film, including: guiding preservation principles, facility planning, prioritization, digitization methodologies, strategies for film, principles for access, technological infrastructure needs, and engagement with campus units and priorities. It is written specifically for the university, but the information and recommendations are of interest to others.  Their mission is to preserve the time-based media holdings of Indiana University so that they may remain accessible. They estimate their media holdings at more than 560,000 audio, video, and film objects, nearly all on obsolete formats, and they estimate they have only a fifteen- to twenty-year window of opportunity to digitally preserve audio and video holdings. They propose to collect rich descriptive and technical metadata to support digitization and the future interpretation and management of digital content.
    • The media preservation crisis impacts every institution with media collections.
    • Because campus holdings are very large and time pressures great, even high-efficiency workflows may not preserve everything in time. 
    • Not every recording is an appropriate candidate for long-term preservation.
    • Research, instruction, curation, and public availability are core university missions supported by media preservation efforts.
    • Access is the end goal of any preservation work, and it must be developed in tandem with
      media preservation efforts.
    • Access to preserved holdings is critical to the success of the project and to the realization of its value to the campus.
    • The vision is an era characterized by a wealth of media content preserved long term and made accessible and integrated into campus research and instruction.
    • We live in a watershed moment in which acute challenges demand a coordinated effort to address dramatic technological and cultural changes in the way users access time-based media.
    • Target: high resolution audio preservation and production masters—24 bit, 96 kHz sample rate.
    Among the recommendations is that a 10,000-square-foot Indiana Media Preservation and Access Center be built, employing 25 staff: administrators, audio and video engineers, film specialists, processing technicians, and IT support. The annual output is projected at 2-3 PB of data per year, with a fifteen-year target of 39 PB of data storage.  The first year of work will focus on developing solutions to the challenges posed by legacy media; the second year will begin developing management strategies and workflows for file-based born-digital recordings.
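
    As a rough back-of-the-envelope check of the audio target (the petabyte-scale annual figures are driven mostly by video, so this covers only the audio share; stereo masters are an assumption):

        # Data rate of a 24-bit, 96 kHz audio preservation master, assuming 2 channels.
        sample_rate = 96_000      # samples per second
        bit_depth = 24            # bits per sample
        channels = 2              # assumption: stereo

        bytes_per_second = sample_rate * (bit_depth // 8) * channels
        gb_per_hour = bytes_per_second * 3600 / 1e9
        print(f"{bytes_per_second:,} B/s, about {gb_per_hour:.2f} GB per hour of audio")
        # -> 576,000 B/s, about 2.07 GB per hour of audio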

    Their guiding principles include: Curatorial Responsibility; Standards and Best Practices; Online Accessibility; Description Services; usability of metadata; copyright strategies; Access Digitizing and Preservation. Their efforts need to be combined into a trusted digital repository.  "Preservation metadata requirements need to be defined, and tools need to be developed to support audio and video preservation package validation, technical metadata capture, and repository ingest."

    This is a long study but is well worth the time to read all the way through it.   This university link also includes other related materials, such as the media preservation survey, and a brochure "Our History is at Risk".

    Friday, September 16, 2011

    Library of Congress To Launch New Corps of Digital Preservation Trainers. Library of Congress.

    Library of Congress To Launch New Corps of Digital Preservation Trainers. Bill LeFurgy. The Signal. Library of Congress. September 16, 2011.
    The Digital Preservation Outreach and Education program at the Library of Congress will hold its first national train-the-trainer workshop on September 20-23, 2011, in Washington, DC.

    The DPOE Baseline Workshop will produce a corps of trainers who are equipped to teach others, in their home regions across the U.S., the basic principles and practices of preserving digital materials. Examples of such materials include websites; emails; digital photos, music, and videos; and official records. The intent of the workshop is to share high-quality training in digital preservation, based upon a standardized set of core principles, across the nation.

    Long-term Preservation for Spatial Data Infrastructures: a Metadata Framework and Geo-portal Implementation.

    Long-term Preservation for Spatial Data Infrastructures: a Metadata Framework and Geo-portal Implementation. Arif Shaon, Andrew Woolf. D-Lib Magazine. September/October 2011.
    Geospatial data is increasing, particularly with diverse environmental datasets. Long-term preservation of the data is not typically addressed, but it is very important for current and future use.  Sustained access to environmental data is becoming more important and more difficult because it is increasing so dramatically.
    Without effective long-term preservation, the data face the risk of becoming unusable over time. This article looks at the requirements, particularly metadata, for preserving this data.  The authors have implemented a web-based portal prototype that demonstrates some functions of a preservation interface, such as data discovery using geospatial metadata, data downloading, metadata creation and validation.  There is more to be done in this area.

    Tuesday, September 13, 2011

    Preserving Your Personal Digital Memories

    This is a free online course on preserving your personal digital materials: photos, documents, and other media.  These are fragile and require special care to keep them useable. But preserving digital information is a new concept that most people have little experience with. As new technologies appear for creating and saving our personal digital information, older ones become obsolete, making it difficult to access older content. Learn about the problems of preserving digital materials and hear some simple, practical tips and tools to help you keep digital files safe.

    Evaluating Open Source Digital Preservation Systems: A Case Study.

    Evaluating Open Source Digital Preservation Systems: A Case Study.  Angela Jordan. Practical E-Records. August 18, 2011.
    The University of Illinois Archives has implemented the “Practical E-Records Method,” a project that provides recommendations to help  make digital curation and digital preservation systematic institutional functions.  They tested Archivematica, which is essentially "an Ubuntu (Linux) distribution with extensions to support digital preservation actions using a web-based preservation dashboard."  The test "started with elementary electronic records such as Microsoft Office documents and PDFs, then moved to complicated, larger file types, such as audio-visual objects."  Some parts worked well, but there were a number of errors.  "Given the immediate needs of the University Archives, the developing state of Archivematica, and other digital preservation development work taking place within the University Library, we chose not to incorporate the current version into our electronic records work flow."

     There were three remaining concerns:
    1. smaller institutions may lack the hardware or the technological capability to support the system
    2. the installation process is not user friendly
    3. the software is best run from a dedicated virtual server, to which many institutions may not have access.  Running Archivematica on a dedicated virtual machine requires significant help from IT
    The technological ability needed to successfully install and run this system is currently beyond the people who might benefit most. Once some of the issues are worked out in upcoming versions, Archivematica will be useful for smaller institutions that have less IT support than a large research library.

    KRDS Digital Preservation Benefits Analysis Toolkit and KRDS Updates now available.

    KRDS Digital Preservation Benefits Analysis Toolkit and KRDS Updates now available. Neil Beagrie. Website.  05 Aug 2011.
    The Keeping Research Data Safe (KRDS) project  was set up to show the benefits of digital preservation.  The Digital Preservation Benefits Analysis toolkit, now available, has two tools which consist of a detailed guide and worksheet(s):
    1. KRDS Benefits Framework:  an “entry-level” tool requiring less experience and effort to implement and which can also be used as a stand-alone tool for many tasks
    2. Value-chain and Benefits Impact tool: a more advanced tool requiring more experience and effort to implement. It is likely to be most useful in activities such as evaluation and strategic planning.
    The site also has worksheets, guidance documentation and exemplar test cases.

    Thursday, September 08, 2011

    Research Archive Widens Its Public Access—a Bit

    Research Archive Widens Its Public Access—a Bit. Editorial. Technology Review.  7 September 2011.
    JSTOR, an organization which provides access to 1,400 journals for subscribing institutions, is providing free public access to articles published prior to 1923 in the United States or before 1870 in other countries, about 6 percent of its content. In a letter to publishers and libraries, JSTOR refers to plans for "further access to individuals in the future."

    Atempo Digital Archive Helps Simplify and Streamline Broadcast Workflows With Scalable Storage Integration.

    Atempo Digital Archive Helps Simplify and Streamline Broadcast Workflows With Scalable Storage Integration. Press release. Sept. 8, 2011.
    Announcement that Atempo Digital Archive (ADA) has been integrated with MediaGrid, to simplify and streamline broadcast workflows. The software addresses long-term data retention requirements and digital preservation. "Atempo enables organizations to preserve and protect digital information simply and effectively, across any infrastructure, on any platform, over long periods of time. Atempo's comprehensive archiving solutions deliver policy-based and workflow-driven management of rich media files, email and other high-value digital assets to maximize the efficiency and performance of storage systems and reduce long-term storage costs."

    Paper on JPEG 2000 for preservation

    Paper on JPEG 2000 for preservation. Johan van der Knijff.  National Library of the Netherlands (KB). Johans blog. Open Planets Foundation.  June 2011.  This paper, published in D-Lib Magazine, looks at the suitability of the JP2 format for long-term digital preservation.  He identifies issues that will be addressed in an amendment to the standard.  It also provides some practical recommendations that may help in mitigating the risks for existing collections.

    Wednesday, September 07, 2011

    A simple JP2 file structure checker

    A simple JP2 file structure checker. Johan van der Knijff.  National Library of the Netherlands (KB). Johans blog. Open Planets Foundation.  1 September 2011.
    The KB is planning to migrate a collection of TIFF images to JP2. One major risk of such a migration is that hardware failures during the migration process may result in corrupted images. Others have found that corrupted, incomplete JP2 files can still be reported as "well-formed and valid" by JHOVE.  This tool was written to detect incomplete code streams in JP2 files.  There are links to the source code of jp2StructCheck, some documentation, and a small data set with some test images. This will be an important tool for digital preservation.
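
    The following is a minimal sketch of the general idea behind this kind of structure check, not the KB's jp2StructCheck code: walk the top-level JP2 boxes, confirm the required ones are present, and check that the contiguous codestream ends with the EOC marker (0xFF 0xD9), whose absence is a quick sign of truncation.

        # Check a JP2 file for required top-level boxes and a terminated codestream.
        import struct, sys

        REQUIRED = {b"jP  ", b"ftyp", b"jp2h", b"jp2c"}

        def check_jp2(path):
            data = open(path, "rb").read()
            seen, offset = set(), 0
            while offset + 8 <= len(data):
                (length,) = struct.unpack(">I", data[offset:offset + 4])
                boxtype = data[offset + 4:offset + 8]
                header = 8
                if length == 1:                    # extended 64-bit length follows
                    (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
                    header = 16
                elif length == 0:                  # box runs to the end of the file
                    length = len(data) - offset
                if length < header:
                    return False, "invalid box length"
                seen.add(boxtype)
                if boxtype == b"jp2c":             # contiguous codestream box
                    body = data[offset + header:offset + length]
                    if not body.endswith(b"\xff\xd9"):
                        return False, "codestream does not end with the EOC marker"
                offset += length
            missing = REQUIRED - seen
            if missing:
                return False, "missing boxes: " + ", ".join(t.decode("latin-1") for t in sorted(missing))
            return True, "required boxes present, codestream terminated"

        ok, message = check_jp2(sys.argv[1])
        print(("OK: " if ok else "PROBLEM: ") + message)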

    Sunday, September 04, 2011

    JISC Legal Cloud Computing and the Law Toolkit.

    JISC Legal Cloud Computing and the Law Toolkit. Website. 31 August 2011.
    Documents to help make informed decisions about implementing cloud computing solutions in an institution. Not specifically for digital preservation, but helpful for thinking about the policies that will affect data and its life cycle, such as data protection and possession of data on termination. Written for UK educational institutions.

    • Report on Cloud Computing and the Law for UK Further and Higher Education
    • User Guide: Cloud Computing and the Law for IT
    • User Guide: Cloud Computing and the Law for Senior Management and Policy Makers
    • User Guide: Cloud Computing and the Law for Users 
    • User Guide: Cloud Computing Contracts, SLAs and Terms & Conditions of Use 

    Institutional Repository and ETD Bibliography 2011

    Institutional Repository and ETD Bibliography 2011. Charles W. Bailey, Jr.  September 2011.
    This bibliography has over 600 English-language articles, books, and other works about institutional repositories and electronic theses and dissertations (ETDs).  Among other things, it includes digital preservation issues, IR library issues, IR metadata strategies, and institutional open access mandates and policies. Most sources were published from 2000 through June 30, 2011.  The bibliography includes links to freely available versions of included works.  It is available as a PDF file.

    Saturday, September 03, 2011

    Memory failure detected.

    Memory failure detected. THE: Times Higher Education. 1 September 2011.
    In the future, researchers will have an incomplete record of events taking place in our day: much of the material was never stored or has been only partially archived.  The extent to which content disappears without trace from the web is worrying, and not enough academics are engaging with the topic. "We are taking it for granted that such material will be there, but we need to be attentive. We have a responsibility to future generations of researchers."  "These issues are long term and worthy of investment."  The Internet Archive is the most comprehensive of the web archives, with more than 150 billion pages from more than 100 million sites [but these are often only partial pages].

    There are also smaller-scale selective archives. Websites are collected around topics, themes or events chosen by library curators, with sites harvested only when the copyright holder's permission has been obtained. The approach lacks breadth, but as the operation is smaller, individual websites can be captured more comprehensively. Most news content that is published only online is simply falling through the cracks. But the web archiving community's current practices, the report continues, are producing something that is in danger of ending up as a "dusty archive". In this scenario, archiving technology keeps pace with the latest developments and archives are well curated and maintained, but they sit largely unused, gathering "digital dust". "As is too often the case with those who build resources, they are preserving websites without giving any real thought to how they might be used in the future."

    Monday, August 29, 2011

    Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries

    Institutional Repositories and Digital Preservation: Assessing Current Practices at Research Libraries. Yuan Li, Meghan Banach. D-Lib Magazine. May/June 2011.
    If the digital scholarly record is to be preserved, libraries need to establish new best practices for preservation. For their part, creators need to be more proactive about archiving their work. Institutional Repositories (IRs) may provide some help in preserving digital materials, but some question whether IRs were intended to provide long-term preservation of digital scholarship.


    The most important roles that IRs play are to collect, manage, and disseminate the digital scholarship that their communities produce. Most content in an IR is deposited by author self-archiving, by a third party on behalf of the author, or by repository staff. Regardless of how content is deposited in the IR, the quality of deposited content should be examined before digital preservation actions are considered, since the quality of content can directly affect the success of digital preservation efforts. Problems may include format obsolescence, poor-quality images, and insufficient metadata to manage and preserve the materials.

    While most report that their IRs are currently providing long-term digital preservation, a closer look shows they are really in a planning process to provide long-term preservation rather than providing it in a fully operational way. An increasing number of research libraries have started to move digital preservation programs ahead by developing preservation policies.

    Criteria for the Trustworthiness of Data Centres

    Criteria for the Trustworthiness of Data Centres. Jens Klump. D-Lib Magazine. January/February 2011.
    The rapid decay of URLs for research resources is an important reason to use persistent identifiers. The use of persistent identifiers implies that the data objects are persistent themselves. The rapid obsolescence of the technology to read the information, along with the physical decay of the media, represents a serious threat to preservation of the content. Since research projects only run for a relatively short time, it is advisable to shift the responsibility for long-term data curation from the individual researcher to a trusted data repository or archive.

    We need criteria for the assessment of trustworthiness of digital archives. Some of the methods presented have been:
    •     Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC)
    •     Catalogue of Criteria for Trusted Digital Repositories (nestor Catalogue)
    •     DCC and DPE Digital Repository Audit Method Based on Risk Assessment (DRAMBORA)
    •     DINI-Certificate Document and Publication Services
    •     Data Seal of Approval (Sesink et al., 2008)
    These provide useful feedback on developing additional criteria and auditing procedures to certify  trusted digital archives.

    Google Strikes Deal With French Publisher La Martiniere Groupe

    Google has signed a deal with French publishing house La Martiniere Groupe for the scanning of books no longer on sale but still protected by copyright. They will jointly set up a catalog of books to be scanned that are no longer sold by the publisher. La Martiniere Groupe will decide which books Google is allowed to scan and also which of the scanned books can then be sold on Google's Ebooks platform. The deal is seen as setting a precedent for how publishing companies across the continent can make money via the digitization of books still under their copyright protection but no longer sold in stores.

    Thursday, August 25, 2011

    The Conference on World Affairs Archive Online: Digitization and Metadata for a Digital Audio Pilot.

    The Conference on World Affairs Archive Online: Digitization and Metadata for a Digital Audio Pilot.  Michael Dulock, Holley Long.  D-Lib Magazine. March/April 2011.
    The University of Colorado Archives began a project to digitize a sample set of 80 tapes from their substantial collection of audio recordings. To prepare for the project, the media specialist inspected a sampling of the audio in the collection to determine media formats and the collection's condition, and listened to the selected materials to rate the sound quality on a scale of excellent, very good, good, fair, or poor.  In addition to the media's age, playback equipment for analog audio formats is becoming increasingly difficult to acquire and maintain.

    They followed the technical  recommendations in the Collaborative Digitization Program's Digital Audio Best Practices. The team chose to digitize the materials at the recommended 44.1 kHz and 24 bit, since these specifications adequately capture the spoken word.

    With existing metadata, volunteers added descriptive summaries and topical keywords.  The project used several schemas: the Public Broadcasting Metadata Dictionary Project (PBCore) for the audio, qualified Dublin Core for text documents, and Visual Resources Association Core 4.0 (VRA Core) for photographs and other images.

    Digital Preservation, Digital Curation, Digital Stewardship: What’s in (Some) Names?

    Digital Preservation, Digital Curation, Digital Stewardship: What’s in (Some) Names?  Butch   Lazorchak. The Signal. August 23, 2011.
    We often use “digital preservation,” “digital curation” and “digital stewardship” interchangeably, without thinking about the differences among them.
    Preservation is defined as keeping something in its original state.
    Curation looks at selection, maintenance, collection and archiving of digital assets in addition to their preservation.

    Curation is useful for looking at the entire life of the materials and concentrates on "building and managing collections of digital assets," and so does not fully describe a broader approach to digital materials management.

    Stewardship looks at holding resources in trust for future generations which can include both preservation and curation.

    Tuesday, August 23, 2011

    Digital Video Preservation: Further Challenges for Preserving Digital Video and Beyond

    Digital Video Preservation: Further Challenges for Preserving Digital Video and Beyond. Killian Escobedo. The Signal. August 16, 2011.
    Standards for preserving and maintaining digital video are now emerging. For archives, a sustainable video file format must be either lossless or uncompressed.  There are efforts to establish the Motion JPEG2000 video codec and MXF wrapper as the preservation target format for digital video, but the codec has not been widely adopted, which makes it difficult to support the format for preservation purposes. There are also other digital video preservation challenges, such as video files that are part of larger multimedia objects, like CD-ROMs, DVDs and websites. Other workflows and tools will have to be developed, and preserving digital video files on CD-ROMs, DVDs, and the like will require new metadata schemas and strategies.

    "Currently, the Archives is looking at screen capture software as a potential means of recording how a CD-ROM functions and links to other material before software and hardware obsolescence renders the content unplayable. A similar method can be employed for capturing the navigational structure of DVD menus and Flash-based websites".  Once the object cannot play on regular equipment, it will require recreating environments with obsolete hardware, software, and operating systems.

    Wednesday, August 17, 2011

    When Data Disappears.

    When Data Disappears. Kari Kraus. The New York Times. August 6, 2011.
    A writer said he didn't include digital media in his archive because he felt digital preservation is doomed to fail. “There are forms of media which are just inherently unstable.” It is more difficult, but it is not pointless.  "If we’re going to save even a fraction of the trillions of bits of data churned out every year, we can’t think of digital preservation in the same way we do paper preservation. We have to stop thinking about how to save data only after it’s no longer needed, as when an author donates her papers to an archive. Instead, we must look for ways to continuously maintain and improve it. In other words, we must stop preserving digital material and start curating it."

    There are major challenges with digital preservation, and part of the problem is the amount of data being created.  The world now creates over 1.8 zettabytes of digital information a year. There will never be enough capacity to save everything if we continue to replicate the practices used to maintain paper archives. In the paper archives model, preservation begins at the end of the life cycle. Data preservation must happen earlier, ideally when the item is created. The decisions about what to save and how to save it must be made early in the life cycle; the data should then be curated, not preserved.  Not all data is worth preserving, either on paper or electronically.  Video games offer an interesting model that may be useful with other types of information.  That model "allows us to see preservation as active and continuing: managing change to data rather than trying to prevent it, while viewing data as a living resource for the future rather than a relic of the past".

    The 2011 Digital Universe Study: Extracting Value from Chaos.

    The 2011 Digital Universe Study: Extracting Value from Chaos.  IDC website.  June 2011.
    In 2011 the amount of information created and replicated will surpass 1.8 zettabytes in about 500 quadrillion 'files'. About 75% of the information is created by individuals. The amount of information individuals create themselves is less than the information being created about them (your digital shadow).  As the digital universe expands and gets more complex, processing, storing, managing, securing, and disposing of the information in it become more complex as well.  The calls to action include:

    Technical:
    • Investigate new tools for creating metadata
    • Decide on the most important data projects, along with the needed data sets and tools and create an enterprise data strategy
    • Stay close to the latest strategies and practices
    • Be aggressive in developing and managing storage management tools
    Organizational:
    • Set the strategy and build a process for sharing resources
    • Begin creating the needed skill sets, mindsets, and processes needed to best use the data
    • Collaborate with partners and suppliers
    The growth of the digital universe is a challenge but also brings a way for new and exciting uses of data.


    Binary Powers of 10

    Binary Powers of 10.  Website.
    [There are lots of sites with this information, but this one is a good reference.  Some continue the list with the lumabyte, though that is not on any standard list, and of course my favorite is the brontobyte.  The names beyond the standard prefixes continue backward through the alphabet: after zetta and yotta come xona, weka, and vunda.]
    1 byte = 1 byte
    1 kilobyte = 1,024 bytes
    1 megabyte = 1,048,576 bytes
    1 gigabyte = 1,073,741,824 bytes
    1 terabyte = 1,099,511,627,776 bytes
    1 petabyte = 1,125,899,906,842,624 bytes
    1 exabyte = 1,152,921,504,606,846,976 bytes
    1 zettabyte = 1,180,591,620,717,411,303,424 bytes
    1 yottabyte = 1,208,925,819,614,629,174,706,176 bytes
    1 xonabyte = 1,237,940,039,285,380,274,899,124,224 bytes
    1 wekabyte = 1,267,650,600,228,229,401,496,703,205,376 bytes
    1 vundabyte = 1,298,074,214,633,706,907,132,624,082,305,024 bytes
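
    A short sketch that regenerates the list above; the prefixes past yotta (xona, weka, vunda) are the post's non-standard extensions, not SI names.

        prefixes = ["", "kilo", "mega", "giga", "tera", "peta",
                    "exa", "zetta", "yotta", "xona", "weka", "vunda"]
        for power, prefix in enumerate(prefixes):
            value = 1024 ** power
            print(f"1 {prefix}byte = {value:,} byte" + ("s" if value != 1 else ""))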

    About the Data Seal of Approval (DSA)

    The Data Seal of Approval ensures that research data can still be processed in the future by establishing quality guidelines. 
    There are five criteria that together determine whether or not the digital research data may be qualified as sustainably archived:
    1. The research data can be found on the Internet.
    2. The research data are accessible, while taking into account relevant legislation with regard to personal information and intellectual property of the data.
    3. The research data are available in a usable format.
    4. The research data are reliable.
    5. The research data can be referred to.
    In addition, there are three groups that must use the data responsibly:
    1. The data producer is responsible for the quality of the digital research data.
    2. The data repository is responsible for the quality of storage and availability of the data: data management.
    3. The data consumer is responsible for the quality of use of the digital research data.
    The seal shows that the data archive or repository is in compliance with the sixteen DSA guidelines:
    1. The data producer deposits the research data in a data repository with sufficient information for others to assess the scientific and scholarly quality of the research data and compliance with disciplinary and ethical norms.
    2. The data producer provides the research data together with the metadata requested by the data repository.
    3. The data repository has an explicit mission in the area of digital archiving and promulgates it.
    4. The data repository uses due diligence to ensure compliance with legal regulations and contracts.
    5. The data repository applies documented processes and procedures for managing data storage.
    6. The data repository has a plan for long-term preservation of its digital assets.
    7. Archiving takes place according to explicit workflows across the data life cycle.
    8. The data repository assumes responsibility from the data producers for access to and availability of the digital objects.
    9. The data repository enables the users to utilize the research data and refer to them.
    10. The data repository ensures the integrity of the digital objects and the metadata.
    11. The data repository ensures the authenticity of the digital objects and the metadata.
    12. The technical infrastructure explicitly supports the tasks and functions described in internationally accepted archival standards like OAIS.
    13. The data consumer must comply with access regulations set by the data repository.
    14. The data consumer conforms to and agrees with any codes of conduct that are generally accepted in higher education and research for the exchange and proper use of knowledge and information.
    15. The data consumer respects the applicable licences of the data repository regarding the use of the research data.
    16. The data producer provides the research data in formats recommended by the data repository.

    Record Industry Braces for Artists’ Battles Over Song Rights.

    Record Industry Braces for Artists’ Battles Over Song Rights.  Larry Rohter. New York Times.  August 15, 2011.
    When copyright law was revised in the mid-1970s, musicians, like creators of other works of art, were granted “termination rights,” which allow them to regain control of their work after 35 years.  The record companies believe the termination right doesn’t apply to most sound recordings. The copyright law went into effect on Jan. 1, 1978, so the earliest any recording can be reclaimed is Jan. 1, 2013.  A resolution is probably not possible without a definitive court ruling.

    Tuesday, August 16, 2011

    Will Kindles kill libraries?

    Will Kindles kill libraries? Eugenia Williamson. The Phoenix. July 27, 2011.
    [There are many sources discussing this major change for publishing and libraries.  This post really looks at the preservation aspect.]

    "Preserving materials for future generations is a big part of why libraries exist in the first place. According to the American Library Association, preservation upholds the First Amendment by contributing to the free flow of information."

    But a library can't preserve a book it doesn't own, and many digital works are now being licensed rather than purchased. One company, OverDrive, acts as a middleman negotiating between libraries and publishers.  It is unclear whether libraries have long-term rights to these materials and, if so, what those rights are and how they fit into a preservation model.

    As reported in Library Journal, the Kansas state library system began using OverDrive's services in 2006, and last year the company proposed a new contract that would raise administrative fees 700 percent by 2015.

    Kansas has announced its intent to petition for the right to terminate its contract; the state believes it owns the e-books it licensed and has the right to transfer them to a new service provider. If it cannot do this, the library will have spent $568,000 on books it can no longer access, which is more than if it had purchased print copies it would own.

    Friday, August 12, 2011

    New Statistics Model for Book Industry Shows Trade Ebook Sales Grew Over 1,000 Percent

    New Statistics Model for Book Industry Shows Trade Ebook Sales Grew Over 1,000 Percent. Library Journal. Michael Kelley. August 9, 2011.
    A new annual survey of the total U.S. book publishing industry shows growing revenue and exponential eBook sales.

    The industry sold 2.57 billion books in all formats in 2010, a 4.1 percent increase over 2008.  Publishers' net sales revenue grew to $27.94 billion in 2010, a 5.6 percent increase over 2008. Net revenue from trade books grew 5.8 percent since 2008, to $13.94 billion.

    Within the trade segment, eBooks (excluding the robust growth that has occurred in 2011) grew from 0.6 percent of the total trade market share in 2008 to 6.4 percent in 2010, a 1,274.1 percent increase in publisher net sales revenue over that period, with total net revenue for 2010 at $878 million. In the same three years, 114 million ebooks were sold, a 1,039.6 percent increase. In adult fiction, ebooks represent 13.6 percent of the net revenue market share.

    Online sales became an increasingly important distribution channel. Net sales revenue for content distributed online was $2.82 billion in 2010, a three-year overall growth of 55.2 percent. Net unit sales by publishers to online channels grew 68.6 percent, to 276 million in 2010.

    For 2010, overall bricks-and-mortar trade retail remained the largest distribution channel in the United States (40.8 percent). In contrast to the eBook numbers, total net sales revenue of trade hardcovers in 2010 was $5.26 billion, an increase of only 0.9 percent over the three years, and its share of the market declined from 39.6 percent in 2008 to 37.7 percent in 2010. Softcover revenue was up 1.2 percent to $5.27 billion, with a similar decline in market share, and mass-market paperback net sales revenue was down 13.8 percent to $1.28 billion.


    Washington State Archives - Digital Archives

    The archives is dedicated specifically to the preservation of electronic records from both state and local agencies that have permanent legal, fiscal, or historical value. It is built on a Microsoft platform, with a web interface and database storehouse custom designed for the Digital Archives.  The documents about the project on their website (in PDF or PowerPoint formats) are worth reviewing.

    Thursday, August 11, 2011

    Building a Sustainable Institutional Repository

    Building a Sustainable Institutional Repository. Chenying Li, et al. D-Lib Magazine. July/August 2011.
    Institutional Repositories are an increasingly important resource and service offered by libraries. Increasing the use of the content is a key to building a sustainable IR. Two organizational types:

    1. Structured Content Organization
    Organizing content according to its role in the University, which provides a more orderly process of content organization and more efficient metadata.

    2. Modular Content Publishing
    Creating modules as independent publishing units that work together as a complete and comprehensive publishing system. This uses themed publishing and metadata aggregation.

    It is becoming more important for libraries to provide users with the contents and services that are found in institutional repositories.  

    Free Tools for Your Preservation Toolbelt.

    Free Tools for Your Preservation Toolbelt. Randy Stern, Spencer McEwen.  Harvard.  June 2011. A presentation delivered at the Open Repositories 2011 conference.
    The Digital Repository Service models “objects” rather than files.  Examples:
    - Delivery, archival master, and production master images comprise one object
    - All images and OCR text for a book comprise one object

    Object Preservation Metadata: Digital preservation requires accurate and sufficient technical metadata to support preservation planning and activities. Descriptive metadata is also valuable for identification and management by curators.  Standards-based schemas maximize tool support and the ability to exchange data with other repositories.

    Here are some tools they use, which are open source or will be soon.

    Tool 1 - FITS (File Information Tool Set)
    Identifies, validates, and extracts technical metadata from files (see the command-line sketch after the tool list)

    Tool 2 - OTS-Schemas (Object Tool Set Schemas)
    Java library for reading and writing documents in common XML schemas

    Tool 3 - OTS (Object Tool Set)
    Java library for creating, reading, updating, and writing METS Object Descriptors

    Tool 4 - BatchBuilder
    Builds OTS METS objects (and SIP) from directory hierarchies of content files
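
    As a concrete illustration of Tool 1, here is a minimal Python sketch that runs FITS over a directory of content files and saves the technical metadata it extracts as XML.  It assumes a local FITS installation whose launcher script accepts the -i (input) and -o (output) options described in the FITS documentation; the paths are placeholders.

        import subprocess
        from pathlib import Path

        # Placeholder paths -- adjust to your local FITS install and content directory.
        FITS_LAUNCHER = "/opt/fits/fits.sh"     # fits.bat on Windows
        CONTENT_DIR = Path("content_files")
        OUTPUT_DIR = Path("fits_output")
        OUTPUT_DIR.mkdir(exist_ok=True)

        for item in sorted(CONTENT_DIR.iterdir()):
            if not item.is_file():
                continue
            out_file = OUTPUT_DIR / (item.name + ".fits.xml")
            # Run FITS on a single file, writing its technical metadata as XML.
            result = subprocess.run(
                [FITS_LAUNCHER, "-i", str(item), "-o", str(out_file)],
                capture_output=True, text=True)
            if result.returncode != 0:
                print(f"FITS failed on {item.name}: {result.stderr.strip()}")
            else:
                print(f"Wrote {out_file}")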

    Wednesday, August 10, 2011

    Five Tips for Designing Preservable Websites.

    Five Tips for Designing Preservable Websites. Robin C. Davis.  The Bigger Picture.  Smithsonian Institution Archives. August 2, 2011.
    Smithsonian Institution Archives is preserving the Institution’s history, including its large web presence. The Archives crawls each website using Heritrix, an open-source tool created by the Internet Archive, to capture content in an archival format. The purpose is to preserve the appearance, behavior, and content of digital objects. The Archives tailors crawl configurations to each specific website to capture as much of it as possible while adhering to the collections policy. Sometimes the structure of the site itself makes a perfect crawl difficult or impossible.

    Five suggestions for web developers can help ensure that their websites will be easier to crawl, access, and preserve:
    1. Follow accessibility standards
    2. Avoid proprietary formats for important content or provide alternate versions
    3. Maintain stable URLs and redirect when necessary.  Avoid linkrot, that is, links that point to resources no longer available (a simple link-check sketch appears after this list). Carefully plan and implement a URL design scheme with a policy of persistence. They have found websites with as many as 40% broken links.
    4. Design navigation carefully and include a sitemap. The crawler is usually set to capture only six levels deep. To help others discover your entire website, provide a sitemap and “view all” link for documents.
    5. Allow browsing of collections, not just searching, such as by arranging images by genre.
    Designing a web site with preservation in mind can help safeguard it for the future. This is part of our cultural legacy.
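
    The linkrot problem in tip 3 is easy to monitor mechanically.  Below is a minimal, standard-library Python sketch that fetches a page, extracts its links, and reports any that no longer resolve; the starting URL is a placeholder, and a production crawl (for example with Heritrix) would be far more thorough.

        import urllib.request
        import urllib.error
        from html.parser import HTMLParser
        from urllib.parse import urljoin

        START_URL = "https://example.org/"   # placeholder; point at your own site

        class LinkCollector(HTMLParser):
            """Collect href values from anchor tags."""
            def __init__(self):
                super().__init__()
                self.links = []

            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            self.links.append(value)

        # Fetch the page and parse out its links.
        with urllib.request.urlopen(START_URL) as response:
            parser = LinkCollector()
            parser.feed(response.read().decode("utf-8", errors="replace"))

        # Check each web link and report the ones that fail to resolve.
        for href in parser.links:
            url = urljoin(START_URL, href)
            if not url.startswith(("http://", "https://")):
                continue
            try:
                with urllib.request.urlopen(url, timeout=10) as check:
                    status = check.status
            except (urllib.error.URLError, OSError) as exc:
                status = exc
            if status != 200:
                print(f"Broken link: {url} ({status})")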



    Hawaiian Heritage sites set to launch on CyArk website.

    Hawaiian Heritage sites set to launch on CyArk website. Press release. Hawaii 24/7. August 10, 2011.   
    CyArk, with the help of its partners, conducted the field work for the digital preservation of three culturally significant Hawaiian sites; the resulting records include site animations, photography, panoramas, perspectives, and drawings. This information will showcase oft-overlooked heritage sites and highlight the need for cultural resource preservation.

    CyArk is a non-profit organization with the mission of digitally preserving cultural heritage sites through collecting, archiving and providing open access to data created by laser scanning, digital modeling, and other state-of-the-art technologies.

    Start-up to release 'stone-like' optical disc that lasts forever

    Start-up to release 'stone-like' optical disc that lasts forever.  Lucas Mearian. Computerworld. August 8, 2011.

    Millenniata has partnered with Hitachi-LG Data Storage to launch an M-Disc read-write player in early October.  Any DVD player maker will be able to produce M-Disc machines by simply upgrading their product's firmware. Millenniata said it has also proven it can produce Blu-ray format discs with its technology, a product it plans to release in future iterations. Currently the discs write at only 4x speed, but the company is working to increase that.

    Millenniata partnered with Hitachi-LG Data Storage to provide M-Ready technology in most of its DVD and Blu-ray drives. The technology is addressing the needs of the long-term data archive market.  This disc does not need special temperature or humidity controls.

    Friday, August 05, 2011

    Economics and Digital Preservation: Final Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access

    Economics and Digital Preservation: Final Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access.  Fran Berman and Brian Lavoie. Library of Congress. July 21, 2011. PDF. [Old link has disappeared. To read, use the 2015 link]
    Digital Preservation is both a technical and economic problem. There must be solutions to both for there to be success.  Even the most elegant technical solution is no solution at all if it is not economically sustainable.  Some of the challenges they list:
    • “One‐time” funding models are inadequate to address persistent long‐term access and preservation needs
    • Poor alignment between stakeholders in the digital preservation and access world and their roles, responsibilities and support models
    • Lack of institutional, enterprise, and/or community incentives to support the collaboration needed to enforce sustainable economic models
    • Complacency that current practices are “good enough” and/or the problem is not urgent.
    • Fear that digital access and preservation is too big to take on
    Stakeholders are:
    • Those who benefit from use of a preserved asset
    • Those who select what to preserve
    • Those who own or have rights to an asset
    • Those who preserve the asset
    • Those who pay
    There is no magic bullet, and there is no "free" solution.

    Recommendations:

    1. Create sustainability-friendly policies and mandates
    2. Invest in preservation infrastructure
    3. Create preservation-aware communities
      a. Create public-private partnerships to align distinct stakeholder groups
      b. Convene expert communities to address the selection and preservation needs of valuable materials for which there is no stewardship
    4. Raise awareness
      a. Provide leadership in training and education for 21st-century digital preservation
      b. Promote digital preservation skills and awareness
    5. Take individual responsibility
      a. Provide nonexclusive rights to preserve and distribute created content
      b. Partner with preservation experts throughout the data lifecycle to ensure your data will be maintained in a form that will be useful over the long term
      c. Pro-actively participate in professional organizations to create best practices and selection priorities.


    Library of Congress Digital Preservation Newsletter.

    Library of Congress Digital Preservation Newsletter. Library of Congress.  August 2011. [PDF]
    The newsletter includes information on:
    •  “Make it Work: Improvisations on the Stewardship of Digital Information,” 
    • All About Archiving the Web 
    • Possible uniform law on the authentication of online legal materials
    • Exploring Cultural Heritage Collections With Recollection
      • Recollection is a free and open source platform that lets archivists, librarians, scholars and curators create easy-to-navigate web interfaces (like maps, timelines, facets, tag clouds) to their digital, cultural heritage collections.
    • Finding digital preservation training.  The training calendar.
    • Digital Time Capsules and our "Digital Afterlife"
      • Creating and organizing personal digital content for future access.
    • The Signal: the Library of Congress blog that discusses digital stewardship in a way that is informative and appealing.
      • Tending the machines
    • What skills does a digital archivist or librarian need?  Skills students need to compete in the archives and libraries job market.  Expertise with programming, formats and standards is, of course, very important, but other talents have a greater bearing on success in today’s workplace, such as:
      • an ability to understand and adapt to new ways of using technology
      • eagerness to help refine how things are done
      • a basic understanding of how the different system parts contribute to doing the job at hand
      • ability to bridge two distinct social camps: the highly technical and the highly not-technical
      • the ability to choose among tools and software options to meet the needs of users
      • communication skills, including presentation, writing, speaking and persuading
      • the ability to use social media and to integrate photographs, graphics and video with text to get the right message out to as many people as possible

    Thursday, August 04, 2011

    GE pushes ahead with 500GB holographic disc storage

    GE pushes ahead with 500GB holographic disc storage. Lucas Mearian. Computerworld. July 28, 2011.
    GE hopes to license its technology for a 500GB holographic disc.  This was first announced in 2009. They hope to create a 1 TB disc.  InPhase Technologies is also working on a 300GB holographic optical disk.
    [These are still a long way from implementation, if they ever do succeed.]

    Digital Preservation Courses & Workshops For Organizations and Institutions

    Digital Preservation Courses & Workshops For Organizations and Institutions. Library of Congress. August 2011.  The Library of Congress, as part of its outreach and education efforts, provides this calendar to help people find training related to digital preservation. The calendar can be sorted by date, course, format, location, and cost. To find out more about an offering, click on its title. The list includes a number that are free and online, including:
    • Preserving Your Personal Digital Memories
    • Protecting Future Access Now: Models for Preserving Digitized Books and Other Content at Cultural Heritage Organizations
    • An Introduction to Digital Preservation

    Digital Preservation in a Box: NDSA Outreach

    Digital Preservation in a Box: NDSA Outreach. Butch Lazorchak. The Signal. August 3rd, 2011.
    A group has been working on what it calls “Digital Preservation in a Box,” an introduction to the concepts of preserving digital information through a suite of resources available to anyone planning an outreach event or presentation, or preparing to teach introductory digital preservation concepts. A few resources are listed now, and the collection will develop over time.

    Poll: How do you back up your data?

    Poll: How do you back up your data?   Geoff Gasior. The Tech Report. February 16, 2011.
    The more digital content we accumulate, the more we have to lose in the event of storage failure, and there are many options for protecting it. One poll showed how some people personally back up their data (a simple verified-copy sketch follows the results):
    • USB thumb drive: 4%
    • External hard drive: 35%
    • Optical disc: 4%
    • Memory card: 0.3% 
    • Tape: 1%
    • Network-attached storage device: 15%
    • Online backup service: 4%
    • Multiple methods: 19%
    • Gonna lose everything if my hard drive dies: 20%
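
    Whatever the method, the underlying pattern is the same: copy the data somewhere else and verify the copy. As an illustration only (the poll names no particular tool), here is a minimal Python sketch that copies a folder to an external drive and confirms each file's checksum; both paths are placeholders.

        import hashlib
        import shutil
        from pathlib import Path

        SOURCE = Path("my_documents")                   # placeholder: data to protect
        DESTINATION = Path("/mnt/backup/my_documents")  # placeholder: external drive

        def sha256(path):
            """Return the SHA-256 digest of a file, read in chunks."""
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            return digest.hexdigest()

        for src in SOURCE.rglob("*"):
            if not src.is_file():
                continue
            dst = DESTINATION / src.relative_to(SOURCE)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)              # copy the file, keeping timestamps
            if sha256(src) != sha256(dst):      # verify the copy bit for bit
                print(f"Checksum mismatch: {src}")
            else:
                print(f"Backed up {src} -> {dst}")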