Saturday, April 25, 2015

MoMA’s Digital Art Vault

MoMA’s Digital Art Vault. Ben Fino-Radin. The Museum of Modern Art. April 14, 2015.
The museum is working to digitize and preserve its collection of 4,000 analog videotapes of video art. It has created a digital vault that consists of three components:
  1. the packager: analyzes all digital collections materials as they arrive, records the results in an obsolescence-proof plain-text AIP stored with the materials themselves, and generates a checksum.
  2. the warehouse: a digital storage RAID system maintained by their IT department. This type of disk-based storage becomes "an untenable expense" with very large amounts of data; they project 1.2 PB. It would be irresponsibly expensive to continue using hard drive storage, which was not intended for this scale of data.
  3. the indexer, which is not discussed at present.
There is the issue of how to ensure that our successors will understand what a given stream of bits is supposed to represent, and there is also the problem of authenticity.  These archival packages contain the digital content as well as the information needed in the future to understand what the materials are and to confirm their authenticity.
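The packager's checksum-and-manifest step is easy to picture in code. The sketch below is purely illustrative (Python, with an assumed `manifest-sha256.txt` filename), not MoMA's actual tooling: it hashes each file in a package and records the results in a plain text file stored alongside the materials themselves.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 so large video files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(package_dir):
    """Write one 'checksum  relative-path' line per file to a plain-text
    manifest kept inside the package itself: any future system that can
    read text can verify the files, which is the obsolescence-proof idea."""
    package_dir = Path(package_dir)
    manifest = package_dir / "manifest-sha256.txt"
    entries = [
        f"{sha256_of(p)}  {p.relative_to(package_dir)}"
        for p in sorted(package_dir.rglob("*"))
        if p.is_file() and p.name != "manifest-sha256.txt"
    ]
    manifest.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return manifest
```

Verification later is simply the reverse: recompute each file's digest and compare it to the stored line.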

Friday, April 24, 2015

PREFORMA Starts Prototyping Phase

PREFORMA Starts Prototyping Phase. OPF Blog. 22 April 2015.
The PREFORMA prototyping phase has started with three groups that will work on:
  1. the compliance checker for the PDF/A standard for documents; 
  2. the TIFF standard for digital still images; and
  3. a set of open-source standards for moving images.
This phase will last until December 2016. It is important that libraries and archives understand what is in the digital objects they are preserving. These tools will increase knowledge about these formats.

Format Migrations at Harvard Library: An NDSR Project Update

Format Migrations at Harvard Library: An NDSR Project Update. Joey Heinen. The Signal. April 17, 2015.
Digital material is just as susceptible to obsolescence as analog formats. There are different strategies that can be implemented, and they are developing a format migration framework for migration projects at Harvard. The viability of this framework will be tested by migrating three obsolete formats within the Digital Repository Service: Kodak PhotoCD, SMIL playlists and RealAudio. A first step is to determine the stakeholders and responsible parties, since a digital preservation project cannot begin without knowing who they are.

A diagram of the Migration Workflow shows each step of the process from gathering documentation for initial analysis to ingest of the migrated content into the repository. A Migration Pathway diagram shows how content will be transformed by a migration. They hope that analyzing the technical and infrastructural challenges of each format and putting this into a template that can be adapted will help the digital preservation field.

Thursday, April 23, 2015

WebPreserver - Collect Web & Social Media as Legally Admissible Evidence

WebPreserver - Collect Web & Social Media as Legally Admissible Evidence. Website. Apr 08, 2015.
The software package, WebPreserver, uses a Chrome browser plugin and a web-based platform to make collecting authenticated snapshots of websites, blogs and social media accounts like Facebook, Twitter and Google+ easy. The screen captures, source code and metadata are authenticated with a 256-bit digital signature and time stamp that comply with the Federal Rules of Evidence and other regulatory requirements. Users can organize, tag, collaborate, search, print on demand or download the files as PDFs or WARC files.
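The two ingredients of such a seal, a 256-bit digest and a timestamp, are simple to sketch. This hypothetical Python fragment is not WebPreserver's implementation; a legally admissible version would additionally involve signing with a private key and a trusted timestamping authority:

```python
import hashlib
from datetime import datetime, timezone

def seal_capture(content):
    """Pair a web capture (bytes) with a 256-bit SHA-256 digest and a UTC
    timestamp. A real evidentiary seal would also sign these values with a
    key held by a trusted party; this sketch stops at the two ingredients."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
```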

Wednesday, April 22, 2015

Because digital preservation won’t just go away

Jisc Archivematica project update ...because digital preservation won’t just go away. Jenny Mitcham. Digital Archiving at the University of York blog. 17 April 2015.
The project is investigating research data management, recognizing that the tools used by digital archivists could have much to offer those who are charged with managing research data. The research will be based around the following questions:
  1. Why are we bothering to 'preserve' research data? What are the drivers here, and what are the risks if we don't?
  2. What are the characteristics of research data?
  3. How might research data differ from other born digital data that institutions are archiving and preserving?
  4. What types of files are researchers producing?
  5. How would we incorporate the system into a wider technical infrastructure for research data management and what workflows would we put in place?
Previously they had conducted a survey that looked specifically at software packages used by researchers. They are investigating how existing digital preservation tools would handle these types of data, and whether the formats appear in PRONOM.

University of Sheffield Selects Ex Libris Rosetta

University of Sheffield Selects Ex Libris Rosetta. Press Release. April 21, 2015.
The University has selected the Rosetta digital asset management and preservation solution to "ensure the sustainable preservation of the digitised objects created by the University Library’s Special Collections Department and the National Fairground Archive—a unique collection of videos, texts, audio files, and pictures about the culture and history of travelling fairs and entertainment—as well as the large collection of University’s born-digital material which includes scholarly research data, administrative records, and departmental publications."

“In line with the University’s strategy to establish an enduring digital archive, Rosetta will enable us to develop a sustainable digital preservation programme, underpinned by a full lifecycle infrastructure for the management and preservation of digital objects.”

Saturday, April 18, 2015

Digital Curation and Doctoral Research: Current Practice

Digital Curation and Doctoral Research: Current Practice. Daisy Abbott. International Journal of Digital Curation. 10 February 2015.[PDF]
More doctoral students are engaging in research data creation, processing, use, management, and preservation activities (digital curation) than ever before. Digital curation is an intrinsic part of the skills that students are expected to acquire.

Training in research skills and techniques is the key element in the development of a research student. The integration of digital curation into expected research skills is essential. Doctoral supervisors "should discuss and review research data management annually, addressing issues of the capture, management, integrity, confidentiality, security, selection, preservation and disposal, commercialization, costs, sharing and publication of research data and the production of descriptive metadata to aid discovery and re-use when relevant." Those supervisors may not necessarily have those skills themselves. And there is a gap in the literature about why and how to manage, curate, and preserve digital data as part of a PhD program.

While both doctoral students and supervisors can benefit from traditional resources on the topic, the majority of guidance on digital curation takes the form of online resources and training programs. In a survey,
  • over 50% of PhD holders consider long-term preservation to be extremely important. 
  • under 40% of students consider long-term preservation to be extremely important.
  • 90% of doctoral students and supervisors consider digital curation to be moderately to extremely important. 
  • Yet 74% of respondents stated that they had limited or no skills in digital curation and only 10% stated that they were “fairly skilled” or “expert”. 
And generally researchers were not aware of the digital curation support services that are available. The relatively recent emphasis on digital curation in research, and the nature of the processes involved, presents problems for supervisors. Developing the appropriate skills and knowledge to create, access, use, manage, store and preserve data should therefore be considered an important part of any researcher’s development. Efforts should be taken to
  • Ensure practical digital curation is understood
  • Encourage responsibility for digital curation activities in institutional support structures
  • Increase the discoverability and availability of digital curation support services

Tech talk in the archives: how can we redefine our processes & priorities in the digital age?

Tech talk in the archives: how can we redefine our processes and priorities in the digital age? Erin O’Meara. PASIG Presentation. March 12, 2015. [PDF]
Tech talk for archives usually revolves around workflow, data clean up and translation, infrastructure needs, and digital content pathways. Traditional archives have a physical focus, work is long term and ongoing, and priorities are established based on perceived use. For digital archives, the focus is on the digital objects, which tends to be more of a project based approach that needs dependency analysis before work is done. Prioritization is based on a grid of preservation needs, use and access, and an impact on the larger repository.

Get to know your complete digital holdings (servers as well as boxes of disks); the infrastructure, your capabilities for acquiring, processing and preserving digital holdings; staff and their training needs; and the gaps in all of this. Your processes need to change from "as-is" to "to-be".
  • Gain momentum and resources "building the ship while flying"
  • Get a business analysis and outsider perspective
  • Manage processes and know when to change
  • Continue building, and allow the archive to grow and mature
  • Allow time to review and reflect
  • Clarify and integrate roles between archive and tech staff

In determining your directions:
  • State the larger goal, then document it
  • Break it down into executable chunks
  • Engage and educate stakeholders and leadership
  • Talk to colleagues outside your workplace
  • Encourage curiosity and tinkering
  • Give your team a break and recognition

Friday, April 17, 2015

Trustworthiness of Preservation Systems

Trustworthiness of Preservation Systems. David Minor. PASIG Presentation. March 11, 2015. [PDF]
We all want to trust systems, especially preservation systems. Trust is an iterative process of verifying and clarifying. The principles of trust include:
  •  Institutional commitment to collections
  •  Infrastructure demands
  •  Technical system and staffing capabilities
  •  Sustainability (particularly funding, technology, collaboration)
  •  Identify and communicate risks to content, examining “what if” questions

There are three levels of auditing:
  •  "Basic certification” is a simple self assessment
  •  "Extended certification" represents a plausibility checked assessment
  •  "Formal certification" is an audit driven by external experts

Major auditing frameworks include:
  •  Data Seal of Approval (Basic)
  •  nestor (Extended)
  •  TRAC/ISO 16363 (Formal)
  •  DRAMBORA (Range)

The DRAMBORA risk-assessment process consists of six stages:
  1.  Identify organizational context
  2.  Document policy and regulatory framework
  3.  Identify activities, assets, and their owners
  4.  Identify risks
  5.  Assess risks
  6.  Manage risks
In the future, we need to know how these audit frameworks apply to distributed digital preservation environments, and how flexible the questions and the audit models are.

Thursday, April 16, 2015

Sony and Memnon announce partnership to enhance digital preservation capabilities

Sony and Memnon announce partnership to enhance digital preservation capabilities. Press release. April 13, 2015.
The partnership is offering their technology and experience in delivering large-scale digital preservation projects involving audio, video and film content. Existing customers include Danish Radio, the British Library, the Bibliothèque Nationale de France, Indiana University, BBC Worldwide and Sony Pictures. The need for large-scale digital preservation in organizations has accelerated due to the continuous physical deterioration of media carriers and the need for interoperability, lower storage costs and a stable long-term digital storage format. The research suggests that only 21% of broadcasters have completed digitisation of their tape libraries, and the others hold an average of more than 100,000 legacy tapes. “As a result, many content owners have assets that are literally depreciating, yet simultaneously have increased opportunities for reusing and monetising their digital content, once it is made readily accessible.”

“The time to tackle this challenge is undoubtedly now, but any successful digital preservation project is reliant on proven technological and operational expertise. We believe that large-scale digitisation is a distinct discipline that requires industrial processes and methodologies for high efficiency and consistent quality.”

Wednesday, April 15, 2015

Tracking Digital Collections at the Library of Congress, from Donor to Repository

Tracking Digital Collections at the Library of Congress, from Donor to Repository. Mike Ashenfelder. The Signal, Library of Congress. April 13, 2015.
An interesting look at the processing of content by the Library of Congress specialists.
When a collection is first received the contents are reviewed and if digital media devices are found, they are transferred to the digital collections registrar, who then records that the materials were received, including the collection name, collection number, a registration number and any additional notes. The following tasks are performed:
  1. Physical inventory of the storage devices (and photograph of the medium)
  2. Write-protecting, documenting, and transferring the files using the BagIt tool, which packages them into a bag containing:
    1. a directory containing the file or files (data)
    2. a checksummed manifest of the files in the bag
    3. a “bagit.txt” file
  3. The content is cataloged, described, and inventoried. 
  4. Transfer of the files to the Library’s digital repository for long-term preservation.
If there are difficulties accessing the content, other tools can be used, such as the Forensic Recovery of Evidence Device (FRED), the Forensic Toolkit, or BitCurator. The final step is to shelve the original digital hardware and software for preservation.
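The bag layout listed above can be sketched as a small Python helper. This is a deliberately minimal illustration; the Library of Congress's actual BagIt tooling also writes tag manifests and bag-info metadata, and performs validation:

```python
import hashlib
import shutil
from pathlib import Path

def make_bag(source_dir, bag_dir):
    """Assemble a minimal BagIt-style bag: a data/ payload directory, a
    checksummed manifest of the payload files, and a bagit.txt declaration."""
    bag_dir = Path(bag_dir)
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # creates bag_dir as needed
    lines = [
        f"{hashlib.sha256(p.read_bytes()).hexdigest()}  {p.relative_to(bag_dir)}"
        for p in sorted(payload.rglob("*")) if p.is_file()
    ]
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n"
    )
    return bag_dir
```

Because the manifest records a digest per payload file, any later transfer can be verified by rehashing `data/` and comparing.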

Researchers visiting the Library of Congress can access copies of some of the digital collections but access depends on copyright and the conditions established by the collection donor. There are also technological challenges to serving up records.  Access is currently available only onsite. Also, the Library does not have the software or drives to read every file format. Not all researchers require a perfect rendering of the original file. A lot of researchers "are just interested in the information. They don’t care what the file format is. They want the information.”  For the Library, access and appraisal of digital collections is an ongoing issue.

Tuesday, April 14, 2015

Digital curation and quality standards for memory institutions: PREFORMA research project

Digital curation and quality standards for memory institutions: PREFORMA research project. Antonella Fresa, Börje Justrell, Claudio Prandoni. Archival Science. 25 Mar 2015.  [PDF]
Memory institutions are facing increasing amounts of content for long-term preservation. The intention of the PREFORMA project (PREservation FORMAts for culture information/e-archives) is to establish a long-term sustainable ecosystem around a range of practical tools together with the stakeholders. According to the recent European Cultural Heritages study,
  • 30 % of institutions studied have a written digital preservation strategy (from 44 % for national libraries to 12–25 % for museums)
  • About a third of the institutions are included in a national preservation strategy
  • 40 % of national libraries say there is no national digital preservation strategy
  • 30 % of institutions are included in a national digital preservation infrastructure
There are barriers for digitization and curation, such as: 
  • cost of digitization
  • high costs of digital preservation, due to the use of separate solutions implemented by each memory institution
  • cultural content is complex
Digital preservation is defined by the Digital Preservation Europe project as ‘‘a set of activities required to make sure digital objects can be located, rendered, used and understood in the future’’. To be used meaningfully in the future, digital objects should be preserved in a context that makes them usable and understandable for future users. The preservation of digital information is an ongoing action, to be periodically revised in order to update data sets and metadata formats.

In the area of creation and appraisal of digital objects, there are three identifiable major work areas:

  1. Standardisation of the communication between the producer and the archive;
  2. Development of tools supporting generation and transformation of metadata;
  3. Development of tools for automated or semi-automated appraisal of data.
Important aspects for digital preservation during ingest are:
  1. File format
  2. Authenticity, integrity and provenance data
  3. Completeness of metadata accompanying digital objects
  4. Transformation of objects that may be necessary
The PREFORMA activities intend to give institutions control over the technical properties of preservation files through an open-source conformance checker, and to create an ecosystem around the implementation for specific file formats. The first activity is to develop an open-source toolset for conformance checking of digital files. The second activity is to establish a network of common interest in order to gain control over the technical properties of preservation files.
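The very first step of a conformance checker, format identification, can be sketched with file signatures. This hypothetical Python fragment only inspects magic bytes; the actual PREFORMA checkers validate files against the full PDF/A and TIFF specifications:

```python
from pathlib import Path

# Magic-byte signatures for two of the formats named above; a real
# conformance checker goes far beyond this first identification step.
SIGNATURES = {
    "pdf": (b"%PDF-",),
    "tiff": (b"II*\x00", b"MM\x00*"),  # little- and big-endian TIFF headers
}

def sniff_format(path):
    """Return a coarse format guess from a file's first bytes, or None."""
    header = Path(path).read_bytes()[:8]
    for fmt, magics in SIGNATURES.items():
        if any(header.startswith(m) for m in magics):
            return fmt
    return None
```

A file whose extension and magic bytes disagree is exactly the kind of object these tools are meant to flag before ingest.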

Preservation of and access to cultural heritage materials participates in the movement towards ‘‘unlocking the full value of scientific data’’.

Saturday, April 11, 2015

Digital preservation as a service

Digital preservation as a service. Steve Knight. National Library of New Zealand. March 30th, 2015.
Digital Preservation as a Service (DPaaS) is a joint project of National Library of New Zealand and Archives New Zealand to determine how to best approach digital preservation and leverage the government’s investment to date.

Digital preservation requires interaction with all the organisation’s processes and procedures and institutional support for appropriate resources. It is:
  • the active management of digital content over time to ensure ongoing access
  • a ‘series of managed activities necessary to ensure continued access to digital materials for as long as necessary’ despite ‘the obsolescence of everything’
Digital preservation is not:
  • backup and disaster recovery – these are short term business functions 
  • only about access or ‘open access’ 
  • ‘an afterthought’
We are trying to guard against loss, and against the time and money wasted when systems are not built with long-term needs in mind. We need a sustainable safekeeping model for digital assets – is a national-level digital preservation service the answer? A nation-wide approach will:
  • ensure the long term safekeeping of a greater range of New Zealand’s social, cultural, scientific and economic digital assets
  • leverage investment to date
  • reduce duplicate investment
  • support a strategic response to issues related to data use and re-use 
By working at a national scale, we can provide the digital preservation capability and capacity that’s not currently available.

Friday, April 10, 2015

Cloud storage for preservation

Archiving On-Premise and in the Cloud. Joseph Lampitt, Oracle. PASIG Presentation. March 2015. [PDF]
Cloud Storage is storage accessed over a network via web services APIs. For digital preservation storage, one option is the 3-2-1 Rule (3 copies, on 2 different media, 1 offsite).

Benefits of Cloud Storage
  • Limitless scalability
  • Custom metadata
  • Single namespace
  • Simplified management
Preservation Considerations with Cloud Storage include:
  • System / cloud performance
  • Security
  • Infrastructure and investment
  • Stability and longevity of the solution
  • Descriptive metadata 
  • Fixity and where that happens
  • System security and access control  
  • Audit Event Tracking (e.g. maintaining records of actions associated with an asset)
  • Version control so that originals are unchanged
There are trade-offs between on-site and cloud solutions, and the business needs should drive the choice. It is reported that 90% of an organization's data is passive. The presentation includes charts comparing the cost of cloud storage to on-site storage: “Glacier is almost 10 times as expensive as an on premise tape system with support.”
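The 3-2-1 rule and the fixity consideration above combine naturally into a periodic cross-copy check. The sketch below is an illustrative Python fragment (the function name and report shape are invented for this example); it treats the first copy as the reference, so a corrupted first copy would flag the healthy ones:

```python
import hashlib
from pathlib import Path

def fixity_report(copies):
    """Compare SHA-256 digests of the same asset stored in several
    locations (the '3 copies' of the 3-2-1 rule). The first copy is
    taken as the reference; any copy whose digest differs is flagged."""
    digests = {
        str(path): hashlib.sha256(Path(path).read_bytes()).hexdigest()
        for path in copies
    }
    reference = next(iter(digests.values()))
    mismatched = [p for p, d in digests.items() if d != reference]
    return {"ok": not mismatched, "mismatched": mismatched, "digests": digests}
```

In practice such a check would run on a schedule, with "where fixity happens" (on-premise, in the cloud, or both) being one of the considerations listed above.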

Thursday, April 09, 2015

Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring?

Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring? Randy Kiefer's presentation.  UKSGLive. April 3, 2015.
Long-term preservation refers to processes and procedures required to ensure content remains accessible well into the future. Publishers want to be good stewards of their content. Digital Preservation is an "insurance policy" for e-resources. Commercial hosting platforms and aggregators are not preservation archives! They can remove discontinued content from their system, and commercial businesses may disappear, as with Metapress. There are global digital preservation archives, such as CLOCKSS and Portico, and regional archives, such as National Libraries.
The biggest challenges in the future are formats (especially presentation of content) and what to do with databases, datasets and supplementary materials. "Any format can be preserved, including video. The issue is that of space, cost and presentation (especially if the format is now not in use/supported)." There are legal issues with cloud-based preservation systems: there is no legal precedent, and no protection with regards to security.

Monday, April 06, 2015

NYU Libraries to Team with Internet Archive to Preserve High Quality Musical Content on the Web

NYU Libraries to Team with Internet Archive to Preserve High Quality Musical Content on the Web. Christopher James. Press Release. March 27, 2015.
This collaboration is to ensure that the websites of musical composers can be collected, preserved, and made accessible in the future. The project will preserve objects with sound and visual quality at a significantly higher level than current web archiving standards. The project is funded with a grant from The Andrew W. Mellon Foundation. Since master-level recordings are rarely available on the Internet, the music is usually delivered in lower-quality compressed formats, such as MP3. A specific aim of the project is to develop protocols for obtaining master-level recordings and integrating them into the archival copies of the websites.

Friday, April 03, 2015

Dutch digital developments

Dutch digital developments.  Digital Preservation Seeds. March 29, 2015.
There are national strategy plans to streamline and intensify initiatives concerning digital heritage and to focus on collaboration between all cultural heritage organizations in the Netherlands. The collaboration would have the big organisations in specific areas offer services and assistance to their colleagues from smaller organizations, and would create shared initiatives across types of institutions, such as museums and archives, to make collected material more visible to the public.
The goals of the working groups are:
  1. Making digital heritage visible. Identify what the public expects from digital heritage and how they want to use it, and how to promote the digital collections to make them more visible.
  2. Making digital heritage usable. Look for ways to improve collections and their findability, and to work together with researchers to improve search facilities.
  3. Preserve digital heritage for the long term. The infrastructure for digital preservation needs to be developed and to use already existing experience and facilities.
Achieving these goals will hopefully lead to an integrated approach to improve the access and preservation of our digital heritage. 

Thursday, April 02, 2015

Preservation Policy for Humans

Preservation Policy for Humans. Nick Ruest, Stephen Marks. Presentation, PASIG 2015. March 2015. [PDF].
Some questions to think about when creating the digital preservation policy:
  1. Where should I start?
  2. What do I care about?
  • Identify the aspects that are important and prioritize them.
  • Develop a collection development policy by getting input from your community. Identify your primary and secondary communities and determine what is important to them.
  • Determine what you can preserve, identify the scope of your efforts, and what you have the right to preserve.
  • Identify the size and type of objects that your infrastructure can preserve now, and what it may be able to support in the future.
  • When preserving objects, maintain the integrity, authenticity and usability of the objects over time. It is important for preservation that your actions are consistent with your plan.
The plan should address:
  1. The general approach
  2. The tools available
  3. The methods of applying the tools
With digital preservation there are different levels that can be implemented.
"There is no single right solution."

Digitization Challenges – A Discussion in Progress

Digitization Challenges – A Discussion in Progress. Merrilee Proffitt. Blog. OCLC Research. March 23, 2015.
There are challenges faced by libraries digitizing collections, such as dealing with born digital materials, storage and preservation, web harvesting, and others. Their recent discussion looked at these topics:
  • Metadata: Item-level description vs collection descriptions. The challenge is digitizing archival collections at the item or page level when the descriptions are at a collection level. How can we engage scholars to help with the description if the resources are outside the library?
  • Process management / workflow / shift from projects to programs. There are challenges to establish workflows to meet different needs. Some are transitioning from projects to programs.
  • Selection – prioritizing users over curators and funders. "Many institutions are still operating under a model whereby curators or subject librarians feed the selection pool" even though surveys indicate selection should move towards directly serving the needs of the users.
  • Audio/Visual materials. Making these available is a concern because of differing levels of interest, high costs, reformatting capacities, and the need for accompanying transcriptions.
  • Access: are we putting things where scholars can find them? Are collections discoverable from Google or the institution? What are the users' experiences in using the collection?

Wednesday, April 01, 2015

Mining the Archives: Metadata Development and Implementation

Mining the Archives: Metadata Development and Implementation. Martin White. Ariadne. 13 February 2015.
This is a review of articles in the Ariadne archives on metadata. Michael Day, Metadata Officer at UKOLN, contributed a short paper to Ariadne on the implications of metadata for digital preservation. He set out five important questions which still represent challenges for the profession:
  • Who will define what preservation metadata are needed?
  • Who will decide what needs to be preserved?
  • Who will archive the preserved information?
  • Who will create the metadata?
  • Who will pay for it?
The challenges of metadata development and implementation are substantial. A paper “Application Profiles: Mixing and Matching Metadata Schemas” talks about the roles of those making the metadata standards and those using them:
Both sets of people are intent on describing resources in order to manipulate them in some way. Standard makers are concerned to agree a common approach to ensure inter-working systems and economies of scale. However implementors, although they may want to use standards in part, in addition will want to describe specific aspects of a resource in a “special” way. Although the separation between those involved in standards making and implementation may be considered a false dichotomy, as many individuals involved in the metadata world take part in both activities, it is useful to distinguish the different priorities inherent in the two activities.

Tuesday, March 31, 2015

PERICLES Environment Extraction Project

PERICLES Environment Extraction Project and tools. Website. 12/11/2014.
The PERICLES project is trying to keep digital content accessible when the digital environment continues to change. The website discusses environment information, what it is, why it is important, and how to collect it.

Digital objects are created and exist in environments, and information about those environments can be important to the current and long-term use and re-use of the content. This information, which needs to be collected at creation and throughout the object's life cycle, is very relevant for preserving the data long-term. Most metadata standards describe the object but ignore the environment. Some examples of environmental information include dependencies (what you need in order to use the object), environment reconstruction, resource status, validation, and monitoring and extraction techniques.

The PERICLES Extraction Tool (PET), as discussed in an article in D-Lib Magazine by Fabio Corubolo, has been created to extract environmental information from where objects are created and modified. It analyses the use of the data within the environment that may not be available later.

Sheer curation (as in lightweight or transparent) depends on data capture being embedded within the data creators’ working practices so that it is automatic and invisible to them.

Monday, March 30, 2015

Digital Preservation Challenges with an ETD Collection: A Case Study at Texas Tech University

Digital Preservation Challenges with an ETD Collection — A Case Study at Texas Tech University. Joy M. Perrin, Heidi M. Winkler, Le Yang. The Journal of Academic Librarianship. January, 2015.
The potential risk of loss seems distant and theoretical until it actually happens. The "potential impact of that loss increases exponentially" for a university when the loss is part of the research output. This excellent article looks at a case study of the challenges one university library encountered with its electronic theses and dissertations (ETDs).  Many institutions have been changing from publishing paper theses and dissertations to accepting electronic copies. One of the challenges that has not received as much attention is that of preserving these electronic documents for the long term.  The electronic documents require more hands-on curation.

Texas Tech University encountered difficulties with preserving their ETD collection. They hope the lessons learned from these data losses will help other organizations looking to preserve ETDs and other types of digital files and collections. Some of the losses were:
  1. Loss of metadata edits. Corrupted database and corrupted IT backups required a rebuild of the database, but the entered metadata was lost.
  2. Loss of administrative metadata (embargo periods). The ETD-db files imported into DSpace did not include the embargoed files. Plans were not documented and personnel changed before the problem was discovered. Some items were found accidentally on a personal drive years later.
  3. Loss of scanned files. The scanning server was also the location to store files after scanning. Human error beyond the backup window resulted in the deletion of over a thousand scanned ETDs, which were eventually recovered.
  4. Failure of policies: loss of embargo statuses changes. The embargo statement recorded in the ETD management system did not match what was published in DSpace.
The library started on real digital preservation for the ETD collection. Funds were set aside to increase the storage of the archive space and provide a second copy of the archived files. A digital resources unit was created to handle the digital files which finally brought the entire digital workflow, from scanning to preservation, under one supervisor. The library joined DPN in hopes that it would yield a level of preservation far beyond what the university would be able to accomplish alone. The clean-up of the problems has been difficult and will take years to accomplish. Lessons learned:
  1. Systems designed for managing or publishing documents are not preservation solutions
  2. System backups are not reliable enough to act as a preservation copy. Institutions must make digital preservation plans beyond backups
  3. Organizations with valuable digital assets should invest in storing those items somewhere beyond the display system alone. 
  4. Multiple copies of digital items must reside on different servers in order to guarantee that files will not be accidentally deleted or lost through technical difficulties. 
  5. All metadata, including administrative data, should be preserved outside of the display system. The metadata is a crucial part of the digital item.
  6. Digital items are collections of files and metadata.
  7. Maintaining written procedures and documentation for all aspects of digital collections is vital.
  8. The success of digital preservation will require collaboration between curators and the IT people who maintain the software and hardware, and consistent terminology (e.g., a shared understanding of what "archived" means).
 "Even though this case study has primarily been a description of local issues, the grander lessons gleaned from these crises are not specific to this institution. Librarians are learning and re-learning every day that digital collections cannot be managed in the same fashion as their physical counterparts. These digital collections require more active care over the course of their lifecycles and may require assistance from those outside the traditional library sphere...."
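Several of the lessons above (multiple copies on different servers, metadata preserved outside the display system, checksums to detect silent loss) can be sketched in a few lines. This is a minimal illustration with hypothetical paths and metadata fields, not Texas Tech's actual workflow:

```python
import hashlib
import json
import pathlib
import shutil

def archive_item(source_dir, archive_dir, metadata):
    """Copy an item's files to a second storage location and write a
    manifest of SHA-256 checksums plus the item's metadata alongside
    the copies, so neither depends on the display system surviving."""
    src = pathlib.Path(source_dir)
    dst = pathlib.Path(archive_dir)
    dst.mkdir(parents=True, exist_ok=True)
    manifest = {"metadata": metadata, "files": {}}
    for f in sorted(src.glob("*")):
        if f.is_file():
            # Checksum first, then copy, so the manifest records the
            # state of the file as it was at archiving time.
            manifest["files"][f.name] = hashlib.sha256(f.read_bytes()).hexdigest()
            shutil.copy2(f, dst / f.name)
    (dst / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Re-running the checksums later against the manifest is then enough to detect the kind of silent corruption described in the case study.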


Tabula

Tabula. Website. March 27, 2015.
Tabula is a tool for extracting text-based data tables from PDF files. There's no easy way to copy and paste rows of data out of a PDF; this tool lets you extract that data to Excel spreadsheets, CSV, or JSON through a simple interface. Tabula works on Mac, Windows, and Linux.

Friday, March 27, 2015

National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research

National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research. February 2015.

This document describes NIH’s plans to build upon and enhance its longstanding efforts to increase access to scholarly publications and digital data resulting from NIH-funded research. Sections relevant to digital preservation and long term management:

NIH intends to make public access to digital scientific data the standard for all NIH funded research. Following adoption of the final plan, NIH will:
  • Explore steps to require data sharing.
  • Ensure that all NIH-funded researchers prepare data management plans and that the plans are evaluated during peer review.
  • Develop additional data management policies to increase public access to designated types of biomedical research data.
  • Encourage the use of established public repositories and community-based standards.
  • Develop approaches to ensure the discoverability of data sets resulting from NIH-funded research to make them findable, accessible, and citable.
  • Promote interoperability and openness of digital scientific data generated or managed by NIH.
  • Explore the development of a data commons. NIH will explore the development of a commons, a shared space for basic and clinical research output including data, software, and narrative, that follows the FAIR principles (Findable, Accessible, Interoperable, Reusable).

Preservation is one of the Public Access Policy’s primary objectives. It wants to ensure that publications and metadata are stored in an archival solution that:
  • provides for long-term preservation and access to the content without charge; 
  • uses standards, widely available and, to the extent possible, nonproprietary archival formats for text and associated content (e.g., images, video, supporting data); 
  • provides access for persons with disabilities
The content in the NIH database is actively curated using XML records, an approach that is relatively future-proof in that XML is technology-independent and can be reliably migrated as technology evolves. 

The first principle behind the plan for increasing access to digital scientific data is: The sharing and preservation of data advances science by broadening the value of research data across disciplines and to society at large, protecting the integrity of science by facilitating the validation of results, and increasing the return on investment of scientific research.

Data Management Plans
Data management planning should be an integral part of research planning. NIH wants to ensure that all extramural researchers receiving federal grants and contracts for scientific research, as well as intramural researchers, develop data management plans describing how they will provide for long-term preservation of and access to scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified. To preserve the balance between the relative benefits of long-term preservation and access and the associated cost and administrative burden, NIH will continue to expect researchers to weigh those benefits against the costs of maintaining and sharing the data.

NIH will assess whether the appropriate balance has been achieved in data management plans between the relative benefits of long-term preservation and access and the associated cost and administrative burden. It will also develop guidance with the scientific community to decide which data should be prioritized for long-term preservation and access. NIH will also explore and fund innovative tools and services that improve search, archiving, and disseminating of data, while ensuring long-term stewardship and usability.

Assessing Long-Term Preservation Needs
NIH will provide for the preservation of scientific data and outline options for developing and sustaining repositories for scientific data in digital formats.  The policies expect long-term preservation of data.
Long-term preservation and sustainability will be included in data management plans, and NIH will collaborate with other agencies on how best to develop and sustain repositories for digital scientific data.

Siegfried v 1.0 released (a file format identification tool)

Siegfried v 1.0 released (a file format identification tool). Richard Lehane. Open Preservation Foundation. 25th Mar 2015. Siegfried is a file format identification tool that is now available. The key features are:
  • complete implementation of PRONOM (byte and container signatures)   
  • reliable results
  • fast matching without limiting the number of bytes scanned
  • detailed information about the basis for format matches
  • simple command line interface with a choice of outputs (YAML, JSON, CSV)
  • a built-in server for integrating with workflows 
  • options for debug mode, signature modification, and multiple identifiers.
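Siegfried reports the basis for each match against the PRONOM registry and can emit JSON. As a rough illustration of consuming such a report, the snippet below parses a siegfried-style JSON structure and pulls out the PRONOM identifiers. The field names here are assumptions based on the tool's documented output, so check what `sf -json <file>` actually produces before relying on them:

```python
import json

# Illustrative siegfried-style JSON report (structure assumed; the
# real output also includes file sizes, match basis, and warnings).
report = json.loads("""
{
  "files": [
    {"filename": "report.pdf",
     "matches": [{"ns": "pronom", "id": "fmt/19", "format": "Acrobat PDF 1.5"}]},
    {"filename": "scan.tif",
     "matches": [{"ns": "pronom", "id": "fmt/353", "format": "Tagged Image File Format"}]}
  ]
}
""")

def pronom_ids(report):
    """Map each filename in the report to its PRONOM identifiers."""
    return {f["filename"]: [m["id"] for m in f["matches"] if m.get("ns") == "pronom"]
            for f in report["files"]}
```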

Thursday, March 26, 2015

Letter to the editor concerning digital preservation of government information

DttP letter to the editor re digital preservation of government information. James R. Jacobs.  ALA Connect. January 26, 2015.
Digital preservation is an incredibly important topic for government information professionals. This letter, in response to a previous article, includes several important points for all libraries.
  1. Preservation of born-digital information is a very real and important topic that the government documents community needs to understand and address. In a single year, more government information is born-digital than all the printed government information accumulated by all Federal Depository libraries in over 200 years.
  2. Digitization of print information is not a preservation solution. Instead it creates new digital preservation challenges and is really just the first of many costly and technically challenging steps needed to ensure long-term access to content.
  3. Access is not preservation; it does not guarantee preservation or long-term access. 
    1. Access without preservation is temporary, at best. 
    2. Preservation without access is an illusion.
  4. Digital preservation is an essential activity of libraries. It cannot be dismissed as the responsibility of others. Digital preservation requires:
    1. resources, 
    2. a long-term commitment,
    3. an understanding of the long-term value of information (even information that is not popular or used by many people), 
    4. a commitment to the users of information.  
  5. Relying solely on the government or others to preserve its information is risky. “Who is responsible for this preservation?” Libraries should take this responsibility. Libraries can take actions now to promote the preservation of digital information
  6. Preserve paper copies. The Federal Depository Library Program (FDLP) is successfully preserving paper and microform documents. "We often hear that 'digitizing' paper documents will 'preserve' them, but we do not need to convert these documents to digital in order to preserve them." While digitization can provide better access, usability, and re-usability of many physical documents, it does not guarantee the preservation of the content. Worse, there are repeated calls for digitizing paper collections so that the paper collections can be discarded and destroyed. Such actions will endanger preservation of the content if they do not include adequate steps to ensure digital preservation of those newly created digital objects. 
  7. Smart-Archive the Web. Although capturing web pages and preserving them is far from an adequate (or even accurate) form of digital preservation, it is a useful stop-gap until producers understand that depositing preservable digital objects with trusted repositories is the only way to guarantee preservation of their information. Libraries should use web archiving tools and services such as Archive-It.
  8. Promote Digital Preservation. Libraries should be actively preserving digital government information. The time of 'passive digital preservation' or looking to others to take care of digital preservation is long past. We can work with others, not leave the work to them.

Release of jpylyzer 1.14.1

Release of jpylyzer 1.14.1. National Library of the Netherlands / Open Preservation Foundation. 25 March 2015.
A new version of jpylyzer has been released. The tool validates that a JP2 image really conforms to the format's specification, and also acts as a feature (technical characteristics) extractor for JP2 images. Changes include improved XML output and recursive scanning of directory trees.
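jpylyzer writes its validation result as XML, so downstream workflows typically just check the validity flag. A minimal sketch with the standard library parser follows; the sample report is illustrative only (real jpylyzer output is namespaced and carries a detailed properties section):

```python
import xml.etree.ElementTree as ET

# Illustrative jpylyzer-style report (element names assumed from the
# tool's documentation; real output is more detailed).
sample = """
<jpylyzer>
  <fileInfo><fileName>page001.jp2</fileName></fileInfo>
  <isValidJP2>True</isValidJP2>
</jpylyzer>
"""

def is_valid_jp2(report_xml):
    """Return True if the report's validity flag says the JP2 passed."""
    root = ET.fromstring(report_xml)
    flag = root.findtext("isValidJP2", default="False")
    return flag.strip() == "True"
```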

Sowing the seed: Incentives and Motivations for Sharing Research Data, a researcher's perspective

Sowing the seed: Incentives and Motivations for Sharing Research Data, a researcher's perspective. Knowledge Exchange. November 2014. PDF.
This study has gathered evidence, examples and opinions on incentives for research data sharing from the researchers' point of view. Using this study will help provide recommendations on developing policies and best practices for data access, preservation, and re-use. An emerging theme today is to make it possible for all researchers to share data and to change the collective attitude towards sharing.

A DCC project investigating researchers' attitudes and approaches towards data deposit, sharing, reuse, curation and preservation found that data sharing requirements should be defined at a finer-grained level, such as the research group. When researchers talk about 'data sharing' there are different modes of data sharing, such as:
  1. private management sharing, 
  2. collaborative sharing, 
  3. peer exchange, 
  4. sharing for transparent governance, 
  5. community sharing and 
  6. public sharing.
Important motivations for researchers to share research data are:
  1. When data sharing is an essential part of the research process; 
  2. Direct career benefits (greater visibility and recognition of one’s work, reciprocal data)
  3. As a normal part of their research circle or discipline;
  4. Existing funder and publisher expectations, policies, infrastructure and data services
Some points on preservation of research information for research institution and research funders:
  • Recognize and value data as part of research assessment and career advancement
  • Set preservation standards for data formats, file formats, and documentation
  • Develop clear policies on data sharing and preservation 
  • Provide training and support for researchers and students to manage and share data so it becomes part of standard research practice.
  • Make all data related to a published manuscript available
Actions of some organizations regarding data management and preservation:
  • The Royal Netherlands Academy of Arts and Sciences requests its researchers to digitally preserve research data, ideally via deposit in recognised repositories, to make them openly accessible as much as possible; and to include a data section in every research plan stating how the data produced or collected during the project will be dealt with.
  • The Alliance of German Science Organisations adopted principles for the handling of research data, supporting long-term preservation and open access to research data for the benefit of science.
  • Research organizations receiving EPSRC funding will from May 2015 be expected to have appropriate policies, processes and infrastructure in place to preserve research data, to publish metadata for their research data holdings, and to provide access to research data securely for 10 years beyond the last data request.
  • The European Commission has called for coordinated actions to drive forward open access, long-term preservation and capacity building to promote open science for all EC and national research funding.
  • The UK Economic and Social Research Council has mandated the archiving of research data from all funded research projects. This policy goes hand in hand with the funding of supporting data infrastructure and services. The UK Data Service provides the data infrastructure to curate, preserve and disseminate research data, and provides training and support to researchers.

Wednesday, March 25, 2015

I tried to use the Internet to do historical research. It was nearly impossible.

I tried to use the Internet to do historical research. It was nearly impossible. February 17, 2015. 
How do you organize so much information? So far, the Internet Archive has archived more than 430,000,000,000 web pages. It’s a rich and fantastic resource for historians of the near-past. Never before has humanity produced so much data about public and private lives – and never before have we been able to get at it in one place. In the past it was just a theoretical possibility, but now we have the computing power and a deep enough archive to try to use it.

But it’s a lot more difficult to understand than we thought. "The ways in which we attack this archive, then, are not the same as they would be for, say, the Library of Congress. There (and elsewhere), professional archivists have sorted and cataloged the material. We know roughly what the documents are talking about. We also know there are a finite number. And if the archive has chosen to keep them, they’re probably of interest to us. With the internet, we have everything. Nobody has – or can – read through it. And so what is “relevant” is completely in the eye of the beholder."

Historians must take new approaches to the data. No one can read everything, nor know what is even in the archive. Better sampling, specifically chosen for their historical importance, can give us a much better understanding. We need to ask better questions about how sites are constructed, what links exist between sites, and have more focused searches. And we need to know what questions to ask.

JHOVE Evaluation & Stabilisation Plan

JHOVE Evaluation & Stabilisation Plan. Open Preservation Foundation. March 2015.
JHOVE is an extensible software framework for performing format identification, validation, and characterization of digital objects. In February the JHOVE format validation tool was transferred to Open Preservation Foundation stewardship. Their initial review of JHOVE has been completed and the Evaluation & Stabilisation Plan is now available on the site. 

The main objective of our work to date has been to establish a firm foundation for future changes based on agile software development best practices. A further technical evaluation will be published in April that will also outline options for possible future development and maintenance tasks.

Tuesday, March 24, 2015

“An alarmingly casual indifference to accuracy and authenticity.” What we know about digital surrogates

“An alarmingly casual indifference to accuracy and authenticity.” What we know about digital surrogates. James A Jacobs. Free Government Information. March 1, 2015.
Post examines several articles concerning the reliability and accuracy of digital text extracted from printed books in five digital libraries: the Internet Archive, Project Gutenberg, the HathiTrust, Google Books, and the Digital Public Library of America.

In a study of page images in the HathiTrust, Paul Conway found that 25% of the 1,000 volumes examined contained at least one page image whose content was "unreadable." Only 64.9% of the volumes examined were considered accurate and complete enough to be considered "reliably intelligible surrogates." HathiTrust only attests to the integrity of the transferred file, not to the completeness of the original digitization effort. 

The “uncorrected, often unreadable, raw OCR text” that most mass-digitization projects produce today, will be inadequate for future, more sophisticated uses. Libraries that are concerned about their future and their role in the information ecosystem should look to the future needs of users when evaluating digitization projects. Libraries have a special obligation to preserve the historic collections in their charge in an accurate form. 

The post cites several related articles.

Monday, March 23, 2015

New policy recommendations on open access to research data

New policy recommendations on open access to research data. Angus Whyte. DCC News. 19 January, 2015.
Some of the recommendations of the RECODE case studies concerning open access to research data:
  • Develop policies for open access to research data
  • Ensure appropriate funding for open access to research data 
  • Develop policies and initiatives for researchers for open access to high quality data
  • Identify key stakeholders and relevant networks 
  • Foster a sustainable ecosystem for open access to research data
  • Plan for the long-term, sustainable curation and preservation of open access data
  • Develop comprehensive and collaborative technical and infrastructure solutions that afford open access to and long-term preservation of high-quality research data
  • Develop technical and scientific quality standards for research data
  • Address legal and ethical issues arising from open access to research data
  • Support the transition to open research data through curriculum-development and training 
Two things are needed for open access to research data:
  1. coherent open data ecosystem
  2. attention to research practice, processes and data collections

Saturday, March 21, 2015

Reaching Out and Moving Forward: Revising the Library of Congress’ Recommended Format Specifications

Reaching Out and Moving Forward: Revising the Library of Congress’ Recommended Format Specifications. Ted Westervelt, Butch Lazorchak. The Signal. Library of Congress. March 16, 2015.
The Library has created the Recommended Format Specifications, the result of years of work by experts from across the institution, because it is essential to the Library's mission. The Library is committed to making the collection available to its patrons now and for generations to come, and must be able to determine the physical and technical characteristics needed to fulfill this goal. The Specifications contain hierarchies of physical and digital characteristics in order to provide guidance and determine the level of effort involved in managing and maintaining content. To continue managing the materials, the Specifications must be regularly reviewed and updated as materials and formats change. One example is exploring the potential value of the SIARD format, developed by the Swiss Federal Archives, as a means of preserving relational databases.

A Geospatial Approach to Library Resources

A Geospatial Approach to Library Resources. Justin B. Sorensen. D-Lib Magazine. March/April 2015.
Fire insurance maps are a valuable resource. Digital versions of the original printed maps have been created and converted into georeferenced raster datasets using ArcGIS software, aligning each map to its geospatial location so that all of the historic maps overlay consistently. This allows the information to be displayed, expressed and presented in completely new ways. GIS can be one of the many tools libraries have available to assist them in sharing their resources with others.
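Georeferencing a scanned map means mapping pixel positions to real-world coordinates. One common convention is the six-parameter affine transform of the ESRI "world file"; a minimal sketch of applying it (independent of ArcGIS, shown only to illustrate the idea) might look like:

```python
def pixel_to_geo(col, row, world):
    """Apply a six-parameter affine transform in ESRI world-file order:
    world = (A, D, B, E, C, F), where A and E are the pixel sizes in x
    and y (E is usually negative, since rows grow downward), D and B
    are rotation terms, and C and F are the map coordinates of the
    upper-left pixel's centre."""
    A, D, B, E, C, F = world
    x = A * col + B * row + C
    y = D * col + E * row + F
    return x, y
```

With a north-up map (no rotation, so D = B = 0) the transform reduces to scaling each pixel index and offsetting by the upper-left anchor.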

Friday, March 20, 2015

When checksums don't match...

When checksums don't match... Digital Archiving at the University of York. 2 February 2015.
Post about an example of files that had MD5 errors. Various utilities were used to generate checksums for both MD5 and SHA-1; one program showed a change while another did not. When SHA-1 was used, it confirmed that the files had different checksums. Possibly an example of bit rot.
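Computing both digests yourself is straightforward: if two copies of a file disagree on any digest, at least one copy has changed. A minimal Python sketch using only the standard library:

```python
import hashlib

def file_digests(path, chunk_size=1 << 20):
    """Compute MD5 and SHA-1 for a file, reading in 1 MB chunks so
    large files never have to fit in memory."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Recording more than one digest per file, as the post's experience suggests, also guards against a bug or collision in any single algorithm or tool.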

Forecasting the Future of Libraries 2015

Forecasting the Future of Libraries 2015. American Libraries. February 26, 2015. 
While it’s nearly impossible to accurately predict the future, we can identify trends that can be key in understanding what the future might bring. It is important for libraries to spot trends and integrate them into their programs and services in order to remain useful and relevant. An article, “Trending Now,” lists five trends that are worth looking at:
  1. Anonymity: it may help build community and is an increasingly important part of web interactions.
  2. Collective impact: organizations are adopting common agendas to address issues in the community. Librarians could become highly valued partners in collective-impact responses.
  3. Fast casual: establishments incorporate customized services and products, and also integrate technology with customer-loyalty apps, online or mobile ordering, and mobile payments. Fast casual has advanced the growth of living-room-like flexible spaces (multiple and varied seating arrangements, easy-to-find power outlets) that accommodate social and business needs and are technologically savvy.
  4. Resilience: preparation for and rapid recovery from physical, social, and economic disasters, including natural disasters, terrorist attacks, or economic collapse.
  5. Robots: libraries have seen robots and robotics as a next wave for technology access and training, even lending robots to help users experience what might soon be a regular part of their futures. [They could also be places to learn more about technology.]
The trend library is designed to provide the library community with a centralized and regularly updated source for trends—including how they are developing; why they matter for libraries; and links to the reports, articles, and resources that can further explain their significance. As a collection, it will grow to include changes and trends across society, technology, education, the environment, politics, the economy, and demographics.  Makerspaces are playing an increasingly important role in libraries.

Another article, “The Future, Today,” addresses similar concepts:
  • Digital downloads, ebooks, personal content, and live programming sit alongside books, periodicals, microfilm, audio, and video in today’s libraries. The library of the future will support and enhance navigation and exchange of these new forms of information. Library services must be delivered in ways that are digitally based or conveniently located in public places to fit users’ busy schedules.
  • Collections are being carefully considered so as not to occupy too much square footage, leaving room for tech and social spaces and a center for multiple activities.
  • Library staff of the future will be organized on the floor to be more effective ‘information guides’ who help patrons.
  • There will be more flexible spaces for evolving services and forms of information offerings.
  • Libraries are no longer single-purpose repositories of books dedicated to quiet study; they have become dynamic hubs for their communities of users.

Tools for Discovering and Archiving the Mobile Web

Tools for Discovering and Archiving the Mobile Web. Frank McCown, Monica Yarbrough and Keith Enlow. D-Lib Magazine. March/April 2015.
Many websites have been adapted for access from smartphones and tablets, which has required web archivists to change the way they archive this ephemeral web content. A tool called MobileFinder has been built on Heritrix; it can automatically detect mobile pages when given a seed URL, and it can be used as a web service.
There are three primary techniques websites use to deliver mobile pages; a test determined how often each technique was used:
  1. Using responsive web design techniques to deliver the same content to both desktop and mobile devices: 68%
  2. Dynamically serving different HTML and CSS to mobile devices using the same URL: 24%
  3. Using different URLs to send out desktop or mobile pages: 8%
MobileFinder found in a test that 62% of randomly selected URLs had a mobile-specific root page. A web archiving tool needs to be aware of when these methods are being used so it doesn't miss finding mobile content.
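The three delivery techniques can be told apart by comparing what a site returns for a desktop request versus a mobile request. The toy classifier below works on already-fetched responses (final URL plus body for each user agent) and is only a rough heuristic for illustration, not the actual MobileFinder algorithm:

```python
def classify_mobile_technique(desktop_url, desktop_html, mobile_url, mobile_html):
    """Rough classification of how a site serves mobile content, given
    the final URL and body seen by a desktop and a mobile user agent."""
    if desktop_url != mobile_url:
        return "separate mobile URL"   # e.g. redirect to m.example.com
    if desktop_html != mobile_html:
        return "dynamic serving"       # same URL, different HTML/CSS
    return "responsive design"         # same URL, same content for both
```

A real crawler would also have to follow redirects and tolerate small dynamic differences (timestamps, ads) before concluding that two bodies genuinely differ.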

Thursday, March 19, 2015

Falling Through the Cracks: Digital Preservation and Institutional Failures

Falling Through the Cracks: Digital Preservation and Institutional Failures. Jerome McDonough. CNI. December 2014. Video.
A video that explores whether libraries, archives and museums are designed in a way that really provides long-term access to cultural heritage materials: why we are doing digital preservation, how to do it better, and how to do librarianship better. It looks at OAIS and the complexities of preserving cultural materials, and argues for training people to have broader perspectives across fields such as librarianship, archives, and curation.

Trustworthiness: Self-assessment of an Institutional Repository against ISO 16363-2012

Trustworthiness: Self-assessment of an Institutional Repository against ISO 16363-2012. Bernadette Houghton. D-Lib Magazine. March/April 2015.
Digital preservation is a relatively young field, but progress has been made in developing tools and standards to better support preservation efforts. There is increased interest in standards for the audit and certification of digital repositories because researchers want to know they can trust them. Digital preservation is a long-term issue. The Trustworthy Repositories Audit and Certification (TRAC) checklist has been widely used as the basis of such activities. It later became ISO 16363 (based on the OAIS model), which contains 105 criteria in three areas:
  1. Organizational infrastructure (governance, structure and viability, staffing, accountability, policies, financial sustainability and legal issues)
  2. Digital object management (acquisition and ingest of content, preservation planning and procedures, information management and access)
  3. Infrastructure and security risk management (technical infrastructure and security issues)
 "Undertaking a self-assessment against ISO 16363 is not a trivial task, and is likely to be beyond the ability of smaller repositories to manage." An audit is an arm's-length review of the repository, requiring evidence of compliance and testing to see that the repository is functioning as a Trusted Digital Repository. Most repositories at this time are in an ad hoc, still-evolving situation; that is appropriate for now, but a more mature approach should be taken in the future. The assessment process rates each criterion as Full Compliance, Part Compliance, or Not Compliant. The conclusions in the article include:
  • Self-assessment is time-consuming and resource-heavy, but a beneficial exercise
  • Self-assessment is needed before considering external certification. 
  • Certification is expensive.
  • Get senior management on board. Their support is essential.
  • Consider doing an assessment first against NDSA Levels of Digital Preservation  
  • Repository software may be OAIS-compliant, but that doesn't mean your repository is too
  • Not all ISO 16363 criteria have the same importance; assess each criterion accordingly
  • ISO 16363 is based on a conceptual model and may not fit your exact situation
  • Determine in advance how deep the assessment will go.
  • Document the self-assessment from the start on a wiki and record your findings  
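One simple way to make the per-criterion ratings actionable is to tally them by ISO 16363 area, so the summary shows where effort is most needed. The entries below are hypothetical examples; the section numbering (3.x, 4.x, 5.x) follows the standard's three areas:

```python
from collections import Counter

# Hypothetical self-assessment entries: (area, criterion, rating),
# with ratings "full", "part", or "non" (not compliant).
assessment = [
    ("organizational", "3.1.1 mission statement", "full"),
    ("organizational", "3.4.1 business planning", "part"),
    ("digital object management", "4.1.1 content information", "full"),
    ("digital object management", "4.2.2 AIP definitions", "non"),
    ("infrastructure and security", "5.1.1 risk management", "part"),
]

def summarize(assessment):
    """Count ratings per ISO 16363 area to highlight weak areas."""
    summary = {}
    for area, _criterion, rating in assessment:
        summary.setdefault(area, Counter())[rating] += 1
    return summary
```

Recording the assessment as structured data like this also makes it easy to publish on a wiki and to diff against a later re-assessment.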

Wednesday, March 18, 2015

Storage is a Strategic Issue: Digital Preservation in the Cloud

Storage is a Strategic Issue: Digital Preservation in the Cloud. Gillian Oliver, Steve Knight. D-Lib Magazine. March/April 2015.
Many areas are mandating a 'cloud first' policy for information technology infrastructures. The article is a case study of the decision to outsource and its consequences. Some highlights:
  1. data held in archives must be expected to be both preserved and accessible beyond the commercial lifespan of any current technology or service provider.
  2. an approach to addressing serious risks (such as loss, destruction or corruption of data) that is based purely on financial reasons is not acceptable; it does not take into account the preservation and custodial role of archives;
  3. there must be an explicit provision made for pre-defined exit strategies as well as effective monitoring and audit procedures
Two main challenges
  1. tensions between the information management and information technology perspectives. From the IT perspective the information managers were perceived as crossing boundaries into areas which were not of their concern.
  2. funding model. This change was a consequence of moving from the purchase of equipment for storage for use in house, to the provision of storage as a service.
"If most organisations lose a document, so long as they get the document back they're pretty happy. But because of digital preservation being what it is, you don't want to lose or corrupt any of the bits, they have to be exactly the way they were before." 
Cultural heritage institutions should investigate storage-as-a-service offerings, and also look ahead to utilizing other cloud-based services. When making such decisions, you must weigh the short-term consequences of cost saving (i.e. an increased burden on operating budgets) against the potential long-term benefits.

Tuesday, March 17, 2015

The news where you are: digital preservation and the digital dark ages

The news where you are: digital preservation and the digital dark ages. William Kilbride. The Informed. 25 February 2015.
Excellent post about the state of digital preservation and the existing concerns. "It’s undoubtedly true that preserving digital content through technological change is a real and sometimes daunting challenge.  Our generation has invested as never before in digital content and it is frankly horrifying when you consider what rapid changes in technology could do to that investment." We desperately need to raise awareness about the challenge of digital preservation so that solutions can be found and implemented.  Digital preservation is a concern for everyone. Many organizations depend on data but few have a mission to preserve that data. These are social and cultural challenges as well as technical challenges. We need better tools and processes. But the lack of skills is a bigger challenge than obsolescence.  The loss of data can cause major problems for countries, organizations and people.

The truth about contracts

The truth about contracts. Kevin Smith. Scholarly Communications at Duke. February 13, 2015.
This post looks at how to license student work for deposit in an institutional repository and also some basic truths about contracts and licenses.  It is a good statement of what contracts and licenses are. Contracts are business documents, intended to accomplish specific goals shared by the parties; they should clearly express the intentions of the parties involved.

Contracts can supersede copyright law "not because they are so 'big' but because they are small."   A contract is a “private law” arrangement by which two or more parties rearrange their relationship.  It need not be formal; it is simply the mechanism we use to arrange our relationships in a great many situations, including teaching situations that implicate the copyrights held by students.

A license is “a revocable permission to commit some act that would otherwise be unlawful". Not all licenses are contracts, but most are.

Main features of CERIF

Main features of CERIF. Website. March. 2015.
The Common European Research Information Format (CERIF) is a data model for describing research entities and their relationships. The data-centric model provides a metadata representation of research entities, their activities, their interconnections, and their results. It includes semantic relationships that enable quality maintenance, archiving, access and interchange of research information. It is very flexible and can be implemented in different ways. Today CERIF is used as:
  • a model for implementing a standalone Current Research Information System (CRIS);
  • a model for defining a wrapper around a legacy non-CERIF CRIS; and
  • a definition of a data exchange format for creating a common data warehouse from several systems.
"Metadata is ... normally understood to mean structured data about resources that can be used to help support a wide range of operations. These might include, for example, resource description and discovery, the management of information resources and their long-term preservation.” 
See the presentation for more information.
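The entity-and-relationship idea behind CERIF can be illustrated with a minimal sketch. The entity and link names here are illustrative only, not the actual CERIF schema, which defines many more entities and attributes:

```python
from dataclasses import dataclass

# Illustrative stand-ins for CERIF-style entities (not the real schema).
@dataclass
class Person:
    name: str

@dataclass
class Publication:
    title: str
    year: int

@dataclass
class Link:
    # CERIF records relationships with explicit semantics (a role)
    # rather than implying them from foreign keys alone.
    source: object
    target: object
    role: str  # e.g. "is author of"

author = Person("A. Researcher")
paper = Publication("Preserving Digital Video", 2015)
links = [Link(author, paper, "is author of")]

# Querying the relationship graph: everything this person authored.
authored = [l.target.title for l in links
            if l.source is author and l.role == "is author of"]
print(authored)
```

Because the relationships are first-class records with their own semantics, the same model can serve as a standalone CRIS, a wrapper around a legacy system, or an exchange format, as the list above describes.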

Saturday, March 14, 2015

Rosetta Metadata-extractor tool

Rosetta Metadata-extractor tool. Chris Erickson. March 10, 2015.
We have been looking at creating a Rosetta plugin to extract metadata from other types of image files, particularly Canon raw images. A tool on GitHub, metadata-extractor by Drew Noakes, looked like a great starting place. It is a Java library that extracts many types of metadata from many types of image files, including Canon and Nikon raw images, and works with Exif, IPTC, XMP, ICC and other metadata that may be present in a single image:

Metadata formats include:
  • Exif
  • IPTC
  • XMP
  • ICC Profiles
  • Photoshop fields
  • PNG properties
  • BMP properties
  • GIF properties
  • PCX properties
Types of files it will process:
  • JPEG
  • TIFF
  • WebP
  • PSD
  • PNG
  • BMP
  • GIF
  • ICO
  • PCX
  • Camera Raw
    Camera-specific "makernote" data is decoded for many types of digital cameras.
Our plugin can extract the relevant metadata; we are now working to integrate it with Rosetta to process our SIP files.
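The actual plugin uses the Java metadata-extractor library. As a simplified illustration of the kind of parsing such tools do, here is a stdlib-only Python sketch that reads basic properties from a PNG file (one of the formats listed above), using an in-memory test image rather than a real file:

```python
import struct
import zlib

def png_properties(data: bytes) -> dict:
    """Parse basic properties from a PNG's IHDR chunk; a tiny,
    simplified analogue of what a metadata extractor does."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # First chunk: 4-byte big-endian length, 4-byte type ("IHDR"), payload.
    length, ctype = struct.unpack(">I4s", data[8:16])
    if ctype != b"IHDR":
        raise ValueError("IHDR chunk not first")
    width, height, depth, color = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height,
            "bit_depth": depth, "color_type": color}

# Build a minimal 1x1 grayscale PNG header in memory to demonstrate.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
chunk = (struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr +
         struct.pack(">I", zlib.crc32(b"IHDR" + ihdr)))
png = b"\x89PNG\r\n\x1a\n" + chunk
print(png_properties(png))
```

A real extractor like metadata-extractor does this kind of chunk and tag parsing across dozens of container formats (Exif IFDs, IPTC records, XMP packets) behind a single API.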

Friday, March 13, 2015

Networked Information's Risky Future: The Promises and Challenges of Digital Preservation

Networked Information's Risky Future: The Promises and Challenges of Digital Preservation. Amy Kirchhoff, Sheila Morrissey, and Kate Wittenberg. Educause Review. March 2, 2015.
There has been tremendous growth in the amount of digital content, which has great benefits. But digital objects may be extremely short-lived without proper attention to preservation. "What are the odds that twenty years from now you will be able to find it and read it with whatever device and software you will be using then? What will be the cost to locate and reproduce the original files in a format that is usable in twenty years?" How do we ensure that our content is truly safe? There are a lot of questions to be answered. A few points:
  1. Near-Term Protection: Backup. Imperative for continuity of operations. Multiple copies in multiple locations will provide for near-term access.
  2. Mid-Term Protection: Byte Replication. Create multiple identical copies of files, preferably stored in other locations. Don't rely on special software for access. These byte replicas will provide content that is authentic and accessible for as long as the file formats remain usable.
  3. Long-Term Protection: Managed Digital Preservation. Establish policies and activities, including those above, to manage content over the very long term.
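The byte-replication idea rests on fixity checking: record a checksum at ingest, copy the file, and later confirm that every replica is still byte-identical to what was ingested. A minimal sketch (file names and content here are illustrative):

```python
import hashlib
import os
import shutil
import tempfile

def sha256_of(path: str) -> str:
    """Compute a file's SHA-256 checksum in streaming fashion."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

workdir = tempfile.mkdtemp()
master = os.path.join(workdir, "master.bin")
with open(master, "wb") as f:
    f.write(b"example digital object")

recorded = sha256_of(master)      # checksum recorded at ingest
replica = os.path.join(workdir, "replica.bin")
shutil.copyfile(master, replica)  # identical byte copy, ideally offsite

# A later fixity audit: both copies must still match the recorded value.
ok = sha256_of(master) == recorded == sha256_of(replica)
print(ok)
```

In practice the recorded checksums live in an inventory or preservation metadata store, and audits run on a schedule so that a corrupted copy can be repaired from a good replica.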
Four goals are key to successful managed digital preservation:
  • Usability: The objects must remain usable with current technology.
  • Discoverability: The objects must have metadata so they can be found by users over time.
  • Authenticity: The provenance of the content must be proven.
  • Accessibility: The content must be available for use by the appropriate community.
An organization that wants to successfully preserve digital content needs to have, among other things:
  • A preservation mission
  • An infrastructure to support digital preservation
  • An economic model that can support preservation activities over time
  • Clear legal rights to preserve the content
  • An understanding of the needs of stakeholders and users
  • A preservation strategy and policies consistent with best practices
  • A technological infrastructure that supports the selected preservation strategy
  • Transparency of preservation services, strategies, customers, and content
All three levels of protection are required elements of long-term preservation and are appropriate steps in protecting content. There are best practices available. Institutions starting out should inventory their content, keep good backups, and create a long-term plan. There is still much to learn. "Ultimately, it is the responsibility of those who produce and care for valuable content to understand preservation options and take action to ensure that the scholarly record remains secure for future generations."