Thursday, March 31, 2016

Floppy disks and modern gadgets: Keeping a safe distance

Floppy disks and modern gadgets: Keeping a safe distance.  Isaiah Beard. Page2Pixel. March 25th, 2016.
     In preserving older, born-digital documents and data, a common situation is that people seek help migrating data from old floppy disks but are not always careful about what they put next to the disks. People who used floppy disks and other magnetic media understood the need to keep them away from anything that generated magnetic fields. Data on a floppy disk is stored magnetically, which makes the disks very sensitive to magnetic and electromagnetic fields. Today's storage media (USB drives, memory cards, optical discs and such) are not susceptible to magnetic fields, while cell phones, tablets, and other equipment often have strong magnets in them. It is "important to remind people that, should they come across an old floppy disk, and they would like to save the data, they must be careful where they place it, and what newer devices come into contact with it.  Old floppies should be kept as far away from strong magnets as possible.  And smartphones, tablets and even modern laptops shouldn’t come within 6 inches of floppies or any other magnetic media that could be easily erased."  In addition, 3.5" floppy drives are becoming harder to find.

Tuesday, March 29, 2016

Some Assembly Required – Micro-services and Digital Preservation

Some Assembly Required – Micro-services and Digital Preservation. Danielle Spalenka. POWRR Blog. March 22, 2016.   
     Very informative article about how micro services and tools can benefit libraries of all sizes and financial abilities. Many struggle with implementing a digital preservation infrastructure.  University of California created a set of free-standing but inter-operable applications that performed a single or limited number of tasks in the larger curation and preservation process, which they described as a micro-services approach.

This approach, which did not require the installation of a single, long-lived application, can help medium-sized and smaller institutions to identify and achieve digital preservation goals. The set of twelve independent but compatible micro-services performed preservation functions such as identity, storage, fixity, replication, inventory, ingest, index, search, transformation, notification and annotation. These simple utilities would pose fewer challenges in their development, deployment, maintenance and enhancement than a large, integrated system. The strategic combination of individual services could produce “the complex global function needed for effective curation” at large institutions. The Digital POWRR (Preserving Digital Resources with Restricted Resources) Project team began a study of the problems and possible solutions for preserving digital objects.

Some understood that digital curation and preservation was an either/or issue: either an institution had implemented a digital preservation system or it had not. In reality, the preservation activities are "an ongoing, iterative set of actions, reactions, workflows, and policies." This means institutions can begin taking small steps rather than waiting to devise an ideal solution. The NDSA Levels of Digital Preservation provides a yardstick to measure progress toward a digital curation and preservation capacity. Two fundamental understandings at the heart of a micro-services approach are:
  1. digital curation and preservation is an uncertain process in which continuous, rapid technological change often renders monolithic, integrated applications cumbersome and outdated;
  2. simple tools focused on a specific aspect or aspects of the process can prove more helpful.
Micro-services tools can only be effective if users understand what roles they play in the larger digital preservation process and the path they take through the NDSA Levels. The COPTR (Community Owned Digital Preservation Tool Registry) web site provides information about many helpful tools and services that can provide incremental value.  Micro-services tools can also help those adopting more robust tools.

The use of individual tools performing discrete functions can help those starting preservation activities. The Digital POWRR Project has described the stages of a digital curation and preservation workflow and associated activities. Tools performing these functions are available at the COPTR web site.
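As an illustration of how small a single-purpose tool can be, here is a minimal sketch of one of the functions named above, fixity checking. It is not code from the University of California or POWRR tools; the function names `sha256_of` and `check_fixity`, the manifest layout (relative path mapped to a SHA-256 digest), and the demo file names are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare stored digests against the files currently under root.

    Returns one status per relative path: "ok", "changed", or "missing".
    """
    report = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            report[relative_path] = "missing"
        elif sha256_of(target) == expected:
            report[relative_path] = "ok"
        else:
            report[relative_path] = "changed"
    return report


# Demo with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_bytes(b"first file")
    (root / "b.txt").write_bytes(b"second file")
    manifest = {name: sha256_of(root / name) for name in ("a.txt", "b.txt")}
    manifest["c.txt"] = "0" * 64  # listed in the manifest but never written
    (root / "b.txt").write_bytes(b"second file, altered")  # simulate a change
    report = check_fixity(manifest, root)

print(report)
```

A script like this does one job and can be scheduled, replaced, or combined with other small tools without committing to a larger system, which is the point the micro-services approach makes.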

Exploring appraisal, quality assurance and risk assessment in the data continuum

Exploring appraisal, quality assurance and risk assessment in the data continuum.   Linda Ligios. Pericles blog. 8 March 2016.
     PERICLES presented a workshop on "appraisal, quality assurance and risk assessment in relation to the lives of complex digital objects."   It introduced the key concepts of:
  • model-driven preservation in a continually evolving environment
  • appraisal processes that lend themselves to being automated,
  • development plans for tools on appraisal, risk assessment and quality assurance 
Three main dimensions
  1. Risk – probability of an entity being non-usable
  2. Proximity – time frame in which we consider risk/impact
  3. Impact – potential loss of functionality and cost of mitigating actions
"Policies should always reflect the vision of the institution and therefore contain principles that are more aspirational in nature."  The  PERICLES model-driven preservation approach:

[Figure: Model-driven preservation]

Monday, March 28, 2016

Translating theory to practice : defining digital preservation planning in museums

Translating theory to practice : defining digital preservation planning in museums. Emma Palakika James. Thesis, San Francisco State University. January 2016. [PDF, 292 pp.]
    A very interesting thesis looking at digital preservation as an emerging activity in museums today. Some of the chapters discuss Threats to Digital Objects; What is Digital Preservation? (including the basics, OAIS, Trusted Digital Repository, and methods); case studies; and Digital Preservation Policy.  Some notes of interest:
  •  The development of technology as a tool for work, research, information capture, and artistic expression, as well as the increasing percentage of important cultural materials created only in digital form, argues that museums must begin to focus on digital preservation.
  • Four key themes are discussed, including 
    • defining digital preservation,
    • integration of digital preservation technology, 
    • collaboration, and 
    • policy development
  • currently, digital preservation remains a new, and not-broadly practiced activity in museums. 
  • The practice of digital preservation will become increasingly important to the museum field, and should be considered with the same responsibility and effort as traditional museum collection management. 
  • If museums are going to continue their role as well-equipped stewards for the cultural heritage of today and of our future, then digital preservation will need to be adopted within the broader scope of museum work.
  • digital preservation in a museum context must be viewed and implemented from a collections management perspective.
  • In some form or another, eventually all museums will adopt digital technology into their institutional assets, museum archives, and museum collections, all of which will continually be expected to be cared for and preserved just as long as any analog collections.
  • While digitization was the very beginning of increased public access to collections, digital preservation is simply the flip side of ensuring ongoing access — providing consistent entry to information that is already manifested in digital form. 
  • If access to collections is becoming a mainstreamed part of the Museum’s responsibility, the ongoing access to born-digital institutional assets is also certainly worthy of consideration. 
  • Storage has been the default long term preservation strategy used by museums for traditional collections, but it is the shortest-term solution for new media
  • American Institute for Conservation of Historic and Artistic Works: “every institution has a responsibility to safeguard the collections that are entrusted to it. That responsibility includes incorporating preservation and conservation awareness into all facets of the institution’s activities so as to ensure the long-term preservation of its collections”
  • active digital preservation tactics should be accessible, manageable, and realistic solutions.
  • Collaboration is necessary to ensure preservation and especially if the museum field wants to improve its stewardship of digital materials
  • collaboration between libraries, archives, and museums will be a critical factor for whether the greater museum field can achieve digital preservation to the level of a ‘Trusted Digital Repository,’ which arguably, is the ideal level of preservation for medium to long-term stewardship. 
  • Together, LAMs can ensure that ‘born-digital’ documents and artifacts become integrated into the cultural record through various levels of digital preservation activity that will help to keep them accessible, and to become a permanent part of the cultural memory of future generations.
  • most museum collection management policies do not address the issues of digital preservation or digital stewardship.
  • the need to draft and implement a digital preservation policy is of equal importance to that of collection management policy for a museum.
  • a full-formed policy is a way for the staff, and hopefully eventually upper management, to organize the overall mission, goals, scope, staff roles, and basic procedures. This may help better define how the staff can tackle digital preservation, making it a less intimidating process and to also document its official initiation 
  • the technology used in the three case studies falls into three categories: digital asset management systems, OAIS compliant software, and storage media.
  • a digital asset management system (DAMS) is not a digital preservation system in and of itself, because a DAMS does not usually follow the specific recommendations for metadata, fixity checks, and formats that are put forth by the Trusted Digital Repository model, OAIS and other standards
  • Digital preservation systems ultimately are a set of processes, protocols, and policies that are most often mediated with some technological aspect to aid in creating information packages suitable for long-term storage
  • Digital preservation is not just a technology problem, but it is a management issue. 
  • Policy is a tangible method for institutions to outline the management support of their preservation activities
The thesis includes five conclusions concerning the state of digital preservation in museums:
  1. preservation is possible; 
  2. standards, guidelines, and best practices are already available, but use wisely; 
  3. embrace new practices in policy;
  4. collaboration will be key for success; and 
  5. embrace change and act now.
“To create a collection, to inherit one, or to be given oversight of a collection, is also to create, inherit, or accept a great responsibility”

Saturday, March 26, 2016

Caring for file formats

Caring for file formats. Ange Albertini. Presentation at Troopers 2016. March 17, 2016. [PDF]
     The risk to preserving digital objects is very high. The "attack surface with file formats is too big". The specifications of formats are a nice guide, but they don't represent reality; they are useless for managing the formats. "We can’t deprecate formats because we can’t preserve and we can’t define how they really work."
The formats need good documentation to show the landscape and "to express the reality of file format".  Once they are better understood, then "we can preserve and deprecate older format, which reduces attack surface". Then people can focus on making the present formats more secure.

What is a file format? A computer dialect to communicate between communities; file formats are community connectors. People don't care about the format itself, they care about the characteristics and how easy it is to use. We don't need new formats, since reality will diverge from the specs anyway. The need is for up to date, traceable specs. Formats are constantly being updated with new features added. That doesn't solve the problem.  Specs should reflect reality and be "updated, enforced, realistic, freely available". Deprecation is a natural cycle, but people are afraid to deprecate because "no file format is fully preserved". Formats should be open and the specs kept up to date. But it won’t happen until "we experience a great disaster".

Friday, March 25, 2016

National Archives permits us to learn from mistakes

National Archives permits us to learn from mistakes. Peter Charleton, Supreme Court judge. The Irish Times. Feb 8, 2016.
      For the National Archives in Ireland, 1922 was a disaster. A direct hit from artillery destroyed centuries of records. The census records from 1821 through to 1891 were almost completely destroyed. Since then, the National Archives has tried to supplement its damaged holdings, but what has been lost is gone forever. With many places moving from paper to digital records, history is on the point of repeating itself. The traditional policy of printing files to preserve a digital record no longer works. Files may be on several computers in several iterations; they may have "elements in office systems, email, even text messages or a tweet." With digital, there is a lot of data, which brings the challenge of deciding what to preserve.  But not every record needs preservation.

To ensure the preservation of long-term records, the records should be transferred to the National Archives. Permanent records need to be identified early and treated appropriately.  Creation of a digital archive will greatly reduce the volume of records that government departments store. "Millions are spent by departments on off-site storage and back-ups of network drives. By investing in a digital archive, departments will be able to transfer emails, business files, digital images and other electronic records to the National Archives. An efficient approach to records management based on legal obligation can target policy effectively."  Money should be directed to the National Archives for developing an efficient system so there are sufficient resources to capture, manage and preserve our digital heritage. This institution, the precious repository of this nation, deserves to be supported in ensuring Ireland continues to have a history.

Thursday, March 24, 2016

The FAIR Guiding Principles for scientific data management and stewardship

The FAIR Guiding Principles for scientific data management and stewardship. Mark D. Wilkinson, et al. Nature. 15 March 2016. [PDF]
     "There is an urgent need to improve the infrastructure supporting the reuse of scholarly data." Good data management is not a goal in itself, but a conduit leading to knowledge discovery, innovation and the reuse of the data. The current digital ecosystem prevents this, which is why the funding and publishing community is beginning to require data management and stewardship plans. "Beyond proper collection, annotation, and archival, data stewardship includes the notion of ‘long-term care’ of valuable digital assets" so they can be discovered and re-used for new investigations.

This article describes four foundational principles (FAIR) to guide data producers and publishers:
  • Findability,
    • (meta)data are assigned a globally unique and persistent identifier
    • data are described with rich metadata
    • metadata clearly include the identifier of the data it describes
    • data are registered or indexed in a searchable resource
  • Accessibility,
    • data are retrievable by their identifier using a standardized communications protocol
    • the protocol is open, free, and universally implementable
    • the protocol allows for an authentication and authorization procedure,
    • metadata are accessible, even when the data are no longer available
  • Interoperability, 
    • data use a formal, accessible, shared, and broadly applicable language for knowledge representation
    • data use vocabularies that follow FAIR principles
    • data include qualified references to other (meta)data
  • Reusability
    • (meta)data are richly described
    • (meta)data have a clear data usage license
    • (meta)data have a detailed provenance
    • (meta)data meet community standards
These FAIR principles guide data publishers and stewards in evaluating their implementation choices. They are a prerequisite for proper data management and data stewardship. Achieving these goals requires working together with shared goals and principles.
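A few of the sub-points above can be turned into a rough completeness check on a metadata record. The sketch below is only an illustration: the field names in `CHECKS`, the function `missing_fair_fields`, and the example record (including its placeholder identifier) are hypothetical, and a non-empty field says nothing about whether the metadata are actually rich, accurate, or community-compliant.

```python
# Hypothetical field names, each loosely standing in for one sub-point above.
CHECKS = {
    "data_identifier": "Findability: metadata include the identifier of the data",
    "description": "Findability: data are described with rich metadata",
    "access_protocol": "Accessibility: retrievable using a standardized protocol",
    "license": "Reusability: clear data usage license",
    "provenance": "Reusability: detailed provenance",
}


def missing_fair_fields(record: dict) -> list[str]:
    """Return the checked field names that are absent or blank in the record."""
    missing = []
    for field in CHECKS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Example record with placeholder values (not a real dataset).
example_record = {
    "data_identifier": "doi:10.0000/placeholder",
    "description": "Hourly temperature readings, 2010-2015",
    "access_protocol": "https",
    "license": "",
}

for field in missing_fair_fields(example_record):
    print(f"{field}: {CHECKS[field]}")
```

For the example record this reports the blank `license` and the absent `provenance`, the two Reusability items a steward would need to follow up on.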

Wednesday, March 23, 2016

New Report on Web Archiving Available

New Report on Web Archiving Available. Andrea Goethals. IIPC. 21 March 2016.
     Harvard Library recently released a report to:
  • explore and document current web archiving programs
  • identify common practices, needs, and expectations in the collection of web archives
  • identify the provision and maintenance of web archiving infrastructure and services;
  • identify the use of web archives by researchers.
The environmental scan showed 22 opportunities for future research and development, including:
  • Dedicate full-time staff to work in web archiving to keep up on latest developments, best practices and be part of the web archiving community.
  • Conduct outreach, training and professional development for existing staff who are being asked to collect web archives.
  • Institutional web archiving programs should be transparent about holdings, terms of use, preservation commitment, and curatorial decisions made for each capture.
  • Develop a collection development tool to show holdings information to researchers and other collecting institutions.
  • Train researchers to be able to analyze big data found in web archives.
  • Establish a standard for describing the curatorial decisions behind collecting web archives.
  • Establish a feedback loop between researchers and the librarians/archivists.
There is also a need to "radically increase communication and collaboration" among all involved in web archiving. Much more communication and collaboration is needed between those collecting web content and researchers who would like to use it.

Tuesday, March 22, 2016

Do your Digital Records have an Expiration Date?

Do your Digital Records have an Expiration Date? Jon Tilbury. Information Management. March 22, 2016.
     A general article about the importance of digital preservation.  Some quotes of interest:
  • As more “born digital” content is produced every day, requirements get more complex, and the need for organization-wide digital preservation strategies becomes greater.
  • the consensus is that the “tipping-point” for accessing digital data is not 100 years after it is stored, but more realistically, around 10 years.
  • Ten years is a more realistic time-frame to consider when planning protection for critical and unique digital information assets. 
  • Building on reliable storage, digital preservation adds tools to accurately identify which formats are being used, pinpoint those at risk, and reliably recycle these into newer formats that can be read.
  • For many organizations, a proactive approach to safeguarding critical long-term digital records using digital preservation technology is fast becoming a critical part of the overall information governance lifecycle.

Monday, March 21, 2016

From digital dark age to digital enlightenment

From digital dark age to digital enlightenment.  Caroline Pegden. National Archives. 17 February 2016.
     Recent media reports have talked about the ‘digital Dark Age’.  This is a major challenge, now and for the years to come, for institutions in the archives sector that are concerned with managing, preserving and providing access to born-digital records. This is important for the UK National Archives because some government departments will soon transfer born-digital records to The National Archives under the Public Records Act. As the National Archives has been working on how to do this, their philosophy has been ‘learning by doing’. They have reviewed what other archival institutions around the world are doing to manage digital records and have been testing the process of transfers "to design and test the new process to appraise, select, sensitivity review, transfer, preserve and give access to born-digital records." Two major challenges are:
  1. extracting meaning from unstructured digital record collections in order to make appraisal and selection decisions.
  2. sensitivity reviewing born-digital records at scale without having to read all the individual documents
Most government departments’ information is on unstructured shared drives; some departments had up to 190 terabytes of information in email servers.  Technology-assisted review, a process using reviewers and a combination of computer software and tools to electronically classify records, may have interesting applications for the archives sector. "Although there is no ‘silver bullet’ or completely automated solution, technology-assisted review offers ways to prioritise and reduce the information to be manually reviewed."  Two reports are available that highlight challenges and show how technology-assisted review could help address some of them.
  1. The digital landscape in government 2014-15: business intelligence review
  2. The application of technology-assisted review to born-digital records transfer, Inquiries and beyond: research report

How many of the EOT2008 PDF files were harvested in EOT2012

How many of the EOT2008 PDF files were harvested in EOT2012.  Mark Phillips. mark e. phillips journal. February 23, 2016.
     A post in which the author looks at some of the data from the End of Term 2012 Web Archive snapshot at the UNT Libraries. From the EOT2008 Web archive, 4,489,675 unique (by hash) PDF files were extracted and then compared to see how many of those nearly 4.5 million PDFs were still around in 2012, when the federal Web was crawled again as part of the EOT2012 project. The findings:

After the numbers finished running, the results look like the following.

               PDFs    Percentage
Found       774,375           17%
Missing   3,715,300           83%
Total     4,489,675          100%

So 83% of the PDF files that were present in 2008 are not present in the EOT2012 Archive. It is possible that the items are still available at a different URL entirely in 2012 when it was harvested again. So the URL might not be available but the content could be available at another location.
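The comparison behind the table is a set intersection on content hashes followed by a percentage calculation. Below is a minimal sketch of that arithmetic, not the author's actual code; `compare_harvests`, `percentage`, and the toy hash values are assumptions, while the two reported counts are taken from the table above.

```python
def compare_harvests(earlier: set[str], later: set[str]) -> dict[str, int]:
    """Count how many hashes from the earlier harvest reappear in the later one."""
    found = len(earlier & later)
    return {"found": found, "missing": len(earlier) - found, "total": len(earlier)}


def percentage(part: int, total: int) -> int:
    """Share of the total as a whole-number percentage."""
    return round(100 * part / total)


# Toy data: four hashes in the earlier crawl, one of which reappears later.
toy_2008 = {"hash-a", "hash-b", "hash-c", "hash-d"}
toy_2012 = {"hash-b", "hash-z"}
toy_counts = compare_harvests(toy_2008, toy_2012)
print(toy_counts)

# Recomputing the percentages from the counts reported in the post.
found_pct = percentage(774_375, 4_489_675)
missing_pct = percentage(3_715_300, 4_489_675)
print(found_pct, missing_pct)
```

Because the match is on the hash rather than the URL, a PDF that moved to a new address but kept identical bytes would count as found only if it was also captured in the later crawl.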

Saturday, March 19, 2016

Preservation Watch

Preservation Watch. Barbara Sierman. DPC wiki. 12 February 2016.
     Preservation Watch is a well-accepted concept that was first created during the European PLANETS project by Barbara Sierman and Paul Wheatley. It goes beyond the OAIS monitoring activities and helps provide a better description of the Preservation Planning Functional Entity. The Planets Functional Model identifies 3 key preservation functions:
  1. Preservation Watch, 
  2. Preservation Planning and 
  3. Preservation Action.

Preservation Watch monitors internal and external entities, including the repository content. It deals with both the OAIS Administration and Preservation Planning areas. The Preservation Watch has 4 sub-functions:
  1. Monitor: collates preservation information from a variety of internal and external entities.
  2. Risk Analysis: assessment of this information, relaying critical risks to Preservation Planning.
  3. Representation Information Update: provides updates, including recording Risks and Executed Preservation Plans.
  4. Testbed:  a controlled environment for studying the operation of tools and services which will inform the Preservation Planning activities.

Friday, March 18, 2016

'A' is for AtoM

'A' is for AtoM. Jenny Mitcham. Digital Archiving at the University of York. 18 March 2016.
     Jenny has been working on an implementation of Access to Memory (AtoM) for a couple of years and provides an interesting list of information about it, the A to Z* of implementing AtoM. "It turns out that deciding to adopt a system is relatively simple, working out exactly how you are going to use it is far more complex!" A few that I found that apply in most software situations:
  • B is for Business as Usual: Any organisation when adopting a new and complex system like AtoM needs to think beyond initial implementation and consider how the solution can be embedded into their workflows for the longer term.
  • E is for Experimenting: We discovered that data may not always import in the way you expect.
  • J is for Just Start!: Reading the documentation is essential but testing and experimenting with AtoM are really the best ways of working it out.
  • N is for Not Perfect: AtoM (like all complex systems) has its limitations.
  • Q is for Quality: In an ideal world, all our data within AtoM would be of a high quality....but we do not live in an ideal world. Accepting that legacy data will not always meet current standards or be as accurate as we would like is key to moving forward with a system such as this.
  • T is for Training: Training is not just a one off exercise.

Applying DP Standards For Assessment & Planning

Applying DP Standards For Assessment & Planning. Bertram Lyons. PASIG 2016. March, 2016.
     ISO 16363:2012. Audit & Certification of Trustworthy Digital Repositories defines recommended practices for assessing the trustworthiness of digital repositories. The document will help those who audit repositories, but also those who design or redesign their digital repository processes. Some highlights from the standard:

3.1 Governance and Organizational viability: The repository shall have a collection policy or other document that specifies the type of information it will preserve, retain, manage, and provide access to. Without the policy the collection scope is unclear and it becomes difficult to say no to out of scope content. The standard expects a policy to exist and be documented.

4.2 Ingest: Creation of AIPs: Organizations should have a description of how AIPs are constructed from SIPs. It should document all changes to the processes, as well as defining what happens to the content (such as normalization of files, etc.)

5.2 Security Risk Management: The repository should have a written disaster preparedness and recovery plan, including at least one off-site backup of all preserved information together with an off-site copy of the recovery plan. This means the organization should be prepared administratively.

The elements are scored as follows
  • 0 - non-compliant or not started
  • 1 - slightly compliant: needs a lot of work to address the requirement
  • 2 - half compliant: partially addressed but still significant work to do
  • 3 - mostly compliant: mostly addressed and working on full compliance.
  • 4 - fully compliant: can demonstrate the requirement is comprehensively addressed.
Elements needed:
  • Documentation: records of policy, procedure, and outcomes of activities
  • Policy: the definition of approaches and protocol for repository functions and procedures
  • Procedures: specification of preservation and infrastructure management activities
  • Software: development or configuration of preservation systems
  • Infrastructure: procurement, monitoring, and management of hardware infrastructure
  • Organization: organizational infrastructure including funding, staffing, and strategy
  • Action Plan
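The 0 to 4 scale lends itself to a simple self-assessment worksheet. The sketch below is an assumption-laden illustration, not part of ISO 16363 or the presentation: the scores in `example_scores` are made up, and the idea of flagging anything below 3 as needing attention is a threshold chosen for the example.

```python
# Labels for the 0-4 scale listed above.
SCALE = {
    0: "non-compliant or not started",
    1: "slightly compliant",
    2: "half compliant",
    3: "mostly compliant",
    4: "fully compliant",
}

# Hypothetical self-assessment scores, keyed by the section numbers quoted above.
example_scores = {"3.1": 3, "4.2": 1, "5.2": 2}


def needs_attention(scores: dict[str, int], threshold: int = 3) -> dict[str, str]:
    """Return requirements scoring below the (assumed) threshold, with labels."""
    for section, score in scores.items():
        if score not in SCALE:
            raise ValueError(f"score for {section} must be 0-4, got {score}")
    return {
        section: SCALE[score]
        for section, score in sorted(scores.items())
        if score < threshold
    }


print(needs_attention(example_scores))
```

A list like this could feed the Action Plan element, with each flagged requirement mapped to the documentation, policy, or procedure work it still needs.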

Thursday, March 17, 2016

Guidelines for the selection of digital heritage for long-term preservation

Guidelines for the selection of digital heritage for long-term preservation. UNESCO/PERSIST Content Task Force. March 2016.
     Libraries, archives, and museums traditionally have the responsibility of preserving the intellectual and cultural resources produced by society, but this is in jeopardy because of the amount of information created every day in digital form. Digital content is doubling in size every two years. The digital content is also in danger because much of it is ephemeral; it lacks the longevity of physical objects. The challenge of keeping digital content "requires a rethinking of how heritage institutions identify significance and assess value". Institutions must proactively identify and preserve digital heritage and information before it is lost. The roles of libraries, archives, and museums are blurring in the digital age, but they still have major interests in preserving heritage.

Libraries face the challenge of selecting digital content for long-term preservation. Many focus on short-term use of content already in their collection, rather than assessing new publications for acquisition. Archives have traditionally "relied on the passage of time between their creation and their acquisition by an archive to lend historical perspective in making selection decisions". However, the time frame for selecting content is shorter now, since the rapid obsolescence of digital formats, storage media, and hardware and software systems narrows the window of opportunity for selection.  Some strategies for selecting digital content:

Acting locally 1: Strategies for collecting digital heritage.
  • Comprehensive collecting to acquire all of the material produced on a given subject area, time period, or geographic region.
  • Representative sampling to capture a representative picture makes selection and preservation more manageable and less resource-intensive.
  • Selecting material for addition to their collections based on specific criteria, such as
    • Subject/Topic.
    • Creator/Provenance.
    • Type/Format.
  • Institutions could defer selection by capturing all the digital heritage material now and apply selection criteria later.

Acting locally 2: Developing selection criteria for a single institution
How should institutions select, identify, and prioritize digital heritage before it is lost? Evaluating and assessing digital content should be based on the principles that underlie traditional selection, but include long term perspective for use and access as defined by its mandate and users.
Decision Tree for Selection in an individual Institution
  1. Identification. Identify the material to be acquired or evaluated. 
  2. Legal framework. Does the institution have a legal obligation to preserve the material?
  3. Application of three selection criteria to determine if content should be preserved: significance, sustainability, and availability
  4. Decision. Make a decision based on the three criteria and then document the rationale and justification for the evaluation or decision.
"The long-term preservation of digital heritage is perhaps the most daunting challenge facing heritage institutions today. Developing and implementing selection criteria and collecting policies is the first step to ensuring that vital heritage material is preserved for the benefit of current and future generations."

Appendix 1: Management of long-term digital preservation and metadata. If the digital heritage is the “content”, then the metadata provides the “context”.

"Selection of digital heritage is closely connected with issues related to long-term preservation and access. Some losses of important digital heritage may be unavoidable, but the risk can be mitigated by following best practices in digital preservation, including redundancy, active management, and metadata management."

Three key types of metadata crucial to long-term preservation:
  • Structural (required for the technical capacity to read digital content)
  • Descriptive (containing bibliographic, archival, or museum contextual information, which can be system-generated or created by heritage professionals, content creators, and/or users)
  • Administrative (documenting the management of a digital object while in its collection).

Five basic functional requirements for digital metadata:
  1. Identification: The metadata must identify each digital object uniquely and unambiguously.
  2. Location: The metadata must allow each digital object to be located and retrieved.
  3. Description: The metadata must describe the digital object, including data about its content and context.
  4. Readability: The metadata must record the structure, format, and encoding of digital objects.
  5. Rights management: Rights and conditions of use and restrictions must be recorded.
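
The five requirements can be read as a minimal record structure. The sketch below is only an illustration: the class name, field names, and helper functions are assumptions made for this example and do not come from the guidelines or from any metadata standard.

```python
from dataclasses import dataclass


@dataclass
class ObjectMetadata:
    """One record covering the five requirements (field names are illustrative)."""
    identifier: str   # 1. Identification: unique, unambiguous ID
    location: str     # 2. Location: where the object can be retrieved
    description: str  # 3. Description: content and context
    file_format: str  # 4. Readability: format of the object
    encoding: str     # 4. Readability: encoding of the object
    rights: str       # 5. Rights management: conditions of use

    def missing_requirements(self) -> list[str]:
        """Return the names of any fields left empty."""
        return [name for name, value in vars(self).items() if not value.strip()]


def find_duplicate_ids(records: list[ObjectMetadata]) -> set[str]:
    """Identifiers must be unique; report any that appear more than once."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for record in records:
        if record.identifier in seen:
            dupes.add(record.identifier)
        else:
            seen.add(record.identifier)
    return dupes
```

A check like `missing_requirements()` could run at ingest, so that incomplete records are flagged before an object enters the collection.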

Wednesday, March 16, 2016

File identification ...let's talk about the workflows

File identification ...let's talk about the workflows. Jenny Mitcham. Digital Archiving at the University of York. 27 November 2015.
     When adding files to a digital archive, an important question is "What file formats have we got here?" Knowing this can help to:
  • determine the right software to open the file and view the contents 
  • start the conversation with the data provider about what formats are best to use for archiving
  • discuss the risks of the format and define a migration pathway for preservation and/or access
There are many tools for working with formats; each tool has strengths and weaknesses. Defining a workflow can help determine how best to use these tools, how to interact with them, or if manual steps should be taken instead. File identification tools are often incorporated into digital preservation systems that may determine the workflow in using the tools. Additional workflow questions around format tools include:
  • what should happen if ingested data can't be identified?  
  • should the curator/digital archivist be able to over-ride file identifications?
  • what should happen if there is more than one possible identification for a file?
  • is there a sustainable manual identification process if tools cannot identify a file? 
  • how to contribute to file format registries such as PRONOM
  • is the digital preservation system configurable enough to resolve these questions? 
Their Archivematica development work is focusing in the first instance on allowing the digital curator to see a report of the files that are not identified in order to understand the problem.
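
The workflow questions above can be sketched as a small routing step. This is a toy illustration under stated assumptions, not how Archivematica or any identification tool works: the four-entry signature table, the function names, and the status labels are all hypothetical, and real tools rely on much richer signature sets such as those in PRONOM.

```python
from typing import Optional

# Hypothetical, deliberately tiny signature table (magic bytes, format name).
# The ZIP signature appears twice to show an ambiguous identification.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"PK\x03\x04", "ZIP-based package (e.g. ODF, OOXML)"),
]


def candidates(header: bytes) -> list[str]:
    """Return every format whose signature matches the start of the file."""
    return [name for magic, name in SIGNATURES if header.startswith(magic)]


def route(header: bytes, override: Optional[str] = None) -> tuple[str, list[str]]:
    """Decide what the workflow does with a file, given its first bytes."""
    if override:
        # The curator's decision takes precedence over the tool's result.
        return ("overridden", [override])
    found = candidates(header)
    if not found:
        return ("unidentified", [])   # goes to a manual identification queue
    if len(found) > 1:
        return ("ambiguous", found)   # more than one possible identification
    return ("identified", found)


def unidentified_report(files: dict[str, bytes]) -> list[str]:
    """List the files that no signature matched, for the curator to review."""
    return sorted(name for name, header in files.items()
                  if route(header)[0] == "unidentified")
```

Keeping the override and the "unidentified" report as explicit outcomes makes the manual steps part of the workflow instead of an afterthought.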

[Our Rosetta system has a format library that handles these questions, as well as a user driven Format Working Group that helps resolve questions and interacts with PRONOM if there are questions, changes or new additions. - Chris]

Tuesday, March 15, 2016

The Digital Preservation Network (DPN) Has Launched and Is Accepting Content

The Digital Preservation Network (DPN) Has Launched and Is Accepting Content.  Mary Molinaro. D-Lib Magazine. March/April 2016.
    Several years ago a group of academic leaders examined the risk to future scholars if the digital output from academia is not properly preserved. They felt that the risk of loss was very high if nothing was done to protect against natural disasters, technological failure, or institutional failure, and they pledged to create a large-scale digital preservation service built to last beyond the life spans of individuals, technological systems, and organizations. After three years of work, the resulting Digital Preservation Network is open and is accepting content from members. Five preservation repositories make up the DPN network. They have varying technical architectures, replicate content, and perform services to safeguard it. Content from member institutions can be added to DPN through two sites: DuraCloud Vault and the Academic Preservation Trust. The deposited content is replicated to the other nodes (HathiTrust, the Texas Preservation Node, and the Stanford Digital Repository).

DPN operates as an independent organization under the umbrella of Internet2 and is currently examining ways to open up DPN to other kinds of members.  More information is available at the DPN website.

Monday, March 14, 2016

Archives and SharePoint

Archives and SharePoint. Heather Emily Roberts. HerArchivist. March 8, 2016.
     Post that looks at "Is SharePoint (or other flexible cloud-based ERMS software) suitable for digital repositories of archives?"  Some pros and cons of using SharePoint as a digital repository:
  • Can lock documents against editing
  • Tells you when documents were last accessed and by whom
  • Will not serve long-term needs of accessibility or use of records
  • Will not support migration requirements of archival records
  • Will not guarantee integrity of archival records during software updates
  • Does not conform to OAIS model
  • Archive preservation practices are not standard
The main recommendation if using SharePoint as an Electronic Record Management System is to export archival documents to an OAIS compliant system (some are listed).
[We use our harvest tool to import permanent SharePoint records into our Rosetta system - Chris.]

Digital Preservation - Knowing where to start

Digital Preservation - Knowing where to start. Nik Stanbridge. Cloud Computing Intelligence. 23 February 2016.
     Memory institutions face increasing demands on their collections, such as the need to manage costs better, to provide access, or to deal with the degradation of objects, all of which call for digital preservation. Some organizations already have a strategy and are digitally preserving their assets. Many, though, are only just starting to think about digital preservation and need to know where to start and how to implement it.

Digital preservation is the process of managing and storing digital files and associated metadata in a way that they will be accessible and usable in the future. The processes apply both to objects that were originally created in digital form and to those that have been digitized. If you need to maintain digital objects, the first step is to define a strategy: understand what needs to be preserved and how. This includes information about the digital object. "It’s important to remember that digital preservation is as much about preserving the meaning and context of the asset as it is about preserving the asset itself."
  • File format preservation is the process of maximising the accessibility of the file through its repeated migration to any number of more stable or current file formats. 
  • Data archiving is the process of storing all of the resulting digital assets for the long term, using active archiving principles and processes.  
  • A preservation strategy will also need to cover the people and processes you are going to use, the quality of the digital assets to preserve and the IT infrastructure and associated support.
Most strategies will need a phased approach that addresses the digital preservation factors that are important to the institution. Establishing a digital preservation strategy is the very first step. Consider carefully also how the digital objects will be stored long term.

Saturday, March 12, 2016

Demystifying Digital Preservation for the Audiovisual Archiving Community

Demystifying Digital Preservation for the Audiovisual Archiving Community. Kathryn Gronsbell, Abbey Potter. The Signal. February 22, 2016.
     "The intersection of digital preservation and audiovisual archiving has reached a tipping point." Over the past decade, a series of transformations in media production and use, as well as in preservation strategies, has fundamentally altered the dominant theories and practices of moving image preservation and access. These include improvements in digital capture technology, the adoption of file-based production workflows, digital distribution technology, and storage solutions. The acceptance of digital preservation has been slower in the moving image archiving and preservation community than in other fields. Rarely are the challenges of preserving audiovisual materials discussed.  Recent proposals for audiovisual preservation include:
  • Transition to a stream-based preservation model
  • Digital preservation in practice (strategies)
  • Discussions on how to preserve (innovation and practical engagement)
  • The necessity of multi-disciplinary input for preservation
  • Transitioning from a short-term digital preservation project to a long-term program (sustainability)
Preservation in the moving image community may be slower because resources must be devoted to managing the physical collections, as well as the cost and complexity of preserving analog film/video content. "The shifting focus towards digital preservation is an opportunity to dissolve the manufactured boundary between A/V and still image (or other) content and include audiovisual specialists in broader discussions of preservation and access." There is a pressing need and collective desire to address some of the questions that digital preservation raises.  "Increasing the engagement of the analog film and video world with the digital preservation community, and vice versa, will yield tremendous benefits on both sides of the divide." Digital preservation should be understood as a core competency in the A/V archiving field and become part of a wider conversation about digital preservation across disciplines.

The AMIA organization hopes to bring together those who have limited resources or haven’t started thinking strategically about digital preservation, and to offer a place where the A/V community can learn without feeling lost in a wave of information. Hopefully this will increase the visibility of the intersection between audiovisual preservation and digital preservation and continue the conversation between these two fields.

Friday, March 11, 2016

How Digital Storage Is Changing the Way We Preserve History

How Digital Storage Is Changing the Way We Preserve History. Arielle Pardes. Vice. February 19, 2016
     Article starts with an account of a digital diary platform called Oh Life; after the site had been shut down, thousands of archives were deleted and years of personal history were gone.  Digital disappearance like this is a warning sign to historians of problems to come with recording and preserving our history in the digital age. Digital storage is fragile and the files can easily be lost or locked up in encryption. Digital technology might not be around tomorrow, and many of the information storage platforms are owned by private companies, which makes it harder for archival institutions to save them.  Abby Smith Rumsey tried to troubleshoot how to store digital materials in the long-term and discusses concerns and possible future solutions for our digital age.
  • In the digital age, there's a lot circulating in the way of information, but none of it is kept very thoroughly. 
  • Technically, we don't know how to preserve it yet. Even more than that, what do we preserve? How do we know what's valuable?
Entire digital archives can vanish if the storage platform, technology, or software disappears. Many of the websites we use are owned by private companies and individuals do not own the content. We won't know for a while if the content we have saved is valuable in the future. "The more the mind can be freed of certain types of memory tasks, the freer the mind is to engage in other activities that machines cannot do for us."

Thursday, March 10, 2016

Teaching Files: Incorporating Born-Digital Materials into Instruction

Teaching Files: Incorporating Born-Digital Materials into Instruction. Dorothy Waugh. CurateGear 2016. January 14, 2016.
  • Establish a sustainable model for the integration of born-digital materials into instruction programs
  • Help faculty and students use born-digital collections and promote greater understanding
  • Explore requirements and policies needed to use born-digital materials in the classroom and library
  • Train Library staff to talk confidently  about born-digital materials and provide support for research methods using born-digital materials
Voyant Tools: Voyant Tools is a web-based text reading and analysis environment to work with texts in a variety of formats, including plain text, HTML, XML, PDF, RTF, and MS Word.

Wowza as a Streaming Media Service at the National Library of New Zealand

Wowza as a Streaming Media Service at the National Library of New Zealand. Mathachan Kulathinal. Rosetta Tech Blog. March 2, 2016.
     The National Library of New Zealand uses the Wowza Streaming Engine Server to stream derivative copies of video and audio content stored within their Rosetta preservation system. The post outlines the details of the integration with Rosetta; configuration or environment-specific details used in the implementation of this model at the Library have not been included.

According to the diagrams, the Wowza Streaming Engine service is separate from the Rosetta servers; the streaming media servers have been kept separate and independent of the Rosetta services. The servers are also used as proxy servers to re-direct http traffic as required. The post includes the implementation details.

Wednesday, March 09, 2016

The Human Face of Digital Preservation: Organizational and Staff Challenges and Initiatives at the Bibliothèque nationale de France

The Human Face of Digital Preservation: Organizational and Staff Challenges and Initiatives at the Bibliothèque nationale de France.  Emmanuelle Bermès, Louise Fauduet.  iPres, October 2009.  [Video, full paper, slides]
     Great presentation. The National Library has been working for several years on their digital preservation efforts, with the SPAR project (Système de Préservation et d'Archivage Réparti). They are looking at the human aspects of their digital projects. The library has become digital as a whole, which was a major change. Earlier the digital library was treated differently from the rest of the library. Originally the digital side was led by experts or early adopters who were learning by doing and were organized separately from the rest of the library workflows. Digital definitely meant "different". Now the library has become digital, which means they have regular production teams running operational projects for digital content and digital preservation. All organizations within the library are involved in these tasks and there are formal training processes. The library shifted, and part of this was a large increase in the scale of content digitized, as well as making digital activities closely related to traditional library skills. If you want to disseminate the digital activities throughout the library you have to disseminate the tasks and the people as well. You have to take the time to train everyone and move slowly to bring all people along, or you leave people behind.

Many of the digital activities can be related to the other workflows, like ingest and acquisitions, metadata, cataloging, etc. Relate traditional librarian skills to digital skills; digital can be built on traditional library knowledge. Help integrate the digital by having common projects that people can work on together. It is important to take time to stop and look back at what has been done and talk about it. It is difficult to take advantage of what has been done if you don't review it. The library took time to move the conceptual OAIS model into the reality of the library organization, to decide how it really fits, to define roles, and to decide how the employees would interact.

The Library created a digital training curriculum and opened it up to everyone in the library to learn. "This training includes an introduction to digital libraries, digital documents, and digital preservation, and then three optional tutorials, one on metadata and protocols (including semantic web technologies), one on user oriented design (including usage studies, accessibility and usability, and the Web 2.0), and one on digitization and preservation (including rights management, preservation metadata and persistent identifiers)." They had to draw a line between those who wanted the training for their job, or those who were just curious. They started a project about organizations and human resources under digital influence to better understand the digital library and their people. Six wishes:
  1. Clarify the institution’s policy and digital strategic vision
  2. Define priorities 
  3. Define what a digital collection is
  4. Facilitate transverse workflows that span easily across organizational borders
  5. Develop digital skills
  6. Analyze job qualifications, revisiting job requirements and updating staff skills

Tuesday, March 08, 2016

Assessing Digital Preservation at the John F. Kennedy Presidential Library

Assessing Digital Preservation at the John F. Kennedy Presidential Library. Alice Sara Prael, Abbey Potter. The Signal. March 2, 2016.
      The question is how, with all the data in the library holdings, to preserve the digital files over the long term. The goal of the project is to ‘develop a long-range digital preservation strategy’ to address all digital archival holdings at the Library. This challenging goal consists of three phases:
  1. assess current infrastructure against community standards and make brief recommendations on how to improve digital preservation practices
  2. explore potential solutions to address the recommendations made in the first phase
  3. determine a single path forward based on the solutions explored and create an action plan for how to implement that solution.
The project started by interviewing archivists and IT personnel about their processes and how they use the systems, and then researching the systems. Like many cultural heritage institutions, the Library is in need of better documentation. The biggest gap in the existing documentation for digital archives is a digital preservation policy, which is a record of decisions, including those that have been made but not documented or documented elsewhere. Important community standards and guidelines include:
  • ISO 14721: Reference Model for an Open Archival Information System (OAIS), 
  • ISO 16363: Audit and Certification of Trustworthy Digital Repositories,
  • the National Digital Stewardship Alliance Levels of Digital Preservation
Each of these items gives a slightly different perspective on what is required for digital preservation. With limited resources and staff time it’s important to recognize when to aim for “good enough” digital preservation, which can be defined for each institution by "the available resources, the needs of the collection, and priorities of the institution".  The NDSA Levels of Digital Preservation are not as in depth as the two ISO standards, but they are easier to understand. The intermediary levels can address digital preservation in a phased approach and also provide a way of identifying strengths and weaknesses. Once the NDSR project is complete the Library will have a picture of the digital practices and a clear implementation plan for improved digital preservation.

Monday, March 07, 2016

Pursuing the "Long Tail" of Elusive Publishers

Pursuing the "Long Tail" of Elusive Publishers. Stephanie Orphan, Amy J. Kirchoff, Kate Wittenberg. Center for Research Libraries. Webinar. Wednesday, February 24, 2016.
     Pursuing the "Long Tail" is the first in a series of four CRL webinars that will examine the benefits, capabilities and possibilities of the Portico digital preservation service.
The archived title coverage information shows what titles are not archived in Portico, CLOCKSS, LOCKSS, or JSTOR, compared to the total number of journals listed by CrossRef. The slide shows:
  • Titles Preserved Somewhere: 28,473 (62%)
  • Titles Not Preserved Anywhere: 17,388 (38%)
In the slide, there are 28 publishers that publish over 200 titles, while there are 4,650 that publish 1 or 2 titles, and these small publishers account for 4,039 titles that are not preserved anywhere. It will take a long time to include all those publishers. Libraries can help by:
  • Providing the names of specific publishers that are a high priority.
  • Providing categories of publishers or  platforms that are a high preservation priority.
  • Talking to publishers about participating in a preservation service
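
The coverage figures quoted above can be checked with a few lines of arithmetic. This is only a worked example using the numbers from the slide as summarized here; the variable names are made up for illustration.

```python
# Figures as quoted in the summary of the slide.
preserved = 28_473                    # titles preserved somewhere
not_preserved = 17_388                # titles not preserved anywhere
small_publisher_unpreserved = 4_039   # unpreserved titles from publishers with 1 or 2 titles

total = preserved + not_preserved     # total journals in the comparison


def pct(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


print(total)                                            # 45861
print(pct(preserved, total))                            # 62
print(pct(not_preserved, total))                        # 38
print(pct(small_publisher_unpreserved, not_preserved))  # 23
```

On these numbers, publishers with only one or two titles account for roughly a quarter of the unpreserved titles, which is why the long tail is slow to cover.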

Saturday, March 05, 2016

The M Word: The Good, the Bad, the Ugly

The M Word: The Good, the Bad, the Ugly. Robert H. McDonald and Juliet L. Hardesty. Educause Review. January 11, 2016.
     Metadata is a part of all aspects of the research process. "The expectations are that metadata will be clean and understandable, secure and accessible when appropriate, and easily shareable." In reality this doesn’t happen naturally or without "concerted effort and cooperation" in all areas of policy, design, and practice. Metadata created by hand can be problematic; the academic research librarians work with metadata to "ensure good storage, maintainability, shareability, and most importantly, accessibility."

Libraries are often the right agency to serve as a neutral mediator for collaborations among researchers. New directions in the research process are "creating new roles and opportunities for libraries to help in preserving, managing, publishing, and accessing data."

Friday, March 04, 2016

Digital Stewardship in a Radio Archive: An NDSR Project Update

Digital Stewardship in a Radio Archive: An NDSR Project Update. Mary Kidd, Erin Engle. The Signal. January 5, 2016. 
     This is an update on the Radio Archive project and includes:
  • a Digital Preservation Roadmap
  • an overview of the current digital production throughout the various stations
  • a network-wide analysis of NYPR’s digital holdings
The definition of what makes radio includes both traditional broadcasting as well as the digital formats, particularly the WAV and MP3 files, live and on-demand streaming audio and video, on-the-go podcasts, and social media posts. The archive will have to "develop new ways to address the fact that digital assets are often interconnected with other assets, rather than standalone audio objects".  Archives are not just safekeepers of the past, but informants of the present. The evolving identity of radio is influencing the development of a new and emerging field referred to as “radio preservation studies”.

An important part of the studies for the archive is to distinguish radio archives from traditional sound archives. They appear to be similar, but radio has its own distinct sound and modes of production, and radio archives require their own set of archival best practices that contextualize broadcast recordings. "Striking a balance between the archival priorities and the expectations of producers is quite possibly one of the greatest challenges for an archive embedded in a media station."

Thursday, March 03, 2016

Research Software Sustainability: Report on a Knowledge Exchange Workshop

Research Software Sustainability: Report on a Knowledge Exchange Workshop. Simon Hettrick. The Software Sustainability Institute. February 2016; 3 March 2016.   [PDF]
     "Without software, modern research would not be possible since it is connected to the software that is used to generate results." Overlooking software will put at risk the reliability and the ability to reproduce the research itself. Like the research, and any other tool, software must stand up to the same scrutiny. It is not easy to define software sustainability, but it is the practices that allow software to continue to function as expected in the future.  This is neither easy nor straightforward. Software has a lifecycle: it is conceived, matures and decays. Not all software should be sustained; we should concentrate on sustaining the software that is most useful. Software is always reliant on other software in order to work, including the operating system, system libraries, and other necessary packages. Any change or decay at any level can affect the operation of the software higher up the stack.  "If we attempt to preserve software, it quickly becomes out of step with its dependent software."

  • Research software: software developed within academia for the purposes of research, particularly to generate, process and analyze results.
  • Software sustainability: the technical and non-technical practices that allow software to continue to operate as expected in the future. A constant level of effort is required to maintain the software’s operation.
  • Software preservation: an approach to extend the lifetime of software that is no longer actively maintained.
  • Software archiving: one important aspect of software preservation. It is the process of storing a copy of software so that it may be referred to in the future.
Approaches to software sustainability and preservation
  1. Encapsulation. Preserve the original hardware and software to ensure that the software continues to operate (an example is
  2. Emulation. Emulate the original hardware and operating environment so that the software continues to operate
  3. Migration. Update the software to maintain the original functionality and transfer it to new platforms as necessary to prevent obsolescence
  4. Cultivation. Keep the software up to date by adopting an open development model that allows new contributors to be brought on board
  5. Hibernation. Preserve knowledge of how to resuscitate the software’s exact functionality at a later date
  6. Deprecation. Formally retire the software. Unlike hibernation, no time is invested into preparations to make it easier to resuscitate the software
  7. Procrastination. Do nothing
Key recommendations
  1. Raise awareness of the fundamental role of software in research
  2. Recognize research software as a valuable research object
  3. Promote software sustainability
  4. Embed software sustainability skills in the research community
  5. Create organizations as focal points for software sustainability expertise
Benefits include:
  1. Trusted research
  2. Increased rate of discovery
  3. Increased return on investment
  4. Research data remains readable and usable

Wednesday, March 02, 2016

Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives

Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives. Justin F. Brunelle, et al. D-Lib Magazine. January/February 2016.
     This is a case study about using open-source, web-scale web archiving tools, Heritrix and the Wayback Machine. Internet archiving does not have the opportunity to archive Intranet-based resources, such as corporate content. Past research has shown that "web pages' reliance on JavaScript to construct representations leads to a reduction in archivability".  The Internet Archive uses Heritrix and the Wayback Machine to archive web resources and replay mementos on the public web.
The article recommends that content authors use robots.txt and noarchive HTTP response headers to keep sensitive information from being archived. Accidentally archiving sensitive information can result in loss of mementos within a WARC. Recommendations include:
  • Use smaller storage devices to limit the problems if sensitive information is crawled;
  • Develop a way to remove a sensitive memento from a WARC file 
  • Identify high-risk vs. low-risk archival targets within the Intranet.
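
The two opt-out signals mentioned above can be sketched as a pre-archiving check. This is a minimal illustration, not the article's crawler configuration: the robots.txt rules, the URLs, the agent name, and the `may_archive` helper are all hypothetical. It uses Python's standard `urllib.robotparser` and treats the `X-Robots-Tag` response header as the carrier of the `noarchive` directive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that an intranet content owner might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hr/
Disallow: /finance/
"""


def may_archive(url, response_headers, robots_txt=ROBOTS_TXT, agent="intranet-archiver"):
    """Return True only if neither robots.txt nor a noarchive header forbids it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        return False
    # The header value may hold several comma-separated directives.
    # This sketch looks the header up by its exact name in a plain dict.
    tag = response_headers.get("X-Robots-Tag", "")
    directives = {d.strip().lower() for d in tag.split(",")}
    return "noarchive" not in directives
```

Both signals depend on the crawler honoring them, so they reduce the risk of capturing sensitive pages without replacing the need for a way to remove a memento after the fact.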
Archiving intranet content needs to fit within a larger documentation plan, and requires knowing which key resources and elements must be preserved in order to preserve corporate memory. There is value for a corporation in having a web crawling and archiving strategy. It "may make more sense for a corporate archives to preserve information about its corporation's projects that is tracked in a database and served to an Intranet through an export directly from the database rather than crawling the Intranet for the project data".

The case study and the next steps proposed will help archive corporate memory, improve information longevity, and can help corporate archivists implement web archiving strategies.

Tuesday, March 01, 2016

ODF: The Open Document Format

ODF: The Open Document Format. Carl Fleischhauer, Erin Engle. The Signal. January 19, 2016.
     In December 2015, descriptions of eleven new parts of the Open Document Format group were added to the Format Sustainability website. The Format Sustainability website provides technical descriptions about formats of all types to help staff in assessing new content. ODF is part of a complex format family, which exists in several versions and “parts.” The list of ODF-related formats added to the sustainability website last month:
  • ODF_Family, ODF (OpenDocument Format) Family, OASIS and ISO/IEC 26300
  • ODF_package_1_1, OpenDocument Package Format, ODF 1.1, ISO/IEC 26300:2006
  • ODF_package_1_2, OpenDocument Package Format, ODF 1.2 part 3; ISO/IEC 26300-3:2015
  • ODF_text_1_1, OpenDocument Text Format (ODT), ODF 1.1, ISO/IEC 26300:2006
  • ODF_text_1_2, OpenDocument Text Format (ODT), ODF 1.2, ISO/IEC 26300-1:2015
  • ODF_chart_1_2, OpenDocument Chart Format (ODC), ODF 1.2, ISO/IEC 26300-1:2015
  • ODF_draw_1_2, OpenDocument Drawing Format (ODG), ODF 1.2, ISO/IEC 26300-1:2015
  • ODF_spreadsheet_1_1, OpenDocument Spreadsheet Format (ODS), Version 1.1, ISO 26300:2006
  • ODF_spreadsheet_1_2, OpenDocument Spreadsheet Format (ODS), Version 1.2, ISO 26300:2015
  • ODF_dbfront_1_2, OpenDocument Database Front End Document Format (ODB), Version 1.2, ISO 26300-1:2015
  • ODF_presentation_1_2, OpenDocument Presentation Document Format (ODP), Version 1.2, ISO 26300-1:2015
ODF was designed so no one using public documents would be "forced to buy software from one particular vendor or for one particular operating system platform.” It has been supported by institutions trying to preserve documents for the long term. There are also a number of organizations, including government, that have "adopted the ODF family of formats as mandatory or recommended for documents that must be editable in order to support collaboration within the government or between the government and the public."
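
Because the ODF package format is a ZIP container that carries a `mimetype` entry declaring the document type, the members of the family can be told apart without opening an office suite. The sketch below is an illustration only: the `odf_kind` and `make_package` helpers and the short lookup table are assumptions made for this example, and the table covers just five of the formats listed above.

```python
import io
import zipfile

# Declared media types for some ODF document types (not an exhaustive list).
ODF_TYPES = {
    "application/vnd.oasis.opendocument.text": "ODT",
    "application/vnd.oasis.opendocument.spreadsheet": "ODS",
    "application/vnd.oasis.opendocument.presentation": "ODP",
    "application/vnd.oasis.opendocument.graphics": "ODG",
    "application/vnd.oasis.opendocument.chart": "ODC",
}


def odf_kind(fileobj):
    """Return a short label such as "ODT", or None if this is not a known ODF package."""
    try:
        with zipfile.ZipFile(fileobj) as package:
            declared = package.read("mimetype").decode("ascii").strip()
    except (zipfile.BadZipFile, KeyError, UnicodeDecodeError):
        # Not a ZIP file, no mimetype entry, or an unreadable declaration.
        return None
    return ODF_TYPES.get(declared)


def make_package(mime):
    """Build a tiny in-memory ZIP with only a mimetype entry (demo helper, not a valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", mime)
    buffer.seek(0)
    return buffer
```

For example, `odf_kind(make_package("application/vnd.oasis.opendocument.text"))` returns `"ODT"`, while a plain ZIP archive or a non-ZIP file returns `None`.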