Thursday, May 19, 2016

One Billion Drive Hours and Counting: Q1 2016 Hard Drive Stats

One Billion Drive Hours and Counting: Q1 2016 Hard Drive Stats. Andy Klein. Backblaze. May 17, 2016.
     Backblaze reports statistics for the first quarter of 2016 on 61,590 operational hard drives used to store encrypted customer data in its data center. The hard drives in the data center, past and present, have totaled over one billion hours in operation to date. The data in these hard drive reports has been collected since April 10, 2013, and the website shows statistical reports of drive operations and failures for every year since then. The report lists the drives (and drive models) by manufacturer, the number in service, the time in service, and failure rates. The drives in the data center come from four manufacturers; most of them are from HGST and Seagate. Notes:
  • The overall Annual Failure Rate of 1.84% is the lowest quarterly number we’ve ever seen.
  • The Seagate 4TB drive leads in “hours in service” 
  • The early HGST drives, especially the 2- and 3TB drives, have lasted a long time and have provided excellent service over the past several years.
  • HGST has the most hours in service
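
An annual failure rate like the one quoted above can be computed from failure counts and time in service. The following is a minimal sketch, assuming the commonly used definition (failures divided by drive-years of service, expressed as a percentage). The function name and the sample numbers are illustrative assumptions and are not taken from the report.

```python
def annual_failure_rate(failures: int, drive_days: float) -> float:
    """Return the annualized failure rate as a percentage.

    Assumed definition: failures / (drive_days / 365) * 100.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return failures / drive_years * 100.0


# Hypothetical example: 1,000 drives in service for a 91-day quarter, 5 failures.
example_rate = annual_failure_rate(failures=5, drive_days=1000 * 91)
print(f"{example_rate:.2f}%")
```

Counting drive-days rather than drives lets a quarterly figure be annualized even when drives are added or retired during the quarter.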

IBM Scientists Achieve Storage Memory Breakthrough

IBM Scientists Achieve Storage Memory Breakthrough. Press release. 17 May 2016.
     IBM Research demonstrated reliably storing 3 bits of data per cell using phase-change memory. This technology doesn't lose data when powered off and can endure at least 10 million write cycles, compared to 3,000 write cycles for an average flash USB stick. This provides "fast and easy storage" to capture the exponential growth of data.


Wednesday, May 18, 2016

Floppy Disk Format Identifer Tool

Floppy Disk Format Identifer Tool. Euan Cochrane. Digital Continuity Blog. May 13, 2016.
     Euan created this tool https://github.com/euanc/DiskFormatID (which he documents in this great blog post) to:
  1.     “Automatically” identify floppy disk formats from KryoFlux stream files.
  2.     Enable “simple” disk imaging workflows that don’t include a disk format identification step during the data capture process.
The tool processes copies of floppy disk data saved in the KryoFlux stream file format, creates a set of disk image files formatted according to assumptions about the disk’s format, and allows the user to try mounting the image files as file systems. It requires the KryoFlux software to function. The documentation also provides detailed information on how to use it, along with other interesting information.

Friday, May 13, 2016

JHOVE 1.14 released

JHOVE 1.14 released. Open Preservation Foundation. 12 May 2016.
     "The latest version of JHOVE, the open source file format identification, validation and characterisation tool for digital preservation, is now available to download." This version has three new format modules: gzip, WARC and PNG. Among other features, it has a black box testing module and support for Unicode 7.0.0.

Thursday, May 12, 2016

The Center for Jewish History Adopts Rosetta for Digital Preservation and Asset Management

The Center for Jewish History Adopts Rosetta for Digital Preservation and Asset Management. Ex Libris. Press Release. May 12, 2016.
     After a thorough search process, the Center for Jewish History selected the Ex Libris Rosetta digital asset management and preservation solution. They wanted a system to handle their comprehensive list of requirements for both long‑term digital preservation and robust management of digital assets, including the ability to interface with their other systems.

The Center’s partners are American Jewish Historical Society, American Sephardi Federation, Leo Baeck Institute, Yeshiva University Museum, and YIVO Institute for Jewish Research.  The collections include more than five miles of archival documents, over 500,000 volumes, and thousands of artworks, textiles, ritual objects, recordings, films, and photographs.

Monday, May 09, 2016

Looking Across the Digital Preservation Landscape

Looking Across the Digital Preservation Landscape. Margaret Heller. ACRL TechConnect Blog. April 25, 2016.
     "When it comes to digital preservation, everyone agrees that a little bit is better than nothing." The article cited refers to two presentations from Code4Lib 2016, “Can’t Wait for Perfect: Implementing “Good Enough” Digital Preservation” by Shira Peltzman and Alice Sara Prael, and “Digital Preservation 101, or, How to Keep Bits for Centuries” by Julie Swierczek. This article mentions two major items about digital preservation:
  1. Digital preservation doesn’t have to be hard, but it does have to be intentional.
  2. Digital preservation requires institutional commitment. 
Understanding all the basic issues and what your options are can be daunting. The author's institution had a committee that started by examining born-digital materials but expanded its focus to all digital materials because that made it easier to test their ideas. The tasks they accomplished included creating a rough inventory of digital materials, writing a workflow manual, and securing networked storage to replace all removable hard drives used for backups. "While backups aren’t exactly digital preservation, we wanted to at the very least secure the backups we did have." The inventory and workflow manual are living documents and are useful for identifying gaps in the processes.

They also looked at the end-to-end systems available for digital preservation, such as Preservica, ArchivesDirect, and Rosetta. Migrating from one system to another after a change of mind may involve some very difficult processes, so institutions may tend to stay with their providers. Another option is to join a preservation network, such as the Digital Preservation Network (DPN) or APTrust, which have the larger preservation goal of ensuring long-term access to material even if the owning institution disappears.

Sustainable financing is, for many, the crux of the digital preservation problem. "It’s possible to do a sort of ok job with digital preservation for nothing or very cheap, but to ensure long term preservation requires institutional commitment for the long haul, just as any library collection requires."

Digital preservation has been receiving more attention lately, and hopefully more libraries will see it as a priority.

Thursday, April 28, 2016

Preserving the Fruit of Our Labor: Establishing Digital Preservation Policies and Strategies

Preserving the Fruit of Our Labor: Establishing Digital Preservation Policies and Strategies at the University of Houston Libraries. Santi Thompson, et al. iPres 2015. November 2015.
     A paper that presents the library's digital preservation efforts. The library formed a Digital Preservation Task Force to assess previous digital preservation practices and make recommendations on future efforts. The group was charged with establishing a digital preservation policy and identifying the strategies, actions, and tools needed to sustain long-term access to library digital objects. The group was asked to:
  • Define the policy’s scope and levels of preservation
  • Articulate digital preservation priorities by outlining current practices, identifying preservation gaps and areas for improvement, and establishing goals to address gaps 
  • Determine the tools, infrastructure, and other resources needed to address unmet needs and to sustain preservation activities in the future 
  • Align priorities with digital preservation standards, best practices, and storage services
  • Recommend roles, responsibilities, and next steps for implementing the strategy and policy

The primary tool used for policy creation was the Action Plan for Developing a Digital Preservation Program. It helps institutions establish a high-level framework with policies and procedures and address the resources needed to sustain a digital preservation program for the long term. The group also:
  • Selected and studied Action Plan for Developing a Digital Preservation Program to construct digital preservation policies
  • Drafted high-level policy framework
  • Outlined roles and responsibilities for internal and external stakeholders
  • Defined digital assets including digitization quality and metadata specifications; collection selection, acquisition policies, and procedures; and access and use policies
  • Identified and described key functional entities for the digital preservation system, including ingest, archival storage, preservation planning and administration, and access
  • Drafted potential start-up and ongoing costs for digital preservation
  • Focused on evaluating software
Principles outlined in their Digital Preservation Policy include collaboration, partnerships, and technological innovation. As more library resources and services become digital, the responsibilities must expand to include the identification, stewardship, and preservation of designated digital content.

The Digital Preservation Policy consists of three main sections: Policy Framework, Policies and Procedures, and Technological Infrastructure. Sections in the Digital Preservation Policy Framework include:
  • Purpose
  • Objectives
  • Mandate
  • Scope
  • Challenges
  • Principles
  • Roles and Responsibilities
  • Collaboration
  • Selection and Acquisition
  • Access and Use

The Policies and Procedures section describes digital preservation policies, procedures, roles, and responsibilities in greater detail than the policy framework. It outlines requirements concerning digital assets, including recommended specifications for digital objects, preferred file formats, personnel, and the acquisition, transfer, and access of content.

The Technological Infrastructure section outlines digital preservation system functions and requirements in greater detail than the policy framework and includes:
  • The rules and requirements for Submission Information Packages (SIPs), Archival Information Packages (AIPs), and Dissemination Information Packages (DIPs)
  • The workflow for ingesting, updating, storing, and managing digital objects
  • The metadata requirements
  • The strategic priorities for future digital preservation efforts and risk management

Monday, April 25, 2016

Why Analog-To-Digital Video Preservation, Why Now

Why Analog-To-Digital Video Preservation, Why Now. Bay Area Video Coalition. April 4, 2016.
     The first part is from an article that revisits an earlier publication: How I Learned (Almost) Everything I Know About ½” Video from Dr. Martin Luther King, Jr. By Moriah Ulinskas, Former Director of Preservation. Originally published October 5th, 2011. It describes preserving a video recording of Martin Luther King, Jr. and the difficulties involved. Some quotes from the article and the website in general:
  • "I tell all our clients and partners that they have 5, maybe 10 years left in which they can have these works preserved and transferred and then these recordings are gone for good."
  • "These are the legacy recordings I refer to with such urgency when I talk about the immediacy and importance of video preservation. These moments of political and cultural significance that inspired someone, 40 years ago, to hook up a camera and record this tape which we’ve inherited from dusty basements and disregarded shelves."
  • "If we do not do diligence in transferring these recordings to new formats, as the originals become impossibly obsolete, these are the moments and the messages we will lose forever."

Some items from the rest of the website:
  • As audio and video technologies have changed, and as old formats age and disintegrate, we are at risk of losing significant media that documents the art, culture and history of our diverse communities.
  • Analog media preservation is necessary because of two central factors: technical obsolescence and deterioration. Experts say that magnetic media has an estimated lifespan for playback of 10-15 years, and companies have already ceased manufacture of analog playback decks, the devices required to digitize and preserve analog media.

Audio / Video Preservation Tools
  • QCTools (Quality Control Tools for Video Preservation) is a free, open-source tool that gives conservators and archivists ways to inspect, analyze and understand their digitized video files, in order to prioritize archival quality control, detect common errors in digitization, facilitate targeted response, and thus increase trust in video digitization efforts.
  • A/V Artifact Atlas. An open-source guide used to define and identify common technical issues and problems with audio and video signals. The intent of the guide is to assist and promote reformatting archival media content.
  • AV Compass. A suite of free online resources to help with organizing and preserving media collections. It includes step-by-step educational videos, PDF guides, an overview of preservation concepts, and a simple tool for creating inventories. This guide helps users with creating a preservation plan and taking specific steps to make that plan happen.


Saturday, April 23, 2016

Closing the Gap in Born-Digital and Made-Digital Curation

Closing the Gap in Born-Digital and Made-Digital Curation. Jessica Tieman, Mike Ashenfelder. The Signal. April 21, 2016. 
     The post is about an upcoming symposium that refers to “Digital Frenemies”. The author observes that a trend in digital stewardship divides expertise into “made digital” and “born digital.” The landscape of the digital preservation field should not be divided like that. "Rather, the future will be largely defined by the symbiotic relationships between content creation and format migration. It will depend on those endeavors where our user communities intersect rather than lead to us to focus on challenges specific to our individual areas of the field."

Friday, April 22, 2016

Providing Access to Disk Image Content: A Preliminary Approach and Workflow

Providing Access to Disk Image Content: A Preliminary Approach and Workflow. Walker Sampson, Alexandra Chassanoff. iPres 2015. November 2015.
     The paper describes a proposed workflow that collecting institutions acquiring disk images can use to support the capture, analysis, and final access to disk image content of born-digital collections. These materials present certain challenges, and some institutions use open-source digital forensics software environments such as BitCurator to capture and analyze them.

The workflow is for the research archives at the University of Colorado Boulder, which do not have a digital repository or collection management software deployed. However, it "addresses the immediate needs of the material, such as bit-level capture and triage, while remaining flexible enough to have the outputs integrate with a future digital repository and collection management software." It allows researchers to access a bit-level copy of a floppy disk found in an archival collection. Access is typically regarded as the last milestone of processing work.

The workflow for processing born-digital materials starts with obtaining the physical disk, which is photographed before a disk image is created. The BitCurator Reporting Tool generates analytic reports, and other programs can be run at this stage as well. The total output from BitCurator is placed into a single BagIt package and uploaded to a managed storage space with redundant copies; that package will be the AIP in a future repository. The disk image can then be used to provide public access.
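
To illustrate the packaging step, here is a minimal sketch of what a BagIt package contains: a data directory holding the payload, a bagit.txt declaration, and a checksum manifest. It uses only the Python standard library. The function names, the choice of SHA-256, and the declared BagIt version are assumptions made for illustration; the paper does not say which tool or settings were used, and a production workflow would more likely rely on an existing BagIt tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the basic BagIt tag files.

    bag_dir/data must not already exist.
    """
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # also creates bag_dir if needed

    # Bag declaration (the version number here is an assumption).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag_dir
```

Because the manifest travels with the content, any later copy of the bag in redundant storage can be checked against the same checksums.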

Scientific Archives in the Age of Digitization

Scientific Archives in the Age of Digitization. Brian Ogilvie. The University of Chicago Press Journals. March 2016.
     Historians are increasingly working with material that has been digitized; they need to be aware "of the scope of digitization, the reasons why material is chosen to be digitized, and limitations on the dissemination of digitized sources."  Some physical aspects of sources, and of collections of sources, are lost in their digital versions. Some notes from the article:
  • "Digitization of unique archival material occupies an ambiguous place between access and publication."
  • digitized archives reproduce unique archival material with finding aids but without significant editorial commentary, which allows for open-ended historical inquiry without the need to travel to archives
  • the digitized archive also raises questions and challenges for historical practice, specifically:
    • the digitizing decision and funding
    • balancing digital access against some owners’ interests in restricting access
    • aspects of the physical archive that may be lost in digitization
    • the possibility of combining resources from a number of physical archives
  • most digitization projects have been selective in their scope
  • scholars cannot assume that material has been digitized, nor that all material has been digitized, unless the archive specifically states that
  • digitized material is not always freely available, e.g. subscription based archives
  • many archivists "fear that their traditional task of preparing detailed collection inventories is under threat owing to dwindling resources and the demand for digitization."

Digital Preservation notes:
  • digitization projects have undeniable benefits for the preservation of documents and access to them.
  • In the interest of preserving their holdings and disseminating them to a broad public, archives are increasingly digitizing their collections. 
  • historians interested in digital preservation of archives, and electronic access to them, would be well advised to seek out collaborations with archivists.

Thursday, April 21, 2016

Expanding NDSA Levels of Preservation

Expanding NDSA Levels of Preservation. Shira Peltzman, Mike Ashenfelder. The Signal. April 12, 2016.
     Alice Prael and Shira Peltzman have been working on a project to update the NDSA Levels of Digital Preservation to include a metric for access. The  NDSA Levels is a tool to help organizations manage digital preservation risks. The matrix contains a tiered list of technical steps that correspond to levels of complexity and preservation activities: Storage and Geographic Location, File Fixity and Data Integrity, Information Security, Metadata and File Formats. Access is one of the "foundational tenets of digital preservation. It follows that if we are unable to provide access to the materials we’re preserving, then we aren’t really doing such a great job of preserving those materials in the first place."

They have added an Access row to the NDSA Levels designed to help measure and enhance progress in providing access. The updated Levels of Preservation:


The four levels are Level One (Protect Your Data), Level Two (Know Your Data), Level Three (Monitor Your Data), and Level Four (Repair Your Data).

Storage and Geographic Location
  • Level One: Two complete copies that are not collocated. For data on heterogeneous media (optical disks, hard drives, etc.) get the content off the medium and into your storage system.
  • Level Two: At least three complete copies. At least one copy in a different geographic location. Document your storage system(s) and storage media and what you need to use them.
  • Level Three: At least one copy in a geographic location with a different disaster threat. Obsolescence monitoring process for your storage system(s) and media.
  • Level Four: At least 3 copies in geographic locations with different disaster threats. Have a comprehensive plan in place that will keep files and metadata on currently accessible media or systems.

File Fixity and Data Integrity
  • Level One: Check file fixity on ingest if it has been provided with the content. Create fixity info if it wasn’t provided with the content.
  • Level Two: Check fixity on all ingests. Use write-blockers when working with original media. Virus-check high risk content.
  • Level Three: Check fixity of content at fixed intervals. Maintain logs of fixity info; supply audit on demand. Ability to detect corrupt data. Virus-check all content.
  • Level Four: Check fixity of all content in response to specific events or activities. Ability to replace/repair corrupted data. Ensure no one person has write access to all copies.

Information Security
  • Level One: Identify who has read, write, move, and delete authorization to individual files. Restrict who has those authorizations to individual files.
  • Level Two: Document access restrictions for content.
  • Level Three: Maintain logs of who performed what actions on files, including deletions and preservation actions.
  • Level Four: Perform audit of logs.

Metadata
  • Level One: Inventory of content and its storage location. Ensure backup and non-collocation of inventory.
  • Level Two: Store administrative metadata. Store transformative metadata and log events.
  • Level Three: Store standard technical and descriptive metadata.
  • Level Four: Store standard preservation metadata.

File Formats
  • Level One: When you can give input into the creation of digital files encourage use of a limited set of known open file formats and codecs.
  • Level Two: Inventory of file formats in use.
  • Level Three: Monitor file format obsolescence issues.
  • Level Four: Perform format migrations, emulation and similar activities as needed.

Access
  • Level One: Determine designated community (see note 1). Ability to ensure the security of the material while it is being accessed. This may include physical security measures (e.g. someone staffing a reading room) and/or electronic measures (e.g. a locked-down viewing station, restrictions on downloading material, restricting access by IP address, etc.). Ability to identify and redact personally identifiable information (PII) and other sensitive material.
  • Level Two: Have publicly available catalogs, finding aids, inventories, or collection descriptions so that researchers can discover material. Create Submission Information Packages (SIPs) and Archival Information Packages (AIPs) upon ingest (see note 2).
  • Level Three: Ability to generate Dissemination Information Packages (DIPs) on ingest (see note 3). Store Representation Information and Preservation Description Information (see note 4). Have a publicly available access policy.
  • Level Four: Ability to provide access to obsolete media via its native environment and/or emulation.

1 Designated Community essentially means “users”; the term comes from the Reference Model for an Open Archival Information System (OAIS).
2 The Submission Information Package (SIP) is the content and metadata received from an information producer by a preservation repository. An Archival Information Package (AIP) is the set of content and metadata managed by a preservation repository, and organized in a way that allows the repository to perform preservation services.
3 Dissemination Information Package (DIP) is distributed to a consumer by the repository in response to a request, and may contain content spanning multiple AIPs.
4 Representation Information refers to any software, algorithms, standards, or other information that is necessary to properly access an archived digital file. Or, as the Preservation Metadata and the OAIS Information Model put it, “A digital object consists of a stream of bits; Representation Information imparts meaning to these bits.” Preservation Description Information refers to the information necessary for adequate preservation of a digital object. For example, Provenance, Reference, Fixity, Context, and Access Rights Information.
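
The fixity activities in the File Fixity and Data Integrity row (create fixity information when it was not supplied, then re-check it at intervals) can be sketched as follows. This is a minimal illustration, assuming SHA-256 checksums stored in a simple in-memory mapping. The function names are hypothetical, and the NDSA Levels do not prescribe any particular algorithm or tool.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_fixity_record(root: Path) -> Dict[str, str]:
    """Create fixity info: map each file's relative path to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_fixity(root: Path, record: Dict[str, str]) -> List[str]:
    """Re-check stored checksums; return the files that are missing or changed."""
    problems = []
    for relative_name, expected in sorted(record.items()):
        path = root / relative_name
        if not path.is_file() or file_checksum(path) != expected:
            problems.append(relative_name)
    return problems
```

Running verify_fixity on a schedule and keeping its results would correspond roughly to the Level Three activities; repairing a flagged file from another copy belongs to Level Four.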

[I've been asked to add the footnotes, which I have done. By way of clarification, my notes are the things that I want to remember from the articles I read. The real source for the concepts is the actual article itself; the link is provided at the top of the notes. - chris]


Wednesday, April 20, 2016

On the Marginal Cost of Scholarly Communication

On the Marginal Cost of Scholarly Communication. Tiffany Bogich, et al. Science.ai by Standard Analytics. 18 April, 2016.
     An article that looks at the marginal cost of scholarly communication from the perspective of an agent looking to start an independent, peer-reviewed scholarly journal. It found that vendors can accommodate all of the services required for scholarly communication for between $69 and $318 per article, and with alternate software solutions replacing the vendor services, the marginal cost of scholarly communication would drop to between $1.36 and $1.61 per article, almost all of which is the cost of  DOI registration. The development of high quality “plug-and-play” open source software solutions would have a significant impact in reducing the marginal cost of scholarly communication, making it more open to experimentation and innovation.  For the cost of long term journal preservation, the article looked at CLOCKSS and Portico.

Tuesday, April 19, 2016

Requirements on Long-Term Accessibility and Preservation of Research Results with Particular Regard to Their Provenance

Requirements on Long-Term Accessibility and Preservation of Research Results with Particular Regard to Their Provenance. Andreas Weber, Claudia Piesche. ISPRS Int. J. Geo-Inf. 11 April 2016.
     The importance of long-term accessibility increased when the “OECD Principles and Guidelines for Access to Research Data from Public Funding” was published. The description of the long-term accessibility of research data now has to be a part of research proposals and a precondition for the funding of projects.
The demand for long-term preservation of research data has developed slowly, and such preservation is established in only a few research areas. Existing solutions for the long-term storage of specialized data are themselves specialized and usually not designed for public use or reuse.

At universities, the support for the preservation of research data is mostly limited to the provision of highly available disk storage and appropriate backup solutions. Collaboration is limited, and tools to support metadata search are very rare. "The institutions that could play an important role, like libraries or IT centers, hesitate to build up solutions, because policies for the treatment of research results are not yet installed by the administration." Solutions to manage research data would also need a very sophisticated rights management system to protect data from unauthorized access while still providing access.

"Long-term preservation in a more classical sense means the bit stream preservation, and aims at a subsequent use of data in content as well as in technical purpose." A solution for the long-term preservation of research data should be compliant with OAIS. To access the specific research data, a unique identifier  would be needed and the storage has to satisfy the "norms of long-term preservation".

"Currently the most important standard is the Open Archival Information System (OAIS) reference model. The OAIS model specifies how digital assets can be preserved through subsequent preservation strategies. It is a high-level reference model, and therefore is not bound to specific technology. Although the model is complex, systems for the long-term storage of digital data will have to meet the requirements."


Monday, April 18, 2016

Calculating All that Jazz: Accurately Predicting Digital Storage Needs Utilizing Digitization Parameters for Analog Audio and Still Image Files

Calculating All that Jazz: Accurately Predicting Digital Storage Needs Utilizing Digitization Parameters for Analog Audio and Still Image Files. Krista White. ALCTS. 14 Apr 2016.

     The library science literature does not show a reliable way to calculate digital storage needs when digitizing analog materials such as documents, photographs, and sound recordings in older formats. "Library professionals and library assistants who lack computer science or audiovisual training are often tasked with writing digital project proposals, grant applications or providing rationale to fund digitization projects for their institutions." Digital project managers need tools to accurately predict the amount of storage for digital objects and also estimate the starting and ongoing costs for the storage. This paper provides two formulae for calculating digital storage space for uncompressed, archival master image and document files and sound files.

Estimates from earlier sources:
  • thirty megabytes of storage for every hour of compressed audio
  • one megabyte for a page of uncompressed, plain text (bitmap format)
  • three gigabytes for two hours of moving image media
  • 90 megabytes for uncompressed raster image files
  • 600 megabytes for one hour of uncompressed audio recording
  • “nearly a gigabyte of disk space” for one minute of uncompressed digital video
  • 100 gigabytes (GB) of storage for 100 hours of audio tape
  • These can be adjusted to alter both file size and quality, depending on the choice of digitization standard, the combination of variables used in a chosen standard and the quantity of digital storage required.
Some additional notes from the article:
  • As the experiments demonstrate, the formulae for still image and audio recordings are extremely accurate. They will prove invaluable to digital archivists, digital librarians and the average user in helping to plan digitization projects, as well as in evaluating hardware and software for these projects. 
  • Digital project managers armed with the still image and audio formulae will be able to calculate file sizes using different standards to determine which standard will suit the project needs. 
  • Knowing the parameters of the still image and audio formulae will allow managers to evaluate equipment on the basis of the flexibility of the software and hardware before purchase. 
  • Using the still image and audio calculation formulae in workflows will help digital project managers create more efficient project plans and tighter grant proposals. 
  • The formula for calculating the storage size of an audio file uses four parameters: length of the original audio recording, sampling rate, bit depth, and number of audio channels.
  • Formula for Calculating File Sizes of Uncompressed, Still Images:

https://journals.ala.org/lrts/article/view/5961/7582
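
The parameters listed above can be combined in a short calculation. What follows is a minimal sketch, assuming the standard arithmetic for uncompressed PCM audio and uncompressed raster images. The article's own notation may differ, the function names are illustrative, and file header or metadata overhead is ignored.

```python
def audio_size_bytes(duration_seconds: float, sample_rate_hz: int,
                     bit_depth: int, channels: int) -> int:
    """Uncompressed audio: seconds x samples per second x bytes per sample x channels."""
    return int(duration_seconds * sample_rate_hz * channels * bit_depth / 8)


def image_size_bytes(width_inches: float, height_inches: float,
                     ppi: int, bit_depth: int) -> int:
    """Uncompressed still image: pixel count x bytes per pixel."""
    width_px = round(width_inches * ppi)
    height_px = round(height_inches * ppi)
    return width_px * height_px * bit_depth // 8


# One hour of 44.1 kHz, 16-bit stereo audio (illustrative parameters)
one_hour_audio = audio_size_bytes(3600, 44_100, 16, 2)

# An 8 x 10 inch original scanned at 600 ppi in 24-bit color (illustrative parameters)
scanned_page = image_size_bytes(8, 10, 600, 24)

print(one_hour_audio / 1_000_000, "MB")
print(scanned_page / 1_000_000, "MB")
```

With these illustrative parameters the results are about 635 MB for the audio and about 86 MB for the image, which is close to the earlier estimates of 600 megabytes per hour of uncompressed audio and 90 megabytes per raster image. Changing the sampling rate, bit depth, or resolution shows how strongly the choice of digitization standard drives storage needs.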

The article also includes tables that compare the calculated file sizes with the actual sizes.

Friday, April 15, 2016

Students' project revolutionizes BYU library digitization

Students' project revolutionizes BYU library digitization. Braley Dodson. Daily Herald. Apr 13, 2016.
     The Mass Archival Scanning System (MASS) was created by a group of seven engineering students working with staff at the Harold B. Lee Library at Brigham Young University.  The system is a way to speed up the process of digitizing documents from the manuscript collections.  The system, which can work eight to 10 times faster, uses an automatic rotating table that moves the documents under a camera inside a hood that controls the lighting. Once the document is imaged, the table rotates to the next document.

Thursday, April 14, 2016

Fulfill Your Digital Preservation Goals with a Budget Studio

Fulfill Your Digital Preservation Goals with a Budget Studio. Yongli Zhou. Information Technology and Libraries. April 4, 2016.
     An article about finding a cost-effective solution for digitizing materials as part of preservation goals. Many institutions use in-house high-end scanners to scan historical and other materials. "No digitization equipment or system is perfect. They all have trade-offs in image quality, speed, convenience of use, quality of accompanying software, and cost." The article discusses in depth the comparison tests for using digital cameras for digitizing, as opposed to the expensive scanners that do not fit in many library budgets. The result is that for most archival materials a digital single-lens reflex (DSLR) camera will do a better job than an overhead scanner. In most of the cases, the camera produced superior images. "This paper compares images delivered by a high-end overhead scanner and a consumer-level DSLR camera, discusses pros and cons of using each method, demonstrates how to set up a cost-efficient shooting studio, and presents a budget estimate for a studio."


Wednesday, April 13, 2016

Beyond the Binary: Pre-Ingest Preservation of Metadata

Beyond the Binary: Pre-Ingest Preservation of Metadata. Jessica Moran, Jay Gattuso. iPres 2015. Nov. 2015.
     This paper describes some of the challenges the National Library of New Zealand has faced to maintain the authenticity of born digital collections (objects and metadata) from the time they are first received until they are ingested into their Rosetta digital preservation system. Two specific challenges relate to contextual metadata of filenames and file dates.

"The digital preservation analyst is responsible for technical assessment of digital content going into the digital preservation system, and troubleshooting digital content that fails validation checks". The digital archivists serve as archival and content subject matter experts; the digital preservation analyst is the subject matter expert for technical concerns. The two perspectives allow for robust workflows that better preserve the content.  They are especially interested in "file system metadata such as filenames and dates that are not embedded with the objects themselves, but rather are stored externally in the file system". Filename and date metadata have "challenged us to think critically about what constitutes acceptable, reversible, and recordable change and where and how this metadata should be stored for preservation and later for delivery to users".

Proper handling rules mean that for digital preservation we need to treat files slightly more sensitively. We might want to know what the original file extension was, as it is an important part of a file’s provenance.

Most born digital objects they receive have three dates: created date, last modified date, and last accessed date. They can be used to confirm an object is what it says it is. They have a practice of "touching the original file as little as possible and only as much as needed to get the file into the preservation environment".

One solution is the creation of forensic disk images as a first step in the transfer process. Another solution would be to create "a tool to help us automate the original and any subsequent transfers of born digital content, ensure the capture of original filename and date metadata and any preconditioning actions we performed, and at the same time create a log of that activity that is auditable and both human and machine readable."  They have been developing a script to accomplish what they need.
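
The kind of capture-and-log step described above can be sketched roughly as follows. This is not the National Library of New Zealand's script, which is not shown in the paper summary; the function names `capture_metadata` and `log_transfer`, the field names, and the JSON Lines log format are all assumptions made for illustration. The sketch reads filename and date metadata from the file system without opening the file, then appends one human- and machine-readable log entry per file.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def _iso(timestamp):
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def capture_metadata(path):
    """Record filename and file-system dates (hypothetical field names).

    os.stat() reads file-system metadata only; it does not open the file.
    """
    st = os.stat(path)
    return {
        "original_filename": Path(path).name,
        "original_extension": Path(path).suffix,
        "size_bytes": st.st_size,
        "last_modified": _iso(st.st_mtime),
        "last_accessed": _iso(st.st_atime),
        # st_ctime is the creation time on Windows but the time of the last
        # metadata change on Unix, so it is labeled cautiously here.
        "created_or_status_changed": _iso(st.st_ctime),
    }


def log_transfer(paths, log_path, actions=()):
    """Append one JSON line per file, noting any preconditioning actions."""
    with open(log_path, "a", encoding="utf-8") as log:
        for path in paths:
            entry = capture_metadata(path)
            entry["preconditioning_actions"] = list(actions)
            entry["logged_at"] = datetime.now(timezone.utc).isoformat()
            log.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

A call such as `log_transfer(["report.DOC"], "transfer-log.jsonl", actions=["renamed for ingest"])` (hypothetical file names) would append a single entry. Capturing the dates before any copy or rename is the point of the exercise, since those later steps are what tend to alter them.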

Their ongoing questions concern whether the delivery of objects from the digital preservation system should include proof of the integrity and authenticity of the binary object through delivery of the associated metadata.

Tuesday, April 12, 2016

Digital Curation and the Public: Strategies for Education and Advocacy

Digital Curation and the Public: Strategies for Education and Advocacy. Jaime Mears, Mike Ashenfelder. The Signal. April 6, 2016.
     The Washington DC Public Library hosted Digital Curation and the Public: Strategies for Education and Advocacy, an event that included a tour of the Memory Lab, a public-facing digitization lab, and a workshop, Methods of promoting digital curation to the public, on reaching audiences by creating targeted promotional and educational material about digital preservation.
  • Case studies are incredibly effective,  "especially when the absence of a digital record proves why it should have been preserved". 
  • Other methods include train-the-trainer programs and creating engaging educational resources, such as the Activist’s Guide to Archiving Video.
  • Sometimes effectiveness is a matter of timing, such as waiting to contact people until they have enough material to care about preservation. 
  • Include preservation education in larger training sessions that address other needs.
  • Identify four or five communities to support, and identify the challenges and strategies for working with each community.
  • Digital content creators have to understand that preservation is a necessary part of effective life-cycle management and the long-term value of content.

Saturday, April 09, 2016

A DNA-Based Archival Storage System

A DNA-Based Archival Storage System. James Bornholt, et al. ACM International Conference. April 6, 2016.
    This paper presents an architecture for a DNA-based archival storage system. "Demand for data storage is growing exponentially, but the capacity of existing storage media is not keeping up." All data worldwide is expected to exceed 16 zettabytes in 2017. For some, using DNA as a storage medium is a possibility because it is extremely dense. Most data today is stored on magnetic and optical media, but storage durability is another critical aspect of archiving. Spinning disks are "rated for 3–5 years, and tape is rated for 10–30 years."

A DNA storage system must overcome several challenges:
  1. DNA synthesis and sequencing are far from perfect, with error rates on the order of 1% per nucleotide. Stored sequences can also degrade, compromising data integrity.
  2. Randomly accessing data in DNA-based storage results in read latency, and existing work requires that the entire DNA pool be sequenced and decoded.
  3. Current synthesis technology does not scale: data beyond the hundreds of bits cannot be synthesized as a single strand of DNA. Isolating only the molecules of interest is non-trivial.
The authors believe DNA storage is worth serious consideration and envision it as "the very last level of a deep storage hierarchy, providing very dense and durable archival storage with access times of many hours to days." It has the potential to be the ultimate archival storage solution because it is extremely dense and durable, but it is not practical yet due to the current state of DNA synthesis and sequencing.
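
To make the strand-length limit in challenge 3 concrete, here is a toy sketch of how data might be split across many short strands. It is not the encoding used in the paper: the two-bits-per-nucleotide mapping, the two-byte index prefix, the 16-byte chunk size, and the function names are all assumptions chosen for simplicity, and the sketch ignores synthesis and sequencing errors entirely.

```python
from typing import Iterable, List

BASES = "ACGT"  # assumption: naive mapping, 2 bits per nucleotide


def bytes_to_bases(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length that is a multiple of 4."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def encode(data: bytes, chunk_size: int = 16) -> List[str]:
    """Split data into short strands, each prefixed with a 2-byte index."""
    strands = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        strands.append(bytes_to_bases(index.to_bytes(2, "big") + chunk))
    return strands


def decode(strands: Iterable[str]) -> bytes:
    """Reassemble data from strands read back in any order."""
    chunks = {}
    for strand in strands:
        raw = bases_to_bytes(strand)
        chunks[int.from_bytes(raw[:2], "big")] = raw[2:]
    return b"".join(chunks[i] for i in sorted(chunks))
```

Because a pool of molecules has no inherent order, each strand carries its own index so that `decode` can reassemble the data however the strands are read back. Note that `decode` still reads every strand, which is the whole-pool sequencing cost described in challenge 2.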

Friday, April 08, 2016

CNI Spring 2016 Trip Report

2016-04-05: CNI Spring 2016 Trip Report. Michael L. Nelson. Web Science and Digital Libraries Research Group. April 5, 2016.
     The article is his trip report on the CNI meetings. A few of the items listed are:
1. "Digital Preservation of Federal Information Summit". Martin Halbert and Katherine Skinner discussed "...the topic of preservation and access to at-risk digital government information."

2. "Why We Need Multiple Archives". Michael L. Nelson, Herbert Van de Sompel. Slides.
  • Two Common Misconceptions About Web Archiving
    • Old content is obsolete, stale, bad
    • The Internet Archive has every copy of everything that has ever existed
  • There are other archives that may have the same or similar content. There may be a need to resolve conflicts between the content of the archives.
  • A single archive is vulnerable.
  • In the Web world, in order to monetize their content the copyright owner has to minimize the number of copies.
  • Archives aren’t magic web sites. They’re just web sites.
  • Don’t Throw Away the Original URL – Use Robust Links!
3.  "Defining the Scholarly Record for Computational Research", Victoria Stodden.  Presented the "Reproducible Research Standard".

4.  "Microservices Architecture: Building Scalable (Library) Software Solutions." Jason Varghese. A website provides a detailed discussion of the APIs they've implemented.

5. Storytelling for Summarizing Collections in Web Archives. Michael Nelson.

6. "Activist Stewardship: The Imperative of Risk in Collecting Cultural Heritage".  Todd Grappone, Elizabeth McAulay, Heather Briston.  They presented on the Digital Ephemera Project and, more generally, the role of archivists in collecting materials that may cause trouble. "The Digital Ephemera Project is an initiative to digitize, preserve and provide broad public access to print items, images, multimedia, and social networking resources produced world-wide."

Wednesday, April 06, 2016

Validating migration via emulation

Validating migration via emulation. Euan Cochrane. Digital Continuity Blog. Apr 07, 2016.
     "Automated migration of content between files of different formats can often lead to content being lost or altered." Verifying the migration of content is mostly a manual process, and when done for a large number of objects it is not cost-effective. A possible way to do this is to automatically migrate to preferred formats as much as possible and give users the option of working with the object in the “original” software through an emulation service. The users could look at both the migrated and emulated versions and verify that the migrated object is valid. By involving multiple users, the migrated object becomes a trusted object.

If this were done together with migration or emulation on demand, then validated digital objects could be separately ingested into a digital preservation system and preserved along with the original version. This could reduce the storage of migrated versions by "only preserving 'validated' migrated versions" and also ensure that trusted content was "available and properly preserved". 

Tuesday, April 05, 2016

Transforming User Knowledge into Archival Knowledge

Transforming User Knowledge into Archival Knowledge. Tarvo Kärberg, Koit Saarevet. D-Lib Magazine. March/April 2016.
     The Open Archival Information System (OAIS) defines long term preservation as the act of maintaining information independently understandable by a designated community. This can be very difficult to achieve in practice for a number of reasons:
  • the information may not have been sufficiently described/structured during pre-ingest or ingest
  • the producer organization may no longer exist at the time of archiving
  • the content may not have the desired quality level for submission
  • resources may not be available
There are three basic terms to distinguish: data, information and knowledge, though these may not be agreed definitions:
  • data: discrete facts without explicit relations
  • information: content that has relations; an aggregation of data
  • knowledge: pieces of information that are interconnected
A "reason to have a distinction between these terms is that it provides more structure and clarity to understanding the complexity of digital preservation." It provides a better overview of archival collections and better access solutions to the archived knowledge. Metadata is crucial for digital projects but it is time-consuming to create.  As with any metadata, the further away from the actual project it is created, the costlier it becomes to achieve, up to the point where it may be practically impossible to create. Archival institutions simply lack the staff to process all their vast holdings. Another overwhelming challenge is the depth and width of expertise required for enriching the descriptions. One possible solution is to "crowd source" the knowledge, since "the users and archivists together can be more knowledgeable about the archival materials than an archivist alone can be". Crowdsourcing allows the content descriptions to include a detailed level of granularity across a broad range of subjects and collections.

Monday, April 04, 2016

Minimal Effort Ingest

Minimal Effort Ingest. Bolette Ammitzbøll Jurik, Asger Askov Blekinge, Kåre Fiedler Christiansen. Statsbiblioteket, Danmark. March 29, 2016. Poster   Abstract
     The poster won best poster award at iPres2015. An expensive part of ingesting digital collections into repositories is the quality assurance, which traditionally happens before ingest.  This ensures that only data which complies with the repository data formatting and documentation standards is preserved.

With Minimal Effort Ingest, which is a different approach to ingest and Quality Assurance, the data is ingested as it is; quality assurance happens after ingest. This method makes it possible to ingest the content quickly, especially older collections. Quality assurance failures are handled within the repository. Preservation actions can be taken in the future as needed and as resources are available. "Repositories implementing Minimal Effort Ingest are eventually consistent, content- and preservation-wise, with the OAIS model."

"Performing preservation actions post-ingest on the repository content, rather than during ingest provides benefits in both development effort and preservation liability."

When our culture’s past is lost in the cloud.

When our culture’s past is lost in the cloud. Nicholas Carr. Washington Post. March 25, 2016.
     The article begins by referring to "the 'rough draught' of the Declaration of Independence. Over Thomas Jefferson’s original, neatly penned script ran edits by John Adams, Benjamin Franklin and other Founding Fathers. Words were crossed out, inserted and changed, the revisions providing a visual record of debate and compromise. A boon to historians, the four-page manuscript provides even the casual viewer with a keen sense of the drama of a nation being born." If the Declaration were composed today it would have been written on a computer and the edits, made electronically through email or a shared Internet file, would probably have been lost. It is likely "the digital file would come to be erased or rendered unreadable by changes in technical standards. We’d have the words, but the document itself would have little resonance."
  • Abby Smith Rumsey: “A physical connection between the present and past is wondrously forged through the medium of time-stained paper.” The “distinctive visceral connection” with history may be diminished or lost when these historical items are in databases rather than in actual objects.
  • As more and more of what we know, make and experience is recorded as vaporous bits in the cloud, what exactly will we leave behind for future generations?
  • Scientists are discovering that our senses and even our emotions play important roles in recollection and remembrance. 
  • Memory is a way to navigate and make sense of the world.
  • Nature embeds history in matter.
  • The technologies a society uses to record, store and share information will play a crucial role in determining the richness, or sparseness, of its legacy.  
  • In choosing among media technologies through the ages, people have tended to trade durability for transmissibility.
  • “Digital memory is ubiquitous yet unimaginably fragile, limitless in scope yet inherently unstable.” 
  • All media are subject to decay, of course. Clay cracks, paper crumbles. What’s different now is that our cultural memory is embedded in a complex and ever-shifting system of technologies. Any change in the system can render the record unreadable. 
  • If we’re not careful “the history of the twenty-first century will be riddled with large-scale blanks and silences.”
To protect our cultural legacy we "need to overcome our complacency and start taking the long-term protection of valuable data seriously. We’ll need a reinvigorated system of libraries and archives, spanning the public, private and nonprofit sectors, that are adept at digital preservation. We’ll need thoughtful protocols for determining what data needs to be saved and what can be discarded. And we’ll need to ensure that control over culturally significant data doesn’t end up in the hands of a small group of commercial enterprises that focus on profit, not posterity."