Showing posts with label workflows. Show all posts

Friday, December 14, 2018

In-House Digitization with the Lossless FFV1 Codec At the University of Notre Dame Archives. AMIA Poster

In-House Digitization with the Lossless FFV1 Codec At the University of Notre Dame Archives. Erik Dix and Angela Fritz, University of Notre Dame Archives. AMIA 2018. Poster. [pptx].
     An interesting poster at AMIA which shows their digitization workflow and processing steps from accessioning to preservation system.
WHY FFV1 as a codec for Digital Preservation Masters?
1. Lossless compression (no quality loss)
2. A Standard Definition FFV1 file is ca. 46 % of the size of the uncompressed file.
    A High Definition FFV1 file is ca. 57 % of the size of the uncompressed file.
3. FFV1 is part of the FFmpeg project and open source
4. It is safe for long term preservation.
5. Encoding into FFV1 can be done with low cost Windows PCs.
6. The video is captured in FFV1 in real time.
7. Standard definition FFV1 files can be played with the VLC media player
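
The size ratios in point 2 can be turned into a rough storage estimate. The sketch below is only an illustration: the function name and the "SD"/"HD" keys are made up for this example, and the ratios are the approximate figures quoted on the poster, so results are ballpark values.

```python
# Approximate FFV1 size as a fraction of the uncompressed size,
# taken from the poster (ca. 46 % for SD, ca. 57 % for HD).
FFV1_RATIO = {"SD": 0.46, "HD": 0.57}


def estimate_ffv1_size(uncompressed_gb: float, definition: str) -> float:
    """Return a rough FFV1 file size in GB for a given uncompressed size in GB."""
    if uncompressed_gb < 0:
        raise ValueError("uncompressed size must not be negative")
    if definition not in FFV1_RATIO:
        raise ValueError("definition must be 'SD' or 'HD'")
    return uncompressed_gb * FFV1_RATIO[definition]
```

For example, 100 GB of uncompressed standard definition video would come to about 46 GB as FFV1, and 100 GB of uncompressed high definition video to about 57 GB.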

Digitization Workflow

Accessioning as Processing:
  • Archives conducts a preliminary inventory, assigns collection code,  creates CMS record 
  • AV materials transferred to AV Archivist for a preservation and digitization assessment
  • Descriptive and technical metadata gathered
  • Analog materials reorganized and stabilized for long-term storage.
Basic Metadata Creation:
  • AV Archivist creates item–level metadata
  • Descriptive and technical metadata promotes access and discoverability
  • Descriptive metadata added to finding aid and uploaded to the Archives, the IR, and ILS
Inspection & Prep of A-V materials:
  • Only requested AV items or at-risk items will be digitized 
  • Videotapes often require baking or splicing
VCRs without SDI output:
  • The digitization capture card uses SDI [Serial Digital Interface] connections. 
  • VHS, Betamax, and older professional formats, e.g. 1” type C, U-matic don’t have SDI outputs. 
  • A DPS-575 frame synchronizer is used to create an SDI signal from the S-video or output of these items
  • Basic color correction is done at this step if necessary. 
  • The SDI signal from the frame synchronizer is then split in two to feed a Windows PC for the creation of the FFV1 preservation file and to feed a Mac computer to create an Apple ProRes 422 mezzanine file.
VCRs with SDI output:
  • They have VCRs for the DV tape family from Mini DV up to DVCPro HD, DVCam, and HDV, as well as the Betacam tape family from Betacam to HDCAM that can output an SDI or HD-SDI signal. 
  • The signal is also split in two to simultaneously create an FFV1 file on a Windows PC and an Apple ProRes 422 file on a Mac.
Digital Preservation System:
  • They use an LTO tape library for the storage of their digitized files. 
  • Currently, the Archives is evaluating digital preservation systems for implementation. 
  • Archives capabilities will be expanded to provide digital preservation micro-services to ensure continued access to its digital collections.


Thursday, November 15, 2018

Announcing the Digital Processing Framework

Announcing the Digital Processing Framework. Erin Faulder, et al. bloggERS! November 13, 2018.   [PDF]
     The Digital Processing Framework suggests a minimum processing method for digital archival content. The framework brings together archival processing practice and digital preservation activities. The intention is to  promote consistent practices and to establish common terminologies.  A few of the 23 framework activities are: 
• Survey the collection
• Capture digital content off physical media
• Create checksums for transfer, preservation, and access copies
• Determine level of description
• Identify restricted material based on copyright/donor agreement
• Gather metadata for description
• Organize electronic files according to intellectual arrangement
• Perform file format analysis
• Identify deleted/temporary/system files
• Manage personally identifiable information (PII) risk
• Normalize files
There is a reusable Excel version of the framework as well. The framework is for people who "process born digital content in an archival setting and are looking for guidance in creating processing guidelines and making level-of-effort decisions for collections."  It was designed to be practical, usable, and adaptable to local institutional settings.
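
As an illustration of the checksum activity in the list above, here is a minimal sketch that records a SHA-256 checksum for every file in a directory and later reports files that are missing or changed. It is not part of the framework; the function names are invented for this example and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are now missing or have a different checksum."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A manifest built at transfer time can be checked again after the files are copied to preservation storage; an empty list from `changed_files` means every recorded file is still present and unchanged.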

Thursday, November 30, 2017

Libraries and the Preservation of Cultural Heritage: Building on the Past to Develop Our Future

Libraries and the Preservation of Cultural Heritage: Building on the Past to Develop Our Future. Chris Erickson. XXVII Congresso Brasileiro de Biblioteconomia, Documentação e Ciência da Informação. Fortaleza, Brazil. October 19, 2017.
     A few weeks ago I had the wonderful opportunity to visit Brazil and give this presentation on digital preservation at the Congresso Brasileiro de Biblioteconomia, Documentação e Ciência da Informação. Many thanks to Adriana Cybele Ferrari (FEBAB) and Anderson de Santana. The presentation provides an overview of digital preservation concepts as well as our approach to preservation and our workflow.

Wednesday, October 11, 2017

Digital Preservation, Eh?

Digital Preservation, Eh? Alexandra Jokinen. bloggERS! February 14, 2017.    
     This is a post about international perspectives on digital preservation and about digital preservation in an institution in Canada. One way they are working on digital preservation, which they see as a very large, very complex (but exciting!) endeavour, is to "start on a small scale, focusing on the processing of digital objects within a single collection, and then using those experiences to create documentation and workflows for different aspects of the digital archives program."  They chose one collection to start with, and the first area of focus was appraisal. Their next step will be to physically organize the material, and the final steps will be to take the born-digital content that has been collected and create Archival Information Packages for storage and preservation with Archivematica. They want to "ensure that solid policies and procedures are in place for maintaining a trustworthy digital preservation system in the future."

Thursday, August 03, 2017

Library Preservation Workflows: Importing, Exporting, and Managing Content

Library Preservation Workflows: Importing, Exporting, and Managing Content. Chris Erickson. June 12, 2017. [PDF slides]
     This is my presentation at the Eighth Annual Rosetta Advisory Group meeting held in June at the wonderful University of Sheffield. This is my favorite conference because of the attendees, the topics discussed, the interaction with the Ex Libris employees who attend, and the many things I learn about digital preservation and Rosetta, in the Advisory meeting and in the accompanying Rosetta Users Group. I hate to see this conference end.

The short presentation is a view of some of the ongoing changes and refinements we have made to our digital preservation workflow in the past year. We have worked to streamline our processes, both because of the increased volume of content we ingest into Rosetta, and also the desire to minimize the file movement and copying during the processing. In addition, we have used our preservation repository to recover documents in our access systems that became unavailable.

During the year we have updated our digital preservation policies to help determine our preservation selection workflow. They include:
The changes have helped with a smoother transition from selection and acquisition, processing and SIP creation and submission to Rosetta, and the preservation disposition.


Wednesday, August 02, 2017

Digital Preservation Workflow Curriculum

Digital Preservation Workflow Curriculum. Mary Molinaro. DPN, AVPreserve. August 2, 2017.
     DPN and AVPreserve have developed a "digital preservation workflow curriculum to share with DPN members and others in the digital preservation community". This workshop curriculum, released with a Creative Commons license, will provide participants with skills and knowledge to implement and manage a digital preservation program within their organization. They ask that the terms of the CC-BY-SA license be observed.

The workshop modules show the requirements of a digital preservation ecosystem from the viewpoints of governance / program management, as well as asset management. This is not an introduction to digital preservation or the OAIS model; instead it looks at the 'why' and 'how' questions of "making digital preservation an underlying, operational function of an organization". The curriculum, which is available here in a zip file, consists of:


Tuesday, March 28, 2017

Thumbs.db – what are they for and why should I care?

Thumbs.db – what are they for and why should I care? Jenny Mitcham. Digital Archiving at the University of York. 7 March 2017.
     Post about the thumbs.db system files and how to deal with them in an archival situation. Windows uses a file called Thumbs.db to store thumbnail images of any images within a directory, and a thumbs.db file is stored in each directory that contains images. They proliferate quickly. To see these and other hidden files, the Windows Explorer preferences must be set to display hidden files, and the "Hide protected operating system files" option must also be disabled.  IT can change account options to stop these thumbnail files from being created.

"Do I really want these in the digital archive? In my mind, what is in the ‘original’ folders within the digital archive should be what OAIS would call the Submission Information Package (SIP). Just those files that were given to us by a donor or depositor. Not files that were created subsequently by my own operating system."

[In our data ingest workflow, we use a utility that creates a csv file of items in directories for processing. The csv file is the ingest template which contains the file names and file metadata. This controls the files that are ingested. Unwanted files are removed from the csv file, which means that during ingest time, they are excluded from being ingested into Rosetta. - Chris]
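
A rough sketch of the idea in the note above: walk a directory, leave out unwanted system files, and write the remaining files to a csv. This is not the utility described in the note. The column names, the function name, and the exclusion set are assumptions made for this example, and a real ingest template would follow the preservation system's own layout.

```python
import csv
from pathlib import Path

# File names to leave out of the template, compared case-insensitively.
# Only Thumbs.db comes from the post; extend the set as local policy requires.
EXCLUDED_NAMES = {"thumbs.db"}


def write_ingest_template(root: Path, csv_path: Path, excluded=EXCLUDED_NAMES) -> int:
    """Write one csv row per wanted file under root and return the row count."""
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name.lower() in excluded:
            continue
        rows.append(
            {
                "file_name": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
            }
        )
    # Rows are collected before the csv is opened, so the csv never lists itself.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file_name", "size_bytes"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Because the ingest is driven by the csv rather than by the directory contents, anything filtered out here never reaches the repository, while the original folder stays untouched.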

Wednesday, March 15, 2017

Developing a Digital Preservation Infrastructure at Georgetown University Library

Developing a Digital Preservation Infrastructure at Georgetown University Library. Joe Carrano, Mike Ashenfelder. The Signal. March 13, 2017.
     At the library of Georgetown University, half of the library IT department is focused on digital services such as digital publishing, digitization and digital preservation. These IT and library functions overlap and support each other, which creates a need for the librarians, archivists and IT to work together. It provides better communication and makes it easier to get things done. "Often it is invaluable to have people with a depth of knowledge from many different areas working together in the same department. For instance, it’s nice to have people around that really understand computer hardware when you’re trying to transfer data off of obsolete media." 

While digital preservation and IT are centered in one department, the preservation files are in different systems and on different storage media throughout the library, but they are in the process of putting them into APTrust.  Several strategies to improve their digital preservation management are:
  1. Implement preservation infrastructure, including a digital-preservation repository
  2. Develop and document digital-preservation workflows and procedures
  3. Develop a training program and documentation to help build skills for staff
  4. Explore and expand collaborations with both university and external partners to increase the library’s involvement in regional and national digital-preservation strategies.
These goals build upon each other to create a sustainable digital-preservation framework which includes APTrust and the creation of tools to manage and upload the content, particularly creating  custom automated solutions to fit their needs. They are also developing documentation and workflows so any staff member can "upload materials into APTrust without much training".

Librarians and archivists need to be trained and integrated into the process to ensure the sustainability of the project’s outcome and to speed up the ingest rate. "Digital curation and preservation tasks are becoming more and more commonplace and we believe that these skills need to be dispersed throughout our institution rather than performed by only a few people". 

"By the end of this process we hope to have all our preservation copies transferred and the infrastructure in place to keep digital preservation sustainable at Georgetown."

Monday, March 13, 2017

What Makes A Digital Steward: A Competency Profile Based On The National Digital Stewardship Residencies

What Makes A Digital Steward: A Competency Profile Based On The National Digital Stewardship Residencies. Karl-Rainer Blumenthal, et al. Long paper, iPres 2016. (Proceedings p. 112-120 / PDF p. 57-61).
     Digital stewardship is the active, long-term management of digital objects with the intent to preserve them for long-term access. Because the field is relatively young, there is not yet "sufficient scholarship performed to identify a competency profile for digital stewards". A profile details the specific skills, responsibilities, and knowledge areas required, and this study attempts to describe a competency profile for digital stewards by using a three-pronged approach:
  1. reviewing literature on the topics of digital stewardship roles, responsibilities, expected practices, and training needs
  2. qualitatively analyzing current and completed project descriptions
  3. quantitatively analyzing the results from a survey that identified the competencies needed to successfully complete projects
"This study had two main outputs: the results of the document analysis (qualitative), and the results of the survey (quantitative)."  Seven coded categories of competence emerged from the analysis:
  1. Technical skills;
  2. Knowledge of standards and best practices;
  3. Research responsibilities;
  4. Communication skills;
  5. Project management abilities;
  6. Professional output responsibilities; and
  7. Personality requirements.
Based on the responses for Very important and Essential, a competency statement representing this profile would suggest that "effective digital stewards leverage their technical skills, knowledge of standards and best practices, research opportunities, communication skills, and project management abilities to ensure the longterm viability of the digital record." They do this by:
  • developing and enhancing new and existing digital media workflows
  • managing digital assets
  • creating and manipulating asset metadata
  • committing to the successful implementation of these new workflows
  • managing both project resources and people
  • soliciting regular input from stakeholders
  • documenting standards and practices
  • creating policies, professional recommendations, and reports
  • maintaining current and expert knowledge of standards and best practices for metadata and data management
  • managing new forms of media
The study suggests that, in practice, technical skills are not always as essential in digital stewardship as job postings suggest. Hardware/software implementation and Qualitative data analysis skills were important to only half of the respondents. Workflow management is a universally important skill deemed "Essential" by almost all respondents. Other categories appeared as Somewhat Important, or as areas that need further research.

The study suggests that "although specific technical skills are viewed as highly important in different settings, a much larger majority of projects required skills less bound to a particular technology or media, like documentation creation and workflow analysis."  Digital stewards should possess not only a deep understanding of their field, but also the ability to "effectively disseminate their work to others."

Thursday, December 01, 2016

Implementing Automatic Digital Preservation for a Mass Digitization Workflow

Implementing Automatic Digital Preservation for a Mass Digitization Workflow. Henrike Berthold, Andreas Romeyke, Jörg Sachse.  Short paper, iPres 2016.  (Proceedings p. 54-56 / PDF p. 28-29). 
     This short paper describes their preservation workflow for digitized documents and the in-house mass digitization workflow, based on the Kitodo software, and the three major challenges encountered.
  1. validating and checking the target file format and the constraints to it,
  2. handling updates of content already submitted to the preservation system, 
  3. checking the integrity of all archived data in an affordable way
They produce several million scans a year and preserve these digital documents in their Rosetta-based archive, which is complemented by a submission application for pre-ingest processing, an access application that prepares the preserved master data for reuse, and a storage layer that ensures the existence of three redundant copies of the data in the permanent storage and a backup of data in the processing and operational storage. They have customized Rosetta operations with plugins they developed.  In the workflow, the data format of each file is identified and validated, and technical metadata are extracted. AIPs are added to the permanent storage (disk and LTO tapes). The storage layer, which uses hierarchical storage management, creates two more copies and manages them.

To ensure robustness, only single page, uncompressed TIFF files are accepted. They use the open-source tool checkit-tiff to check files against a specified configuration. To deal with AIP updates, files can be submitted multiple times: the first time is an ingest, all transfers after that are updates. Rosetta ingest functions can add, delete, or replace a file. Rosetta can also manage multiple versions of an AIP, so older versions of digital objects remain accessible for users.

They manage three copies of the data, which totals 120 TBs. An integrity check of all digital documents, including the three copies, is not feasible due to the time that is required to read all data from tape storage and check them. So to get reliable results without checking all data in the archive they use two different methods:

  • Sample method: the integrity of a 1% sample of the archival copies is checked yearly 
  • Fixed bit pattern: a workflow with a specified fixed bit pattern is checked quarterly.
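
The sample method can be sketched as follows. The paper does not say how the sample is drawn or which checksum is used, so the random selection, the SHA-256 algorithm, the manifest layout (relative path mapped to checksum), and all names below are assumptions for illustration only.

```python
import hashlib
import math
import random
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pick_sample(names, fraction: float = 0.01, seed=None) -> list[str]:
    """Randomly pick a fraction of the names, always at least one if any exist."""
    names = sorted(names)
    if not names:
        return []
    size = min(len(names), max(1, math.ceil(len(names) * fraction)))
    return random.Random(seed).sample(names, size)


def check_sample(root: Path, manifest: dict[str, str], fraction: float = 0.01, seed=None) -> list[str]:
    """Verify only the sampled files and return those that are missing or altered."""
    failures = []
    for name in pick_sample(manifest, fraction, seed):
        path = root / name
        if not path.is_file() or sha256_of(path) != manifest[name]:
            failures.append(name)
    return sorted(failures)
```

Only the sampled files are read, which is the point of the method: the cost of a yearly check scales with the sample size rather than with the full 120 TB held on tape.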

Their current challenges are in developing new media types (digital video, audio, photographs and pdf documents), unified pre-ingest processing, and automation of processes (e.g. to perform tests of new software versions).


Tuesday, June 28, 2016

Protecting the Long-Term Viability of Digital Composite Objects through Format Migration

Protecting the Long-Term Viability of Digital Composite Objects through Format Migration. Elizabeth Roke, Dorothy Waugh. iPres 2015 Poster. November, 2015.
     The poster discusses work done at Emory University’s Manuscript, Archives, and Rare Book Library to "review policy on disk image file formats used to capture and store digital content in our Fedora repository". The goal was to migrate existing disk images to formats more suitable for long-term digital preservation. Trusted Repositories Audit & Certification (TRAC) requires that digital repositories monitor changes in technology in order to respond to them. Advanced Forensic Format offered a good solution for capturing forensic disk images along with disk image metadata, but Libewf by Joachim Metz, which is a library of tools to access the Expert Witness Compression Format (EWF), has replaced it. They have decided to acquire raw disk images or, when that is not possible, to use tar files, because the disk images may be less vulnerable to obsolescence.

In attempting to migrate formats, they had to develop methods for migrating the files and set up the repository to accept the new files. They also rely on PREMIS metadata.  The migration of disk images from a proprietary or unsupported format to a raw file format has made it easier for them to manage and preserve these objects and mitigates the threat of obsolescence for the near term. There have been some consequences. Some metadata is no longer available. Also, the process will be more complicated and require other workflows, and files will no longer contain embedded metadata. "The migration to a raw file format has made the digital file itself easier to preserve."



Friday, April 22, 2016

Providing Access to Disk Image Content: A Preliminary Approach and Workflow

Providing Access to Disk Image Content:  A Preliminary Approach and Workflow. Walker Sampson, Alexandra Chassanoff. iPres 2015. November 2015.   Abstract    Poster
     The paper describes a proposed workflow that can be used by collecting institutions acquiring disk images to support the capture, analysis, and final access to disk image content of born-digital collections. The materials present certain challenges. Some use open-source digital forensics software environments like BitCurator, for the capture and analysis of these born-digital materials.

The workflow is for the research archives at the University of Colorado Boulder; they do not have a digital repository or collection management software deployed. However it "addresses the immediate needs of the material, such as bit-level capture and triage, while remaining flexible enough to have the outputs integrate with a future digital repository and collection management software." It allows researchers to access a bit-level copy of a floppy disk found in an archival collection. Access is typically regarded as the last milestone of processing work.

The workflow for processing born-digital materials starts with obtaining the physical disk; it is photographed, and then a disk image is created. The BitCurator Reporting Tool generates analytic reports, and other programs can be run at this point as well. The total output from BitCurator is placed into a single BagIt package and uploaded to a managed storage space with redundant copies. That will be the AIP in a future repository. The disk image can provide access to the public.
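
To show what a BagIt package looks like on disk, here is a minimal sketch that copies a directory of outputs into a bag with a payload directory, a SHA-256 payload manifest, and a bag declaration. It follows the basic layout of BagIt 1.0 (RFC 8493) and is not the tool or the exact steps used in the workflow; the function names are invented for this example, and optional tag files such as bag-info.txt are left out.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks, since disk images can be large."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the two required bag files."""
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # bag_dir/data must not exist yet

    # Payload manifest: one "checksum  relative/path" line per payload file.
    lines = [
        f"{sha256_of(path)}  {path.relative_to(bag_dir).as_posix()}\n"
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    ]
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")

    # Bag declaration required by the specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    return bag_dir
```

Since the manifest travels with the payload, the receiving storage space can verify the disk image and reports after upload without any outside record of their checksums.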

Wednesday, March 16, 2016

File identification ...let's talk about the workflows

File identification ...let's talk about the workflows. Jenny Mitcham. Digital Archiving at the University of York. 27 November 2015.
     When adding files to a digital archive, an important question is "What file formats have we got here?" Knowing this can:
  • determine the right software to open the file and view the contents 
  • start the conversation with the data provider about what formats are best to use for archiving
  • discuss the risks of the format and define a migration pathway for preservation and/or access
There are many tools for working with formats; each tool has strengths and weaknesses. Defining a workflow can help determine how best to use these tools, how to interact with them, or if manual steps should be taken instead. File identification tools are often incorporated into digital preservation systems that may determine the workflow in using the tools. Additional workflow questions around format tools include:
  • what should happen if ingested data can't be identified?  
  • should the curator/digital archivist be able to over-ride file identifications?
  • what should happen if there is more than one possible identification for a file?
  • is there a sustainable manual identification process if tools cannot identify a file? 
  • how to contribute to file format registries such as PRONOM
  • is the digital preservation system configurable enough to resolve these questions? 
Their Archivematica development work is focusing in the first instance on allowing the digital curator to see a report of the files that are not identified in order to understand the problem.
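
A toy version of such a report of unidentified files is sketched below. It is not how Archivematica or PRONOM-based tools work internally: it matches only the leading bytes of each file against a tiny, hand-written signature table, and the table, labels, and function names are all assumptions for this example.

```python
from pathlib import Path

# A few well-known leading byte signatures. Real tools use much larger
# registries and more elaborate matching rules.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify(path: Path) -> list[str]:
    """Return every format label whose signature matches the start of the file."""
    with path.open("rb") as handle:
        header = handle.read(16)
    return [label for magic, label in SIGNATURES if header.startswith(magic)]


def unidentified_files(root: Path) -> list[str]:
    """List files under root that match no signature, for manual review."""
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and not identify(path)
    )
```

`identify` returns a list rather than a single label so that the case of more than one possible identification, raised in the questions above, stays visible to the curator instead of being resolved silently.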

[Our Rosetta system has a format library that handles these questions, as well as a user driven Format Working Group that helps resolve questions and interacts with PRONOM if there are questions, changes or new additions. - Chris]

Wednesday, March 09, 2016

The Human Face of Digital Preservation: Organizational and Staff Challenges and Initiatives at the Bibliothèque nationale de France

The Human Face of Digital Preservation: Organizational and Staff Challenges and Initiatives at the Bibliothèque nationale de France.  Emmanuelle Bermès, Louise Fauduet.  iPres, October 2009.  [Video, full paper, slides]
     Great presentation. The National Library has been working for several years on their digital preservation efforts, with the SPAR project (Système de Préservation et d'Archivage Réparti). They are looking at the human aspects of their digital projects. The library has become digital as a whole, which was a major change. Earlier the digital library was treated differently from the rest of the library. Originally the digital side was led by experts or early adopters who were learning by doing and were organized separately from the rest of the library workflows. Digital definitely meant "different". Now the library has become digital, which means they have regular production teams running operational projects for digital content and digital preservation.  All organizations within the library are involved in these tasks and there are formal training processes. Part of this change was a large increase in the scale of content digitized, along with tying digital activities closely to traditional library skills. If you want to disseminate the digital activities throughout the library, you have to disseminate the tasks and the people as well. You have to take the time to train everyone and move slowly to bring all people along, or you leave people behind.

Many of the digital activities can be related to the other workflows, like ingest and acquisitions, metadata, cataloging, etc. Relate traditional librarian skills to digital skills; digital can be built on traditional library knowledge. Help integrate the digital by having common projects that people can work on together. It is important to take time to stop and look back at what has been done and talk about it. It is difficult to take advantage of what has been done if you don't review it. The library took time to move the conceptual OAIS model into the reality of the library organization, to decide how it really fits, define roles, and decide how the employees would interact.

The Library created a digital training curriculum and opened it up to everyone in the library to learn. "This training includes an introduction to digital libraries, digital documents, and digital preservation, and then three optional tutorials, one on metadata and protocols (including semantic web technologies), one on user oriented design (including usage studies, accessibility and usability, and the Web 2.0), and one on digitization and preservation (including rights management, preservation metadata and persistent identifiers)." They had to draw a line between those who wanted the training for their job and those who were just curious. They started a project about organizations and human resources under digital influence to better understand the digital library and their people. Six wishes:
  1. Clarify the institution’s policy and digital strategic vision
  2. Define priorities 
  3. Define what a digital collection is
  4. Facilitate transverse workflows that span easily across organizational borders
  5. Develop digital skills
  6. Analyze job qualifications, revisiting job requirements and updating staff skills

Friday, February 05, 2016

Developing a Born-Digital Preservation Workflow

Developing a Born-Digital Preservation Workflow. Jack Kearney, Bill Donovan. April 8, 2014.
     Presentation that looks at developing a systematic approach to preserving born-digital collections. The example from Boston College is the Mary O’Hara papers. This was an opportunity for a collaborative project involving the Digital Libraries, Archives, and the Irish Music Center.
Important elements of the workflow:
  • Chain of Custody
  • Digital Forensics
  • Computed initial checksums
  • File/folder names
  • Local Archival Copies, Distributed Digital Preservation
“Digital forensics focuses on the use of hardware and software tools to collect, analyze, interpret, and present information from digital sources, and ensuring that the collected information has not been altered in the process.” The presentation describes specific steps and procedures for avoiding alteration of the information, including multiple copies and write blockers. In working with external drives, they would build and output an inventory taken with this Unix command:
     find directory-name -type f -exec ls -l {} \; > c:\data\MOH\inventory.txt

Local conventions regarding naming files and folders:
  • Use English alphabet and numbers 0 - 9
  • Avoid punctuation marks other than underscores or hyphens.
  • Do not use spaces.
  • Limit file/folder names to 31 characters, including the 3-character extension. Prefer shorter names.
  • Decision: They may remediate folder and file names, but only for the working copies.
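
These conventions are easy to check automatically. The sketch below is one possible reading of the rules, not the presenters' tool: it assumes a single period is allowed only to separate the extension, and the function name and messages are made up for this example.

```python
import re

MAX_LENGTH = 31  # includes the extension, per the convention above

# Letters, digits, underscores, and hyphens, with one optional ".ext" suffix.
# Allowing exactly one period before the extension is an assumption.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def name_problems(name: str) -> list[str]:
    """Return the conventions a file or folder name breaks (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("contains characters outside A-Z, a-z, 0-9, _ and -")
    return problems
```

In line with the decision above, a check like this would be run against the working copies only, so that any renaming never touches the files as received.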
They also look for files that need actions taken:
  • Files that are off-limits or expendable, such as system files
  • Personally Identifiable Information (PII)
  • Unsupported Formats (Can normalize using Xena)
They also use a variety of tools, such as FITS and JHOVE.
Important to keep track of digital preservation actions:
  • File migrations
  • Obsolete file formats
  • Proprietary file formats
  • Metadata changes

Saturday, January 23, 2016

Exactly: A New Tool for Digital File Acquisitions

Exactly: A New Tool for Digital File Acquisitions. AVPreserve News. January 13, 2016.
     A new tool, Exactly, has been developed to help to acquire born digital content from donors and to start establishing provenance and fixity early in the acquisition process. The tool:
  • can remotely and safely transfer any born-digital material from a sender to a recipient 
  • uses the BagIt File Packaging Format
  • supports FTP and network transfers
  • can be integrated into sharing workflows using Dropbox or Google Drive
  • allows metadata templates to be created for the sender to fill out before submission
  • can send email notifications with transfer data and manifests when files have been delivered