Wednesday, March 16, 2016

File identification ...let's talk about the workflows

File identification ...let's talk about the workflows. Jenny Mitcham. Digital Archiving at the University of York. 27 November 2015.
When adding files to a digital archive, an important question is "What file formats have we got here?" Knowing the answer helps to:
  • determine the right software to open the file and view the contents
  • start the conversation with the data provider about what formats are best to use for archiving
  • assess the risks of the format and define a migration pathway for preservation and/or access
There are many tools for working with formats; each tool has strengths and weaknesses. Defining a workflow can help determine how best to use these tools, how to interact with them, or whether manual steps should be taken instead. File identification tools are often built into digital preservation systems, which may themselves dictate the workflow for using the tools. Additional workflow questions around format tools include:
  • what should happen if ingested data can't be identified?  
  • should the curator/digital archivist be able to over-ride file identifications?
  • what should happen if there is more than one possible identification for a file?
  • is there a sustainable manual identification process if tools cannot identify a file? 
  • how should new findings be contributed to file format registries such as PRONOM?
  • is the digital preservation system configurable enough to resolve these questions? 
The University of York's Archivematica development work is focusing, in the first instance, on allowing the digital curator to see a report of the files that are not identified, in order to understand the problem.
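
As a rough illustration of the workflow questions above, here is a minimal Python sketch that sorts identification results into identified, ambiguous, and unidentified files, so that a curator can review a report of the problem cases. It is not how Archivematica or any other system implements this. The FileRecord class, the triage function, and the format labels are all hypothetical, and the rule that a curator override takes precedence over the tool's output is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileRecord:
    """One ingested file and what is known about its format (hypothetical structure)."""
    path: str
    # Format identifiers reported by an identification tool; may be empty or have several entries.
    candidates: List[str] = field(default_factory=list)
    # Identification set manually by a curator, if any (assumed to take precedence).
    override: Optional[str] = None


def triage(records: List[FileRecord]) -> Dict[str, List[str]]:
    """Group file paths by identification outcome for a curator-facing report."""
    report: Dict[str, List[str]] = {"identified": [], "ambiguous": [], "unidentified": []}
    for record in records:
        if record.override is not None or len(record.candidates) == 1:
            # Exactly one answer from the tool, or a curator decision.
            report["identified"].append(record.path)
        elif len(record.candidates) > 1:
            # More than one possible identification: needs a decision.
            report["ambiguous"].append(record.path)
        else:
            # No identification at all: candidate for manual investigation.
            report["unidentified"].append(record.path)
    return report
```

Calling triage on a list of records returns a dictionary whose "unidentified" and "ambiguous" lists are the files a digital archivist would need to look at by hand, and possibly feed back to a format registry.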

[Our Rosetta system has a format library that handles these questions, as well as a user-driven Format Working Group that helps resolve them and works with PRONOM on any questions, changes or new additions. - Chris]
