Saturday, March 26, 2016

Caring for file formats

Caring for file formats. Ange Albertini. Presentation at Troopers 2016. March 17, 2016. [PDF]
The risk to preserving digital objects is very high, and the "attack surface with file formats is too big". Format specifications are a nice guide, but they don't represent reality, so they are useless for managing the formats. "We can’t deprecate formats because we can’t preserve and we can’t define how they really work."
Formats need good documentation that shows the landscape and is able "to express the reality of file format". Once formats are better understood, "we can preserve and deprecate older format, which reduces attack surface". People can then focus on making current formats more secure.

What is a file format? It is a computer dialect for communicating between communities, so file formats are community connectors. People don't care about the format itself. They care about its characteristics and how easy it is to use. We don't need new formats, since reality will diverge from the specs anyway. What we need are up-to-date, traceable specs. Formats are constantly updated with new features, but that doesn't solve the problem. Specs should reflect reality and be "updated, enforced, realistic, freely available". Deprecation is a natural cycle, but we are afraid to deprecate because "no file format is fully preserved". Formats should be open and their specs kept up to date. But that won't happen until "we experience a great disaster".
