Thursday, May 07, 2015

Top 50 file formats in the KB e-Depot

Top 50 file formats in the KB e-Depot. Johan van der Knijff. KB Research. April 29, 2015.
The Koninklijke Bibliotheek’s digital repository system does not yet include any tools for automated file format identification, but an analysis of the “top 50” most prevalent file extensions shows:
  • The total number of unique extensions is no less than 1163
  • The most prevalent extension is .gif
  • A long tail of extensions exists (fewer than 10 file objects each), which accounts for over half of all unique extensions.
The top 10 extensions are:
  1. gif
  2. xml
  3. jpg
  4. sml
  5. pdf
  6. raw
  7. tif
  8. oa3
  9. doc
  10. htm
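The kind of extension tally summarized above can be sketched in a few lines of Python. This is only an illustration, not the method used at the KB: the function name extension_stats, its parameters, and the sample file names are all hypothetical, and the long tail is assumed to mean extensions with fewer than 10 file objects.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_stats(file_names, top_n=10, tail_threshold=10):
    """Tally file extensions (hypothetical helper, not the KB's tooling).

    Returns the number of unique extensions, the top_n most common ones,
    and the share of unique extensions that occur fewer than
    tail_threshold times (the "long tail").
    """
    # Normalise each extension: lower-case, no leading dot.
    # Files without an extension are counted under "(none)".
    counts = Counter(
        PurePosixPath(name).suffix.lower().lstrip(".") or "(none)"
        for name in file_names
    )
    tail = [ext for ext, n in counts.items() if n < tail_threshold]
    return {
        "unique": len(counts),
        "top": counts.most_common(top_n),
        "tail_share": len(tail) / len(counts) if counts else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample names, for illustration only.
    sample = ["scan1.gif", "scan2.GIF", "meta.xml", "photo.jpg", "README"]
    stats = extension_stats(sample, top_n=3, tail_threshold=2)
    print(stats["unique"], stats["top"], stats["tail_share"])
```

Note that an extension is only a hint at the actual format, which is why a tally like this is no substitute for automated format identification.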
All extensions were grouped into 12 format categories, and the PCs in the reading rooms were checked to see whether they could render the objects. The conclusion was that most formats in the “Top 50” were sufficiently accessible, although some improvements were needed.
