Monday, March 09, 2020

The Future of Past Email is PDF

The Future of Past Email is PDF. Chris Prom. Information and Data Manager (IDM). March 6, 2020.
     The article reports on a working group that addresses the question: How should governments, universities, businesses, and archives ensure that future generations can access and render email? The group examines ways to capture, preserve, and render email. Its work builds on an earlier report:
The Future of Email Archives: A Report from the Task Force on Technical Approaches for Email Archives. CLIR Publications. August 2018. [PDF]
  • Email is an increasingly important part of the historical record, yet it is particularly difficult to preserve, putting future access to this vast resource at risk. The report looks at what makes email archiving so complex and describes emerging strategies to meet the challenge.
  • Addressing the challenges will require commitment from stakeholders, as well as tool support, testing, and development.
Some institutions preserve email in MBOX, EML, or PST format; maintain or emulate old email environments; or transform messages to XML. All of these approaches require a high level of technical support. Others simply store email archives.

The group suggests the PDF format could be used for email, though there are gaps and risks.
  • PDF includes data structures that could fully accommodate the diversity of email content and metadata. PDF is completely self-contained and designed to capture text and graphical content for archival purposes.
  • Email-to-PDF provides a migration pathway for email messages independent of email applications and could preserve essential attributes of the message.
  • A standardized application of PDF technology could provide source data, universally usable archival-quality renderings including attachments, and provenance metadata.
  • It could use existing standards and a diverse vendor community for preserving, searching and reusing email.
  • Using PDF could integrate with existing preservation tools for ingesting, storing, preserving and disseminating content from established repository systems already in use in government, academic, public, and corporate archives and libraries.
  • Since the PDF format is so widely implemented, there would already be a common understanding of best practices for archiving email with PDF.
"In short, the 'email archiving in PDF' concept seeks to build on widely implemented standards and technologies. It would allow individuals and institutions a pathway to migrate email into the most widely used format for the distribution of text documents."

Currently there is a drawback to using PDF for email preservation: "attachments, metadata, context, and sometimes, even searchable text are missing. Simply 'printing to PDF' fails to meet the specific needs of institutions archiving volumes of complex email messages, at least as currently implemented." So how can "institutions ensure authenticity, completeness, privacy, security and other needs, especially when working with thousands or millions of messages, when most header metadata and attachments are lost in the conversion?"
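
To make the gap concrete, here is a minimal sketch (not from the article) of the information an email message carries beyond its visible body, and which a plain "print to PDF" would have to account for. It uses only Python's standard `email` package. The sample message, the addresses, and the function names `build_sample` and `inventory` are hypothetical and exist only for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical attachment payload used by the sample message below.
SAMPLE_ATTACHMENT = b"col1,col2\n1,2\n"


def build_sample() -> bytes:
    """Build a small, made-up message with one attachment (illustration only)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Quarterly report"
    msg.set_content("Report attached.")
    msg.add_attachment(
        SAMPLE_ATTACHMENT,
        maintype="application",
        subtype="octet-stream",
        filename="report.csv",
    )
    return msg.as_bytes()


def inventory(raw: bytes) -> dict:
    """List the headers, plain-text body, and attachments of a raw message.

    These are the pieces an archival conversion would need to carry over;
    this function only reports them, it does not produce a PDF.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    headers = [(name, str(value)) for name, value in msg.items()]
    attachments = [
        {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "size": len(part.get_payload(decode=True) or b""),
        }
        for part in msg.iter_attachments()
    ]
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {"headers": headers, "body": body, "attachments": attachments}


if __name__ == "__main__":
    report = inventory(build_sample())
    for name, value in report["headers"]:
        print(f"{name}: {value}")
    for item in report["attachments"]:
        print(item)
```

An inventory like this could be compared against a converted PDF to check whether the headers and attachments survived, though how the group's requirements define completeness is not covered here.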

The group identified and documented the essential characteristics and technical requirements for converting email into PDF, which will soon be published as a set of fundamental requirements for archiving email.
