Monday, August 31, 2015

Beyond TIFF and JPEG2000: PDF/A as an OAIS Submission Information Package Container

Beyond TIFF and JPEG2000: PDF/A as an OAIS Submission Information Package Container. Yan Han. Library Hi Tech. 2015.
     PDF/A can be used as a file format, but it can also be used as an OAIS SIP container. The PDF/A open standards can "simplify digitization process, reduce digitization cost, improve production substantially and build more confidence for preservation and access." PDF/A can also be used as an Archival Information Package container.

The three main goals of PDF/A are to:
  • provide a way to present the appearance of documents independent of the tools and systems used
  • provide a framework for recording the context and history of electronic documents in the metadata
  • define a framework for representing the logical structure of electronic documents within conforming files

A typical SIP may consist of a directory containing the following information:
  • Content: 
    • Preservation master files (such as TIFF image files). 
    • Access files (such as PDF or JPG / JPEG2000 files).
    • Other content (such as OCR data).
  • Preservation description: 
    • Preservation metadata in the TIFF header
    • Other structural and technical metadata
    • Checksum files.
  • Packaging information: 
    • Directory and File naming, structural metadata.
  • Descriptive information: 
    • Descriptive metadata saved in digital management system, catalog, or textual/XML files.
"The key requirement of PDF/A is that it is self-described and self-contained so that it can be reproduced exactly the same way with different software in various platforms." It will include all information needed to display the content in the PDF/A file (text, images, fonts, and color profiles).

Master file formats should be non-proprietary, open and documented international standards that are  commonly used. The files should be unencrypted, and should be uncompressed or else use lossless compression. The author of the article recommends using PDF/A as the preferred file format for text and image files, and possibly using it as an OAIS SIP container. The author shows how PDF/A is a better file format than the currently preferred TIFF or JPEG2000 formats.
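
The SIP outline above lists checksum files alongside the preservation master files. As a rough illustration of how such a manifest might be produced and re-checked later, here is a minimal sketch in Python. It is not taken from the article: the function names, the choice of SHA-256, and the directory layout are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(sip_dir: Path) -> dict[str, str]:
    """Map each file path (relative to the SIP root) to its checksum."""
    return {
        p.relative_to(sip_dir).as_posix(): sha256_of(p)
        for p in sorted(sip_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(sip_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    failed = []
    for rel, expected in manifest.items():
        target = sip_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            failed.append(rel)
    return failed
```

A manifest built at submission time can be stored with the package and passed to verify_manifest at any later point; an empty list means every recorded file is still present and unchanged.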

There are several issues with PDF/A naming and implementation. The most critical need is reliable open source software for producing and validating PDF/A files.

Yes, We’re Still Talking About Email

Yes, We’re Still Talking About Email. Lynda Schmitz Fuhrig. Smithsonian Institution Archives. August 4, 2015.
     There has been talk that email was going away. Some say that email is being replaced by texting and social media tools, which are options depending on the message content and who it is intended for. But email still has many uses. A business contract is not being sent via Facebook Messenger, and there are still many online forms that require an email address.

Even if email is obsolete in five years (as claimed in an article called "Why Email Will Be Obsolete by 2020") institutions will continue to receive email accounts from previous years that need to be accessible to researchers.

Archives, libraries, museums, universities, and various organizations are exploring email preservation challenges within their collections. Email messages and attachments come from artists, authors, professors, and government officials, to name a few. Researchers, scholars, and journalists have always had an interest in the correspondence from the past. Previously this information was in the printed form of letters, memos, cards, etc.

There are a number of projects that are being developed to provide access and to help preserve email collections. The various options that are being tested and implemented demonstrate that many institutions and organizations understand the importance of preserving email communications from the late 20th and early 21st centuries. Some examples of projects are:
The Smithsonian has been testing an in-house program called DArcMail (Digital Archive Mail System) to provide XML preservation output and a database for searching email messages and attachments within accounts.

Saturday, August 29, 2015

A multi-agent approach for autonomous digital preservation

A multi-agent approach for autonomous digital preservation. J. Pellegrino,  M. Maggiora, W. Allasia. Multimedia & Expo Workshops, 2015 IEEE International Conference. June 29 2015-July 3 2015. 
     Digital obsolescence is caused by the ongoing development of new software and new formats, so the risk of obsolescence can be estimated from a global environment. The article presents two main strategies to cope with digital obsolescence, migration and emulation, and then focuses on migration, which consists of converting digital objects into a newer format.

An agent can be defined as a computer system that is capable of autonomous actions to meet its objectives. The environment and all the agents that interact together to share information constitute a multi-agent system. The agents interact together in a variety of ways to meet the objectives. An agent-based model could be set up to deal with digital preservation issues; the agents can acquire, evaluate and share information in order to understand obsolescence risks and the best preservation action to perform. This will require a trust relationship between archive entities. There are some tools that perform obsolescence identification and metadata extraction, such as AONS (Automated Obsolescence Notification System), DROID (Digital Record and Object Identification), and JHOVE (JSTOR/Harvard Object Validation Environment).

Preservation processes require institutions to define a preservation plan. PLANETS (Preservation and Long-term Access through Networked Services) and Scout (developed within the SCAPE Project) are tools that help with identifying preservation issues and managing digital repositories. None of the tools discussed so far use the described model, which aims to emulate a distributed environment where archive entities can share information to find solutions to digital preservation issues.

The model is self updating from preservation actions taken, and is made up of three main parts:
  1. Global: it includes all those variables accessible to every agent. 
  2. Entities: contains the declarations of all the species of agents that take part in the model. 
  3. Experiment: dedicated to the experimental setup.
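
The three-part structure above can be pictured with a small toy model. The sketch below is only an illustration of the idea and is not the authors' model: the risk values, the threshold, the class and function names, and the rule that agents copy each other's known migration paths are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# "Global" part: values every agent can read (hypothetical example numbers).
GLOBAL_RISK = {"wordperfect": 0.9, "tiff": 0.2, "pdfa": 0.1}
RISK_THRESHOLD = 0.5


@dataclass
class ArchiveAgent:
    """The 'Entities' part: one archive that holds formats and shares what it learns."""
    name: str
    holdings: set
    known_migrations: dict = field(default_factory=dict)  # source format -> target format

    def at_risk(self):
        """Formats in the holdings whose global risk exceeds the threshold."""
        return sorted(f for f in self.holdings if GLOBAL_RISK.get(f, 0.0) > RISK_THRESHOLD)

    def learn_from(self, other):
        """Copy migration paths that another agent already knows."""
        for source, target in other.known_migrations.items():
            self.known_migrations.setdefault(source, target)

    def migrate(self):
        """Migrate at-risk formats with a known path; return the migrations done."""
        done = []
        for fmt in self.at_risk():
            target = self.known_migrations.get(fmt)
            if target is not None:
                self.holdings.discard(fmt)
                self.holdings.add(target)
                done.append((fmt, target))
        return done


def run_experiment(agents, steps=1):
    """The 'Experiment' part: each step, agents share knowledge and then migrate."""
    log = []
    for _ in range(steps):
        for agent in agents:
            for other in agents:
                if other is not agent:
                    agent.learn_from(other)
        for agent in agents:
            for fmt, target in agent.migrate():
                log.append((agent.name, fmt, target))
    return log
```

In this toy version an archive that has never migrated a format can still act, because it picks up the migration path from a peer during the sharing step. That is the kind of information propagation the article describes, reduced to a few lines.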
The work presented in this paper provides a novel approach to the decision processes concerning one of the most common digital preservation issues, the migration process. The agents in this migration process have the capability of communicating, cooperating and propagating information about the performed actions in order to help each of the other agents find the best solution to a given preservation issue. The goal is to provide a framework so users can simulate different digital preservation scenarios and approaches. The model has shown the stability of the framework under various use cases.

Update on JHOVE

Update on JHOVE. Gary McGath. Mad File Format Science blog. August 27, 2015.
  Open Preservation Foundation has accepted the stewardship of JHOVE, and Carl Wilson has made impressive progress.  Changes include reorganizing the code and making installation more straightforward.

Friday, August 28, 2015

CCIA Releases Webinar, Whitepaper: “Copyright Reform For A Digital Economy”

CCIA Releases Webinar, Whitepaper: “Copyright Reform For A Digital Economy”. Heather Greenfield. Computer & Communications Industry Association.  August 25, 2015. [PDF]
     There is a possibility of copyright reform in the near future. The Computer & Communications Industry Association has released its latest whitepaper “Copyright Reform For a Digital Economy” along with a webinar. Points and quotes from the whitepaper:

Copyright exists to incentivize authorship with the prospect of economic reward and to promote the public good. Copyright grants temporary monopoly privileges in order to turn private motivation into public benefit, so the laws must always be balanced in favor of the public interest. This is one of various ways to promote creativity and innovation.

Any reform to copyright law should include two important principles:
  1. accommodate new technology innovation and commerce so as not to make every licensee or consumer a copyright infringer; and
  2. provide certainty to businesses that are not the “content industry” but are nevertheless substantially affected by the Copyright Act.
Other notes and quotes:
  • Copyright is an important way of "incentivizing the creation of certain kinds of economically valuable content"; however, it inhibits some creative models.
  • Empowered by new technology and the Internet, virtually everyone may become a content creator with a potentially limitless audience.
  • Survey data suggests there is declining public respect for copyright.
  • Exceptionally long copyright terms have proven to be a significant problem for researchers, historians, and preservationists.
  • “[o]verprotecting intellectual property is as harmful as underprotecting it. Creativity is impossible without a rich public domain.”
Technology, especially Internet-enabled technology, has radically advanced the economy in recent years. To make sure that this growth continues, "any copyright reform should acknowledge the significance of doctrines ensuring copyright flexibility, particularly limitations and exceptions like the fair use doctrine and first sale."

Another difficulty is the interaction of copyright law with contract: "It is not uncommon for licenses to prohibit the user from exercising rights under the Copyright Act, such as fair use," which means that a contract can defeat Congressional intent.

Copyright trolls are known for their abusive litigation tactics, such as “baseless shakedowns,” shotgun-style lawsuits, and using copyrights acquired for the purpose of suing. They rely on astronomical awards for damages, which they use as a punitive tool, “designed to discourage wrongful conduct.” 

Congress needs to, among other things, ensure fair use is central to any legislation, preserve the first sale doctrine, reform the licensing landscape, and provide greater guidance under the law.

It is important that any copyright reform makes the system fit for a digital environment. [I would add to allow preservation of digital content.]

The Internet Is Failing The Website Preservation Test

The Internet Is Failing The Website Preservation Test. Ron Miller. TechCrunch. August 27, 2015.
     Article about an author finding out that information may not remain on the internet very long. There are issues with content preservation on the internet. "If the internet at its core is a system of record, then it is failing to complete that mission." When websites disappear, all of the content may disappear as though it never existed. That "can have a much bigger impact than you imagine on researchers, scholars" or others.  The content "should all be automatically archived, a digital Library of Congress to preserve and protect all of the content on the internet."

Publishers cannot be relied upon to keep an historical record. "When it no longer serves a website owner’s commercial purposes, the content can disappear forever." That will leave large gaps in the online record. “So much of our communication and content consumption now ... is online and in digital form. We rely on publishers (whether entertainment, corporate, scientific, political) that have moved to predominantly, if not exclusively, digital formats. Once gone or removed from online access we incur black holes in our [online] memory”. The lost content "extends to science, law, education, and all sorts of other cultural aspects that depend on referencing a stable and informative past to build our knowledge for the present. The loss is pernicious, because we don’t immediately notice it - it’s only over time we realize what we have lost." The problem of link rot extends to many areas, including the legal profession where it is having an enormous impact on legal research.

Organizations, such as The Internet Archive, can offer partial solutions, but it can be a challenge to find what we are looking for in the vast archive. The access tools are lacking. "Content preservation should not be the sole responsibility of individuals or businesses. We have to find a way to make it an official part of the coding process." We should try to find "automated technological solutions to preserve our content for future generations. At the very least, we are duty bound to try."

Digital Curation: think use, not preservation

Digital Curation: think use, not preservation. Jane Stevenson. Archives Hub blog. October 29, 2010.
     From a keynote presentation by Chris Rusbridge about the Blue Ribbon Task Force (BRTF) and  Sustainable Digital Preservation and Access. The most elegant technical solution is no good if it is not sustainable; digital preservation has to be a sustainable economic activity. The focus is on the economic and organizational problems. "It is not just about money; it requires building upon a value proposition, providing incentives to act and defining roles and responsibilities."

No one specifically ‘wants’ preservation; they want access to a resource.  Digital preservation represents a derived demand; it isn't easy to convince someone that they want preservation--you have to sell it on some other basis, such as selling others on the importance of providing use over time. Digital preservation is also ‘path dependent’, which means that your actions and decisions will change over time and they will be different as materials move through the life-cycle. Your actions today can also remove other options for all time.

Often the value of digital preservation is not recognized, so long-term preservation activities are often funded by short-term allocations. Usually it is not clear who has the responsibility for digital preservation, so appropriate organization and oversight are essential for efficient ongoing preservation.

The task force reports that it is essential to:
  1. show a compelling value proposition; 
  2. provide clear incentives to preserve content; 
  3. define preservation roles and responsibilities.
Often people think that digital preservation is expensive, but we must remember that it is in fact relatively cheap when compared to the preservation of physical archives, which often require acid-free boxes, rows and rows of shelving, and secure, controlled search rooms.  "So, if the cost is actually not prohibitive, and the technical know-how is there, then it seems imperative to address the organisational issues and to really hammer home the true value of preserving our digital data."

Thursday, August 27, 2015

Google is not the answer: How the digital age imperils history

Google is not the answer: How the digital age imperils history. John Palfrey. Salon.  May 30, 2015.
     We get better at storing digital content, but are not good at preserving our digital history. The problem in brief is that no one is doing enough to select and preserve the bits that really matter.
"One of the great paradoxes of the digital age is that we are producing vastly more information than ever before, but we are not very good at preserving knowledge in digital form for the long haul." Industry is good at creating storage systems but not very good at choosing and preserving the data that matters, and then being able to make it useful in the future. "We are radically underinvesting in the processes and technologies that will allow us to preserve our cultural, literary and scientific records."  We are continuously making progress in how we store our media, and trapping information in lost formats in the process. Obsolescence of unimportant information may, in fact, be a blessing, but not when the lost knowledge has historical significance.

It is possible to transfer information from one format to another; with enough effort and cost, most data can be transferred to formats that can be read today. But different problems come when we create information at such speed and scale.  Most data companies now are for-profit firms that are not in the business of long-term storage. And, unlike universities, libraries and archives, these businesses will probably not be around for hundreds of years. Plus, the amount of important information being created makes it very difficult to create scalable solutions to curate the meaningful content.

"Today, librarians and archivists are not involved enough in selecting and preserving knowledge in born-digital formats, nor in developing the technologies that will be essential to ensuring interoperability over time. Librarians and archivists do not have the support or, in many cases, the skills they need to play the central role in preserving our culture in digital format." The Government Accountability Office even criticized the Library of Congress for its information technology practices:  “Library of Congress: Strong Leadership Needed to Address Serious Information Technology Management Weaknesses.”

"The deeper problem behind the problem of digital preservation is that we undervalue our libraries and archives." We under-invest in them at an important time as we move from an analog society to a digital one. "If we fail to support libraries in developing new systems, those who follow us will have ample reason to be angry at our lack of foresight."

"If we don’t address our underinvestment in libraries and archives, we will have too much information we don’t need and too little of the knowledge we do."

Crash-tolerant data storage

Crash-tolerant data storage.  Larry Hardesty. MIT News. August 24, 2015.
      MIT researchers will present in an October symposium the first file system that is mathematically guaranteed not to lose track of data during crashes. The file system is slow by today’s standards, but the techniques used to verify its performance could be used for more sophisticated designs and more reliable, efficient file systems. This formally verified working file system could lead to computers guaranteed never to lose your data. "Formal verification involves mathematically describing the acceptable bounds of operation for a computer program and then proving that the program will never exceed them." 

Wednesday, August 26, 2015

Holographic Data Storage

Carrier wavefront demodulation for coherent-readout holographic memories. Mark R. Ayres. SPIE Newsroom. 24 August 2015.
     As the big data revolution continues to grow, existing data storage technologies are struggling to keep up. One technology is holographic data storage (HDS). This may help "alleviate the data glut in areas including cold and cool storage, searchable archives, and digital preservation."

With holographic storage, the data is stored in holograms that contain images of a microdisplay in which the state of each pixel corresponds to one or more data bits. Overlapping holograms can result in extremely high storage densities. This is a 3D data storage technology that could theoretically store petabytes in a single recording medium.

The data is traditionally encoded using a bright or dark pixel state to represent a one or a zero data bit. This however has low sensitivity and a poor signal-to-noise ratio, though techniques are being worked on to reduce the problems. Shifting states with phase modulation and enhancing the display and detector can increase the data storage density. They are working to improve the density and robustness of the system and seeking enterprising collaborators for a commercial effort.

Digital Preservation with Film

Your ultimate digital insurance. Piql Website. 2015.
     The Piql Preservation Services, from a company based in Norway, are a turnkey solution that includes all the equipment and processes needed for writing, storing and retrieving files. The piqlWriter is used to record the digital files and related metadata onto 35mm photosensitive film. The process uses error correction; checksums are applied to verify the integrity of the data. Forward Error Correction is used for controlling errors, making it possible to fully retrieve even damaged or corrupted data. This is made available from a €20 million Pan-European research project.
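
The Piql site does not describe which error-correcting code is used, so the following is only a toy illustration of the general idea behind forward error correction: redundant data is written alongside the payload so that a damaged part can be rebuilt without going back to the source. The sketch uses a single XOR parity block, a much weaker scheme than production codes, and every name in it is made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(blocks):
    """Rebuild at most one missing block (marked None) and return the data blocks."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost block")
    if missing:
        present = [block for block in blocks if block is not None]
        blocks = list(blocks)
        # XOR of all surviving blocks equals the lost block.
        blocks[missing[0]] = xor_blocks(present)
    return blocks[:-1]  # drop the parity block
```

With one parity block, any single lost block can be recomputed from the survivors; losing two blocks is unrecoverable, which is why real systems use stronger codes with more redundancy.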

Both digital and visual storage of data is possible. The film is protected in a labelled piqlBox, which is tested together with the film to ensure your data lasts for at least 500 years under ISO storage conditions. This can be integrated into an existing digital preservation system or digital asset management system for search and access. When a file is requested, the correct piqlBox is fetched from storage and placed on the piqlReader. The film is read and decoded and then the information is made available.

Magnetic storage media are short-lived and best used for back-ups. Security and privacy issues make cloud storage problematic for archival content. Archival principles estimate that approximately 5% of the world's digital data requires long-term preservation. Non-permanent storage requires the content to be migrated frequently. However, the cost of migration is expected to outpace the growth of digital data.

ISO Storage conditions
The intended lifetime of the preservation medium, the packaging and their materials is 500 years, when stored under ISO 18911 storage conditions.  ISO 18911:2010(en)  states that "the important elements affecting preservation of processed film are humidity, temperature and air pollutants, as well as the hazards of fire, water, light, fungal growth, insects, microbiological attack, contact with certain chemicals in solid, liquid or gaseous form, and physical damage. Direct contact with other generic types of film can be detrimental to either film."

Also "extended-term storage conditions can extend the useful life of a majority of freshly processed films to 500 years. However, extended-term storage conditions will prolong the life of all films, independent of age, type or processing conditions. The storage protection provided by each level will differ in degree, as will the cost of providing and maintaining the storage facility.

 "It is recognized that many facilities will not be able to obtain the low humidity and low temperature levels specified in this International Standard because of energy considerations, climate conditions or building construction. Such deviation from the specified conditions will reduce the degree of protection offered, and in such cases maintaining a humidity and temperature as low as possible will still provide some benefits. This International Standard is not designed to provide protection against natural or man-made catastrophes, with the exception of fire and associated hazards, which are sufficiently common to warrant inclusion of protection measures."

Tuesday, August 25, 2015

Digital Curation is vital for cultural preservation

The Digital Divan: Dr Sean Pue explores ways to preserve Urdu poetry. Ali Raj. The Express Tribune. February 13, 2015.
     "Digital curation, a process by which digital assets are preserved and archived, is vital for cultural preservation...." 

Hero or Villain? A Tool to Create a Digital Preservation Rogues Gallery

Hero or Villain? A Tool to Create a Digital Preservation Rogues Gallery. Ross Spencer. Open Preservation Foundation blog. 25 Aug 2015.
     The tool, droid-sqlite-analysis, will create a 'rogues gallery' out of any digital collection for which you have a DROID report. This identifies files that pose a digital preservation risk. It can also be used to:
  • enable users to work on copies of content that requires immediate attention
  • clone the directory structures (context) containing rogue content  
  • provide ingest and delivery of a 'clean' collection independent of a rogues collection to promote immediate access while file format issues are worked on in an isolated treatment environment
  • create working copies of only those files of immediate interest
  • reduce collection complexities and issues to show patterns in collection
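
To give a sense of what picking out 'rogue' files from a DROID report can look like, here is a small sketch in Python. It is not the droid-sqlite-analysis tool and does not reproduce its rules: it simply flags rows with no format identifier or with an extension mismatch. The column names (TYPE, PUID, EXTENSION_MISMATCH, NAME) are assumed to match a DROID CSV export and should be checked against your own report.

```python
import csv
import io


def read_droid_csv(text):
    """Parse DROID-style CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def find_rogues(rows):
    """Return file rows that look risky: unidentified format or extension mismatch."""
    rogues = []
    for row in rows:
        # Folders carry no format identification, so skip them.
        if (row.get("TYPE") or "").strip().lower() == "folder":
            continue
        unidentified = not (row.get("PUID") or "").strip()
        mismatch = (row.get("EXTENSION_MISMATCH") or "").strip().lower() == "true"
        if unidentified or mismatch:
            rogues.append(row)
    return rogues
```

The rows returned could then be used to copy the flagged files, and the folders that contain them, into a separate working area while the rest of the collection is ingested.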

Current Issues and Approaches to Curating Student Research Data

Current Issues and Approaches to Curating Student Research Data. Andrew Creamer. Bulletin of the Association for Information Science and Technology. August 2015. [PDF May require subscription.]
     University libraries have collaborated with their colleges to archive and/or publish the students’ electronic theses and dissertations online. But there is little consistency in how they archive, curate and publish the students’ research data and digital scholarship that underlie the ETDs. Most institutions have no policy on caring for ETD-related data. Others have said that “Dissertation datasets represent ‘low-hanging fruit’ for universities who are developing institutional data collections” yet few have addressed the issues. At a recent conference, it was stated that curation of students’ ETD data can be seen as a scale model of the scholarly communication lifecycle and that these are valuable collections that universities should pursue, archive and make available.

One presenter described three organization-level digital curation challenges that libraries need to address:
  1. people not knowing how to do the work,
  2. not enough time or incentive for people to learn and
  3. insufficient resources.
Academic library administrators have an unrealistic expectation that all digital responsibilities and expertise can be put into just one employee.  Aaron Collie at Michigan is working on a three-year strategic plan to create a "collaborative approach to digital curation that will put the policies, people and technologies in place to build organizational capacity." “I think digital preservation is a strategic direction that is too often operationalized as an individual responsibility and skill set.” 

There are important questions to be asked about how to best curate and describe student ETD data.
  • Should there be more oversight over the documentation quality and quantity students provide with their datasets?
  • Should these digital objects receive their own record and metadata?
  • What are the best ways to show the relationships between these objects and the ETD?
  • Can we make the same archival/preservation commitments to supplementary data files that we do for the pdf file of the ETD?
A study of 93 ETDs with related data in OSU’s repository:
  • 45% were Excel files (30% of which had macros, charts and/or linked to other data),
  • 22% were image files and
  • 25% were document files.
  • The remaining files included text, database and/or statistical software files, of which
  • 23% were code (and 15% of these executable files),
  • 12% of the files were metadata.
  • 30% were unknown, un-operable and/or obsolete; and
  • 3% of the ETDs were missing data files from what was listed among their manifests.
The consensus was that student data collections are worth pursuing and have much value for the public and research enterprise. Libraries interested in this data need to realize that the content may be more important than previously thought.

What is actually happening out there in terms of institutional data repositories?

What is actually happening out there in terms of institutional data repositories?  Ricky Erway. OCLC Research. July 27, 2015.
     Academic libraries are talking about providing data curation services for their researchers.  In most cases they offer just training and advice, but not actual data management services. While technical, preservation, and service issues can be challenging, the funding issues are probably the thing that inhibits this service most. This is an important service that supports the university research mission.

Of the 22 institutions that answered the survey:
  • stand-alone data repository: 8
  • combination institutional repository and data repository: 12
  • DSpace: 6
  • Hydra/Fedora systems: 6
  • locally developed systems: 4
  • Rosetta, Dataverse, SobekCM, and HUBzero: 1 each
For preservation services:
  • all provide integrity checks except 1
  • keep offsite backup copies: 17
  • provide format migration: 12
  • put master files in a dark archive: 10
For funding:
  • the library’s base budget covered at least some of the expenses: 18
  • the library budget was the only source of funding: 7
  • receive fees from researchers: 7
  • receive fees from departments: 4
  • receive institutional funding specifically for data management: 5
  • receive money from the IT budget: 4
  • receive direct funds from grant-funded projects: 1
  • receive indirect funds from grant-funded projects: 1

Monday, August 24, 2015

University Data Policies and Library Data Services: Who Owns Your Data?

University Data Policies and Library Data Services: Who Owns Your Data? Lisa D. Zilinski, Abigail Gobens and Kristin Briney. Bulletin of the Association for Information Science and Technology. August 2015. [PDF]
     'Who owns the data' is an important question, but the answer is often unclear, especially for unfunded research and pilot projects. Other questions that need to be asked are:
  • What happens if a researcher leaves the institution?
  • What if someone needs access to the data?
  • How long do I have to keep them and how should I discard them?
  • What happens if there is no policy? How should policies be determined?
  • If the data is part of a collaborating project then which policy takes precedence?
From the study, the authors report that approximately
  • 50% of the libraries surveyed offer some form of data services beyond a resource guide. 
  • 40% of the libraries have a staff member (often the science librarian) assigned to research data management initiatives. 
  • 10% have a dedicated data repository.
This study points out the challenges that researchers, librarians and institutions face when trying to meet funding or journal requirements on public access. This study also found that top research institutions almost universally offer research data services. Libraries are developing programs and services aimed at the entire data life cycle while ownership of the data and other legal concerns are of highest significance to the universities. This provides an opportunity for librarians to lead policy development; educate faculty and administrators about best practices; and determine how to navigate the numerous policies from funding groups, academic journals, and collaborating institutions.

Turning a page: downsizing the campus book collections

Turning a page: downsizing the campus book collections. Donald Barclay. The Conversation. August 19, 2015. 
     An article looking at changing academic libraries, especially their printed book collections. "While I believe there will always be a place for the book in the hearts of academics, it is far less likely there will be a place for the book, or at least for every book, on the academic campus." Keeping a printed book in a library is not cheap. A recent study shows that it costs $4.26 per year to keep a book on the shelf in an open stack collection. The cost of keeping a book in high-density shelving is $0.86 per book. This does not mean that the books are disappearing, but that alternative storage solutions are being explored. Books now play more of a supporting role than the main role.
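
To put the two storage figures side by side, here is a short worked calculation in Python. It assumes the $0.86 high-density figure is on the same per-book, per-year basis as the $4.26 open-stack figure, which the summary above does not state explicitly; the collection size is a hypothetical number chosen for the example.

```python
OPEN_STACK_COST = 4.26    # dollars per book per year, from the study cited above
HIGH_DENSITY_COST = 0.86  # dollars per book, assumed here to be per year as well


def annual_saving(volumes):
    """Estimated yearly saving from moving this many volumes to high-density shelving."""
    return round(volumes * (OPEN_STACK_COST - HIGH_DENSITY_COST), 2)


def cost_ratio():
    """How many times more expensive open stacks are than high-density shelving."""
    return OPEN_STACK_COST / HIGH_DENSITY_COST
```

Under that assumption, moving a hypothetical 100,000 volumes would save roughly $340,000 a year, and open-stack shelving costs almost five times as much per book.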

Data Management Outreach to Junior Faculty Members: A Case Study

Data Management Outreach to Junior Faculty Members: A Case Study. Megan Sapp Nelson. Journal of eScience Librarianship. August 21, 2015.
     Data management is generally not addressed with early-career faculty; it is either overlooked or assumed that these faculty will figure it out on their own. A brownbag and workshop outreach program was developed and presented to junior faculty early in their careers to introduce them to potential issues and solutions of data management. This gave them an opportunity to brainstorm with more experienced faculty members. Objectives for the workshop were that the faculty will:
  • Evaluate the current state of their data management practices.
  • Develop a prioritized list of actions that can be put into place.
  • Understand how those actions can be transmitted to a research group.
The case study and additional files are available at this link. Graduate students have a different perspective from the faculty. Graduate students tend to:
  • Focus on mechanics over deeper understanding of concepts.
  • Learn data management from faculty within the context of an immediate problem and therefore don’t necessarily get broad training in the full lifecycle of data management.
  • Figure out data management on their own, and figure it out differently from everyone else in the lab unless a protocol is put in place.
  • Have a wide spectrum of expertise.
  • Frequently suggest and adopt the data analysis tools used in labs which leads to fragmented data management for the professor over time.
The key to the success of the workshop was to involve the junior faculty peer group along with their mentor faculty members. This new tool was a useful addition to the Data Curation Profile tool.

Staffing for Effective Digital Preservation

Staffing for Effective Digital Preservation. An NDSA Report. Winston Atkins, et al. National Digital Stewardship Alliance. December 2013.
     In 2012, most of the 85 organizations surveyed had no dedicated digital preservation department, so preservation tasks fell to a library, archive, or other department. About half of respondents thought that the digital preservation function in their organizations was well organized. Most organizations expected the size of their holdings to increase substantially in the next year, with 20% expecting a doubling of content, and the majority were preserving under 50 TB. Images and text files were the most common types of content being preserved.

Almost 70% of organizations wanted to outsource digitization, and 43% wanted to outsource secure storage management. In 73% of responses, the library, archive or other department that stewarded the collections was responsible for digital preservation; 42% responded that an IT department was responsible.

Organizations would like to have twice as many FTEs as they currently had working on digital preservation activities. Their ideal number of FTEs for several roles would be:
  • Digital preservation manager: They had an average of 0.5, the ideal was 1
  • Electronic records archivist: They had an average of 1, the ideal was 2
  • Programmer: They had an average of 1.5, the ideal was 2.5
  • Content analyst / maintainer: They had an average of 0.5, the ideal was 3
According to the survey, 75% of organizations retrained existing staff, 35% hired experienced digital preservation specialists, and 21% used other options. Given the chance to hire a new digital preservation manager, organizations were asked to rank the relative importance of skills, knowledge and education. A passion and motivation for digital preservation and knowledge of digital preservation standards, best practices and tools were the most sought-after qualities, followed by general communication and analytical skills. Respondents were less concerned with the specific degrees or certificates people held, with the least important being a degree in computer science.

The survey also emphasized the importance of buy‐in from the entire organization and budget sustainability. “There is a general lack of understanding of the budgetary demands on digital collections and preservation that equal, and usually exceed, traditional collection development. The “lights‐on” cost are rarely, if ever, discussed or budgeted, thus digital libraries and preservation programs typically are funded with leftovers.”

The results of the Digital Preservation Staffing survey indicate that organizations are making do with what they have and generally think that their digital preservation programs and staffing are working well, but they feel a distinct need for more people to help do the work.