Friday, August 31, 2012

Digital documents debut on historically endangered list.

Digital documents debut on historically endangered list. For the first time, Maine's most endangered historical assets include digital records, according to an annual survey by the nonprofit Maine Preservation. The listing is meant to emphasize to others the importance of preserving these documents. Electronic documents are in particular danger because of the ease with which they can be lost or destroyed. "We've not figured out how to manage this material. There's such a proliferation that nobody thinks about this material."

Monday, August 13, 2012

The Problem of Data

The Problem of Data. Lori Jahnke, Andrew Asher, Spencer D. C. Keralis. CLIR Report. Council on Library and Information Resources. August 12, 2012.
Excellent report on data storage, use, and curation. One section contains a snapshot of the current digital data curation education landscape. Below are some long notes and excerpts from the PDF article:

Key Findings
  • None of the researchers interviewed for this study have received formal training in data management practices, nor do they express satisfaction with their level of expertise.
  • Few researchers, especially among those who are early in their career, think about long-term preservation of their data.
  • The demands of publication output overwhelm long-term considerations of data curation. Metadata and documentation are of interest only if they help a researcher complete his or her work.
  • There is a great need for more effective collaboration tools, as well as online spaces that support the volume of data generated and provide appropriate privacy and access controls.
  • Few researchers are aware of the data services that the library might be able to provide and seem to regard the library as a dispensary of goods (e.g., books, articles) rather than a place for research/professional support.
  • There is unlikely to be a single out-of-the-box solution that can be applied to the problem of data curation. Instead, an approach is needed that emphasizes working with researchers to identify or build appropriate tools.
  • Researchers must have access to adequate networked storage.
  • Universities should revise access policies to support multi-institutional research projects.
  • Programs should begin early in the researcher career path for the greatest long-term benefit.
  • Data curation systems should be integrated with the active research phase (e.g., as a backup).
  • Privacy and data access control tools should be developed to manage confidential data. Policies must be developed that support researchers in using these technologies.
Other notes:
  • Data curation is generally defined as a set of activities that includes preserving, maintaining, archiving, and depositing data to keep it secure, intact, and accessible for reuse.
  • Many researchers expressed concerns surrounding the ethical reuse of research data. Additional work is needed to establish best practices in this area, particularly for qualitative data sets.
  • Most participants reported feeling adrift when establishing protocols for managing their data and added that they lacked the resources to determine best practices, let alone to implement them. Almost none of the scholars reported that data curation training was part of their graduate curriculum.
  • Perhaps one of the more complicated issues for data curation is the complex life cycle of research data and projects. Data collection may occur throughout the project, and the data may change form before the project is completed.
  • Scholars may collect data on a phenomenon unrelated to their current project with no clear idea of the potential usefulness of those data. Such data might be integrated with a later project, given away to an interested colleague, or never used at all.
  • It would be helpful to have a way to collect data into a collection space that could be used throughout the project.
  • The researchers held contradictory views about the value of their data. Some wanted to associate their data with publications or to have it available for use in the classroom.
  • Few of the researchers thought about long-term preservation of their data, especially those who were early in their career.
  • The academic system offers little or no career reward for preserving one’s data.
  • Data preservation strategies must take into account varied, proprietary, and non-standard data formats, and provide a real-time benefit for the scholar in meeting research goals.
  • Given the lack of infrastructure for sharing and storing data, the social sciences may face similar problems of data loss in documenting social phenomena as researchers begin to work within larger collaborative groups and with larger data sets. Data stored on personal media devices are especially vulnerable to this type of loss, as few scholars have the skills necessary to maintain data over time and across hardware and software platforms. Several of the scholars interviewed reported storing data on legacy systems that may become inaccessible.
  • University policies that appropriately address the ethical considerations relating to data sharing and preservation would benefit researchers, administrators, and technologists alike.
  • Researchers hold tremendous amounts of data on personal computers and hard drives, many of which are not backed up adequately. Among the participants, the research data ranged from under 1 GB to multiple terabytes. Data types included various formats of images, video, audio files, data sets, documents, etc.
  • Managing large files presents significant challenges for researchers in that university infrastructures typically do not provide adequate storage space or sufficient bandwidth for data access. The data may be lost when researchers upgrade their computers or software. Few researchers put more than minimal effort into organizing non-active data or ensuring its continued compatibility with new software or hardware.
  • There is a clear need for libraries to move beyond passively providing technology to embrace the changes in scholarly production that emerging technologies have brought.  
  • The data preservation step must be fully integrated into a scholar’s research workflow. Not only are necessary metadata and other materials much more easily captured while research is in progress, but also there is a real opportunity to streamline research workflows and to provide much needed support. Scholars need help with the technical aspects of managing and preserving data, as well as with basic curation issues (e.g., what to keep and what to delete), and the ethical implications of sharing their data (e.g., what is an appropriate latency period for the data and how does one balance the need to provide meaningful access with the risk of inadvertently exposing confidential participant information).
  • Although some researchers acknowledge that their data could be useful to other researchers, there is little incentive to invest time in archiving or repackaging data sets.
  • Extensive outreach to scholars is necessary to build the relationships that will facilitate data preservation. This is likely to be a slow process initially. Researchers are unlikely to engage with those they do not view as peers.
  • Researchers need additional tools to manage preserved data on their own, and they would benefit from access to professionals who can offer advice on management strategies.
  • Researchers typically align themselves with their disciplines rather than with their institutions; therefore, support models that extend beyond the university are likely to be especially beneficial.
  • Reaching the level of collaboration among universities and the technical interoperability required to capture and preserve a career’s worth of data in the current environment is a challenge.
  • Current data management systems must be fundamentally improved so that they can meet the capacity demand for secure storage and transmission of research data. Integrating the data preservation system with the active research cycle is essential to encourage researcher investment.
  • Researchers are not well positioned to meet the technical and policy challenges without the coordinated support of libraries, information technology units, and professionals who possess both technical and research expertise.
  • One example concerns the PETRA e+e- collider project in Hamburg, Germany. In the more than 25 years since the experiments were run, theoretical insights and computing advancements have made the data valuable once again. However, much of the data has been irrevocably lost to corrupt storage media, lost computer code, and deactivated personal accounts. These early particle physics experiments are unique, as modern colliders operate at higher energy levels and cannot replicate the particle interactions.
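The notes above say that preservation is easier when it happens while research is in progress, and that the PETRA data was partly lost to corrupt storage media. As an illustration only, here is a minimal Python sketch of one way to record file checksums during a project so that later corruption or loss can be detected. It is not taken from the CLIR report. The function names `build_manifest` and `verify_manifest` and the manifest layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file path (relative to root) to its size and checksum.

    Hypothetical layout: {"sub/file.csv": {"bytes": 123, "sha256": "..."}}
    """
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose checksum has changed."""
    problems = []
    for name, entry in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            problems.append(name)
    return problems
```

A researcher could call `build_manifest` on a project folder at regular intervals and store the result next to the backup. Running `verify_manifest` against a restored copy then lists any files that did not survive intact. This covers only integrity checking, not format obsolescence or access control.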

Thursday, August 02, 2012

OAIS 2012 update.

OAIS 2012 update. Barbara Sierman. July 30, 2012.
Reference Model for an Open Archival Information System (OAIS). Magenta Book. Issue 2. June 2012. 

The main changes are the following:
  • Access Rights information is added as an element to the Preservation Description Information.
  • Emulation is added as a strategy to preserve access services or the original look and feel, and the text explains how different varieties of emulation fit in the OAIS model.
  • There is more interaction between the Administration Functional Entity and the Preservation Planning Functional Entity. Based on its monitoring activity, the Preservation Planning Functional Entity will create preservation plans (and not only “migration plans” as mentioned in the previous version) and send them to the Administration Functional Entity to be carried out. The Administration Functional Entity (specifically its Establish Standards and Policies function) will also receive periodic risk analyses created by Preservation Planning to act upon, which gives the Preservation Planning Functional Entity a more active role in monitoring not only the outside world but also the OAIS itself. Interaction also takes place when Preservation Planning sends recommendations on AIP updates and the Administration Functional Entity replies with preservation requirements (added to the already existing “migration goals and approved standards”). So for creating new migration packages, the input includes not only the preservation requirements resulting from monitoring the Designated Community but also the preservation requirements from Administration. In general, some loose ends seem to be tied up here.
  • The confusing use of the word “authentication” has been changed, and the term “authenticity” is now defined, using the much-used definition “The degree to which a person (or system) regards an object as what it is purported to be”, with an important addition: “Authenticity is judged on the basis of evidence.”
  • The term Information Package is redefined as: “A logical container composed of optional Content Information and optional associated Preservation Description Information. Associated with this Information Package is Packaging Information used to delimit and identify the Content Information and Package Description information used to facilitate searches for the Content Information.”
  • A new term, “Other Representation Information”, is introduced and described as “Representation Information which cannot easily be classified as Semantic or Structural. For example software, algorithms, encryption, written instructions and many other things may be needed to understand the Content Data Object, all of which therefore would be, by definition, Representation Information, yet would not obviously be either Structure or Semantics. Information defining how the Structure and the Semantic Information relate to each other, or software needed to process a database file would also be regarded as Other Representation Information.”
  • This chapter is adapted to the above-mentioned changes and refines the definitions of an AIP version (An AIP whose Content Information or Preservation Description Information has undergone a Transformation on a source AIP and is a candidate to replace the source AIP. An AIP version is considered to be the result of a Digital Migration.) versus an AIP edition (An AIP whose Content Information or Preservation Description Information has been upgraded or improved with the intent not to preserve information, but to increase or improve it. An AIP edition is not considered to be the result of a Migration.).
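
To make the quoted definitions of Information Package, AIP version, and AIP edition more concrete, here is a minimal Python sketch of how they might be modeled as data. OAIS is a conceptual reference model and prescribes no implementation, so every class, field, and function name below is an assumption made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class InformationPackage:
    """Logical container; both main parts are optional, per the quoted definition."""
    content_information: Optional[bytes] = None
    preservation_description_information: Optional[dict] = None  # e.g. access rights
    packaging_information: str = ""  # delimits and identifies the content
    package_description: str = ""    # supports searches for the content


@dataclass(frozen=True)
class AIP(InformationPackage):
    """Archival Information Package with hypothetical lineage fields."""
    identifier: str = ""
    source_identifier: Optional[str] = None  # the AIP this one was derived from
    derivation: Optional[str] = None         # "version", "edition", or None


def make_version(source: AIP, new_id: str, transformed_content: bytes) -> AIP:
    """AIP version: result of a Digital Migration, candidate to replace the source."""
    return replace(source, identifier=new_id,
                   source_identifier=source.identifier,
                   derivation="version",
                   content_information=transformed_content)


def make_edition(source: AIP, new_id: str, improved_content: bytes) -> AIP:
    """AIP edition: content increased or improved, not the result of a Migration."""
    return replace(source, identifier=new_id,
                   source_identifier=source.identifier,
                   derivation="edition",
                   content_information=improved_content)


def is_migration_result(aip: AIP) -> bool:
    """Only an AIP version counts as the result of a Digital Migration."""
    return aip.derivation == "version"
```

In this sketch a version and an edition are built the same way and differ only in the recorded intent. That mirrors the definitions above, where the distinction lies in whether the change preserves the information or improves it.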