Showing posts with label xml. Show all posts

Friday, March 27, 2015

National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research

National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research. February 2015.

This document describes NIH’s plans to build upon and enhance its longstanding efforts to increase access to scholarly publications and digital data resulting from NIH-funded research. Sections relevant to digital preservation and long-term management:

NIH intends to make public access to digital scientific data the standard for all NIH funded research. Following adoption of the final plan, NIH will:
  • Explore steps to require data sharing.
  • Ensure that all NIH-funded researchers prepare data management plans and that the plans are evaluated during peer review.
  • Develop additional data management policies to increase public access to designated types of biomedical research data.
  • Encourage the use of established public repositories and community-based standards.
  • Develop approaches to ensure the discoverability of data sets resulting from NIH-funded research to make them findable, accessible, and citable.
  • Promote interoperability and openness of digital scientific data generated or managed by NIH.
  • Explore the development of a data commons. NIH will explore the development of a commons, a shared space for basic and clinical research output including data, software, and narrative, that follows the FAIR principles of Find, Access, Interoperate and Reuse.

Preservation
Preservation is one of the Public Access Policy’s primary objectives. The policy aims to ensure that publications and metadata are stored in an archival solution that:
  • provides for long-term preservation of, and access to, the content without charge;
  • uses standard, widely available and, to the extent possible, nonproprietary archival formats for text and associated content (e.g., images, video, supporting data);
  • provides access for persons with disabilities.
The content in the NIH database is actively curated using XML records. The plan describes this as future proof, in that XML is technology independent and can be easily and reliably migrated as technology evolves.

The first principle behind the plan for increasing access to digital scientific data is: The sharing and preservation of data advances science by broadening the value of research data across disciplines and to society at large, protecting the integrity of science by facilitating the validation of results, and increasing the return on investment of scientific research.

Data Management Plans
Data management planning should be an integral part of research planning.  NIH wants to ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified. In order to preserve the balance between the relative benefits of long-term preservation and access and the associated cost and administrative burden, NIH will continue to expect researchers to consider the benefits of long-term preservation of data against the costs of maintaining and sharing the data.

NIH will assess whether the appropriate balance has been achieved in data management plans between the relative benefits of long-term preservation and access and the associated cost and administrative burden. It will also develop guidance with the scientific community to decide which data should be prioritized for long-term preservation and access. NIH will also explore and fund innovative tools and services that improve search, archiving, and disseminating of data, while ensuring long-term stewardship and usability.

Assessing Long-Term Preservation Needs
NIH will provide for the preservation of scientific data and outline options for developing and sustaining repositories for scientific data in digital formats. The policies expect long-term preservation of data.
Long-term preservation and sustainability will be included in data management plans, and NIH will collaborate with other agencies on how best to develop and sustain repositories for digital scientific data.



Saturday, February 21, 2015

OAI-PMH harvesting from SharePoint

SharePoint 2010 to Primo.  Cillian Joy. Tech Blog. July 2014.
They have a system to manage the submission, storage, approval, and discovery of taught thesis documents, which uses SharePoint 2010 as the document repository and Ex Libris Primo as the discovery tool. The solution uses PHP, XML, XSLT, cURL, and the SharePoint REST API using OData.
It uses the Atom and OAI-PMH standards.
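
For readers unfamiliar with OAI-PMH, a harvest is an HTTP GET request carrying a `verb` parameter, answered with an XML document. The sketch below is not the PHP solution described in the post; it is a small Python illustration of the protocol. The base URL, the set name, and the sample response are hypothetical and exist only for the example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_identifiers(response_xml):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(response_xml)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=OAI_NS)
    # An empty resumptionToken element means the list is complete.
    return ids, (token or None)


# Hand-written sample response (hypothetical identifiers), so no network is needed.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:thesis-1</identifier></header></record>
    <record><header><identifier>oai:example.org:thesis-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://repository.example.org/oai", set_spec="theses"))
    print(parse_identifiers(SAMPLE))
```

A real harvester would fetch the URL, then repeat the request with the returned resumption token until none is returned.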

SharePoint 2013 .NET Server, CSOM, JSOM, and REST API index




Saturday, February 07, 2015

Digital Preservation Coalition publishes ‘OAIS Introductory Guide (2nd Edition)’ Technology Watch Report

Digital Preservation Coalition publishes ‘OAIS Introductory Guide (2nd Edition)’ Technology Watch Report. Brian Lavoie.  Digital Preservation Coalition. Watch Report. October, 2014. [PDF]

The report describes the OAIS, its core principles and functional elements, as well as the information model which supports long-term preservation, access and understandability of data. The OAIS reference model was approved in 2002 and revised and updated in 2012. Perhaps “the most important achievement of the OAIS is that it has become almost universally accepted as the lingua franca of digital preservation”.

The central concept in the reference model is that of an open archival information system. An OAIS-type archive must meet a set of six minimum responsibilities to do with the ingest, preservation, and dissemination of archived materials (listed below). Its functional model, in turn, is made up of six functional entities: Ingest, Archival Storage, Data Management, Preservation Planning, Access, and Administration. There are also Common Services, which consist of basic computing and networking resources.

An OAIS-type archive interacts with three types of external entities: Management, Producers, and Consumers. Consumers include the Designated Community: the consumers expected to independently understand the archived information in the form in which it is preserved and made available by the OAIS. The reference model also serves as a framework to encourage dialogue and collaboration among participants in standards-building activities, as well as to identify areas most likely to benefit from standards development.

An OAIS-type archive is expected to:
  • Negotiate for and accept appropriate information from information producers;
  • Obtain sufficient control of the information in order to meet long-term preservation objectives;
  • Determine the scope of the archive’s user community;
  • Ensure the preserved information is independently understandable to the user community;
  • Follow documented policies and procedures to ensure the information is preserved against all reasonable contingencies;
  • Make the preserved information available to the user community, and enable dissemination of authenticated copies of the preserved information.
An OAIS should be committed to making the contents of its archival store available to its intended user community, through access mechanisms and services which support users’ needs and requirements. Such requirements may include preferred medium and access channels; any access restrictions should be clearly documented.

The OAIS information model is built around the concept of an information package, which comes in three variants: the Submission Information Package (SIP), the Archival Information Package (AIP), and the Dissemination Information Package (DIP). Preservation requires metadata to support and document the OAIS’s preservation processes, called Preservation Description Information, which ‘is specifically focused on describing the past and present states of the Content Information, ensuring that it is uniquely identifiable, and ensuring it has not been unknowingly altered’. Preservation Description Information consists of:
  • Reference Information (identifiers)
  • Context Information (describes relationships among information and objects)
  • Provenance Information (history of the content over time)
  • Fixity Information (verifying authenticity)
  • Access Rights Information (conditions or restrictions)
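
The five kinds of Preservation Description Information listed above can be pictured as fields on a record that travels with the content. The following is only a sketch: OAIS does not prescribe a data structure or a checksum algorithm, so the class name, the field names, the `make_pdi` and `fixity_ok` helpers, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreservationDescriptionInformation:
    """Hypothetical record holding the five PDI categories."""
    reference: str            # Reference Information: an identifier
    context: str              # Context Information: relationship to other objects
    fixity_sha256: str        # Fixity Information: checksum of the content
    access_rights: str        # Access Rights Information: conditions of use
    provenance: List[str] = field(default_factory=list)  # history of the content


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def make_pdi(identifier: str, content: bytes, context: str, access_rights: str):
    """Create a PDI record at ingest time, recording the first provenance event."""
    return PreservationDescriptionInformation(
        reference=identifier,
        context=context,
        fixity_sha256=sha256_hex(content),
        access_rights=access_rights,
        provenance=["ingested"],
    )


def fixity_ok(pdi: PreservationDescriptionInformation, content: bytes) -> bool:
    """Check that the content still matches the checksum recorded at ingest."""
    return sha256_hex(content) == pdi.fixity_sha256


if __name__ == "__main__":
    original = b"thesis text"
    pdi = make_pdi("example:0001", original, "part of collection A", "open access")
    print(fixity_ok(pdi, original))           # unchanged content
    print(fixity_ok(pdi, b"thesis text!"))    # altered content
```

A fixity check of this kind only detects change; deciding whether a change was authorized is the job of the provenance record.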
OAIS is a model and not an implementation. It does not address system architectures, storage or processing technologies, database design, computing platforms, or other technical details of setting up a functioning archival system. But it has been used as a foundation or starting point for other work. Efforts such as TRAC have been made to put the attributes of a trusted digital archive into a ‘checklist’ that could be used to support a certification process. PREMIS is a preservation metadata initiative that has emerged as the de facto standard. METS, an XML-based document format, has become widely used for encoding OAIS archival information packages.

The ‘OAIS reference model provides a solid theoretical basis for digital preservation efforts, though theory and practice can sometimes have an uneasy fit.’




Tuesday, February 03, 2015

Office Opens up with OOXML

Office Opens up with OOXML. Carl Fleischhauer, Kate Murray. The Signal. February 3, 2015.
Nine new format descriptions have been added to the Library of Congress’s Format Sustainability Web site. These closely related formats belong to the Office Open XML (OOXML) family, the formats used by Microsoft’s “Office” desktop applications, including Word, PowerPoint and Excel. Formerly, these applications produced files in proprietary, binary formats with the extensions doc, ppt, and xls. The current versions employ an XML structure for the data, and an x has been added to the extensions: docx, pptx, and xlsx.
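
In practice, the XML structure is easy to see: an OOXML file is a ZIP container, following the Open Packaging Conventions, whose parts are mostly XML documents, with a `[Content_Types].xml` part at the root. The sketch below uses Python’s standard `zipfile` module. The helper names are hypothetical, and the toy package it builds is not a valid Word document; it only imitates the container layout so the example runs without a real file.

```python
import io
import zipfile


def build_toy_package() -> bytes:
    """Build an in-memory ZIP that imitates the layout of a docx container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("[Content_Types].xml", "<Types/>")   # placeholder content
        pkg.writestr("word/document.xml", "<document/>")  # placeholder content
    return buf.getvalue()


def list_package_parts(data: bytes) -> list:
    """Return the sorted names of the parts stored in the package."""
    with zipfile.ZipFile(io.BytesIO(data)) as pkg:
        return sorted(pkg.namelist())


def looks_like_opc_package(data: bytes) -> bool:
    """Rough check: a ZIP file that contains a [Content_Types].xml part."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as pkg:
        return "[Content_Types].xml" in pkg.namelist()


if __name__ == "__main__":
    package = build_toy_package()
    print(list_package_parts(package))
    print(looks_like_opc_package(package))
    print(looks_like_opc_package(b"an old binary .doc would not be a ZIP"))
```

Pointing `list_package_parts` at the bytes of a real docx, pptx, or xlsx file shows the same kind of listing, with many more parts.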

"In addition to giving the formats an XML expression, Microsoft also decided to move the formats out of proprietary status and into a standardized form (now focus on the word Open in the name.) Three international organizations cooperated to standardize OOXML."

The list of the nine:
  • OOXML_Family, OOXML Format Family, ISO/IEC 29500 and ECMA 376
  • OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012
  • DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2012; ECMA-376, Editions 1-4
"Meanwhile, readers should remember that the Format Sustainability Web site is not limited to formats that we consider desirable. We list as many formats (and subformats) as we can, as objectively as we can, so that others can choose the ones they prefer for a particular body of content and for particular use cases."