This blog contains information related to digital preservation, long-term access, digital archiving, digital curation, institutional repositories, and digital or electronic records management. These are my notes on what I have read or been working on. I enjoyed learning about digital preservation but have since retired and am no longer updating the blog.
Showing posts with label data management. Show all posts
Wednesday, November 16, 2016
A Doomsday Scenario: Exporting CONTENTdm Records to XTF
A Doomsday Scenario: Exporting CONTENTdm Records to XTF. Andrew Bullen. D-Lib Magazine. November/December 2016.
Because of budgetary concerns, the Illinois State Library asked Andrew Bullen to explore how its CONTENTdm collections could be migrated to another platform. (The Illinois Digital Archives repository is based on CONTENTdm.) He chose methods that would let him migrate the collections quickly using existing tools, particularly PHP, Perl, and XTF, the platform the library already uses for its digital collection of electronic Illinois state documents. The article presents the Perl code he wrote, the metadata, and example records, and walks through the process; he has also started A Cookbook of Methods for Using CONTENTdm APIs. Each collection presented different challenges and required custom programming. He recommends reviewing the metadata elements of each collection, normalizing like elements as much as possible, and planning which elements can be indexed and how faceted browsing could be implemented. Because the goal was only to test whether the data could reasonably be converted, not all parts were implemented. In a real migration, CONTENTdm's APIs could be used as the data transfer medium; a rough sketch of that idea follows.
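As an illustration of that last point, here is a minimal, hypothetical sketch of pulling item metadata from a CONTENTdm collection over its web services API and writing it out as simple XML that XTF could index. The base URL, collection alias, field names, and the exact dmQuery parameter order are assumptions for illustration (consult the CONTENTdm API reference); the article's actual export used Perl and collection-specific code.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical CONTENTdm web API base URL and collection alias -- adjust for a real site.
BASE = "https://server12345.contentdm.oclc.org/dmwebservices/index.php"
COLLECTION = "/p12345coll1"

def call_api(query):
    """Call the dmwebservices API and parse the JSON response."""
    with urllib.request.urlopen(f"{BASE}?q={query}") as resp:
        return json.loads(resp.read().decode("utf-8"))

def export_collection(outfile="xtf_records.xml", maxrecs=1024):
    # Query string shown schematically; see the CONTENTdm API reference for the exact parameters.
    result = call_api(f"dmQuery{COLLECTION}/0/title/title/{maxrecs}/1/0/0/0/0/json")
    root = ET.Element("records")
    for rec in result.get("records", []):
        # dmGetItemInfo returns the full metadata for one item, keyed by field nickname.
        item = call_api(f"dmGetItemInfo{COLLECTION}/{rec['pointer']}/json")
        node = ET.SubElement(root, "record", id=str(rec["pointer"]))
        for field, value in item.items():
            if isinstance(value, str) and value:
                ET.SubElement(node, field).text = value  # one element per populated metadata field
    ET.ElementTree(root).write(outfile, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    export_collection()
```

The resulting flat XML records could then be normalized and indexed by XTF; in practice each collection's field list would need the kind of per-collection review the article describes.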
Saturday, October 15, 2016
DPTP: Introduction to Digital Preservation Planning for Research Managers
DPTP: Introduction to Digital Preservation Planning for Research Managers. Ed Pinsent, Steph Taylor. ULCC. 15 October 2016.
Today I saw this course offered and thought it looked interesting (wish I were in London to attend). It is a one-day introduction to digital preservation and is designed specifically to look at preservation planning from the perspective of the research data manager. Digital preservation, the management and safeguarding of digital content for the long-term, is becoming more important for research data managers to make sure content remains accessible and authentic over time. The learning outcomes are:
- Understand what digital preservation means and how it can help research managers
- How to assess content for preservation
- How to integrate preservation planning into a research data management plan
- How to plan for preservation interventions
- How to identify reasons and motivations for preservation for individual projects
- What storage means, and the storage options that are available
- How to select appropriate approaches and methods to support the needs of projects
- How to prepare a business case for digital preservation
The course covers:
- Find out about digital preservation and how and why it is important in RDM.
- Assessing research data and understanding how to preserve them for the longer term, and understanding your users.
- Learn how a RDM plan can include preservation actions.
- Managing data beyond the life of projects, planning the management of storage and drafting a selection policy.
- Understanding individual institutions, stakeholders and requirements and risk assessment.
- Understand why preservation storage has extra requirements, considering ‘the Cloud’
- The strategy of migrating formats, including databases; risks and benefits, and tools you can use.
- Making a business case (Benefits; Risks; Costs) to persuade your institution why digital preservation is important
Thursday, March 24, 2016
The FAIR Guiding Principles for scientific data management and stewardship
The FAIR Guiding Principles for scientific data management and stewardship. Mark D. Wilkinson, et al. Nature. 15 March 2016. [PDF]
"There is an urgent need to improve the infrastructure supporting the reuse of scholarly data." Good data management is not a goal in itself, but a conduit leading to knowledge discovery, innovation and the reuse of the data. The current digital ecosystem prevents this, which is why the funding and publishing community is beginning to require data management and stewardship plans. "Beyond proper collection, annotation, and archival, data stewardship includes the notion of ‘long-term care’ of valuable digital assets" so they can be discovered and re-used for new investigations.
This article describes four foundational principles (FAIR) to guide data producers and publishers (a minimal example record illustrating them follows the list):
"There is an urgent need to improve the infrastructure supporting the reuse of scholarly data." Good data management is not a goal in itself, but a conduit leading to knowledge discovery, innovation and the reuse of the data. The current digital ecosystem prevents this, which is why the funding and publishing community is beginning to require data management and stewardship plans. "Beyond proper collection, annotation, and archival, data stewardship includes the notion of ‘long-term care’ of valuable digital assets" so they can be discovered and re-used for new investigations.
This article describes four foundational principles (FAIR) to guide data producers and publishers:
- Findability,
- assigned a globally unique and persistent identifier
- data are described with rich metadata
- metadata clearly include the identifier of the data they describe
- data are registered or indexed in a searchable resource
- Accessibility,
- data are retrievable by their identifier using a standardized communications protocol
- the protocol is open, free, and universally implementable
- the protocol allows for an authentication and authorization procedure,
- metadata are accessible, even when the data are no longer available
- Interoperability,
- data use a formal, accessible, shared, and broadly applicable language for knowledge representation
- data use vocabularies that follow FAIR principles
- data include qualified references to other (meta)data
- Reusability
- (meta)data are richly described
- (meta)data have a clear data usage license
- (meta)data have a detailed provenance
- (meta)data meet community standards
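As a concrete, purely illustrative example of what these principles imply for a single dataset record, the sketch below builds a minimal metadata record with a persistent identifier, rich descriptive fields, a clear license, and provenance, and registers it in a small searchable catalog. The identifier, field names, and values are invented for illustration and do not follow any particular metadata schema.

```python
import json

# A minimal, FAIR-leaning metadata record for one dataset (hypothetical values throughout).
record = {
    "identifier": "https://doi.org/10.9999/example-dataset-001",   # globally unique, persistent (F1)
    "title": "Example lake temperature measurements, 2014-2015",
    "description": "Hourly water temperature readings from three monitoring buoys.",
    "keywords": ["limnology", "temperature", "time series"],        # rich metadata aid findability (F2)
    "access": {
        "protocol": "https",                                        # open, standardized retrieval (A1)
        "landing_page": "https://repository.example.org/datasets/example-dataset-001",
    },
    "format": "text/csv",                                           # shared, broadly usable representation (I1)
    "license": "https://creativecommons.org/licenses/by/4.0/",      # clear usage license (R1.1)
    "provenance": {                                                  # detailed provenance (R1.2)
        "creator": "Example University Limnology Lab",
        "derived_from": "https://doi.org/10.9999/raw-buoy-data-2014",
        "created": "2015-06-30",
    },
    "references": ["https://doi.org/10.9999/related-paper"],        # qualified links to other (meta)data (I3)
}

# Registering or indexing the record in a searchable resource (F4) could be as simple
# as appending it to a harvestable JSON catalog kept alongside the repository.
with open("catalog.json", "w", encoding="utf-8") as f:
    json.dump([record], f, indent=2)
```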
Friday, December 04, 2015
March 2015 PASIG Meeting Presentations and Recent Webinars
March 2015 PASIG Meeting Presentations and Recent Webinars. Preservation and Archiving Special Interest Group (PASIG). March 11-13, 2015.
Recent presentations and webinars from PASIG and ASIS&T are available on the PASIG site. These include:
- March 2015 PASIG Meeting Presentations
- Tiered Adaptive Storage for Big Data and Supercomputing. Jason Goodman
- Video Surveillance: Consuming I.T. Capacity At Significant Rates. Jay Jason Bartlett
- Archive and Preservation for Collections Leveraging Standards Based Technologies and the Cloud. Brian Campanotti
- What Would an Ideal Digital Preservation Technical Registry Look Like?. Steve Knight and Peter McKinney
- Three Critical Elements of Long-Term Storage in the Cloud. Amir Kapadia
- Policy Based Data Management. Reagan Moore
- Digital Forensics and BitCurator. Christopher (Cal) Lee
- The Essential Elements of Intelligently Managed Tiered Storage Infrastructures. Raymond Clarke
- Implementing Sustainable Digital Preservation.
- How to Assess Your Digital Value at Risk: An Introduction to the Digital Value at Risk.
- Building Communities and Services in Support of Data-Intensive Research. Stephen Abrams
- Storage Technology Trends for Archiving. Tom Wultich and Bob Raymond
- Stewarding Research Data with Fedora and Islandora. Mark Leggott
- Challenges of Digital Media Preservation in an Active Archive. Karen Cariani, David W. MacCarn
- An Introduction to the National Digital Information Infrastructure and Preservation Program (NDIIPP) and its Digital Preservation Initiatives. Leslie Johnston
- Digital Preservation in Theory and Practice: A Preservation and Archiving Special Interest Group (PASIG) Boot Camp Webinar. Tom Cramer
Monday, November 23, 2015
Introduction to Metadata Power Tools for the Curious Beginner
Introduction to Metadata Power Tools for the Curious Beginner. Maureen Callahan, Regine Heberlein, Dallas Pillen. SAA Archives 2015. August 20, 2015. [PowerPoint and Google Doc]
"At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:
Basic Principles of Working with Power Tools
"At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:
Basic Principles of Working with Power Tools
- Create a Sandbox Environment: have backups. It is ok to break things
- Think Algorithmically: Break a big problem down into smaller steps
- Choosing a Tool: the best tool is the one that works for your problem and skill set
- Document: Successes, failures, procedures
- It is OK to make mistakes, as long as you know how to recognize and undo them!
- view mistakes as an opportunity
- mistakes can teach you as much about your data as about your tool
- share your mistakes so others may benefit
- realize that everybody makes them
- Know the applicable standards
- Know your data
- Know what you want
- Normalize your data before you start a big project
- The problem is intellectual, not technical
- Use the tools available to you
- Don’t do what a machine can do for you
- Think about one-off operations vs. tools you might re-use or re-purpose
- Think about learning tools in terms of raising the level of staff skill
The power tools covered include (a small cleanup sketch using one of them follows this list):
- XPath
- Regex
- XQuery
- XQuery Update
- XSLT
- batch
- Linux command line
- Python
- AutoIt
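The sketch below is a small, hypothetical example of the kind of one-off cleanup these tools enable: normalizing inconsistent date strings in a CSV of descriptive metadata using Python and regular expressions. The file name, column name, and date patterns are invented for illustration, not taken from the presentation.

```python
import csv
import re

# Hypothetical input: a metadata CSV with a "date" column holding values such as
# "3/4/1921", "1921-03-04", or "March 4, 1921" that we want normalized to ISO 8601.
MONTHS = {m: i for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

def normalize_date(value):
    value = value.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):                 # already ISO 8601
        return value
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)       # M/D/YYYY
    if m:
        month, day, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    m = re.fullmatch(r"([A-Za-z]+) (\d{1,2}), (\d{4})", value)    # "March 4, 1921"
    if m and m.group(1).lower() in MONTHS:
        return f"{m.group(3)}-{MONTHS[m.group(1).lower()]:02d}-{int(m.group(2)):02d}"
    return value   # leave anything unrecognized alone and review it by hand

with open("metadata.csv", newline="", encoding="utf-8") as src, \
     open("metadata_clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["date"] = normalize_date(row.get("date", ""))
        writer.writerow(row)
```

Working on a copy of the file, as the "sandbox" principle above suggests, makes it safe to experiment with the patterns until they match the real data.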
Thursday, October 01, 2015
Towards Sustainable Curation and Preservation: The SEAD Project’s Data Services Approach
Towards Sustainable Curation and Preservation: The SEAD Project’s Data Services Approach. James Myers, et al. IEEE International Conference on eScience. September 3, 2015. [PDF]
This is a preview of a paper, to be presented at the IEEE eScience conference, about the Sustainable Environment: Actionable Data (SEAD) project. It details efforts to develop data management and curation services and to make those services available for active research groups to use. The introduction raises an apparent paradox: researchers face data management challenges, yet the curation practices that could help are usually applied only after the research work is completed (if at all). If data and metadata are instead added incrementally as they are produced, the metadata can be used to help organize the data during the research itself.
If the system that preserved the data also generated citable persistent identifiers and dynamically updated the project’s web site with those citations, then completing the publication process would be in the best interest of the researcher. The discussions have revolved around two general areas, termed Active and Social Curation (a small sketch of the active-curation idea follows the list):
- Active Curation: focus primarily on the activities of data producers and curators working during research projects to produce published data collections.
- Social Curation: explores how the actions of the user community can be leveraged to provide further value. This could involve the ability of research groups to
- publish derived value-added data products,
- notify researchers when revisions or derived products appear,
- monitor the mix of file formats and metadata to help determine migration strategies
SEAD provides three data services:
- Project Spaces: secure, self-managed storage and tools to work with data resources
- Virtual Archive: a service that manages publication of data collections from Project Spaces to long-term repositories
- Researcher Network: personal and organizational profiles that can include literature and data publications.
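A minimal, hypothetical sketch of the active-curation idea described above: metadata is recorded incrementally as files are produced, and publication then assigns a citable identifier and builds a citation the project site could display. The file names, manifest format, and identifier are invented for illustration and are not SEAD's actual implementation.

```python
import hashlib
import json
import pathlib
from datetime import date

MANIFEST = pathlib.Path("project_manifest.json")

def register(data_file, description):
    """Record metadata for a file as soon as it is produced (active curation)."""
    manifest = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {"files": []}
    data = pathlib.Path(data_file).read_bytes()
    manifest["files"].append({
        "path": data_file,
        "description": description,
        "sha256": hashlib.sha256(data).hexdigest(),   # fixity recorded at creation time
        "added": date.today().isoformat(),
    })
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def publish(collection_title):
    """'Publish' the collection: assign a (fake) persistent identifier and build a citation."""
    manifest = json.loads(MANIFEST.read_text())
    identifier = "https://doi.org/10.9999/example-collection"     # hypothetical identifier
    citation = f"Example Research Group ({date.today().year}). {collection_title}. {identifier}"
    manifest.update({"identifier": identifier, "citation": citation})
    MANIFEST.write_text(json.dumps(manifest, indent=2))
    return citation   # e.g. displayed on the project's web page

# Usage: register("buoy_2015_06.csv", "June 2015 buoy readings"); publish("Lake buoy data")
```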
Tuesday, September 29, 2015
Do You Have an Institutional Data Policy?
Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies. Kristin Briney, Abigail Goben, Lisa Zilinski. Journal of Librarianship and Scholarly Communication. 22 Sep 2015. [PDF]
This study looked at correlations between policy existence and either library data services or the presence of a data librarian. Data services in libraries are becoming mainstream, and librarians have an opportunity to work with researchers at their institutions to help them understand the policies in place or to work toward a policy. Some items of note from the article:
- Fewer universities have a data librarian on staff (37%) than offer data services.
- Many libraries (65%) have a research data repository, either in an IR or in a repository specifically for data.
- Fewer universities (11%) have dedicated data repositories as compared with IRs that accept data (58%).
- All universities with over $1 billion per year in research expenditures offer data services and a place to host data. Most (89%) of these institutions also have a data librarian, and 33% have a data repository.
- Nearly half (44%) of all universities studied have some type of policy covering research data
- Most of the policies (67%) designated an owner of university research data
- Data is required to be retained for some period of time (52%)
Where policies existed, they:
- defined data (61%)
- identified a data owner (62%)
- stated a specific retention time (62%)
- identified who can have access to the data (52%)
- described disposition of the data when a researcher leaves the university (64%)
- designated a data steward (46%)
Friday, September 25, 2015
Data Management Practices Across an Institution: Survey and Report
Data Management Practices Across an Institution: Survey and Report. Cunera Buys, Pamela Shaw. Journal of Librarianship and Scholarly Communication. 22 Sep 2015.
Data management is becoming increasingly important to researchers in all fields. This study presents results from a survey of digital data management practices across all disciplines at a university; the results show that both short- and long-term storage and preservation solutions are needed. When asked, 31% of respondents did not know how much storage they would need, which makes establishing a correctly sized research data storage service difficult. In the survey, 65% of faculty said it was important to share data, but less than half of them "reported that they 'always' or 'frequently' shared their data openly, despite their belief in the importance of sharing".
Researchers produce a wide variety of data types and sizes, but most create no metadata or do not use metadata standards, and most researchers were uncertain about how to meet the NSF data management plan requirements (only 45% had a plan). A 2011 study of data storage and management needs across several academic institutions found that many researchers were satisfied with short-term data storage and management practices, but not with long-term data storage options. Researchers in the same study did not believe their institutions provided adequate funds, resources, or instruction on good data management practices. When asked where research data is stored:
- Sixty-six percent use computer hard drives
- 47% use external hard drives
- 50% use departmental or school servers
- 38% store data on the instrument that generated the data
- 31% use cloud-based storage services
- Dropbox was the most popular service at 63%
- 27% use flash drives
- 6% use external data repositories.
Most researchers expected to store raw and published data “indefinitely”. Many respondents also selected 5-10 years, and very few said they keep data for less than one year. Responses from all schools suggest that data are relevant for long periods of time or indefinitely. Specific retention preferences by school were:
- The college of arts and sciences prefers “indefinitely” for ALL data types
- Published data: All schools prefer “indefinitely” for published data except
- The law school prefers 1-5 years for published data
- Other data:
- The school of medicine prefers 5-10 years for all other data types
- The school of engineering prefers 1-5 years for all other data types
- The college of arts and sciences prefers “indefinitely” for raw data
- The school of management prefers “indefinitely” for raw data
Keeping raw data / source material was seen as useful; reasons given included:
- use in future / new studies (77 responses)
- use in longitudinal studies (9 responses)
- sharing with colleagues (6 responses)
- value for replicating study results (10 responses)
- responding to challenges of published results
- data that would be difficult or costly to replicate
- a view that it is simply good scientific practice to retain data (4 responses)
When asked, 66% indicated they would need additional storage; most said 1-500 gigabytes or “don’t know.” Also, when asked what services would be useful in managing research data the top responses were:
- long term data access and preservation (63%),
- services for data storage and backup during active projects (60%),
- information regarding data best practices (58%),
- information about developing data management plans or other data policies (52%),
- assistance with data sharing/management requirements of funding agencies (48%), and
- tools for sharing research (48%).
There appears to be a need to educate researchers on the external data repositories that are available and on funding agencies’ requirements for data retention. The library decided to provide a clear set of funder data retention policies linked from the library’s data management web guide. Long-term storage is a problem for researchers because of the volume of data and the lack of stable storage solutions, and that limits data retention and sharing.
Wednesday, September 23, 2015
A Selection of Research Data Management Tools Throughout the Data Lifecycle
A Selection of Research Data Management Tools Throughout the Data Lifecycle. Jan Krause. Ecole polytechnique fédérale de Lausanne. September 9, 2015. [PDF]
This article looks at the data lifecycle management phases and the many tools that exist to help manage data throughout the process. These tools can help researchers make the most of their data, save time in the long run, promote reproducible research, and minimize risks to the data. The lifecycle management phases are: discovery, acquisition, analysis, collaboration, writing, publication, and deposit in trusted data repositories. There are tools in each of these areas. A few of the many tools and resources listed are (a small format-checking sketch follows the list):
- How to Develop a Data Management and Sharing Plan
- DMPonline and DMPTool to help write data management plans.
- Recommended Data Formats
- ownCloud synchronizes and shares data on several computers
- re3data global registry of 1,200+ research data repositories in different disciplines
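As a small illustration of one recurring recommendation in such guides (preferring open, widely supported formats when depositing data), the sketch below flags files whose extensions are not on a preservation-friendly list. The list of extensions and the folder name are invented for illustration and are not EPFL's or any repository's official guidance.

```python
import pathlib

# Illustrative only: a short list of extensions commonly considered preservation-friendly.
# Real guidance varies by repository and discipline -- consult your institution's list.
PREFERRED = {".csv", ".txt", ".xml", ".json", ".tif", ".pdf", ".wav"}

def review_for_deposit(folder):
    """List files that may need conversion before deposit in a data repository."""
    flagged = []
    for path in sorted(pathlib.Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() not in PREFERRED:
            flagged.append(path)
    return flagged

if __name__ == "__main__":
    for path in review_for_deposit("my_project_data"):   # hypothetical project folder
        print(f"Consider converting before deposit: {path}")
```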
Tuesday, August 25, 2015
What is actually happening out there in terms of institutional data repositories?
What is actually happening out there in terms of institutional data repositories? Ricky Erway. OCLC Research. July 27, 2015.
Academic libraries are talking about providing data curation services for their researchers. In most cases they offer just training and advice, not actual data management services. While technical, preservation, and service issues can be challenging, funding issues are probably what inhibits this service most. This is an important service that supports the university research mission.
Of the 22 institutions that answered the survey (a minimal integrity-check sketch, illustrating one practice reported below, follows the list):
- stand-alone data repository: 8
- combination institutional repository and data repository: 12
- DSpace: 6
- Hydra/Fedora systems: 6
- locally developed systems: 4
- Rosetta, Dataverse, SobekCM, and HUBzero: 1 each
- all provide integrity checks except 1
- keep offsite backup copies: 17
- provide format migration: 12
- put master files in a dark archive: 10
- the library’s base budget covered at least some of the expenses: 18
- the library budget the only source of funding: 7
- receive fees from researchers: 7
- receive fees from departments: 4
- receive institutional funding specifically for data management: 5
- receive money from the IT budget: 4
- receive direct funds from grant-funded projects: 1
- receive indirect funds from grant-funded projects: 1
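Since nearly all of these repositories report running integrity checks and keeping offsite backup copies, here is a minimal, generic sketch of that practice: building a SHA-256 manifest for a directory of master files and later verifying it (for example, against an offsite copy). It illustrates the idea only and is not any surveyed system's actual implementation; the folder and manifest names are placeholders.

```python
import hashlib
import json
import pathlib

def sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large masters do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder, manifest="manifest.json"):
    """Record a checksum for every file under the folder."""
    checksums = {str(p.relative_to(folder)): sha256(p)
                 for p in sorted(pathlib.Path(folder).rglob("*")) if p.is_file()}
    pathlib.Path(manifest).write_text(json.dumps(checksums, indent=2))

def verify(folder, manifest="manifest.json"):
    """Recompute checksums and report anything missing or altered."""
    expected = json.loads(pathlib.Path(manifest).read_text())
    problems = []
    for rel, checksum in expected.items():
        path = pathlib.Path(folder) / rel
        if not path.exists():
            problems.append(f"MISSING  {rel}")
        elif sha256(path) != checksum:
            problems.append(f"CHANGED  {rel}")
    return problems

# Usage: build_manifest("masters"); later, verify("offsite_copy_of_masters")
```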
Monday, August 24, 2015
University Data Policies and Library Data Services: Who Owns Your Data?
University Data Policies and Library Data Services: Who Owns Your Data? Lisa D. Zilinski, Abigail Goben and Kristin Briney. Bulletin of the Association for Information Science and Technology. August 2015. [PDF]
'Who owns the data' is an important question, but the answer is often unclear, especially for unfunded research and pilot projects. Other questions that need to be asked are:
- What happens if a researcher leaves the institution?
- What if someone needs access to the data?
- How long do I have to keep them and how should I discard them?
- What happens if there is no policy? How should policies be determined?
- If the data are part of a collaborative project, which policy takes precedence?
- 50% of the libraries surveyed offer some form of data services beyond a resource guide.
- 40% of the libraries have a staff member (often the science librarian) assigned to research data management initiatives.
- 10% have a dedicated data repository.
Data Management Outreach to Junior Faculty Members: A Case Study
Data Management Outreach to Junior Faculty Members: A Case Study. Megan Sapp Nelson. Journal of eScience Librarianship. August 21, 2015.
Data management is generally not addressed with early-career faculty; it is either overlooked or assumed that these faculty will figure it out on their own. A brownbag and workshop outreach program was developed and presented to junior faculty early in their careers to introduce them to potential issues and solutions in data management, and to give them an opportunity to brainstorm with more experienced faculty members. Objectives for the workshop included that the faculty will:
- Evaluate the current state of their data management practices.
- Develop a prioritized list of actions that can be put into place.
- Understand how those actions can be transmitted to a research group.
- Focus on mechanics over deeper understanding of concepts.
- Learn data management from faculty within the context of an immediate problem, and therefore don't necessarily get broad training in the full lifecycle of data management.
- Figure out data management on their own, and figure it out differently from everyone else in the lab unless a protocol is put in place.
- Have a wide spectrum of expertise.
- Frequently suggest and adopt the data analysis tools used in labs, which leads to fragmented data management for the professor over time.
Monday, August 10, 2015
DMPonline: recent updates and new functionality
DMPonline: recent updates and new functionality. Sarah Jones. DCC curation webinars. 27 May 2015. [Video and slides].
This webinar provides an overview of the new functionality in the UK DMPonline tool for data management plans and the development roadmap through January 2016. The tool can be customized and internationalized, and the user group aims to make it easier for the community to feed in ideas and direct the development. There is also a reference to the DMP guidance, tools and resources page, which is very useful. It contains:
- DMPonline: A web-based tool to assist users to create personalised plans
- Funders’ data plan requirements: Summary of funders' expectations for data management and sharing plans
- Checklist for a Data Management Plan: Useful questions and guidance for writing data management and sharing plans
- DMP checklist leaflet: A fold-out summary of the Checklist
- FAQ on Data Management Plans: A short list of key questions pertaining to Data Management Plans
- How to Develop a Data Management and Sharing Plan: A guide on how to meet UK funder expectations for DMPs
- Guidance and examples: Advice to help you write your data management and sharing plan
Saturday, August 08, 2015
Where Should You Keep Your Data?
Where Should You Keep Your Data? Karen M. Markin. The Chronicle of Higher Education. June 23, 2015.
Federal funding agencies have made it clear that grant proposals must include plans for sharing research data with other scientists. What has not been clear is how and where researchers should store their data, which can range from sensitive personal medical information to enormous troves of satellite imagery. Although data-sharing requirements have been in place for years, universities have been slow to help principal investigators make that happen. Now, researchers who don't comply with the new policies can have funding withheld or be barred from receiving additional grant money. Principal investigators are urged to place their data in existing publicly accessible repositories; the NIH has a list of repositories, and the NSF directs researchers to specific repositories.
The "DMP Tool," hosted by the University of California, provides a free, interactive form that walks you through the preparation of a data-management plan for more than a dozen organizations.
Many libraries are playing a role in this effort and researchers should check with reference librarians for help on this. Data storage and preparation can get complicated and it’s useful to have someone to guide you through the process. Federal agencies plan to establish standards for these so-called "metadata."
Related posts:
- Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research
- Data Management Plans
- AHRQ Public Access to Federally Funded Research
- Got Data? A Guide to Data Preservation in the Information Age
Friday, July 10, 2015
Track the Impact of Research Data with Metrics; Gauge Archive Capacity
How to Track the Impact of Research Data with Metrics. Alex Ball, Monica Duke. Digital Curation Centre. 29 June 2015.
This guide from the DCC provides help on how to track and measure the impact of research data (a small metrics-lookup sketch follows the lists). It provides:
- impact measurement concepts, services and tools for measuring impact
- tips on increasing the impact of your data
- how institutions can benefit from data usage monitoring
- help to gauge capacity requirements in storage, archival and network systems
- information on setting up promotional activities
Data usage monitoring can help institutions:
- monitor the success of the infrastructure providing access to the data
- gauge capacity requirements in storage, archival and network systems
- create promotional activities around the data, sharing and re-use
- create special collections around datasets;
- meet funder requirements to safeguard data for the established lifespan
Tips for increasing the impact of your data include:
- deposit data in a trustworthy repository
- provide appropriate metadata
- enable open access
- apply a license to the data about what uses are permitted
- raise awareness to ensure it is visible (citations, publication, provide the dataset identifier, etc)
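As one concrete way to gather such metrics, the sketch below queries the DataCite REST API for a dataset DOI and prints any citation and usage counts it reports. The endpoint and attribute names are assumptions based on DataCite's public API and should be checked against the current documentation; many other metrics (altmetrics, repository download statistics) live in separate services, and the DOI shown is hypothetical.

```python
import json
import urllib.request

def datacite_metrics(doi):
    """Fetch a DOI's metadata record from the DataCite REST API (assumed endpoint)."""
    url = f"https://api.datacite.org/dois/{doi}"
    with urllib.request.urlopen(url) as resp:
        attributes = json.loads(resp.read().decode("utf-8"))["data"]["attributes"]
    # Attribute names below are assumptions to verify against the current DataCite docs.
    return {
        "title": attributes.get("titles", [{}])[0].get("title"),
        "citations": attributes.get("citationCount"),
        "views": attributes.get("viewCount"),
        "downloads": attributes.get("downloadCount"),
    }

if __name__ == "__main__":
    print(datacite_metrics("10.5061/dryad.example"))   # hypothetical dataset DOI
```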
Monday, June 29, 2015
SIRF: Self-contained Information Retention Format
SIRF: Self-contained Information Retention Format. Sam Fineberg, et al. SNIA Tutorial. 2015. [PDF]
Generating and collecting very large data sets that need to be kept for long periods is a necessity for many organizations, including science, archives, and commerce. The presentation describes the challenges of keeping data long term with Linear Tape File System (LTFS) technology and a Self-contained Information Retention Format (SIRF). The top external factors driving long-term retention requirements are legal risk, compliance regulations, business risk, and security risk.
What does long-term mean? Retention of 20 years or more is required by 70% of the responses in a poll.
- 100 years: 38.8%
- 50-100 years: 18.3%
- 21-50 years: 31.1%
- 11-20 years: 15.7%
- 7-10 years: 12.3%
- 3-5 years: 1.9%
Long-term retention needs arise in many areas, including:
- Regulatory compliance and legal issues
- Emerging web services and applications
- Many other fixed-content repositories (Scientific data, libraries, movies, music, etc.)
A SIRF container includes (sketched below):
- a set of preservation objects and a catalog (logical or physical)
- metadata about the contents and individual objects
- self describing standard catalog information so it can all be maintained
- a "magic object" that identifies the container and version
When preserving objects, we need to keep all the information to make them fully usable in the future. No single technology will be "usable over the time-spans mandated by current digital preservation needs". LTFS technologies are "good for perhaps 10-20 years".
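A minimal, purely illustrative sketch of the container idea: a catalog that identifies itself with a "magic object" (format name and version) and lists each preservation object with its metadata and checksum, so the container stays self-describing. This is not the actual SIRF serialization defined by SNIA, just an outline of the structure described above; the format name, directory, and metadata fields are invented.

```python
import hashlib
import json
import pathlib
from datetime import date

def build_catalog(container_dir, catalog_name="catalog.json"):
    """Write a self-describing catalog next to the preservation objects in a directory."""
    container = pathlib.Path(container_dir)
    catalog = {
        # The "magic object": identifies the container format and its version.
        "magic": {"container_format": "EXAMPLE-SIRF-LIKE", "version": "0.1"},
        "created": date.today().isoformat(),
        "objects": [],
    }
    for path in sorted(container.rglob("*")):
        if path.is_file() and path.name != catalog_name:
            catalog["objects"].append({
                "name": str(path.relative_to(container)),
                "size": path.stat().st_size,
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                # Per-object metadata that keeps the container self-contained.
                "metadata": {"source": "unknown", "retention": "indefinite"},
            })
    (container / catalog_name).write_text(json.dumps(catalog, indent=2))

# Usage: build_catalog("retention_container")   # hypothetical directory of objects to retain
```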
Tuesday, April 28, 2015
Database Preservation Toolkit
Database Preservation Toolkit. Website. April 2015.
The Database Preservation Toolkit uses input and output modules and allows conversion between database formats, including connection to live systems. It allows conversion of live or backed-up databases into preservation formats such as DBML, SIARD, or XML-based formats created for the purpose of database preservation.
This toolkit was part of the RODA project and now has been released as a separate project. The site includes download links and related publications and presentations.
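To illustrate the general idea of turning a live database into an XML-based preservation format (this is not the toolkit itself, nor the SIARD or DBML schemas), here is a minimal sketch that dumps every table of a SQLite database, with column names, into a simple XML document. A real migration would also capture the schema, data types, keys, and large objects, which is what the toolkit's input and output modules handle.

```python
import sqlite3
import xml.etree.ElementTree as ET

def dump_database(db_path, out_path="database_preservation.xml"):
    """Export all tables of a SQLite database to a simple, illustrative XML format."""
    conn = sqlite3.connect(db_path)
    root = ET.Element("database", name=db_path)
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        table_el = ET.SubElement(root, "table", name=table)
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [c[0] for c in cursor.description]    # column names from the live system
        for row in cursor:
            row_el = ET.SubElement(table_el, "row")
            for column, value in zip(columns, row):
                cell = ET.SubElement(row_el, "column", name=column)
                cell.text = "" if value is None else str(value)
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    conn.close()

# Usage: dump_database("records.sqlite")   # hypothetical live database file
```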
Saturday, April 18, 2015
Digital Curation and Doctoral Research: Current Practice
Digital Curation and Doctoral Research: Current Practice. Daisy Abbott. International Journal of Digital Curation. 10 February 2015.[PDF]
More doctoral students are engaging in research data creation, processing, use, management, and preservation activities (digital duration) than ever before. Digital curation is an intrinsic part of the skills that students are expected to acquire.
Training in research skills and techniques is the key element in the development of a research student. The integration of digital curation into expected research skills is essential. Doctoral supervisors "should discuss and review research data management annually, addressing issues of the capture, management, integrity, confidentiality, security, selection, preservation and disposal, commercialization, costs, sharing and publication of research data and the production of descriptive metadata to aid discovery and re-use when relevant." Those supervisors may not necessarily have those skills themselves. And there is a gap in the literature about why and how to manage, curate, and preserve digital data as part of a PhD program.
While both doctoral students and supervisors can benefit from traditional resources on the topic, the majority of guidance on digital curation takes the form of online resources and training programs. In a survey,
More doctoral students are engaging in research data creation, processing, use, management, and preservation activities (digital duration) than ever before. Digital curation is an intrinsic part of the skills that students are expected to acquire.
Training in research skills and techniques is the key element in the development of a research student. The integration of digital curation into expected research skills is essential. Doctoral supervisors "should discuss and review research data management annually, addressing issues of the capture, management, integrity, confidentiality, security, selection, preservation and disposal, commercialization, costs, sharing and publication of research data and the production of descriptive metadata to aid discovery and re-use when relevant." Those supervisors may not necessarily have those skills themselves. And there is a gap in the literature about why and how to manage, curate, and preserve digital data as part of a PhD program.
While both doctoral students and supervisors can benefit from traditional resources on the topic, the majority of guidance on digital curation takes the form of online resources and training programs. In a survey,
- over 50% of PhD holders consider long-term preservation to be extremely important.
- under 40% of students consider long-term preservation to be extremely important.
- 90% of doctoral students and supervisors consider digital curation to be moderately to extremely important.
- Yet 74% of respondents stated that they had limited or no skills in digital curation and only 10% stated that they were “fairly skilled” or “expert”.
- Ensure practical digital curation is understood
- Encourage responsibility for digital curation activities in institutional support structures
- Increase the discoverability and availability of digital curation support services
Thursday, April 09, 2015
Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring?
Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring? Celia Jenkins on Randy Kiefer's presentation. UKSGLive. April 3, 2015.
Long-term preservation refers to processes and procedures required to ensure content remains accessible well into the future. Publishers want to be good stewards of their content. Digital Preservation is an "insurance policy" for e-resources. Commercial hosting platforms and aggregators are not preservation archives! They can remove discontinued content from their system, and commercial businesses may disappear, as with Metapress. There are global digital preservation archives, such as CLOCKSS and Portico, and regional archives, such as National Libraries.
The biggest challenges in the future are formats (especially presentation of content) and what to do with databases, datasets, and supplementary materials. "Any format can be preserved, including video. The issue is that of space, cost and presentation (especially if the format is now not in use/supported)." There are legal issues with cloud-based preservation systems: there is no legal precedent for a cloud-based preservation system, and no protection with regards to security.
Friday, March 27, 2015
National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research
National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research. February 2015.
This document describes NIH’s plans to build upon and enhance its longstanding efforts to increase access to scholarly publications and digital data resulting from NIH-funded research. Sections relevant to digital preservation and long term management:
NIH intends to make public access to digital scientific data the standard for all NIH funded research. Following adoption of the final plan, NIH will:
- Explore steps to require data sharing.
- Ensure that all NIH-funded researchers prepare data management plans and that the plans are evaluated during peer review.
- Develop additional data management policies to increase public access to designated types of biomedical research data.
- Encourage the use of established public repositories and community-based standards.
- Develop approaches to ensure the discoverability of data sets resulting from NIH-funded research to make them findable, accessible, and citable.
- Promote interoperability and openness of digital scientific data generated or managed by NIH.
- Explore the development of a data commons. NIH will explore the development of a commons, a shared space for basic and clinical research output including data, software, and narrative, that follows the FAIR principles of Find, Access, Interoperate and Reuse.
Preservation
Preservation is one of the Public Access Policy’s primary objectives. It wants to ensure that publications and metadata are stored in an archival solution that:
- provides for long-term preservation and access to the content without charge;
- uses standard, widely available and, to the extent possible, nonproprietary archival formats for text and associated content (e.g., images, video, supporting data);
- provides access for persons with disabilities
The first principle behind the plan for increasing access to digital scientific data is: The sharing and preservation of data advances science by broadening the value of research data across disciplines and to society at large, protecting the integrity of science by facilitating the validation of results, and increasing the return on investment of scientific research.
Data Management Plans
Data management planning should be an integral part of research planning. NIH wants to ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified. In order to preserve the balance between the relative benefits of long-term preservation and access and the associated cost and administrative burden, NIH will continue to expect researchers to consider the benefits of long-term preservation of data against the costs of maintaining and sharing the data.
NIH will assess whether the appropriate balance has been achieved in data management plans between the relative benefits of long-term preservation and access and the associated cost and administrative burden. It will also develop guidance with the scientific community to decide which data should be prioritized for long-term preservation and access. NIH will also explore and fund innovative tools and services that improve search, archiving, and disseminating of data, while ensuring long-term stewardship and usability.
Assessing Long-Term Preservation Needs
NIH will provide for the preservation of scientific data and outline options for developing and sustaining repositories for scientific data in digital formats. The policies expect long-term preservation of data.
NIH will include long-term preservation and sustainability in data management plans and will collaborate with other agencies on how best to develop and sustain repositories for digital scientific data.