
Wednesday, November 16, 2016

A Doomsday Scenario: Exporting CONTENTdm Records to XTF

A Doomsday Scenario: Exporting CONTENTdm Records to XTF. Andrew Bullen. D-Lib Magazine. November/December 2016.
     Because of budgetary concerns, the Illinois State Library asked Andrew Bullen to explore how its CONTENTdm collections could be migrated to another platform. (The Illinois Digital Archives repository is based on CONTENTdm.) He chose methods that would let him migrate the collections quickly using existing tools, particularly PHP, Perl, and XTF, which the library already uses as the platform for a digital collection of electronic Illinois state documents. The article shows the Perl code he wrote, the metadata, and record examples, and walks through the process. He also started A Cookbook of Methods for Using CONTENTdm APIs. Each collection presented different challenges and required custom programming. He recommends reviewing the metadata elements of each collection, normalizing like elements as much as possible, and planning which elements can be indexed and how faceted browsing could be implemented. The test was only to see whether the data could be reasonably converted, so not all parts were implemented. In a real migration, CONTENTdm's APIs could be used as the data transfer medium.
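To give a sense of what such an export involves, here is a minimal sketch in Python rather than the article's Perl. The host name, collection alias, and field nicknames are placeholders; the dmGetItemInfo call follows CONTENTdm's documented dmwebservices pattern, but verify it against your own instance before relying on it.

    import requests
    import xml.etree.ElementTree as ET

    # Placeholder CONTENTdm host and collection alias -- adjust for a real site.
    BASE = "https://cdm.example.org/dmwebservices/index.php"
    ALIAS = "p1234coll1"

    def get_item(pointer):
        # dmGetItemInfo returns every metadata field of one record as JSON.
        url = f"{BASE}?q=dmGetItemInfo/{ALIAS}/{pointer}/json"
        return requests.get(url, timeout=30).json()

    def to_xtf_record(item):
        # Map a few Dublin Core-style fields into simple XML that XTF's
        # textIndexer can crawl; the field nicknames are illustrative.
        root = ET.Element("record")
        for nick, tag in [("title", "title"), ("creato", "creator"),
                          ("date", "date"), ("descri", "description")]:
            ET.SubElement(root, tag).text = str(item.get(nick, ""))
        return ET.tostring(root, encoding="unicode")

    if __name__ == "__main__":
        print(to_xtf_record(get_item(0)))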

Saturday, October 15, 2016

DPTP: Introduction to Digital Preservation Planning for Research Managers

DPTP: Introduction to Digital Preservation Planning for Research Managers. Ed Pinsent, Steph Taylor. ULCC. 15 October 2016.
     Today I saw this course offered and thought it looked interesting (wish I were in London to attend).  It is a one-day introduction to digital preservation and is designed specifically to look at preservation planning from the perspective of the research data manager. Digital preservation, the management and safeguarding of digital content for the long term, is becoming more important for research data managers, who must make sure content remains accessible and authentic over time.  The learning outcomes are:
  • Understand what digital preservation means and how it can help research managers
  • How to assess content for preservation
  • How to integrate preservation planning into a research data management plan
  • How to plan for preservation interventions
  • How to identify reasons and motivations for preservation for individual projects
  • What storage means, and the storage options that are available
  • How to select appropriate approaches and methods to support the needs of projects
  • How to prepare a business case for digital preservation
The course contains eight modules, which are:
  1. Find out about digital preservation and how and why it is important in RDM.
  2. Assess research data, understand how to preserve them for the longer term, and understand your users.
  3. Learn how an RDM plan can include preservation actions.
  4. Manage data beyond the life of projects, plan the management of storage, and draft a selection policy.
  5. Understand individual institutions, stakeholders, requirements, and risk assessment.
  6. Understand why preservation storage has extra requirements, considering ‘the Cloud’.
  7. Learn the strategy of migrating formats, including databases; the risks and benefits, and tools you can use.
  8. Make a business case (benefits, risks, costs) to persuade your institution why digital preservation is important.

Thursday, March 24, 2016

The FAIR Guiding Principles for scientific data management and stewardship

The FAIR Guiding Principles for scientific data management and stewardship. Mark D. Wilkinson, et al. Nature. 15 March 2016. [PDF]
     "There is an urgent need to improve the infrastructure supporting the reuse of scholarly data." Good data management is not a goal in itself, but a conduit leading to knowledge discovery, innovation and the reuse of the data. The current digital ecosystem prevents this, which is why the funding and publishing community is beginning to require data management and stewardship plans. "Beyond proper collection, annotation, and archival, data stewardship includes the notion of ‘long-term care’ of valuable digital assets" so they can be discovered and re-used for new investigations.

This article describes four foundational principles (FAIR) to guide data producers and publishers:
  • Findability,
    • assigned a globally unique and persistent identifier
    • data are described with rich metadata
    • metadata clearly include the identifier of the data it describes
    • data are registered or indexed in a searchable resource
  • Accessibility,
    • data are retrievable by their identifier using a standardized communications protocol
    • the protocol is open, free, and universally implementable
    • the protocol allows for an authentication and authorization procedure,
    • metadata are accessible, even when the data are no longer available
  • Interoperability, 
    • data use a formal, accessible, shared, and broadly applicable language for knowledge representation
    • data use vocabularies that follow FAIR principles
    • data include qualified references to other (meta)data
  • Reusability
    • meta(data) are richly described
    • (meta)data have a clear data usage license
    • (meta)data have a detailed provenance
    • (meta)data meet community standards
These FAIR principles guide data publishers and stewards in evaluating their implementation choices. They are a prerequisite for proper data management and data stewardship. Achieving these goals requires working together with shared goals and principles.
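To make the principles concrete, a metadata record that satisfies them might look something like the sketch below; the field names are generic illustrations, not a schema prescribed by the FAIR paper.

    # Illustrative record touching each FAIR principle; field names are generic.
    fair_record = {
        # Findable: a globally unique, persistent identifier plus rich metadata
        "identifier": "https://doi.org/10.5555/example.dataset.v1",
        "title": "Example survey dataset",
        "keywords": ["data management", "survey"],
        # Accessible: retrievable over an open, standardized protocol (HTTPS)
        "access_url": "https://repository.example.edu/datasets/1234",
        # Interoperable: shared vocabularies and qualified references
        "vocabulary": "http://purl.org/dc/terms/",
        "references": [{"relation": "isDerivedFrom",
                        "target": "https://doi.org/10.5555/example.raw"}],
        # Reusable: clear usage license and detailed provenance
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "provenance": "Collected 2015-03; normalized with OpenRefine",
    }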

Friday, December 04, 2015

March 2015 PASIG Meeting Presentations and Recent Webinars

March 2015 PASIG Meeting Presentations and Recent Webinars. Preservation and Archiving Special Interest Group (PASIG). March 11-13, 2015.
  Recent presentations and webinars from PASIG and ASIS&T are available on the PASIG site. These include:
  • March 2015 PASIG Meeting Presentations 
  • Tiered Adaptive Storage for Big Data and Supercomputing. Jason Goodman
  • Video Surveillance: Consuming I.T. Capacity At Significant Rates. Jay Jason Bartlett
  • Archive and Preservation for Collections Leveraging Standards Based Technologies and the Cloud. Brian Campanotti
  • What Would an Ideal Digital Preservation Technical Registry Look Like? Steve Knight and Peter McKinney
  • Three Critical Elements of Long-Term Storage in the Cloud. Amir Kapadia
  • Policy Based Data Management. Reagan Moore
  • Digital Forensics and BitCurator. Christopher (Cal) Lee
  • The Essential Elements of Intelligently Managed Tiered Storage Infrastructures. Raymond Clarke
  • Implementing Sustainable Digital Preservation.
  • How to Access Your Digital Value at Risk: An Introduction to the Digital Value at Risk.
  • Building Communities and Services in Support of Data-Intensive Research. Stephen Abrams
  • Storage Technology Trends for Archiving. Tom Wultich and Bob Raymond
  • Stewarding Research Data with Fedora and Islandora. Mark Leggott
  • Challenges of Digital Media Preservation in an Active Archive. Karen Cariani, David W. MacCarn
  • An Introduction to the National Digital Information Infrastructure and Preservation Program (NDIIPP) and its Digital Preservation Initiatives. Leslie Johnston
  • Digital Preservation in Theory and Practice:  A Preservation and Archiving Special Interest Group (PASIG) Boot Camp Webinar. Tom Cramer

Monday, November 23, 2015

Introduction to Metadata Power Tools for the Curious Beginner

Introduction to Metadata Power Tools for the Curious Beginner. Maureen Callahan, Regine Heberlein, Dallas Pillen. SAA Archives 2015. August 20, 2015. [PowerPoint] [Google Doc]
      "At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:

Basic Principles of Working with Power Tools
  • Create a Sandbox Environment: have backups. It is ok to break things
  • Think Algorithmically: Break a big problem down into smaller steps
  • Choosing a Tool: the best tool works for your problem and skill set
  • Document: Successes, failures, procedures
Dare to Make Mistakes
  • as long as you know how to recognize and undo them!
  • view mistakes as an opportunity
  • mistakes can teach you as much about your data as about your tool
  • share your mistakes so others may benefit
  • realize that everybody makes them
General Principles
  • Know the applicable standards
  • Know your data
  • Know what you want
  • Normalize your data before you start a big project
  • The problem is intellectual, not technical
  • Use the tools available to you
  • Don’t do what a machine can do for you
  • Think about one-off operations vs. tools you might re-use or re-purpose
  • Think about learning tools in terms of raising the level of staff skill
Tools
  • XPath
  • Regex
  • XQuery
  • XQuery Update
  • XSLT
  • batch
  • Linux command line
  • Python
  • AutoIt
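As a small taste of what these power tools make possible, the sketch below uses Python and a regular expression to normalize free-text dates into ISO 8601, a typical metadata cleanup task; the pattern and month table are illustrative, not taken from the presentation.

    import re

    # Map month abbreviations to two-digit numbers for ISO 8601 output.
    MONTHS = {"jan": "01", "feb": "02", "mar": "03", "apr": "04",
              "may": "05", "jun": "06", "jul": "07", "aug": "08",
              "sep": "09", "oct": "10", "nov": "11", "dec": "12"}

    def normalize_date(value):
        # Match forms like "Jan. 5, 1923" or "January 5, 1923".
        m = re.match(r"([A-Za-z]{3})[a-z.]*\s+(\d{1,2}),\s+(\d{4})", value.strip())
        if not m:
            return value  # leave unrecognized values for manual review
        month, day, year = m.groups()
        return f"{year}-{MONTHS[month.lower()]}-{int(day):02d}"

    print(normalize_date("Jan. 5, 1923"))     # -> 1923-01-05
    print(normalize_date("January 5, 1923"))  # -> 1923-01-05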

Thursday, October 01, 2015

Towards Sustainable Curation and Preservation: The SEAD Project’s Data Services Approach

Towards Sustainable Curation and Preservation: The SEAD Project’s Data Services Approach. James Myers, et al. IEEE International Conference on eScience. September 3, 2015. [PDF]
  This is a preview of a paper about the Sustainable Environment: Actionable Data (SEAD) project that will be presented at the conference. It details efforts to develop data management and curation services and to make those services available for active research groups to use. The introduction raises an apparent paradox: researchers face data management challenges, yet the curation practices that could help are used only after research work is completed (if at all). If data and metadata were added incrementally as the data are produced, the metadata could be used to help organize the data during research.

If the system that preserved the data also generated citable persistent identifiers and dynamically updated the project’s web site with those citations, then completing the publication process would be in the best interest of the researcher. The discussions have revolved around two general areas that have been termed Active and Social Curation:
  1. Active Curation: focus primarily on the activities of data producers and curators working during research projects to produce published data collections. 
  2. Social Curation: explores how the actions of the user community can be leveraged to provide further value. This could involve the ability of research groups to
    1. publish derived value-added data products, 
    2. notify researchers when revisions or derived products appear, 
    3. monitor the mix of file formats and metadata to help determine migration strategies
SEAD’s initial capabilities are provided by three primary interacting components:
  1. Project Spaces: secure, self-managed storage and tools to work with data resources
  2. Virtual Archive: a service that manages publication of data collections from Project Spaces to long-term repositories
  3. Researcher Network: personal and organizational profiles that can include literature and data publications.
SEAD has developed the ability to manage, curate, and publish sustainability science projects' data through hosted project spaces. This is a new option for projects that is more powerful than just using a shared file system and more cost effective than a custom project solution.


Tuesday, September 29, 2015

Do You Have an Institutional Data Policy?

Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies. Kristin Briney, Abigail Goben, Lisa Zilinski. Journal of Librarianship and Scholarly Communication. 22 Sep 2015.  [PDF]
     This study was to look at a correlation between policy existence and either library data services or the presence of a data librarian. Data services in libraries are becoming mainstream and librarians have an opportunity to work with researchers at their institutions and help them understand the policies in place or to work toward a policy. Some items of note from the article:
  • Fewer universities have a data librarian on staff (37%) than offer data services.
  • Many libraries (65%) have a research data repository, either in an IR or in a repository specifically for data. 
  • Fewer universities (11%) have dedicated data repositories as compared with IRs that accept data (58%).
  • All universities with over $1 billion per year in research expenditures offer data services and a place to host data. Most (89%) of these institutions also have a data librarian, and 33% have a dedicated data repository.
  • Nearly half (44%) of all universities studied have some type of policy covering research data
    • Most of the policies designated an owner of university research data (67%)
    • Data are required to be retained for some period of time (52%)
Standalone data policies covered many topics:
  • defined data (61%)
  • identified a data owner (62%)
  • stated a specific retention time (62%)
  • identified who can have access to the data (52%)
  • described disposition of the data when a researcher leaves the university (64%)
  • designated a data steward (46%)
Data services are becoming a standard at major research institutions. However, institutional data policies are often difficult to identify and may be confusing for researchers. The trend of libraries having a data policy, offering data services, and having a data librarian will become typical at major research institutions. 

Friday, September 25, 2015

Data Management Practices Across an Institution: Survey and Report

Data Management Practices Across an Institution: Survey and Report. Cunera Buys, Pamela Shaw. Journal of Librarianship and Scholarly Communication. 22 Sep 2015.
     Data management is becoming increasingly important to researchers in all fields. The results of a survey show that both short and long term storage and preservation solutions are needed. When asked, 31% of respondents did not know how much storage they will need, which makes establishing a correctly sized research data storage service difficult. This study presents results from a survey of digital data management practices across all disciplines at a university. In the survey, 65% of faculty said it was important to share data, but less than half of them "reported that they 'always' or 'frequently' shared their data openly, despite their belief in the importance of sharing".

Researchers produce a wide variety of data types and sizes, but most create no metadata or do not use metadata standards, and most researchers were uncertain about how to meet the NSF data management plan requirements (only 45% had a plan). A 2011 study of data storage and management needs across several academic institutions found that many researchers were satisfied with short-term data storage and management practices, but not satisfied with long-term data storage options. Researchers in the same study did not believe their institutions provided adequate funds, resources, or instruction on good data management practices. When asked about where research data is stored:
  • Sixty-six percent use computer hard drives
  • 47% use external hard drives
  • 50% use departmental or school servers
  • 38% store data on the instrument that generated the data
  • 31% use cloud-based storage services
    •  Dropbox was the most popular service at 63%
  • 27% use flash drives
  • 6% use external data repositories.

Most researchers expected to store raw and published data “indefinitely”. Many respondents also selected 5-10 years, and very few said they keep data for less than one year. All schools suggest that data are relevant for long periods of time or indefinitely. Specific retention preferences by school were:
  • The college of arts and sciences prefers “indefinitely” for ALL data types
  • Published data: All schools prefer “indefinitely” for published data except
    • The law school prefers 1-5 years for published data
  • Other data:
    • The school of medicine prefers 5-10 years for all other data types
    • The school of engineering prefers 1-5 years for all other data types
    • The college of arts and sciences prefers “indefinitely” for raw data
    • The school of management prefers “indefinitely” for raw data

Keeping raw data / source material was useful since researchers may:
  • use it for future / new studies (77 responses)
  • use it for longitudinal studies (9 responses)
  • share it with colleagues (6 responses)
  • use it to replicate study results (10 responses)
  • use it to respond to challenges of published results
  • find the data difficult or costly to replicate
  • simply consider retaining data to be good scientific practice (4 responses)

When asked, 66% indicated they would need additional storage; most said 1-500 gigabytes or “don’t know.” Also, when asked what services would be useful in managing research data, the top responses were:
  • long term data access and preservation (63%), 
  • services for data storage and backup during active projects (60%),
  • information regarding data best practices (58%), 
  • information about developing data management plans or other data policies (52%), 
  • assistance with data sharing/management requirements of funding agencies (48%), and 
  • tools for sharing research (48%).
Since most respondents said they planned to keep their data indefinitely, institutional storage solutions would need to accommodate "many data types and uncertain storage capacity needs over long periods of time". The university studied lacks a long-term storage solution for large data, but has short-term storage available. Since many researchers store data on personal or laboratory computers, laboratory equipment, and USB drives, there is a greater risk of data loss. There appears to be a need to educate researchers on best practices for data storage and backup.

There also appears to be a need to educate researchers on the external data repositories that are available and on funding agencies’ requirements for data retention. The library decided to provide a clear set of funder data retention policies linked from the library’s data management web guide. Long-term storage is a problem for researchers because of the amount of data and the lack of stable storage solutions, which limits data retention and sharing.

Wednesday, September 23, 2015

A Selection of Research Data Management Tools Throughout the Data Lifecycle

A Selection of Research Data Management Tools Throughout the Data Lifecycle. Jan Krause. Ecole polytechnique fédérale de Lausanne. September 9, 2015. [PDF]
     This article looks at the data lifecycle management phases and the many tools that exist to help manage data throughout the process. These tools will help researchers make the most out of their data, save time in the long run, promote reproducible research, and minimize the risks to the data. The lifecycle management phases are: discovery, acquisition, analysis, collaboration, writing, publication and deposit in trusted data repositories. There are tools in each of the areas, and the article lists a selection for each phase.
It is important to use appropriate data and metadata standards, especially data formats; these choices should be made at the beginning, since they are difficult to change after the project is started.

Tuesday, August 25, 2015

What is actually happening out there in terms of institutional data repositories?

What is actually happening out there in terms of institutional data repositories?  Ricky Erway. OCLC Research. July 27, 2015.
     Academic libraries are talking about providing data curation services for their researchers.  In most cases they offer just training and advice, but not actual data management services. While technical, preservation, and service issues can be challenging, funding issues are probably what inhibits this service most. This is an important service that supports the university research mission.

Of the 22 institutions that answered the survey:
  • stand-alone data repository: 8
  • combination institutional repository and data repository: 12
  • DSpace: 6
  • Hydra/Fedora systems: 6
  • locally developed systems: 4
  • Rosetta, Dataverse, SobekCM, and HUBzero: 1 each
For preservation services:
  • all but one provide integrity checks
  • keep offsite backup copies: 17
  • provide format migration: 12
  • put master files in a dark archive: 10
For funding:
  • the library’s base budget covered at least some of the expenses: 18
  • the library budget was the only source of funding: 7
  • receive fees from researchers: 7
  • receive fees from departments: 4
  • receive institutional funding specifically for data management: 5
  • receive money from the IT budget: 4
  • receive direct funds from grant-funded projects: 1
  • receive indirect funds from grant-funded projects: 1

Monday, August 24, 2015

University Data Policies and Library Data Services: Who Owns Your Data?

University Data Policies and Library Data Services: Who Owns Your Data? Lisa D. Zilinski, Abigail Goben and Kristin Briney. Bulletin of the Association for Information Science and Technology. August 2015. [PDF]
     'Who owns the data' is an important question, but the answer is often unclear, especially for unfunded research and pilot projects. Other questions that need to be asked are:
  • What happens if a researcher leaves the institution?
  • What if someone needs access to the data?
  • How long do I have to keep them and how should I discard them?
  • What happens if there is no policy? How should policies be determined?
  • If the data is part of a collaborating project then which policy takes precedence?
From the study, the authors report that approximately:
  • 50% of the libraries surveyed offer some form of data services beyond a resource guide. 
  • 40% of the libraries have a staff member (often the science librarian) assigned to research data management initiatives. 
  • 10% have a dedicated data repository.
This study points out the challenges that researchers, librarians and institutions face when trying to meet funding or journal requirements on public access. This study also found that top research institutions almost universally offer research data services. Libraries are developing programs and services aimed at the entire data life cycle while ownership of the data and other legal concerns are of highest significance to the universities. This provides an opportunity for librarians to lead policy development; educate faculty and administrators about best practices; and determine how to navigate the numerous policies from funding groups, academic journals, and collaborating institutions.

Data Management Outreach to Junior Faculty Members: A Case Study

Data Management Outreach to Junior Faculty Members: A Case Study. Megan Sapp Nelson. Journal of eScience Librarianship. August 21, 2015.
     Data management is generally not addressed with new career faculty; it is either overlooked or assumed that these faculty will figure it out on their own. A brownbag and workshop outreach program was developed and presented to junior faculty early in their careers to introduce them to potential issues and solutions in data management. This gave them an opportunity to brainstorm with more experienced faculty members. Objectives for the workshop included that the faculty will:
  • Evaluate the current state of their data management practices.
  • Develop a prioritized list of actions that can be put into place.
  • Understand how those actions can be transmitted to a research group.
The case study and additional files are available at this link. Graduate students have a different perspective from the faculty. Graduate students tend to:
  • Focus on mechanics over deeper understanding of concepts.
  • Learn data management from faculty within the context of an immediate problem, and therefore don’t necessarily get broad training in the full lifecycle of data management.
  • Figure out data management on their own, and figure it out differently from everyone else in the lab unless a protocol is put in place.
  • Have a wide spectrum of expertise.
  • Frequently suggest and adopt the data analysis tools used in labs, which leads to fragmented data management for the professor over time.
The key to the success of the workshop was to involve the junior faculty peer group along with their mentor faculty members. This new tool was useful in addition to the Data Curation Profile tool.

Monday, August 10, 2015

DMPonline: recent updates and new functionality

DMPonline: recent updates and new functionality. Sarah Jones. DCC curation webinars. 27 May 2015. [Video and slides].
     This webinar provides an overview of the new functionality in the UK DMPonline tool for data management plans and the development road-map through January 2016. The tool can be customized and internationalized. The user group aims to make it easier for the community to feed in ideas and direct the development. There is also a reference to the DMP guidance, tools and resources page, which is very useful.


Saturday, August 08, 2015

Where Should You Keep Your Data?

Where Should You Keep Your Data? Karen M. Markin. The Chronicle of Higher Education. June 23, 2015.
     Federal funding agencies have made it clear that grant proposals must include plans for sharing research data with other scientists. What has not been clear is how and where researchers should store their data, which can range from sensitive personal medical information to enormous troves of satellite imagery.  Although data-sharing requirements have been in place for years, universities have been slow to help principal investigators make that happen. Now, funding can be withheld from researchers who don’t comply with the new policies. Principal investigators are urged to place their data in existing publicly accessible repositories; the NIH has a list of repositories, and the NSF directs researchers to specific ones.
The "DMP Tool," hosted by the University of California, provides a free, interactive form that walks you through the preparation of a data-management plan for more than a dozen organizations.

Many libraries are playing a role in this effort and researchers should check with reference librarians for help on this. Data storage and preparation can get complicated and it’s useful to have someone to guide you through the process. Federal agencies plan to establish standards for these so-called "metadata."


Friday, July 10, 2015

Track the Impact of Research Data with Metrics; Gauge Archive Capacity

How to Track the Impact of Research Data with Metrics. Alex Ball, Monica Duke.  Digital Curation Centre. 29 June 2015.
   This guide from the DCC provides help on how to track and measure the impact of research data. It provides:
  • impact measurement concepts, services and tools for measuring impact
  • tips on increasing the impact of your data 
  • how institutions can benefit from data usage monitoring  
  • help to gauge capacity requirements in storage, archival and network systems
  • information on setting up promotional activities 
Institutions can benefit from data usage monitoring as they:
  • monitor the success of the infrastructure providing access to the data
  • gauge capacity requirements in storage, archival and network systems
  • create promotional activities around the data, sharing and re-use
  • create special collections around datasets;
  • meet funder requirements to safeguard data for the established lifespan
Tips for raising research data impact
  • deposit data in a trustworthy repository
  • provide appropriate metadata
  • enable open access
  • apply a license to the data stating what uses are permitted
  • raise awareness to ensure it is visible (citations, publication, provide the dataset identifier, etc)
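Several of these tips (persistent identifier, visibility, citability) come together in a well-formed data citation. The example below is invented for illustration, using the reserved example DOI prefix 10.5555:

    Smith, J. (2015). Regional Water Quality Measurements 2010-2014
    (Version 1.0) [Data set]. Example University Repository.
    https://doi.org/10.5555/example.5678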

Monday, June 29, 2015

SIRF: Self-contained Information Retention Format

SIRF: Self-contained Information Retention Format. Sam Fineberg, et al. SNIA Tutorial. 2015. [PDF]
Generating and collecting very large data sets that need to be kept for long periods is a necessity for many organizations, including science, archives, and commerce. The presentation describes the challenges of keeping data for the long term and how Linear Tape File System (LTFS) technology can be combined with a Self-contained Information Retention Format (SIRF). The top external factors driving long-term retention requirements are legal risk, compliance regulations, business risk, and security risk.

What does long-term mean? Retention of 20 years or more is required by 70% of the responses in a poll.
  • 100 years: 38.8%
  • 50-100 years: 18.3%
  • 21-50 years: 31.1%
  • 11-20 years: 15.7%
  • 7-10 years: 12.3%
  • 3-5 years: 1.9%
The need for digital preservation:
  • Regulatory compliance and legal issues
  • Emerging web services and applications
  • Many other fixed-content repositories (Scientific data, libraries, movies, music, etc.)
Data stored should remain accessible, undamaged, and usable for as long as desired and at an affordable cost. Affordable depends on the "perceived future value of information". There are problems with verifying the correctness and authenticity of semantic information over time. SIRF is the digital equivalent of a self-contained archival box. It contains:
  • set of preservation objects and a catalog (logical or physical)
  • metadata about the contents and individual objects
  • self describing standard catalog information so it can all be maintained
  • a "magic object" that identifies the container and version
The metadata contains basic information that can vary depending on the preservation needs. It allows a deeper description of the objects along with the content meaning and the relationships between the objects.
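A SIRF-style container might be sketched as follows; the field names here are simplified illustrations, since the actual SNIA specification defines its own catalog schema.

    # Illustrative SIRF-style container: a magic object identifying the
    # container and version, a catalog, and metadata about each preserved
    # object; names are simplified, not the SNIA schema.
    sirf_container = {
        "magic_object": {"container_type": "SIRF", "version": "1.0"},
        "catalog": {
            "container_metadata": {"created": "2015-06-29"},
            "objects": [
                {"object_id": "obj-001", "name": "report.pdf",
                 "fixity": {"algorithm": "SHA-256", "digest": "<hex digest>"},
                 "related_to": ["obj-002"]},  # relationships between objects
                {"object_id": "obj-002", "name": "report-metadata.xml",
                 "fixity": {"algorithm": "SHA-256", "digest": "<hex digest>"}},
            ],
        },
    }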

When preserving objects, we need to keep all the information to make them fully usable in the future. No single technology will be "usable over the time-spans mandated by current digital preservation needs". LTFS technologies are "good for perhaps 10-20 years".

Tuesday, April 28, 2015

Database Preservation Toolkit

Database Preservation Toolkit. Website. April 2015.
The Database Preservation Toolkit uses input and output modules and allows conversion between database formats, including connection to live systems. It allows conversion of live or backed-up databases into preservation formats such as DBML, SIARD, or XML-based formats created for the purpose of database preservation.
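The general idea, converting a live database into a self-describing document that captures both schema and rows, can be sketched in a few lines of Python; this is only an illustration of the concept, not the toolkit's own code or one of its output formats.

    import sqlite3
    import xml.etree.ElementTree as ET

    def table_to_xml(db_path, table):
        # Dump one table of a live SQLite database into XML that records
        # both the schema (column names) and the data rows.
        # Assumes `table` is a trusted name, since it is interpolated into SQL.
        conn = sqlite3.connect(db_path)
        cur = conn.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        root = ET.Element("table", name=table)
        schema = ET.SubElement(root, "schema")
        for c in cols:
            ET.SubElement(schema, "column", name=c)
        rows = ET.SubElement(root, "rows")
        for row in cur:
            r = ET.SubElement(rows, "row")
            for c, v in zip(cols, row):
                ET.SubElement(r, "cell", column=c).text = "" if v is None else str(v)
        conn.close()
        return ET.tostring(root, encoding="unicode")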

This toolkit was part of the RODA project and now has been released as a separate project. The site includes download links and related publications and presentations.

Saturday, April 18, 2015

Digital Curation and Doctoral Research: Current Practice

Digital Curation and Doctoral Research: Current Practice. Daisy Abbott. International Journal of Digital Curation. 10 February 2015.[PDF]
More doctoral students are engaging in research data creation, processing, use, management, and preservation activities (digital curation) than ever before. Digital curation is an intrinsic part of the skills that students are expected to acquire.

Training in research skills and techniques is the key element in the development of a research student. The integration of digital curation into expected research skills is essential. Doctoral supervisors "should discuss and review research data management annually, addressing issues of the capture, management, integrity, confidentiality, security, selection, preservation and disposal, commercialization, costs, sharing and publication of research data and the production of descriptive metadata to aid discovery and re-use when relevant." Those supervisors may not necessarily have those skills themselves. And there is a gap in the literature about why and how to manage, curate, and preserve digital data as part of a PhD program.

While both doctoral students and supervisors can benefit from traditional resources on the topic, the majority of guidance on digital curation takes the form of online resources and training programs. In a survey,
  • over 50% of PhD holders consider long-term preservation to be extremely important. 
  • under 40% of students consider long-term preservation to be extremely important.
  • 90% of doctoral students and supervisors consider digital curation to be moderately to extremely important. 
  • Yet 74% of respondents stated that they had limited or no skills in digital curation and only 10% stated that they were “fairly skilled” or “expert”. 
And generally researchers were not aware of the digital curation support services that are available. The relatively recent emphasis on digital curation in research presents problems for supervisors, who may not have been trained in it themselves. Developing the appropriate skills and knowledge to create, access, use, manage, store and preserve data should therefore be considered an important part of any researcher’s development. Efforts should be taken to
  • Ensure practical digital curation is understood
  • Encourage responsibility for digital curation activities in institutional support structures
  • Increase the discoverability and availability of digital curation support services

Thursday, April 09, 2015

Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring?

Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring? Randy Kiefer's presentation.  UKSGLive. April 3, 2015.
Long-term preservation refers to processes and procedures required to ensure content remains accessible well into the future. Publishers want to be good stewards of their content. Digital Preservation is an "insurance policy" for e-resources. Commercial hosting platforms and aggregators are not preservation archives! They can remove discontinued content from their system, and commercial businesses may disappear, as with Metapress. There are global digital preservation archives, such as CLOCKSS and Portico, and regional archives, such as National Libraries.
The biggest challenges in the future are formats (especially the presentation of content) and what to do with databases, datasets and supplementary materials. "Any format can be preserved, including video. The issue is that of space, cost and presentation (especially if the format is now not in use/supported)." There are also legal issues with cloud-based preservation systems: there is no legal precedent, and no protection with regards to security.



Friday, March 27, 2015

National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research

National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research. February 2015.

This document describes NIH’s plans to build upon and enhance its longstanding efforts to increase access to scholarly publications and digital data resulting from NIH-funded research. Sections relevant to digital preservation and long term management:

NIH intends to make public access to digital scientific data the standard for all NIH funded research. Following adoption of the final plan, NIH will:
  • Explore steps to require data sharing.
  • Ensure that all NIH-funded researchers prepare data management plans and that the plans are evaluated during peer review.
  • Develop additional data management policies to increase public access to designated types of biomedical research data.
  • Encourage the use of established public repositories and community-based standards.
  • Develop approaches to ensure the discoverability of data sets resulting from NIH-funded research to make them findable, accessible, and citable.
  • Promote interoperability and openness of digital scientific data generated or managed by NIH.
  • Explore the development of a data commons. NIH will explore the development of a commons, a shared space for basic and clinical research output including data, software, and narrative, that follows the FAIR principles (findable, accessible, interoperable, reusable).

Preservation
Preservation is one of the Public Access Policy’s primary objectives. NIH wants to ensure that publications and metadata are stored in an archival solution that:
  • provides for long-term preservation and access to the content without charge; 
  • uses standards, widely available and, to the extent possible, nonproprietary archival formats for text and associated content (e.g., images, video, supporting data); 
  • provides access for persons with disabilities
The content in the NIH database is actively curated using XML records, which are future-proof in that XML is technology independent and can be easily and reliably migrated as technology evolves.

The first principle behind the plan for increasing access to digital scientific data is: The sharing and preservation of data advances science by broadening the value of research data across disciplines and to society at large, protecting the integrity of science by facilitating the validation of results, and increasing the return on investment of scientific research.

Data Management Plans
Data management planning should be an integral part of research planning.  NIH wants to ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified. In order to preserve the balance between the relative benefits of long-term preservation and access and the associated cost and administrative burden, NIH will continue to expect researchers to consider the benefits of long-term preservation of data against the costs of maintaining and sharing the data.

NIH will assess whether the appropriate balance has been achieved in data management plans between the relative benefits of long-term preservation and access and the associated cost and administrative burden. It will also develop guidance with the scientific community to decide which data should be prioritized for long-term preservation and access. NIH will also explore and fund innovative tools and services that improve search, archiving, and disseminating of data, while ensuring long-term stewardship and usability.

Assessing Long-Term Preservation Needs
NIH will provide for the preservation of scientific data and outline options for developing and sustaining repositories for scientific data in digital formats.  The policies expect long-term preservation of data.
Long-term preservation and sustainability will be included in data management plans, and NIH will collaborate with other agencies on how best to develop and sustain repositories for digital scientific data.