This blog contains information related to digital preservation, long-term access, digital archiving, digital curation, institutional repositories, and digital or electronic records management. These are my notes on what I have read or been working on. I enjoyed learning about digital preservation but have since retired and am no longer updating the blog.
Showing posts with label data management. Show all posts
Wednesday, November 16, 2016
A Doomsday Scenario: Exporting CONTENTdm Records to XTF
A Doomsday Scenario: Exporting CONTENTdm Records to XTF. Andrew Bullen. D-Lib Magazine. November/December 2016.
Because of budgetary concerns, the Illinois State Library asked Andrew Bullen to explore how its CONTENTdm collections could be migrated to another platform. (The Illinois Digital Archives repository is based on CONTENTdm.) He chose methods that would let him migrate the collections quickly using existing tools, particularly PHP, Perl, and XTF, the platform the library already uses for its digital collection of electronic Illinois state documents. The article presents the Perl code he wrote, the metadata, and example records, and walks through the process; he has also started A Cookbook of Methods for Using CONTENTdm APIs. Each collection presented different challenges and required custom programming. He recommends reviewing the metadata elements of each collection, normalizing like elements as much as possible, and planning which elements can be indexed and how faceted browsing could be implemented. Because the goal was only to test whether the data could reasonably be converted, not all parts were implemented. In a real migration, CONTENTdm's APIs could be used as the data transfer medium; a rough sketch of that idea follows.
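As an illustration of that last point, here is a minimal, hypothetical sketch of pulling item metadata from a CONTENTdm collection over its web services API and writing it out as simple XML that XTF could index. The base URL, collection alias, field names, and the exact dmQuery parameter order are assumptions for illustration (consult the CONTENTdm API reference); the article's actual export used Perl and collection-specific code.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical CONTENTdm web API base URL and collection alias -- adjust for a real site.
BASE = "https://server12345.contentdm.oclc.org/dmwebservices/index.php"
COLLECTION = "/p12345coll1"

def call_api(query):
    """Call the dmwebservices API and parse the JSON response."""
    with urllib.request.urlopen(f"{BASE}?q={query}") as resp:
        return json.loads(resp.read().decode("utf-8"))

def export_collection(outfile="xtf_records.xml", maxrecs=1024):
    # Query string shown schematically; see the CONTENTdm API reference for the exact parameters.
    result = call_api(f"dmQuery{COLLECTION}/0/title/title/{maxrecs}/1/0/0/0/0/json")
    root = ET.Element("records")
    for rec in result.get("records", []):
        # dmGetItemInfo returns the full metadata for one item, keyed by field nickname.
        item = call_api(f"dmGetItemInfo{COLLECTION}/{rec['pointer']}/json")
        node = ET.SubElement(root, "record", id=str(rec["pointer"]))
        for field, value in item.items():
            if isinstance(value, str) and value:
                ET.SubElement(node, field).text = value  # one element per populated metadata field
    ET.ElementTree(root).write(outfile, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    export_collection()
```

The resulting flat XML records could then be normalized and indexed by XTF; in practice each collection's field list would need the kind of per-collection review the article describes.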
Saturday, October 15, 2016
DPTP: Introduction to Digital Preservation Planning for Research Managers
DPTP: Introduction to Digital Preservation Planning for Research Managers. Ed Pinsent, Steph Taylor. ULCC. 15 October 2016.
Today I saw this course offered and thought it looked interesting (wish I were in London to attend). It is a one-day introduction to digital preservation and is designed specifically to look at preservation planning from the perspective of the research data manager. Digital preservation, the management and safeguarding of digital content for the long-term, is becoming more important for research data managers to make sure content remains accessible and authentic over time. The learning outcomes are:
- Understand what digital preservation means and how it can help research managers
- How to assess content for preservation
- How to integrate preservation planning into a research data management plan
- How to plan for preservation interventions
- How to identify reasons and motivations for preservation for individual projects
- What storage means, and the storage options that are available
- How to select appropriate approaches and methods to support the needs of projects
- How to prepare a business case for digital preservation
The course covers:
- Find out about digital preservation and how and why it is important in RDM.
- Assessing research data and understanding how to preserve them for the longer term, and understanding your users.
- Learn how a RDM plan can include preservation actions.
- Managing data beyond the life of projects, planning the management of storage and drafting a selection policy.
- Understanding individual institutions, stakeholders and requirements and risk assessment.
- Understand why preservation storage has extra requirements, considering ‘the Cloud’
- The strategy of migrating formats, including databases; risks and benefits, and tools you can use.
- Making a business case (Benefits; Risks; Costs) to persuade your institution why digital preservation is important
Thursday, March 24, 2016
The FAIR Guiding Principles for scientific data management and stewardship
The FAIR Guiding Principles for scientific data management and stewardship. Mark D. Wilkinson, et al. Nature. 15 March 2016. [PDF]
"There is an urgent need to improve the infrastructure supporting the reuse of scholarly data." Good data management is not a goal in itself, but a conduit leading to knowledge discovery, innovation and the reuse of the data. The current digital ecosystem prevents this, which is why the funding and publishing community is beginning to require data management and stewardship plans. "Beyond proper collection, annotation, and archival, data stewardship includes the notion of ‘long-term care’ of valuable digital assets" so they can be discovered and re-used for new investigations.
This article describes four foundational principles (FAIR) to guide data producers and publishers (a minimal example record illustrating them follows the list):
"There is an urgent need to improve the infrastructure supporting the reuse of scholarly data." Good data management is not a goal in itself, but a conduit leading to knowledge discovery, innovation and the reuse of the data. The current digital ecosystem prevents this, which is why the funding and publishing community is beginning to require data management and stewardship plans. "Beyond proper collection, annotation, and archival, data stewardship includes the notion of ‘long-term care’ of valuable digital assets" so they can be discovered and re-used for new investigations.
This article describes four foundational principles (FAIR) to guide data producers and publishers:
- Findability,
- assigned a globally unique and persistent identifier
- data are described with rich metadata
- metadata clearly include the identifier of the data they describe
- data are registered or indexed in a searchable resource
- Accessibility,
- data are retrievable by their identifier using a standardized communications protocol
- the protocol is open, free, and universally implementable
- the protocol allows for an authentication and authorization procedure,
- metadata are accessible, even when the data are no longer available
- Interoperability,
- data use a formal, accessible, shared, and broadly applicable language for knowledge representation
- data use vocabularies that follow FAIR principles
- data include qualified references to other (meta)data
- Reusability
- (meta)data are richly described
- (meta)data have a clear data usage license
- (meta)data have a detailed provenance
- (meta)data meet community standards
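As a concrete, purely illustrative example of what these principles imply for a single dataset record, the sketch below builds a minimal metadata record with a persistent identifier, rich descriptive fields, a clear license, and provenance, and registers it in a small searchable catalog. The identifier, field names, and values are invented for illustration and do not follow any particular metadata schema.

```python
import json

# A minimal, FAIR-leaning metadata record for one dataset (hypothetical values throughout).
record = {
    "identifier": "https://doi.org/10.9999/example-dataset-001",   # globally unique, persistent (F1)
    "title": "Example lake temperature measurements, 2014-2015",
    "description": "Hourly water temperature readings from three monitoring buoys.",
    "keywords": ["limnology", "temperature", "time series"],        # rich metadata aid findability (F2)
    "access": {
        "protocol": "https",                                        # open, standardized retrieval (A1)
        "landing_page": "https://repository.example.org/datasets/example-dataset-001",
    },
    "format": "text/csv",                                           # shared, broadly usable representation (I1)
    "license": "https://creativecommons.org/licenses/by/4.0/",      # clear usage license (R1.1)
    "provenance": {                                                  # detailed provenance (R1.2)
        "creator": "Example University Limnology Lab",
        "derived_from": "https://doi.org/10.9999/raw-buoy-data-2014",
        "created": "2015-06-30",
    },
    "references": ["https://doi.org/10.9999/related-paper"],        # qualified links to other (meta)data (I3)
}

# Registering or indexing the record in a searchable resource (F4) could be as simple
# as appending it to a harvestable JSON catalog kept alongside the repository.
with open("catalog.json", "w", encoding="utf-8") as f:
    json.dump([record], f, indent=2)
```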
Friday, December 04, 2015
March 2015 PASIG Meeting Presentations and Recent Webinars
March 2015 PASIG Meeting Presentations and Recent Webinars. Preservation and Archiving Special Interest Group (PASIG). March 11-13, 2015.
Recent presentations and webinars from PASIG and ASIS&T are available on the PASIG site. These include:
- March 2015 PASIG Meeting Presentations
- Tiered Adaptive Storage for Big Data and Supercomputing. Jason Goodman
- Video Surveillance: Consuming I.T. Capacity At Significant Rates. Jay Jason Bartlett
- Archive and Preservation for Collections Leveraging Standards Based Technologies and the Cloud. Brian Campanotti
- What Would an Ideal Digital Preservation Technical Registry Look Like?. Steve Knight and Peter McKinney
- Three Critical Elements of Long-Term Storage in the Cloud. Amir Kapadia
- Policy Based Data Management. Reagan Moore
- Digital Forensics and BitCurator. Christopher (Cal) Lee
- The Essential Elements of Intelligently Managed Tiered Storage Infrastructures. Raymond Clarke
- Implementing Sustainable Digital Preservation.
- How to Assess Your Digital Value at Risk: An Introduction to the Digital Value at Risk.
- Building Communities and Services in Support of Data-Intensive Research. Stephen Abrams
- Storage Technology Trends for Archiving. Tom Wultich and Bob Raymond
- Stewarding Research Data with Fedora and Islandora. Mark Leggott
- Challenges of Digital Media Preservation in an Active Archive. Karen Cariani, David W. MacCarn
- An Introduction to the National Digital Information Infrastructure and Preservation Program (NDIIPP) and its Digital Preservation Initiatives. Leslie Johnston
- Digital Preservation in Theory and Practice: A Preservation and Archiving Special Interest Group (PASIG) Boot Camp Webinar. Tom Cramer
Monday, November 23, 2015
Introduction to Metadata Power Tools for the Curious Beginner
Introduction to Metadata Power Tools for the Curious Beginner. Maureen Callahan, Regine Heberlein, Dallas Pillen. SAA Archives 2015. August 20, 2015. [PowerPoint and Google Doc]
"At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:
Basic Principles of Working with Power Tools
"At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:
Basic Principles of Working with Power Tools
- Create a Sandbox Environment: have backups. It is ok to break things
- Think Algorithmically: Break a big problem down into smaller steps
- Choosing a Tool: the best tool is the one that works for your problem and skill set
- Document: Successes, failures, procedures
- It is OK to make mistakes, as long as you know how to recognize and undo them!
- view mistakes as an opportunity
- mistakes can teach you as much about your data as about your tool
- share your mistakes so others may benefit
- realize that everybody makes them
- Know the applicable standards
- Know your data
- Know what you want
- Normalize your data before you start a big project
- The problem is intellectual, not technical
- Use the tools available to you
- Don’t do what a machine can do for you
- Think about one-off operations vs. tools you might re-use or re-purpose
- Think about learning tools in terms of raising the level of staff skill
The power tools covered include (a small cleanup sketch using one of them follows this list):
- XPath
- Regex
- XQuery
- XQuery Update
- XSLT
- batch
- Linux command line
- Python
- AutoIt
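The sketch below is a small, hypothetical example of the kind of one-off cleanup these tools enable: normalizing inconsistent date strings in a CSV of descriptive metadata using Python and regular expressions. The file name, column name, and date patterns are invented for illustration, not taken from the presentation.

```python
import csv
import re

# Hypothetical input: a metadata CSV with a "date" column holding values such as
# "3/4/1921", "1921-03-04", or "March 4, 1921" that we want normalized to ISO 8601.
MONTHS = {m: i for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

def normalize_date(value):
    value = value.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):                 # already ISO 8601
        return value
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)       # M/D/YYYY
    if m:
        month, day, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    m = re.fullmatch(r"([A-Za-z]+) (\d{1,2}), (\d{4})", value)    # "March 4, 1921"
    if m and m.group(1).lower() in MONTHS:
        return f"{m.group(3)}-{MONTHS[m.group(1).lower()]:02d}-{int(m.group(2)):02d}"
    return value   # leave anything unrecognized alone and review it by hand

with open("metadata.csv", newline="", encoding="utf-8") as src, \
     open("metadata_clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["date"] = normalize_date(row.get("date", ""))
        writer.writerow(row)
```

Working on a copy of the file, as the "sandbox" principle above suggests, makes it safe to experiment with the patterns until they match the real data.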
Thursday, October 01, 2015
Towards Sustainable Curation and Preservation: The SEAD Project’s Data Services Approach
Towards Sustainable Curation and Preservation: The SEAD Project’s Data Services Approach. James Myers, et al. IEEE International Conference on eScience. September 3, 2015. [PDF]
This is a preview of a paper, to be presented at the IEEE eScience conference, about the Sustainable Environment: Actionable Data (SEAD) project. It details efforts to develop data management and curation services and to make those services available for active research groups to use. The introduction raises an apparent paradox: researchers face data management challenges, yet the curation practices that could help are usually applied only after the research work is completed (if at all). If data and metadata are instead added incrementally as they are produced, the metadata can be used to help organize the data during the research itself.
If the system that preserved the data also generated citable persistent identifiers and dynamically updated the project’s web site with those citations, then completing the publication process would be in the best interest of the researcher. The discussions have revolved around two general areas, termed Active and Social Curation (a small sketch of the active-curation idea follows the list):
- Active Curation: focus primarily on the activities of data producers and curators working during research projects to produce published data collections.
- Social Curation: explores how the actions of the user community can be leveraged to provide further value. This could involve the ability of research groups to
- publish derived value-added data products,
- notify researchers when revisions or derived products appear,
- monitor the mix of file formats and metadata to help determine migration strategies
SEAD provides three data services:
- Project Spaces: secure, self-managed storage and tools to work with data resources
- Virtual Archive: a service that manages publication of data collections from Project Spaces to long-term repositories
- Researcher Network: personal and organizational profiles that can include literature and data publications.
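A minimal, hypothetical sketch of the active-curation idea described above: metadata is recorded incrementally as files are produced, and publication then assigns a citable identifier and builds a citation the project site could display. The file names, manifest format, and identifier are invented for illustration and are not SEAD's actual implementation.

```python
import hashlib
import json
import pathlib
from datetime import date

MANIFEST = pathlib.Path("project_manifest.json")

def register(data_file, description):
    """Record metadata for a file as soon as it is produced (active curation)."""
    manifest = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {"files": []}
    data = pathlib.Path(data_file).read_bytes()
    manifest["files"].append({
        "path": data_file,
        "description": description,
        "sha256": hashlib.sha256(data).hexdigest(),   # fixity recorded at creation time
        "added": date.today().isoformat(),
    })
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def publish(collection_title):
    """'Publish' the collection: assign a (fake) persistent identifier and build a citation."""
    manifest = json.loads(MANIFEST.read_text())
    identifier = "https://doi.org/10.9999/example-collection"     # hypothetical identifier
    citation = f"Example Research Group ({date.today().year}). {collection_title}. {identifier}"
    manifest.update({"identifier": identifier, "citation": citation})
    MANIFEST.write_text(json.dumps(manifest, indent=2))
    return citation   # e.g. displayed on the project's web page

# Usage: register("buoy_2015_06.csv", "June 2015 buoy readings"); publish("Lake buoy data")
```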
Tuesday, September 29, 2015
Do You Have an Institutional Data Policy?
Do You Have an Institutional Data Policy? A Review of the Current Landscape of Library Data Services and Institutional Data Policies. Kristin Briney, Abigail Goben, Lisa Zilinski. Journal of Librarianship and Scholarly Communication. 22 Sep 2015. [PDF]
This study looked at correlations between policy existence and either library data services or the presence of a data librarian. Data services in libraries are becoming mainstream, and librarians have an opportunity to work with researchers at their institutions to help them understand the policies in place or to work toward a policy. Some items of note from the article:
- Fewer universities have a data librarian on staff (37%) than offer data services.
- Many libraries (65%) have a research data repository, either in an IR or in a repository specifically for data.
- Fewer universities (11%) have dedicated data repositories as compared with IRs that accept data (58%).
- All universities with over $1 billion per year in research expenditures offer data services and a place to host data. Most (89%) of these institutions also have a data librarian, and 33% have a data repository.
- Nearly half (44%) of all universities studied have some type of policy covering research data
- Most of the policies (67%) designated an owner of university research data
- Data is required to be retained for some period of time (52%)
Where policies existed, they:
- defined data (61%)
- identified a data owner (62%)
- stated a specific retention time (62%)
- identified who can have access to the data (52%)
- described disposition of the data when a researcher leaves the university (64%)
- designated a data steward (46%)
Friday, September 25, 2015
Data Management Practices Across an Institution: Survey and Report
Data Management Practices Across an Institution: Survey and Report. Cunera Buys, Pamela Shaw. Journal of Librarianship and Scholarly Communication. 22 Sep 2015.
Data management is becoming increasingly important to researchers in all fields. This study presents results from a survey of digital data management practices across all disciplines at a university; the results show that both short- and long-term storage and preservation solutions are needed. When asked, 31% of respondents did not know how much storage they would need, which makes establishing a correctly sized research data storage service difficult. In the survey, 65% of faculty said it was important to share data, but less than half of them "reported that they 'always' or 'frequently' shared their data openly, despite their belief in the importance of sharing".
Researchers produce a wide variety of data types and sizes, but most create no metadata or do not use metadata standards, and most researchers were uncertain about how to meet the NSF data management plan requirements (only 45% had a plan). A 2011 study of data storage and management needs across several academic institutions found that many researchers were satisfied with short-term data storage and management practices, but not with long-term data storage options. Researchers in the same study did not believe their institutions provided adequate funds, resources, or instruction on good data management practices. When asked where research data is stored:
- Sixty-six percent use computer hard drives
- 47% use external hard drives
- 50% use departmental or school servers
- 38% store data on the instrument that generated the data
- 31% use cloud-based storage services
- Dropbox was the most popular service at 63%
- 27% use flash drives
- 6% use external data repositories.
Most researchers expected to store raw and published data “indefinitely”. Many respondents also selected 5-10 years, and very few said they keep data for less than one year. Responses from all schools suggest that data are relevant for long periods of time or indefinitely. Specific retention preferences by school were:
- The college of arts and sciences prefers “indefinitely” for ALL data types
- Published data: All schools prefer “indefinitely” for published data except
- The law school prefers 1-5 years for published data
- Other data:
- The school of medicine prefers 5-10 years for all other data types
- The school of engineering prefers 1-5 years for all other data types
- The college of arts and sciences prefers “indefinitely” for raw data
- The school of management prefers “indefinitely” for raw data
Keeping raw data / source material was seen as useful; reasons given included:
- use in future / new studies (77 responses)
- use in longitudinal studies (9 responses)
- sharing with colleagues (6 responses)
- value for replicating study results (10 responses)
- responding to challenges of published results
- data that would be difficult or costly to replicate
- a view that it is simply good scientific practice to retain data (4 responses)
When asked, 66% indicated they would need additional storage; most said 1-500 gigabytes or “don’t know.” Also, when asked what services would be useful in managing research data the top responses were:
- long term data access and preservation (63%),
- services for data storage and backup during active projects (60%),
- information regarding data best practices (58%),
- information about developing data management plans or other data policies (52%),
- assistance with data sharing/management requirements of funding agencies (48%), and
- tools for sharing research (48%).
There appears to be a need to educate researchers on the external data repositories that are available and on funding agencies’ requirements for data retention. The library decided to provide a clear set of funder data retention policies linked from the library’s data management web guide. Long-term storage is a problem for researchers because of the volume of data and the lack of stable storage solutions, and that limits data retention and sharing.
Wednesday, September 23, 2015
A Selection of Research Data Management Tools Throughout the Data Lifecycle
A Selection of Research Data Management Tools Throughout the Data Lifecycle. Jan Krause. Ecole polytechnique fédérale de Lausanne. September 9, 2015. [PDF]
This article looks at the data lifecycle management phases and the many tools that exist to help manage data throughout the process. These tools can help researchers make the most of their data, save time in the long run, promote reproducible research, and minimize risks to the data. The lifecycle management phases are: discovery, acquisition, analysis, collaboration, writing, publication, and deposit in trusted data repositories. There are tools in each of these areas. A few of the many tools and resources listed are (a small format-checking sketch follows the list):
- How to Develop a Data Management and Sharing Plan
- DMPonline and DMPTool to help write data management plans.
- Recommended Data Formats
- ownCloud synchronizes and shares data on several computers
- re3data global registry of 1,200+ research data repositories in different disciplines
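As a small illustration of one recurring recommendation in such guides (preferring open, widely supported formats when depositing data), the sketch below flags files whose extensions are not on a preservation-friendly list. The list of extensions and the folder name are invented for illustration and are not EPFL's or any repository's official guidance.

```python
import pathlib

# Illustrative only: a short list of extensions commonly considered preservation-friendly.
# Real guidance varies by repository and discipline -- consult your institution's list.
PREFERRED = {".csv", ".txt", ".xml", ".json", ".tif", ".pdf", ".wav"}

def review_for_deposit(folder):
    """List files that may need conversion before deposit in a data repository."""
    flagged = []
    for path in sorted(pathlib.Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() not in PREFERRED:
            flagged.append(path)
    return flagged

if __name__ == "__main__":
    for path in review_for_deposit("my_project_data"):   # hypothetical project folder
        print(f"Consider converting before deposit: {path}")
```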
Tuesday, August 25, 2015
What is actually happening out there in terms of institutional data repositories?
What is actually happening out there in terms of institutional data repositories? Ricky Erway. OCLC Research. July 27, 2015.
Academic libraries are talking about providing data curation services for their researchers. In most cases they offer just training and advice, not actual data management services. While technical, preservation, and service issues can be challenging, funding issues are probably what inhibits this service most. This is an important service that supports the university research mission.
Of the 22 institutions that answered the survey (a minimal integrity-check sketch, illustrating one practice reported below, follows the list):
- stand-alone data repository: 8
- combination institutional repository and data repository: 12
- DSpace: 6
- Hydra/Fedora systems: 6
- locally developed systems: 4
- Rosetta, Dataverse, SobekCM, and HUBzero: 1 each
- all provide integrity checks except 1
- keep offsite backup copies: 17
- provide format migration: 12
- put master files in a dark archive: 10
- the library’s base budget covered at least some of the expenses: 18
- the library budget the only source of funding: 7
- receive fees from researchers: 7
- receive fees from departments: 4
- receive institutional funding specifically for data management: 5
- receive money from the IT budget: 4
- receive direct funds from grant-funded projects: 1
- receive indirect funds from grant-funded projects: 1
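Since nearly all of these repositories report running integrity checks and keeping offsite backup copies, here is a minimal, generic sketch of that practice: building a SHA-256 manifest for a directory of master files and later verifying it (for example, against an offsite copy). It illustrates the idea only and is not any surveyed system's actual implementation; the folder and manifest names are placeholders.

```python
import hashlib
import json
import pathlib

def sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large masters do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder, manifest="manifest.json"):
    """Record a checksum for every file under the folder."""
    checksums = {str(p.relative_to(folder)): sha256(p)
                 for p in sorted(pathlib.Path(folder).rglob("*")) if p.is_file()}
    pathlib.Path(manifest).write_text(json.dumps(checksums, indent=2))

def verify(folder, manifest="manifest.json"):
    """Recompute checksums and report anything missing or altered."""
    expected = json.loads(pathlib.Path(manifest).read_text())
    problems = []
    for rel, checksum in expected.items():
        path = pathlib.Path(folder) / rel
        if not path.exists():
            problems.append(f"MISSING  {rel}")
        elif sha256(path) != checksum:
            problems.append(f"CHANGED  {rel}")
    return problems

# Usage: build_manifest("masters"); later, verify("offsite_copy_of_masters")
```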
Monday, August 24, 2015
University Data Policies and Library Data Services: Who Owns Your Data?
University Data Policies and Library Data Services: Who Owns Your Data? Lisa D. Zilinski, Abigail Goben and Kristin Briney. Bulletin of the Association for Information Science and Technology. August 2015. [PDF]
'Who owns the data' is an important question, but the answer is often unclear, especially for unfunded research and pilot projects. Other questions that need to be asked are:
- What happens if a researcher leaves the institution?
- What if someone needs access to the data?
- How long do I have to keep them and how should I discard them?
- What happens if there is no policy? How should policies be determined?
- If the data are part of a collaborative project, which policy takes precedence?
- 50% of the libraries surveyed offer some form of data services beyond a resource guide.
- 40% of the libraries have a staff member (often the science librarian) assigned to research data management initiatives.
- 10% have a dedicated data repository.
Data Management Outreach to Junior Faculty Members: A Case Study
Data Management Outreach to Junior Faculty Members: A Case Study. Megan Sapp Nelson. Journal of eScience Librarianship. August 21, 2015.
Data management is generally not addressed with early-career faculty; it is either overlooked or assumed that these faculty will figure it out on their own. A brownbag and workshop outreach program was developed and presented to junior faculty early in their careers to introduce them to potential issues and solutions in data management, and to give them an opportunity to brainstorm with more experienced faculty members. Objectives for the workshop included that the faculty will:
- Evaluate the current state of their data management practices.
- Develop a prioritized list of actions that can be put into place.
- Understand how those actions can be transmitted to a research group.
- Focus on mechanics over deeper understanding of concepts.
- Learn data management from faculty within the context of an immediate problem, and therefore don't necessarily get broad training in the full lifecycle of data management.
- Figure out data management on their own, and figure it out differently from everyone else in the lab unless a protocol is put in place.
- Have a wide spectrum of expertise.
- Frequently suggest and adopt the data analysis tools used in labs, which leads to fragmented data management for the professor over time.
Monday, August 10, 2015
DMPonline: recent updates and new functionality
DMPonline: recent updates and new functionality. Sarah Jones. DCC curation webinars. 27 May 2015. [Video and slides].
This webinar provides an overview of the new functionality in the UK DMPonline tool for data management plans and the development roadmap through January 2016. The tool can be customized and internationalized, and the user group aims to make it easier for the community to feed in ideas and direct the development. There is also a reference to the DMP guidance, tools and resources page, which is very useful. It contains:
- DMPonline: A web-based tool to assist users to create personalised plans
- Funders’ data plan requirements: Summary of funders' expectations for data management and sharing plans
- Checklist for a Data Management Plan: Useful questions and guidance for writing data management and sharing plans
- DMP checklist leaflet: A fold-out summary of the Checklist
- FAQ on Data Management Plans: A short list of key questions pertaining to Data Management Plans
- How to Develop a Data Management and Sharing Plan: A guide on how to meet UK funder expectations for DMPs
- Guidance and examples: Advice to help you write your data management and sharing plan
Saturday, August 08, 2015
Where Should You Keep Your Data?
Where Should You Keep Your Data? Karen M. Markin. The Chronicle of Higher Education. June 23, 2015.
Federal funding agencies have made it clear that grant proposals must include plans for sharing research data with other scientists. What has not been clear is how and where researchers should store their data, which can range from sensitive personal medical information to enormous troves of satellite imagery. Although data-sharing requirements have been in place for years, universities have been slow to help principal investigators make that happen. Now, researchers who don't comply with the new policies can have funding withheld or be barred from receiving additional grant money. Principal investigators are urged to place their data in existing publicly accessible repositories; the NIH has a list of repositories, and the NSF directs researchers to specific repositories.
The "DMP Tool," hosted by the University of California, provides a free, interactive form that walks you through the preparation of a data-management plan for more than a dozen organizations.
Many libraries are playing a role in this effort and researchers should check with reference librarians for help on this. Data storage and preparation can get complicated and it’s useful to have someone to guide you through the process. Federal agencies plan to establish standards for these so-called "metadata."
Related posts:
- Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research
- Data Management Plans
- AHRQ Public Access to Federally Funded Research
- Got Data? A Guide to Data Preservation in the Information Age
Friday, July 10, 2015
Track the Impact of Research Data with Metrics; Gauge Archive Capacity
How to Track the Impact of Research Data with Metrics. Alex Ball, Monica Duke. Digital Curation Centre. 29 June 2015.
This guide from the DCC provides help on how to track and measure the impact of research data (a small metrics-lookup sketch follows the lists). It provides:
- impact measurement concepts, services and tools for measuring impact
- tips on increasing the impact of your data
- how institutions can benefit from data usage monitoring
- help to gauge capacity requirements in storage, archival and network systems
- information on setting up promotional activities
Data usage monitoring can help institutions:
- monitor the success of the infrastructure providing access to the data
- gauge capacity requirements in storage, archival and network systems
- create promotional activities around the data, sharing and re-use
- create special collections around datasets;
- meet funder requirements to safeguard data for the established lifespan
Tips for increasing the impact of your data include:
- deposit data in a trustworthy repository
- provide appropriate metadata
- enable open access
- apply a license to the data about what uses are permitted
- raise awareness to ensure it is visible (citations, publication, provide the dataset identifier, etc)
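As one concrete way to gather such metrics, the sketch below queries the DataCite REST API for a dataset DOI and prints any citation and usage counts it reports. The endpoint and attribute names are assumptions based on DataCite's public API and should be checked against the current documentation; many other metrics (altmetrics, repository download statistics) live in separate services, and the DOI shown is hypothetical.

```python
import json
import urllib.request

def datacite_metrics(doi):
    """Fetch a DOI's metadata record from the DataCite REST API (assumed endpoint)."""
    url = f"https://api.datacite.org/dois/{doi}"
    with urllib.request.urlopen(url) as resp:
        attributes = json.loads(resp.read().decode("utf-8"))["data"]["attributes"]
    # Attribute names below are assumptions to verify against the current DataCite docs.
    return {
        "title": attributes.get("titles", [{}])[0].get("title"),
        "citations": attributes.get("citationCount"),
        "views": attributes.get("viewCount"),
        "downloads": attributes.get("downloadCount"),
    }

if __name__ == "__main__":
    print(datacite_metrics("10.5061/dryad.example"))   # hypothetical dataset DOI
```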
Monday, June 29, 2015
SIRF: Self-contained Information Retention Format
SIRF: Self-contained Information Retention Format. Sam Fineberg, et al. SNIA Tutorial. 2015. [PDF]
Generating and collecting very large data sets that need to be kept for long periods is a necessity for many organizations, including science, archives, and commerce. The presentation describes the challenges of keeping data long term with Linear Tape File System (LTFS) technology and a Self-contained Information Retention Format (SIRF). The top external factors driving long-term retention requirements are legal risk, compliance regulations, business risk, and security risk.
What does long-term mean? Retention of 20 years or more is required by 70% of the responses in a poll.
- 100 years: 38.8%
- 50-100 years: 18.3%
- 21-50 years: 31.1%
- 11-20 years: 15.7%
- 7-10 years: 12.3%
- 3-5 years: 1.9%
Long-term retention needs arise in many areas, including:
- Regulatory compliance and legal issues
- Emerging web services and applications
- Many other fixed-content repositories (Scientific data, libraries, movies, music, etc.)
A SIRF container includes (sketched below):
- a set of preservation objects and a catalog (logical or physical)
- metadata about the contents and individual objects
- self describing standard catalog information so it can all be maintained
- a "magic object" that identifies the container and version
When preserving objects, we need to keep all the information to make them fully usable in the future. No single technology will be "usable over the time-spans mandated by current digital preservation needs". LTFS technologies are "good for perhaps 10-20 years".
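A minimal, purely illustrative sketch of the container idea: a catalog that identifies itself with a "magic object" (format name and version) and lists each preservation object with its metadata and checksum, so the container stays self-describing. This is not the actual SIRF serialization defined by SNIA, just an outline of the structure described above; the format name, directory, and metadata fields are invented.

```python
import hashlib
import json
import pathlib
from datetime import date

def build_catalog(container_dir, catalog_name="catalog.json"):
    """Write a self-describing catalog next to the preservation objects in a directory."""
    container = pathlib.Path(container_dir)
    catalog = {
        # The "magic object": identifies the container format and its version.
        "magic": {"container_format": "EXAMPLE-SIRF-LIKE", "version": "0.1"},
        "created": date.today().isoformat(),
        "objects": [],
    }
    for path in sorted(container.rglob("*")):
        if path.is_file() and path.name != catalog_name:
            catalog["objects"].append({
                "name": str(path.relative_to(container)),
                "size": path.stat().st_size,
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                # Per-object metadata that keeps the container self-contained.
                "metadata": {"source": "unknown", "retention": "indefinite"},
            })
    (container / catalog_name).write_text(json.dumps(catalog, indent=2))

# Usage: build_catalog("retention_container")   # hypothetical directory of objects to retain
```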
Tuesday, April 28, 2015
Database Preservation Toolkit
Database Preservation Toolkit. Website. April 2015.
The Database Preservation Toolkit uses input and output modules and allows conversion between database formats, including connection to live systems. It allows conversion of live or backed-up databases into preservation formats such as DBML, SIARD, or XML-based formats created for the purpose of database preservation.
This toolkit was part of the RODA project and now has been released as a separate project. The site includes download links and related publications and presentations.
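To illustrate the general idea of turning a live database into an XML-based preservation format (this is not the toolkit itself, nor the SIARD or DBML schemas), here is a minimal sketch that dumps every table of a SQLite database, with column names, into a simple XML document. A real migration would also capture the schema, data types, keys, and large objects, which is what the toolkit's input and output modules handle.

```python
import sqlite3
import xml.etree.ElementTree as ET

def dump_database(db_path, out_path="database_preservation.xml"):
    """Export all tables of a SQLite database to a simple, illustrative XML format."""
    conn = sqlite3.connect(db_path)
    root = ET.Element("database", name=db_path)
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        table_el = ET.SubElement(root, "table", name=table)
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [c[0] for c in cursor.description]    # column names from the live system
        for row in cursor:
            row_el = ET.SubElement(table_el, "row")
            for column, value in zip(columns, row):
                cell = ET.SubElement(row_el, "column", name=column)
                cell.text = "" if value is None else str(value)
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    conn.close()

# Usage: dump_database("records.sqlite")   # hypothetical live database file
```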
Saturday, April 18, 2015
Digital Curation and Doctoral Research: Current Practice
Digital Curation and Doctoral Research: Current Practice. Daisy Abbott. International Journal of Digital Curation. 10 February 2015.[PDF]
More doctoral students are engaging in research data creation, processing, use, management, and preservation activities (digital duration) than ever before. Digital curation is an intrinsic part of the skills that students are expected to acquire.
Training in research skills and techniques is the key element in the development of a research student. The integration of digital curation into expected research skills is essential. Doctoral supervisors "should discuss and review research data management annually, addressing issues of the capture, management, integrity, confidentiality, security, selection, preservation and disposal, commercialization, costs, sharing and publication of research data and the production of descriptive metadata to aid discovery and re-use when relevant." Those supervisors may not necessarily have those skills themselves. And there is a gap in the literature about why and how to manage, curate, and preserve digital data as part of a PhD program.
While both doctoral students and supervisors can benefit from traditional resources on the topic, the majority of guidance on digital curation takes the form of online resources and training programs. In a survey,
More doctoral students are engaging in research data creation, processing, use, management, and preservation activities (digital duration) than ever before. Digital curation is an intrinsic part of the skills that students are expected to acquire.
Training in research skills and techniques is the key element in the development of a research student. The integration of digital curation into expected research skills is essential. Doctoral supervisors "should discuss and review research data management annually, addressing issues of the capture, management, integrity, confidentiality, security, selection, preservation and disposal, commercialization, costs, sharing and publication of research data and the production of descriptive metadata to aid discovery and re-use when relevant." Those supervisors may not necessarily have those skills themselves. And there is a gap in the literature about why and how to manage, curate, and preserve digital data as part of a PhD program.
While both doctoral students and supervisors can benefit from traditional resources on the topic, the majority of guidance on digital curation takes the form of online resources and training programs. In a survey,
- over 50% of PhD holders consider long-term preservation to be extremely important.
- under 40% of students consider long-term preservation to be extremely important.
- 90% of doctoral students and supervisors consider digital curation to be moderately to extremely important.
- Yet 74% of respondents stated that they had limited or no skills in digital curation and only 10% stated that they were “fairly skilled” or “expert”.
- Ensure practical digital curation is understood
- Encourage responsibility for digital curation activities in institutional support structures
- Increase the discoverability and availability of digital curation support services
Thursday, April 09, 2015
Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring?
Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring? Celia Jenkins on Randy Kiefer's presentation. UKSGLive. April 3, 2015.
Long-term preservation refers to processes and procedures required to ensure content remains accessible well into the future. Publishers want to be good stewards of their content. Digital Preservation is an "insurance policy" for e-resources. Commercial hosting platforms and aggregators are not preservation archives! They can remove discontinued content from their system, and commercial businesses may disappear, as with Metapress. There are global digital preservation archives, such as CLOCKSS and Portico, and regional archives, such as National Libraries.
The biggest challenges in the future are formats (especially presentation of content) and what to do with databases, datasets, and supplementary materials. "Any format can be preserved, including video. The issue is that of space, cost and presentation (especially if the format is now not in use/supported)." There are legal issues with cloud-based preservation systems: there is no legal precedent for a cloud-based preservation system, and no protection with regards to security.
Friday, March 27, 2015
National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research
National Institutes of Health: Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research. February 2015.
This document describes NIH’s plans to build upon and enhance its longstanding efforts to increase access to scholarly publications and digital data resulting from NIH-funded research. Sections relevant to digital preservation and long term management:
NIH intends to make public access to digital scientific data the standard for all NIH funded research. Following adoption of the final plan, NIH will:
- Explore steps to require data sharing.
- Ensure that all NIH-funded researchers prepare data management plans and that the plans are evaluated during peer review.
- Develop additional data management policies to increase public access to designated types of biomedical research data.
- Encourage the use of established public repositories and community-based standards.
- Develop approaches to ensure the discoverability of data sets resulting from NIH-funded research to make them findable, accessible, and citable.
- Promote interoperability and openness of digital scientific data generated or managed by NIH.
- Explore the development of a data commons. NIH will explore the development of a commons, a shared space for basic and clinical research output including data, software, and narrative, that follows the FAIR principles of Find, Access, Interoperate and Reuse.
Preservation
Preservation is one of the Public Access Policy’s primary objectives. It wants to ensure that publications and metadata are stored in an archival solution that:
- provides for long-term preservation and access to the content without charge;
- uses standard, widely available and, to the extent possible, nonproprietary archival formats for text and associated content (e.g., images, video, supporting data);
- provides access for persons with disabilities
The first principle behind the plan for increasing access to digital scientific data is: The sharing and preservation of data advances science by broadening the value of research data across disciplines and to society at large, protecting the integrity of science by facilitating the validation of results, and increasing the return on investment of scientific research.
Data Management Plans
Data management planning should be an integral part of research planning. NIH wants to ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified. In order to preserve the balance between the relative benefits of long-term preservation and access and the associated cost and administrative burden, NIH will continue to expect researchers to consider the benefits of long-term preservation of data against the costs of maintaining and sharing the data.
NIH will assess whether the appropriate balance has been achieved in data management plans between the relative benefits of long-term preservation and access and the associated cost and administrative burden. It will also develop guidance with the scientific community to decide which data should be prioritized for long-term preservation and access. NIH will also explore and fund innovative tools and services that improve search, archiving, and disseminating of data, while ensuring long-term stewardship and usability.
Assessing Long-Term Preservation Needs
NIH will provide for the preservation of scientific data and outline options for developing and sustaining repositories for scientific data in digital formats. The policies expect long-term preservation of data.
NIH will include long-term preservation and sustainability in data management plans and will collaborate with other agencies on how best to develop and sustain repositories for digital scientific data.