How a Browser Extension Could Shake Up Academic Publishing. Lindsay McKenzie. The Chronicle of Higher Education. April 06, 2017
There are several open-access initiatives. One initiative, called Unpaywall, is just a browser extension. Unpaywall is an open-source, nonprofit organization "dedicated to improving access to scholarly research". It has created a browser extension that aims to do one thing really well: instantly deliver legal, open-access full text as you browse. "When an Unpaywall user lands on the page of a research article, the software scours thousands of institutional repositories, preprint servers, and websites like PubMed Central to see if an open-access copy of the article is available. If it is, users can click a small green tab on the side of the screen to view a PDF." A legally uploaded open-access copy is delivered to users more than half the time.
"It’s the scientists who wrote the articles, it’s the scientists who uploaded them — we’re just doing that very small amount of work to connect what the scientists have done to the readers who need to read the science." Open-access papers have the information but don’t always look like the carefully formatted articles in academic journals. Some users might not feel comfortable citing preprints or open-access versions obtained through Unpaywall, "without the trappings and formatting of traditional paywalled publishing," even if the copy is credible.
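The lookup behind that green tab can be sketched against Unpaywall's public REST API (a GET to api.unpaywall.org/v2/&lt;DOI&gt; with an email parameter), which reports the best open-access location it knows for a DOI. A minimal Python sketch using a canned response rather than a live call; the DOI, URL, and field values below are placeholders, though the field names follow the shape of the v2 API:

```python
def oa_pdf_url(record):
    """Return the open-access PDF URL from an Unpaywall-style record, or None."""
    loc = record.get("best_oa_location") or {}
    return loc.get("url_for_pdf") or loc.get("url")

# Canned response in the shape of the v2 API; the DOI and URL are placeholders.
sample = {
    "doi": "10.1234/example",
    "is_oa": True,
    "best_oa_location": {
        "host_type": "repository",
        "url_for_pdf": "https://europepmc.org/articles/PMC000000?pdf=render",
    },
}

if sample["is_oa"]:
    print(oa_pdf_url(sample))
```

A live version would fetch the API URL for a real DOI and feed the parsed JSON to the same function; when is_oa is false there is simply no open copy to offer, which matches the roughly half of lookups the article describes as coming up empty.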
This blog contains information related to digital preservation, long term access, digital archiving, digital curation, institutional repositories, and digital or electronic records management. These are my notes on what I have read or been working on. I enjoyed learning about Digital Preservation but have since retired and I am no longer updating the blog.
Friday, April 07, 2017
Monday, January 30, 2017
Born-digital news preservation in perspective
Born-digital news preservation in perspective. Clifford Lynch. RJI Online. January 26, 2017. [Video and transcript.]
The challenge with news and academic journals: how do you preserve this body of information? The journal community has been working on that in a much more systematic way. There is a shared consensus among all players that preserving the record of scholarly journal publication is essential. Nobody wants their scholarship to be ephemeral, so you have to tell people a convincing story about how their work will be preserved.
The primary responsibility for the active archive in most cases is the publisher, but there must be some kind of external fallback system so content will survive the failure of the publisher and the publisher’s archive. These are usually collaborative. Libraries have been the printed news archive, but that is changing. There is also a Keepers Registry so you can see how many keepers are preserving a given journal. The larger journals are well covered, but the smaller ones are really at risk, and a lot of these are small open source journals. "So, we need to be very mindful of those kinds of dynamics as we think about what to do about strategies for really handling the digital news at scale."
With the news, there are a few very large players and a whole lot of other small news outlets of various kinds. Different strategies are needed for the two groups. We also need to think carefully about where the boundaries of the news lie. "Now in many, many cases, the journalism is built on top of and links to underlying evidence which at least in the short term is readily inspectable by anyone clicking on a link." But the links deteriorate, the material goes away, and "preserving that evidence is really important." It is unclear, though, who is or should be preserving it. There are also questions about the news itself: its provenance, its motives, its accuracy; these have to be handled in a more serious way.
"most social media is actually observation and testimony. Very little of it is synthesized news. It’s much more of the character of a set of testimonies or photographs or things like that. And collectively it can serve to give important documentation to an event, but often it is incomplete and otherwise problematic. We need to come to some kind of social consensus about how social media fits into the cultural record."
We need to devise some systematic approaches to this because the journalistic organizations really need help; "their archives are genuinely at risk" and in many cases their "long term organizational viability is at risk". We need a public consensus. "We need a recognition that responsible journalism implies a lasting public record of that work." The need for a free press is recognized constitutionally. "We cannot, under current law, protect most of this material very effectively without the active collaboration of the content producers." This is too big a job for any single organization, and we don't want a single point of failure.
Wednesday, October 12, 2016
Q&A with CNI’s Clifford Lynch: Time to re-think the institutional repository?
Q&A with CNI’s Clifford Lynch: Time to re-think the institutional repository? Richard Poynder. Blog: Open and Shut? September 22, 2016.
In 1999, a meeting was held to discuss scholarly archives and repositories and ways in which to make them interoperable and to avoid needlessly replicating each other’s content. This led to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). One notion was that the individual archives "would be given easy-to-implement mechanisms for making information about what they held in their archives externally available". Open access advocates saw OAI-PMH as a way of aggregating content hosted in local archives, or institutional repositories. This would "encourage universities to create their own repositories and then instruct their researchers to deposit in them copies of all the papers they published in subscription journals."
The interoperability promised by OAI-PMH has not really materialised, and author self-archiving "has remained a minority sport, with researchers reluctant to take on the task of depositing their papers in their institutional repository". Some believe the "IR now faces an existential threat". The interview and additional information are available in a separate PDF, which looks at whether the IR will survive, be "captured by commercial publishers", or whether "the research community will finally come together, agree on the appropriate role and purpose of the IR, and then implement a strategic plan that will see repositories filled with the target content."
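The harvesting model OAI-PMH standardized is simple: a repository exposes a base URL, a harvester issues requests such as verb=ListRecords&metadataPrefix=oai_dc, and the harvester parses the Dublin Core records that come back. A minimal sketch of both sides of that exchange; the base URL and the trimmed-down response below are illustrative, not from a real repository:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def harvest_titles(xml_text):
    """Pull Dublin Core titles out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.iter("{http://purl.org/dc/elements/1.1/}title")]

# A trimmed-down response; a real repository returns one <record> per item
# plus a resumptionToken for paging through large result sets.
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>An example preprint</dc:title>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
print(harvest_titles(sample))
```

An aggregator built this way only sees what repositories expose, which is why empty repositories, not the protocol itself, became the limiting factor described above.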
Tuesday, June 21, 2016
Vienna Principles: A Vision for Scholarly Communication
Vienna Principles: A Vision for Scholarly Communication. Peter Kraker, et al. June 2016.
The twelve principles of Scholarly Communication are:
- Accessibility: should be immediately and openly accessible by anyone.
- Discoverability: should facilitate search, exploration and discovery.
- Reusability: should enable others to effectively build on top of each other’s work.
- Reproducibility: should provide reproducible research results.
- Transparency: should provide open means for judging the credibility of a research result.
- Understandability: should provide research in an understandable way adjusted to different stakeholders.
- Collaboration: should foster collaboration and participation between researchers and their stakeholders.
- Quality Assurance: should provide transparent and competent review.
- Evaluation: should support fair evaluation.
- Validated Progress: should promote both the production of new knowledge and the validation of existing knowledge.
- Innovation: should embrace the possibilities of new technology.
- Public Good: should expand the knowledge commons.
Wednesday, April 20, 2016
On the Marginal Cost of Scholarly Communication
On the Marginal Cost of Scholarly Communication. Tiffany Bogich, et al. Science.ai by Standard Analytics. 18 April, 2016.
An article that looks at the marginal cost of scholarly communication from the perspective of an agent looking to start an independent, peer-reviewed scholarly journal. It found that vendors can accommodate all of the services required for scholarly communication for between $69 and $318 per article, and with alternate software solutions replacing the vendor services, the marginal cost of scholarly communication would drop to between $1.36 and $1.61 per article, almost all of which is the cost of DOI registration. The development of high quality “plug-and-play” open source software solutions would have a significant impact in reducing the marginal cost of scholarly communication, making it more open to experimentation and innovation. For the cost of long term journal preservation, the article looked at CLOCKSS and Portico.
Wednesday, May 13, 2015
Robert Darnton closes the book
Robert Darnton closes the book. Corydon Ireland. Harvard Gazette. May 11, 2015.
Article about his retirement. Notes:
- He and others discussed how to harness the Internet to create a digital library that would “get our cultural heritage available to everyone” for free, leading to the DPLA
- The goal of the free digital library “was a dream of philosophers of the Enlightenment. We can do what Jefferson only dreamed of. We have the Internet, and he only had the printing press.”
- Of digital and print: "Both are complementary means of knowledge dispersal and both are thriving."
- For libraries to prosper requires advancing on two fronts, analog and digital. “We must acquire everything important in all fields of scholarship" along with “electronic outputs of all kinds, partly in cooperation with other libraries.”
- The future of libraries will require “being connected, and cooperating on a very large scale” regarding acquisition, preservation, and storage.
- The library still pumps intellectual energy into every corner of campus.
Thursday, March 26, 2015
Sowing the seed: Incentives and Motivations for Sharing Research Data, a researcher's perspective
Sowing the seed: Incentives and Motivations for Sharing Research Data, a researcher's perspective. Knowledge Exchange. November 2014. PDF.
This study has gathered evidence, examples and opinions on incentives for research data sharing from the researchers’ point of view. The study will help inform recommendations on policies and best practices for data access, preservation, and re-use. An emerging theme today is to make it possible for all researchers to share data and to change the collective attitude towards sharing.
A DCC project investigating researchers’ attitudes and approaches towards data deposit, sharing, reuse, curation and preservation found that data sharing requirements should be defined at a finer-grained level, such as the research group. When researchers talk about ‘data sharing’ there are different modes of data sharing, such as:
- private management sharing,
- collaborative sharing,
- peer exchange,
- sharing for transparent governance,
- community sharing and
- public sharing.
- When data sharing is an essential part of the research process;
- Direct career benefits (greater visibility and recognition of one’s work, reciprocal data)
- As a normal part of their research circle or discipline;
- Existing funder and publisher expectations, policies, infrastructure and data services
- Recognize and value data as part of research assessment and career advancement
- Set preservation standards for data formats, file formats, and documentation
- Develop clear policies on data sharing and preservation
- Provide training and support for researchers and students to manage and share data so it becomes part of standard research practice.
- Make all data related to a published manuscript available
- The Royal Netherlands Academy of Arts and Sciences requests its researchers to digitally preserve research data, ideally via deposit in recognised repositories, to make them openly accessible as much as possible; and to include a data section in every research plan stating how the data produced or collected during the project will be dealt with.
- The Alliance of German Science Organisations adopted principles for the handling of research data, supporting long-term preservation and open access to research data for the benefit of science.
- Research organizations receiving EPSRC funding will from May 2015 be expected to have appropriate policies, processes and infrastructure in place to preserve research data, to publish metadata for their research data holdings, and to provide access to research data securely for 10 years beyond the last data request.
- The European Commission has called for coordinated actions to drive forward open access, long-term preservation and capacity building to promote open science for all EC and national research funding.
- The UK Economic and Social Research Council has mandated the archiving of research data from all funded research projects. This policy goes hand in hand with the funding of supporting data infrastructure and services. The UK Data Service provides the data infrastructure to curate, preserve and disseminate research data, and provides training and support to researchers.
Wednesday, March 25, 2015
I tried to use the Internet to do historical research. It was nearly impossible.
I tried to use the Internet to do historical research. It was nearly impossible. Gareth Millward. Washington Post. February 17, 2015.
How do you organize so much information? So far, the Internet Archive has archived more than 430,000,000,000 web pages. It’s a rich and fantastic resource for historians of the near-past. Never before has humanity produced so much data about public and private lives – and never before have we been able to get at it in one place. In the past it was just a theoretical possibility, but now we have the computing power and a deep enough archive to try to use it.
But it’s a lot more difficult to understand than we thought. "The ways in which we attack this archive, then, are not the same as they would be for, say, the Library of Congress. There (and elsewhere), professional archivists have sorted and cataloged the material. We know roughly what the documents are talking about. We also know there are a finite number. And if the archive has chosen to keep them, they’re probably of interest to us. With the internet, we have everything. Nobody has – or can – read through it. And so what is “relevant” is completely in the eye of the beholder."
Historians must take new approaches to the data. No one can read everything, nor know what is even in the archive. Better sampling, specifically chosen for their historical importance, can give us a much better understanding. We need to ask better questions about how sites are constructed, what links exist between sites, and have more focused searches. And we need to know what questions to ask.
Tuesday, March 03, 2015
Oops! Article preserved, references gone
Oops! Article preserved, references gone. Digital Preservation Seeds. February 16, 2015.
A blog post concerning the article Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. References in academic publications underpin a paper's argument, so missing references are a significant problem for the scholarly record: arguments and conclusions cannot be verified. In addition, missing or incomplete resources and information devalue national and academic collections (the Significance method can be used to assess the value of collections). There is currently no robust solution, but a robustify script can redirect broken links to Memento. The missing-references problem emphasizes that, without proper context, preserved information is incomplete.
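The robustify approach relies on the Memento protocol (RFC 7089): a client asks a TimeGate, such as the aggregator at timetravel.mementoweb.org, for a page as it existed near a given date by sending an Accept-Datetime request header, and the TimeGate answers with a Link header pointing at an archived copy (a "memento"). A sketch of the two pieces of that exchange a script would need; the URLs in the example are placeholders:

```python
import re
from datetime import datetime, timezone

def accept_datetime(dt):
    """Format a datetime as the Accept-Datetime header defined by RFC 7089."""
    return dt.astimezone(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")

def find_memento(link_header):
    """Return the first URI in a Link header whose rel value includes "memento"."""
    for url, params in re.findall(r'<([^>]+)>([^<]*)', link_header):
        if re.search(r'rel="[^"]*\bmemento\b[^"]*"', params):
            return url
    return None

# A TimeGate's Link response, trimmed down; both URLs are placeholders.
sample_link = ('<http://example.org/article>; rel="original", '
               '<https://web.archive.org/web/2015/http://example.org/article>; '
               'rel="memento"; datetime="Tue, 17 Feb 2015 00:00:00 GMT"')

print(accept_datetime(datetime(2015, 2, 17, tzinfo=timezone.utc)))
print(find_memento(sample_link))
```

A robustify-style script would send the first value as a request header to a TimeGate URL built from the broken link, then rewrite the reference to point at the memento URI it gets back.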
Sunday, May 12, 2013
ZENODO. Research. Shared.
ZENODO. Research. Shared. Website. May 12, 2013.
ZENODO is a new open digital repository service that enables researchers, scientists, projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of existing institutional or subject-based repositories. The repository was created by OpenAIRE and CERN, and is supported by the European Commission. It promotes peer-reviewed, openly accessible research; all items have a DOI, so they are citable. All formats are allowed, with a 1 GB per-file size limit. Data files are versioned, but records are not. Files may be deposited under closed, open, embargoed or restricted access.
It is named after Zenodotus, the first librarian of the Ancient Library of Alexandria and father of the first recorded use of metadata, a landmark in library history. ZENODO is provided free of charge for educational and informational use.
Monday, April 22, 2013
Stakeholder Benefits from Research Data Management: new document from Research360 project.
Stakeholder Benefits from Research Data Management: new document from Research360 project. Neil Beagrie, Catherine Pink. University of Bath. 26 Nov 2012.
The Research360 Project has released the summary stakeholder benefits analysis from the Research Data Management business case for the University of Bath. The 4 page document is available for download in PDF format.
Industry and private-sector partnerships, alongside public-sector and voluntary-sector partnerships, are key elements of many university research programmes. Partners sharing their practice, results data and laboratory methodologies can lead to vital knowledge transfer activities, improved services and products, creation of spin-out companies and further investment in the Higher Education sector. The document provides a summary list of stakeholder benefits that can arise from research data management in these collaborations. Benefits are listed for:
- university community by its key stakeholder groups:
- academic staff and researchers, students, professional services, and the institution
- external partners:
- industry and commerce, public/voluntary sectors, government, and society
- Improve possibility of success in research funding by addressing any concerns around data management.
- Safeguarding your data against potential loss.
- Support in patent issues such as proof of provenance through improved use of version control.
- Enhanced global reputation through recognition of the quality of research outputs and data infrastructure.
- Attract new collaborators and accelerate deepening of existing relationships.
- Graduate employability increased through university partner connections and student data skills.
- Reduction of risk for sensitive data if data transfer is secure.
- Cost efficiencies from shared data services.
Wednesday, March 27, 2013
Supporting the Changing Research Practices of Chemists.
Supporting the Changing Research Practices of Chemists. Matthew P. Long, Roger C. Schonfeld. Ithaka S+R. February 26, 2013. [PDF]
This report, intended for those who support chemists, including librarians, covers the latest research methods, practices, and information-services needs of academic chemists. Chemists need services that make their lives easier and their research groups more productive, including minimizing paperwork and administrative tasks. They value academic libraries primarily for the access they provide to electronic journals and other online resources. Researchers are often frustrated by an inability to share large amounts of data with a collaborator. Few chemists visit the physical library, but they use the library's digital collections heavily.
In the survey, fewer than 10% reported having had a research consultation with a librarian, asked for help with data management, or asked for assistance on an issue related to publishing in the past year; they rarely reach out to the library to discuss issues or request support. The main search sites for chemists are Web of Knowledge/Web of Science, SciFinder, and PubMed. Chemists would find it helpful to have tools that process this flood of information, pre-scan announcements from journals, and help organize their materials. Electronic Lab Notebooks (ELNs) make it easy to share, archive, and search through past lab notes, but notebooks are at risk in the lab. Labs generally do not have good data management infrastructure, or proper external support for developing it, especially for sharing and preserving files.
It is difficult for academic chemists to coordinate the recording and preservation of data after the completion of a project. When data are saved, they are often held in unstable or at-risk formats or in formats where no one else can access or interpret them. Sometimes a large amount of potentially useful data is not shared or preserved in any durable way. One chemist invited the library to come and speak to the department about preservation and access. Chemists have a general lack of awareness of effective data curation and preservation. Data management and preservation is time-consuming and rarely straightforward; it requires expert advice and constant monitoring.
The findings:
- Chemists need better support in data management, sharing and preservation.
- Many researchers remain anxious about keeping up with the newest literature.
- They need new tools to stay aware of new research and also serendipitous discovery.
- Chemists require greater support in disseminating their research, including articles, data, and other materials.
We see some real potential for the academic library to stretch the definition of the services it offers to the academic chemist. The library may also have a role in working with other service providers and ensuring that academics are aware of the latest research tools. It is clear from this project that libraries must think strategically about whether and how to invest in services for chemists.
Saturday, March 23, 2013
Adding Value to Electronic Theses and Dissertations in Institutional Repositories
Adding Value to Electronic Theses and Dissertations in Institutional Repositories. Joachim Schöpfel. D-Lib Magazine. March/April 2013.
This paper looks at differences among institutional repositories that contain electronic theses and dissertations (ETDs), particularly regarding metadata, policy, access restrictions, representativeness, file format, status, quality, and related services. The intent is to improve the "quality of content and service provision in an open environment, in order to increase impact, traffic and usage". The paper describes five ways in which institutions can add value to the deposit and dissemination of electronic theses and dissertations:
- Quality of content. A good IR not only defines a set of standards and criteria for the selection and validation of deposits but also communicates and promotes this editorial policy.
- Metadata. The description of the content and context of the ETD files will make a difference.
- Format. The IR should contain full text, offer different file formats, and ensure that deposit formats are searchable, open, and appropriate for long-term preservation and use of the content.
- Repositories should network and interconnect.
- Provide needed services beyond basic searching, viewing, and downloading. Some possibilities are discussion forums, usage statistics and metrics, citations, Print On Demand in book format, copyright protection or Creative Commons licensing, and preservation.
Monday, February 11, 2013
Sustaining Our Digital Future Institutional Strategies for Digital Content
Sustaining Our Digital Future: Institutional Strategies for Digital Content. Nancy L Maron, Jason Yun and Sarah Pickle. Strategic Content Alliance. January 29, 2013. (PDF, 91 pp.)
The shift with digital media in scholarly communications is transformative; data sets, dynamic digital resources, websites, digital collections, crowd sourced or born digital content: there are challenges and opportunities, along with questions about who is responsible for maintaining them, and how to maximize the value of the content.
Some findings:
- Projects have received support from the host institution, but few have plans for ongoing support.
- There are potential partners on campus, but project leaders do not seek them early enough, when critical decisions are being made.
- Digital projects across campuses may be hosted by many groups, which poses challenges for discovery. There is often no single place for users to find digital projects, and some projects can too easily slip from view.
- Current funding models do not support ongoing operation.
- Campus-wide solutions are beginning to emerge, but even these tend to address just the basic “maintenance” issues of storage, preservation and access.
- Focus is often on creating new content, with little thought about ongoing efforts to enhance the content or update user interfaces.
- Perform an early and honest appraisal to find which projects are likely to require support after completion:
- Digital content requiring just “maintenance”: plan that the content will be deposited and integrated into some other site, database, or repository.
- Digital resources requiring ongoing growth and investment: These require early sustainability planning, including identifying institutional or other partners and careful consideration of the full range of costs and activities needed to keep the resource vibrant.
- Be realistic in assessing the future needs of the resource at its outset and in continuing support.
- Identify campus partners early on.
- Consider how central your project is to the overall mission of the institution.
- Consider if projects could be drawn together to create a deeper network of support, both for “maintenance” projects and those with the potential to really grow.
- Develop ways to help users find decentralized content and to reach out to content users. These could start as an inventory of all of the digital holdings or common catalogs.
- Determine where scale solutions pay off, where experts are best placed to champion a project, or create common storage, usage and preservation systems for an organization.
- Continue to identify and support ongoing development of the “front-end”, including user needs, interface development, and content enhancement. Pay attention to the changing needs of users and determine what enhancements the digital resource will require.
Sustainability and Use:
- Research data platforms: At some institutions major initiatives are underway to develop research data platforms. The goal of the platform is not just preservation and storage but access and reuse. The first step is to have a platform. From there they can test and refine the service for researchers depositing data sets, library curated collections, and university departments.
- "A coherent digital policy from early review, guidelines on costings and deposit standards, to forecasting what ongoing activities will be needed and who will carry them out, would ideally remove much of the risk of “digital time bombs” while obliging both project leaders and university leaders to take a moment to envision the ongoing impact they want these resources to have, and how to best achieve that."
- Unlike universities (who often play the role of reluctant, passive, or simply unaware host, to a great deal of digital content created by their scholars) museums and libraries tend to be the ones initiating this work and are eager to build and maintain these collections.
- Despite the benefits of centralisation, the mere presence of a catalogue and centralised repository does not ensure greater usage of or engagement with its holdings.
- Many institutions devote considerable attention to the upfront creation of content, but not nearly as much to its ongoing enhancement or reuse, resulting in collections that are certainly present in the main catalogue, but otherwise exist only as capsules of content, frozen in time.
- Once the project is finished, management of the digital resource is not always clear.
Monday, August 13, 2012
The Problem of Data
The Problem of Data. Lori Jahnke, Andrew Asher, Spencer D. C. Keralis. CLIR Report. Council on Library and Information Resources. August 12, 2012.
Excellent report on data storage, use, and curation. A section contains a snapshot of the current digital data curation education landscape. Below are some long notes and excerpts from the PDF article:
Key Findings
- None of the researchers interviewed for this study had received formal training in data management practices, nor did they express satisfaction with their level of expertise.
- Few researchers, especially among those who are early in their career, think about long-term preservation of their data.
- The demands of publication output overwhelm long-term considerations of data curation. Metadata and documentation are of interest only if they help a researcher complete his or her work.
- There is a great need for more effective collaboration tools, as well as online spaces that support the volume of data generated and provide appropriate privacy and access controls.
- Few researchers are aware of the data services that the library might be able to provide and seem to regard the library as a dispensary of goods (e.g., books, articles) rather than a place for research/professional support.
- There is unlikely to be a single out-of-the-box solution that can be applied to the problem of data curation. Instead, an approach is needed that emphasizes working with researchers to identify or build appropriate tools.
- Researchers must have access to adequate networked storage.
- Universities should revise access policies to support multi-institutional research projects.
- Programs should begin early in the researcher career path for the greatest long-term benefit.
- Data curation systems should be integrated with the active research phase (e.g., as a backup).
- Privacy and data access control tools should be developed to manage confidential data. Policies must be developed that support researchers in using these technologies.
- Data curation is generally defined as a set of activities that includes preserving, maintaining, archiving, and depositing data to keep it secure, intact, and accessible for reuse.
- Many researchers expressed concerns surrounding the ethical reuse of research data. Additional work is needed to establish best practices in this area, particularly for qualitative data sets.
- Most participants reported feeling adrift when establishing protocols for managing their data and added that they lacked the resources to determine best practices, let alone to implement them. Almost none of the scholars reported that data curation training was part of their graduate curriculum.
- Perhaps one of the more complicated issues for data curation is the complex life cycle of research data and projects. Data collection may occur throughout the project, and the data may change before the project is completed.
- Scholars may collect data on a phenomenon unrelated to their current project with no clear idea of the potential usefulness of those data. Such data might be integrated with a later project, given away to an interested colleague, or never used at all.
- It would be helpful to have a way to collect data into a collection space that could be used throughout the project.
- The researchers held contradictory views about the value of their data. Some wanted to associate their data with publications or to have it available for use in the classroom.
- Few of the researchers thought about long-term preservation of their data, especially those who were early in their career.
- The academic system offers little or no career reward for preserving one’s data.
- Data preservation strategies must take into account varied, proprietary, and non-standard data formats, and provide a real-time benefit for the scholar in meeting research goals.
- Given the lack of infrastructure for sharing and storing data, the social sciences may face similar problems of data loss in documenting social phenomena as researchers begin to work within larger collaborative groups and with larger data sets. Data stored on personal media devices are especially vulnerable to this type of loss, as few scholars have the skills necessary to maintain data over time and across hardware and software platforms. Several of the scholars interviewed reported storing data on legacy systems that may become inaccessible.
- University policies that appropriately address the ethical considerations relating to data sharing and preservation would benefit researchers, administrators, and technologists alike.
- Researchers hold tremendous amounts of data on personal computers and hard drives, many of which are not backed up adequately. Among the participants, the research data ranged from under 1 GB to multiple terabytes. Data types included various formats of images, video, audio files, data sets, documents, etc.
- Managing large files presents significant challenges for researchers in that university infrastructures typically do not provide adequate storage space or sufficient bandwidth for data access. The data may be lost when researchers upgrade their computers or software. Few researchers put more than minimal effort into organizing non-active data or ensuring its continued compatibility with new software or hardware.
- There is a clear need for libraries to move beyond passively providing technology to embrace the changes in scholarly production that emerging technologies have brought.
- The data preservation step must be fully integrated into a scholar’s research workflow. Not only are necessary metadata and other materials much more easily captured while research is in progress, but also there is a real opportunity to streamline research workflows and to provide much needed support. Scholars need help with the technical aspects of managing and preserving data, as well as with basic curation issues (e.g., what to keep and what to delete), and the ethical implications of sharing their data (e.g., what is an appropriate latency period for the data and how does one balance the need to provide meaningful access with the risk of inadvertently exposing confidential participant information).
- Although some researchers acknowledge that their data could be useful to other researchers, there is little incentive to invest time in archiving or repackaging data sets.
- Extensive outreach to scholars is necessary to build the relationships that will facilitate data preservation. This is likely to be a slow process initially. Researchers are unlikely to engage with those they do not view as peers.
- Researchers need additional tools to manage preserved data on their own, and they would benefit from access to professionals who can offer advice on management strategies.
- Researchers typically align themselves with their disciplines rather than with their institutions; therefore, support models that extend beyond the university are likely to be especially beneficial.
- Reaching the level of collaboration among universities and the technical interoperability required to capture and preserve a career’s worth of data in the current environment is a challenge.
- Current data management systems must be fundamentally improved so that they can meet the capacity demand for secure storage and transmission of research data. Integrating the data preservation system with the active research cycle is essential to encourage researcher investment.
- Researchers are not well positioned to meet the technical and policy challenges without the coordinated support of libraries, information technology units, and professionals who possess both technical and research expertise.
- One example concerns the PETRA e+e- collider in Hamburg, Germany. In the more than 25 years since the experiments ran, theoretical insights and computing advancements have made the data valuable once again. However, much of the data has been irrevocably lost to corrupt storage media, lost computer code, and deactivated personal accounts. These early particle physics experiments are unique, as modern colliders operate at higher energy levels and cannot replicate the particle interactions.
Sunday, April 29, 2012
Web Archives for Researchers: Representations, Expectations and Potential Uses.
Web Archives for Researchers: Representations, Expectations and Potential Uses. Peter Stirling, et al. D-Lib Magazine. March/April 2012.
Web archiving is one of the missions of the Bibliothèque nationale de France. This study looks at content and selection policy, services and promotion, and the role of communities and cooperation. While the interest of maintaining the "memory" of the web is obvious to the researchers, they are faced with the difficulty of defining, in what is a seemingly limitless space, meaningful collections of documents. Cultural heritage institutions such as national libraries are perceived as trusted third parties capable of creating rationally-constructed and well-documented collections, but such archives raise certain ethical and methodological questions.
To find source material on the web, some researchers look for non-traditional sources, such as blogs and social networks. Researchers recognize the value of web archives, especially because websites disappear or change quickly. The Internet is no longer just a place for publishing things, “but rather the traces left by actions that people could equally perform in the streets or in a shop: talking to people, walking, buying things... It can seem improper to some to archive anything relating to this kind of individual activity. On the other hand, one of the researchers acknowledges that archiving this material would provide a rich source for research in the future, and thus compares archiving it to archaeology.” Some ask, "How do you archive the flow of time?" New models may be needed. And when selecting an archive, the selection criteria should also be archived, as they may change over time.
Thursday, December 08, 2011
Why don't we already have an Integrated Framework for the Publication and Preservation of all Data Products?
Why don't we already have an Integrated Framework for the Publication and Preservation of all Data Products? Alberto Accomazzi, et al. Astronomical Data Analysis Software and Systems. 7 Dec 2011.
Astronomy has long had a working network of archives supporting the curation of publications and data. There are examples of websites giving access to data sets, but they are sometimes short lived. "We can only realistically take implicit promises of long-term data archival as what they are: well-intentioned plans which are contingent on a number of factors, some of which are out of our control." We should take steps to ensure that our system of archiving, sharing and linking resources is as resilient as it can be. Some ideas are:
- future-proof the naming system: assign persistent data IDs to items we want to preserve
- provide the ability to cite complete datasets, just as we can cite websites
- include a data reference section in academic papers
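The persistent-ID and data-reference ideas in the list above can be sketched in a few lines. This is a hedged illustration, not from the paper: the dataset, author, and DOI are hypothetical (10.5555 is a reserved example prefix), and the only real assumption is the standard doi.org resolver, which routes a cited identifier to wherever the data currently lives.

```python
# A minimal sketch of the persistent-ID idea above: a dataset is cited
# by a DOI-style identifier routed through a resolver, so the reference
# in a paper's data reference section survives even if the hosting
# website moves or disappears.

def doi_to_url(doi: str, resolver: str = "https://doi.org") -> str:
    """Turn a bare DOI into a resolvable, citable URL."""
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"{resolver}/{doi}"

def data_reference(author: str, year: int, title: str, doi: str) -> str:
    """Format one entry for a paper's data reference section."""
    return f"{author} ({year}). {title} [Data set]. {doi_to_url(doi)}"

# Hypothetical dataset citation for illustration only.
print(data_reference("Accomazzi, A.", 2011, "Example survey dataset",
                     "10.5555/example"))
```

The point of the indirection is resilience: the paper carries only the persistent ID, and the resolver, not the author, is responsible for keeping the link live.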
Saturday, October 22, 2011
Cite Datasets and Link to Publications
Cite Datasets and Link to Publications. Digital Curation Centre. 18 October 2011.
The DCC has published a guide to help authors and researchers create links between their academic publications and the underlying datasets. It is important for those reading the publication to be able to locate the dataset. This recognizes that data generated during research are just as valuable to the ongoing academic discourse as papers and monographs, and in many cases the data need to be shared. "Ultimately, bibliographic links between datasets and papers are a necessary step if the culture of the scientific and research community as a whole is to shift towards data sharing, increasing the rapidity and transparency with which science advances."
The guide identifies a set of requirements for dataset citations and any services set up to support them. Citations must be able to uniquely identify the object cited, identifying the whole dataset as well as subsets. The citation must be usable by people and software tools alike. A number of elements are needed, but the "most important of these elements – the ones that should be present in any citation – are the author, the title and date, and the location. These give due credit, allow the reader to judge the relevance of the data, and permit access to the data, respectively." A persistent URL is needed, and there are several types that can be used.
Friday, October 07, 2011
More (digital) wake-up calls for academic libraries
More (digital) wake-up calls for academic libraries. Rick Luce. LIBER 2011. Duurzame toegang blog. June 2011.
The topic was the core business of academic libraries: serving researchers and the scientific research process. There are many changes taking place in the sciences: "zettabytes of data; dynamic, complex data objects that require management; communities and data flows becoming much more important than static library collections, etc." The warning to academic libraries was that if libraries do not develop the services that new researchers need, someone else will, and then there is no future for the research library. We need a "fundamental transformation process that will affect every aspect of the ‘library’ business." The library needs to provide a repository between the scientific process and an IT infrastructure that supports and preserves workflows.
Thursday, September 08, 2011
Research Archive Widens Its Public Access—a Bit
Research Archive Widens Its Public Access—a Bit. Editorial. Technology Review. 7 September 2011.
JStor, an organization which maintains links to 1,400 journals for subscribing institutions, is providing free public access to articles published prior to 1923 in the United States or before 1870 in other countries, about 6 percent of its content. In a letter to publishers and libraries, JStor refers to plans for "further access to individuals in the future."