Showing posts with label repositories. Show all posts

Wednesday, March 15, 2017

Developing a Digital Preservation Infrastructure at Georgetown University Library

Developing a Digital Preservation Infrastructure at Georgetown University Library. Joe Carrano, Mike Ashenfelder. The Signal. March 13, 2017.
At the Georgetown University Library, half of the library IT department is focused on digital services such as digital publishing, digitization and digital preservation. These IT and library functions overlap and support each other, which creates a need for the librarians, archivists and IT staff to work together; this improves communication and makes it easier to get things done. "Often it is invaluable to have people with a depth of knowledge from many different areas working together in the same department. For instance, it’s nice to have people around that really understand computer hardware when you’re trying to transfer data off of obsolete media."

While digital preservation and IT are centered in one department, the preservation files sit in different systems and on different storage media throughout the library; the library is in the process of putting them into APTrust. Several strategies to improve their digital-preservation management are:
  1. Implement preservation infrastructure, including a digital-preservation repository
  2. Develop and document digital-preservation workflows and procedures
  3. Develop a training program and documentation to help build skills for staff
  4. Explore and expand collaborations with both university and external partners to increase the library’s involvement in regional and national digital-preservation strategies.
These goals build upon each other to create a sustainable digital-preservation framework, which includes APTrust and the creation of tools to manage and upload the content, particularly custom automated solutions to fit their needs. They are also developing documentation and workflows so any staff member can "upload materials into APTrust without much training".

Librarians and archivists need to be trained and integrated into the process to ensure the sustainability of the project’s outcome and to speed up the ingest rate. "Digital curation and preservation tasks are becoming more and more commonplace and we believe that these skills need to be dispersed throughout our institution rather than performed by only a few people". 

"By the end of this process we hope to have all our preservation copies transferred and the infrastructure in place to keep digital preservation sustainable at Georgetown."

Friday, November 20, 2015

Hydra: Get a head on your repository

Hydra: Get a head on your repository.  Hydra Project website. November 2015.
  • Hydra is a Repository Solution:  Hydra is an open source software repository solution used by institutions worldwide to provide access to their digital content.  Hydra software provides a versatile and feature rich environment for end-users and repository administrators.
  • Hydra is a Community: Hydra is a large, multi-institutional collaboration that gives institutions the ability to combine their repository development efforts into a collective solution beyond the capacity of any individual institution to create, maintain or enhance on its own. The project motto is “if you want to go fast, go alone.  If you want to go far, go together.”
  • Hydra is a Technical Framework: Hydra is an ecosystem of components that lets institutions build and deploy robust and durable digital repositories supporting multiple “heads”, which are fully-featured digital asset management applications and tailored workflows.  Its principal platforms are the Fedora Commons repository software, Solr, Ruby on Rails and Blacklight.  Hydra does not yet support “out-of-the-box” deployments but the Community is working towards such “solution bundles”, particularly “Hydra in a Box” and Avalon.

Thursday, March 26, 2015

Sowing the seed: Incentives and Motivations for Sharing Research Data, a researcher's perspective

Sowing the seed: Incentives and Motivations for Sharing Research Data, a researcher's perspective. Knowledge Exchange. November 2014. PDF.
This study gathered evidence, examples and opinions on incentives for research data sharing from the researchers’ point of view. It will help inform recommendations on developing policies and best practices for data access, preservation, and re-use. An emerging theme is making it possible for all researchers to share data and changing the collective attitude towards sharing.

A DCC project investigating researchers’ attitudes and approaches towards data deposit, sharing, reuse, curation and preservation found that data-sharing requirements should be defined at a finer-grained level, such as the research group. When researchers talk about ‘data sharing’ there are different modes of data sharing, such as:
  1. private management sharing, 
  2. collaborative sharing, 
  3. peer exchange, 
  4. sharing for transparent governance, 
  5. community sharing and 
  6. public sharing.
Important motivations for researchers to share research data are:
  1. When data sharing is an essential part of the research process; 
  2. Direct career benefits (greater visibility and recognition of one’s work, reciprocal data)
  3. As a normal part of their research circle or discipline;
  4. Existing funder and publisher expectations, policies, infrastructure and data services
Some points on the preservation of research information, for research institutions and research funders:
  • Recognize and value data as part of research assessment and career advancement
  • Set preservation standards for data formats, file formats, and documentation
  • Develop clear policies on data sharing and preservation 
  • Provide training and support for researchers and students to manage and share data so it becomes part of standard research practice.
  • Make all data related to a published manuscript available
Actions of some organizations regarding data management and preservation:
  • The Royal Netherlands Academy of Arts and Sciences requests its researchers to digitally preserve research data, ideally via deposit in recognised repositories, to make them openly accessible as much as possible; and to include a data section in every research plan stating how the data produced or collected during the project will be dealt with.
  • The Alliance of German Science Organisations adopted principles for the handling of research data, supporting long-term preservation and open access to research data for the benefit of science.
  • Research organizations receiving EPSRC funding will from May 2015 be expected to have appropriate policies, processes and infrastructure in place to preserve research data, to publish metadata for their research data holdings, and to provide access to research data securely for 10 years beyond the last data request.
  • The European Commission has called for coordinated actions to drive forward open access, long-term preservation and capacity building to promote open science for all EC and national research funding.
  • The UK Economic and Social Research Council has mandated the archiving of research data from all funded research projects. This policy goes hand in hand with the funding of supporting data infrastructure and services. The UK Data Service provides the data infrastructure to curate, preserve and disseminate research data, and provides training and support to researchers.

Saturday, February 07, 2015

Digital Preservation Coalition publishes ‘OAIS Introductory Guide (2nd Edition)’ Technology Watch Report

Digital Preservation Coalition publishes ‘OAIS Introductory Guide (2nd Edition)’ Technology Watch Report. Brian Lavoie.  Digital Preservation Coalition. Watch Report. October, 2014. [PDF]

The report describes the OAIS, its core principles and functional elements, as well as the information model that supports long-term preservation, access and understandability of data. The OAIS reference model was approved in 2002 and revised and updated in 2012. Perhaps “the most important achievement of the OAIS is that it has become almost universally accepted as the lingua franca of digital preservation”.

The central concept in the reference model is that of an open archival information system. An OAIS-type archive must meet a set of six mandatory responsibilities to do with the ingest, preservation, and dissemination of archived materials. Its functional model is organized into six entities: Ingest, Archival Storage, Data Management, Preservation Planning, Access, and Administration. There are also Common Services, which consist of basic computing and networking resources.

An OAIS-type archive references three types of entities in its environment: Management, Producer, and Consumer, which includes the Designated Community: those consumers expected to independently understand the archived information in the form in which it is preserved and made available by the OAIS. The model provides a framework to encourage dialogue and collaboration among participants in standards-building activities, and helps identify areas most likely to benefit from standards development.

An OAIS-type archive is expected to:
  • Negotiate for and accept appropriate information from information producers;
  • Obtain sufficient control of the information in order to meet long-term preservation objectives;
  • Determine the scope of the archive’s user community;
  • Ensure the preserved information is independently understandable to the user community
  • Follow documented policies and procedures to ensure the information is preserved against all reasonable contingencies
  • Make the preserved information available to the user community, and enable dissemination of authenticated copies of the preserved information.
An OAIS should be committed to making the contents of its archival store available to its intended user community, through access mechanisms and services that support users’ needs and requirements. Such requirements may include preferred medium and access channels, and any access restrictions should be clearly documented.

The OAIS information model is built around the concept of an information package, of which there are three kinds: the Submission Information Package (SIP), the Archival Information Package (AIP), and the Dissemination Information Package (DIP). Preservation requires metadata to support and document the OAIS’s preservation processes, called Preservation Description Information, which ‘is specifically focused on describing the past and present states of the Content Information, ensuring that it is uniquely identifiable, and ensuring it has not been unknowingly altered’. This information consists of:
  • Reference Information (identifiers)
  • Context Information (describes relationships among information and objects)
  • Provenance Information (history of the content over time)
  • Fixity Information (verifying authenticity)
  • Access Rights Information (conditions or restrictions)
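A toy sketch can make these package concepts concrete (the class and function names are illustrative; OAIS itself prescribes no implementation): an Archival Information Package bundles Content Information with the five kinds of Preservation Description Information listed above, and the fixity value recorded at ingest can be checked later.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class PreservationDescriptionInformation:
    reference: str      # identifier for the content
    context: str        # relationships to other information and objects
    provenance: str     # history of the content over time
    fixity: str         # checksum for verifying authenticity
    access_rights: str  # conditions or restrictions

@dataclass
class ArchivalInformationPackage:
    content: bytes                           # the Content Information itself
    pdi: PreservationDescriptionInformation  # metadata about that content

def ingest(sip_content: bytes, identifier: str, provenance: str,
           context: str = "", access_rights: str = "") -> ArchivalInformationPackage:
    """Turn submitted content (a SIP's payload) into an AIP, recording fixity."""
    pdi = PreservationDescriptionInformation(
        reference=identifier,
        context=context,
        provenance=provenance,
        fixity=hashlib.sha256(sip_content).hexdigest(),
        access_rights=access_rights)
    return ArchivalInformationPackage(content=sip_content, pdi=pdi)

def unaltered(aip: ArchivalInformationPackage) -> bool:
    """Check the content against its recorded Fixity Information."""
    return hashlib.sha256(aip.content).hexdigest() == aip.pdi.fixity

aip = ingest(b"scanned page", "urn:example:1",
             "digitized 2014 from the print original")
```

Here `unaltered` returns False as soon as the stored bytes diverge from the checksum recorded at ingest, which is exactly the role Fixity Information plays.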
OAIS is a model and not an implementation. It does not address system architectures, storage or processing technologies, database design, computing platforms, or other technical details of setting up a functioning archival system, but it has been used as a foundation or starting point. Efforts such as TRAC have been made to put the attributes of a trusted digital archive into a ‘checklist’ that could be used to support a certification process. PREMIS is a preservation metadata initiative that has emerged as the de facto standard. METS, an XML-based document format, has become widely used for encoding OAIS archival information packages.

The ‘OAIS reference model provides a solid theoretical basis for digital preservation efforts, though theory and practice can sometimes have an uneasy fit.’




Friday, January 23, 2015

The Dataverse Network

The Dataverse Network. Harvard Dataverse Network. 2014.
The Dataverse Network is an open source application to publish, share, reference, extract and analyze research data. It facilitates making data available to others and replicating the work of other researchers. The network hosts multiple studies or collections of studies, and each study contains cataloging information that describes the data, plus the actual data and complementary files.

The Dataverse Network project develops software, protocols, and community connections for creating research data repositories that automate professional archival practices, guarantee long term preservation, and enable researchers to share, retain control of, and receive web visibility and formal academic citations for their data contributions.

Tuesday, December 16, 2014

A picture is worth a thousand (coherent) words: building a natural description of images

A picture is worth a thousand (coherent) words: building a natural description of images. Google Research Blog.
Google has developed a machine-learning system that can automatically produce captions to accurately describe images the first time it sees them. It can describe a complex scene which requires a deeper representation of what’s going on in the scene, capturing how the various objects relate to one another and translating it all into natural-sounding language. The full paper "Show and Tell: A Neural Image Caption Generator" is here.

Thursday, October 30, 2014

Digital Preservation Network (DPN) Launches Member Content Pilot

Digital Preservation Network (DPN) Launches Member Content Pilot. Carol Minton Morris. Duraspace.org. 2014-10-29. 
DPN has launched a Member Content Pilot program as a step toward establishing an operational, long-term preservation system. The pilot is testing real-world interactions between DPN members through DPN “nodes” that ingest data from DPN members and package it for preservation storage. Chronopolis/Duracloud, The Texas Preservation Node, and the Stanford Digital Repository will be functioning as First Nodes. APTrust and HathiTrust, in addition to the above three, will be providing replication services for the pilot data.

Wednesday, March 27, 2013

Supporting the Changing Research Practices of Chemists.

Supporting the Changing Research Practices of Chemists. Matthew P. Long, Roger C. Schonfeld. Ithaka S+R. February 26, 2013. [PDF]

This report, intended for those who support chemists, including librarians, covers the latest research methods, practices, and information-services needs of academic chemists. Chemists need services that make their lives easier and their research groups more productive; this includes minimizing paperwork and administrative tasks. They value academic libraries primarily for the access they provide to electronic journals and other online resources. Researchers are often frustrated by an inability to share large amounts of data with a collaborator. Few chemists visit the physical library, but they use the library's digital collections heavily.

In the survey, fewer than 10% reported a research consultation with a librarian, asked for help with data management, or asked for assistance on an issue related to publishing in the past year; chemists rarely reach out to the library to discuss issues or request support. Their main search sites are Web of Knowledge/Web of Science, SciFinder, and PubMed. It would be helpful to have tools that process all of this information, pre-scan announcements from journals, and organize their materials. Electronic Lab Notebooks (ELNs) make it easy to share, archive, and search through past lab notes, but notebooks are at risk in the lab. Labs generally do not have good data management infrastructure or proper external support for developing it, especially for sharing and preserving files.

It is difficult for academic chemists to coordinate the recording and preservation of data after the completion of a project. When data are saved, they are often held in unstable or at-risk formats, or in formats that no one else can access or interpret. Sometimes a large amount of potentially useful data is not shared or preserved in any durable way. One chemist invited the library to come and speak to the department about preservation and access. Chemists generally lack awareness of effective data curation and preservation. Data management and preservation is time-consuming and rarely straightforward; it requires expert advice and constant monitoring.

The findings:
  1. Chemists need better support in data management, sharing and preservation.
  2. Many researchers remain anxious about keeping up with the newest literature.
  3. They need new tools for staying aware of new research and for serendipitous discovery.
  4. Chemists require greater support in disseminating their research, including articles, data, and other materials.
Other areas of concern for academic chemists: laboratory management, gaining access to industrial funding, and teaching support.
The report sees real potential for the academic library to stretch the definition of the services it offers to the academic chemist. The library may also have a role in working with other service providers and in ensuring that academics are aware of the latest research tools. It is clear from this project that libraries must think strategically about whether and how to invest in services for chemists.
 

Saturday, March 23, 2013

Adding Value to Electronic Theses and Dissertations in Institutional Repositories

Adding Value to Electronic Theses and Dissertations in Institutional Repositories. Joachim Schöpfel. D-Lib Magazine. March/April 2013.

This paper looks at differences among institutional repositories that contain electronic theses and dissertations (ETDs), particularly regarding metadata, policy, access restrictions, representativeness, file format, status, quality and related services. The intent is to improve the "quality of content and service provision in an open environment, in order to increase impact, traffic and usage". The paper shows five ways in which institutions can add value to the deposit and dissemination of electronic theses and dissertations:
  1. Quality of content. A good IR not only defines a set of standards and criteria for the selection and validation of deposits but also communicates and promotes this editorial policy.
  2. Metadata. The description of the content and context of the ETD files will make a difference. 
  3. Format. The IR should contain full text, offer different file formats, and use deposit formats that are searchable, open, and appropriate for long-term preservation and use of the content.
  4. Repositories should network and interconnect.
  5. Provide needed services beyond basic searching, viewing, and downloading. Some possibilities are discussion forums, usage statistics and metrics, citations, Print On Demand in book format, copyright protection or Creative Commons licensing, and preservation. 
Institutional repositories must also be future-oriented and anticipate future transformations of scientific communication. "It is crucial for the success of a repository that the institution clearly defines its objectives in line with its scientific strategy and environment."

Monday, February 11, 2013

Sustaining Our Digital Future: Institutional Strategies for Digital Content

Sustaining Our Digital Future: Institutional Strategies for Digital Content. Nancy L Maron, Jason Yun and Sarah Pickle. Strategic Content Alliance. January 29, 2013. (PDF, 91 pp.)
The shift to digital media in scholarly communications is transformative; data sets, dynamic digital resources, websites, digital collections, and crowd-sourced or born-digital content bring challenges and opportunities, along with questions about who is responsible for maintaining them and how to maximize the value of the content.

Some findings:
  • Projects have received support from the host institution, but few have plans for ongoing support.
  • There are potential partners on campus, but project leaders do not seek them early enough, when critical decisions are being made.
  • Digital projects across campuses may be hosted by many groups, which poses challenges for discovery. There is often no single place for users to find digital projects and some projects can too easily slip from view.
  • Current funding models do not support ongoing operation.
  • Campus-wide solutions are beginning to emerge, but even these tend to address just the basic “maintenance” issues of storage, preservation and access.
  • Focus is often on creating new content, with little thought about ongoing efforts to enhance the content or update user interfaces.
Recommendations:
  • Perform an early and honest appraisal to find which projects are likely to require support after completion:
    1. Digital content requiring just “maintenance”: plan for the content to be deposited and integrated into some other site, database, or repository.
    2. Digital resources requiring ongoing growth and investment: these require early sustainability planning, including identifying institutional or other partners and careful consideration of the full range of costs and activities needed to keep the resource vibrant.
  • Be realistic in assessing the future needs of the resource at its outset and in continuing support.
  • Identify campus partners early on.
  • Consider how central your project is to the overall mission of the institution.
  • Consider whether projects could be drawn together to create a deeper network of support, both for “maintenance” projects and those with the potential to really grow.
  • Develop ways to help users find decentralized content and to reach out to content users. These could start as an inventory of all of the digital holdings or common catalogs.
  • Determine where scale solutions pay off, where experts are best placed to champion a project, or create common storage, usage and preservation systems for an organization.
  • Continue to identify and support ongoing development of the “front-end”, including user needs, interface development, and content enhancement. Pay attention to the changing needs of users and determine what enhancements the digital resource will require.
Libraries, museums, technology departments, and digital humanities centres are among the players that have begun to emerge as potential leaders of greater coordinated digital support on university campuses. Libraries have begun to consider the support of digital resources to be a critical part of their missions. Some universities provide advisory support to project leaders and libraries provide help for digital projects, from help to understand grant requirements, co-developing projects, providing hosting, curation, and preservation expertise.

Sustainability and Use:
  • Research data platforms: at some institutions, major initiatives are underway to develop research data platforms. The goal of a platform is not just preservation and storage but access and reuse. The first step is to have a platform; from there the institution can test and refine the service for researchers depositing data sets, library-curated collections, and university departments.
  • "A coherent digital policy from early review, guidelines on costings and deposit standards, to forecasting what ongoing activities will be needed and who will carry them out, would ideally remove much of the risk of “digital time bombs” while obliging both project leaders and university leaders to take a moment to envision the ongoing impact they want these resources to have, and how to best achieve that."
  • Unlike universities, which often play the role of reluctant, passive, or simply unaware host to a great deal of digital content created by their scholars, museums and libraries tend to be the ones initiating this work and are eager to build and maintain these collections.
  • Despite the benefits of centralisation, the mere presence of a catalogue and centralised repository does not ensure greater usage of or engagement with its holdings.
  • Many institutions devote considerable attention to the upfront creation of content, but not nearly as much to its ongoing enhancement or reuse, resulting in collections that are certainly present in the main catalogue, but otherwise exist only as capsules of content, frozen in time.
  •  Once the project is finished, management of the digital resource is not always clear.

Thursday, July 12, 2012

Bit Rot & Long Term Access.


The post outlines a Problem, Requirement, Use Case sequence. "We need to make serious efforts to drill down into every individual potential element and risk for Bit Rot. Create Use Cases and take action on these Use Cases and develop practical tools and/or workflow to assure we can prevent Bit Rot." This need not mean big development projects; it is often "more a case of creating scripts that can compare 'checksums' or 'open and save as new' type of activities inside a repository. Today's hardware is subject to a very short technical life cycle and by far not as reliable for keeping the integrity of the bits as we sometimes like to believe." The site also includes a video of Vint Cerf's view on Bit Rot.
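A minimal sketch of such a checksum-comparison script (file names and layout here are illustrative): record a fixity manifest for a directory of content, then re-walk it later and flag any file whose bits have changed.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: str) -> dict:
    """Record a checksum for every file under root (a fixity manifest)."""
    manifest = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = sha256_file(path)
    return manifest

def audit(root: str, manifest: dict) -> list:
    """Return the files whose current checksum no longer matches the manifest."""
    return sorted(rel for rel, digest in manifest.items()
                  if sha256_file(os.path.join(root, rel)) != digest)

# Demo: a pristine store passes the audit; a flipped byte is detected.
store = tempfile.mkdtemp()
with open(os.path.join(store, "item.txt"), "wb") as f:
    f.write(b"archival content")
manifest = build_manifest(store)
clean = audit(store, manifest)        # empty: nothing has rotted yet
with open(os.path.join(store, "item.txt"), "wb") as f:
    f.write(b"archival c0ntent")      # simulate silent corruption
rotten = audit(store, manifest)       # the corrupted file is reported
```

Run periodically against the stored manifest, a script like this turns silent bit rot into a detectable, reportable event.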

Friday, May 18, 2012

The CLIF Project: The Repository as Part of a Content Lifecycle

The CLIF Project: The Repository as Part of a Content Lifecycle. Richard Green, Chris Awre, Simon Waddington. Ariadne. 9 March 2012.
This joint project carried out an extensive literature review and worked with digital content creators to understand how to handle the interaction of authoring, collaboration and delivery of materials. At the heart of meeting institutional requirements for managing digital content is the need to understand the different operations through which content passes, from planning and creation through to disposal or preservation. Repositories must be integrated with the other systems that support other parts of this lifecycle to prevent them from becoming yet another information silo within the institution.

The CLIF software has been designed to try and allow the maximum flexibility in how and when users can transfer material from one system to another, integrating the tools in such a way that they seem to be natural extensions of the basic systems.  This open source software is available for others to investigate and use.

The repository’s archival capability is regarded as one of its strongest assets, and the role of the repository within a University will be regarded very much in terms of what it can offer that other campus systems cannot.  It should not try to compete on all levels. There is a need to clarify better at an institutional level what functionality is offered by different content management systems, in order to better understand how different stages of the digital content lifecycle can be best enabled.


Friday, May 04, 2012

Library of Congress Digital Preservation Newsletter.

Library of Congress Digital Preservation Newsletter. May 2012. [PDF]
Items from the Newsletter include:
  • Key outcomes of the NDIIPP program are to identify priorities for born digital collections and engage organizations committed to preserving digital content.
  • Viewshare is being used for the collections.
  • Floppy Disks are Dead, Long Live Floppy Disks
    • Floppy disks are fragile constructions that were never designed for permanence.
    • It is difficult to determine what is on a floppy disk and to recover its contents.
    • A floppy disk controller called Catweasel allows computers to access a wide variety of older disk formats (must have the floppy drive).
  • Web archiving.  
    • Because of the scope of the web sites, consider partnering with other institutions.
  •  Preservation of and Access to Federally Funded Scientific Data
    • Research data produced by federally funded scientific projects should be freely available to the wider research community and the public
    • Public data should be a public resource, and data sharing supports core scientific values like openness, transparency, and replication. 
    • Lack of resources for curating scientific data and a lingering tradition of data hoarding create resistance to open access to research data.

Saturday, October 22, 2011

Cite Datasets and Link to Publications

Cite Datasets and Link to Publications. Digital Curation Centre. 18 October 2011.
The DCC has published a guide to help authors and researchers create links between their academic publications and the underlying datasets. It is important for readers of a publication to be able to locate the dataset. This recognizes that data generated during research are just as valuable to the ongoing academic discourse as papers and monographs, and that in many cases the data need to be shared. "Ultimately, bibliographic links between datasets and papers are a necessary step if the culture of the scientific and research community as a whole is to shift towards data sharing, increasing the rapidity and transparency with which science advances."

This guide identifies a set of requirements for dataset citations and any services set up to support them. Citations must be able to uniquely identify the object cited, and to identify the whole dataset as well as subsets. A citation must be usable by people and software tools alike. A number of elements are needed, but the "most important of these elements – the ones that should be present in any citation – are the author, the title and date, and the location. These give due credit, allow the reader to judge the relevance of the data, and permit access to the data, respectively." A persistent URL is needed, and there are several types that can be used.
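As an illustration only (the element order, helper name, and example DOI are invented, not a DCC-prescribed style), the core elements the guide names can be assembled into a citation string, with an optional qualifier for citing a subset of the dataset:

```python
def cite_dataset(author: str, year: int, title: str, locator: str,
                 subset: str = "") -> str:
    """Build a dataset citation from the core elements: author (for credit),
    title and date (for judging relevance), locator (persistent URL/DOI for access)."""
    citation = f"{author} ({year}). {title}."
    if subset:  # optionally identify a subset rather than the whole dataset
        citation += f" {subset}."
    return f"{citation} {locator}"

citation = cite_dataset(
    "Jones, A.", 2011, "Sea Surface Temperatures 1990-2010",
    "https://doi.org/10.5555/example",
    subset="Version 2, North Atlantic subset")
```

Because the output is a plain string with the locator in a fixed position, it can be consumed by people and parsed by software tools alike, which is one of the guide's stated requirements.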

Audit and Certification of Trustworthy Digital Repositories.

The Management Council of the Consultative Committee for Space Data Systems (CCSDS) has published this manual of recommended practices. It is based on the 2003 version from RLG. “The purpose of this document is to define a CCSDS Recommended Practice on which to base an audit and certification process for assessing the trustworthiness of digital repositories. The scope of application of this document is the entire range of digital repositories.”

The document addresses audit and certification criteria, organizational infrastructure, digital object management, and risk management.  It is a standard for those who audit repositories; and, for those who are responsible for the repositories, it is an objective tool they can use to evaluate the trustworthiness of the repository.

Monday, October 17, 2011

Research Librarians Consider the Risks and Rewards of Collaboration.

Research Librarians Consider the Risks and Rewards of Collaboration. Jennifer Howard. The Chronicle of Higher Education. October 16, 2011.

The Association of Research Libraries’ meeting discussed research and preservation projects such as the HathiTrust digital repository and the proposed Digital Public Library of America, plans for which are moving ahead. Concerning the Digital Public Library of America: “Library” is a misnomer in this case; it is more a federation of existing objects. It wouldn’t own anything, and its main contribution would be to set standards and link resources. “The user has to drive this.”

They said that it’s almost three times more expensive to store materials locally than it is to store them with HathiTrust. Researchers now also create and share digital resources themselves via social-publishing sites such as Scribd. There is a need for collection-level tools that allow scholars and curators to see beyond catalog records.

The meeting also discussed Recollection, a free platform built by NDIIPP and a company named Zepheira to give a better “collection-level view” of libraries’ holdings. The platform can be used to build interactive maps, timelines, and other interfaces from descriptive metadata and other information in library catalogs. So, for instance, plain-text place names on a spreadsheet can be turned into points of latitude and longitude and plotted on a map.

A session titled “Rebalancing the Investment in Collections” argued that libraries had painted themselves into a corner by focusing too much on their collection budgets. Investing in the right skills and partnerships is most critical now. “The comprehensive and well-crafted collection is no longer an end in itself.”

One person told librarians that they shouldn’t rush to be the first to digitize everything and invest in every new technology. “Everybody underestimates the cost of innovation,” he said. “Instead of rushing in and participating in a game where you don’t have the muscle, you want to stand back” and wait for the right moment.

Digital Preservation Matters.