Showing posts with label database. Show all posts

Monday, June 20, 2016

Preserving Transactional Data

Preserving Transactional Data. Sara Day Thomson. DPC Technology Watch Report 16-02. May 2016.
This report examines the requirements for preserving transactional data and the challenges in re-using these data for analysis or research. The term "transactional data" refers to "data that result from single, logical interactions with a database and the ACID properties (Atomicity, Consistency, Isolation, Durability) that support reliable records of interactions."
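
To make the ACID idea concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are assumptions made up for illustration; they do not come from the report. The point is only that a single logical interaction either leaves a complete record or no record at all.

```python
import sqlite3

# Hypothetical example data: an in-memory database with a no-overdraft rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is undone as well, so no half-finished transfer is recorded.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates commit together
failed = transfer(conn, "bob", "alice", 500)  # debit fails, credit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, the balances are the same as they were before it started, which is the property that makes each recorded interaction reliable.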

Transactional data, created through interactions with a database, can come from many sources and cover many types of information. "Preserving transactional data, whether large or not, is imperative for the future usability of big data, which is often comprised of many sources of transactional data." Such data have potential for future developments in consumer analytics and in academic research and "will only lead to new discoveries and insights if they are effectively curated and preserved to ensure appropriate reproducibility."

Organizations that collect transactional data aim to manage and preserve them for business purposes as part of their records management. There are strategies for database preservation, as well as tools and standards that address data re-use. Strategies for managing and preserving big transactional data must adapt to both SQL and NoSQL environments. Significant challenges include large volumes of data, rapidly changing data, and multiple sources of data creation.

Some notes:
  • Understanding the context and how the data were created may be critical in preserving the meaning behind the data.
  • Data purpose: preservation planning is critical in order to make preservation actions fit for purpose while keeping preservation cost and complexity to a minimum.
  • How data are collected or created can have an impact on long-term preservation, particularly when database systems have multiple entry points, leading to inconsistency and variable data quality.
  • Current technical approaches to preserving transactional data primarily focus on the preservation of databases.
  • Database preservation may not capture the complexities and rapid changes enabled by new technologies and processing methods.
  • As with all preservation planning, the relevance of a specific approach depends on the organization’s objectives.
There are several approaches to preserving databases:
  • Encapsulation
  • Emulation 
  • Migration/Normalization
  • Archival Data Description Markup Language (ADDML)
  • Standard Data Format for Preservation (SDFP) 
  • Software Independent Archiving of Relational Databases (SIARD)
"Practitioners of database preservation typically prefer simple text formats based on open standards. These include flat files, such as Comma Separated Value (CSV), annotated textual documents, such as Extensible Markup Language (XML), and the international and open Structured Query Language (SQL)." The end goal is to keep data in a transparent and vendor-neutral form so they can be reintegrated into a future database.

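As a rough illustration of what migrating to simple text formats can look like, the sketch below exports a small SQLite database to CSV and to plain SQL statements using only the Python standard library. The orders table and the helper names export_table_csv and export_sql are hypothetical, and this is not how any of the tools or standards named above work internally.

```python
import csv
import io
import sqlite3


def export_table_csv(conn, table):
    """Return one table as CSV text, with a header row of column names."""
    # The table name is interpolated directly, so it must come from a trusted source.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return buf.getvalue()


def export_sql(conn):
    """Return the whole database (schema and rows) as SQL statements."""
    return "\n".join(conn.iterdump())


# Hypothetical example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders (item, qty) VALUES (?, ?)",
    [("pen", 3), ("ink, blue", 1)],
)
conn.commit()

orders_csv = export_table_csv(conn, "orders")
dump_sql = export_sql(conn)
```

The CSV output keeps the rows readable in any text editor but loses the schema (types, keys, constraints), while the SQL dump keeps both, which is one reason the two formats are often used together.
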
Best practices:
  1. choose the best possible format, either preserving the database in its original format or migrating to an alternative format.
  2. after a database is converted, encapsulate it by adding descriptive, technical, and other relevant documentation to understand the preserved data.
  3. submit database to a preservation environment that will curate it over time.
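
The encapsulation step (item 2 above) can be sketched as writing the exported data next to a small manifest that carries descriptive and technical documentation plus checksums. Everything here is an assumption for illustration: the manifest.json layout, the field names, and the encapsulate and verify helpers are not taken from the report or from any preservation standard.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def encapsulate(package_dir, data_files, description, source_system):
    """Write data files plus a manifest.json into package_dir; return the manifest."""
    package = Path(package_dir)
    package.mkdir(parents=True, exist_ok=True)
    entries = []
    for name, text in data_files.items():
        payload = text.encode("utf-8")
        (package / name).write_bytes(payload)
        entries.append({
            "file": name,
            "bytes": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest(),
        })
    manifest = {
        "description": description,      # descriptive documentation
        "source_system": source_system,  # technical documentation
        "encoding": "utf-8",
        "files": entries,
    }
    (package / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8"
    )
    return manifest


def verify(package_dir):
    """Recompute checksums and report whether every file still matches the manifest."""
    package = Path(package_dir)
    manifest = json.loads((package / "manifest.json").read_text(encoding="utf-8"))
    return all(
        hashlib.sha256((package / e["file"]).read_bytes()).hexdigest() == e["sha256"]
        for e in manifest["files"]
    )


# Hypothetical usage with a one-table export.
with tempfile.TemporaryDirectory() as tmp:
    manifest = encapsulate(
        tmp,
        {"orders.csv": "id,item\n1,pen\n"},
        description="Sample orders table",
        source_system="SQLite 3 (hypothetical source)",
    )
    intact = verify(tmp)
```

A preservation environment would typically re-run a fixity check like verify on a schedule; the sketch only shows the idea at the scale of a single folder.
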
Research is continuing in the collection, curation, and analysis of data; digital preservation standards and best practices will make the difference between just data and "curated collections of rich information".

Tuesday, April 28, 2015

Database Preservation Toolkit

Database Preservation Toolkit. Website. April 2015.
The Database Preservation Toolkit uses input and output modules and allows conversion between database formats, including connection to live systems. It allows conversion of live or backed-up databases into preservation formats such as DBML, SIARD, or XML-based formats created for the purpose of database preservation.

This toolkit was part of the RODA project and has now been released as a separate project. The site includes download links and related publications and presentations.

Preserving digital records and databases

Preserving digital records and databases. Luis Faria. PASIG Presentation. March 13, 2015.
Presentation on tools and models for database preservation. It includes a diagram of the import and export flow using db-preservation-toolkit, as well as their model and the OAIS model. Throughput for row-intensive databases is 10,000 rows/s. The presentation recommends the SIARD format for preservation. The SIARD-E version is under development.


Thursday, April 09, 2015

Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring?

Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring? Randy Kiefer's presentation.  UKSGLive. April 3, 2015.
Long-term preservation refers to processes and procedures required to ensure content remains accessible well into the future. Publishers want to be good stewards of their content. Digital Preservation is an "insurance policy" for e-resources. Commercial hosting platforms and aggregators are not preservation archives! They can remove discontinued content from their system, and commercial businesses may disappear, as with Metapress. There are global digital preservation archives, such as CLOCKSS and Portico, and regional archives, such as National Libraries.
The biggest challenges in the future are formats (especially presentation of content) and what to do with databases, datasets, and supplementary materials. "Any format can be preserved, including video. The issue is that of space, cost and presentation (especially if the format is now not in use/supported)." There are also legal issues with cloud-based preservation systems: there is no legal precedent for them, and no protection with regard to security.



Saturday, March 21, 2015

Reaching Out and Moving Forward: Revising the Library of Congress’ Recommended Format Specifications

Reaching Out and Moving Forward: Revising the Library of Congress’ Recommended Format Specifications. Ted Westervelt, Butch Lazorchak. The Signal. Library of Congress. March 16, 2015.
The Library has created the Recommended Format Specifications, the result of years of work by experts from across the institution, because they are essential to its mission. The Library is committed to making the collection available to its patrons now and for generations to come and must be able to determine the physical and technical characteristics needed to fulfill this goal. The Specifications have hierarchies of characteristics, physical and digital, in order to provide guidance and determine the level of effort involved in managing and maintaining content. In order to continue managing the materials, the Specifications must be constantly reviewed and updated as materials and formats change. An example is exploring the potential value of the SIARD format developed by the Swiss Federal Archives as a means of preserving relational databases.