This report examines the requirements for preserving transactional data and the challenges of re-using these data for analysis or research. The term "transactional" is used to refer to "data that result from single, logical interactions with a database and the ACID properties (Atomicity, Consistency, Isolation, Durability) that support reliable records of interactions."
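To make the definition concrete, here is a minimal sketch of a single logical interaction with a database, using Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are hypothetical and are not taken from the report. The sketch illustrates atomicity only: either both updates are recorded or neither is.

```python
import sqlite3

# Hypothetical in-memory database; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True if committed."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


transfer(conn, "alice", "bob", 30)   # succeeds: both updates are committed
transfer(conn, "alice", "bob", 500)  # fails: neither update is kept
print(balances(conn))
```

The record that survives is the kind of reliable trace of an interaction that the definition describes: the failed transfer leaves no partial change behind.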
Transactional data, created through interactions with a database, can come from many sources and contain many different types of information. "Preserving transactional data, whether large or not, is imperative for the future usability of big data, which is often comprised of many sources of transactional data." Such data have potential for future developments in consumer analytics and in academic research and "will only lead to new discoveries and insights if they are effectively curated and preserved to ensure appropriate reproducibility."
Organizations that collect transactional data aim to manage and preserve them for business purposes as part of their records management. There are strategies for database preservation, as well as tools and standards that support data re-use. Strategies for managing and preserving big transactional data must adapt to both SQL and NoSQL environments. Significant challenges include the large volume of data, rapidly changing data, and the variety of sources from which data are created.
Some notes:
- Understanding the context and how the data were created may be critical in preserving the meaning behind the data.
- Data purpose: preservation planning is critical in order to make preservation actions fit for purpose while keeping preservation cost and complexity to a minimum.
- How data are collected or created can have an impact on long-term preservation, particularly when database systems have multiple entry points, leading to inconsistency and variable data quality.
- Current technical approaches to preserving transactional data primarily focus on the preservation of databases.
- Database preservation may not capture the complexities and rapid changes enabled by new technologies and processing methods.
- As with all preservation planning, the relevance of a specific approach depends on the organization's objectives.
Preservation strategies:
- Encapsulation
- Emulation
- Migration/Normalization
Tools and standards:
- Archival Data Description Markup Language (ADDML)
- Standard Data Format for Preservation (SDFP)
- Software Independent Archiving of Relational Databases (SIARD)
Best practices:
- Choose the best possible format, either preserving the database in its original format or migrating it to an alternative format.
- After a database is converted, encapsulate it by adding the descriptive, technical, and other relevant documentation needed to understand the preserved data.
- Submit the database to a preservation environment that will curate it over time.
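The first two practices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it migrates a SQLite database to CSV files and encapsulates them with a JSON manifest holding a description, column names, row counts, and SHA-256 checksums. The function name export_database, the manifest layout, and the choice of CSV are hypothetical and do not implement SIARD, ADDML, or any other standard named above.

```python
import csv
import hashlib
import json
import sqlite3
from pathlib import Path


def export_database(conn, out_dir, description):
    """Migrate every user table to CSV and write a manifest describing the export."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = {"description": description, "tables": []}

    # List user tables, skipping SQLite's internal bookkeeping tables.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]

    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()

        # Migration step: write the table in a simple, software-independent format.
        csv_path = out / f"{table}.csv"
        with csv_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)
            writer.writerows(rows)

        # Encapsulation step: record technical metadata and a fixity checksum.
        manifest["tables"].append(
            {
                "name": table,
                "file": csv_path.name,
                "columns": columns,
                "row_count": len(rows),
                "sha256": hashlib.sha256(csv_path.read_bytes()).hexdigest(),
            }
        )

    (out / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

The resulting directory (CSV files plus manifest.json) is the kind of package that could then be submitted to a preservation environment. The checksums let that environment verify later that the files have not changed.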