Friday, July 31, 2015

Rosetta Customer Testimonial - Jennifer L. Thoegersen, University of Nebraska–Lincoln

Rosetta Customer Testimonial - Jennifer L. Thoegersen, University of Nebraska–Lincoln. Jennifer Thoegersen. University of Nebraska–Lincoln / Ex Libris. July 5, 2015. [YouTube video.]
      Jennifer Thoegersen, Data Curation Librarian at the University of Nebraska–Lincoln, talks about her experience using Rosetta to manage and preserve different types of digital content, and its impact at UNL. One challenge was that digital materials were scattered throughout the library and the campus; these were being backed up, but the library wanted to do more to actively preserve and manage them far into the future. Libraries have been tasked to be the gatekeepers for the information. They have many different types of content, such as research data, audiovisual content, born digital content, websites, and digitized images. They have moved content from CONTENTdm into Rosetta.

One of the things she really likes about being a Rosetta user is that the Rosetta User Community is very helpful. The group provides insights into working with different types of situations and challenges, and members share code as well. The major benefits for UNL are the ability to validate their content, to monitor their digital assets over an extended period of time, and to tailor the system to meet their needs. Rosetta is an open, extendable, and customizable digital preservation system. The implementation team worked well, and they have also been able to work with the system developers to suggest improvements and have those changes added to the system.

How do you preserve your personal data forever? We ask an expert

How do you preserve your personal data forever? We ask an expert. Simon Hill. Digital Trends. July 12, 2015.
     About two billion photos are uploaded to the cloud every single day. Every minute about 300 hours of video is uploaded to YouTube.  Our digital worlds are constantly expanding. Storage and access are not the same in the digital world as they are with physical materials.  The article is a discussion with Dr. Micah Altman, Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries.  “Preservation is really about long term access. It’s about communicating with the future at some point.” Digital materials are under threat of loss: “One threat is that the media fails. The hard drive fails, the DVD fails, or the disc can’t be read. Another threat is that you can see the bits, but you can no longer tell what they mean because there’s no software available that will render that document... that format is not supported anymore.”

People think that digital files will last forever, but different types of digital media have "tremendously varying shelf lives", and the shelf life can vary drastically depending on how the media are stored. Not all storage is created equal. “One of the challenges is that the media that’s usually sold to consumers, like hard drives in computers, are not really built for long term storage.... They’re designed for an operating lifetime of maybe three, four, or five years.” There is also a lot of variation in hard drive quality; some brands and batches last longer than others.

It’s possible to buy archival quality optical discs that are designed to last a long time. Some manufacturers have been working on an Archival Disc standard. The M-Discs from Millenniata have been designed to last for about 1,000 years. Many people are looking at cloud storage, but then you are "shifting the burden of figuring out how to preserve your data onto your cloud provider". Some vendors, such as Backblaze and CrashPlan, are suggested because they implement best practices and are transparent about what they do. Unfortunately, cloud storage isn’t a foolproof method of preservation, and you are placing a great deal of trust in the provider you select. Businesses go bankrupt or there could be other problems. It may help to use two cloud storage solutions, because the chance of simultaneous problems at two independent companies is much lower. There are also privacy questions to consider. The data can be protected with encryption, but “if you lose the encryption key then you’ve lost the data.”
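
The reasoning behind using two providers can be illustrated with simple arithmetic. The sketch below is only an illustration: the loss probabilities are hypothetical numbers, not figures from the article, and it assumes the two providers fail independently of each other.

```python
def joint_loss_probability(p_a: float, p_b: float) -> float:
    """Probability that two independent providers both lose the data in the same period."""
    return p_a * p_b


# Hypothetical annual loss probability for a single provider (illustration only).
p_single = 0.01
p_both = joint_loss_probability(p_single, p_single)
print(f"one provider: {p_single:.4f}  two independent providers: {p_both:.6f}")
```

If the two services share infrastructure or ownership, the independence assumption no longer holds and the benefit is smaller.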

Some other tips:
  • Put materials in a standardized format. TIFF and PDF/A are recommended
  • Have an explicit succession plan in place for what happens to your accounts when you’re not in a position to access them
  • Make multiple copies
  • Research the cloud storage services before using them
There’s no definitive answer on how to ensure your digital files last forever, but you can hedge your bets and come up with a multi-pronged strategy to give your data the greatest chance of survival.


Advice for our donors and depositors

Advice for our donors and depositors. Jenny Mitcham. Digital Archiving at the University of York. 25 October 2013.
     One of the best ways to ensure the longevity of your digital data is to plan for it at the point of creation. If data is created with long term archiving in mind and if a few simple and common sense data management rules are followed, then the files will be much easier for the file creator and the digital archivist to work with and to manage in the future. It is important for those who deposit digital materials in the archives to put good data management into practice. We should speak to them about this and the sooner the better.

Some of the tips include:
  • Name files sensibly
  • Organize files within a directory structure
  • Document your files
  • Always back up your files
  • Use anti-virus software

Some of the current 'hot topics' in the digital archiving world include:
  • How do you archive e-mails?
  • Is cloud storage safe?
  • What is wrong with pdf files?
  • What is the life span of a memory stick?

Thursday, July 30, 2015

DPOE Interview with Danielle Spalenka of the Digital POWRR Project

DPOE Interview with Danielle Spalenka of the Digital POWRR Project. Susan Manus, Barrie Howard. The Signal. July 20, 2015.
     Article about an interview with Danielle Spalenka, Project Director for the Digital POWRR Project. They had a National Leadership Grant to investigate digital preservation at institutions with limited resources. They have prepared a workshop, a white paper and the Tool Grid. The workshop, which is free through the end of 2016 with funding from the NEH, looks at best practices and standards.

Our review of the landscape of digital preservation instruction was that it is largely aimed at an audience beginning to come to grips with the idea that digital objects are subject to loss if we don’t actively care for them. There are lots of offerings discussing the theory of digital preservation – the “why” of the problem – and we found that there were limited opportunities to learn the “how” of digital preservation, both on the advocacy and technical sides. We also found that other great offerings, like the Digital Preservation Management Workshop Series based at MIT, had a tuition fee that was unaffordable for many prospective attendees, especially from under-funded institutions. Our goal in this phase is to make the workshops free to attend.

"A major goal of the workshop is to discuss specific tools and provide a hands-on portion so that participants could try a tool that they could apply directly at their own institutions." It provides an  overview of how digital preservation services and tools actually relate to the standards, how to use them in a workflow, and how to advocate for implementation. The POWRR Tool Grid is now maintained by COPTR (Community Owned digital Preservation Tool Registry).

Some recommendations for those just starting out:
  • First consider what type of tool you might be interested in (processing, storage, etc.). Looking at the specific function of a tool might be a good way to understand the wide variety of tools better.
  • A number of tools and services offer free webinars and information sessions to learn more about a specific tool. Download the tools to gain some hands-on experience.
  • Remember that digital preservation is an incremental process, and there are small steps you can take now to start digital preservation activities at your own institution. 
  • Remember you are not alone! 
  • See what others are doing and talking about. 

Can history and geography survive the digital age?

Can history and geography survive the digital age? Matthew Reisz. The Times Higher Education. July 10, 2015.
A leading historical geographer has called on both his disciplines to find better ways of “navigating the digital world”.
Even though history and geography rank “among the greatest synthesizing disciplines” and could help to “make the world more meaningful, more legible, for everyone”, academics rely too much on outdated technology and run the risk of having their writings end up behind a "pay-wall universe”.
“History has traditionally required long-form prose,” and academics are encouraged to write in more publishable formats. Books have been the standard format, but long texts do not work as well on computers, phones, tablets and e-readers, which are not "suitable for long-form reading.” Also, “no file format is less suitable to a smartphone than a PDF”, and PDFs are often hidden away behind paywalls, where they are difficult to access and “invisible” to search engines. Academics might need to prepare for a world in which “intellectual endeavours take place in app space”. These disciplines are "better suited to the digital world than it might seem” because historians and geographers have always relied on stories, maps and descriptions. That is "how we can navigate the digital world.”

Wednesday, July 29, 2015

BitCurator 1.5.1 VM and ISO released

BitCurator 1.5.1 VM and ISO released. Bitcurator Group. July 21, 2015.
The latest release of BitCurator is now available. The BitCurator wiki has the BitCurator 1.5.1 VM and the BitCurator 1.5.1 Installation ISO. It supports the following functions:
  • Create forensic disk images: Disk images packaged with metadata about devices, file systems, and the creation process.
  • Analyze files and file systems: View details on file system contents from a wide variety of file systems.
  • Extract file system metadata: File system metadata is a critical link in the chain of custody and in records of provenance.
  • Identify and redact sensitive information: Locate private and sensitive information on digital media and prepare materials for public access.
  • Locate and remove duplicate files: Know what files to keep and what can be discarded.
It also contains a new bootstrap and upgrade automation tool, and support for USB 3.0 devices.
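
As an illustration of the general idea behind locating duplicate files, the sketch below groups files by a checksum of their contents. This is not BitCurator's own implementation; the function names `sha256_of` and `find_duplicates` are hypothetical, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root by content hash; return only groups with more than one file."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("some/folder"))` returns lists of paths whose contents are byte-for-byte identical; deciding which copy to keep remains a curatorial decision.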

Tuesday, July 28, 2015

detect-empty-folders

detect-empty-folders. Ross Spencer. Github. 22 July 2015.
A tool to detect empty folders in a DROID CSV. A blacklist allows you to simulate the deletion of non-record objects, which may render a folder empty.  The heuristics used here can be implemented in any language; this tool is in Python.
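
A minimal sketch of one such heuristic follows. It is not the code of the tool itself. It assumes a DROID-style CSV export with the columns ID, PARENT_ID, FILE_PATH, NAME and TYPE, and with folders marked as "Folder" in the TYPE column; the function name `empty_folders` and the sample data are hypothetical.

```python
import csv
import io


def empty_folders(droid_csv_text, blacklist=()):
    """Return paths of folders that contain no files once blacklisted names are ignored."""
    rows = list(csv.DictReader(io.StringIO(droid_csv_text)))
    parent_of = {row["ID"]: row["PARENT_ID"] for row in rows}
    folders = {row["ID"]: row["FILE_PATH"] for row in rows if row["TYPE"] == "Folder"}
    non_empty = set()
    for row in rows:
        if row["TYPE"] == "Folder" or row["NAME"] in blacklist:
            continue
        # Walk up from each kept file and mark every ancestor folder as non-empty.
        node = row["PARENT_ID"]
        while node in folders and node not in non_empty:
            non_empty.add(node)
            node = parent_of.get(node, "")
    return sorted(path for fid, path in folders.items() if fid not in non_empty)


# Hypothetical sample export, for illustration only.
SAMPLE = """ID,PARENT_ID,FILE_PATH,NAME,TYPE
1,,/data,data,Folder
2,1,/data/reports,reports,Folder
3,2,/data/reports/a.doc,a.doc,File
4,1,/data/tmp,tmp,Folder
5,4,/data/tmp/Thumbs.db,Thumbs.db,File
6,1,/data/old,old,Folder
"""

print(empty_folders(SAMPLE))
print(empty_folders(SAMPLE, blacklist={"Thumbs.db"}))
```

The second call simulates deleting Thumbs.db as a non-record object, which is why the folder that held only that file is then reported as empty too.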

Storage Trends Around Computex 2015

Storage Trends Around Computex 2015. Tom Coughlin. Forbes. June 8, 2015.
     The 2015 Computex Conference attracted many digital storage vendors. There were announcements about flash-based storage products, new memory products and optical archives. CMC Magnetics said that it is selling 100 GB Blu-ray optical discs to Facebook for archiving applications. The company expects other internet service companies to follow suit. In May, Sony announced that it was buying Optical Archive, a start-up created by a former Facebook executive. Sony is making a big push to create digital archiving solutions using Blu-ray disc technology and the acquisition is seen as an extension of this effort.

Monday, July 27, 2015

Now available: 100 GB capacity M-Disc

Now available: 100 GB capacity M-Disc. Press release. Millenniata. June 2015.
     The new 100 GB Blu-ray M-discs are now available. The new disc has all the features of the  original M-Disc. Previously the products were the standard DVD and a 25 GB Blu-ray. Archiving large data sets is now much more convenient.
[The new 100 GB discs, which completely sold out in a short time, are now available again.]

Researchers Open Repository for ‘Dark Data’

Researchers Open Repository for ‘Dark Data’. Mary Ellen McIntire. Chronicle of Higher Education.  July 22, 2015.
     Researchers are working to create a one-stop shop to retain data sets after the papers they were produced for are published. The DataBridge project will attempt to expand the life cycle of so-called dark data by creating an archive for data sets and metadata, and will group them into clusters of information to make relevant data easier to find. They can then be reused and re-purposed by others to further science. A key aspect of the project will be to allow researchers to make connections and pull in other data of a similar nature.

The researchers also want to include archives of social-media posts by creating algorithms to sort through tweets for researchers studying the role of social media. This could save time for people who may otherwise spend a lot of time cleaning their data and reinventing the wheel. The project could serve as a model for libraries at research institutions that are looking to better track data in line with federal requirements and extend researchers’ “trusted network” of colleagues with whom they share data.

Friday, July 24, 2015

Announcing the ArchivesDirect Price Drop

Announcing the ArchivesDirect Price Drop: Affordable Preservation, Evaluation and Workflows Plus DuraCloud Storage. Carol Minton Morris. DuraSpace. July 21, 2015.
     The ArchivesDirect hosted service from Artefactual Systems and storage in DuraCloud now has reduced pricing. This price includes a hosted instance of Archivematica, training and replicated DuraCloud storage (with a copy in Amazon S3 and one in Amazon Glacier).

The subscription plans are:
  1. Assessment. A three month plan with 500 GB of storage. Cost: $4,500
  2. Standard. An annual plan with 1 TB of storage. Cost: $9,999
  3. Professional. A custom plan. Cost: not available.

Thursday, July 23, 2015

First Large Scale, In Field SSD Reliability Study Done At Facebook

First Large Scale, In Field SSD Reliability Study Done At Facebook. Adam Armstrong. Storage Review. June 22, 2015.
    Carnegie Mellon University has released a study titled “A Large-Scale Study of Flash Memory Failures in the Field.” The study was conducted using Facebook’s datacenters over the course of four years and millions of operational hours. The study looks at how errors manifest and aims to help others develop novel flash reliability solutions.
Conclusions drawn from the study include:
  • SSDs go through several distinct failure periods – early detection, early failure, usable life, and wearout – during their lifecycle, corresponding to the amount of data written to flash chips.
  • The effect of read disturbance errors is not a predominant source of errors in the SSDs examined.
  • Sparse data layout across an SSD’s physical address space (e.g., non-contiguously allocated data) leads to high SSD failure rates; dense data layout (e.g., contiguous data) can also negatively impact reliability under certain conditions, likely due to adversarial access patterns.
  • Higher temperatures lead to increased failure rates, but do so most noticeably for SSDs that do not employ throttling techniques.
  • The amount of data reported to be written by the system software can overstate the amount of data actually written to flash chips, due to system-level buffering and wear reduction techniques.
The study doesn’t state one type of drive is better than another.

Digital Preservation Business Case Toolkit

Digital Preservation Business Case Toolkit. Jisc / Digital Preservation Coalition. May 2014.
     A comprehensive toolkit to help practitioners and middle managers build business cases to fund digital preservation activities. It includes a step-by-step guide to building a case for digital preservation, covering the key activities for preparing, planning and writing a digital preservation business case. It also includes templates, case studies, and other resources that go in chronological order through the steps needed when constructing a business case.

The key activities include:
  1. Preparation; look at timing, your organization's strategy, and what others are doing 
  2. Audit your organization's readiness and do a risk assessment
  3. Assess where you are and what you need, your collections, your organizational risks
  4. Think hard about your stakeholders and intended audience
  5. Decide your objectives for your digital preservation activity and define what you want to achieve
  6. List your digital preservation benefits and map to your organization's strategy
  7. Look at benefits, risks and cost benefit analysis 
  8. Validate / refine your business case; Identify weaknesses and gaps in your business case
  9. Deliver your business case with maximum impact; Create an Elevator Pitch, so you have the right language ready to make your case to potential advocates in your organization. 
The elements of the template include:
  • The key features of the business case and a compelling argument for what you want to achieve.
  • Decide where you want the plan to be by a specific time
  • The key background and foundational sections of your business case, a focus on your digital assets and an assessment of the key stakeholders, and the risks facing your digital assets.
  • A description of the business activity that your business case will enable.
  • The possible options along with an assessment of the benefits, and associated costs and risks. 

Wednesday, July 22, 2015

Information Governance: Why Digital Preservation Should Be a Part of Your IG Strategy

Information Governance: Why Digital Preservation Should Be a Part of Your IG Strategy. Robert Smallwood. AIIM Community. July 6, 2015.
     The post looks at Information Governance and digital preservation. The post author wrote the first textbook on information governance (IG). He used key models as part of this, such as the Information Governance Reference Model (IGRM), the E-discovery Reference Model and the OAIS model. The question to answer is whether or not long term digital preservation should be a part of an information governance strategy.

Information Governance is defined as: 
a set of multi-disciplinary structures, policies, procedures, processes and controls to manage information at an enterprise level that supports an organization's current and future regulatory, legal, risk, environmental and operational requirements. 
  • "Long term digital preservation applies to digital records that organizations need to retain for more than 10 years."
  • Digital preservation decisions need to be made early in the records lifecycle, ideally before creation.
  • Digital preservation becomes more important as repositories grow and age.

"The decisions governing these long term records - such as digital preservation budget allocation, file formats, metadata retained, storage medium and storage environment - need to be made well in advance of the process of archiving and preserving."

"All this data - these records - cannot simply be stored with existing magnetic disk drives. They have moving parts that wear out. The disk drives will eventually corrupt data fields, destroy sectors, break down, and fail. You can continue to replace these disk drives or move to more durable media to properly maintain a trusted repository of digital information over the long term."

If you move to a cloud provider that makes preservation decisions for you, then "you must have a strategy for testing and auditing, and refreshing media to reduce error rates, and, in the future, migrating to newer, more reliable and technologically-advanced media."

Your information governance strategy is incomplete if you do not have a digital preservation strategy as well. Your organization "will not be well-prepared to meet future business challenges".

Arkivum: Long-term bit-level preservation of large repository content

Arkivum: Long-term bit-level preservation of large repository content.  Nik Stanbridge. Arkivum. DSpace User Group. 16 June 2015. [PDF slides]
     Based on the principles of ‘Active Archiving’, which are replication, escrow, and integrity checking. The aim is to preserve content for longer than 25 years. The principles rely on diversity and intervention, with different technologies and locations.
  • Adding media, a continual process
  • Monthly checks and maintenance updates
  • Annual data retrieval and integrity checks
  • 3-5 year obsolescence of servers, operating systems and software.
  • Tape format migration
The service integrates with DSpace. It has ISO 27001 validated processes and procedures and is designed for bit level preservation of large volumes of data.

Tuesday, July 21, 2015

File identification tools, part 8: NLNZ Metadata Extraction Tool

File identification tools, part 8: NLNZ Metadata Extraction Tool. Gary McGath. Mad File Format Science Blog.  July 10, 2015.
     This tool is for extracting metadata from files. It uses some basic tests to determine the format and then it looks at the following file formats:
BMP, GIF, JPEG, TIFF, MS Word, WordPerfect, Open Office, MS Works, MS Excel, MS PowerPoint, PDF, WAV, MP3, BWF, FLAC, HTML, XML, and ARC.
The Java tool is available as open source on SourceForge. There are command line versions for Unix and Windows. [This tool is available to use in Rosetta.]
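
One common kind of basic test for determining a format is to compare the first bytes of a file against known signatures ("magic numbers"). The Python sketch below illustrates that general technique for a handful of the formats listed; it is not the NLNZ tool's own logic, and the names `SIGNATURES` and `identify` are hypothetical.

```python
# A few well-known leading byte signatures; real tools check many more.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"BM", "BMP"),
    (b"fLaC", "FLAC"),
]


def identify(header: bytes) -> str:
    """Guess a format name from the first bytes of a file, or return 'unknown'."""
    # WAV is a RIFF container whose form type appears at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


print(identify(b"%PDF-1.4\n"))
print(identify(b"plain text"))
```

A signature match only suggests a format; it does not confirm that the file is well formed, which is why identification, validation and metadata extraction are usually treated as separate steps.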

Toolkit for Managing Electronic Records

Toolkit for Managing Electronic Records. National Archives and Records Administration. Updated May 13, 2015.
The Toolkit for Managing Electronic Records is a spreadsheet that provides descriptions of a collection of guidance products for managing electronic records. It includes tools and resources that have been developed by NARA and other organizations. The separate tabs can be sorted or searched as needed.

Monday, July 20, 2015

Open Source Tools for Records Management

Open Source Tools for Records Management. National Archives and Records Administration. March 18, 2015. [PDF, 22pp.]
      NARA has identified open source tools that could be used for records management; the list does not include proprietary free software tools. Security is a concern with some implementations of open source tools. The tools are neither tested nor endorsed by NARA. The list of tools is approximately 18 pages; the tools address functions such as:
managing workflows; identifying duplicates; extracting and managing metadata; handling email archives; web publishing; data analyzing; working with PDF files; preservation planning; scanning files; identifying confidential data; file renaming; web archiving; comparing web pages; document managing; format identification; file integrity; image processing; and natural language processing.

The document also contains lists of other resources and tools.

File identification tools, part 7: Apache Tika

File identification tools, part 7: Apache Tika. Gary McGath. Mad File Format Science Blog.  July 1, 2015.
     Apache Tika is a Java-based open source toolkit that can identify a wide range of formats and extract metadata from others. It doesn’t distinguish format variants as finely as DROID does. Plugins can be added for formats that it does not regularly support.

Saturday, July 18, 2015

Library of Congress Recommended Formats Statement 2015 - 2016

Library of Congress Recommended Formats Statement 2015 - 2016. July 15, 2015.
     The Library of Congress, as part of its ongoing commitment to digital preservation, has provided the 2015-2016 version of the Recommended Formats Statement, and is seeking comments for next year's version. (The article in the Signal reviews some of the changes.)

There have been changes to the content and the layout, and the name has been changed from "Specifications" to "Statement".
"The Statement provides guidance on identifying sets of formats which are not drawn so narrowly as to discourage creators from working within them, but will instead encourage creators to use them to produce works in formats which will make preserving them and making them accessible simpler."

Friday, July 17, 2015

Filling the Digital Preservation Gap. A Jisc Research Data Spring project. Phase One report - July 2015

Filling the Digital Preservation Gap. A Jisc Research Data Spring project. Phase One report - July 2015. Jenny Mitcham, et al. Jisc Report. 14 July 2015.
     Research data is a valuable institutional asset and should be treated accordingly. This data is often unique and irreplaceable. It needs to be kept to validate or verify conclusions recorded in publications. Preservation of the data in a usable form may be required by the research funders, publishers, or universities. The research data should be preserved and available for others to consult after the project that generated it is complete. This means the research data needs to be actively managed and curated. "Digital preservation is not just about implementing a good archival storage system or ‘preserving the bits’ it is about working within the framework set out by international standards (for example the Open Archival Information System) and taking steps to increase the chances of enabling meaningful re-use in the future."

Accessing research data is clearly already a problem for researchers when formats and media become obsolete. A 2013 survey showed that 25% of respondents had encountered the “Inability to read files in old software formats on old media or because of expired software licences”. A digital preservation program should address these issues. Archivematica, which the report recommends, is based on the Open Archival Information System model and uses standards such as PREMIS and METS to store metadata about the objects that are being preserved. A digital preservation system would consist of a variety of different systems performing different functions within the workflow. "Archivematica should not be seen as a magic bullet. It does not guarantee that data will be preserved in a re-usable state into the future. It can only be as good as digital preservation theory and practice is currently and digital preservation itself is not a fully solved problem."

Research data is particularly challenging from a preservation point of view because of the many data types and formats. Many of these are formats for which no digital preservation tools and policies exist, so they will not receive as high a level of curation when ingested into Archivematica.
The rights metadata within Archivematica may not fit the granularity that would be required for research data. This information would need to be held elsewhere within the infrastructure.

The value of research data can be subjective and difficult to assess and there may be disagreement on the value of the data. However, the bottom line is "in order to comply with funder mandates, publisher requirements and institutional policies, some data will need to be retained even if the researchers do not believe anyone will ever consult it." Knowing the types of formats used is a key to digital archiving and planning, and without that there will be problems later. In the OAIS Reference Model, information about file formats needs to be part of the ‘Representation Information’ that an end user must have to open and view a file.

Thursday, July 16, 2015

Toshiba's 3-D Magnetic Recording May Increase Hard Disk Drive Capacity

Toshiba's 3-D Magnetic Recording May Increase Hard Disk Drive Capacity. Tom Coughlin. Forbes. July 9, 2015.
     Toshiba demonstrated a method for using magnetic fields from microwave radiation to reverse the magnetization direction selectively in a multi-layer magnetic recording medium. This could lead to the development of 3-D magnetic recording, where independent data is written to and read from overlapping layers of a multilayer recording medium, and could substantially increase hard disk drive capacity. "Magnetic storage is where most of the world’s accessible digital data now lives."

Wednesday, July 15, 2015

A Compressed View of Video Compression

A Compressed View of Video Compression. Richard Wright. Preservation Guide. 22 June 2015.
   Digital audio and digitised film can also be compressed, but there are particular issues. The basic principle is that audio and video signals carry information, though the efficiency may vary. "The data rate of the sequence of number representing a signal can be much higher than the rate of information carried by the signal. Because high data rates are always a problem, technology seeks methods to carry the information in concise ways." The video signal has been altered in order to squeeze it into limited bandwidth. Redundant data  may be sent in the signal to improve the odds that the information will be transmitted. It is important to know what matters and what can be discarded. With preservation, "a key issue is management: knowing what you’re dealing with, having a strategy, monitoring the strategy, keeping on top of things so loss is prevented." Basic principles of preservation also apply to compression:
  • Keep the original
  • Keep the best
  • Do no harm
 There are best practices in dealing with compressed materials, and in migrating compressed versions to new compressed versions. His estimate is that with storage costs decreasing "there will be no economic incentive for such a cascade of compressions." "The next migration will dispense with the issue by migrating away from compressed to lovely, stable uncompressed video."

Tuesday, July 14, 2015

Seagate Senior Researcher: Heat Can Kill Data on Stored SSDs

Seagate Senior Researcher: Heat Can Kill Data on Stored SSDs.  Jason Mick. Daily Tech. May 13, 2015.
   A research paper by Alvin Cox, a senior researcher at Seagate, warns that those storing solid state drives should be careful to avoid storing them in hot locations. Enterprise-grade SSDs can typically retain data for around 2 years without being powered on if stored in a climate controlled environment at 25°C / 77°F. For every 5°C / 9°F increase, the retention time halves, so it drops to about 6 months at 35°C / 95°F. This also applies to storage of solid-state equipped computers and devices. If only a few sectors are bad it may be possible to repair the drive. But if too many sectors are badly corrupted, the only option may be to format the device and start over.
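
The halving rule can be written as a simple formula: retention = 2 years × 0.5^((T − 25°C) / 5°C). The short Python sketch below only restates the rule of thumb as summarized in this post; the function name `retention_years` is hypothetical, and the result should not be read as a prediction for any particular drive.

```python
def retention_years(storage_temp_c, base_years=2.0, base_temp_c=25.0, halving_step_c=5.0):
    """Estimated unpowered retention time in years, halving for every 5 degrees C above 25 C."""
    return base_years * 0.5 ** ((storage_temp_c - base_temp_c) / halving_step_c)


for temp_c in (25, 30, 35, 40):
    print(f"{temp_c} C: about {retention_years(temp_c) * 12:.0f} months")
```

At 35°C the formula gives 0.5 years, which matches the 6 month figure quoted above.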

A Large-Scale Study of Flash Memory Failures in the Field

A Large-Scale Study of Flash Memory Failures in the Field. Justin Meza, et al. ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. June 15-19, 2015.
     Servers use flash memory based solid state drives (SSDs) as a high-performance alternative to hard disk drives to store persistent data. "Unfortunately, recent increases in flash density have also brought about decreases in chip-level reliability." This can lead to data loss.

This is the first large-scale study of actual flash-based SSD reliability and it analyzes data from flash-based solid state drives at Facebook data centers for about four years and millions of operational hours in order to understand the failure properties and trends. The major observations:
  1. SSD failure rates do not increase monotonically with flash chip wear, but go through several distinct periods corresponding to how failures emerge and are subsequently detected, 
  2. the effects of read disturbance errors are not prevalent in the field, 
  3. sparse logical data layout across an SSD's physical address space can greatly affect SSD failure rate, 
  4. higher temperatures lead to higher failure rates, but techniques that throttle SSD operation appear to greatly reduce the negative reliability impact of higher temperatures, and 
  5. data written by the operating system to flash-based SSDs does not always accurately indicate the amount of wear induced on flash cells.
The findings will hopefully lead to other analyses and flash reliability solutions.

Monday, July 13, 2015

Risk Assessment as a Tool in the Conservation of Software-based Artworks

Risk Assessment as a Tool in the Conservation of Software-based Artworks. Patricia Falcao.
   The article looks at the use of risk assessment methodologies to identify and evaluate vulnerabilities of software-based artworks. Software-based art is dependent on technology. Two consequences of this:
1. Because electronic equipment is usually mass-produced, there are very few cases where one individual device is essential for the artwork.
2. On the other hand, when the equipment is no longer commercially available it becomes very difficult to replace any of its elements.
A sculpture conservator may be able to re-make a missing element for a sculpture by using the same or similar materials but a time-based media conservator cannot always re-make obsolete electronics. A particular artwork may use custom-made software. "Any software, in turn, usually requires a specific operating system. All the programs, from the firmware to the operating system, must run properly. All settings, plug-ins, and codecs must be in place. Without all of these, there is no artwork."

This means that each artwork is a custom-made system; the components may vary with each iteration of the work and as technology changes. A conservator needs to understand "how these components are used in the particular system and how they influence the risks and options for long-term preservation." With the conservation of contemporary art, obsolescence only affects an artwork once something stops working. But the effect of obsolescence will increase over time.

Software-based artworks are similar to time-based media, but they are more vulnerable to those risks because:
  1. Systems are customized for each artwork.
  2. Systems are easily changed, so that connecting an archival computer to the Internet could trigger an automatic update after which the file will no longer run.
  3. The technical environment is rapidly changing.
The degree of significance can be evaluated by
  1. Provenance 
  2. Rarity or representativeness 
  3. Condition or completeness 
  4. Interpretive capacity
A procedure being developed for the acquisition of software-based artworks is composed of simple actions during acquisition that can diminish the impact of obsolescence in the medium term. It is important to discuss the artwork, technology, and possible preservation measures with the artists and technical staff. The conservator should identify and define:
  1. The display or presentation parameters 
  2. What can or cannot be changed, and within what limits. 
  3. Identify obsolescent elements and create a plan for recovery.  
  4. How the artist wants the artwork preserved. Identify core elements and migration strategy.  
  5. Understand the system (hardware, software). Test the system with the artist and staff. 
Over the lifetime of the artwork,
  1. Document the system and any changes over time  
  2. Prevent changes such as automatic updates
  3. Monitor obsolescence issues with the components of the work.  
  4. Re-evaluate preservation needs regularly. 
Some steps that can be taken to reduce the risk of failure and obsolescence:
  1. Make clones of the computer’s hard drive immediately upon acquisition. 
  2. Create an exhibition copy of the system, possibly with the artist and staff. 
  3. Gather operation manuals, service manuals, and hardware specifications. 
  4. Save the software versions, source code, libraries, and programming tools necessary to read project files. 
For long term preservation,
  1. Continue to implement the preservation strategies identified.
  2. Develop clear procedures for the acquisition of software-based artworks. 
  3. Identify software tools useful for preservation. 
  4. Test recovery strategies and confirm results over time. 
  5. Develop relationships with experts in the fields required for preservation. 
Software-based preservation will require more than just the conservator. It will also require help from the technology field and many tools.

Friday, July 10, 2015

Track the Impact of Research Data with Metrics; Gauge Archive Capacity

How to Track the Impact of Research Data with Metrics. Alex Ball, Monica Duke.  Digital Curation Centre. 29 June 2015.
   This guide from the DCC provides help on how to track and measure the impact of research data. It provides:
  • impact measurement concepts, services and tools for measuring impact
  • tips on increasing the impact of your data 
  • how institutions can benefit from data usage monitoring  
  • help to gauge capacity requirements in storage, archival and network systems
  • information on setting up promotional activities 
Institutions can benefit from data usage monitoring as they:
  • monitor the success of the infrastructure providing access to the data
  • gauge capacity requirements in storage, archival and network systems
  • create promotional activities around the data, sharing and re-use
  • create special collections around datasets
  • meet funder requirements to safeguard data for the established lifespan
Tips for raising research data impact
  • deposit data in a trustworthy repository
  • provide appropriate metadata
  • enable open access
  • apply a license to the data about what uses are permitted
  • raise awareness to ensure it is visible (citations, publication, provide the dataset identifier, etc)

Thursday, July 09, 2015

Respected US professor says libraries are places of knowledge creation and librarians are educators.

Respected US professor says libraries are places of knowledge creation and librarians are educators. CILIP. 2 July 2015.
  • R. David Lankes: librarians have the power to change the world by “promoting informed democracy”.
  • “Libraries are not about books, and librarians are not about collections, nor are they about waiting to serve. Our libraries are mandated, mediated spaces owned by the community, and librarians are educators dedicated to knowledge creation who exist to unleash the expertise held within their community.”
  • There is a need for a skilled workforce to properly understand and manage information
An innovation also mentioned is the Ideas Box, a durable, portable library in a box that is designed to provide access to vital information and culture in humanitarian crises. Pioneered by Bibliothèques Sans Frontières/Libraries Without Borders, it can be sent to refugee camps and other remote populations anywhere in the world and set up in under an hour.

Wednesday, July 08, 2015

Audiovisual Preservation: Sustainability is Paramount

Audiovisual Preservation: Sustainability is Paramount. David Braught. Crawford Media Services. July 6, 2015.
   Many organizations want the most pristine, uncompressed, high quality files possible. That may seem to make sense, but that is usually unrealistic for most organizations. The storage costs to store the massive files can "lead to paralysis in your digital initiatives and to significant long-term data loss (owing to lack of funds for digitization and archival storage upkeep)." While this may be the best way for some, don't automatically assume there is only one right way of audiovisual digitization. There are many options, file types, and organizational factors.  An important part of this is to define your primary goal. The web site includes a chart that shows estimated information about a project:

File Type                   Bitrate (Mbps)   Total Footprint for 500 Hours (TB)
4K DPX (no audio)                     9830   2,109.29
2K DPX (no audio)                     2400     514.98
Uncompressed 10 Bit HD                1200     257.49
Uncompressed 8 Bit HD                  952     204.28
Uncompressed 10 Bit SD                 228      48.92
Uncompressed 8 Bit SD                  170      36.48
Lossless Jpg2K 10 Bit HD               445      95.49
Lossless Jpg2K 8 Bit HD                330      70.81
Lossless Jpg2K 10 Bit SD                85      18.24
Lossless Jpg2K 8 Bit SD                 65      13.95
DV25                                    31       6.65
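
The footprint figures in the chart can be reproduced from the bitrates. The sketch below is an inference from the numbers themselves, not a method stated by the source: it assumes megabits are divided by 8 to get megabytes, and that megabytes are converted to terabytes by dividing by 1024 twice. The function name is hypothetical.

```python
def footprint_tb(bitrate_mbps, hours):
    """Storage footprint in TB for a given video bitrate and duration."""
    megabytes_per_second = bitrate_mbps / 8          # megabits -> megabytes
    total_megabytes = megabytes_per_second * 3600 * hours
    return total_megabytes / 1024 ** 2               # megabytes -> terabytes


for name, mbps in [("2K DPX", 2400), ("Uncompressed 10 Bit HD", 1200), ("DV25", 31)]:
    print(f"{name}: {footprint_tb(mbps, 500):,.2f} TB")
```

Changing the `hours` argument gives a rough estimate for a collection of a different size, which may help when weighing file types against a storage budget.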

Storage costs are "neither cheap nor long term". You can't just put files on a hard drive and expect them to survive indefinitely. A long term solution requires redundant, archivally sound storage that is  migrated to newer storage every five years. "It does no one any good to ingest thousands of hours of 4K scans and then have to pull the plug on the storage fifteen years down the line. Sustainability should always be paramount."  Each institution has to decide what is the best option for them.

Tuesday, July 07, 2015

With The Rise Of 8K Content, Exabyte Video Looms

With The Rise Of 8K Content, Exabyte Video Looms. Tom Coughlin. Forbes. June 25, 2015.
   Digital storage has an important role in the professional media and entertainment industry. The ever-growing archive of long-tail digital content and the increasing amount of digitized historical analog content are increasing demand for archives using tape, optical discs and hard drive arrays. There has been a noticeable increase in 8K content. "It is expected that single video projects generating close to 1 Exabyte of raw data will occur within 10 years." A recent survey on digital storage in media and entertainment showed important digital storage trends in the area.

Those using cloud storage:
  • 2015: 30.2% of participants
  • 2014: 25.6% 
  • 2013: 24.7%
  • 2012: 15.1%  
 Those with over 1 TB in the cloud:
  • 2015: 32.9%
  • 2014: 28.1% 
  • 2013: 23%
  • 2012: 26.7%
Some other results from the survey concerning archiving:
  • 34% had over 2,000 hours of content in a long term archive in 2015
  • 26.9% added over 1,000 hours to their archive in 2015
  • 32.6% had over 2,000 hours of unconverted analog content in 2015
  • 42.8% said they have an annual analog conversion rate of 2% or less (4.5% was average)
Types of storage media in 2015:
  • Digital Tape: 40% 
  • External Hard Disk Drives: 28%  
  • Disk-based Local Storage Networks: 16%
  • Optical discs: 6%
  • Public cloud: 5%

In 10 Years A Single Movie Could Generate Close To 1 Exabyte Of Content

In 10 Years A Single Movie Could Generate Close To 1 Exabyte Of Content. Tom Coughlin. Forbes. October 5, 2014.
   Storage requirements for images and video are increasing.  "In the near future, several petabytes of storage may be required for a complete digital movie project at 4K resolution.  By the next decade total video captured for a high end digital production could be hundreds of PB, even approaching 1 Exabyte." A recent survey shows that overall cloud storage for media and entertainment is expected to grow 37 times  (322 PB to 11,904 PB) and cloud storage revenue will exceed $1.5 billion by 2019.
  • The largest demand for storage is for digital conversion and preservation (including archiving of new digital content - 96.5%).  
  • Archiving and preservation in 2013 was about 47% of the total storage revenue. Active archiving will drive increased use of storage for long term archives.
  • By 2019 it is expected that 64% of archived content will be in near-line storage, up from 43% in 2013.
  •  Over 50 Exabytes of digital storage will be used by 2019 for digital archiving and content conversion and preservation

[Chart: Cloud Storage Revenue]

Collection, Curation, Citation at Source: Publication@Source 10 Years On

Collection, Curation, Citation at Source: Publication@Source 10 Years On. Jeremy G. Frey, et al. International Journal of Digital Curation. Vol 10, No 2, 2015.
   The article describes a scholarly knowledge cycle in which the accumulation of knowledge is based on the continuing use and reuse of data and information. Collection, curation, and citation are three processes intrinsic to the workflows of the cycle. The currency of collection, curation, and citation is metadata. "Policies should recognize that small amounts of adequately characterized, focused data are preferable to large amounts of inadequately defined and controlled data stored in a random repository." The increasing size of data-sets and the growing risk of loss through catastrophic failure (such as a disk failure) has led researchers to use cloud storage, perhaps too uncritically.

The responsibilities of researchers for meeting the requirements of sound governance and ensuring the quality of their work have become more apparent. The article places the responsibility for curation firmly with the originator of the data. "Researchers should organize their data and preserve it with semantically rich metadata, captured at source, to provide short- and long-term advantages for sharing and collaboration."  Principal Investigators, as custodians, are particularly responsible for clinical data management and security (though curation and preservation activities exist in other research roles). "Curators usually attempt to add links to the original publications or source databases, but in practice, provenance records are often absent, incomplete or ad hoc, often despite curators’ best efforts. Also, manually managed provenance records are at higher risk of human error or falsification." There is a pressing need for training and education to encourage researchers to curate the data as they collect it at source.

"All science is strongly dependent on preserving, maintaining, and adding value to the research record, including the data, both raw and derived, generated during the scientific process. This statement leads naturally to the assertion that all science is strongly dependent on curation."

Monday, July 06, 2015

TIFF/A

TIFF/A. Gary McGath. File Formats Blog.  July 3, 2015.
   The TIFF format has been around for a long time. There have been many changes and additions, such that "TIFF today is the sum of a lot of unwritten rules". A group of academic archivists has been working on a version that will remain readable in the long term, calling it TIFF/A. A white paper discusses the technical issues. Discussions starting in September aim to create a version to submit for ISO consideration.

Presentation on Evaluating the Creation and Preservation Challenges of Photogrammetry-based 3D Models

Presentation on Evaluating the Creation and Preservation Challenges of Photogrammetry-based 3D Models. Michael J. Bennett. University of Connecticut. May 21, 2015.
    Photogrammetry allows for the creation of 3D objects from 2D photography, which mimics human stereo vision. The process has many steps and produces images, masks, depth maps, models, and textures. The question is, what should be archived for long term digital preservation? When models are output into an open standard, there is data loss, since “native 3D CAD file formats cannot be interpreted accurately in any but the original version of the original software product used to create the model.”

General lessons from archiving CAD files are that, when possible, the data should be normalized into open standards. But native formats, which are often proprietary, should also be archived. For photogrammetry data, the author reviews some of the options and recommendations. There are difficulties with archiving the files, and also with organizing the files in a container that documents the relationships among them. Digital repositories can play a role in the preservation of the 3D datasets.

Friday, July 03, 2015

Australian electronic books to be preserved at the National Library in Canberra under new laws

Australian electronic books to be preserved at the National Library in Canberra under new laws. Clarissa Thorp. ABC. 3 July 2015.
Starting in January of next year digital materials including e-books, blogs, prominent websites, and  important social media messages will be collected as a snapshot of Australian life. Under existing copyright laws, the National Library of Australia is able to collect all books produced by local publishers through the legal deposit system. Now with new legislation adopted by the Federal Parliament the Library will be able to preserve published items from the internet that could disappear from view in future. "This legislation puts us in a position where we are able to ask publishers to deposit electronic material with the National Library in a comprehensive way." "So we will be able to open that up and collect the whole of the Australian domain, for websites for example it means we are able to collect e-books that are only published in digital form." This new legislation will expand the Library's digital preservation program and ensure that future collections reflect Australian society as a whole.

Thursday, July 02, 2015

Vatican Library digitizes ancient manuscripts, makes them available for free

Vatican Library digitizes ancient manuscripts, makes them available for free. Justin Scuiletti.  PBS NewsHour. October 22, 2014.
The Vatican Apostolic Library is digitizing its archive of ancient manuscripts and making them available to view. It is undertaking an extensive digital preservation of its 82,000 documents. The entire undertaking is expected to take at least 15 years and cost more than $63 million. “Technology gives us the opportunity to think of the past while looking towards the future, and the world’s culture, thanks to the web, can truly become a common heritage, freely accessible to all, anywhere and any time.” The current list of digitized manuscripts can be viewed through the Vatican Library website and the project website.

Wednesday, July 01, 2015

Over 28 exabytes of storage shipped last quarter

More than 28 billion gigabytes of storage shipped last quarter. Lucas Mearian. Computerworld. June 30, 2015.
Worldwide data storage hardware sales increased 41.4% over the same quarter in 2014. This past quarter, 28.3 exabytes of capacity was shipped out.  Traditional external arrays decreased while demand strongly increased for server-based storage and hyperscale infrastructure (distributed infrastructures that support cloud and big data processing, and can scale to thousands of servers). The largest revenue growth was in the server market (new server sales and not just upgrades to existing server infrastructures).  The most popular external storage arrays were all-flash models and hybrid flash arrays that combine NAND flash with hard disk drives.

Tuesday, June 30, 2015

National Archives kicks off 'born-digital' transfer

National Archives kicks off 'born-digital' transfer. Mark Say. UKAuthority. 24 June 2015.
The National Archives is looking at the long term issue of keeping records accessible as the technology in which they are originally created changes.

"To make sure born-digital records can be permanently preserved we’re engaged in what we call parsimonious preservation, in which we’re making sure it can be used by the next trends of technology being developed. We want them to be easily viewed in 10 years’ time, although we cannot plan for 100 years as there’s no way we can know what the technology will look like."

“To ensure records will still be used in the same way we want to see what the technology is going to do in the next 10 years.

“Digital preservation is a major international challenge. Digital technology is changing what it means to be an archive and we are responding to these changes.

“These records demonstrate how we are leading the archive sector in embracing the challenges of storing digital information for future generations. We are ensuring that we are ready to keep the nation’s public records safe and accessible for the future, whatever their format.”

Monday, June 29, 2015

File identification tools, part 5: FITS

File identification tools, part 5: FITS. Gary McGath. File Formats Blog.  June 25, 2015.
The File Information Tool Set (FITS), which aggregates results from several file identification tools, was created by the Harvard University Libraries and is available on GitHub. FITS uses Apache Tika, DROID, ExifTool, FFIdent, JHOVE, the National Library of New Zealand Metadata Extractor, and four Harvard tools. The tool can be used in the ingest process; it processes directories and subdirectories, and produces a single XML output file in various schemas. It can be run as a standalone tool or incorporated with other tools, and can be configured to determine which tools to run and which extensions to examine. Documentation is found on Harvard’s website.
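
The idea of aggregating several identification tools and flagging disagreement can be illustrated with a short sketch. This is not how FITS is implemented and it does not call FITS or any of the tools it wraps: the two "tools" here are hypothetical stand-ins built on the Python standard library, and all function names and the signature table are assumptions made for illustration.

```python
import mimetypes
from collections import Counter

# Hypothetical signature table; real tools use far larger registries.
MAGIC_SIGNATURES = {
    b"%PDF": "application/pdf",
    b"\x89PNG": "image/png",
    b"II*\x00": "image/tiff",
    b"MM\x00*": "image/tiff",
}


def identify_by_extension(name, data):
    """Stand-in tool 1: guess the format from the file extension."""
    mime, _ = mimetypes.guess_type(name)
    return mime


def identify_by_signature(name, data):
    """Stand-in tool 2: guess the format from the leading bytes."""
    for magic, mime in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


def aggregate(name, data, tools):
    """Run every tool and report the most common answer plus any disagreement."""
    results = {tool.__name__: tool(name, data) for tool in tools}
    votes = Counter(v for v in results.values() if v is not None)
    best = votes.most_common(1)[0][0] if votes else None
    return {"format": best, "conflict": len(votes) > 1, "by_tool": results}


tools = [identify_by_extension, identify_by_signature]
print(aggregate("scan.tif", b"II*\x00rest-of-file", tools))
print(aggregate("report.pdf", b"\x89PNG\r\n\x1a\n", tools))
```

The second call shows why aggregation is useful at ingest: a file whose extension and content disagree is flagged as a conflict instead of being accepted silently.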

SIRF: Self-contained Information Retention Format

SIRF: Self-contained Information Retention Format. Sam Fineberg,et al. SNIA Tutorial. 2015. [PDF]
Generating and collecting very large data sets that need to be kept for long periods is a necessity for many organizations, including the sciences, archives, and commerce. The presentation describes the challenges of keeping data long term, along with Linear Tape File System (LTFS) technology and a Self-contained Information Retention Format (SIRF). The top external factors driving long-term retention requirements are legal risk, compliance regulations, business risk, and security risk.

What does long-term mean? Retention of 20 years or more was required by 70% of respondents in a poll.
  • 100 years: 38.8%
  • 50-100 years: 18.3%
  • 21-50 years: 31.1%
  • 11-20 years: 15.7%
  • 7-10 years: 12.3%
  • 3-5 years: 1.9%
The need for digital preservation:
  • Regulatory compliance and legal issues
  • Emerging web services and applications
  • Many other fixed-content repositories (Scientific data, libraries, movies, music, etc.)
Data stored should remain accessible, undamaged, and usable for as long as desired and at an affordable cost. Affordable depends on the "perceived future value of information". There are problems with verifying the correctness and authenticity of semantic information over time. SIRF is the digital equivalent of a self contained archival box. It contains:
  • set of preservation objects and a catalog (logical or physical)
  • metadata about the contents and individual objects
  • self describing standard catalog information so it can all be maintained
  • a "magic object" that identifies the container and version
The metadata contains basic information that can vary depending on the preservation needs. It allows a deeper description of the objects along with the content meaning and the relationship between the objects.
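
The container parts listed above can be pictured as a simple data structure. This is only an illustrative sketch and not the SIRF specification: every class name, field name, and the version string are hypothetical, chosen to mirror the four parts named in the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class MagicObject:
    """Identifies the container format and its version."""
    specification: str
    version: str


@dataclass
class PreservationObject:
    """One stored item; the payload is kept opaque here."""
    object_id: str
    payload: bytes


@dataclass
class Catalog:
    """Self-describing metadata about the container and each object."""
    container_metadata: dict = field(default_factory=dict)
    object_metadata: dict = field(default_factory=dict)  # object_id -> metadata


@dataclass
class Container:
    magic: MagicObject
    catalog: Catalog = field(default_factory=Catalog)
    objects: dict = field(default_factory=dict)  # object_id -> PreservationObject

    def add(self, obj, metadata):
        # Store the object and its description together so neither is orphaned.
        self.objects[obj.object_id] = obj
        self.catalog.object_metadata[obj.object_id] = metadata


box = Container(MagicObject("SIRF", "1.0"))
box.add(PreservationObject("po-1", b"raw data"), {"format": "text/plain", "related_to": []})
print(sorted(box.catalog.object_metadata))
```

Keeping the catalog inside the container is what makes the box self-contained in this sketch: a future reader would need only the container to learn what it holds and how the objects relate.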

When preserving objects, we need to keep all the information to make them fully usable in the future. No single technology will be "usable over the time-spans mandated by current digital preservation needs". LTFS technologies are "good for perhaps 10-20 years".

Saturday, June 27, 2015

Russian Official Proposes International Investigation Into U.S. Moon Landings. Cultural Preservation?


Russian Official Proposes International Investigation Into U.S. Moon Landings. Ingrid Burke. The Moscow Times.  June 16, 2015.
Russia's Investigative Committee spokesman, Vladimir Markin, called for an international investigation to (among other things) solve the mystery of the disappearance of film footage from the original moon landing in 1969. "But all of these scientific - or perhaps cultural - artifacts are part of the legacy of humanity, and their disappearance without a trace is our common loss. An investigation will reveal what happened."

 [Interesting that the political wranglings have now reached the level of  historical archiving and cultural preservation.]
 

Friday, June 26, 2015

ARSC Guide to Audio Preservation

ARSC Guide to Audio Preservation. Sam Brylawski, et al. National Recording Preservation Board of the Library of Congress. May 2015. [PDF, 252 pp.]
CLIR, the Association for Recorded Sound Collections (ARSC) and the National Recording Preservation Board (NRPB) of the Library of Congress have published CLIR Publication No. 164, an excellent guide to audio preservation.
"Our audio legacy is at serious risk because of media deterioration, technological obsolescence, and, often, lack of accessibility. This legacy is remarkable in its diversity, ranging from wax cylinders of extinct Native American languages to tapes of local radio broadcasts, naturalists’ and ethnographers’ field recordings, small independent record company releases, and much more. These recordings are held not by a few large organizations, but by thousands of large and small institutions, and by individuals. The publishers hope that this guide will support and encourage efforts at all institutions to implement best practices to help meet the urgent challenge of audio preservation."

Chapters include:

  • Preserving Audio (Recorded Sound at Risk, Preservation Efforts, Roles)
  • Audio Formats: Characteristics and Deterioration (Physical, digital)
  • Appraisals and Priorities (Tools; Selection/collection policies, decisions)
  • Care and Maintenance (Handling, assessment) and arrangement
  • Description of Audio Recordings (Metadata, standards, tools)
  • Preservation Reformatting (Conversion to digital files, metadata, funding)
  • Digital Preservation and Access: Process, storage infrastructure
  • Audio Preservation: The Legal Context (Copyright, control, donor agreements)
  • Disaster Prevention, Preparedness, and Response
  • Fair Use and Sound Recordings Lessons
Some notes from reading the publication:
  • the ultimate goals of preservation are sustained discovery and use
  • what all these dissimilar recordings together represent is an audio DNA of our culture
  • our enjoyment of the recordings has far exceeded our commitment to preserve them
  • history is represented in sound recordings; it entertains and enriches us
  • if compressed files are the only versions available to the public, we have no assurances that anyone is maintaining the higher fidelity originals
  • efforts of large and small institutions and private collectors are needed to make a meaningful dent in the enormous volume of significant recordings not yet digitized for preservation
  • if we are to preserve our audio legacy, all institutions with significant recordings must be part of the effort
  • proactive attention, care, and planning are critical to the future viability and value of both analog and digital recordings
  • institutions often have more items in their care than they have resources for adequate processing, cataloging, and preservation
  • the potential technical obsolescence of the hardware to play a recording should influence priorities and resources allocated for preservation
  • perhaps the most crucial feature of a metadata schema is its degree of interoperability for sharing, searching, harvesting, and transformation or migration
  • the preservation choice is not binary: "either we implement intensive preservation immediately and forever; or we do nothing". We should not delay action because the ideal cannot be achieved
  • preservation metadata is the information needed to support the long-term management and usability of an object
  • the Broadcast Wave Format (BWF) is the de facto standard for digital audio archiving
  • monitoring and planning to avoid obsolescence are important aspects of a solid digital preservation strategy
  • audio preservation is an ongoing process that may be challenging and intimidating; setting priorities is central to a successful preservation strategy
  • digital preservation will enable the fulfillment of the goal of long-term use (whether focused on education, scholarship, broadcasting, marketing, or sales)
  • ensure that there is at least one geographically separate copy of all digital content
  • recognize the use of sound recordings as sources of information by students and researchers
  • libraries and memory institutions should provide points of cultural reference for the current generation of creators
Several free, open source software tools are available for:
  • assessing audio collections for the purpose of setting preservation priorities
    • The Field Audio Collection Evaluation Tool (FACET)
    • Audio/Video Survey
    • Audiovisual Self-Assessment Tool (AvSAP)
    • MediaSCORE and MediaRIVERS
  • metadata tools
    • CollectiveAccess
    • Audio-Visual and Image Database (AVID)
    • AudioVisual Collaborative Cataloging (AVCC)
    • PBCore
"When libraries, archives, and museums exercise their legal rights to preserve and facilitate access to information, even without permission or payment, they are furthering the goals of copyright."

"The professional management of a collection requires the development of criteria for selecting and preserving collections of sound recordings. A selection or collection development policy defines and sets priorities for the types of collections that are most appropriate and suitable for an organization to acquire and to preserve. The basis for these criteria should be the goals and objectives of the individual institution."