This blog contains information related to digital preservation, long term access, digital archiving, digital curation, institutional repositories, and digital or electronic records management. These are my notes on what I have read or been working on. I enjoyed learning about Digital Preservation but have since retired and I am no longer updating the blog.
Thursday, January 14, 2016
Files on nearly 200 floppy disks belonging to Star Trek creator recovered
Information from nearly 200 floppy disks that belonged to Gene Roddenberry has been recovered. Roddenberry, Star Trek creator, used the 160KB disks in the 1980s to store his work and "to capture story ideas, write scripts and notes." The 5.25-inch floppy disks, found several years after the death of Roddenberry, were used with two custom computers and a custom-built OS. The computers were no longer available for recovery use, so the floppies were sent to DriveSavers, which wrote software that could read the disks.
Wednesday, March 25, 2015
I tried to use the Internet to do historical research. It was nearly impossible.
How do you organize so much information? So far, the Internet Archive has archived more than 430,000,000,000 web pages. It’s a rich and fantastic resource for historians of the near-past. Never before has humanity produced so much data about public and private lives – and never before have we been able to get at it in one place. In the past it was just a theoretical possibility, but now we have the computing power and a deep enough archive to try to use it.
But it’s a lot more difficult to understand than we thought. "The ways in which we attack this archive, then, are not the same as they would be for, say, the Library of Congress. There (and elsewhere), professional archivists have sorted and cataloged the material. We know roughly what the documents are talking about. We also know there are a finite number. And if the archive has chosen to keep them, they’re probably of interest to us. With the internet, we have everything. Nobody has – or can – read through it. And so what is “relevant” is completely in the eye of the beholder."
Historians must take new approaches to the data. No one can read everything, nor know what is even in the archive. Better samples, specifically chosen for their historical importance, can give us a much better understanding. We need to ask better questions about how sites are constructed and what links exist between sites, and to run more focused searches. And we need to know what questions to ask.
Thursday, March 28, 2013
Challenges of Dumping/Imaging old IDE Disks
Full system preservation through imaging or processing in digital forensics depends on reliable hardware-software stacks for identity system disk migrations. There are a number of pitfalls which might prevent an authentic copy of the original components from being written to an image file. The article discusses issues with disk recognition, reading and verification. The tool of choice to produce identical copies of block devices in Linux/Unix systems is dd.
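As a rough illustration of the read-and-verify step (my own sketch, not code from the article and not a replacement for dd), the Python below copies a source to an image file block by block, hashes the bytes as they are read, and then re-reads the image to confirm it matches. The function names and the 1 MiB block size are assumptions made for the example, and it does nothing about the bad-sector and disk-recognition problems the article describes.

```python
import hashlib

CHUNK = 1024 * 1024  # assumed block size: read 1 MiB at a time


def image_device(source_path, image_path, chunk_size=CHUNK):
    """Copy source_path to image_path block by block.

    Returns the SHA-256 hex digest of the bytes read from the source.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(chunk_size)
            if not block:
                break
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def file_sha256(path, chunk_size=CHUNK):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def image_and_verify(source_path, image_path):
    """Image the source, then re-read the image and compare digests.

    Returns (matches, source_digest).
    """
    source_digest = image_device(source_path, image_path)
    return source_digest == file_sha256(image_path), source_digest
```

On a real system the source would be a block device path and the script would need permission to read it; keeping the digest alongside the image gives a fixity value to check against later.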
Thursday, September 20, 2012
Swatting the Long Tail of Digital Media: A Call for Collaboration.
Archiving born digital content stored on a wide range of physical media types requires specialized knowledge, expertise, and equipment to read and preserve the content on physical media, ranging from punched cards to flash drives. In general, transferring content from a particular physical medium requires a compatible computer that can read the data in the format that is stored on the medium, but also other hardware and software components, such as cables and drivers. A community-based approach could establish software and workstations for antiquated technology (SWAT) sites where a few institutions acquire and maintain the technology and expertise to read data and transfer content from particular types of obsolete media.
Thursday, September 06, 2012
You've Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Received on Physical Media.
Friday, May 04, 2012
Library of Congress Digital Preservation Newsletter.
Items from the Newsletter include:
- Key outcomes of the NDIIPP program are to identify priorities for born digital collections and engage organizations committed to preserving digital content.
- Viewshare is being used for the collections
- Floppy Disks are Dead, Long Live Floppy Disks
- Floppy disks are fragile constructions that were never designed for permanence.
- It is difficult to determine what is on a floppy and to recover it.
- A floppy disk controller called Catweasel allows computers to access a wide variety of older disk formats (must have the floppy drive).
- Web archiving.
- Because of the scope of the web sites, consider partnering with other institutions.
- Preservation of and Access to Federally Funded Scientific Data
- Research data produced by federally funded scientific projects should be freely available to the wider research community and the public
- Public data should be a public resource, and data sharing supports core scientific values like openness, transparency, and replication.
- Lack of resources for curating scientific data and a lingering tradition of data hoarding create resistance to open access to research data.
Wednesday, July 20, 2011
The death of backup and the rapid rise of the cloud
This may be a great benefit for system administrators, but it doesn't eliminate the need to understand your archiving requirements and your data security and protection issues, and to have a plan for vendor migration.
Tuesday, March 02, 2010
Digital Preservation Matters - March 2, 2010
A Guide to Distributed Digital Preservation. Katherine Skinner, Matt Schultz. Educopia Institute. February 2010. [156 p. PDF]
The software provides bit-level preservation for digital objects of any file type or format, but it can also provide a set of services to make the preserved files usable in the future, such as normalizing and migrating. The MetaArchive network is a dark archive with no public interface; communication between caches is secure. Organizations collaborating on preserving digital content must examine the roles and responsibilities of members, address essential management, policy, and staffing questions, develop standards, and define the network’s sphere of activity. Ingest, monitoring, and recovery of content are critical steps for preserving the content.
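To make the monitoring and recovery steps a little more concrete, here is a small sketch of one generic way that copies held in several caches could be compared and repaired: hash every copy, treat the digest held by a strict majority as good, and overwrite the copies that disagree. This is my own illustration and not how the software described in the guide actually works; the cache names and function names are hypothetical.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas):
    """Compare copies of one object held in several caches.

    replicas maps a (hypothetical) cache name to the bytes it holds.
    Returns (agreed_digest, damaged_cache_names). agreed_digest is None
    when no digest is held by a strict majority of the caches.
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {name: sha256_of(content) for name, content in replicas.items()}
    digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        return None, sorted(digests)
    damaged = sorted(name for name, d in digests.items() if d != digest)
    return digest, damaged


def repair(replicas):
    """Return a new dict in which damaged copies are replaced by a good one."""
    agreed, damaged = audit_replicas(replicas)
    if agreed is None:
        raise ValueError("no majority copy to repair from")
    good = next(c for c in replicas.values() if sha256_of(c) == agreed)
    repaired = dict(replicas)
    for name in damaged:
        repaired[name] = good
    return repaired
```

For example, with three caches where one copy has been altered, `audit_replicas` names the altered cache and `repair` replaces its copy with the majority version. A real network would also have to decide what to do when no majority exists, which this sketch simply reports.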
Some interesting quotes from the guide:
- Paradoxically, there is simultaneously far greater potential risk and far greater potential security for digital collections
- many cultural memory organizations are today seeking third parties to take on the responsibility for acquiring and managing their digital collections. The same institutions would never consider outsourcing management and custodianship of their print and artifact collections;
- A great deal of content is in fact routinely lost by cultural memory organizations as they struggle with the enormous spectrum of issues required to preserve digital collections,
- A true digital preservation program will require multi-institutional collaboration and at least some ongoing investment to realistically address the issues involved in preserving information over time.
- One of the greatest risks we run in not preserving our own digital assets for ourselves is that we simultaneously cease to preserve our own viability as institutions.
---
Encouraging Open Access. Steve Kolowich. Inside Higher Ed. March 2, 2010.
Conversations about open access to journal articles currently revolve around policy, not technology; about whether the content should be made available, not how. “Without content, an IR is just a set of empty shelves.” A new model of repository focuses on giving researchers an online “workspace” within the repository where they can upload and preserve different versions of an article they are working on. The idea is to make publishing articles to the open repository a natural extension of the creative process. This is based on a survey where professors wanted:
- to be able to work with co-authors easily,
- to keep track of different versions of the same document, and
- to make their work more visible
- all while doing as little extra work as possible.
---
In the digital age, librarians are pioneers. Judy Bolton-Fasman. The Boston Globe. February 10, 2010.
Book review of This Book Is Overdue: How Librarians and Cybrarians Can Save Us All By Marilyn Johnson.
- Among information professionals, Johnson notes there are librarians and archivists: “Librarians were finders [of information]. Archivists were keepers.’’ But the information revolution is affecting both.
- The digital age is making possible the creation of searchable databases of archives, but it’s also making information, especially on the Internet, more ephemeral and harder to collect.
- Information archivists are “capturing history before it disappears because of a broken link or outdated software.”
- in a world where technology moves life at a breathtaking pace, “where information itself is a free-for-all, with traditional news sources going bankrupt and publishers in trouble, we need librarians more than ever’’ to help point the way to the best, most reliable sources.
---
Installing OAIS Software: Archivematica. Chris Prom. Practical E-Records. February 1, 2010.
One of several reports on open source tools the blog author is evaluating to help with ingest, storage, and access processes in archives. This post looks at Archivematica, and he likes the supportable model for facilitating archival work with electronic records. It is an Ubuntu-based virtual appliance which can exist alongside preservation tools on other systems. It can be installed locally and in a variety of ways. Worth looking into.
---
IBM announces massive NAS array for the cloud. Lucas Mearian. Computerworld. February 11, 2010.
IBM has announced SONAS, an enterprise-class network-attached storage array capable of scaling from 27TB to 14 petabytes under a single name space. It is designed to provide access to data anywhere any time. The policy-driven automation storage software allows an institution to predefine where data is placed, when it is created, where and when it moves to in the storage hierarchy, where it's copied for disaster recovery, and when it will be eventually deleted.
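To picture what policy-driven placement means, here is a toy sketch in which each rule says where a file starts, when it moves down the storage hierarchy, and when it is deleted. It is not SONAS's policy language, which the article does not show; the tier names, file patterns, and day counts are all made up for the example, and the disaster recovery copy is left out.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Policy:
    pattern: str                             # filename glob the rule applies to
    initial_tier: str                        # where new files are placed
    archive_tier: str                        # where files move as they age
    archive_after_days: int                  # age at which the move happens
    delete_after_days: Optional[int] = None  # None means keep forever


# Hypothetical rules; the first matching pattern wins.
POLICIES = [
    Policy("*.tif", "fast-disk", "tape", 90),
    Policy("*.tmp", "fast-disk", "nearline-disk", 7, 30),
]
DEFAULT = Policy("*", "nearline-disk", "tape", 365)


def policy_for(filename, policies=POLICIES):
    """Return the first policy whose pattern matches, else the default."""
    for policy in policies:
        if fnmatchcase(filename, policy.pattern):
            return policy
    return DEFAULT


def placement(filename, age_days, policies=POLICIES):
    """Return the tier a file of this age belongs on, or "delete"."""
    policy = policy_for(filename, policies)
    if policy.delete_after_days is not None and age_days >= policy.delete_after_days:
        return "delete"
    if age_days >= policy.archive_after_days:
        return policy.archive_tier
    return policy.initial_tier
```

The point of the design is that the decisions are written down once as data, so the same rules can be applied automatically to every file instead of someone moving files by hand.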
Monday, October 26, 2009
Digital Preservation Matters - 23 October 2009
Sidekick Data Restoration Has Started, Microsoft Says. Barry Levine. NewsFactor. October 20, 2009.
Danger, a Microsoft subsidiary using ‘cloud computing’, experienced a system problem that erased all the users' contacts, calendar entries, to-do lists, and photos for those using the Sidekick smart-phone. Much of the data may be eventually recovered, but effective data backup and protection measures were not being followed. It shows the importance of using reliable vendors and having data backups. [This is the first major loss of ‘cloud – data’ that I know of.]
---
Millennial disc guarantees data preservation. Logan Bradford. Daily Universe. September 15, 2009.
Barry Lunt, a BYU information technologies professor, will launch a product with the company, Millenniata, that produces a disc just like a CD or DVD that will last up to 1,000 years. He learned, through his seven years working for IBM in computer data, that data on CDs and DVDs would decay and be lost over just a few years because of optical discs’ ephemeral qualities, such as when they are exposed to sunlight and humidity. [We have been testing these discs and writers.]
---
Wellcome Library to use JPEG2000 image format. Library blog. September 18, 2009
The Wellcome Library in London has been using TIFF images as its archival storage format. But, anticipating adding over 30 million images, they wanted to find a way to efficiently store the digital content but still maintain high levels of quality and open standards required for long-term preservation. To do this they have chosen to use the JPEG2000 format in their digitization program. But the difficulty is that the JPEG2000 format has multiple versions. They wanted to know which version is best for long-term storage and access, so they commissioned a study by King's College: JPEG 2000 as a Preservation and Access Format for the Wellcome Trust Digital Library. Robert Buckley, Simon Tanner.
Based on the study, the Library will adopt a "visually lossless" lossy compression to gain at least 75% storage savings in comparison to a TIFF version. “The recommended compression parameters will produce an image with no visible difference in image quality, but the compression is irreversible - i.e. the original bit stream will not be possible to reconstruct. As the Library will be digitising physical items that can (if necessary) be re-digitised, it was considered an acceptable compromise.” Some materials may be candidates for JPEG2000 lossless compression. They are also recommending that “JPEG 2000 be used with multiple resolution levels.”
---
The Swedish Research Council requires free access to research results. Press release. October 8, 2009.
Researchers granted funds by the Research Council should publish their scientific research in publications that are available according to Open Access guidelines within a maximum period of six months. "We consider that publication of research which has been paid for out of public funds should be made freely accessible to all." The Open-Access rules apply so far only to scientifically assessed texts in journals and conference reports, and not to monographs and chapters of books.
---
Sound archive of the British Library goes online, free of charge. Mark Brown. The Guardian News. 3 September 2009.
The British Library has made its archive of world and traditional music freely available on the internet. The Archival Sound Recordings archive contains about 28,000 recordings, estimated at 2,000 hours of sound. These recordings are from around the world and the oldest are from wax cylinders made in 1898. The Library wants to change the perception that “things are given to libraries and then are never seen again – we want these recordings to be accessible."
---
Keeping Research Data Safe2: Data Survey added to project website. Neil Beagrie. Blog. 26 Sep 2009.
Information about the project and link to the website. The project, which is to identify long-lived datasets for the purpose of cost analysis, will be ending soon. It refers to the previous project. In the activity model it mentions it will look at the development of an archive’s selection policy, also staff training and development. One area of concern was that OAIS terminology could be a barrier to understanding for some user groups.
Tuesday, April 07, 2009
Digital Preservation Matters - 03 April 2009
Nevada Statewide Digital Initiative. Website. Updated 3 April 2009.
The purpose of the Nevada Statewide Digital Initiative is to: “Increase access to the collections held by Nevada's cultural heritage institutions through digital access to materials by residents of Nevada and scholars and researchers interested in Nevada's culture and history.” The series of activities to build statewide collaboration include:
- creating a collection policy;
- creating a website that links existing projects;
- adopting statewide best practice and standards;
- creating local partnerships that would build up to statewide partnerships;
- developing a digital pilot project to curate and manage their digital materials.
Millenniata continues to make progress with its patent-pending Millennial Disc and Millennial Writer. Press Release. February 2, 2009.
This press release has information about a new optical disc that has been developed. It is designed to be a permanent archiving product that has no degradable components and “safely stores data for 1,000 years”. The technology makes a permanent change to the disc. It is referred to as Write Once Read Forever™ and can be read in a standard DVD drive. [check back for test results.]
Systemwide organization of information resources: a multiscalar environment. Lorcan Dempsey. Higher Education in a global economy: the implications for technology and JISC. 23 March 2009. [pdf presentation]
Interesting presentation that looks at libraries and their environment. Compares core components of companies and libraries. Examines a grid of Uniqueness and Stewardship, from Freely accessible web resources in the low-low quadrant to Special collections in the high-high quadrant, and shows where preservation appears. Moving from the institution to the multiscalar level.
Digital Project Staff Survey of JPEG 2000 Implementation in Libraries. David Lowe, Michael J. Bennett. University of Connecticut. March 20, 2009. [xls]
Preliminary findings of a survey about JPEG 2000, conducted to understand the community perception of it. JPEG 2000 is the product of efforts for an open standard. The concerns about implementing JPEG 2000 include: limited software tools, lack of functionality, and uncertainty of need. Some survey results of interest:
- 59.5% said they use the format,
- 19.7% use for new archival collections,
- 16.3% use for converting TIFF collections,
- 53.5% use for online access
Other questions discuss the tools used and include comments about them.
Rocks Don't Need to Be Backed Up. Henry Newman. Enterprise Storage Forum. March 27, 2009.
General article about the need for digital preservation. “The first thing we need is a standardized framework for file metadata, backup and archival information.” “The integrity of modern data is not guaranteed except at high cost.” “We have no real framework to change and transcribe formats.”
[This is more about transferring information between computer systems rather than archival metadata. It shows the lack of interaction between digital preservation worlds. Some of the comments about the article are interesting.]
Goodbye, Encarta. A cautionary tale for newspapers? John Yemma. The Christian Science Monitor. March 31, 2009.
An article about how Wikipedia replaced the Encarta digital encyclopedia and what that points to. What Encarta did not do was to embrace the power of the internet, which includes almost instant updating. The “lesson is that general knowledge … can’t withstand an effort that was developed specifically for the Internet and that harnesses gifted amateurs.” There is power in open-source knowledge. Organizations can take their values with them, but they can’t take the old model, nor the old work habits. “The Web is its own universe with its own rules.”
INSIGHT into issues of Permanent Access to the Records of Science in Europe. PARSE.Insight. March 27, 2009. [pdf]
This document is to give an overview and details of technical and non-technical components which would be needed for science data infrastructures. The infrastructure components are aimed at bridging the gaps between areas of functionality, developed for particular projects, separated by either discipline or time. These components should play a unifying role in science data. They are developed within a European wide infrastructure, but there should also be advantages if these components are used more widely. The group has defined four main roles: funding, research, publishing, and storage/preservation.
Science Data Infrastructure: those things, technical, organizational and financial, which are usable across communities to help in the preservation, re-use and open access of digital holdings.
Preservation: meant in the OAIS sense of maintaining the usability and understandability of a digital object.
Representation Information: the OAIS term for everything that is needed in order to understand a digital object.
The report discusses some major threats. Those who responded marked these as “Important” or “Very Important”:
- Users unable to understand or use the data e.g. the semantics, format, etc
- Not able to maintain hardware, software or environment, making the information inaccessible
- No chain of evidence causing uncertain provenance or authenticity
- Access and use restrictions may not be respected in the future
- Inability to identify the data location
- The current data custodian may cease to exist
- Those responsible to look after the digital holdings may let us down
Any of the components must be able to be handed to another organization, and the Persistent Identifiers must transfer and resolve correctly. In general it is not possible to state that an object is authentic, other than providing evidence, such as technical details, to show the provenance of the object, or a social decision of trust.
Thursday, December 04, 2008
Hard drive sounds
Read more about each product and common problems, or listen to the sounds. They advise that if you hear the sounds and your drive is still working, you should back it up immediately. An interesting site.