Tuesday, December 29, 2015

Storage For The Next 5,000 Years

Storage For The Next 5,000 Years. Tom Coughlin. Forbes. Dec 15, 2015.
     We are creating as much information annually as mankind generated from the beginning of civilization to a few years ago. Some of the data is temporary, while other data has longer-term value and may be useful in the future. As we generate and save more data, the question is whether we can actually keep the data for the long term in the face of hardware or format obsolescence. "But even if data is transferred from older formats/media to modern formats regularly natural processes driven mostly by thermal energy can destroy data over time. The longer the data is kept the greater the chance of data corruption".

Keeping data for a long time can be expensive and requires management and multiple copies of data on different hardware. While large organizations with valuable content can afford to protect their data, smaller organizations or consumers will find it difficult, though one way may be to move the data to managed cloud storage data centers where it can be managed by professionals. "Carrying data into the far future will require careful management of data to support multiple copies and continuous detection and elimination of data corruption". Online archives may be able to provide access to archived data.
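
The idea of keeping multiple copies and continuously detecting and eliminating corruption can be illustrated with a small sketch. The following Python is only an illustration, not a method described in the article: it assumes three or more replicas held as byte strings, and the function name `repair_replicas` is hypothetical. It treats the content whose SHA-256 digest is most common as intact and replaces any copy that disagrees.

```python
import hashlib
from collections import Counter


def repair_replicas(replicas):
    """Return (repaired_copies, indexes_of_corrupt_copies).

    Assumption: a strict majority of the replicas is still intact, so the
    most common digest identifies the good content.
    """
    digests = [hashlib.sha256(data).hexdigest() for data in replicas]
    good_digest, votes = Counter(digests).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority: cannot tell which copy is intact")
    good_data = replicas[digests.index(good_digest)]
    corrupt = [i for i, d in enumerate(digests) if d != good_digest]
    return [good_data] * len(replicas), corrupt
```

A real archive would run a check like this on a schedule and across independent storage systems; the majority rule is only one possible policy.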

Wednesday, December 23, 2015

Personal Digital Archiving

Personal Digital Archiving.  Gabriela Redwine. DPC Technology Watch Report 15-01. December 2015. [PDF]
     This excellent report looks at some of the key challenges people face in managing and storing their digital files. It "stresses the importance of preserving personal files" that include physical, digitized and born-digital materials. The term ‘personal digital archiving’ or ‘Save your digital stuff!’ refers to how people keep track of their digital files, where they store them, and how the files are described and organized.

The report reviews the archiving issues and offers guidance and resources to help individuals be proactive and save their digital content. It also argues for the "importance and urgency of preserving personal files while also acknowledging the difficulty of managing digital media and files". Personal items increasingly exist only in digital format. "This brings a new understanding of what letters, photos and other sources look like in the digital age, and raises important questions about how to manage these personal items today and how to preserve them for future generations."

"Thinking of a personal collection of digital files as ‘archives’ places emphasis on the larger context within which those digital files exist. The records of people’s lives are intrinsically important and worth preserving." Social media archiving necessarily requires a considerable investment of resources so it is important to choose which social media services should be archived.  Some key threats to a personal digital archive:
  • old hardware and software
  • lack of secure storage and backup 
  • natural and man-made disasters 
  • neglect of content
  • loss of cloud-based host or service provider
  • lack of planning
  • death of an individual
The report lists recommendations (quick wins, more effort, maximum effort) for the threats listed. Some of these are:

Recommendations: addressing key threats to personal digital files
  • Choose software that is well supported and creates files that can be read by a variety of different programs.
  • Develop file naming conventions that are easy to remember and apply these consistently.
  • Create multiple back-up copies and store them in different geographical locations.
  • Test your back-up copies to make sure they are accessible and contain what you intend them to.
  • Transfer files to new media every 2 to 4 years.
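
Testing back-up copies can be partly automated. The sketch below is one possible approach, not something taken from the report: it compares the SHA-256 checksum of every file under an original folder with the file at the same relative path in a back-up folder. The function names `file_digest` and `verify_backup` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir, backup_dir):
    """Return sorted relative paths that are missing or differ in the back-up."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in original_dir.rglob("*"):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or file_digest(source) != file_digest(copy):
            problems.append(relative.as_posix())
    return sorted(problems)
```

An empty result means every original file has an identical copy in the back-up; it does not show that the files can still be opened by current software.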

Recommendations: taking good care of a personal digital archive
  • Choose high-quality storage media and refresh it regularly.
  • Be proactive about refreshing storage media, replacing outdated equipment before it fails, and not relying exclusively on one service provider or storage solution.
  • Follow best practice when naming files.

With digital preservation, and especially with creating and maintaining a personal digital archive, the hardest part is deciding how to start. Start by making a back-up copy of your files, then address questions such as these:
  • Which files would you miss most if they suddenly disappeared?
  • What qualities about those files are most important – for example, does it matter if the formatting of a word-processing document changes if the text is still readable?
  • Do your digital photos include important descriptive or contextual information that you need to use a particular program to see?

Monday, December 21, 2015

OhioLINK Adopts Ex Libris Rosetta for Digital Preservation

OhioLINK Adopts Ex Libris Rosetta for Digital Preservation. Ex Libris. Press release. December 21, 2015.
     OhioLINK has selected the Ex Libris Rosetta digital management and preservation solution for 120 academic libraries plus the State Library of Ohio. Rosetta will ensure long-term access to the OhioLINK Electronic Journal Collection (EJC), Electronic Book Collection (EBC), Electronic Theses and Dissertations (ETD) Center, and Digital Resource Commons (DRC) collections. OhioLINK sought a preservation system based on the Open Archival Information System (OAIS) reference model that could integrate with its existing content management systems and support a wide range of processing workflows. As a large and complex consortium, OhioLINK required a solution that could be implemented and maintained in a way that suits a wide variety of content.

Friday, December 18, 2015

Digital preservation in 2016: 5 predictions

Digital preservation in 2016: 5 predictions. Jon Tilbury. ItProPortal. December 15, 2015.
     The author presents five trends that he sees in digital preservation:
  1. Old analog media and file formats, such as Betamax, will continue to become obsolete: "think of the floppy disk, the CD-ROM, Lotus 1-2-3 or WordStar." Digitizing content helps preserve it against obsolescence.
  2. Moving critical long-term and permanent digital records to the safety of secure and open archival repositories, where records can be "future-proofed for the long-term".
  3. Digital preservation will go mainstream. Cultural organizations have been managing and preserving digital content; many commercial and government organizations are now understanding the need for long-term digital preservation.
  4. Use of the cloud for preserving digital content will continue to increase.
  5. Technology refresh cycles will get faster. The Digital Dark Age debate has helped to move digital preservation to a higher level.

Thursday, December 17, 2015

The Future of the Humanities in a Digital Age

The Future of the Humanities in a Digital Age. SDSU News. December 15, 2015.
    In a preview of a January lecture, Vint Cerf was asked about his comment on a "digital dark age" in which storage formats could become incompatible with future hardware technologies. His response was: "I am deeply concerned that people take ‘digital preservation’ to mean digitizing fixed text and imagery. What I worry about is that this format will prove to be unreliable if the software that interprets it is no longer available. We really need to figure out how to assure that digitized content can be preserved regardless of format."

Wednesday, December 16, 2015

5 Open Source Digital Preservation Tools to Assist Enterprise Archiving

5 Open Source Digital Preservation Tools to Assist Enterprise Archiving. Christopher J. Michael. Paragon Solutions. December 15, 2015.
     General article about digital preservation and some useful tools. "Digital archiving and preservation are needed to ensure the authenticity, integrity, and protection of electronic records despite limited resources and a constant stream of new complex technologies."
  • "Digital preservation is the foundation of enterprise archiving."
  • "Electronic records are archived when they have long-term retention needs in order to fulfil legal, business and regulatory requirements."
  • A digital archive is a repository to store collections of digital objects to provide long-term access to the information.
There are some useful tools to help with the challenges of archiving and obsolescence:
  1. Matchbox: software to identify duplicate images.
  2. DROID: identify and standardize file formats and metadata extraction.
  3. Xena (XML Electronic Normalising for Archives): detect the file formats of objects and convert them into open formats.
  4. ePADD: supports the appraisal, ingest, processing, discovery, and delivery of email archives.
  5. Web Curator Tool: a tool for harvesting websites for archiving with descriptive metadata.
A clearly documented digital preservation policy that includes standard file formats and that is followed consistently will help ensure that objects in the archive will be available long term.
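
Tools that identify file formats commonly look for known byte signatures ("magic numbers") at the start of a file. The following is a minimal sketch of that idea only; it is not how DROID or Xena is implemented, it covers just a handful of well-known signatures, and the name `identify_format` is hypothetical.

```python
# A few well-known leading byte signatures; real tools use far larger registries.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),  # also used by formats built on ZIP
]


def identify_format(data):
    """Return a format label for the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

Identifying a format from its content rather than its file extension matters for archives because extensions are easily wrong or missing.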

Tuesday, December 15, 2015

Building a Digital Preservation Strategy

Building a Digital Preservation Strategy. Edward Pinsent. DART Blog, University of London Computer Centre. 23 November 2015.
     A presentation on how to develop a digital preservation strategy. The blog and the slides included the following points:
  • Start small, and grow the service. Do it in stages
  • You already have knowledge of your collections and users – so build on that
  • Ask why you are doing digital preservation, who will benefit, and what are you preserving
  • Build use cases
  • Determine your own organisational capacity for the task
  • Reasons why metadata matters (intellectual control, manage and document
  • Determine your digital preservation strategies before talking to IT or vendors
The presentation also includes several scenarios that would address digital preservation needs incrementally and meet requirements for different audiences, such as archivists, records managers, and users:
  • Bit-level preservation (access deferred)
  • Emphasis on access and users
  • Emphasis on archival care of digital objects
  • Emphasis on legal compliance
  • Emphasis on income generation

Monday, December 14, 2015

Free OAIS Beginners Course – Update

Free OAIS Beginners Course – Update. Stephanie Taylor. DART Blog, University of London Computer Centre. 9 December 2015.
     An online course ‘A Beginners Guide to the OAIS Reference Model’ was launched in November for those interested in learning more about OAIS. The course remains open and free to anyone interested. "It’s been fantastic to see so much international engagement. We’ve also had a great cross-section of students in many roles from many kinds of organisations, including national memory institutions, higher education, cultural heritage, national and local government departments and the commercial sector." The blog has the link to sign up for the course.

Saturday, December 12, 2015

The FLIF format

The FLIF format. Gary McGath. Mad File Format Science blog. November 25, 2015.
     The post is a look at a new image format FLIF (Free Lossless Image Format) which claims to outcompress other formats for "any kind of image". "It’s still a work in progress, and any new image format faces an uphill battle" against the existing well-established and well-funded formats. More information about the format is available at the FLIF website. The format is said to be "completely royalty-free and it is not encumbered by software patents." There is still work to do on support for additional metadata and color spaces.

Friday, December 11, 2015

oldweb.today website

oldweb.today.  Ilya Kreymer. Website. December 10, 2015.
     This is an interesting site that provides an emulator for various web browsers to search historic web sites. The tool Netcapsule, which can be used on the website oldweb.today, is built with open source tools that communicate with web archives. It allows you to browse "old web pages the old way with virtual browsers"; the user can navigate by url and by time. When the page is loaded "the old browser is loaded in an emulator-like setup" that can connect to the archive. Any archive that supports the CDX or Memento protocol interfaces can be a source. Full source code is available on GitHub.
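
For archives that speak the Memento protocol (RFC 7089), navigating "by url and by time" corresponds to asking a TimeGate for a URL with an `Accept-Datetime` header. The sketch below only builds such a request without sending it. The TimeGate address `https://archive.example.org/timegate/` is a placeholder, the helper name `build_timegate_request` is hypothetical, and none of this is taken from the Netcapsule source.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def build_timegate_request(timegate_base, target_url, when):
    """Build (but do not send) a Memento TimeGate request for target_url at `when`."""
    # Accept-Datetime carries an HTTP-date in GMT, e.g. "Thu, 10 Dec 2015 00:00:00 GMT".
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_base + target_url, headers={"Accept-Datetime": http_date})


request = build_timegate_request(
    "https://archive.example.org/timegate/",  # placeholder TimeGate, an assumption
    "http://example.com/",
    datetime(2015, 12, 10, tzinfo=timezone.utc),
)
```

A TimeGate that follows the protocol would answer with a redirect to the archived copy closest to the requested date.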

Thursday, December 10, 2015

The Digital Preservation Network (DPN) Explained

The Digital Preservation Network (DPN) Explained. DuraSpace.org. December 8, 2015.
     The DPN digital preservation service guarantees academic institutions that scholarly resources will survive into the “far-future”. DPN is "the only large-scale digital preservation service that is built to last beyond the life spans of individuals, technological systems, and organizations". Like insurance, DPN provides a guarantee that future access to scholarly resources will be available in the event of any type of change in administrative or physical institutional environments. This is possible by establishing a redundant and varied technical and legal infrastructure at multiple administrative levels. DPN is a scholarly “dark archive” which means that the content stored is not actively used or accessed, but that it can be made available for use at any time from multiple digital storage facilities.

Academic institutions require that key aspects of their scholarly histories, heritage and research remain part of the record of human endeavor. DPN members will begin adding digital assets to the network through DuraCloud Vault, a cooperative development between DPN, DuraSpace and Chronopolis which will serve as the primary ingest point beginning in January.

The digital data revolution: top 5 storage predictions for 2016

The digital data revolution: top 5 storage predictions for 2016. Posted by Ben Rossi, Sourced from Nik Stanbridge, Arkivum. Information Age. December 9, 2015.
    The need for storage and archiving services keeps growing.
  1. Video footage will continue to require a lot of storage. "The requirement will be for very large amounts of highly secure, incorruptible long-term storage."
  2. Momentum will grow for outsourcing. "In-house IT will ‘let go’ and realise that the benefits of outsourcing to specialty archive storage providers will far outweigh concerns about security, access and control. IT will be happy not to have to worry about buying too much storage too early, or being caught short with not enough. They’ll realise that predictable costs and outsourcing resource-intensive headaches like upgrades and system migration will make a lot of sense. The clue is in the name: service. Using a managed service, as in-house IT departments already do for so many other services, will be a burden removed."
  3. Many will still confuse data archiving with data backup
  4. Scientific needs will outpace storage capacities
  5. Digital preservation will require ultra-reliable storage. One of the fundamental tenets of digital preservation is that it’s for the long-term
"With the rise of the Internet of Things, big data and personal data, there will be a huge and fundamental shift. And as organisations start to make things intelligent, this will become a major engine for creating new products and new services."

Wednesday, December 09, 2015

5 storage technologies I'm thankful for

5 storage technologies I'm thankful for.  Robin Harris. Storage Bits. November 27, 2015.
     The "basis of any civilization is the storage of its culture." Previously, culture was stored by physical means: books, art and such. As this changes to digital means, the article mentions several storage technologies that will help preserve culture long term:
  • Data encryption. "Because digital data is easy to copy and share, we need encryption to keep what is ours, ours alone."
  • The thousand year disc. The M-disc is the "only digital media with a lifespan as good as a well produced book."
  • Scale-out object storage. Files that can be easily accessed from multiple servers, such as cloud services.
  • Advanced archive storage. "As we collect and store more information, archiving - not backup - becomes the critical success factor."
  • Solid-state storage. This "has revolutionized mobile device and enterprise storage."

Tuesday, December 08, 2015

Digital Preservation Handbook: revision

Digital Preservation Handbook: revision. Digital Preservation Coalition. October 2015.
     The original version of the handbook was compiled by Neil Beagrie and Maggie Jones; the revised 2nd edition of the Digital Preservation Handbook is being updated and released in stages between October 2015 and March 2016. The contents include:
  • Getting started
  • Organisational activities: Creating digital materials, acquisition and appraisal, retention and review, storage, legacy media, preservation planning, access, and metadata and documentation
  • Technical Solutions and Tools: Fixity and checksums, file formats and standards, information security, cloud services, digital forensics, and persistent identifiers
  • Content-specific preservation: such as e-Journals, moving pictures and sound, and web-archiving
Sections to be added:
  • Digital preservation briefing
  • Institutional strategies
The revisions and additions will make this an even more valuable resource.


Monday, December 07, 2015

When the Technology Changes on You

When the Technology Changes on You.  Maha Bali. The Chronicle of Higher Education. November 9, 2015.
   An article about changing technology, ways to deal with it, and the effort and cost that takes. The article was prompted by Twitter changing the favorite icon from stars to hearts. How do we handle unexpected changes to the technology we use for work? Some of the changes encountered include when:
  • a website disappears. The Internet Archive may help, but you may want to have contingency plans
  • a tool changes or loses features. A change in features, such as the Twitter hearts, can mean different things to different people and alter the way they work. "Tools distort what we are trying to express." Changes may require that we discuss the situation with others, which can be beneficial.
  • a tool is down or unavailable. We should ensure that we have alternatives or backups
Some alternative plans may include hosting your own tools so that you can control them if there are changes. However, this does not always work, and there are other costs to consider.

Saturday, December 05, 2015

Digital Curation Coordinator, OhioLINK

Digital Curation Coordinator, OhioLINK. Online posting. December 3, 2015.
     The Digital Curation Coordinator will manage OhioLINK’s implementation of the Rosetta digital preservation platform; assist in developing required policies and procedures related to digital collections and preservation; represent OhioLINK at digital curation events; and interact with OhioLINK members on issues related to digital curation. 

Friday, December 04, 2015

March 2015 PASIG Meeting Presentations and Recent Webinars

March 2015 PASIG Meeting Presentations and Recent Webinars. Preservation and Archiving Special Interest Group (PASIG). March 11-13, 2015.
  Recent presentations and webinars from PASIG and ASIS&T are available on the PASIG site. These include:
  • March 2015 PASIG Meeting Presentations 
  • Tiered Adaptive Storage for Big Data and Supercomputing. Jason Goodman
  • Video Surveillance: Consuming I.T. Capacity At Significant Rates. Jay Jason Bartlett
  • Archive and Preservation for Collections Leveraging Standards Based Technologies and the Cloud. Brian Campanotti
  • What Would an Ideal Digital Preservation Technical Registry Look Like?. Steve Knight and Peter McKinney
  • Three Critical Elements of Long-Term Storage in the Cloud. Amir Kapadia
  • Policy Based Data Management. Reagan Moore
  • Digital Forensics and BitCurator. Christopher (Cal) Lee
  • The Essential Elements of Intelligently Managed Tiered Storage Infrastructures. Raymond Clarke
  • Implementing Sustainable Digital Preservation.
  • How to Access Your Digital Value at Risk:  An Introduction to the Digital Value at Risk.
  • Building Communities and Services in Support of Data-Intensive Research. Stephen Abrams
  • Storage Technology Trends for Archiving. Tom Wultich and Bob Raymond
  • Stewarding Research Data with Fedora and Islandora. Mark Leggott
  • Challenges of Digital Media Preservation in an Active Archive. Karen Cariani, David W. MacCarn
  • An Introduction to the National Digital Information Infrastructure and Preservation Program (NDIIPP) and its Digital Preservation Initiatives. Leslie Johnston
  • Digital Preservation in Theory and Practice:  A Preservation and Archiving Special Interest Group (PASIG) Boot Camp Webinar. Tom Cramer

Thursday, December 03, 2015

The direction of computing is only going in one way: to the cloud

The direction of computing is only going in one way: to the cloud.  Rupert Goodwins. Ars Technica. Nov 14, 2015.
     "The cloud is well on its way to becoming the standard model for IT." The cloud has changed the economics and usability of providing and using services, including the many mobile applications and services.

The most common cloud model is a mix of public cloud and private infrastructure: for convenience called the hybrid cloud. "The increasing use of hybrid cloud tech is a reflection of the economic drivers that pull more and more IT, corporate and consumer, towards the public cloud. The most fundamental driver is good old economy of scale." Because of the scale, "companies save three to four dollars on internal IT for every dollar they spend on shifting infrastructure and services to cloud."

New cloud providers may find it difficult to compete since companies such as Amazon and Google had such a head start: the "biggest challenges have been access to scalable software to build public and private clouds and networking technologies to connect them."
Even so, there have been and still are some big problems with cloud computing: reliability, the safety of your data, and security. "Cloud adoption is highly susceptible to perceptions of trust." But the direction of computing is going towards the cloud. "New opportunities are opening up and the constraints of pre-cloud computing are fading away."

Data-driven Decision Making at L. Tom Perry Special Collections

Data-driven Decision Making at L. Tom Perry Special Collections. Ryan Lee, Cory Nimer, Gordon Daines. Society of American Archivists. November 2015.
     Archivists are looking for new ways to identify materials to digitize. This case study looks at data-driven digitization and the decision process.  This study brings Web analytics and in-house use statistics together as a way to make more informed, data-driven decisions. "Digitizing and mounting materials publicly on the internet is a form of publishing, and success in publishing means knowing and targeting viewers." Unique Page Views "provide a sense of general interest, while circulation statistics suggest personal engagement with the materials themselves." These metrics provide a more accurate sense of the usefulness of the collections.

"The findings of this study reflect much of what Peter Hirtle suggests would happen as we digitize more and more of our special collections materials, when he stated that “[e]lectronic access will replace most uses of printed, paper copies, [and]… [t]he use of paper originals will decrease.”" The study found that "digitization has often significantly reduced the use of originals in Perry Special Collections".

Wednesday, December 02, 2015

The Irony of Writing Online About Digital Preservation

The Irony of Writing Online About Digital Preservation. Meredith Broussard. The Atlantic. Nov 20, 2015.
     "Last month, The Atlantic published a lengthy article about information that is lost on the web. That story itself is in jeopardy." Article about the difficulties in preserving digital content long term, particularly from news sources that use a variety of methods and software to manage their information.  Some quotes from the article:
  • "There is no guarantee that we will be able to read today’s news on tomorrow’s computers. I’ve been studying news preservation for the past two years, and I can confidently say that most media companies use a preservation strategy that resembles Swiss cheese."
  • "News apps [interactive databases] aren’t being preserved because they are software, and software preservation is a specialized, idiosyncratic pursuit that requires more money and more specialized labor than is available at media organizations today."
  • “The challenges of maintaining digital archives over long periods of time are as much social and institutional as technological,” reads a report from 2003.
  • "When I started my research into news preservation, I thought there would be an easy technological solution. There isn’t. Every media company in the world grapples with the issue of digital archiving."
  • "Remember when Macromedia Flash was the new hot thing in journalism? Most of those elaborate Flash projects have disappeared now. They’re probably archived on Jaz drives in a storage room somewhere, next to boxes of color slides and piles of floppy disks and other outdated media. Future historians will likely lament this loss."
  • "The quantity and variety of information we now produce has outpaced our ability to preserve it for the future. Librarians are the only ones who are making sure that our collective memory is preserved. And they, along with small teams of digital historians elsewhere, are still trying to understand the scope of myriad challenges involved in modern preservation. If today’s born-digital news stories are not automatically put into library storehouses, these stories are unlikely to survive in an accessible way."
  • "The folks at the Internet Archive are thoughtful digital preservationists, and I am grateful every day for their work preserving our collective digital memory." "If I know exactly what web page I am looking for, the Internet Archive is very helpful."
  • "But if I don’t know exactly the web page that I want and exactly the day that the information appeared, I won’t be able to find the information in the Internet Archive."
  • "... we are losing digital history almost as soon as we make it."

Monday, November 30, 2015

Sharing the load: Jisc RDM Shared Services events

Sharing the load: Jisc RDM Shared Services events. Chris Awre. Digital Archiving blog. 25 November 2015.
     This post is a summary of the Jisc event he attended that was looking at shared services for research data management. Most academic institutions are struggling to manage research data and some form of shared service provision will be of benefit. The presentation "Digital Preservation Requirements for Research Data Management" that he and Jenny Mitcham gave "highlighted the importance of digital preservation as part of a full RDM service, stressing of how a lack of digital preservation planning has led to data loss over time, and how consideration of requirements has been based on long established principles from the OAIS Reference Model". Any RDM shared service should include digital preservation capabilities. There is a need to provide a suite of shared services, including providing a shared service platform for digital preservation and providing independent digital preservation tools.

Friday, November 27, 2015

High-speed digitization of historic artifacts

CultLab3D’s 3D scanning conveyor belt allows high-speed digitization of historic artifacts. Benedict. 3D printer and 3D printing news website. Nov 19, 2015.
     Researchers at the Fraunhofer Institute for Computer Graphics Research IGD have developed CultLab3D: a 3D scanning system that can create digital images of 3D objects. The project aim is to provide mass digitization, annotation and storage of historical artifacts for museums and other places of preservation. Quotes and notes from the article:
  • "Digital preservation is one of the most important methods of sustaining our cultural history."
  • digital preservation makes it possible to create and maintain scans of written texts
  • "Digital preservation of texts is one thing, but the preservation of physical artifacts is quite another."
  • while there is no real substitute for an authentic historical artifact, something should be done to preserve historical artifacts
This organization believes that the digital preservation of historical artifacts via 3D scanning is undoubtedly a worthwhile endeavor.

Wednesday, November 25, 2015

Tool Time, or a Discussion on Picking the Right Digital Preservation Tools for Your Program: An NDSR Project Update

Tool Time, or a Discussion on Picking the Right Digital Preservation Tools for Your Program: An NDSR Project Update. John Caldwell; Erin Engle. The Signal. November 17, 2015.
     "There are lots of tools out there, from checksum validators to digital forensics suites and wholesale preservation solutions." Instead of wanting the latest tool, ask whether the tool is right for you in this situation. The NDSR project is looking at:

  • studying current workflows;
  • benchmarking current policies against best practices;
  • reviewing and testing potential digital curation applications;
  • proposing sustainable workflows that align with current digital curation standards; and
  • producing a white paper to sum up current processes and propose next steps.

To determine the right tool, there are some things you need to know:
  1. know your records: how electronic records are being managed, how archivists are processing them, and what happens with the materials after.
  2. what do you want the end result to be. 
  3. what tool to use for the task
    1. Placement: Where does the tool fit into your process?
    2. Purpose: What does the tool actually do?
    3. Utility: How easy is the tool to use and does its output make sense?
"The seemingly straightforward question of utility is fundamentally tied to the question of purpose, and also the viability question: is the tool a long-term solution or a quick fix for today?" They are finding that they need to add preservation metadata to the records and establish the record integrity as early in the lifecycle as possible.
 An interesting comment on the blog post: "Digital preservation systems are precisely that: systems. Systems are a complex set of elements (people, technologies) and the connections between them (policies, procedures). Without all of these pieces, there really isn’t a system. There is just a tool. A hammer isn’t a house, just as a tool isn’t a digital preservation system."

Tuesday, November 24, 2015

Five Takeaways from AOIR 2015

Five Takeaways from AOIR 2015. Rosalie Lack. Netpreserve blog. 18 November 2015. 
     A blog post on the annual Association of Internet Researchers (AOIR) conference in Phoenix, AZ. The key takeaways in the article:
  1. Digital Methods Are Where It’s At.  Researchers are recognizing that digital research skills are essential. And, if you have some basic coding knowledge, all the better. The Digital Methods Initiative from Europe has tons of great information, including an amazing list of tools.
  2. Twitter API Is also Very Popular
  3. Social Media over Web Archives. Researchers used social media more than web archived materials.  
  4. Fair Use Needs a PR Movement. There is a lot of misunderstanding or limited understanding of fair use, even for those scholars who had previously attended a fair use workshop. Many admitted that they did not conduct particular studies because of a fear of violating copyright. 
  5. Opportunities for Collaboration.  Many researchers were unaware of tools or services they can use and/or that their librarians/archivists have solutions.
There is a need for librarians/archivists to conduct more outreach to researchers and to talk with them about preservation solutions, good data management practices and copyright.

Monday, November 23, 2015

Introduction to Metadata Power Tools for the Curious Beginner

Introduction to Metadata Power Tools for the Curious Beginner. Maureen Callahan, Regine Heberlein, Dallas Pillen. SAA Archives 2015. August 20, 2015.   PowerPoint  Google Doc 
      "At some point in his or her career, EVERY archivist will have to clean up messy data, a task which can be difficult and tedious without the right set of tools." A few notes from the excellent slides and document:

Basic Principles of Working with Power Tools
  • Create a Sandbox Environment: have backups. It is ok to break things
  • Think Algorithmically: Break a big problem down into smaller steps
  • Choose a Tool: The best tool is the one that works for your problem and skill set
  • Document: Successes, failures, procedures
Dare to Make Mistakes
  • as long as you know how to recognize and undo them!
  • view mistakes as an opportunity
  • mistakes can teach you as much about your data as about your tool
  • share your mistakes so others may benefit
  • realize that everybody makes them
General Principles
  • Know the applicable standards
  • Know your data
  • Know what you want
  • Normalize your data before you start a big project
  • The problem is intellectual, not technical
  • Use the tools available to you
  • Don’t do what a machine can do for you
  • Think about one-off operations vs. tools you might re-use or re-purpose
  • Think about learning tools in terms of raising the level of staff skill
Tools Covered
  • XPath
  • Regex
  • XQuery
  • XQuery Update
  • XSLT
  • batch
  • Linux command line
  • Python
  • AutoIt
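As a small illustration of the "normalize your data" principle using two of the tools listed (regex and Python), here is a minimal sketch that rewrites a few free-text date styles as ISO 8601. The input formats and the function name `normalize_date` are assumptions made for this example; they are not taken from the slides.

```python
import re
from typing import Optional

# Month abbreviations used to recognize spelled-out months (illustrative only).
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if the format is not recognized."""
    text = raw.strip()

    # Month-first styles, e.g. "August 20, 2015" or "Aug. 20 2015"
    m = re.fullmatch(r"([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2}),?\s+(\d{4})", text)
    if m and m.group(1).lower() in MONTHS:
        month = MONTHS[m.group(1).lower()]
        return f"{int(m.group(3)):04d}-{month:02d}-{int(m.group(2)):02d}"

    # Day-first styles, e.g. "20 November 2015"
    m = re.fullmatch(r"(\d{1,2})\s+([A-Za-z]{3})[a-z]*\.?\s+(\d{4})", text)
    if m and m.group(2).lower() in MONTHS:
        month = MONTHS[m.group(2).lower()]
        return f"{int(m.group(3)):04d}-{month:02d}-{int(m.group(1)):02d}"

    # Already normalized
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return text

    # Unrecognized values are returned as None so they can be reviewed by hand.
    return None
```

Returning `None` rather than guessing keeps the unrecognized values visible, which fits the advice to know your data and to be able to recognize and undo mistakes.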

The Provenance of Web Archives

The Provenance of Web Archives. Andy Jackson; Jason Webber. UK Web Archive blog. 20 November 2015.
     More researchers are taking an interest in web archives.  The post author says their archive has "tried to our best to capture as much of our own crawl context as we can." In addition to the WARC request and response records, they store other information that can answer how and why a particular resource has been archived:
  • links that the crawler found when it analysed each resource 
  • the full crawl log, which records DNS results and other situations
  • the crawler configuration, including seed lists, scope rules, exclusions etc.
  • the versions of the software we used  
  • rendered versions of original seeds and home pages  and associated metadata.
The archive doesn't "document every aspect of our curatorial decisions, e.g. precisely why we choose to pursue permissions to crawl specific sites that are not in the UK domain. Capturing every mistake, decision or rationale simply isn’t possible, and realistically we’re only going to record information when the process of doing so can be largely or completely automated". In the future, there "will be practical ways of summarizing provenance information in order to describe the systematic biases within web archive collections, but it’s going to take a while to work out how to do this, particularly if we want this to be something we can compare across different web archives."

No archive is perfect. They "can only come to be understood through use, and we must open up to and engage with researchers in order to discover what provenance we need and how our crawls and curation can be improved." There are problems that need to be documented, but researchers "can’t expect the archives to already know what they need to know, or to know exactly how these factors will influence your research questions."

Saturday, November 21, 2015

How Much Of The Internet Does The Wayback Machine Really Archive?

How Much Of The Internet Does The Wayback Machine Really Archive? Kalev Leetaru. Forbes.  November 16, 2015.
     "The Internet Archive turns 20 years old next year, having archived nearly two decades and 23 petabytes of the evolution of the World Wide Web. Yet, surprisingly little is known about what exactly is in the Archive’s vaunted Wayback Machine." The article looks at how the Internet Archive archives sites and suggests "that far greater understanding of the Internet Archive’s Wayback Machine is required before it can be used for robust reliable scholarly research on the evolution of the web." It requires a more "systematic assessment of the collection’s holdings." Archiving the open web uses enormous technical resources.

Maybe the important lesson to learn is that we have little understanding of what is actually in the data we use and few researchers really explore the questions about the data. The archival landscape of the Wayback Machine was far more complex than originally realized, and it is unclear how the Wayback Machine has been constructed. This insight is critical. "When archiving an infinite web with finite resources, countless decisions must be made as to which narrow slices of the web to preserve." The selection can be either random or prioritized by some element. Each approach has distinct benefits and risks.

Libraries have formalized over time how they make collection decisions. Web archives must adopt similar processes. The web is "disappearing before our very eyes", which can be seen in the fact that "up to 14% of all online news monitored by the GDELT Project is no longer accessible after two months". We must "do a better job of archiving the online world and do it before this material is lost forever."

Friday, November 20, 2015

Hydra: Get a head on your repository

Hydra: Get a head on your repository.  Hydra Project website. November 2015.
  • Hydra is a Repository Solution:  Hydra is an open source software repository solution used by institutions worldwide to provide access to their digital content.  Hydra software provides a versatile and feature rich environment for end-users and repository administrators.
  • Hydra is a Community: Hydra is a large, multi-institutional collaboration that gives institutions the ability to combine their repository development efforts into a collective solution beyond the capacity of any individual institution to create, maintain or enhance on its own. The project motto is “if you want to go fast, go alone.  If you want to go far, go together.”
  • Hydra is a Technical Framework: Hydra is an ecosystem of components that lets institutions build and deploy robust and durable digital repositories supporting multiple “heads”, which are fully-featured digital asset management applications and tailored workflows.  Its principal platforms are the Fedora Commons repository software, Solr, Ruby on Rails and Blacklight.  Hydra does not yet support “out-of-the-box” deployments but the Community is working towards such “solution bundles”, particularly “Hydra in a Box” and Avalon.

Developing Best Practices in Digital Library Assessment: Year One Update

Developing Best Practices in Digital Library Assessment: Year One Update. Joyce Chapman, Jody DeRidder, Santi Thompson. D-Lib Magazine. November 2015.
     While research and cultural institutions have increased focus on online access to special collections in the past decade, methods for assessing digital libraries have yet to be standardized. Because of limited resources and increasing demands for online access, assessment has become increasingly important. Library staff do not know how to begin to assess the costs, impact, use, and usability of digital libraries. The Digital Library Federation Assessment Interest Group is working to develop best practices and guidelines in digital library assessment. The definition of a digital library used is "the collections of digitized or digitally born items that are stored, managed, serviced, and preserved by libraries or cultural heritage institutions, excluding the digital content purchased from publishers."

They are considering two basic questions:
  1.     What strategic information do we need to collect to make intelligent decisions?
  2.     How can we best collect, analyze, and share that information effectively?
There are no "standardized criteria for digital library evaluation. Several efforts that are devoted to developing digital library metrics have not produced, as yet, generalizable and accepted metrics, some of which may be used for evaluation. Thus, evaluators have chosen their own evaluation criteria as they went along. As a result, criteria for digital library evaluation fluctuate widely from effort to effort." Not much has changed in the last 10 years with regard to digitized primary source materials and institutional repositories. "Development of best practices and guidelines requires a concerted engagement of the community to whom the outcome matters most: those who develop and support digital libraries". The article shares "what progress we have made to date, as well as to increase awareness of this issue and solicit participation in an evolving effort to develop viable solutions."

Thursday, November 19, 2015

Old formats, new challenges: preservation in the digital world

Old formats, new challenges: preservation in the digital world. Kevin Bunch. C & G News. November 13, 2015.
     Without proper preservation, digital materials are going to degrade and become useless. Digital preservation is "basically coming up with policies and procedures to address mostly the obsolescence that happens with digital content. We know file formats die, we know operating systems and platforms die at some point, so how do we sustain this digital content through time?" In addition to hardware and media failing, there are also difficulties in reading old formats. Archivists generally try to convert files from an old format to an “open” format that will hopefully be in use for some time into the future. Some people work at converting analog media, like audio and video recordings, to open digital formats. It can be challenging as older equipment is outdated and fails. Analog magnetic media formats like VHS and audio cassettes are also "at an ever-increasing risk of deterioration, especially those from the 1980s or 1990s, and should be digitized as soon as possible".

Wednesday, November 18, 2015

iPRES workshop report: Using Open-Source Tools to Fulfill Digital Preservation Requirements

iPRES workshop report: Using Open-Source Tools to Fulfill Digital Preservation Requirements.  Jenny Mitcham. Digital Archiving at the University of York. 12 November 2015.
     The ‘Using Open-Source Tools to Fulfill Digital Preservation Requirements’ workshop provided a place to talk about open-source software and share experiences about implementing open-source solutions. Archivematica, ArchivesSpace, Islandora and BitCurator (and BitCurator Access) were also discussed.

Sam Meister of the Educopia Institute talked about a project proposal called OSSArcFlow. "This project will attempt to help institutions combine open source tools in order to meet their institutional needs. It will look at issues such as how systems can be combined and how integration and hand-offs (such as transfer of metadata) can be successfully established". The lessons learned (including workflow models, guidance and training) will be available to others besides the 11 partners. 

Digital Preservation Videos for the Classroom

Back to School: Digital Preservation Videos for the Classroom. Erin Engle. The Signal, Library of Congress. August 30, 2013.
     Some educational programs geared toward students have been created, along with material about the K-12 Web Archiving Program. There is also a Digital Preservation Video Series; the post lists the videos that educators may find most relevant.

Tuesday, November 17, 2015

Born Digital: Guidance for Donors, Dealers, and Archival Repositories

Born Digital: Guidance for Donors, Dealers, and Archival Repositories. Gabriela Redwine, et al. Council on Library and Information Resources. October 2013. [PDF]
     "Until recently, digital media and files have been included in archival acquisitions largely as an afterthought." People may not have understood how to deal with digital materials, or staff may not be prepared to manage digital acquisitions. The objective is to offer guidance to rare book and manuscript dealers, donors, repository staff, and other custodians to help ensure that digital materials are handled and documented appropriately and arrive at repositories in good condition. Each section provides recommendations for donors, dealers, and repository staff.

The sections of the report cover:
  • Initial Collection Review
  • Privacy and Intellectual Property
  • Key Stages in Acquiring Digital Materials
  • Post-Acquisition Review by the Repository
  • Appendices, which include: 
    • Potential Staffing Activities for the Repository
    • Preparing for the Unexpected: Recommendations
    • Checklist of Recommendations for Donors and Dealers, and Repositories
Some thoughts and quotes from the report:
  • it is vital to convince all parties to be mindful of how they handle, document, ship, and receive digital media and files.
  • Early communication also helps repository staff take preliminary steps to ensure the archival and file integrity, as well as the usability of digital materials over time.
  • A repository’s assessment criteria may include technical characteristics, nature of the relationship between born-digital and paper materials within a collection, information about context and content, possible transfer options, and particular preservation challenges.
  • Understand if there is a possibility that the digital records include the intellectual property of people besides the creator or donor of the materials.
  • Clarify in writing what digital materials will be transferred by a donor to a repository (e.g., hard drives, disks, e-mail archives, websites)
  • It is strongly recommended that donors and dealers seek the guidance of archival repositories before any transfer takes place.
  • To avoid changing the content, formatting, and metadata associated with the files, repositories must establish clear protocols for the staff’s handling of these materials.
The good practices in this report can help reduce archival problems with digital materials. "Early archival intervention in records and information management will help shape the impact on archives of user and donor idiosyncrasies around file management and data backup."

Monday, November 16, 2015

Fixity Architecting for Integrity

Fixity Architecting for Integrity. Scott Rife, Library of Congress, presentation. Designing Storage Architectures for Digital Collections 2015. September 2015. [PDF]
     The Problem: “This is an Archive. We can’t afford to lose anything!” They are custodians of the history of the United States and do not want to consider that the loss of data is likely to happen. The current solutions:
  • At least 2 copies of everything digital
  • Test and monitor for failures or errors
  • Refresh the damaged copy from the good copy
  • This process must be as automated as possible
  • Recognize that someday data loss will occur
Fixity is the process of verifying that a digital object has not been altered or corrupted. It is a function of the whole architecture of Archive/Long Term Storage (hardware, software, network, processes, people, budget)
What costs are reasonable to reduce the loss of data?
Need to understand the possible solutions. How much more secure will our customers' content be if:
  • There is a third, fourth or fifth copy?
  • All content is verified once a year versus every 5 years?
  • More money is spent on higher quality storage?
  • More staff are hired?
RAID, erasure encoding, is at risk due to larger disk sizes. With storage, there is a wide variation in price, performance and reliability. Performance and reliability are not always correlated with price. Choose hardware combinations to limit likely failures based on your duty cycle.
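The "test and monitor" and "refresh the damaged copy from the good copy" steps above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses SHA-256 as the checksum, assumes the expected digest was recorded at ingest, and the function names `sha256_of` and `check_and_repair` are hypothetical. It does not describe the Library of Congress implementation.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_and_repair(copy_a: Path, copy_b: Path, expected: str) -> str:
    """Verify two copies against the recorded digest.

    Returns "ok" if both copies match, "repaired" if one damaged copy was
    refreshed from the good one, and "lost" if neither copy matches.
    """
    a_good = sha256_of(copy_a) == expected
    b_good = sha256_of(copy_b) == expected
    if a_good and b_good:
        return "ok"
    if a_good:
        shutil.copyfile(copy_a, copy_b)  # refresh the damaged copy from the good one
        return "repaired"
    if b_good:
        shutil.copyfile(copy_b, copy_a)
        return "repaired"
    return "lost"  # both copies fail: data loss has occurred
```

The "lost" branch is the case that a third or fourth copy, or more frequent verification, is meant to make less likely.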

Background reading list for Designing Storage Architectures for Digital Collections

Background reading list. Designing Storage Architectures for Digital Collections. Library of Congress. September 9, 2015.
     A list of items that may be representative of materials and projects related to the meeting topics and that may provide useful context for them.

Friday, November 13, 2015

Alternatives for Long-Term Storage Of Digital Information

Alternatives for Long-Term Storage Of  Digital Information. Chris Erickson, Barry Lunt. iPres 2015. November 2015.   Poster  Abstract
     This is the poster and abstract that Dr. Lunt and I created and presented at iPres 2015. The most fundamental component of digital preservation is storing the digital objects in archival repositories. Preservation repositories must archive digital objects and associated metadata on an affordable and reliable type of digital storage. There are many storage options available; each institution should evaluate the available storage options in order to determine which options are best for their particular needs. This poster examines three criteria in order to help preservationists determine the best storage option for their institution:
  1. Cost
  2. Longevity
  3. Migration Time frame
Each institution may have different storage policies and environments. Not every situation will be the same. By considering the criteria above (the storage costs, the average lifespan of the media and the migration time frame), institutions can make a more informed choice about their archival digital storage environment. The poster has more recent cost information than what is in the abstract.
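To show how the three criteria interact, here is a minimal sketch of a cost comparison over a planning horizon. Everything in it is an assumption made for illustration: the cost model, the class and function names, and any numbers passed in are hypothetical and are not the figures from the poster or abstract.

```python
import math
from dataclasses import dataclass

@dataclass
class StorageOption:
    name: str
    cost_per_tb: float            # media purchase cost per TB (hypothetical)
    migration_years: int          # migration time frame for this medium
    migration_cost_per_tb: float  # labor and handling per migration (hypothetical)

def total_cost(option: StorageOption, terabytes: float, horizon_years: int) -> float:
    """Estimate the cost of keeping `terabytes` on this option for `horizon_years`.

    Simplified model: media is bought once at the start and again at every
    migration that falls inside the horizon; each migration also incurs a
    per-TB handling cost. Power, staffing and price changes are ignored.
    """
    purchases = math.ceil(horizon_years / option.migration_years)
    migrations = purchases - 1
    per_tb = purchases * option.cost_per_tb + migrations * option.migration_cost_per_tb
    return terabytes * per_tb

# Usage with made-up numbers: a cheap medium migrated every 5 years
# versus a more expensive medium migrated every 25 years.
short_lived = StorageOption("short-lived media", 30.0, 5, 10.0)
long_lived = StorageOption("long-lived media", 100.0, 25, 10.0)
costs = {o.name: total_cost(o, terabytes=10, horizon_years=50)
         for o in (short_lived, long_lived)}
```

Even this simple model shows why purchase price alone is a poor guide: a medium that costs more per TB can still be cheaper over the horizon if it needs far fewer migrations.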

Thursday, November 12, 2015

Digital Curation Decision Form

Digital Curation Decision Form. Chris Erickson. Harold B. Lee Library. November 13, 2015.
Latest version is found here: Policies and Procedures
     This is the former version of our Digital Curation Decision Form. The form is used by subject specialists (curators, subject librarians, or faculty responsible for collections) to determine
  • what materials should be included in our Rosetta Digital Archive; 
  • whether additional copies are needed, including copies on M-Discs; and 
  • whether or not the digital collection is a preservation priority. 
Additional questions ask about access to the preservation copies; the preservation actions needed; and directions on content options if format migration is needed. The form was created to help subject specialists determine what should be preserved, even if they are unaware of digital preservation topics. In practice, we complete the form during an interview with new subject specialists. Documentation will be added when the final version is approved.

Monday, November 09, 2015

Web Archiving Questions for the Smithsonian Institution Archives

Five Questions for the Smithsonian Institution Archives’ Lynda Schmitz Fuhrig. Erin Engle. The Signal. October 6, 2015.   
     An article about the Smithsonian Institution Archives and how it archives the Institution's own websites. Many of the sites contain significant content of historical and research value that is now not found elsewhere. These are considered records of the Institution that evolve over time, and they consider that it would be irresponsible as an archives to rely only upon other organizations to archive the websites. They use Archive-It to capture most of these sites and they retain copies of the files in their collections. Other tools are used to capture specific tweets or hashtags or sites that are a little more challenging due to the site construction and the dynamic nature of social media content.

Public-facing websites are usually captured every 12 to 18 months, though it may happen more frequently if a redesign is happening, in which case the archiving will happen before and after the update. An archivist appraises the content on the social media sites to determine if it has been replicated and captured elsewhere.

The network servers at the Smithsonian are backed up, but that is not the same as archiving. Web crawls provide a snapshot in time of the look and feel of a website. "Backups serve the purpose of having duplicate files to rely upon due to disaster or failure" and are only saved for a certain time period. The archived websites are kept permanently. Typically, website captures may not have everything because of excluded content, blocked content, or dynamic content such as Flash elements or calendars that are generated by databases. Capturing the web is not perfect.

Monday, November 02, 2015

Emulation as a Tool. What Can Emulation Do for You?

Emulation as a Tool. What Can Emulation Do for You? Dr. Klaus Rechert. CurateGear 2015. January 7, 2015.
     Emulation can be used as a tool for:
  • Contextualization. To identify, describe and preserve object environments
  • Generalization. To allow the environment to be run everywhere
  • Preservation Planning. Prepare environments to run long term
  • Publication & Access. Provide citation of objects in context; allow reuse
Emulation as a Service (EaaS)
  • Encapsulation of different emulators and technology into a common component
  • Centralize technical services
  • Hide technical complexity of emulation through web interfaces
  • Browser-based access
Preservation of and access to inherited personal digital assets
  • Provides citation support
  • Available with simple browser-based access 
  • Make emulated content embeddable and shareable like YouTube videos

The Shanghai Library Selects Ex Libris Rosetta

The Shanghai Library Selects Ex Libris Rosetta. Press release. Ex Libris. November 2, 2015.
     The Shanghai Library, the second largest library in China and one of the world’s largest public libraries, chose Rosetta to manage and preserve its vast collection of digitized records such as ancient books, sound recordings, manuscripts, genealogy resources, archives (such as the Sheng Xuanhuai Archives), books and journals published in the Republic period, and the North China Daily News. Rosetta’s support for multiple languages and its customized Chinese interface will enable library staff to deposit diverse content into the system and expose a wide range of rich Chinese heritage to the world. "Rosetta was the only solution on the market that supports the whole spectrum of digital asset management and preservation, from ingest and export, to collection management and publishing."

Research data management: A case study

Research data management:  A case study. Gary Brewerton. Ariadne, 74. October 12, 2015.
Loughborough University faced a number of challenges in meeting the expectations of its research funders, especially in three areas:
  • publishing the metadata describing the research data that it holds
  • where appropriate providing access to the research data
  • preserving the research data for at least ten years since last accessed
They did a survey of their research groups to determine existing data management practices and storage requirements. The data could take a variety of formats and vary dramatically in size. Also, not all the data collected by the researchers would need to be preserved. This made it hard to predict the amount of storage needed. Instead of using the existing institutional repository, they looked at possible archiving and discovery solutions and decided on two:
  • Arkivum: a digital archiving service guaranteeing long-term preservation of data deposited
  • figshare: a cloud-based research sharing repository
Each of these answered a different need: "Arkivum could provide the storage and preservation required, whilst figshare addressed the light-touch deposit process and discoverability of the research data." Both suppliers were asked to work together to develop a platform to meet all the University’s needs, and a two-tier implementation followed. Faculty reaction to the platform's interface and deposit workflow has been very positive. It "remains to be seen how researchers will engage with the platform in the mid- to long- term, but it is clear that advocacy will need to remain an ongoing process if the platform is going to achieve continued success."

Saturday, October 31, 2015

This is how we wash ... discs!

Så gør vi sådan, når vi vasker… plader! Digitalbevaring.dk. October, 2015.
     The Danish State Library has started to wash a large part of their old records so they can be digitized. Some can be cleaned by just wiping off the dust with a dry brush but others need a turn in the discwasher. This is a machine that looks like a turntable but instead of a needle has a small vacuum system that sucks water from the record. Another article on this, Disc washing at the library, has more information with images of the process of digitizing the 78 rpm record discs. There are about 37,000 Danish shellac discs (78 rpm records) and the audio engineer digitizes about 10 to 12 discs a day. The process is to register the items to be digitized and check the condition of each disc, after which the discs are washed and cleaned. About three or four discs can be washed per hour. In the digitization room are the recording machine and an audio engineer, who listens to the record and, based on what he hears, chooses one of four different pickup needles for the final digitization.

Friday, October 30, 2015

Vital information could be lost in 'digital dark age' warns professor

Vital information could be lost in 'digital dark age' warns professor. Sarah Knapton. The Telegraph. 11 Oct 2015.
     Professor David Garner, former president of the Royal Society of Chemistry, said that the world faces an information 'dark age' because so much information is stored digitally, and that "wherever possible, scientific data should be printed out and kept in paper archives to avoid crucial research being lost to future generations." Other quotes from the article are:
  • “Digital storage is great and has put knowledge in an instantly accessible form, but things really need to be backed up in paper formats as well. In my own lifetime I have experienced not being able to access information any longer because the formats are now out of date. I am not a luddite, and I think the internet is fantastic. But while it’s great to have a Plan A, we really need to have a Plan B. It’s really important that we have accessible paper archives. We risk a lot of information being lost without adequate paper copies."
  • Digital materials are especially vulnerable to loss and destruction because they are stored on fragile magnetic and optical media which can deteriorate and can easily be damaged by exposure to heat, humidity, and short circuits.
  • While a book can be left on a shelf for hundreds of years with little damage, information can suffer ‘bit rot’ where it can no longer be accessed. And opening each file manually to save it in a readable form would never be possible.
  • "Long term accessibility of data was not really taken into account in the 1980s and 1990s in the way it is now and I am delighted that there are a number of initiatives underway for the long term preservation of digital data," added Prof Garner.

Wednesday, October 28, 2015

DPOE Plants Seed for Statewide Digital Preservation Effort in California

DPOE Plants Seed for Statewide Digital Preservation Effort in California. Barrie Howard. The Signal. October 9, 2015.
     The Library of Congress partnered with the State Library of California to host a three-and-a-half day workshop to increase the knowledge and skills of those providing long-term access to digital content. The California Preservation Program provides consultation, information resources, and preservation services to archives, historical societies, libraries, and museums across the state. They want to help librarians, archivists, and museum curators educate others and advocate for statewide digital preservation services. The state's smaller memory institutions need help with digital preservation. The workshop helped participants think about how to work across jurisdictional and organizational boundaries to meet the needs of all state cultural heritage institutions, especially small organizations with very few staff.

Saturday, October 24, 2015

The National Film Board’s CTO offers a close-up look at its digital archiving project

The National Film Board’s CTO offers a close-up look at its digital archiving project. Shane Schick. IT World Canada. October 16, 2015.
     The Canadian National Film Board has been putting together the technology, processes and policies to change the way films are produced, collected and stored. The NFB collection needs a particular set of metadata because of the versions produced.

Archiving digital content is an ongoing challenge for many organizations because of the volume of content and also "the fact that formats change, and ensuring the long-term accessibility and quality can be uncertain". The organization tries to stay ahead of the difficulties by adhering to four ‘golden rules’ of archiving. These include:
  1. There must be a process to continually check the integrity of the data which has been stored.
  2. Open file formats should be used whenever possible, in order to avoid frequent data migrations.
  3. Obsolescence of the storage hardware should be assumed as inevitable.
  4. Two copies of all content or media assets should be maintained on different technologies, in different locations, which is the "most critical" part.
While LTO tapes are often used in the industry, the organization uses ASG’s Digital Archive (based on Sony’s Optical Disk Array). These discs have a 50 year life expectancy. They still use LTO for backup, but now they have the optical element that they can go back to. “The archiving system allowed us to think beyond the film.”  The new way of thinking is very open. "We can ingest content as we produce it."  

Friday, October 23, 2015

Metadata for your Digital Collections

Metadata for your Digital Collections. Jenn Riley. Indiana Cooperative Library Services Authority. March 6, 2007.
    A slideshow about metadata that I came across while preparing a presentation. A summary:
There are many definitions of metadata; generally it can be defined as structured information about an information resource. The presentation looks at the uses, structure and types of metadata:
  • Descriptive metadata
  • Technical metadata
  • Preservation metadata
  • Rights metadata
  • Structural metadata
Each of the various metadata standards has its own structures, values, benefits, and limitations, including:
  • Dublin Core, limited by its inability to "provide robust record relationships".
  • Qualified Dublin Core
  • MARC
  • MARCXML, "the exact structure of MARC21 in an XML syntax"
  • MODS, "'MARC-like' but intended to be simpler"
  • Others include Visual Resources Association Core, TEI, EAD, FRBR.
The standards are important now because they will help in migrating to other systems later and the collections will be more inter-operable. Good digital collections are:
  • Inter-operable, shareable and searchable
  • Persistent
  • Re-usable for multiple purposes
It notes that "good metadata promotes good digital collections". To share the metadata it needs to be prepared to map across other formats and systems. A map or 'crosswalk' can be created to do this. It is "good practice to create and store most robust metadata format possible." You need to find the right balance for your metadata. Good shareable metadata should involve:
  • content
  • consistency
  • coherence
  • context
  • communication
  • conformance
That is what the standards help to do.
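The 'crosswalk' idea mentioned above can be illustrated with a short sketch that maps a simple Dublin Core record onto MODS element paths. The mapping table is an illustrative subset written for this example, with several simplifications (for instance, all dates go to `dateIssued`); it is an assumption, not the mapping from the slides, and a production crosswalk should follow the published Dublin Core to MODS mapping.

```python
from typing import Dict, List, Tuple

# Illustrative subset of a Dublin Core -> MODS crosswalk (simplified).
DC_TO_MODS: Dict[str, str] = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "subject": "subject/topic",
    "description": "abstract",
    "publisher": "originInfo/publisher",
    "date": "originInfo/dateIssued",
    "identifier": "identifier",
    "language": "language/languageTerm",
    "rights": "accessCondition",
}

def crosswalk(dc_record: Dict[str, List[str]]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map a Dublin Core record to MODS paths.

    Returns the mapped values plus a list of source elements that have no
    entry in the table, so nothing is dropped silently.
    """
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for element, values in dc_record.items():
        target = DC_TO_MODS.get(element)
        if target is None:
            unmapped.append(element)
            continue
        mapped.setdefault(target, []).extend(values)
    return mapped, unmapped
```

Reporting the unmapped elements makes the loss in a crosswalk explicit, which is one reason for storing the most robust metadata format available and mapping down from it.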

Thursday, October 22, 2015

Preparing for format migration

Preparing for format migration. Chris Erickson. Presentation to the Utah State Archives fall conference. October 22, 2015. [PDF presentation]
     The presentation begins with terms and definitions of digital preservation, obsolescence, fixity, migration, refreshing, and formats. Formats include hardware, software, media, and systems. The purpose of migration is:
  1. Avoid media failure
  2. Avoid obsolescence
  3. Benefit from new technologies
The goal of migration is to change the object to deal with software and hardware developments but not affect the original representation. There are some cautions (cited):
  • “Data migration success rates are never 100%”
  • Successive storage/migration cycles accumulate failures, data corruption and loss.
  • Even if data migration is flawless, repeated migrations will take their toll on the data: “the nearly universal experience has been that migration is labor-intensive, time-consuming, expensive, error-prone, and fraught with the danger of losing or corrupting information.”
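To get a feel for how losses accumulate over successive cycles, here is a small sketch. It assumes every cycle independently preserves the same fraction of objects, which is a simplification, and the function name and any rates passed to it are hypothetical rather than figures from the presentation.

```python
def surviving_fraction(per_cycle_success, cycles):
    """Expected fraction of objects still intact after the given number of
    migration cycles, assuming each cycle succeeds independently at the same rate."""
    if not 0.0 <= per_cycle_success <= 1.0:
        raise ValueError("per_cycle_success must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must not be negative")
    return per_cycle_success ** cycles


# Illustrative rate only: 99% success per cycle over ten cycles.
after_ten = surviving_fraction(0.99, 10)
```

Under these assumptions, even a 99% success rate per cycle leaves only about 90% of the objects intact after ten cycles.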
The presentation provides an overview of creating a migration plan, advance preparations and follow-up actions. Some of the issues are from my personal data migrations, as well as corporate examples. In the end, it is important to clearly understand what you have and what you need to do, then to start, even if it is a small step.
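One small step of that kind is a fixity check around a refresh, where files are copied to new media without changing their bits. The sketch below is an assumption about how such a check might look using Python's standard `hashlib`; the function names are hypothetical and it is not taken from the presentation. It does not apply directly to a format migration, where the bits change by design and a different kind of validation is needed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(before, after):
    """Return (missing, changed): files absent from the copy, and files
    present in both whose checksums differ."""
    missing = sorted(set(before) - set(after))
    changed = sorted(
        name for name in set(before) & set(after) if before[name] != after[name]
    )
    return missing, changed
```

A manifest built before the copy and compared against one built afterwards shows which files went missing or were altered along the way.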

Wednesday, October 21, 2015

DPC invites members to review the OAIS Standard

DPC invites members to review the OAIS Standard. William Kilbride, et al. Digital Preservation Coalition,  Open Preservation Foundation. October 21, 2015.
"DPC is delighted to welcome members to participate in the review of OAIS, work that will hold our interest for a couple of years and which we aim to build into a platform for collaboration among our diverse members in the future.

The OAIS standard published by both the Consultative Committee for Space Data Systems (CCSDS) and as ISO14721 has been highly influential in the development of digital preservation. As a reference model it provides a common basis for aligning disparate practice in diverse institutional settings. A range of standards have emerged around and related to OAIS including PREMIS (for preservation metadata), ISO16363 (for certification) and PAIMAS (for exchange between Producers and Archives).

Since OAIS was initially proposed the digital preservation community has grown tremendously in absolute numbers and in diversity. OAIS adoption has expanded far beyond the space data community to include cultural heritage, research data centers, commerce, industry and government.

The digital preservation community has – we have! – a responsibility to keep our standards relevant. The upcoming ISO review of the OAIS standard in 2017 offers a chance for a cooperative, transparent review process. It also creates an opportunity for further community building around OAIS and related initiatives.

"The outcome from this activity is not simply a wiki nor is it a set of recommendations. By providing a shared open platform for the community that gathers around the OAIS we aim to ensure on-going dialogue about our standards and their implementation in the future.
In this sense the 2017 review is a milestone on the way to an engaged and empowered community rather than a destination.
  • OAIS Community forum via a wiki: Your feedback and the discussions on this wiki will provide raw material for an editorial committee of the most active participants to formulate recommendations which will result in a formal submission to the 2017 review. So sign in and add your views!
  • Exploring official mechanisms: Official mechanisms for the review of ISO standards are well established via National Standard Bodies and these will be explored and used to give input for the review.
  • Active Interaction: Ensuring inclusion for this large, diverse community will mean collaborative virtual meetings are necessary but we all recognize the value of meeting face to face and will seek to enable this.
Join the community and contribute your views on the wiki here: http://wiki.dpconline.org/index.php?title=OAIS_Community

University of Alabama at Birmingham Selects Ex Libris Rosetta for Digital Preservation

The University of Alabama at Birmingham Opts for Ex Libris Alma, Primo, and Rosetta Solutions. News Release. Ex Libris. October 21, 2015.
     The University of Alabama at Birmingham (UAB) has selected several Ex Libris products, including the Rosetta digital asset management and preservation solution. "Rosetta’s end-to-end digital asset infrastructure will preserve digital resources at both libraries and keep such resources accessible for future generations.... We are acquiring Rosetta to support preservation for UAB’s digital assets, ranging from institutional memory to research data."

Tuesday, October 20, 2015

Faculty receive digital preservation grant for statewide project

Faculty receive digital preservation grant for statewide project. Press Release. Indiana State University. October 15, 2015.
     Faculty members in the College of Education received an IMLS Library Services and Technology Act grant to partner with the Indiana State Library to establish the Indiana Memory Digital Preservation Collaborative. The collaborative is a statewide initiative to provide an affordable and sustainable digital preservation solution for small to mid-sized cultural heritage organizations that lack the necessary resources to manage the digital files in their collections. The collaborative will join the MetaArchive Cooperative. The grant funding will be used for education, hardware and data preparation.

Monday, October 19, 2015

Published Preservation Policies

Published Preservation Policies. Carl Wilson, Barbara Sierman. Scape. Aug 11, 2015.
    The SCAPE project gathered a number of policies "concerning the creation of the Policy Framework". Other sources, such as a report in the Signal, Analysis of Current Digital Preservation Policies: Archives, Libraries and Museums, by Madeline Sheldon, were helpful when creating this overview of published preservation policies. The policies are divided into four categories:
  • Libraries
  • Archives
  • Data Centers
  • Miscellaneous
These policies are not all "preservation policies" and may be published under different headings.

Google Drivageddon and Docsapocalypse are here: Why I’m typing this in Microsoft Word

Google Drivageddon and Docsapocalypse are here: Why I’m typing this in Microsoft Word. John Brandon. Computerworld.
     Article about a writer unable to access documents on the Internet when there was an outage on Google. These outages may seem minor, but they are not if you cannot access the content you need.  The author also had "some of my documents in a backup saved to a hard drive, which is a good thing. What’s not a good thing? Having a total lack of control over the situation. None." But this outage is a wake-up call to save every document, not just a few, to another place. "It’s time to not put every egg in the Google basket."
[A good example of why we need multiple managed copies. -cle]

Saturday, October 17, 2015

Flooding Threatens The Times’s Picture Archive

Flooding Threatens The Times’s Picture Archive. David W. Dunlap. New York Times. October 12, 2015.
     A broken pipe sent water cascading into the storage area where The Times keeps its collection of historical photos, newspaper clippings, microfilm records, books and other archival material. About 90 percent of the affected photos would be salvageable, but how many were lost remains unknown. The card catalog was not damaged; otherwise it would be impossible to locate materials in the archive. "What makes the card catalog irreplaceable is that it has never been digitized. Hundreds of thousands of people and subjects are keyed by index numbers to the photo files, which contain an estimated six million prints and contact sheets." This "raised the question of how in the digital age... can some of the company’s most precious physical assets and intellectual property be safely and reasonably stored?"

Tuesday, October 13, 2015

Presentations from Library of Congress Storage Architectures Symposium 2015

Presentations from Library of Congress Storage Architectures Symposium 2015. Clifford Lynch. CNI. October 12, 2015. [PDF files]
     The presentations from the Library of Congress 2015 Symposium on Storage Architectures for Digital Collections are now available. The presentations during the symposium include:
  • Technology Overview of Library of Congress Storage Architectures and also Industry
  • Technical Presentations: Tape Futures, Object Storage, Fixity and Integrity
  • Community Presentations
  • Alternative Media Presentations: Digital Optical, DNA
  • Look Back/Future Predictions of Storage

Monday, October 12, 2015

Digital Curation as Journalism

Digital Curation as Journalism. Online Journalism 2. September 28, 2015.
     An interesting perspective:  
Curate – To gather, source, verify and redistribute information or social media elements to track an event. If done well, it can make sense of chaos and create a narrative of an event.  “I think curation has always been part of journalism; we just didn’t call it that.” – Andy Carvin.

Social Media Usage: 2005-2015

Social Media Usage: 2005-2015. Andrew Perrin. Pew Research Center. October 8, 2015.
     Results of a report on social network usage statistics. "Nearly two-thirds of American adults (65%) use social networking sites, up from 7% when Pew Research Center began systematically tracking social media usage in 2005".  The figures reported here are for social media usage among all adults, not just among those Americans who are internet users.
  • Age differences: 
    • 90% of young adults use social media
    • 35% of all those 65 and older report using social media 
  • Gender differences: 
    • 68% of all women use social media
    • 62% of all men use social media
  • Socio-economic differences: 
    • Those in higher-income households were more likely to use social media. 
    • Over 56% of the lowest-income households now use social media. 
    • Those with college experience are more likely to use social media than those with a high school degree or less.
  • Racial and ethnic similarities: There are no notable differences by racial or ethnic group: 
    • 65% of whites, 65% of Hispanics and 56% of African-Americans use social media today.
  • Community differences: 
    • Today, 58% of rural residents, 68% of suburban residents, and 64% of urban residents use social media.

Saturday, October 10, 2015

Software benchmark initiatives

Software benchmarks in digital preservation: Do we need them? Can we have them? How do we get them? Kresimir Duretec. Open Preservation Foundation Blog. 9th Oct 2015.
     Blog post that addresses the need for improving software evaluations in digital preservation. "A significant part of the work in digital preservation field is dependent on various software tools." Achievements have been made in various areas of digital preservation, but it is quite hard to quantify how successful they have been. The lack of demonstrated evidence is an important research challenge to be addressed. The BenchmarkDP project explores improving software evaluations in the digital preservation field with software benchmarks, as discussed in their paper. Two initiatives have been started:
  1. A Benchmarking forum at this year’s IPRES conference to discuss possible scenarios which are in need of proper benchmarks. 
  2. A short consultation to gather more information around current practices in software evaluation.
These initiatives should be a good starting point for wider community involvement and a better understanding of software evaluation needs in the digital preservation field.

Friday, October 09, 2015

Questions to ask when you learn of digitization projects

Questions to ask when you learn of digitization projects. Sarah Werner. Wynken de Worde. 6 October 2015.
    When we hear about new digitization projects, it may be helpful to ask some questions:
  1. Who financially benefits from such agreements? Sometimes researchers forget that the primary purpose of commercial digitization projects "isn’t to enable access to cultural heritage materials but to make money. And cultural heritage institutions have not always prioritized open access to their collections over monetizing them, either."
  2. Who is going to have access to the resulting images? In commercial projects the results "are typically limited to institutions who can pay to subscribe to the commercial database". 
  3. Who is not going to have access to the images? It is important to realize who will be excluded from such projects.  
  4. What will you be able to do with the resulting images? "Most commercial databases retain copyright over their digitized products and do not license them beyond personal use".
  5. How will this impact the ability of researchers to access the original documents? "If you are a holding institution that will be restricting access to your newly digitized collection, will you help fund scholars to come use your database if their institution doesn’t subscribe to it?"
Without knowing these kinds of details we won't know if these enterprises are "good or bad things". The projects can be expensive, and balancing the access and the cost can be complicated. "But researchers and librarians should ask themselves this list of questions before cheerleading announcements." How will we support institutions in order to "create high quality digitizations without selling our cultural heritage to the highest bidder?"