Showing posts with label cloud. Show all posts

Thursday, October 27, 2016

Exit Strategies and Techniques for Cloud-based Preservation Services

Exit Strategies and Techniques for Cloud-based Preservation Services. Matthew Addis. iPres 2016. (Proceedings p. 276-7/ PDF p. 139).
   This poster discusses the need for an exit strategy for organisations that use cloud-based preservation services, and the need to understand what is involved in migrating to or from a cloud-hosted service. It looks specifically at Arkivum and Archivematica. Topics include contractual agreements, data escrow, open source software licensing, use of independent third-party providers, and tested processes and procedures to mitigate risks. The top two issues are
  1. the need for an exit strategy when using a cloud preservation service, and
  2. the need to establish trust and perform checks on the quality of the service
It mentions that “full support for migrating between preservation environments has yet to be implemented in a production preservation service.” The approach used in the poster includes:
  • Data escrow
  • Log files of the software versions and updates
  • Ability to export database and configuration
  • Ability to test a migration
 It is important to remember in a migration test that “production pipelines may contain substantial amounts of data and hence doing actual migration tests of the whole service on a regular basis will typically not be practical”. “Hosted preservation services offer many benefits but their adoption can be hampered by concerns over vendor lock-in and inability to migrate away from the service, i.e. lack of exit-plan.”
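
Because migration tests of a whole service are rarely practical, one option is to test a random sample of objects instead. The sketch below is a minimal illustration of that idea, not the method used by Arkivum or Archivematica; the manifest structure (object identifier mapped to a SHA-256 digest) and the function name `sample_migration_check` are assumptions made for this example.

```python
import hashlib
import random


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def sample_migration_check(source, target, sample_size, seed=None):
    """Compare a random sample of objects between two environments.

    source and target map object identifiers to checksums (hypothetical
    manifests exported from each service). Returns the identifiers in the
    sample that are missing from the target or whose checksums differ.
    """
    rng = random.Random(seed)
    ids = sorted(source)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    return [oid for oid in sample if target.get(oid) != source[oid]]


# Hypothetical manifests: one object was corrupted during migration.
source = {f"obj-{i}": sha256_of(f"payload {i}".encode()) for i in range(100)}
target = dict(source)
target["obj-7"] = sha256_of(b"corrupted")

# Sampling every object is equivalent to a full comparison.
failures = sample_migration_check(source, target, sample_size=100, seed=1)
```

A smaller `sample_size` trades certainty for speed: a clean sample does not prove that the whole migration succeeded, so the sample size should reflect how much risk is acceptable.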

Monday, October 10, 2016

Secure cloud doesn’t always mean your stuff in it is secure too

Secure cloud doesn’t always mean your stuff in it is secure too. Gareth Corfield. The Register. 6 Oct 2016.
     Workflows are moving to the cloud and security technology is helping to build customer confidence. “Picking a secure cloud partner is not as trivial as it may seem. Don't assume that because the cloud is secure, your business within the cloud is secure.” The public cloud can provide better security monitoring and analysis, management, redundancy and resilience. But you have to choose a secure cloud platform. Microsegmentation can help secure the platform against malware and other security threats. It also helps to improve operational efficiency. The cloud provides many services, more than just storage.

Thursday, September 22, 2016

Content Delivery Drives The Move To The Cloud

Content Delivery Drives The Move To The Cloud. Tom Coughlin. Forbes. Sep 13, 2016.
     The growing reliance on the Internet is increasing the use of cloud-based services for collaborative workflows and content delivery in the Media and Entertainment Industry. This is causing a shift from capital expenses to operating expenses for media and entertainment content storage. Cloud storage for the media and entertainment industry is projected to grow from $2.5 billion in 2016 to over $20 billion by 2021. Archiving and preservation is a large part of this, as shown in the accompanying chart.

Monday, February 08, 2016

New digital preservation solution from Arkivum

New digital preservation solution from Arkivum, shaped to grow with your data. Nik Stanbridge. Arkivum Press release. January 21, 2016.
     Arkivum is launching a new cloud-based digital preservation and archiving service with Artefactual Systems Inc. of Vancouver. "Arkivum/Perpetua is a cost-effective, comprehensive, fully hosted and managed digital preservation and public access solution that uses Archivematica and AtoM (Access to Memory) services in the cloud."

In a survey of archivists and data curators, 87% said "file format preservation and data integrity were important elements to their digital preservation workflow. And a third of respondents stated that they would be using a cloud-based solution for their digital preservation data."

Saturday, February 06, 2016

MRF for large images

MRF for large images. Gary McGath. Mad File Format Science Blog. January 21, 2016.
NASA, Esri speed delivery of cloud-based imagery data. Patrick Marshall. GCN. Jan 20, 2016.
     NASA and Esri are releasing to the public a jointly developed raster file format and a compression algorithm designed to deliver large volumes of image data from cloud storage. The format, called MRF (Meta Raster Format), together with a patented compression algorithm called LERC, can deliver online images ten to fifteen times faster than JPEG2000. The MRF format breaks files into three parts which can be cached separately. The metadata files can be stored locally so users can "examine data on file contents and download the data-heavy portions only when needed". This would help to minimize the number of files that are transferred. The compression gives users faster performance and lower storage requirements, and they estimate the cloud storage costs would be about one-third as much as traditional file-based enterprise storage. An implementation of MRF from NASA is available on GitHub and an implementation of LERC is on GitHub from Esri.
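
A small index kept locally lets a client fetch only the bytes it needs from the data-heavy part of a file. The sketch below illustrates that general idea only; the record layout (one offset and size pair per tile) and the function names are assumptions made for this example, not taken from the MRF specification or the NASA implementation.

```python
import io
import struct

# Assumed record layout for this sketch: one (offset, size) pair per tile,
# each an unsigned 64-bit big-endian integer. This is illustrative only.
RECORD = struct.Struct(">QQ")


def build_index(tiles):
    """Concatenate tiles into a data blob and build a matching index."""
    index, data, offset = bytearray(), bytearray(), 0
    for tile in tiles:
        index += RECORD.pack(offset, len(tile))
        data += tile
        offset += len(tile)
    return bytes(index), bytes(data)


def read_tile(index, data_file, tile_number):
    """Read one tile using the locally cached index and a single seek."""
    offset, size = RECORD.unpack_from(index, tile_number * RECORD.size)
    data_file.seek(offset)
    return data_file.read(size)


index, data = build_index([b"tile-zero", b"tile-one!", b"t2"])
remote = io.BytesIO(data)  # stands in for the data file held in cloud storage
tile = read_tile(index, remote, 1)
```

With a real cloud store, the `seek` and `read` pair would typically become a single ranged request, so only the requested tile crosses the network.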

Tuesday, January 26, 2016

Cloud-supported preservation of digital papers: A solution for special collections?

Cloud-supported preservation of digital papers: A solution for special collections? Dirk Weisbrod. Liber Quarterly. January 2016.
     A problem for Special Collections is that digital media have replaced paper for many writers. Digital papers are "difficult to process using established digital preservation strategies, because of their individual and unique nature". The article suggests that document creators should be involved in the preservation process, and that special collections should look at the cloud as a way to solve the problem.

The relatively short durability of digital media is in contrast to the durability of paper. An example in the article shows that data were lost from an Atari computer after a short period of time. Both paper and digital media can be destroyed or damaged, but the potential loss of digital media is much higher since there are many software and hardware components that can fail. The computer skills of the writers can also influence the degree of preservation of the personal digital documents. "To minimize those risks is the task of digital preservation".

A writer’s archive of digital objects (documents, email correspondence, texts, photos, and such) may be scattered over a variety of social networks and web services. This will affect the acquisition of the content by an archive, which would have the problem of identifying and acquiring the digital objects, including accessing the online services, which may be password-protected.

Archives and special collections need to manage these processes for digital preservation and develop a preservation strategy that "matches with the characteristics of digital papers". This needs to change from a “custodial” to a “pre-custodial” view and work with writers and their lifelong personal archives. Writers should contribute to the digital preservation of their own works. Some approaches to consider:
  • Regular captures of the creators’ digital data by preservation specialists to be transferred directly into a managed digital repository.
  • The periodic transfer of data from old hardware and media to a special collection.
  • Have preservation specialists help writers maintain their digital materials
  • Provide self archiving of email archives
  • IT-supported self-archiving and automated data transfers. The solution could include services such as
    • email archive, like Mailbox
    • data storage, like Dropbox 
    • website hosting
These approaches could help solve the problem of ongoing archiving while the original objects remain on the creator’s computer and continue to be updated. Another potential problem if writers use cloud services is that accounts may be cancelled if inactive. Archives and Special Collections should consider the cloud not as a problem but as an opportunity to work with authors. "By establishing a cloud, special collections get an instrument that provides writers with a reasonable working environment and, at the same time, enables the preservation of their personal digital archives. The time span between an object’s creation and its preservation, this critical factor of digital preservation, reduces to a minimum."


Thursday, December 03, 2015

The direction of computing is only going in one way: to the cloud

The direction of computing is only going in one way: to the cloud.  Rupert Goodwins. Ars Technica. Nov 14, 2015.
     "The cloud is well on its way to becoming the standard model for IT." The cloud has changed the economics and usability of providing and using services, including the many mobile applications and services.

The most common cloud model is a mix of public cloud and private infrastructure: for convenience called the hybrid cloud. "The increasing use of hybrid cloud tech is a reflection of the economic drivers that pull more and more IT, corporate and consumer, towards the public cloud. The most fundamental driver is good old economy of scale." Because of the scale, "companies save three to four dollars on internal IT for every dollar they spend on shifting infrastructure and services to cloud."

New cloud providers may find it difficult to compete since companies such as Amazon and Google had such a head start: the "biggest challenges have been access to scalable software to build public and private clouds and networking technologies to connect them."
Even so, there have been and still are some big problems with cloud computing: reliability, the safety of your data, and security. "Cloud adoption is highly susceptible to perceptions of trust." But the direction of computing is going towards the cloud. New opportunities are opening up and the constraints of pre-cloud computing are fading away.


Monday, September 28, 2015

AWS glitch hits Netflix and Tinder, offers a wake-up call for others. Katherine Noyes. IDG News Service. Sep 21, 2015.
     A number of major websites were affected for a time by glitches in Amazon Web Services' Northern Virginia facility. This is a cautionary lesson to organizations that rely on the cloud service for mission-critical capabilities. The problem resulted in higher-than-normal error rates. Mission-critical systems should have massive redundancies. "In the end, Amazon does not have adequate failover protection, which means its customers need to make sure they do."  Any outage is a significant one for a cloud provider, but "all providers have outages."  More than anything, this is a wake-up call to design your storage to account for problems.

Wednesday, September 09, 2015

The C-Suite: Sleepwalking To Digital Disaster?

The C-Suite: Sleepwalking To Digital Disaster? Jon Tilbury. TechWeekEurope. September 2, 2015.
     While many Chief Information Officers have continuity plans, or plans for digital storage, "not many think as far ahead as to think that many of these files will be obsolete in thirty years." In that 30-year time span, obsolescence may well fall within the regulatory deadline for records accessibility and they will have to deal with the outcome. Digital preservation isn’t high on the corporate agenda, and many executives are likely "unfamiliar with the term which refers to the practice of protecting digital assets against obsolescence so that they are usable and readable when required." Many people feel that if they have their records in the 'cloud' then they are covered. Yet backing things up or putting them in the cloud does not ensure file longevity. The records would need to be managed by a comprehensive digital preservation solution. "Safeguarding critical digital content and ensuring records can be found, read and understood requires a deep understanding of the information lifecycle and a dedicated digital preservation effort." Public institutions are leading the way with digital preservation, and corporate entities ought to take this very seriously to ensure compliance with legislative guidelines.

Tuesday, September 08, 2015

How Cloud Storage can address the needs of public archives in the UK

How Cloud Storage can address the needs of public archives in the UK. Second Edition, with updated case studies. Neil Beagrie, Andrew Charlesworth, and Paul Miller. March 2015.
     The use of cloud storage in digital preservation is a rapidly evolving field. This paper looks at how it is developing, what the options are, and the practices, requirements and standards that archives should consider. It examines case studies of five UK archives and how they use cloud storage solutions. Digital preservation is a significant issue and there is an increasing demand for archival storage. The need is not a one-time purchase since content will continue to expand. The paper contains:
  • General overview of key areas, definitions, and issues;
  • A Step by Step Guide of creating a business case and important options;
  • Future developments of cloud storage over the next few years;
  • Current Good Practices to follow;
  • Case studies of UK archives that have implemented cloud solutions, with details of their organization, requirements and approaches;
  • Sources for additional advice and guidance
  • A Table of legal issues to be aware of with cloud storage.
There are many cloud storage providers that can provide storage, but Archives typically have additional requirements beyond the simple availability of a place to store data files, such as data protection, personally identifiable information, concern for risk and data loss, and longer storage requirements. "Generic providers of cloud services, such as Amazon, Google, Microsoft and others, do not typically address specific archival considerations within their basic offerings." There are some specialist providers that address more of the needs of archives.

There are some advantages to cloud storage:
  • potential cost savings from easier procurement and economies of scale, particularly for smaller archives
  • automated replication to multiple locations and access to professional managed digital storage
  • ability to add other dedicated tools, procedures, workflow and service agreements, tailored for digital preservation requirements
  • As digital preservation becomes more of a strategic interest, cloud may be a component of required solutions and enable wider participation and collaboration.
"Balanced against these areas of potential and promise however, there are areas where significant issues need to be understood by archives and addressed, particularly in terms of information security and potential legal requirements." The larger public cloud service providers invest "significant sums in ensuring the physical security of their data centre buildings". They also employ teams of dedicated IT security staff ensuring the safety and security of the data. There are also legal issues which must be understood.

When considering cloud storage, it is important to be aware of three key items:
  1. data held in archives must be expected to be both preserved and accessible beyond the commercial lifespan of any current technology or service provider;
  2. an approach to addressing serious risks, such as loss, destruction or corruption of data, that is based purely on financial compensation will not be acceptable, as this takes no meaningful account of the preservation and custodial role of archives;
  3. explicit provision must be made for pre-defined exit strategies (e.g. synchronising content across two cloud service providers or with local internal storage), and for effective monitoring and audit procedures.
There is also an increasing number of community cloud storage services, which are tailored specifically for a particular group of users.

Friday, July 31, 2015

How do you preserve your personal data forever? We ask an expert

How do you preserve your personal data forever? We ask an expert. Simon Hill. Digital Trends. July 12, 2015.
     About two billion photos are uploaded to the cloud every single day. Every minute about 300 hours of video is uploaded to YouTube.  Our digital worlds are constantly expanding. Storage and access are not the same in the digital world as they are with physical materials.  The article is a discussion with Dr. Micah Altman, Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries.  “Preservation is really about long term access. It’s about communicating with the future at some point.” Digital materials are under threat of loss: “One threat is that the media fails. The hard drive fails, the DVD fails, or the disc can’t be read. Another threat is that you can see the bits, but you can no longer tell what they mean because there’s no software available that will render that document... that format is not supported anymore.”

People think that digital files will last forever, but different types of digital media have "tremendously varying shelf lives". The shelf life can vary drastically depending on how the media are stored. Not all storage is created equal. “One of the challenges is that the media that’s usually sold to consumers, like hard drives in computers, are not really built for long term storage.... They’re designed for an operating lifetime of maybe three, four, or five years.” There is also a lot of variation in hard drive quality; some brands and batches last longer than others.

It’s possible to buy archival quality optical discs that are designed to last a long time. Some have been working on an Archival Disc standard. The M-Discs from Millenniata have been designed to last for about 1,000 years. Many are looking at cloud storage, but you are "shifting the burden of figuring out how to preserve your data onto your cloud provider". Vendors mentioned include Backblaze and CrashPlan, because they implement best practices and they’re transparent about what they do. Unfortunately, cloud storage isn’t a foolproof method of preservation, and you are placing a great deal of trust in the provider you select. Businesses go bankrupt or there could be other problems. It may help to use two cloud storage solutions, because the chance of simultaneous problems at two independent companies is much lower. There are also privacy questions to consider. The data can be protected with encryption, but “if you lose the encryption key then you’ve lost the data.”
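
The benefit of using two providers can be shown with simple arithmetic. The sketch below assumes the two providers fail independently, which is the condition the article names; the 1% annual failure figure is a hypothetical value chosen for illustration, not a measured rate for any vendor.

```python
def simultaneous_failure_probability(p_a: float, p_b: float) -> float:
    """Probability that two independent providers both fail in a period."""
    return p_a * p_b


# Hypothetical figure: each provider has a 1% chance per year of losing data.
single = 0.01
both = simultaneous_failure_probability(single, single)
print(f"one provider: {single:.2%}, both providers: {both:.4%}")
```

If the providers share infrastructure or ownership, the independence assumption fails and the real risk is higher than this product suggests.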

Some other tips:
  • Put materials in a standardized format. TIFF and PDF/A are recommended
  • Have an explicit succession plan in place for what happens to your accounts when you’re not in a position to access them
  • Make multiple copies
  • Research the cloud storage services before using them
There’s no definitive answer on how to ensure your digital files last forever, but you can hedge your bets and come up with a multi-pronged strategy to give your data the greatest chance of survival.


Tuesday, July 07, 2015

Collection, Curation, Citation at Source: Publication@Source 10 Years On

Collection, Curation, Citation at Source: Publication@Source 10 Years On. Jeremy G. Frey, et al. International Journal of Digital Curation. Vol 10, No 2, 2015.
   The article describes a scholarly knowledge cycle in which the accumulation of knowledge is based on the continuing use and reuse of data and information. Collection, curation, and citation are three processes intrinsic to the workflows of the cycle. The currency of collection, curation, and citation is metadata. "Policies should recognize that small amounts of adequately characterized, focused data are preferable to large amounts of inadequately defined and controlled data stored in a random repository." The increasing size of data-sets and the growing risk of loss through catastrophic failure (such as a disk failure) have led researchers to use cloud storage, perhaps too uncritically.

The responsibilities of researchers for meeting the requirements of sound governance and ensuring the quality of their work have become more apparent. The article places the responsibility for curation firmly with the originator of the data. "Researchers should organize their data and preserve it with semantically rich metadata, captured at source, to provide short- and long-term advantages for sharing and collaboration."  Principal Investigators, as custodians, are particularly responsible for clinical data management and security (though curation and preservation activities exist in other research roles). "Curators usually attempt to add links to the original publications or source databases, but in practice, provenance records are often absent, incomplete or ad hoc, often despite curators’ best efforts. Also, manually managed provenance records are at higher risk of human error or falsification." There is a pressing need for training and education to encourage researchers to curate the data as they collect it at source.

"All science is strongly dependent on preserving, maintaining, and adding value to the research record, including the data, both raw and derived, generated during the scientific process. This statement leads naturally to the assertion that all science is strongly dependent on curation."

Friday, April 10, 2015

Cloud storage for preservation

Archiving On-Premise and in the Cloud. Joseph Lampitt, Oracle. PASIG Presentation. March 2015. [PDF]
Cloud Storage is storage accessed over a network via web services APIs. For digital preservation storage, one option is the 3-2-1 Rule (3 copies, 2 mediums, 1 offsite).
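
The 3-2-1 Rule can be expressed as a simple check over an inventory of copies. This is a minimal sketch; the `Copy` record, its field names, and the reading of the rule as "at least" 3, 2 and 1 are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str   # hypothetical label for where the copy lives
    medium: str     # e.g. "disk", "tape", "cloud"
    offsite: bool


def satisfies_3_2_1(copies):
    """Check 3 copies, on at least 2 media types, with at least 1 offsite."""
    media = {c.medium for c in copies}
    return len(copies) >= 3 and len(media) >= 2 and any(c.offsite for c in copies)


copies = [
    Copy("primary server", "disk", offsite=False),
    Copy("tape library", "tape", offsite=False),
    Copy("cloud bucket", "cloud", offsite=True),
]
ok = satisfies_3_2_1(copies)
```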

Benefits of Cloud Storage
  • Limitless scalability
  • Custom metadata
  • Single namespace
  • Simplified management
Preservation Considerations with Cloud Storage include:
  • System / cloud performance
  • Security
  • Infrastructure and investment
  • Stability and longevity of the solution
  • Descriptive metadata 
  • Fixity and where that happens
  • System security and access control  
  • Audit Event Tracking (e.g. maintaining records of actions associated with an asset)
  • Version control so that originals are unchanged
There are trade-offs between on-site and cloud solutions. The business needs should drive the choice of solutions. It is reported that 90% of an organization's data is passive. The presentation includes charts comparing the cost of cloud storage to on-site storage. “Glacier is almost 10 times as expensive as an on premise tape system with support.”
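
Two of the considerations listed above, fixity and audit event tracking, can be combined in one routine that verifies a checksum and records the outcome. This is a minimal sketch, not a description of any product in the presentation; the function name `fixity_check` and the audit record fields are assumptions made for this example.

```python
import hashlib
from datetime import datetime, timezone


def fixity_check(data: bytes, expected_sha256: str, audit_log: list) -> bool:
    """Verify a checksum and append an audit event recording the outcome."""
    actual = hashlib.sha256(data).hexdigest()
    passed = actual == expected_sha256
    audit_log.append({
        "event": "fixity_check",
        "outcome": "pass" if passed else "fail",
        "time": datetime.now(timezone.utc).isoformat(),
    })
    return passed


audit_log = []
original = b"archival master"
stored_digest = hashlib.sha256(original).hexdigest()  # recorded at ingest

fixity_check(original, stored_digest, audit_log)            # unchanged copy
fixity_check(b"archival mastes", stored_digest, audit_log)  # altered copy
```

Where the check runs matters: running it inside the cloud avoids download costs, while running it locally avoids relying on the provider's own report.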

Thursday, April 09, 2015

Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring?

Digital Preservation: We Know What it Means Today, But What Does Tomorrow Bring? Randy Kiefer's presentation.  UKSGLive. April 3, 2015.
Long-term preservation refers to processes and procedures required to ensure content remains accessible well into the future. Publishers want to be good stewards of their content. Digital Preservation is an "insurance policy" for e-resources. Commercial hosting platforms and aggregators are not preservation archives! They can remove discontinued content from their system, and commercial businesses may disappear, as with Metapress. There are global digital preservation archives, such as CLOCKSS and Portico, and regional archives, such as National Libraries.
The biggest challenges in the future are formats (especially presentation of content), and what to do with databases, datasets and supplementary materials. "Any format can be preserved, including video. The issue is that of space, cost and presentation (especially if the format is now not in use/supported)." There are legal issues with cloud-based preservation systems. There is no legal precedent with a cloud-based preservation system, and no protection with regards to security.



Wednesday, March 18, 2015

Storage is a Strategic Issue: Digital Preservation in the Cloud

Storage is a Strategic Issue: Digital Preservation in the Cloud. Gillian Oliver, Steve Knight. D-Lib Magazine. March/April 2015.
Many areas are mandating a 'cloud first' policy for information technology infrastructures. The article is a case study of the decision to outsource and its consequences. Some highlights:
  1. data held in archives must be expected to be both preserved and accessible beyond the commercial lifespan of any current technology or service provider.
  2. an approach to addressing serious risks (such as loss, destruction or corruption of data) that is based purely on financial reasons is not acceptable; it does not take into account the  preservation and custodial role of archives;
  3. there must be an explicit provision made for pre-defined exit strategies as well as effective monitoring and audit procedures
Two main challenges:
  1. tensions between the information management and information technology perspectives. From the IT perspective the information managers were perceived as crossing boundaries into areas which were not of their concern.
  2. funding model. This change was a consequence of moving from the purchase of equipment for storage for use in house, to the provision of storage as a service.
"If most organisations lose a document, so long as they get the document back they're pretty happy. But because of digital preservation being what it is, you don't want to lose or corrupt any of the bits, they have to be exactly the way they were before." 
Cultural heritage institutions should investigate using storage as a service offerings, and also look ahead to utilizing other cloud based services. When making such decisions, you must be aware of the short term consequences of cost saving (i.e. increased burden on operating budgets) as set against potential long term benefits.

Thursday, February 26, 2015

Cloud Storage and Digital Preservation: New guidance from the National Archives

Cloud Storage and Digital Preservation: New guidance from the National Archives. Laura Molloy. Digital Curation Centre. 13 May, 2014.
The use of cloud storage in digital preservation is a rapidly evolving field and this guidance explores how it is developing, emerging options and good practice, together with requirements and standards that archives should consider. Digital preservation is a significant issue for almost all public archives. There is an increasing demand for storage of both born-digital archives and digitised material, and an expectation that public access to this content will continue to expand. The guidance includes five detailed case studies of UK archives that have implemented cloud storage solutions.

Digital preservation can be defined as: “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary, beyond the limits of media failure or technological and organisational change”. The challenges are urgent but can be taken one step at a time; you can address current technology and needs while ensuring that the content can be passed on to the next generation. With cloud storage there are many positives and negatives that must be considered. The article reviews many of these. When establishing your needs: Identify what are the ‘must have’ needs and what are the ‘wants’. Define your requirements and decide on the required capabilities rather than a specific technology, implementation, or product.
  • We should be concerned about the security of data, wherever it is stored, but it would be unrealistic to suggest that most cloud services are inherently less secure than most local data centres.
  • Adoption of a digital preservation strategy utilising cloud computing inevitably brings with it a range of legal questions.
  • Cloud storage services can achieve significant economies of scale.
  • Cloud services are typically considered to be operational rather than capital expenditure


Thursday, October 30, 2014

Digital Preservation Network (DPN) Launches Member Content Pilot

Digital Preservation Network (DPN) Launches Member Content Pilot. Carol Minton Morris. Duraspace.org. 2014-10-29. 
DPN has launched a Member Content Pilot program as a step toward establishing an operational, long-term preservation system. The pilot is testing real-world interactions between DPN members through DPN “nodes” that ingest data from DPN members and package it for preservation storage. Chronopolis/Duracloud, The Texas Preservation Node, and the Stanford Digital Repository will be functioning as First Nodes. APTrust and HathiTrust, in addition to the above three, will be providing replication services for the pilot data.

Wednesday, June 27, 2012

New cloud service safeguards organisations’ digital assets.

New cloud service safeguards organisations’ digital assets. Press Release. 27 June 2012.
Tessella has announced Preservica, a new cloud-based service that provides organisations with a secure and affordable solution to safeguard their digital assets. Preservica runs on Amazon Web Services (AWS). It is built on Tessella’s SDB technology and provides tools required to build a long-term digital preservation solution including:
  • Ingest – upload digital content and metadata from a variety of content sources
  • Content storage – encrypt and store multiple copies in a safe, backed-up location
  • Flexible metadata and security – choose how content is arranged, described, and protected
  • Access – full search, browse, and download facilities via browser interface
  • Preservation tools – ensure that content is protected against future obsolescence.
 

    Friday, May 18, 2012

    Pogoplug service turns your computers into private cloud

    Pogoplug service turns your computers into private cloud. Lucas Mearian. Computerworld. May 9, 2012.
    Pogoplug Team allows home offices and small businesses (best for 10 - 50 people) to turn file servers or spare PCs into pools of storage accessible from the Web. This basically creates a private cloud for businesses so they can store information within their own firewalls rather than on third-party services. The product allows multiple users to share storage in a home or office. The service's capabilities are similar to Dropbox for file sharing and collaboration. It offers full access and file sharing for 3TB of storage for $150 (not including the $15-a-year, per-person licensing fee paid to Pogoplug for its service).

    Wednesday, May 02, 2012

    New DuraCloud Digital Archiving and Preservation Services.

    New DuraCloud Digital Archiving and Preservation Services. Press Release. Duraspace. May 1, 2012.
    The DuraSpace organization announced new DuraCloud subscription plans that offer three levels of digital preservation and archiving services in the cloud. Prices for the new subscription plans are competitive with commercial cloud providers and do not require additional transfer or variable costs.
    1. DuraCloud Preservation Basic: for institutions that would like to have two redundant copies of their original content stored at one cloud data center. 
    2. DuraCloud Preservation Plus: for institutions that wish to have four redundant copies of their original content stored at two cloud data centers. 
    3. DuraCloud Enterprise: a full suite of configurable DuraCloud features for institutions that need multiple DuraCloud sub-accounts for departments, research groups, cross institutional projects, or individuals.
    DuraCloud offers automated weekly content health checks, reporting, file repair and more. It runs on Amazon AWS and Rackspace and, in the near future, on the San Diego Supercomputer Center (SDSC) Cloud, and provides customers with:
    • Copies of the content stored with multiple providers
    • Automated health checking of content
    • Automated repair of damaged files for Preservation Plus customers
    • A full suite of reports
    • Online sharing and streaming to any internet-linked device
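
Health checking and repair across redundant copies can be sketched as follows. This is an illustration of the general technique only, not DuraCloud's implementation; the provider names, the function `check_and_repair`, and the use of SHA-256 are assumptions made for this example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def check_and_repair(replicas: dict, expected: str) -> list:
    """Replace damaged replicas with a copy that still matches the checksum.

    replicas maps a (hypothetical) provider name to the stored bytes and is
    updated in place. Returns the names of the replicas that were repaired.
    Raises ValueError if no replica matches, since repair is then impossible.
    """
    good = [name for name, data in replicas.items() if digest(data) == expected]
    if not good:
        raise ValueError("no intact replica available")
    healthy = replicas[good[0]]
    repaired = [name for name in replicas if name not in good]
    for name in repaired:
        replicas[name] = healthy
    return repaired


content = b"preserved object"
replicas = {"provider-a": content, "provider-b": b"preserved 0bject"}
repaired = check_and_repair(replicas, digest(content))
```

Repair depends on at least one copy surviving intact, which is why the number of redundant copies and their separation across providers matter.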