Showing posts with label archives.

Friday, August 30, 2019

Saskatchewan Archives digitizing 560,000 newspaper pages from Second World War years

Saskatchewan Archives digitizing 560,000 newspaper pages from Second World War years. Arthur White-Crummey. Regina Leader-Post. August 28, 2019.
     This is an article about the digital preservation laboratory of the Provincial Archives of Saskatchewan and the project of scanning and auditing a collection of weekly newspapers from 160 Saskatchewan communities. The Archives is marking the 80th anniversary of Canada’s entry into the Second World War by digitizing its trove of community newspapers covering the years 1939 to 1945. There are 560,000 pages for that period alone, just part of a massive collection of 10 million pages extending from 1878 to the 1960s.

The archivist views digitization "as a way of ensuring the survival of those priceless records. The newspapers themselves have largely disappeared." But digitization is about more than preservation; it’s a way of “democratizing the record.” People no longer have to travel to local areas to view the resources. “They can now go online anywhere in the world. That’s what it’s all about.”


Wednesday, October 31, 2018

Deep into that darkness peering: Our Dark Repository

Deep into that darkness peering: Our Dark Repository.  Lance Thomas Stuchell. Bits and Pieces. October 22, 2018.
     This is an interesting post about their dark archive and how it is being used. They define a dark archive as an archive that is inaccessible to the public, typically used for the preservation of content that is accessible elsewhere. For them, the “preservation of content that is accessible elsewhere” line is an important one. "Before we created a dark archive, all of our preservation systems were built for access, with many of them creating access copies (or DIPs, for all you OAIS groupies out there) on the fly from the preservation copy (AIPs) in the repository."

These systems worked for most of their digital material, but not for time-based digital media such as video files, which were too big to serve as access copies or to be the source of on-the-fly access copy creation. The dark archive allows them to separate access from storage, and provides a place to preserve A/V preservation masters long-term. Their "Dark Blue" repository "provides long-term storage for A/V preservation masters and medium-term storage for forensic images/file transfers of born-digital archival accessions" and may be expanded in the future for data backups, perpetual access copies of licensed content, backups of video games, and web archives.

The dark archive workflow relies on other systems for metadata management and searchability, such as the catalog and ArchivesSpace. "We will continue to evaluate our storage strategy as the diversity and size of our digital collections grow, but right now Dark Blue fills an important void in our preservation strategy."
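The split between an access-oriented repository and a dark one can be pictured as a simple routing rule. The Python sketch below is only an illustration under stated assumptions: the content-kind labels, the `Deposit` and `Placement` classes, and the `place` function are hypothetical, loosely modeled on the A/V masters and forensic images/transfers the post describes, and are not how Dark Blue is actually built. It shows the two ideas from the post: dark content gets no public access and no on-the-fly access copies, and discovery for it goes through a catalog or ArchivesSpace record instead.

```python
from dataclasses import dataclass

# Hypothetical content kinds routed to dark storage, modeled on the
# A/V preservation masters and forensic images/file transfers in the post.
DARK_KINDS = {"av_preservation_master", "forensic_image", "file_transfer"}


@dataclass
class Deposit:
    identifier: str
    kind: str
    catalog_ref: str  # pointer to the catalog or ArchivesSpace record used for discovery


@dataclass
class Placement:
    store: str                  # "dark" or "access"
    public_access: bool
    derive_access_copies: bool  # may DIPs be generated on the fly from the AIP?
    discovery_ref: str


def place(deposit: Deposit) -> Placement:
    """Decide where the preservation copy of a deposit lives (illustrative rule only)."""
    if deposit.kind in DARK_KINDS:
        # Too large to serve, or to derive access copies from, on the fly:
        # keep it dark and rely on the catalog record for searchability.
        return Placement("dark", False, False, deposit.catalog_ref)
    # Everything else stays in the access-oriented repository, where access
    # copies can be derived from the preservation copy on demand.
    return Placement("access", True, True, deposit.catalog_ref)
```

Keeping the catalog pointer on every placement reflects the post's point that the dark archive depends on other systems for metadata management and searchability.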

Tuesday, April 18, 2017

Understanding PREMIS

Understanding PREMIS. Priscilla Caplan. Library of Congress Network Development and MARC Standards Office. 2017.
     PREMIS stands for "PREservation Metadata: Implementation Strategies". This document is a relatively brief overview of the PREMIS preservation metadata standard. It can also serve as a "gentle introduction" to the much larger PREMIS Data Dictionary for Preservation Metadata. PREMIS defines preservation metadata as "the information a repository uses to support the digital preservation process." Preservation metadata also supports activities "intended to ensure the long-term usability of a digital resource."

The Data Dictionary defines a core set of metadata elements needed to perform preservation functions, so that digital objects can be read from the digital media and can be displayed or played. For each element it gives a definition, the reason the element is part of the metadata, and examples and notes about how the value might be obtained and used. The elements address information needed to manage files properly and to document any changes made. PREMIS defines only the metadata elements commonly needed to perform preservation functions on the materials to be preserved. The focus is on the repository and its management, not on the content authors or the associated staff, so it can serve as a guide or checklist for those developing or managing a repository or software applications. Some of the information needed includes:
  • Provenance: The record of the chain of custody and change history of a digital object. 
  • Significant Properties: Characteristics of an object that should be maintained through preservation actions. 
  • Rights: Knowing what you can do with an object while trying to preserve it.
The Data Model defines several kinds of Entities:
  • Objects (including Intellectual Entities)
  • Agents
  • Events
  • Rights
PREMIS provides an XML schema that "corresponds directly to the Data Dictionary to provide a straightforward description of Objects, Events, Agents and Rights."
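To make the four entity types and their relationships concrete, here is a minimal Python sketch. It is not the PREMIS XML schema: the class and field names are simplified, hypothetical stand-ins for the Data Dictionary's semantic units, and Intellectual Entities are left out. What it does show is the core idea that Events link Objects to the Agents that acted on them, which is what lets a repository reconstruct provenance (the chain of custody and change history).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Identifier:
    id_type: str  # e.g. "UUID" or a local scheme
    value: str


@dataclass
class PremisObject:
    """A file or other digital object held by the repository."""
    identifier: Identifier
    format_name: Optional[str] = None
    sha256: Optional[str] = None  # fixity value used by later checks


@dataclass
class Agent:
    """A person, organization, or piece of software that acts on objects."""
    identifier: Identifier
    name: str
    agent_type: str  # e.g. "person", "organization", "software"


@dataclass
class Event:
    """An action involving objects and performed by agents."""
    identifier: Identifier
    event_type: str  # e.g. "ingestion", "fixity check", "migration"
    date_time: str   # ISO 8601 in UTC, so string order matches time order
    outcome: str
    linking_objects: List[Identifier] = field(default_factory=list)
    linking_agents: List[Identifier] = field(default_factory=list)


@dataclass
class RightsStatement:
    """What the repository may do with an object while preserving it."""
    identifier: Identifier
    basis: str  # e.g. "copyright", "license", "statute"
    granted_acts: List[str] = field(default_factory=list)  # e.g. ["replicate", "migrate"]
    linking_objects: List[Identifier] = field(default_factory=list)


def provenance(obj: PremisObject, events: List[Event]) -> List[Event]:
    """Events that link to obj, oldest first: a simple change history."""
    linked = [e for e in events if obj.identifier in e.linking_objects]
    return sorted(linked, key=lambda e: e.date_time)
```

For real exchange between systems, the PREMIS XML schema and the Data Dictionary, not these simplified classes, define the element names and controlled vocabularies.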

Wednesday, March 29, 2017

Archives Unlocked vision launched at the Southbank Centre

Archives Unlocked vision launched at the Southbank Centre. Press release. The National Archives. 29 March 2017.
     The National Archives (UK) has launched a vision and action plan to help archives secure their future through digital transformation, investing in new workforce skills, and encouraging innovation. This vision and action plan offers a future where "businesses, creative industries, arts organisations, academia, and communities can fully exploit a more resilient archives sector, with the UK leading the world in digital transformation."  It is built on themes of Trust, Enrichment and Openness, that highlight "the importance of archives in holding authority to account through scrutiny, in driving innovation and creativity for businesses and across society, and in cultivating an open approach to knowledge accessible to all."

The archives that make up the rich national collection "are the nation’s collective memory." The updated vision is needed to sustain the archives for the long term. "The Archives Unlocked action plan embodies this. It sets out what is required to release the power of the archives."

"Working with partners, stakeholders, investors and individuals, we will have greater potential and influence to accomplish what we need to do. The UK will be home to world-leading archives: both digital and physical."


Thursday, March 16, 2017

Creating the disruptive digital archive

Creating the disruptive digital archive. John Sheridan. Digital Preservation Coalition. 1 March 2017.
     The National Archives has been working on a new Digital Strategy. "Digital" is their biggest strategic challenge. Archives worldwide are "grappling with the issues of preserving digital records. We also need to be relevant to our audiences: public, government, academic researchers and the wider archives sector – to provide value to them at a time of change."

Traditional archives are built around the physical nature of the records, but digital records "change all our assumptions around the archive – from selection to preservation and access". Their new Digital Strategy is to move beyond the digital simulation of physical records and to become a ‘disruptive’ digital archive, to be "digital by design".

The National Archives is currently a "fully functioning digital archive with a Digital Records Infrastructure capable of safely, securely and actively preserving very large quantities of data with associated descriptive metadata", which applies the paper-records paradigm of selection, preservation and access to digital records. This is their first-generation digital archive. The second-generation digital archive they are aiming for is to be "digital by instinct and design". Such an archive:

  • treats rich mixed-media content (things like websites), datasets, computer programs, even neural networks, as records, not just information in document formats
  • is able to select and preserve all these types of things
  • recognizes that digital information has value in aggregate: it’s not just individually important artefacts that have historical value
  • makes a relentless engineering effort to preserve digital objects, measuring and managing the preservation risks
  • is transparent in its practices
  • develops approaches for enabling access to the whole collection with regard to legal, ethical and public considerations
  • regards the archive as conceptually interconnected data.

"These are ambitious aims and there are many challenges we need to tackle along the way." Collaboration between archives and other institutions is essential in moving forward.


Tuesday, March 07, 2017

The role of archives

The role of archives. Helen Hockx. Things I cannot say in 140 characters.  January 20, 2017.
     The role of Archives, especially when it comes to digital records, is not commonly understood. An archivist should ask questions "about the file structure, the access system, who accessed it, and how was it used… Appraisal is based on context, or the entire record keeping system and the importance of individual items depends on how they relate to one another within a system". This is difficult to do after the fact. The heart of the problem is: who decides which records to keep? One perception is that Archives are “museums with artifacts, and have no authority over digital records”, and that access to the digital files should be determined by the “data stewards” under the direction of the University’s Information Governance Committee. The role of Archives, data access, record lifecycles and retention schedules all seem to be largely misunderstood.


Friday, December 30, 2016

How Not to Build a Digital Archive: Lessons from the Dark Side of the Force

How Not to Build a Digital Archive: Lessons from the Dark Side of the Force. David Portman. Preservica. December 21, 2016.
     This post is an interesting and humorous look at Star Wars archiving: "Fans of the latest Star Wars saga Rogue One will notice that Digital Archiving forms a prominent part in the new film. This is good news for all of us in the industry, as we can use it as an example of how we are working every day to ensure the durability and security of our content. Perhaps more importantly it makes our jobs sound much more glamorous – when asked 'so what do you do' we can start with 'remember the bit in Rogue One….'"

The Empire’s choice of archiving technology is not perfect and there are flaws in their Digital Preservation policy in many areas, such as security, metadata, redundancy, access controls, off site storage, and format policy. Their approaches are "hardly the stuff of a trusted digital repository!"

Thursday, December 29, 2016

Robots.txt Files and Archiving .gov and .mil Websites

Robots.txt Files and Archiving .gov and .mil Websites. Alexis Rossi. Internet Archive Blogs. December 17, 2016.
     The Internet Archive collects webpages "from over 6,000 government domains, over 200,000 hosts, and feeds from around 10,000 official federal social media accounts". Do they ignore robots.txt files? Historically, sometimes yes and sometimes no, but the robots.txt file is less useful than it was, and is becoming less so over time, particularly for web archiving efforts. Many sites do not actively maintain the files, or increasingly block crawlers with other technological measures. The "robots.txt file is not relevant to a different era". The best way for webmasters to exclude their sites is to contact archive.org and specify the exclusion parameters.

"Our end-of-term crawls of .gov and .mil websites in 2008, 2012, and 2016 have ignored exclusion directives in robots.txt in order to get more complete snapshots. Other crawls done by the Internet Archive and other entities have had different policies." The archived sites are available in the beta Wayback. They have received very little feedback on their efforts. "Overall, we hope to capture government and military websites well, and hope to keep this valuable information available to users in the future."
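The policy choice described above, honoring robots.txt for some crawls and ignoring it for end-of-term snapshots, can be sketched with Python's standard `urllib.robotparser` module. The `may_crawl` function and its `honor_robots` flag are hypothetical illustrations, not the Internet Archive's crawler configuration.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str, honor_robots: bool = True) -> bool:
    """Decide whether a crawler fetches url.

    honor_robots=False models a policy like the end-of-term crawls,
    which ignored robots.txt exclusions to get more complete snapshots.
    """
    if not honor_robots:
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text instead of fetching them
    return parser.can_fetch(user_agent, url)
```

With a robots.txt of `User-agent: *` and `Disallow: /private/`, a crawler honoring the file skips `/private/memo.html` but fetches `/about.html`; with `honor_robots=False` it fetches both. A site can still block crawlers by other technical means regardless of this setting.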


Tuesday, December 20, 2016

File Extensions and Digital Preservation

File Extensions and Digital Preservation. Laura Schroffel. In Metadata Specialists Share Their Challenges, Defeats, and Triumphs. Marissa Clifford. The Iris. October 17, 2016.
     The post looks at metadata challenges with digital preservation. Most of the born-digital material they work with exists on outdated or quickly obsolescing media, such as floppy disks, compact discs, hard drives, and flash drives; the content is transferred into their Rosetta digital preservation repository and made accessible through Primo.

"File extensions are a key piece of metadata in born-digital materials that can either elucidate or complicate the digital preservation process". The extensions describe format type, provide clues to file content, and indicate a file that may need preservation work. The extension is a human-readable external label, often referred to as an external signature. "This is in contrast to internal signatures, a byte sequence modelled by patterns in a byte stream, the values of the bytes themselves, and any positioning relative to a file."

Their born-digital files are processed on a Forensic Recovery of Evidence Device (FRED), which can acquire data from many types of media, such as Blu-Ray, CD-ROM, DVD-ROM, Compact Flash, Micro Drives, Smart Media, Memory Stick, Memory Stick Pro, xD Cards, Secure Digital Media and Multimedia Cards. The workstation also runs the Forensic Toolkit (FTK) software, which can process a file and indicate the file format type and often the software version. There are challenges because file extensions are neither standardized nor unique: there are naming conflicts between types of software, and older Macintosh systems did not require file extensions. Also, because FRED and FTK originated in law enforcement, challenges arise when using them to work with cultural heritage objects.
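The difference between external and internal signatures can be shown with a small Python sketch that compares what a file's extension claims with what its first bytes say. The handful of magic numbers below are well-known signatures at offset 0. The tables, the function names and the result strings are hypothetical choices for this illustration. Real identification tools, such as DROID with the PRONOM registry or FTK, use far larger signature sets that include patterns at other offsets.

```python
from pathlib import Path
from typing import Optional

# Internal signatures: a few well-known magic numbers at the start of a file.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
]

# External signatures: what the extension claims. Office Open XML files such
# as .docx are ZIP containers, so their bytes identify them only as ZIP.
EXTENSIONS = {".pdf": "pdf", ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg",
              ".gif": "gif", ".zip": "zip", ".docx": "zip"}


def internal_format(head: bytes) -> Optional[str]:
    """Identify a format from the leading bytes, or None if no signature matches."""
    for magic, fmt in SIGNATURES:
        if head.startswith(magic):
            return fmt
    return None


def compare_signatures(filename: str, head: bytes) -> str:
    """Report whether the extension and the byte signature agree."""
    external = EXTENSIONS.get(Path(filename).suffix.lower())
    internal = internal_format(head)
    if external is None and internal is None:
        return "unidentified"
    if external is None:
        # e.g. files from older Macintosh systems that had no extension
        return f"no known extension; bytes look like {internal}"
    if internal is None:
        return f"extension says {external}; no known signature"
    if external == internal:
        return "match"
    return f"mismatch: extension says {external}, bytes look like {internal}"


def check_file(path: Path) -> str:
    """Read only the first 16 bytes, leaving the file itself unchanged."""
    with path.open("rb") as f:
        return compare_signatures(path.name, f.read(16))
```

A mismatch or a missing extension does not prove anything is wrong. It is a flag for the kind of closer look that the post describes doing with FTK.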


Tuesday, October 18, 2016

When Archivists and Digital Asset Managers Collide: Tensions and Ways Forward

When Archivists and Digital Asset Managers Collide: Tensions and Ways Forward. Anthony Cocciolo. The American Archivist. Spring/Summer 2016. [PDF]
     The article looks at tensions in an organization between archivists and digital asset managers. Archivists maintain the inactive records (paper or electronic) of permanent value for an organization. A records manager’s role is to manage active records, and records with permanent value are transferred to the archives when they become inactive. Digital asset managers often see their role as creating repositories of assets that can be easily and efficiently reused by staff. This is accompanied by the attitude that digital files will never become inactive.

This study is limited because it looks at a single instance that may not apply to other organizations that have both archivists and digital asset managers. It examines the tensions that can exist between archivists and digital asset managers, which mostly come from each group not recognizing the different role the other plays.

For archives, the unit being managed is a record (“data or information in a fixed form that is created or received in the course of individual or institutional activity and set aside (preserved) as evidence of that activity for future reference”). In digital asset management, the unit being managed is an asset (a kind of record that individuals can readily reuse in future work products). Archivists are interested in the record not only for its content but also for aspects of the record itself, such as its historical and social implications. Digital asset managers are more focused on the content and the legal rights to reuse it, and are more like libraries in their approach.

One tension between the two groups is the view that if a file was deposited and permanently preserved in the DAM, there would be no reason to deposit it in the archives. Other tensions include:
  1. Users, Files, and Where They Get Stored
  2. Differing Work Practices
  3. Approaches to Digital Preservation
  4. Communication
  5. Differing Approaches to Planning
The article states that archivists and digital asset managers differ in their views of preservation planning, fixity checking, formats accepted, and how to respond to file formats once they become obsolete. [Not all digital asset managers are as 'short term' as implied. cle] However, digital asset or content management systems are “not adequate for long-term digital preservation because [they include] no mechanisms for reliably assuring authenticity and intelligibility of digital documents for fifty years or longer.” Another problem is that many things are called “archives”, which can be troubling for archivists, who must contend with staff who believe that they are keeping archives and may view the DAM as yet another archives.

The article recommends that items deemed assets be deposited both in the DAM system and in the digital archives. In the digital archives, the asset will be grouped with other records of the same provenance, and metadata will be attached to the file to make it more findable. The archivists will document the activity of the institution for researchers. Since the purposes are not the same and the user groups do not overlap entirely, it is sensible that assets appear in both places. This is not wasteful, because digital preservationists recognize that multiple copies can increase object safety. At a minimum, references to the assets in the DAM should be added to the archives intellectually if not physically. Asset management systems should not replace the need to create digital archives that document institutional activity.
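The article notes that archivists and digital asset managers differ on fixity checking, and it argues that keeping an asset in both the DAM and the digital archives adds safety. Extra copies help only if you can tell which ones are still intact. Here is a minimal Python sketch of a fixity check: compute a SHA-256 checksum for each copy and compare it with the value recorded at deposit. The function names and location labels are hypothetical. SHA-256 is one common choice, not something the article prescribes.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Iterator, Mapping


def file_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Stream a file in 1 MiB chunks so large files are never loaded whole."""
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """SHA-256 digest, as hex, of the concatenated chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def fixity_report(expected: str, copies: Mapping[str, Iterable[bytes]]) -> Dict[str, bool]:
    """Map each storage location to True if its copy still matches the recorded checksum."""
    return {location: sha256_hex(chunks) == expected for location, chunks in copies.items()}
```

In practice each location's chunks would come from `file_chunks` applied to that location's copy. A `False` entry identifies which copy needs to be repaired from one of the intact copies.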

It is also essential that digital asset managers and archivists respect the different roles they play and not try to undermine each other. Each should focus on their own missions:
  • digital asset managers: creating a collection of digital assets for effective and efficient reuse by staff members. 
  • archivists: documenting institutional activity through records of permanent value in whatever format they may occur for use by staff and public researchers.


Monday, September 26, 2016

Selection and Appraisal in the OAIS Model

Selection and Appraisal in the OAIS Model. Ed Pinsent. DART Blog. 7 September 2016.
     The post asks whether the OAIS Model accommodates the skills of selection and appraisal, then suggests that it does not. The Model presents an over-simplified view in which content arrives in a state that is all ready to preserve, which ignores the processes that come before ingest. There is a need to define the pre-ingest stage in OAIS, but there also needs to be a greater recognition of the archivists' Selection and Appraisal skills, which can have tremendous value in digital preservation. Archivists assess the value of content in a contextual framework, based on other records in the archive and in the context of provenance. This requires an understanding of context, provenance, and record series to help identify the potential value of content. A Series model is the "foundation for all Archival arrangement, and is the cornerstone of our profession". It is difficult to see where the record / archival series is in all this. "The integrity and contextual meaning of a collection is being overlooked, in favour of this atomised digital-object view. OAIS, if strictly interpreted, could bypass the Series altogether in favour of an assembly line workflow that simply processes one digital object after another."

The blog post asserts that there is a need to rediscover the value of Appraisal and Selection and its importance in the digital realm.


Friday, April 22, 2016

Scientific Archives in the Age of Digitization

Scientific Archives in the Age of Digitization. Brian Ogilvie. The University of Chicago Press Journals. March 2016.
     Historians are increasingly working with material that has been digitized; they need to be aware "of the scope of digitization, the reasons why material is chosen to be digitized, and limitations on the dissemination of digitized sources."  Some physical aspects of sources, and of collections of sources, are lost in their digital versions. Some notes from the article:
  • "Digitization of unique archival material occupies an ambiguous place between access and publication."
  • digitized archives reproduce unique archival material with finding aids but without significant editorial commentary that allows for open-ended historical inquiry without the need to travel to archives  
  • the digitized archive also raises questions and challenges for historical practice, specifically 
    • the digitizing decision and funding
    • balancing digital access against some owners’ interests in restricting access
    • aspects of the physical archive that may be lost in digitization
    • the possibility of combining resources from a number of physical archives
  • most digitization projects have been selective in their scope
  • scholars cannot assume that material has been digitized, nor that all of a collection has been digitized, unless the archive specifically says so
  • digitized material is not always freely available, e.g., subscription-based archives
  • many archivists "fear that their traditional task of preparing detailed collection inventories is under threat owing to dwindling resources and the demand for digitization."

Digital Preservation notes:
  • digitization projects have undeniable benefits for the preservation of documents and access to them.
  • In the interest of preserving their holdings and disseminating them to a broad public, archives are increasingly digitizing their collections. 
  • historians interested in digital preservation of archives, and electronic access to them, would be well advised to seek out collaborations with archivists.

Friday, April 08, 2016

CNI Spring 2016 Trip Report

2016-04-05: CNI Spring 2016 Trip Report. Michael L. Nelson. Web Science and Digital Libraries Research Group. April 5, 2016.
     The article is his trip report on the CNI meetings. A few of the items listed are:
1. "Digital Preservation of Federal Information Summit". Martin Halbert and Katherine Skinner discussed "...the topic of preservation and access to at-risk digital government information."

2. "Why We Need Multiple Archives". Michael L. Nelson, Herbert Van de Sompel. Slides.
  • Two Common Misconceptions About Web Archiving
    • old content is obsolete, stale, bad
    • The Internet Archive has every copy of everything that has ever existed
  • There are other archives that may have the same or similar content. There may be a need to resolve conflicts with the content of the archives
  • A single archive is vulnerable.
  • In the Web world, in order to monetize their content the copyright owner has to minimize the number of copies.
  • Archives aren’t magic web sites. They’re just web sites.
  • Don’t Throw Away the Original URL – Use Robust Links!
3. "Defining the Scholarly Record for Computational Research". Victoria Stodden. Presented the "Reproducible Research Standard".

4. "Microservices Architecture: Building Scalable (Library) Software Solutions." Jason Varghese. A website provides a detailed discussion of the APIs they've implemented.

5. Storytelling for Summarizing Collections in Web Archives. Michael Nelson.

6. "Activist Stewardship: The Imperative of Risk in Collecting Cultural Heritage". Todd Grappone, Elizabeth McAulay, Heather Briston. They presented on the Digital Ephemera Project, and more generally on the role of archivists in collecting materials that may cause trouble. "The Digital Ephemera Project is an initiative to digitize, preserve and provide broad public access to print items, images, multimedia, and social networking resources produced world-wide."
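The "Don’t Throw Away the Original URL – Use Robust Links!" point in item 2 refers to decorating an ordinary HTML link so that it records both the live URL and an archived copy with its date. If the live page changes or disappears, a reader or script can still reach the snapshot. This Python sketch builds such a link. The attribute names (`data-versionurl`, `data-versiondate`) follow my reading of the Robust Links proposal and should be checked against its specification. The function names are hypothetical. The Wayback URL pattern (`/web/<14-digit timestamp>/<original URL>`) is the Internet Archive's usual form.

```python
from html import escape


def wayback_url(original_url: str, timestamp: str) -> str:
    """Internet Archive Wayback URL for a YYYYMMDDhhmmss timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def robust_link(original_url: str, version_url: str, version_date: str, text: str) -> str:
    """HTML anchor that keeps the original URL and records an archived version.

    Attribute names are assumed from the Robust Links proposal; verify before use.
    """
    return (
        f'<a href="{escape(original_url)}" '
        f'data-versionurl="{escape(version_url)}" '
        f'data-versiondate="{escape(version_date)}">{escape(text)}</a>'
    )
```

Because the snapshot URL is an attribute and not a replacement for the original, the link can point at another archive's copy just as easily. This fits the talk's argument that a single archive is vulnerable.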

Friday, March 25, 2016

National Archives permits us to learn from mistakes

National Archives permits us to learn from mistakes. Peter Charleton, Supreme Court judge. The Irish Times. Feb 8, 2016.
      For the National Archives in Ireland, 1922 was a disaster. A direct hit from artillery destroyed centuries of records. The census records from 1821 through to 1891 were almost completely destroyed. Since then, the National Archives has tried to supplement its damaged holdings, but what has been lost is gone forever. With many places moving from paper to digital records, history is on the point of repeating itself. The traditional policy of printing files to preserve a digital record no longer works. Files may be on several computers in several iterations; they may have "elements in office systems, email, even text messages or a tweet." With digital, there is a lot of data, which brings the challenge of deciding what to preserve. But not every record needs preservation.

To ensure the preservation of long-term records, the records should be transferred to the National Archives. Permanent records need to be identified early and treated appropriately. Creation of a digital archive will greatly reduce the volume of records that government departments store. "Millions are spent by departments on off-site storage and back-ups of network drives. By investing in a digital archive, departments will be able to transfer emails, business files, digital images and other electronic records to the National Archives. An efficient approach to records management based on legal obligation can target policy effectively." Money should be directed to the National Archives for developing an efficient system so there are sufficient resources to capture, manage and preserve our digital heritage. This institution, the precious repository of this nation, deserves to be supported in ensuring Ireland continues to have a history.


Monday, March 21, 2016

From digital dark age to digital enlightenment

From digital dark age to digital enlightenment.  Caroline Pegden. National Archives. 17 February 2016.
     Recent media reports have talked about the ‘digital Dark Age’. This is a major challenge, now and in the years to come, for institutions in the archives sector that are concerned with managing, preserving and providing access to born-digital records. This is important for the UK National Archives because some government departments will soon transfer born-digital records to The National Archives under the Public Records Act. As the National Archives has been working out how to do this, their philosophy has been ‘learning by doing’. They have reviewed what other archival institutions around the world are doing to manage digital records and have been testing the process of transfers "to design and test the new process to appraise, select, sensitivity review, transfer, preserve and give access to born-digital records." Two major challenges are:
  1. extracting meaning from unstructured digital record collections in order to make appraisal and selection decisions.
  2. sensitivity reviewing born-digital records at scale without having to read all the individual documents
Most government departments’ information is on unstructured shared drives; some departments had up to 190 terabytes of information in email servers. Technology-assisted review, a process that combines reviewers with computer software and tools to classify records electronically, may have interesting applications for the archives sector. "Although there is no ‘silver bullet’ or completely automated solution, technology-assisted review offers ways to prioritise and reduce the information to be manually reviewed." Two reports are available that highlight these challenges and show how technology-assisted review could help address some of them:
  1. The digital landscape in government 2014-15: business intelligence review
  2. The application of technology-assisted review to born-digital records transfer, Inquiries and beyond: research report
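To illustrate the "prioritise and reduce" idea, here is a toy Python sketch. It scores each record for likely sensitivity, sets aside low scorers, and queues the rest for a human reviewer with the highest scores first. Real technology-assisted review tools usually train a classifier on reviewers' coding decisions. This sketch substitutes a fixed keyword-weight list for that model. The terms, weights, threshold and function names are all hypothetical, and nothing here reflects the method used in the National Archives reports.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical indicator terms and weights; a real TAR system would learn
# these from reviewer decisions instead of using a fixed list.
SENSITIVE_TERMS: Dict[str, float] = {
    "personal": 2.0,
    "confidential": 3.0,
    "medical": 3.0,
    "salary": 2.0,
}


@dataclass
class Record:
    record_id: str
    text: str


def sensitivity_score(record: Record) -> float:
    """Sum the weights of indicator terms that appear in the record's text."""
    words = re.findall(r"[a-z]+", record.text.lower())
    return sum(SENSITIVE_TERMS.get(w, 0.0) for w in words)


def review_queue(records: List[Record], threshold: float = 2.0) -> List[Record]:
    """Records scoring at or above threshold, highest first, for manual review."""
    flagged = [r for r in records if sensitivity_score(r) >= threshold]
    return sorted(flagged, key=sensitivity_score, reverse=True)
```

The human still makes every sensitivity decision. The software only decides the order of review and what may not need a close read, which is why the report stresses that there is no completely automated solution.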

Friday, March 04, 2016

Digital Stewardship in a Radio Archive: An NDSR Project Update

Digital Stewardship in a Radio Archive: An NDSR Project Update. Mary Kidd, Erin Engle. The Signal. January 5, 2016. 
     This is an update on the Radio Archive project and includes:
  • a Digital Preservation Roadmap
  • an overview of the current digital production throughout the various stations
  • a network-wide analysis of NYPR’s digital holdings
The definition of what makes radio includes both traditional broadcasting and digital formats, particularly WAV and MP3 files, live and on-demand streaming audio and video, on-the-go podcasts, and social media posts. The archive will have to "develop new ways to address the fact that digital assets are often interconnected with other assets, rather than standalone audio objects". Archives are not just safekeepers of the past, but informants of the present. The evolving identity of radio is influencing the development of a new and emerging field referred to as “radio preservation studies”.

An important part of the studies for the archive is to distinguish radio archives from traditional sound archives. They appear to be similar, but radio has its own distinct sound and modes of production, and radio archives require their own set of archival best practices that contextualize broadcast recordings. "Striking a balance between the archival priorities and the expectations of producers is quite possibly one of the greatest challenges for an archive embedded in a media station."

Tuesday, November 17, 2015

Born Digital: Guidance for Donors, Dealers, and Archival Repositories

Born Digital: Guidance for Donors, Dealers, and Archival Repositories. Gabriela Redwine, et al. Council on Library and Information Resources. October 2013. [PDF]
     "Until recently, digital media and files have been included in archival acquisitions largely as an afterthought." People may not have understood how to deal with digital materials, or staff may not have been prepared to manage digital acquisitions. The aim is to offer guidance to rare book and manuscript dealers, donors, repository staff, and other custodians to help ensure that digital materials are handled and documented appropriately and arrive at repositories in good condition; each section provides recommendations for donors, dealers, and repository staff.

The sections of the report cover:
  • Initial Collection Review
  • Privacy and Intellectual Property
  • Key Stages in Acquiring Digital Materials
  • Post-Acquisition Review by the Repository
  • Appendices, which include: 
    • Potential Staffing Activities for the Repository
    • Preparing for the Unexpected: Recommendations
    • Checklist of Recommendations for Donors and Dealers, and Repositories
Some thoughts and quotes from the report:
  • it is vital to convince all parties to be mindful of how they handle, document, ship, and receive digital media and files.
  • Early communication also helps repository staff take preliminary steps to ensure the archival and file integrity, as well as the usability of digital materials over time.
  • A repository’s assessment criteria may include technical characteristics, nature of the relationship between born-digital and paper materials within a collection, information about context and content, possible transfer options, and particular preservation challenges.
  • Understand if there is a possibility that the digital records include the intellectual property of people besides the creator or donor of the materials.
  • Clarify in writing what digital materials will be transferred by a donor to a repository
    (e.g., hard drives, disks, e-mail archives, websites)
  • It is strongly recommended that donors and dealers seek the
    guidance of archival repositories before any transfer takes place.
  • To avoid changing the content, formatting, and metadata associated with the files, repositories
    must establish clear protocols for the staff’s handling of these materials.
The good practices in this report can help reduce archival problems with digital materials. "Early archival intervention in records and information management will help shape the impact on archives of user and donor idiosyncrasies around file management and data backup."


Monday, November 09, 2015

Web Archiving Questions for the Smithsonian Institution Archives

Five Questions for the Smithsonian Institution Archives’ Lynda Schmitz Fuhrig. Erin Engle. The Signal. October 6, 2015.   
     Article about the Smithsonian Institution Archives and what they are doing. It looks at how the Archives preserves the Smithsonian's own websites and the process involved. Many of the sites contain significant content of historical and research value that is no longer found elsewhere. These sites are considered records of the Institution that evolve over time, and the Archives considers that it would be irresponsible to rely only on other organizations to archive them. They use Archive-It to capture most of these sites and retain copies of the files in their collections. Other tools are used to capture specific tweets or hashtags, or sites that are more challenging because of how they are built or because of the dynamic nature of social media content.

Public-facing websites are usually captured every 12 to 18 months, though captures may be more frequent when a site is being redesigned, in which case it is archived both before and after the update. An archivist appraises the content on the social media sites to determine whether it has been replicated and captured elsewhere.

The network servers at the Smithsonian are backed up, but that is not the same as archiving. Web crawls provide a snapshot in time of the look and feel of a website. "Backups serve the purpose of having duplicate files to rely upon due to disaster or failure" and are only kept for a certain period, whereas the website archives are kept permanently. Website captures typically will not include everything, because of excluded content, blocked content, or dynamic content such as Flash elements or calendars generated from databases. Capturing the web is not perfect.

Saturday, October 17, 2015

Flooding Threatens The Times’s Picture Archive

Flooding Threatens The Times’s Picture Archive. David W. Dunlap. New York Times. October 12, 2015.
     A broken pipe sent water cascading into the storage area where The Times keeps its collection of historical photos, newspaper clippings, microfilm records, books, and other archival material. About 90 percent of the affected photos were expected to be salvageable, but how many were lost remains unknown. The card catalog was not damaged; had it been, it would have been impossible to locate materials in the archive. "What makes the card catalog irreplaceable is that it has never been digitized. Hundreds of thousands of people and subjects are keyed by index numbers to the photo files, which contain an estimated six million prints and contact sheets." The incident "raised the question of how in the digital age... can some of the company’s most precious physical assets and intellectual property be safely and reasonably stored?"

Monday, September 21, 2015

Archiving a digital history

Archiving a digital history: Preserving Penn State’s heritage one link at a time. Katie Jacobs Bohn. Penn State News. September 18, 2015.
     Archivists need a way to preserve digital artifacts so future historians have access to them. This includes content on the internet that can disappear in a short time. Archive-It is a service Penn State archivists are using to make copies of Web pages and arrange them in collections. They want to digitally preserve their cultural heritage, including the University’s academic and administrative information that is published on the Web.

“Web archiving is important because so much of Penn State’s media is ‘born digital,’ or in other words, there’s never a physical copy.” “But we still need a way to keep and preserve this material so it’s not lost forever.” Some quotes from the article:
  • Preservation requires more than just backup. 
  • "Technology is constantly evolving, and it’s hard to know what digital archiving will look like 50 years from now, let alone hundreds."
  • “In the right environment, paper will last hundreds of years, but digital information has a lot of dependencies. To be able to access digital files in the future, you may need a certain kind of hardware and operating system, a compatible version of the software to open the file, not to mention electricity.” 
  • “A lot of digital preservation work involves mitigating the risks associated with these dependencies. For example, trying to use open file formats so you don’t need specific software programs that may no longer be around to access them.”
Regardless of what they are trying to preserve, archivists face the difficulty of managing the ephemeral nature of culture and history.