Showing posts with label social media archiving. Show all posts

Thursday, March 28, 2019

A Public Record at Risk: The Dire State of News Archiving in the Digital Age

A Public Record at Risk: The Dire State of News Archiving in the Digital Age. Sharon Ringel and Angela Woodall. Columbia Journalism Review. March 28, 2019.
     This research report looks at archiving practices and policies across newspapers, magazines, wire services, and digital-only news producers, to identify the current state of preserving content in an age of digital distribution. The majority of news outlets had not given any thought to even basic strategies for preserving their digital content, and not one was properly saving a holistic record of what it produces. Digitization and storage in a database are not alone adequate for long-term preservation. True archiving requires forethought and custodianship.

Staff equated digital backup and storage in Google Docs or content management systems with archiving and were unable to distinguish between backups and an archive, but the two are not the same. Backups are temporary copies for data recovery in case of damage or loss, while archiving refers to long-term preservation to ensure records will still be available even as technologies change in the future. Staff also expected that third-party organizations, such as the Internet Archive, Google, Twitter, and Facebook, would have copies. Even if the Internet Archive has captured a website, what it collects may be limited to the first level of content and could exclude links, comments, personalized content, and different versions of a story.

There are news archiving technologies being developed; preserving digital content is not a technical challenge, but a matter of priority and a decision that demonstrates intent. The findings should be a wake-up call to an industry which claims that democracy cannot be sustained without journalism acting as a truth and accountability watchdog. "In an era where journalism is already under attack, managing its record and future are as important as ever."

The news organizations are interested in the present: “Who cares what existed 10 years ago? I need my thing now. And so, for better, for worse, if there was some value in [archiving], I probably got a better value out of the new thing.” In short, newsrooms are doing very little to nothing to preserve digital news. And none of the content creators interviewed made an effort to download and preserve the stories they produced.

Deletion is the opposite side of preservation and "news organizations, in certain cases, actively remove content from the public record", which raises questions about the role of journalism in society.

Some key findings of the news organizations participating in the research:

  • Of the 21 news organizations in the study, 19 had no policies or practices for preserving their content and were taking no protective steps at all to archive their web output. The remaining two lacked formal preservation strategies.
  • Of the 21 news organizations, only six employed news archivists or librarians, and their other responsibilities took the focus away from the work required for preservation.
  • None of the digital-only outlets had a news librarian or archivist on staff. 
  • None of the news organizations were preserving their social media publications. Only one was attempting to address the problem.
  • Digital-only news organizations were less aware than print publications of the importance of preservation. Very little is currently being done to preserve news.
  • Journalism’s primary focus is on “what is new” and on preserving documentation of the reporting and what makes it accurate, rather than on preserving what ultimately gets published.
  • News apps are at high risk of being lost because these new technologies become obsolete before anyone thinks to save them. 
  • Partnerships among archivists, technologists, memory institutions, and news organizations will be vital to ensure future access to digitally distributed news content. Two questions to start with: What should be preserved? Who should preserve it?
  • To enact lasting change, opinion leaders in the field must persuade staff and management that archiving makes sense, has advantages, and is compatible with their priorities.

News organizations should care about preserving news for the future just as they care about integrity, reliability, and informing the public not just in the present.


Monday, January 30, 2017

Born-digital news preservation in perspective

Born-digital news preservation in perspective. Clifford Lynch. RJI Online. January 26, 2017. [Video and transcript.]
   The challenge with news and academic journals: how do you preserve this body of information? The journal community has been working on that in a much more systematic way. There is a shared consensus among all players that preserving the record of scholarly journal publication is essential. Nobody wants their scholarship to be ephemeral, so you have to tell people a convincing story about how their work will be preserved.

The primary responsibility for the active archive in most cases is the publisher, but there must be some kind of external fallback system so content will survive the failure of the publisher and the publisher’s archive. These are usually collaborative. Libraries have been the printed news archive, but that is changing. There is also a Keepers Registry so you can see how many keepers are preserving a given journal. The larger journals are well covered, but the smaller ones are really at risk, and a lot of these are small open source journals. "So, we need to be very mindful of those kinds of dynamics as we think about what to do about strategies for really handling the digital news at scale."

With the news, there are a few very large players, and a whole lot of other small news outlets of various kinds. Different strategies are needed for the two groups. We need to be very cautious about news boundaries. "Now in many, many cases, the journalism is built on top of and links to underlying evidence which at least in the short term is readily inspectable by anyone clicking on a link." But the links deteriorate and the material goes away and "preserving that evidence is really important." But it is unclear who is or should be preserving this. There are also questions about the news, the provenance, the motives, the accuracy, and these have to be handled in a more serious way.

"most social media is actually observation and testimony. Very little of it is synthesized news. It’s much more of the character of a set of testimonies or photographs or things like that. And collectively it can serve to give important documentation to an event, but often it is incomplete and otherwise problematic. We need to come to some kind of social consensus about how social media fits into  the cultural record.

We need to devise some systematic approaches to this because the journalistic organizations really need help; "their archives are genuinely at risk" and in many cases the "long term organizational viability is at risk". We need a public consensus. "We need a recognition that responsible journalism implies a lasting public record of that work." The need for a free press is recognized constitutionally. "We cannot, under current law, protect most of this material very effectively without the active collaboration of the content producers." This is too big a job for any single organization, and we don't want a single point of failure.


Tuesday, February 23, 2016

Preserving Social Media

New Technology Watch report: Preserving Social Media. Sara Day Thomson. Digital Preservation Coalition and Charles Beagrie Ltd. 16 Feb 2016. [PDF]
     This report looks at the issues involved in preserving social media. Institutions collecting this type of media need new approaches and methods. The report looks at "preserving social media for long-term access by presenting practical solutions for harvesting and managing the data generated by the interactions of users on web-based networking platforms such as Facebook or Twitter." It does not consider blogs. Helen Hockx-Yu defines social media as: ‘the collective name given to Internet-based or mobile applications which allow users to form online networks or communities’.

Web 1.0 media can be harvested by web crawlers such as Heritrix; Web 2.0 content, like social media platforms, is more effectively archived through APIs. This is often an extension of an institution's web archiving. Transparency and openness will be important when archiving content. APIs allow developers to call raw data, content and metadata directly from the platform, all transferred together in formats like JSON or XML.
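As a rough illustration of what "content and metadata transferred together" can look like, here is a minimal Python sketch that parses one JSON record and separates the post content from its contextual metadata. The record, its field names, and the split_record helper are hypothetical and do not reflect any platform's actual API schema.

```python
import json

# Hypothetical API response. The field names are illustrative only
# and are not taken from any real platform's schema.
SAMPLE_RESPONSE = """
{
  "id": "12345",
  "text": "Example post for an archive test",
  "created_at": "2016-02-16T10:00:00Z",
  "user": {"id": "678", "screen_name": "example_user"},
  "reply_to_id": null,
  "urls": ["https://example.org/report"]
}
"""


def split_record(raw_json: str) -> dict:
    """Parse one record and separate the content from its metadata."""
    record = json.loads(raw_json)
    # The content is the post itself, identified by its id.
    content = {"id": record["id"], "text": record["text"]}
    # Everything else is kept as context: author, timestamp, links, reply chain.
    metadata = {
        key: value for key, value in record.items() if key not in ("id", "text")
    }
    return {"content": content, "metadata": metadata}


archived = split_record(SAMPLE_RESPONSE)
print(json.dumps(archived, indent=2, sort_keys=True))
```

Keeping the metadata alongside the text, rather than discarding it, is one way an archive might retain the context the report says is needed to interpret a post later.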

Maintaining long-term access to social media data faces a number of challenges, such as working with user-generated content, continued access to social media data, privacy issues, copyright infringement issues, and having a way to maintain the linked, interactive nature of most social media platforms. There is also "the challenge of maintaining the meaning of the social media over time, which means ensuring that an archive contains enough metadata to provide meaningful context."  There are also third-party services and self-archiving services available.

Social media is vulnerable to potential loss. The report quotes one study which looked at "the lifespan of resources shared on social media and found that ‘after the first year of publishing, nearly 11% of shared resources will be lost and after that we will continue to lose 0.02% per day’."
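To make the quoted loss rate concrete, the following Python sketch turns it into a simple estimate. It assumes a linear reading of the quote (about 11% lost at the end of the first year, then 0.02 percentage points per additional day); that reading, and the function name, are assumptions for illustration rather than the study's own model.

```python
FIRST_YEAR_LOSS_PERCENT = 11.0    # "nearly 11%" after the first year (approximate)
DAILY_LOSS_PERCENT_AFTER = 0.02   # further loss per day, read as percentage points


def estimated_loss_percent(days_since_publication: int) -> float:
    """Estimate the share of shared resources lost, under a linear assumption.

    The quoted figures say nothing about losses within the first year,
    so this sketch only accepts ages of at least 365 days.
    """
    if days_since_publication < 365:
        raise ValueError("the quoted figures only cover one year or more")
    extra_days = days_since_publication - 365
    loss = FIRST_YEAR_LOSS_PERCENT + DAILY_LOSS_PERCENT_AFTER * extra_days
    # A share cannot exceed 100%, so cap the linear extrapolation.
    return round(min(100.0, loss), 2)


for years in (1, 2, 5):
    print(years, "year(s):", estimated_loss_percent(365 * years), "% lost")
```

Under this assumption, roughly 18% of shared resources would be gone after two years and about 40% after five, which shows how quickly a small daily rate accumulates.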

Some other quotes:
  • Overall, the capture and preservation of social media data requires adequate context.
  • Capturing data, metadata, and documentation may not provide enough context to convey user experiences with these platforms and technologies.
  • When considering the big picture, however, the preservation of social media may best be undertaken by a large, centralized provider, or a few large centralized providers, rather than linking smaller datasets or collections from many different institutions.

Monday, October 12, 2015

Social Media Usage: 2005-2015

Social Media Usage: 2005-2015. Andrew Perrin. Pew Research Center. October 8, 2015
     Results of report on social network usage statistics. "Nearly two-thirds of American adults (65%) use social networking sites, up from 7% when Pew Research Center began systematically tracking social media usage in 2005".  The figures reported here are for social media usage among all adults, not just among those Americans who are internet users.
  • Age differences: 
    • 90% of young adults use social media
    • 35% of all those 65 and older report using social media 
  • Gender differences: 
    • 68% of all women use social media
    • 62% of all men use social media
  • Socio-economic differences: 
    • Those in higher-income households were more likely to use social media. 
    • Over 56% of the lowest-income households now use social media. 
    • Those with college experience are more likely to use social media than those with a high school degree or less
  • Racial and ethnic similarities: There are no notable differences by racial or ethnic group: 
    • 65% of whites, 65% of Hispanics and 56% of African-Americans use social media today.
  • Community differences: 
    • Today, 58% of rural residents, 68% of suburban residents, and 64% of urban residents use social media.

Friday, September 04, 2015

The Digital Traces of User-generated Content: How Social Media Data May Become the Historical Sources of the Future

The Digital Traces of User-generated Content, How Social Media Data May Become the Historical Sources of the Future.  Katrin Weller. Library of Congress. Updated August 27, 2015. [Video and transcript.]
     What value will user-generated data have for the historians of the future, and how can we preserve and manage these new kinds of information sources? Social media data should be preserved since they will be a valuable source for future historians, but they are also a very important source for researchers today. We must be aware that if we do not deal with this now, if we do not find ways to preserve social media data and online data now, then it might be too late. Many things have already been lost. Online companies want our data so they can sell us things, and they are not necessarily interested in opening these platforms for our research. We must figure out how to deal with the sensitivity of the data that people contribute. Some of the symbols and terms used may not be clear to others in the future.

Some social media has replaced other formats. "For example, this is a blog by a war hero, which is kind of the digital equivalent of a war diary which would have been in other context, as well.  So the content is kind of the same, but the ways to engage with it are different because people can immediately respond to a blog post here to a diary entry, they can link to it from other platforms, they can share the link, they can like it, they can comment on it.  These are things that you will not have in a classical non-digital format, which makes this digital source something special."

There are no standards yet for this media, and actually there are different types of social media. The data formats may vary and that is important. The platforms change frequently and that can be a big problem for future historians and even for researchers today who want to know what they were actually studying. In addition, some data may vanish after a certain point of time. 

The top three challenges for historians and librarians when dealing with social media content and web content are
  1. not to run into something like the Dark Ages of the internet, 
  2. not to lose too much critical information that cannot be restored after a while; as we also find for the [inaudible] of radio and television, a lot of things have been lost already. URLs in a message may be unresolvable, so you may not know what they were referring to.
  3. how to deal with lots of hard drives full of digital photos, terabytes of digital photos, terabytes of other things? 
We should make the best possible effort, learning from what we have lost in the past, to try to prevent this from happening with this new material. Keep it at this and we'll see what survives. We can do a lot of things already by keeping the background information so we know how to deal with the data if it becomes available. We should realize that the companies could change or shut down in time. We don't know what would happen if Twitter or Facebook shuts down someday.

Thursday, August 06, 2015

The Personal Digital Archiving 2015 Conference

The Personal Digital Archiving 2015 Conference. Mike Ashenfelder. Library of Congress Signal. August 3, 2015.   
     The conference is about preserving digital collections outside the scope of large cultural institutions, including family history, community history, genealogy and digital humanities.  Digital video is important as evidence and cultural records, especially of news events. Every individual, family and community should consider the cultural importance of their digital photos.
  • “The value of the artifact – and what we keep trying to tell young people – is that they are the authors of a history in the making.” 
  • The archives people are creating are "exactly the same kinds of images that filmmakers like us use to make a documentary. People in the future will be looking through their images to try to understand who we are today.” 
  • "Personal photos mingle with family personal photos to become a larger archive, a family archive. Facebook has spawned a “local history” phenomena, where members of a community post their personal photos and comments, and the individual personal contributions congeal organically into a community history site."
  • Increasingly we hear from colleges and universities, usually — though not exclusively — from their librarians, expressing concern that students and faculty may not be aware of the need to preserve their digital stuff.
Instead of commercial products at the conference, presentations, workshops and posters shared practical information about projects that used open-source tools.

Wednesday, August 05, 2015

NARA Bulletin 2015-02: Guidance on Managing Electronic Messages

NARA Bulletin 2015-02: Guidance on Managing Electronic Messages. National Archives. July 29, 2015.
     Bulletin from the National Archives to the heads of federal agencies providing records management guidance for electronic messages, specifically for text messaging, chat/instant messaging, messaging functionality in social media tools or applications, voice messaging, and similar forms of electronic messaging systems. There are a wide variety of systems and tools that create electronic messages, and this bulletin is intended to help develop strategies for managing those electronic messages. Electronic messages created or received in the course of business are records and, like all records, they must be scheduled for disposition.
There are challenges with managing these types of messages. To meet them, agencies should:
  • Develop policies on electronic messages that address some of the challenges 
  • Update policies when new tools are deployed
  • Provide appropriate tools for employees to manage their work
  • Determine a strategy to manage and capture content created in those systems
  • Train employees to identify and capture electronic messages
  • Use third-party services to capture messages
  • Ensure electronic messages, metadata and attachments can be exported from the original system for long term preservation.
  • Create a retention guideline for electronic messages to meet business, audit, and access needs
  • Use personal accounts only in exceptional circumstances
  • Provide clear instructions to all employees on their responsibility to capture electronic messages created or received in personal accounts

Friday, July 03, 2015

Australian electronic books to be preserved at the National Library in Canberra under new laws

Australian electronic books to be preserved at the National Library in Canberra under new laws. Clarissa Thorp. ABC. 3 July 2015.
Starting in January of next year, digital materials including e-books, blogs, prominent websites, and important social media messages will be collected as a snapshot of Australian life. Under existing copyright laws, the National Library of Australia is able to collect all books produced by local publishers through the legal deposit system. Now, with new legislation adopted by the Federal Parliament, the Library will be able to preserve published items from the internet that could disappear from view in future. "This legislation puts us in a position where we are able to ask publishers to deposit electronic material with the National Library in a comprehensive way." "So we will be able to open that up and collect the whole of the Australian domain, for websites for example it means we are able to collect e-books that are only published in digital form." This new legislation will expand the Library's digital preservation program and ensure that future collections reflect Australian society as a whole.

Friday, October 24, 2014

The Many Uses of Rhizome’s New Social Media Preservation Tool.



The Many Uses of Rhizome’s New Social Media Preservation Tool.  Benjamin Sutton. Hyperallergic Media. October 21, 2014.
New York’s digital art nonprofit Rhizome is developing Colloq, a conservation tool to help artists preserve social media projects not only by archiving them, but by replicating the exact look and layout of the sites used, and the interactions with other users. The idea for Colloq came from the realization that Rhizome will be unable to accession new, contemporary Internet art if it does not rethink archival practices. Colloq is still in its early stages of development.

Thursday, October 16, 2014

Web archiving in the United States: a 2013 survey and NDSA Report.


Web archiving in the United States: a 2013 survey and NDSA Report. Jefferson Bailey, et al. National Digital Stewardship Alliance. September 2014. [PDF]

Report on a survey of organizations in the United States that are actively involved in, or planning to start, programs to archive content from the Web. Over half of the respondents were from colleges or universities. Respondents consider technical skills to be the most necessary to the development and success of their programs. Respondents are most interested in metrics relating to volume and usage. Most do not participate in collaborative archiving. Overall the results suggest that web archiving programs are maturing and are moving towards standard practices.
  • 81% devote half or less of an FTE's time to archiving the web
  • 40% indicated that knowledge of web technologies or archiving tools is essential
  • 58% capture web content without either notifying or seeking permission from content owners
  • 55% of respondents conditionally respect robots.txt
  • 63% use external web archiving services exclusively, a 3% increase over the last survey
Concern about ability to archive types of content (multiple selections):
  • social media – 79%
  • databases – 74%
  • video – 73% (63
  • interactive media – 56%
  • audio – 45%
  • blogs – 36%
  • art – 17%