Thursday, November 29, 2018

The File Harvester tool - Our tool for ingesting content to our Rosetta Digital Archive


The File Harvester tool. Chris Erickson. Brigham Young University. November 29, 2018. 
     We have created a tool for harvesting, processing, and submitting content to Rosetta, and our Library IT department has released it as open source. The tool is built around our workflow, but the source code may be useful to others trying to accomplish a similar task.

The File Harvester tool gathers content from several different sources:
  • Our hosted CONTENTdm (cdm)
  • Open Journal Systems (ojs)
  • Internet Archive (ia)
  • Unstructured files in a folder with metadata in a spreadsheet (csv)
The tool creates SIPs (Submission Information Packages) by gathering the objects and metadata from the specified source, creating a Rosetta METS XML file and a Dublin Core XML file, and arranging everything in the folder structure our Rosetta system expects. The objects can reside either on the hosted system or in a source folder. The harvest tool can also submit the content to Rosetta for ingest.
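
As a rough illustration of the metadata step, here is a minimal Python sketch that turns one row of spreadsheet metadata into Dublin Core XML. It is not taken from the File Harvester source: the function name build_dc_xml, the column names, and the "record" root element are all assumptions made for the example, so check them against what your Rosetta system expects.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_xml(metadata):
    """Serialize a flat {field: value} mapping as Dublin Core elements.

    Hypothetical helper: the "record" root element and the field names
    are assumptions for illustration, not the harvester's actual output.
    """
    root = ET.Element(f"{{{DC_NS}}}record")
    for field, value in metadata.items():
        if value:  # skip empty spreadsheet cells
            child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# One spreadsheet row, keyed by (assumed) Dublin Core field names.
row = {"title": "Sample item", "creator": "Doe, Jane", "date": ""}
dc_xml = build_dc_xml(row)
```

Empty cells are skipped so that the resulting file contains only the fields that actually have values.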

The structure is:
  1. Folder: collection-itemid, which contains dc.xml and the sub-folder content
  2. Sub-folder: content, which contains mets.xml and the sub-folder streams
  3. Sub-folder: streams, which contains the file objects
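
The folder layout above can be sketched in a few lines of Python. This is only an illustration of the structure, not the harvester's own code: the function name build_sip_folder and its parameters are hypothetical, and the XML strings are assumed to have been generated elsewhere.

```python
import shutil
from pathlib import Path


def build_sip_folder(root, collection, item_id, files, dc_xml, mets_xml):
    """Lay out one SIP folder as collection-itemid/content/streams.

    Hypothetical helper for illustration; all names are assumptions.
    """
    sip = Path(root) / f"{collection}-{item_id}"
    content = sip / "content"
    streams = content / "streams"
    streams.mkdir(parents=True, exist_ok=True)

    # dc.xml sits at the top level of the item folder.
    (sip / "dc.xml").write_text(dc_xml, encoding="utf-8")
    # mets.xml sits inside the content sub-folder.
    (content / "mets.xml").write_text(mets_xml, encoding="utf-8")

    # The file objects themselves go into streams.
    for source in files:
        shutil.copy2(source, streams / Path(source).name)
    return sip
```

Calling build_sip_folder("out", "mycollection", "42", ["page1.tif"], dc_xml, mets_xml) would produce out/mycollection-42/dc.xml, out/mycollection-42/content/mets.xml, and out/mycollection-42/content/streams/page1.tif.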
The commands and syntax are outlined in a brief document on the Resources page:
Rosetta File Harvester outline

The source is available at: https://bitbucket.org/byuhbll/rosetta-tools

