The File Harvester tool. Chris Erickson. Brigham Young University. November 29, 2018.
We have created a tool for harvesting, processing,
and submitting content to Rosetta. Our Library IT department has released it as open
source. The tool is tailored to our workflow, but the source code may be useful to
others trying to accomplish a similar task.
The File Harvester tool gathers content from several
different sources:
- Our hosted CONTENTdm (cdm)
- Open Journal Systems (ojs)
- Internet Archive (ia)
- Unstructured files in a folder with metadata in a spreadsheet (csv)
The tool creates SIPs
by gathering the objects and metadata from the specified source, creating a Rosetta METS XML
file and a Dublin Core XML file, and placing everything in the folder structure our Rosetta
system expects. The objects can either be on the hosted system or in a source folder. The tool can also submit the content to Rosetta for
ingest.
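As a rough illustration of the Dublin Core step, here is a minimal Python sketch that turns a dictionary of metadata (for example, one row of the csv spreadsheet) into a Dublin Core XML string. It is not taken from the rosetta-tools source. The function name build_dc_xml, the dc:record wrapper element, and the sample field names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_xml(metadata):
    """Return a Dublin Core XML string for a dict of element name -> value.

    Hypothetical helper: the dc:record wrapper element is an assumption,
    not necessarily what the File Harvester tool writes.
    """
    # Serialize the namespace with the conventional "dc" prefix.
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{DC_NS}}}record")
    for name, value in metadata.items():
        # One child element per metadata field, e.g. dc:title.
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

Calling build_dc_xml({"title": "Sample item", "creator": "Example Author"}) returns a single dc:record element with one dc:title and one dc:creator child, which could then be saved as dc.xml.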
The structure is:
- Folder: collection-itemid, which contains dc.xml and the subfolder content
- Sub-Folder: content, which contains mets.xml and the subfolder streams
- Sub-Folder: streams, which contains the file objects
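The folder layout above can be sketched in Python as follows. This is only an illustration of the structure, not the tool's actual code. The function name build_sip_folder and its parameters are hypothetical, and the XML contents are passed in as ready-made strings.

```python
import shutil
from pathlib import Path


def build_sip_folder(root, collection, item_id, dc_xml, mets_xml, files):
    """Lay out one SIP as collection-itemid/content/streams.

    Hypothetical helper for illustration; names and parameters are assumptions.
    """
    # Top-level folder: collection-itemid
    sip_dir = Path(root) / f"{collection}-{item_id}"
    # Sub-folders: content, and streams inside content
    content_dir = sip_dir / "content"
    streams_dir = content_dir / "streams"
    streams_dir.mkdir(parents=True, exist_ok=True)

    # dc.xml sits next to the content folder; mets.xml sits inside it.
    (sip_dir / "dc.xml").write_text(dc_xml, encoding="utf-8")
    (content_dir / "mets.xml").write_text(mets_xml, encoding="utf-8")

    # Copy each file object into streams, keeping its file name.
    for source in files:
        source = Path(source)
        shutil.copy2(source, streams_dir / source.name)

    return sip_dir
```

For a collection named mycoll and item 42, this produces mycoll-42/dc.xml, mycoll-42/content/mets.xml, and the copied objects under mycoll-42/content/streams.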
Figure: File Harvester outline
The source is available at: https://bitbucket.org/byuhbll/rosetta-tools