Thursday, November 29, 2018

The File Discovery Tool - A simple tool to gather file and filepath information, and ingest into our Rosetta Digital Archive


The File Discovery Tool. Chris Erickson. Brigham Young University. November 29, 2018.
     We have created a File Discovery Tool that analyzes directories of objects and prepares a spreadsheet of all the files it discovers for preservation/ingest. This file allows the curators to discover and work with the materials, select those that need to be preserved, and then add collection and other metadata information. The tool fits our workflow, but the source code may be useful for others trying to accomplish a similar task.

A sample command to run the tool:
>> java -jar FileDiscovery.jar [path name of files to check] [output path name for saving the report]
>> java -jar C:\FileDiscovery\FileDiscovery.jar "R:\test\objects"  C:\output\files
 The commands and syntax are outlined in a brief document: File Discovery Outline
  
The spreadsheet that is created has the following column headings:
 FILENAME, ITEM ID, FILEPATH, BYTESIZE, SIZE, COLLECTION, IE_LEVEL, DATE_CREATED, DATE_MODIFIED, TITLE, CREATOR, DESCRIPTION, RIGHTS_POLICY

Metadata can be added as needed before ingesting the content into Rosetta.

The files and the metadata can then be submitted to Rosetta using the csv option in the Rosetta File Harvester tool by adding in a second row of Dublin Core names in order to map the column. A standard template has been created to help in preparing the file for ingest and is found on the resources page: RosettaFile Ingest template for Excel, or (PDF)
The source is available at https://bitbucket.org/byuhbll/filediscovery


No comments: