University libraries have collaborated with their colleges to archive and/or publish students’ electronic theses and dissertations (ETDs) online. But there is little consistency in how they archive, curate and publish the student research data and digital scholarship that underlie the ETDs. Most institutions have no policy on caring for ETD-related data. Others have said that “Dissertation datasets represent ‘low-hanging fruit’ for universities who are developing institutional data collections,” yet few have addressed the issues. At a recent conference, it was stated that curation of students’ ETD data can be seen as a scale model of the scholarly communication lifecycle and that these are valuable collections that universities should pursue, archive and make available.
One presenter described three organization-level digital curation challenges that libraries need to address:
- people not knowing how to do the work,
- not enough time or incentive for people to learn and
- insufficient resources.
There are important questions to be asked about how to best curate and describe student ETD data.
- Should there be more oversight over the documentation quality and quantity students provide with their datasets?
- Should these digital objects receive their own record and metadata?
- What are the best ways to show the relationships between these objects and the ETD?
- Can we make the same archival/preservation commitments to supplementary data files that we do for the PDF file of the ETD?
- 45% were Excel files (30% of which had macros, charts and/or linked to other data),
- 22% were image files and
- 25% were document files.
- The remainder included text, database and/or statistical software files, of which
- 23% were code (and 15% of these executable files),
- 12% of the files were metadata,
- 30% were unknown, inoperable and/or obsolete; and
- 3% of the ETDs were missing data files from what was listed among their manifests.
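A file-type breakdown and manifest check of this kind could be sketched roughly as follows. This is only an illustration, not the method used to produce the figures above: the extension-to-category mapping, the function names `categorize` and `audit`, and the sample file names are all assumptions made for the example.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to category (an assumption,
# not the classification used for the figures in the text).
CATEGORIES = {
    ".xls": "spreadsheet", ".xlsx": "spreadsheet", ".xlsm": "spreadsheet",
    ".jpg": "image", ".png": "image", ".tif": "image",
    ".doc": "document", ".docx": "document", ".pdf": "document",
    ".py": "code", ".r": "code", ".m": "code",
    ".xml": "metadata",
}


def categorize(filename):
    """Return the category for a file name, or 'unknown' if unmapped."""
    suffix = PurePosixPath(filename).suffix.lower()
    return CATEGORIES.get(suffix, "unknown")


def audit(manifest, deposited):
    """Compare a manifest with the files actually deposited.

    Returns a dict of category -> percentage of deposited files, and a
    sorted list of manifest entries that have no deposited file.
    """
    counts = Counter(categorize(name) for name in deposited)
    total = sum(counts.values())
    shares = {}
    if total:
        shares = {cat: round(100 * n / total, 1) for cat, n in counts.items()}
    missing = sorted(set(manifest) - set(deposited))
    return shares, missing


# Illustrative file names only.
deposited = ["results.xlsx", "fig1.png", "analysis.R", "readme.xyz"]
manifest = deposited + ["survey.sav"]
shares, missing = audit(manifest, deposited)
print(shares)
print(missing)
```

Classifying by extension alone is a simplification; identifying obsolete or inoperable files in practice would likely need format identification tools and manual review.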